Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
5612
Julie A. Jacko (Ed.)
Human-Computer Interaction Ambient, Ubiquitous and Intelligent Interaction 13th International Conference, HCI International 2009 San Diego, CA, USA, July 19-24, 2009 Proceedings, Part III
Volume Editor
Julie A. Jacko
University of Minnesota, Institute of Health Informatics
MMC 912, 420 Delaware Street S.E., Minneapolis, MN 55455, USA
E-mail: [email protected]
Library of Congress Control Number: 2009929048
CR Subject Classification (1998): H.5, I.3, I.7.5, I.5, I.2.10
LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-642-02579-X Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02579-2 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12707225 06/3180 543210
Foreword
The 13th International Conference on Human–Computer Interaction, HCI International 2009, was held in San Diego, California, USA, July 19–24, 2009, jointly with the Symposium on Human Interface (Japan) 2009, the 8th International Conference on Engineering Psychology and Cognitive Ergonomics, the 5th International Conference on Universal Access in Human–Computer Interaction, the Third International Conference on Virtual and Mixed Reality, the Third International Conference on Internationalization, Design and Global Development, the Third International Conference on Online Communities and Social Computing, the 5th International Conference on Augmented Cognition, the Second International Conference on Digital Human Modeling, and the First International Conference on Human Centered Design. A total of 4,348 individuals from academia, research institutes, industry and governmental agencies from 73 countries submitted contributions, and 1,397 papers that were judged to be of high scientific quality were included in the program. These papers address the latest research and development efforts and highlight the human aspects of the design and use of computing systems. The papers accepted for presentation thoroughly cover the entire field of human–computer interaction, addressing major advances in the knowledge and effective use of computers in a variety of application areas. This volume, edited by Julie A. Jacko, contains papers in the thematic area of Human–Computer Interaction, addressing the following major topics:
• Mobile Interaction
• In-vehicle Interaction and Environment Navigation
• Agents, Avatars and Personalization
• Ambient Interaction
• Affect, Emotion and Engagement
• Smart and Wearable Materials and Devices
The remaining volumes of the HCI International 2009 proceedings are:
• Volume 1, LNCS 5610, Human–Computer Interaction––New Trends (Part I), edited by Julie A. Jacko
• Volume 2, LNCS 5611, Human–Computer Interaction––Novel Interaction Methods and Techniques (Part II), edited by Julie A. Jacko
• Volume 4, LNCS 5613, Human–Computer Interaction––Interacting in Various Application Domains (Part IV), edited by Julie A. Jacko
• Volume 5, LNCS 5614, Universal Access in Human–Computer Interaction––Addressing Diversity (Part I), edited by Constantine Stephanidis
• Volume 6, LNCS 5615, Universal Access in Human–Computer Interaction––Intelligent and Ubiquitous Interaction Environments (Part II), edited by Constantine Stephanidis
• Volume 7, LNCS 5616, Universal Access in Human–Computer Interaction––Applications and Services (Part III), edited by Constantine Stephanidis
• Volume 8, LNCS 5617, Human Interface and the Management of Information––Designing Information Environments (Part I), edited by Michael J. Smith and Gavriel Salvendy
• Volume 9, LNCS 5618, Human Interface and the Management of Information––Information and Interaction (Part II), edited by Gavriel Salvendy and Michael J. Smith
• Volume 10, LNCS 5619, Human Centered Design, edited by Masaaki Kurosu
• Volume 11, LNCS 5620, Digital Human Modeling, edited by Vincent G. Duffy
• Volume 12, LNCS 5621, Online Communities and Social Computing, edited by A. Ant Ozok and Panayiotis Zaphiris
• Volume 13, LNCS 5622, Virtual and Mixed Reality, edited by Randall Shumaker
• Volume 14, LNCS 5623, Internationalization, Design and Global Development, edited by Nuray Aykin
• Volume 15, LNCS 5624, Ergonomics and Health Aspects of Work with Computers, edited by Ben-Tzion Karsh
• Volume 16, LNAI 5638, The Foundations of Augmented Cognition: Neuroergonomics and Operational Neuroscience, edited by Dylan Schmorrow, Ivy Estabrooke and Marc Grootjen
• Volume 17, LNAI 5639, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris
I would like to thank the Program Chairs and the members of the Program Boards of all thematic areas, listed below, for their contribution to the highest scientific quality and the overall success of HCI International 2009.
Ergonomics and Health Aspects of Work with Computers Program Chair: Ben-Tzion Karsh Arne Aarås, Norway Pascale Carayon, USA Barbara G.F. Cohen, USA Wolfgang Friesdorf, Germany John Gosbee, USA Martin Helander, Singapore Ed Israelski, USA Waldemar Karwowski, USA Peter Kern, Germany Danuta Koradecka, Poland Kari Lindström, Finland
Holger Luczak, Germany Aura C. Matias, Philippines Kyung (Ken) Park, Korea Michelle M. Robertson, USA Michelle L. Rogers, USA Steven L. Sauter, USA Dominique L. Scapin, France Naomi Swanson, USA Peter Vink, The Netherlands John Wilson, UK Teresa Zayas-Cabán, USA
Human Interface and the Management of Information Program Chair: Michael J. Smith Gunilla Bradley, Sweden Hans-Jörg Bullinger, Germany Alan Chan, Hong Kong Klaus-Peter Fähnrich, Germany Michitaka Hirose, Japan Jhilmil Jain, USA Yasufumi Kume, Japan Mark Lehto, USA Fiona Fui-Hoon Nah, USA Shogo Nishida, Japan Robert Proctor, USA Youngho Rhee, Korea
Anxo Cereijo Roibás, UK Katsunori Shimohara, Japan Dieter Spath, Germany Tsutomu Tabe, Japan Alvaro D. Taveira, USA Kim-Phuong L. Vu, USA Tomio Watanabe, Japan Sakae Yamamoto, Japan Hidekazu Yoshikawa, Japan Li Zheng, P.R. China Bernhard Zimolong, Germany
Human–Computer Interaction Program Chair: Julie A. Jacko Sebastiano Bagnara, Italy Sherry Y. Chen, UK Marvin J. Dainoff, USA Jianming Dong, USA John Eklund, Australia Xiaowen Fang, USA Ayse Gurses, USA Vicki L. Hanson, UK Sheue-Ling Hwang, Taiwan Wonil Hwang, Korea Yong Gu Ji, Korea Steven Landry, USA
Gitte Lindgaard, Canada Chen Ling, USA Yan Liu, USA Chang S. Nam, USA Celestine A. Ntuen, USA Philippe Palanque, France P.L. Patrick Rau, P.R. China Ling Rothrock, USA Guangfeng Song, USA Steffen Staab, Germany Wan Chul Yoon, Korea Wenli Zhu, P.R. China
Engineering Psychology and Cognitive Ergonomics Program Chair: Don Harris Guy A. Boy, USA John Huddlestone, UK Kenji Itoh, Japan Hung-Sying Jing, Taiwan Ron Laughery, USA Wen-Chin Li, Taiwan James T. Luxhøj, USA
Nicolas Marmaras, Greece Sundaram Narayanan, USA Mark A. Neerincx, The Netherlands Jan M. Noyes, UK Kjell Ohlsson, Sweden Axel Schulte, Germany Sarah C. Sharples, UK
Neville A. Stanton, UK Xianghong Sun, P.R. China Andrew Thatcher, South Africa
Matthew J.W. Thomas, Australia Mark Young, UK
Universal Access in Human–Computer Interaction Program Chair: Constantine Stephanidis Julio Abascal, Spain Ray Adams, UK Elisabeth André, Germany Margherita Antona, Greece Chieko Asakawa, Japan Christian Bühler, Germany Noelle Carbonell, France Jerzy Charytonowicz, Poland Pier Luigi Emiliani, Italy Michael Fairhurst, UK Dimitris Grammenos, Greece Andreas Holzinger, Austria Arthur I. Karshmer, USA Simeon Keates, Denmark Georgios Kouroupetroglou, Greece Sri Kurniawan, USA
Patrick M. Langdon, UK Seongil Lee, Korea Zhengjie Liu, P.R. China Klaus Miesenberger, Austria Helen Petrie, UK Michael Pieper, Germany Anthony Savidis, Greece Andrew Sears, USA Christian Stary, Austria Hirotada Ueda, Japan Jean Vanderdonckt, Belgium Gregg C. Vanderheiden, USA Gerhard Weber, Germany Harald Weber, Germany Toshiki Yamaoka, Japan Panayiotis Zaphiris, UK
Virtual and Mixed Reality Program Chair: Randall Shumaker Pat Banerjee, USA Mark Billinghurst, New Zealand Charles E. Hughes, USA David Kaber, USA Hirokazu Kato, Japan Robert S. Kennedy, USA Young J. Kim, Korea Ben Lawson, USA
Gordon M. Mair, UK Miguel A. Otaduy, Switzerland David Pratt, UK Albert “Skip” Rizzo, USA Lawrence Rosenblum, USA Dieter Schmalstieg, Austria Dylan Schmorrow, USA Mark Wiederhold, USA
Internationalization, Design and Global Development Program Chair: Nuray Aykin Michael L. Best, USA Ram Bishu, USA Alan Chan, Hong Kong Andy M. Dearden, UK
Susan M. Dray, USA Vanessa Evers, The Netherlands Paul Fu, USA Emilie Gould, USA
Sung H. Han, Korea Veikko Ikonen, Finland Esin Kiris, USA Masaaki Kurosu, Japan Apala Lahiri Chavan, USA James R. Lewis, USA Ann Light, UK James J.W. Lin, USA Rungtai Lin, Taiwan Zhengjie Liu, P.R. China Aaron Marcus, USA Allen E. Milewski, USA
Elizabeth D. Mynatt, USA Oguzhan Ozcan, Turkey Girish Prabhu, India Kerstin Röse, Germany Eunice Ratna Sari, Indonesia Supriya Singh, Australia Christian Sturm, Spain Adi Tedjasaputra, Singapore Kentaro Toyama, India Alvin W. Yeo, Malaysia Chen Zhao, P.R. China Wei Zhou, P.R. China
Online Communities and Social Computing Program Chairs: A. Ant Ozok, Panayiotis Zaphiris Chadia N. Abras, USA Chee Siang Ang, UK Amy Bruckman, USA Peter Day, UK Fiorella De Cindio, Italy Michael Gurstein, Canada Tom Horan, USA Anita Komlodi, USA Piet A.M. Kommers, The Netherlands Jonathan Lazar, USA Stefanie Lindstaedt, Austria
Gabriele Meiselwitz, USA Hideyuki Nakanishi, Japan Anthony F. Norcio, USA Jennifer Preece, USA Elaine M. Raybourn, USA Douglas Schuler, USA Gilson Schwartz, Brazil Sergei Stafeev, Russia Charalambos Vrasidas, Cyprus Cheng-Yen Wang, Taiwan
Augmented Cognition Program Chair: Dylan D. Schmorrow Andy Bellenkes, USA Andrew Belyavin, UK Joseph Cohn, USA Martha E. Crosby, USA Tjerk de Greef, The Netherlands Blair Dickson, UK Traci Downs, USA Julie Drexler, USA Ivy Estabrooke, USA Cali Fidopiastis, USA Chris Forsythe, USA Wai Tat Fu, USA Henry Girolamo, USA
Marc Grootjen, The Netherlands Taro Kanno, Japan Wilhelm E. Kincses, Germany David Kobus, USA Santosh Mathan, USA Rob Matthews, Australia Dennis McBride, USA Robert McCann, USA Jeff Morrison, USA Eric Muth, USA Mark A. Neerincx, The Netherlands Denise Nicholson, USA Glenn Osga, USA
Dennis Proffitt, USA Leah Reeves, USA Mike Russo, USA Kay Stanney, USA Roy Stripling, USA Mike Swetnam, USA Rob Taylor, UK
Maria L.Thomas, USA Peter-Paul van Maanen, The Netherlands Karl van Orden, USA Roman Vilimek, Germany Glenn Wilson, USA Thorsten Zander, Germany
Digital Human Modeling Program Chair: Vincent G. Duffy Karim Abdel-Malek, USA Thomas J. Armstrong, USA Norm Badler, USA Kathryn Cormican, Ireland Afzal Godil, USA Ravindra Goonetilleke, Hong Kong Anand Gramopadhye, USA Sung H. Han, Korea Lars Hanson, Sweden Pheng Ann Heng, Hong Kong Tianzi Jiang, P.R. China
Kang Li, USA Zhizhong Li, P.R. China Timo J. Määttä, Finland Woojin Park, USA Matthew Parkinson, USA Jim Potvin, Canada Rajesh Subramanian, USA Xuguang Wang, France John F. Wiechel, USA Jingzhou (James) Yang, USA Xiu-gan Yuan, P.R. China
Human Centered Design Program Chair: Masaaki Kurosu Gerhard Fischer, USA Tom Gross, Germany Naotake Hirasawa, Japan Yasuhiro Horibe, Japan Minna Isomursu, Finland Mitsuhiko Karashima, Japan Tadashi Kobayashi, Japan
Kun-Pyo Lee, Korea Loïc Martínez-Normand, Spain Dominique L. Scapin, France Haruhiko Urokohara, Japan Gerrit C. van der Veer, The Netherlands Kazuhiko Yamazaki, Japan
In addition to the members of the Program Boards above, I also wish to thank the following volunteer external reviewers: Gavin Lew from the USA, Daniel Su from the UK, and Ilia Adami, Ioannis Basdekis, Yannis Georgalis, Panagiotis Karampelas, Iosif Klironomos, Alexandros Mourouzis, and Stavroula Ntoa from Greece. This conference could not have been possible without the continuous support and advice of the Conference Scientific Advisor, Prof. Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the Communications Chair and Editor of HCI International News, Abbas Moallem.
I would also like to thank for their contribution toward the organization of the HCI International 2009 conference the members of the Human–Computer Interaction Laboratory of ICS-FORTH, and in particular Margherita Antona, George Paparoulis, Maria Pitsoulaki, Stavroula Ntoa, and Maria Bouhli. Constantine Stephanidis
HCI International 2011
The 14th International Conference on Human–Computer Interaction, HCI International 2011, will be held jointly with the affiliated conferences in the summer of 2011. It will cover a broad spectrum of themes related to human–computer interaction, including theoretical issues, methods, tools, processes and case studies in HCI design, as well as novel interaction techniques, interfaces and applications. The proceedings will be published by Springer. More information about the topics, as well as the venue and dates of the conference, will be announced through the HCI International Conference series website: http://www.hci-international.org/
General Chair
Professor Constantine Stephanidis
University of Crete and ICS-FORTH
Heraklion, Crete, Greece
Email: [email protected]
Table of Contents
Part I: Mobile Interaction

BigKey: A Virtual Keyboard for Mobile Devices . . . . . . . . . . . . . . . . . . . . . Khaldoun Al Faraj, Mustapha Mojahid, and Nadine Vigouroux
3
TringIt: Easy Triggering of Web Actions from a Phone . . . . . . . . . . . . . . . . Vinod Anupam
11
Context Awareness and Perceived Interactivity in Multimedia Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiao Dong and Pei-Luen Patrick Rau
21
Human Computer Interaction with a PIM Application: Merging Activity, Location and Social Setting into Context . . . . . . . . . . . . . . . . . . . Tor-Morten Grønli and Gheorghita Ghinea
30
CLURD: A New Character-Inputting System Using One 5-Way Key Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hyunjin Ji and Taeyong Kim
39
Menu Design in Cell Phones: Use of 3D Menus . . . . . . . . . . . . . . . . . . . . . . Kyungdoh Kim, Robert W. Proctor, and Gavriel Salvendy
48
Mobile Interfaces in Tangible Mnemonics Interaction . . . . . . . . . . . . . . . . . Thorsten Mahler, Marc Hermann, and Michael Weber
58
Understanding the Relationship between Requirements and Context Elements in Mobile Collaboration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sergio Ochoa, Rosa Alarcon, and Luis Guerrero
67
Continuous User Interfaces for Seamless Task Migration . . . . . . . . . . . . . . Pardha S. Pyla, Manas Tungare, Jerome Holman, and Manuel A. Pérez-Quiñones
77
A Study of Information Retrieval of En Route Display of Fire Information on PDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Weina Qu, Xianghong Sun, Thomas Plocher, and Li Wang
86
A Mobile and Desktop Application for Enhancing Group Awareness in Knowledge Work Teams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Timo Saari, Kari Kallinen, Mikko Salminen, Niklas Ravaja, and Marco Rapino
95
A Study of Fire Information Detection on PDA Device . . . . . . . . . . . . . . . Xianghong Sun, Weina Qu, Thomas Plocher, and Li Wang
105
Empirical Comparison of Task Completion Time between Mobile Phone Models with Matched Interaction Sequences . . . . . . . . . . . . . . . . . . . Shunsuke Suzuki, Yusuke Nakao, Toshiyuki Asahi, Victoria Bellotti, Nick Yee, and Shin’ichi Fukuzumi
114
Part II: In-Vehicle Interaction and Environment Navigation

Nine Assistant Guiding Methods in Subway Design – A Research of Shanghai Subway Users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Linong Dai
125
Pull and Push: Proximity-Aware User Interface for Navigating in 3D Space Using a Handheld Camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mingming Fan and Yuanchun Shi
133
A Study on the Design of Voice Navigation of Car Navigation System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chih-Fu Wu, Wan-Fu Huang, and Tung-Chen Wu
141
Front Environment Recognition of Personal Vehicle Using the Image Sensor and Acceleration Sensors for Everyday Computing . . . . . . . . . . . . . Takahiro Matsui, Takeshi Imanaka, and Yasuyuki Kono
151
Common Interaction Schemes for In-Vehicle User-Interfaces . . . . . . . . . . . Simon Nestler, Marcus Tönnis, and Gudrun Klinker
159
Dynamic Maps for Future Navigation Systems: Agile Design Exploration of User Interface Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Volker Paelke and Karsten Nebe
169
Flight Searching – A Comparison of Two User-Interface Design Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Antti Pirhonen and Niko Kotilainen
179
Agent-Based Driver Abnormality Estimation . . . . . . . . . . . . . . . . . . . . . . . . Tony Poitschke, Florian Laquai, and Gerhard Rigoll
189
Enhancing the Accessibility of Maps with Personal Frames of Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Falko Schmid
199
Augmented Interaction and Visualization in the Automotive Domain . . . Roland Spies, Markus Ablaßmeier, Heiner Bubb, and Werner Hamberger
211
Proposal of a Direction Guidance System for Evacuation . . . . . . . . . . . . . . Chikamune Wada, Yu Yoneda, and Yukinobu Sugimura
221
A Virtual Environment for Learning Airport Emergency Management Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Telmo Zarraonandia, Mario Rafael Ruiz Vargas, Paloma Díaz, and Ignacio Aedo
228
Part III: Agents, Avatars and Personalisation

User Profiling for Web Search Based on Biological Fluctuation . . . . . . . . . Yuki Arase, Takahiro Hara, and Shojiro Nishio
239
Expression of Personality through Avatars: Analysis of Effects of Gender and Race on Perceptions of Personality . . . . . . . . . . . . . . . . . . . . . . Jennifer Cloud-Buckner, Michael Sellick, Bhanuteja Sainathuni, Betty Yang, and Jennie Gallimore
248
User-Definable Rule Description Framework for Autonomous Actor Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Narichika Hamaguichi, Hiroyuki Kaneko, Mamoru Doke, and Seiki Inoue
257
Cognitive and Emotional Characteristics of Communication in Human-Human/Human-Agent Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . Yugo Hayashi and Kazuhisa Miwa
267
Identification of the User by Analyzing Human Computer Interaction . . . Rüdiger Heimgärtner
275
The Anticipation of Human Behavior Using “Parasitic Humanoid” . . . . . Hiroyuki Iizuka, Hideyuki Ando, and Taro Maeda
284
Modeling Personal Preferences on Commodities by Behavior Log Analysis with Ubiquitous Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Naoki Imamura, Akihiro Ogino, and Toshikazu Kato
294
A System to Construct an Interest Model of User Based on Information in Browsed Web Page by User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kosuke Kawazu, Masakazu Murao, Takeru Ohta, Masayoshi Mase, and Takashi Maeno
304
Adaptive User Interfaces for the Clothing Retail . . . . . . . . . . . . . . . . . . . . . Karim Khakzar, Jonas George, and Rainer Blum
314
Implementing Affect Parameters in Personalized Web-Based Design . . . . Zacharias Lekkas, Nikos Tsianos, Panagiotis Germanakos, Constantinos Mourlas, and George Samaras
320
Modeling of User Interest Based on Its Interaction with a Collaborative Knowledge Management System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaime Moreno-Llorena, Xavier Alamán Roldán, and Ruth Cobos Perez
330
Some Pitfalls for Developing Enculturated Conversational Agents . . . . . . Matthias Rehm, Elisabeth André, and Yukiko Nakano
340
Comparison of Different Talking Heads in Non-Interactive Settings . . . . . Benjamin Weiss, Christine Kühnel, Ina Wechsung, Sebastian Möller, and Sascha Fagel
349
Video Content Production Support System with Speech-Driven Embodied Entrainment Character by Speech and Hand Motion Inputs . . . Michiya Yamamoto, Kouzi Osaki, and Tomio Watanabe
358
Autonomous Turn-Taking Agent System Based on Behavior Model . . . . . Masahide Yuasa, Hiroko Tokunaga, and Naoki Mukawa
368
Part IV: Ambient Interaction

An Interoperable Concept for Controlling Smart Homes – The ASK-IT Paradigm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Evangelos Bekiaris, Kostas Kalogirou, Alexandros Mourouzis, and Mary Panou
377
Towards Ambient Augmented Reality with Tangible Interfaces . . . . . . . . . Mark Billinghurst, Raphaël Grasset, Hartmut Seichter, and Andreas Dünser
387
Rapid Prototyping of an AmI-Augmented Office Environment Demonstrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dimitris Grammenos, Yannis Georgalis, Nikolaos Partarakis, Xenophon Zabulis, Thomas Sarmis, Sokratis Kartakis, Panagiotis Tourlakis, Antonis Argyros, and Constantine Stephanidis
397
Challenges for User Centered Smart Environments . . . . . . . . . . . . . . . . . . . Fabian Hermann, Roland Blach, Doris Janssen, Thorsten Klein, Andreas Schuller, and Dieter Spath
407
Point and Control: The Intuitive Method to Control Multi-device with Single Remote Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sung Soo Hong and Ju Il Eom
416
New Integrated Framework for Video Based Moving Object Tracking . . . Md. Zahidul Islam, Chi-Min Oh, and Chil-Woo Lee
423
Object Scanning Using a Sensor Frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Soonmook Jeong, Taehoun Song, Gihoon Go, Keyho Kwon, and Jaewook Jeon
433
Mixed Realities – Virtual Object Lessons . . . . . . . . . . . . . . . . . . . . . . . . . . . Andreas Kratky
440
New Human-Computer Interactions Using Tangible Objects: Application on a Digital Tabletop with RFID Technology . . . . . . . . . . . . . Sébastien Kubicki, Sophie Lepreux, Yoann Lebrun, Philippe Dos Santos, Christophe Kolski, and Jean Caelen
446
Context-Aware Cognitive Agent Architecture for Ambient User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Youngho Lee, Choonsung Shin, and Woontack Woo
456
An Embodied Approach for Engaged Interaction in Ubiquitous Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mark O. Millard and Firat Soylu
464
Generic Framework for Transforming Everyday Objects into Interactive Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Elena Mugellini, Omar Abou Khaled, Stéphane Pierroz, Stefano Carrino, and Houda Chabbi Drissi
473
mæve – An Interactive Tabletop Installation for Exploring Background Information in Exhibitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Till Nagel, Larissa Pschetz, Moritz Stefaner, Matina Halkia, and Boris Müller
483
Relationality Design toward Enriched Communications . . . . . . . . . . . . . . . Yukiko Nakano, Masao Morizane, Ivan Tanev, and Katsunori Shimohara
492
Ultra Compact Laser Based Projectors and Imagers . . . . . . . . . . . . . . . . . . Harald Schenk, Thilo Sandner, Christian Drabe, Michael Scholles, Klaus Frommhagen, Christian Gerwig, and Hubert Lakner
501
Understanding the Older User of Ambient Technologies . . . . . . . . . . . . . . . Andrew Sixsmith
511
Multi-pointing Method Using a Desk Lamp and Single Camera for Effective Human-Computer Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Taehoun Song, Thien Cong Pham, Soonmook Jung, Jihwan Park, Keyho Kwon, and Jaewook Jeon
520
Communication Grill/Salon: Hybrid Physical/Digital Artifacts for Stimulating Spontaneous Real World Communication . . . . . . . . . . . . . . . . . Koh Sueda, Koji Ishii, Takashi Miyaki, and Jun Rekimoto
526
Motion Capture System Using an Optical Resolver . . . . . . . . . . . . . . . . . . . Takuji Tokiwa, Masashi Yoshidzumi, Hideaki Nii, Maki Sugimoto, and Masahiko Inami
536
The Effects of an Anti-glare Sleeve Installed on Fluorescent Tube Lamps on Glare and Reading Comfort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shiaw-Tsyr Uang, Cheng-Li Liu, and Mali Chang
544
Electromyography Focused on Passiveness and Activeness in Embodied Interaction: Toward a Novel Interface for Co-creating Expressive Body Movement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Takabumi Watanabe, Norikazu Matsushima, Ryutaro Seto, Hiroko Nishi, and Yoshiyuki Miwa
554
Part V: Affect, Emotion and Engagement

An Integrated Approach to Emotion Recognition for Advanced Emotional Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Panagiotis D. Bamidis, Christos A. Frantzidis, Evdokimos I. Konstantinidis, Andrej Luneski, Chrysa Lithari, Manousos A. Klados, Charalambos Bratsas, Christos L. Papadelis, and Costas Pappas
565
Addressing the Interplay of Culture and Affect in HCI: An Ontological Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Emmanuel G. Blanchard, Riichiro Mizoguchi, and Susanne P. Lajoie
575
Love at First Encounter – Start-Up of New Applications . . . . . . . . . . . . . . Henning Breuer, Marlene Kettner, Matthias Wagler, Nathalie Preuschen, and Fee Steinhoff
585
Responding to Learners’ Cognitive-Affective States with Supportive and Shakeup Dialogues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sidney D‘Mello, Scotty Craig, Karl Fike, and Arthur Graesser
595
Trust in Online Technology: Towards Practical Guidelines Based on Experimentally Verified Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Christian Detweiler and Joost Broekens
605
Influence of User Experience on Affectiveness . . . . . . . . . . . . . . . . . . . . . . . . Ryoko Fukuda
615
A Human-Centered Model for Detecting Technology Engagement . . . . . . James Glasnapp and Oliver Brdiczka
621
Relationship Learning Software: Design and Assessment . . . . . . . . . . . . . . Kyla A. McMullen and Gregory H. Wakefield
631
Relationship Enhancer: Interactive Recipe in Kitchen Island . . . . . . . . . . . Tsai-Yun Mou, Tay-Sheng Jeng, and Chun-Heng Ho
641
ConvoCons: Encouraging Affinity on Multitouch Interfaces . . . . . . . . . . . . Michael A. Oren and Stephen B. Gilbert
651
Development of an Emotional Interface for Sustainable Water Consumption in the Home . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mehdi Ravandi, Jon Mok, and Mark Chignell
660
Influences of Telops on Television Audiences’ Interpretation . . . . . . . . . . . Hidetsugu Suto, Hiroshi Kawakami, and Osamu Katai
670
Extracting High-Order Aesthetic and Affective Components from Composer’s Writings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Akifumi Tokosumi and Hajime Murai
679
Affective Technology, Affective Management, towards Affective Society . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hiroyuki Umemuro
683
Bio-sensing for Emotional Characterization without Word Labels . . . . . . Tessa Verhoef, Christine Lisetti, Armando Barreto, Francisco Ortega, Tijn van der Zant, and Fokie Cnossen
693
An Affect-Sensitive Social Interaction Paradigm Utilizing Virtual Reality Environments for Autism Intervention . . . . . . . . . . . . . . . . . . . . . . . Karla Conn Welch, Uttama Lahiri, Changchun Liu, Rebecca Weller, Nilanjan Sarkar, and Zachary Warren
703
Recognizing and Responding to Student Affect . . . . . . . . . . . . . . . . . . . . . . Beverly Woolf, Toby Dragon, Ivon Arroyo, David Cooper, Winslow Burleson, and Kasia Muldner
713
Part VI: Smart and Wearable Materials and Devices

Usability Studies on Sensor Smart Clothing . . . . . . . . . . . . . . . . . . . . . . . . . Haeng Suk Chae, Woon Jung Cho, Soo Hyun Kim, and Kwang Hee Han
725
731
741
The Impact of Different Visual Feedback Presentation Methods in a Wearable Computing Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hendrik Iben, Hendrik Witt, and Ernesto Morales Kluge
752
Gold Coating of a Plastic Optical Fiber Based on PMMA . . . . . . . . . . . . . Seok Min Kim, Sung Hun Kim, Eun Ju Park, Dong Lyun Cho, and Moo Sung Lee
760
Standardization for Smart Clothing Technology . . . . . . . . . . . . . . . . . . . . . . Kwangil Lee and Yong Gu Ji
768
Wearable ECG Monitoring System Using Conductive Fabrics and Active Electrodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Su Ho Lee, Seok Myung Jung, Chung Ki Lee, Kee Sam Jeong, Gilsoo Cho, and Sun K. Yoo Establishing a Measurement System for Human Motions Using a Textile-Based Motion Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Moonsoo Sung, Keesam Jeong, and Gilsoo Cho
778
784
A Context-Aware AR Navigation System Using Wearable Sensors . . . . . . Daisuke Takada, Takefumi Ogawa, Kiyoshi Kiyokawa, and Haruo Takemura
793
Emotional Smart Materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Akira Wakita, Midori Shibutani, and Kohei Tsuji
802
Novel Stretchable Textile-Based Transmission Bands: Electrical Performance and Appearance after Abrasion/Laundering, and Wearability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yoonjung Yang and Gilsoo Cho
806
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
815
BigKey: A Virtual Keyboard for Mobile Devices
Khaldoun Al Faraj, Mustapha Mojahid, and Nadine Vigouroux
University of Toulouse, IRIT, 31062 Toulouse, France
{alfaraj,mojahid,vigourou}@irit.fr
Abstract. This paper describes the BigKey virtual keyboard for mobile devices, designed to make the keys of a virtual keyboard easier to acquire. The tiny size of the keys makes efficient selection difficult. To overcome this drawback, we propose to expand the keys that correspond to the predicted next character entry. The proposed solution facilitates the selection task by enlarging the next entry; moreover, the prediction system reduces the visual scanning time needed to find the letters one is looking for. A user performance study showed that participants were 25.14% faster and more accurate with the BigKey virtual keyboard than with a normal virtual keyboard.
Keywords: Virtual keyboard, text input, PDAs, expanding targets, letter prediction.
1 Introduction

An efficient text entry method for mobile devices is becoming one of the most prominent challenges in the world of mobile computing. The shrinking size of handheld devices has resulted in a keyboard that is not as convenient as that of a desktop computer. Personal Digital Assistants (PDAs) and smart phones equipped with a touch screen and a stylus generally offer alternative text input techniques such as handwriting recognition and an on-screen virtual keyboard. Handwriting recognition systems largely help to overcome the screen-space constraints of mobile computing products. However, the user must learn how to make the character strokes, which is not always easy, especially for novice users. A virtual keyboard (sometimes called an on-screen keyboard) is a reproduction of a hardware keyboard on the screen of a computing device. It was originally designed to give people with disabilities access to computers, and for some other special needs as well. Mobile devices equipped with a touch screen and a stylus have also adopted it as another text entry solution. The virtual keyboard of a handheld device has fewer keys (60) than a desktop keyboard (105); the number of keys is reduced by using mode switches on labeled keys to enter numbers and special characters. However, the accurate selection of smaller keys still remains difficult. It requires a great amount of attention, as the user also has to focus on what he/she is writing.
4
K. Al Faraj, M. Mojahid, and N. Vigouroux
Combinations of gesture and virtual keyboard have also been constructed, such as Quikwriting [1] and Shark [2]. The main goal of this concept is to allow word-level entry with a keyboard. Although many investigations have been devoted to finding the most efficient text entry method for mobile devices by changing the character arrangement and key shapes of the virtual keyboard [3] [4], less attention has been paid to key size, which we consider an essential element of efficient mobile text input. In this paper, we discuss the principle of our BigKey virtual keyboard, which aims to optimize user performance. Subsequently, we present our preliminary results obtained from a formal study of the proposal.
2 BigKey Virtual Keyboard

Users of PDAs and smart phones find it difficult to hit the tiny keys of a virtual keyboard on a small screen. Our primary design objective is to remove this main obstacle, which limits user performance. The principle of the BigKey virtual keyboard is to expand the key size corresponding to the next character entry. The system is primarily designed for mobile devices such as Ultra Mobile PCs (UMPCs), PDAs, smart phones and so forth that use a virtual keyboard; however, the design is applicable to any computing device supporting the same input pattern. McGuffin and Balakrishnan have proposed an interface design of one-dimensional arrays of widgets consisting of a button strip [5]. It is based on expanding a target when the pointer approaches it, to facilitate its selection. To overcome the drawback of the sideways motion resulting from expanding targets, some overlapping between adjacent buttons is allowed. This principle is quite similar to the one proposed by Dasher [6]. Cirrin is a continuous stylus-based text entry technique [7]. The letters of the English alphabet are arranged on the circumference of a circle, using the common letter sequences in English to minimize stylus travel. A word is entered by pressing and moving the stylus over the letters. The expanding-targets idea explained above has been applied to Cirrin in order to improve user performance [8]. The experimental results indicate that this variant of Cirrin is faster but less accurate, owing to the occlusion and overlapping between neighboring buttons. Another application of the expanding-targets principle to a virtual keyboard with a QWERTY layout, called Fisheye, has been proposed [9]. The aim, again, is to make the selection task easier on PDAs. A character is selected by lifting up the stylus when it is over the key. While analyzing the selection model, we noticed that the successive selection of characters on a touch screen with a stylus is accomplished in a three-dimensional mode (3D mode). In other words, the user has to lift the stylus between every character selection (a sequence of stylus up and stylus down); hence the stylus moves in three-dimensional space, whereas using the mouse as a pointing device keeps the selection task in a two-dimensional mode (2D mode). Note that the pointing device used to evaluate the performance of the expanding-targets designs mentioned earlier was the mouse.
BigKey: A Virtual Keyboard for Mobile Devices
5
Considering the 3D mode, expansion of targets when the stylus is merely close to them rarely occurs, because of the third dimension (lifting up the stylus). In other words, the expansion of a target mostly occurs when the stylus is already over it. In this case, there is no advantage in expanding the key, since the stylus is exactly over it and the user can select it whether it is expanded or not. Moreover, with each animation of the Fisheye keyboard, the user has to re-estimate the target's position, which makes it more difficult to acquire and demands more attention. According to Fitts' law [10], the time MT to acquire a target of width W which lies at a distance or amplitude A is given by the relationship [11]:
MT = a + b \log_2\left(\frac{A}{W} + 1\right) \qquad (1)
where a and b are constants determined through linear regression. The logarithmic term is called the index of difficulty (ID) and is measured in "bits". As character selection proceeds from key to key, Fitts' law implies that the larger the next key to be selected, the shorter the time required to acquire it. Our BigKey system is based on two processes: the first predicts the next character entry from the user's previous input, and the second expands the corresponding predicted keys. The keys expand as a function of the probability of their letters being entered next: the more probable the next letter, the larger its key. To build our prediction system, we employed the tables of single-letter and digram frequency counts proposed by [12]. A maximum of four expanded keys is used in our BigKey implementation, as shown in figure 1.
Fig. 1. BigKey virtual keyboard while selecting the next entry for word “the”
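The expansion step can be summarized in a short sketch. The fragment below is illustrative rather than the authors' implementation: the digram-frequency table it consumes and the helper names are assumptions, while the key dimensions are taken from Table 1. The comment on the Fitts' law helper also illustrates the intended effect under an assumed key-to-key travel distance of 100 pixels: widening a key from 18 to 26 pixels lowers the index of difficulty from roughly 2.7 to 2.3 bits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only (not the authors' code). It assumes a digram table
// that maps a previous character to the relative frequency of each possible
// next letter, in the spirit of the single-letter and digram counts of [12].
static class BigKeySketch
{
    // Expanded key sizes (width x height, pixels) in rank order, from Table 1;
    // every other key keeps the normal 18 x 16 size.
    static readonly (int w, int h)[] ExpandedSizes =
        { (26, 24), (24, 22), (22, 20), (20, 18) };

    // The four most probable next letters after 'previous', each paired with
    // the key size it should be drawn at.
    public static IEnumerable<(char letter, int w, int h)> ExpandKeys(
        char previous,
        IDictionary<char, IDictionary<char, double>> digramFrequency)
    {
        return digramFrequency[char.ToLower(previous)]
            .OrderByDescending(kv => kv.Value)
            .Take(ExpandedSizes.Length)
            .Select((kv, rank) => (kv.Key, ExpandedSizes[rank].w, ExpandedSizes[rank].h));
    }

    // Fitts' law index of difficulty (equation 1 without the constants a, b).
    // With A = 100 px, ID falls from about 2.71 bits (W = 18) to 2.28 bits (W = 26).
    public static double IndexOfDifficulty(double A, double W) =>
        Math.Log(A / W + 1.0, 2);
}
```

At the start of a word, where no previous character is available, a single-letter frequency table would be substituted for the digram lookup; the paper does not detail this case, so the sketch leaves it out.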
The BigKey system offers two main advantages. On the one hand, it facilitates character selection according to Fitts' law. On the other hand, it reduces the time spent visually scanning the keyboard to find the letters one is looking for, especially for novice users.
6
K. Al Faraj, M. Mojahid, and N. Vigouroux
It could also be effective for people with motor impairments who need to reduce target acquisition time. Furthermore, it could help persons with Alzheimer's disease. As a result, our design makes the selection task easier by expanding the targets of the next entry while the stylus is over the previous entry. The regular QWERTY layout is used because of users' familiarity with it; moreover, it was originally designed to keep commonly used letter combinations far from each other, so overlapping between expanded keys does not occur. Unlike a hardware keyboard, on which one can type using several fingers, a virtual keyboard forces the user to carry out all movements with a single pointing device. In our proposed BigKey solution, the expanded keys can be considered the most probable keys, over which the user in effect always keeps his fingers. Comparing the most probable next-letter predictions for each letter with the others, an intersection is usually found, especially for the vowels, which are typically proposed as predictions. On a hardware keyboard, the most accessible keys are those under the fingers. Similarly, in our BigKey system the expanded keys are the most accessible ones, like the initial finger configuration, even if the user does not type them when the prediction does not give the desired result.
3 Experiment

The aim of this study is to verify our hypothesis: the proposed animation of the virtual keyboard has significant effects on user performance in a text entry task.

3.1 Subjects

Nine volunteers (3 female) from our university campus participated in this study. Users averaged 27.11 years of age (ranging from 24 to 33 years). They were novice stylus-based text input users. All users had normal or corrected eyesight, and all were right-handed and used the stylus as a pointing device.

3.2 Apparatus

Users conducted the study on a Sony VAIO UMPC with a 1.2GHz Core Solo processor and 512MB of RAM. The display was a 4.5" SVGA TFT running at a resolution of 600 × 800. The pointing device used in our experiment was the stylus. The experiment included the following three virtual keyboards: the virtual keyboard without expanded keys (No-BigKey), the BigKey virtual keyboard with one expanded key (One-BigKey), and the BigKey virtual keyboard with four expanded keys (Four-BigKey). Table 1 shows the key sizes used in our study; the normal key had the same size as that used for a PDA virtual keyboard, while the key of the first most probable letter had the same size as that used for the virtual keyboard of a desktop computer. For each virtual keyboard, the program reads a series of 10 phrases ranging from 16 to 43 characters [13]. All virtual keyboards are built in .NET C#.
BigKey: A Virtual Keyboard for Mobile Devices
7
Table 1. The key size of BigKey virtual keyboard

Key                                Size
Normal                             18 × 16 pixels
The first most probable letter     26 × 24 pixels (+ 44.44 %)
The second most probable letter    24 × 22 pixels (+ 33.33 %)
The third most probable letter     22 × 20 pixels (+ 22.22 %)
The fourth most probable letter    20 × 18 pixels (+ 11.11 %)
3.3 Procedure

The experiment consisted of two parts: a training session followed by a testing session. The training session consisted of entering the sentence "the quick brown fox jumps over the lazy dog" using the Four-BigKey virtual keyboard. Then, in the testing session, each participant completed three sentence tasks using the three virtual keyboards. Participants were divided into three-person groups that performed the tasks in different orders. The first group performed the experiment in the order No-BigKey, One-BigKey and then Four-BigKey; the second followed the order One-BigKey, Four-BigKey and then No-BigKey; and the third group's order was Four-BigKey, One-BigKey and then No-BigKey. In this way, the task order had no impact on the results. The same phrases were used for all tasks, but in a different order for each one, so that users could not anticipate a phrase from the other tasks. Participants were instructed to enter the phrases "as quickly and accurately as possible". They could make errors and corrections.
4 Results and Discussion

Two essential metrics are available to evaluate the efficiency of a text entry technique: the text entry speed, expressed in words per minute (wpm) or in characters per second (cps), and the accuracy during and after the text entry task.

4.1 Text Entry Speed

The analysis of entry speed yields a significant result in favor of Four-BigKey. Comparing the three virtual keyboards, No-BigKey was the slowest (20.84 wpm), One-BigKey was faster (23.66 wpm), and Four-BigKey was the fastest (26.08 wpm), as shown in figure 2. The average speed improvement with Four-BigKey is 25.14%.
This study shows that the fastest text entry speed was achieved by increasing the number of expanded keys. As letter prediction does not always give the intended result, increasing the number of expanded keys beyond one was necessary. The question that still remains to be answered is: what is the optimal number of expanded keys for each letter?
Fig. 2. Text entry speed for three virtual keyboards
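For readers who want to recompute the reported figures, the sketch below states the conventional definitions; the five-characters-per-word convention is an assumption, since the paper does not spell out its formula, and the helper names are ours. Plugging the reported means into the second helper reproduces the 25.14% improvement.

```csharp
using System;

// Minimal sketch of the speed measures, assuming the usual conventions.
static class SpeedMetrics
{
    // Words per minute, with one "word" defined as five characters.
    public static double WordsPerMinute(int transcribedChars, double elapsedSeconds) =>
        (transcribedChars / 5.0) * (60.0 / elapsedSeconds);

    // Relative improvement of a tested condition over a baseline, in percent.
    public static double RelativeImprovement(double baselineWpm, double testedWpm) =>
        (testedWpm - baselineWpm) / baselineWpm * 100.0;
}

// Example with the reported means:
// SpeedMetrics.RelativeImprovement(20.84, 26.08) is about 25.14, matching the
// reported gain of Four-BigKey over No-BigKey.
```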
4.2 Accuracy

In our study, participants were allowed to enter phrases naturally, so they could commit errors and make corrections. We measured the errors made during text entry and the errors left in the transcribed string using the Corrected Error Rate and Not Corrected Error Rate metrics, respectively [14]. Figures 3 and 4 show that in all tasks participants made more corrections than they left errors with respect to the presented text. However, error rates were not significantly different between the tasks.
Fig. 3. Corrected error rate for three virtual keyboards
Fig. 4. Not corrected error rate for three virtual keyboards
Comparing the three virtual keyboards, the analysis yields the lowest error rate for Four-BigKey (see figures 3 and 4). These results suggest that keys were easier to acquire when expanded by Four-BigKey.
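The two accuracy measures rely on the character classification of [14], which splits the input stream into correct characters (C), incorrect characters left in the transcribed text (INF), and incorrect characters that were entered but later fixed (IF). The sketch below is one common formulation of those metrics, stated here as our reading of [14] rather than as the authors' analysis code.

```csharp
// Hedged sketch of the error-rate metrics from [14]:
//   c       - correct characters in the transcribed text
//   inf     - incorrect characters that were not fixed
//   ifFixed - incorrect characters that were entered and later corrected
static class ErrorMetrics
{
    public static double CorrectedErrorRate(int c, int inf, int ifFixed) =>
        100.0 * ifFixed / (c + inf + ifFixed);

    public static double NotCorrectedErrorRate(int c, int inf, int ifFixed) =>
        100.0 * inf / (c + inf + ifFixed);
}
```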
5 Conclusion and Future Work

We have shown that expanding targets based on letter prediction can be an effective means of making the targets of a handheld device's virtual keyboard easier to acquire. This design offered 25.14% better text entry speed for the BigKey virtual keyboard over the normal virtual keyboard, and higher accuracy at the same time. On the basis of these preliminary results, we are conducting detailed experimentation to explore the optimal number and size of predicted keys. In the future, we plan to study the efficiency of the BigKey system for people with motor impairments who need to reduce the fatigue of target acquisition. We also intend to explore the impact of expanding the next entry on the recall of word completion for people with Alzheimer's disease.
References

1. Perlin, K.: Quikwriting: Continuous Stylus-Based Text Entry. In: UIST 1998, pp. 251–316. ACM Press, San Francisco (1998)
2. Zhai, S., Kristensson, P.O.: Shorthand Writing on Stylus Keyboard. In: CHI 2003, pp. 97–104. ACM Press, Ft. Lauderdale (2003)
3. MacKenzie, I.S., Soukoreff, R.W.: Text Entry for Mobile Computing: Models and Methods, Theory and Practice. In: Human-Computer Interaction, vol. 17(2), pp. 147–198. Lawrence Erlbaum, Mahwah (2002)
4. Zhai, S., Hunter, M., Smith, B.A.: The Metropolis keyboard: An exploration of quantitative techniques for graphical keyboard design. In: UIST 2000, pp. 119–128. ACM Press, San Diego (2000)
5. McGuffin, M., Balakrishnan, R.: Acquisition of expanding targets. In: CHI 2002, pp. 57–64. ACM Press, Minneapolis (2002)
6. Ward, D.J., Blackwell, A.F., Mackay, D.J.C.: Dasher: A Data Entry Interface Using Continuous Gestures and Language Models. In: UIST 2000, pp. 129–137. ACM Press, San Diego (2000)
7. Mankoff, J., Abowd, G.D.: Cirrin: A Word-Level Unistroke Keyboard for Pen Input. In: UIST 1998, pp. 213–214. ACM Press, San Francisco (1998)
8. Cechanowicz, J., Dawson, S., Victor, M., Subramanian, S.: Stylus based text input using expanding CIRRIN. In: Proceedings of the Working Conference on Advanced Visual Interfaces, AVI 2006, pp. 163–166. ACM Press, New York (2006)
9. Raynal, M., Truillet, P.: Fisheye keyboard: Whole keyboard displayed on PDA. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4551, pp. 452–459. Springer, Heidelberg (2007)
10. Fitts, P.M.: The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology 47(6), 381–391 (1954)
11. MacKenzie, I.S.: Fitts’ Law as a Research and Design Tool in Human-Computer Interaction. In: Human-Computer Interaction, vol. 7(1), pp. 91–139. Lawrence Erlbaum, Mahwah (1992)
12. Mayzner, M.S., Tresselt, M.E.: Tables of Single-Letter and Digram Frequency Counts for Various Word-Length and Letter-Position Combinations. Psychonomic Monograph Supplements 1(2), 13–32 (1965)
13. MacKenzie, I.S., Soukoreff, R.W.: Phrase set for evaluating text entry techniques. In: CHI 2003, pp. 754–755. ACM Press, Ft. Lauderdale (2003)
14. Soukoreff, R.W., MacKenzie, I.S.: Metrics for text entry research: An evaluation of MSD and KSPC, and a new unified error metric. In: CHI 2003, pp. 113–120. ACM Press, Ft. Lauderdale (2003)
TringIt: Easy Triggering of Web Actions from a Phone
Vinod Anupam
Anexas Inc, 67 Shields Ln, Bridgewater NJ 08807, USA
[email protected]
Abstract. Much information that is of interest to mobile users is available on the Web, yet is difficult to access for most users. We introduce a novel method for users to interact with network-connected computers using their phones, and describe a system called TringIt that implements the method. TringIt enables users to trigger Web actions by simply dialing specific numbers – an action that we call a Phone Click. TringIt can be used out-of-the-box by any phone user. The Phone Click is most useful from mobile phones that can receive messages in response to the click. TringIt enables users to easily initiate interaction with businesses and content owners by simply dialing numbers discovered in offline media (e.g. print, TV, radio) as well as online media (e.g. Web, SMS, MMS.) It makes every mobile phone a more compelling information, interaction and participation device.
Keywords: Phone Click, Tring, Dial-to-Click, Call Triggered Messaging, User-to-Application Signaling, SMS/MMS Click-through, Dial-able hyperlinks.
1 Introduction

Dialing phone numbers is the easiest thing that users can do from a mobile phone. Users manually dial numbers, dial them out of address books, speed dial them, voice-dial them, and even "click" on phone numbers in messages and Web pages to dial them. Phone-car integration systems (including e.g. Microsoft Sync) allow users to place calls hands-free in their automobiles. So far, users have been able to do one of two things by dialing a number: set up a voice call (to interact via voice with another user or with an Interactive Voice Response system) or set up a dial-up connection (to interact with a remote system via data modulated over the voice channel). In this paper, we describe a new way in which users can interact with network-connected computers using their phones. We introduce the notion of a "Phone Click" – the triggering of a Web action in response to the dialing of a number. We describe a system called TringIt that enables any mobile phone user to interact with Web applications in this easy-to-use yet powerful way, and discuss how it is used to easily request information. The paper is organized as follows. In Section 2 we discuss several capabilities of mobile phones, some of which are leveraged by the new technique. In Section 3 we describe how phone signaling is used for user-to-application signaling, and describe some applications. Finally, we summarize our conclusions and discuss upcoming work.
2 Background and Motivation

The mobile phone is the most widely used connected device. Over 3.9 billion (and growing) mobile phone users worldwide connect to the "network" - including the Public Switched Telephone Network (PSTN) and the Internet - via their mobile phones. Importantly, every mobile phone – even a relatively basic feature-phone – is a very sophisticated, network-connected computer. For most users, however, it is still difficult and/or expensive to request and receive information in their phone – the device is vastly under-utilized, compared to its potential! Most users use their mobile phone just to make voice calls and to send and receive messages. While the number of mobile users who access the Internet from their mobile phones is steadily increasing, adoption is still low in most markets (other than Japan and South Korea, which have seen significant adoption.) In the US, about 16% of subscribers use applications that access the Internet from their mobile phones [9]. Adoption is even lower in other mature markets - e.g., 10% in France, 7% in Germany – and especially in emerging markets - e.g., 2% in India, 7% in China. Many emerging mobile Web/Internet based applications have been unable to reach critical mass because limited adoption of mobile Internet leaves little room for network effects to kick in. Key barriers to adoption include:

• Cost - User-incurred cost of data service is a key issue.
• Usability - Mobile web browsing poses significant usability challenges. Mobile-optimized Web sites are still more the exception than the rule, and most mobile-optimized applications are not yet widely used.

It is reasonable to posit that information solutions that work for all users, are easy to use and also are sensitive to user-incurred cost have a higher likelihood of adoption than those that are not. Mobile phone subscribers typically pay for calls that they place from their mobile phone, for messages that they send from their phone as well as for data connections initiated from their phone. In many parts of the world (e.g. Europe, Africa and Asia), incoming calls and incoming messages are free to the user, while in other parts of the world (e.g. USA) subscribers pay for incoming calls as well as incoming messages. Messaging uptake and usage has historically been high in markets where calls are/were significantly more expensive than messaging. This can be attributed to the subscriber's desire to minimize the cost of communication, even at the cost of poor usability (e.g. triple-tapping to create messages). Let us quickly look at the main communication channels available to users on their mobile phone in terms of their reach, their relative cost and their key deficiencies.

2.1 Phone Calls

Voice calls dominate user-generated traffic in mobile phone networks, and continue to be the mainstay of mobile network operator revenue worldwide. Users call to interact with other users and businesses. Phones are optimized for voice calls, and provide capabilities like address books, speed dialing, voice dialing etc. to streamline communication via voice calls.
Businesses widely use phone numbers (often toll free numbers, like 1-800 numbers in the US) that users can call to speak with representatives, customer service agents etc. to receive information and to trigger interactions. While most voice calls are person-to-person calls, use of Interactive Voice Response (IVR) [7] systems is the second largest use case. Sophisticated IVR applications are widely used worldwide – reducing the need for expensive human agents. Technologies like VoiceXML [12] allow users to interact with these systems using voice and/or touch-tone input. IVR systems interconnected to the Web provide a powerful channel for information retrieval by all phone users. Universal reach (any phone user can make voice calls) is a key strength of the voice call. Additionally, phone numbers are familiar, and easily disseminated via both offline and online channels. However, there are some key deficiencies of voice as a channel for requesting and receiving information. Voice is the most expensive channel in most markets. Information received via voice does not persist in the phone for subsequent use (the user must either remember what he was told by the other party, or must transcribe it onto another medium - e.g., by writing something down on paper.) IVR systems are somewhat cumbersome to navigate. Finally, voice interfaces are not optimal for use in noisy environments – typical of mobile usage. 2.2 Messaging Messaging is the second most frequently used communication channel from mobile phones. Users transmit and receive billions of messages every day. Messaging is analogous to email – messages created by a sending party are transmitted using a store-and-forward metaphor towards the target party. SMS (Short Message Service) [1] is the most popular form of messaging. Since it is limited to a small payload of 160 7-bit characters, SMS is frequently used to send short messages and status updates. MMS (Multimedia Message Service) [10] is a high-capacity messaging channel that has support for larger messages possibly containing images and/or short video clips. While MMS message size limits are sometimes imposed by operators, typical limits are upwards of 100K bytes. Messaging initially offered an economical communication channel alternative to the much more expensive voice call. The benefit of lower cost outweighed poor usability - the fairly cumbersome process of multi-tapping messages via a phone's numeric keypad. However, messaging has some key strengths. It works in every mobile phone. It is less intrusive than voice calls. It is very easy to receive messages in phones. And much application infrastructure now exists allowing messaging systems to be interfaced to Web systems for a variety of applications. However, messaging has drawback: it is still somewhat cumbersome to create a message by multi-tapping. A typical call to action displayed in a magazine or TV ad, or transmitted via SMS/MMS is of the form "TEXT PIZZA to short code 123456 to receive a coupon.” It takes about 18 button presses to react to that call to action on a typical phone. And the vocabulary for interacting with different application servers varies greatly, imposing a cognitive load. Collectively, these present a usability barrier for users who want to respond to the call to action.
2.3 USSD

Unstructured Supplementary Services Data [2] is a communication channel available in GSM networks. Users can manually (or via an in-phone client application) create and send USSD messages, and can receive/review USSD messages via a notifier application. Unlike SMS, which uses transaction-based store-and-forward communication, USSD uses session-based point-to-point communication and is therefore much faster. The key strengths of USSD are its wide deployment on GSM networks (every GSM phone is capable of sending and receiving USSD messages) and its low cost. The key drawbacks are that there are no standard vocabularies for interacting with application servers to request information, so messages are cumbersome to create. Additionally, USSD application server infrastructure is not as widely deployed as SMS application server infrastructure. And USSD does not work in CDMA networks.

2.4 Mobile Internet/Web

Internet access from mobile phones is a powerful communication channel. Modern mobile networks provide high-bandwidth and low-latency Internet access. Most modern mobile phones have a mobile Web browser built in, and mobile-ready Web sites can provide a compelling interactive user experience. However, the mobile phone is a very constrained device, with a small screen and limited input capabilities. The browser in a typical mobile phone is significantly less sophisticated than a PC browser (though this is changing with high-end phones like the Apple iPhone). Most Web sites are not mobile ready, leading to a very poor user experience when accessed from a mobile phone. These factors collectively make the mobile Web a low-reach information channel - most mobile phone users do not use it.

To address usability concerns on mobile phones, a variety of in-phone client applications are being built and deployed. Clients that let users interact with email as well as instant messaging (IM) systems are available in many phones, and have seen some uptake. However, application infrastructure that enables interfacing these channels with Web servers is limited. Emerging mobile widget solutions (like Yahoo Go! [13]) provide the user a mobile-optimized information browsing experience, but have seen limited uptake. Such solutions, however, do provide infrastructure for interfacing with Web applications.
3 TringIt: Dial to Click

Dialing a number is the easiest thing that a user can do from a mobile phone. And every mobile phone can receive messages. We exploit these attributes to create a solution that is both easy to use and usable by the entire mobile phone user population. The key idea is to use telephone signaling for application signaling, and to use messaging for information delivery.

The PSTN (Public Switched Telephone Network) is the interconnection of POTS (Plain Old Telephony System) networks and PLMN (Public Land Mobile Network) networks operated by multiple network operators. To support VoIP (Voice over IP), the PSTN interconnects to the Internet.
Modern phone networks use separate signaling and "data" channels and can be thought of as two parallel networks - the Common Channel Signaling network that uses protocols like SS7 (Signaling System 7) [6] to set up and tear down calls, and the "media" network that carries audio "data." Many networks now have all-IP cores. From the user perspective, however, the details are irrelevant. Users simply care about the fact that the call gets set up to the appropriate party when they place a call. A call in a telephone network proceeds in two stages:

• A "signaling" stage in which the phone network locates and rings the called phone and provides feedback to the calling phone.
• A "communication" stage involving voice transport between the calling party and the called party, possibly with data modulated over the voice channel.

Importantly, the telephone network transmits two key pieces of information in the signaling stage of a call - the calling number and the called number. Dialed Number Identification Service (DNIS) [11] is a telephone network service that lets a subscriber of the service determine which telephone number was actually dialed by a caller. Automatic Number Identification (ANI) [4] is a telephone network service that permits subscribers of the service to capture the telephone number of a calling party. Via such services, call signaling automatically transmits a small amount of information from the calling party's network to the called party's network.

3.1 Similarity between Phone Signaling and Web Requests

Interestingly, a user's interaction to place a call via the telephone network can be likened to a Web request. We can think of a phone number as a URL, the phone as a browser, and the PSTN as an amalgamation of Internet infrastructure (name servers, routers etc.) and Web servers. A user uses his phone (the browser) to manually enter a number (a URL) or to speed-dial a number (a "browser bookmark"). The phone (the browser) transmits a request into the phone network. The request includes information about the called party (the URL) and the calling party (a header in the request). The phone network attempts to route the call to the appropriate party (like the browser using DNS to identify the IP address of the target server) and delivers signaling information to the called party (like a Web request being delivered to the server). If there is an error during this process (e.g., an invalid number), appropriate feedback is provided to the calling party (like a "404 Not Found" error in HTTP). Otherwise, the phone starts the "voice client" that allows the user to listen to ringing tones. If the call is answered by the called party, all subsequent interaction happens via bi-directional streaming over a voice channel that is set up as a side effect of successful signaling. (This is equivalent to the browser launching a helper application that supports bidirectional voice.)

Placing a phone call can thus be thought of as a special kind of Web "click" from a phone. So far, this click has had a very limited range of behavior - the click either results in the voice call being established or failing. (Setting up a dial-up connection is a special case, where a voice channel is first established, and the established channel is used to modulate/demodulate data over a carrier signal.)
TringIt generalizes the use of telecom signaling, enabling it to drive any Internet-connected application. In particular, it enables rich phone + Internet converged applications [8] that work with any phone. TringIt enables mobile phone users to interact with Web applications in a powerful yet easy-to-use way by enabling Web actions to be triggered by simply dialing numbers - "Dial to Click!" TringIt is most useful for users who do not yet use the mobile Internet - it uses SMS/MMS to deliver information to users in response to the received phone call. However, TringIt is also useful for mobile phone users who currently use the mobile Internet. Numbers can be discovered via any medium - online, TV, print etc. In a few button presses from an idle screen - the phone number + SEND - the user can "connect" with the real-world entity associated with that number. A mobile phone with Internet access can be driven to the appropriate Web site.

3.2 Signaling Numbers: Phone Numbers as URLs

As described in the preceding section, dialing a number to place a phone call can be likened to making an HTTP request. A phone number can therefore be thought of as a URL! So far, these URLs have simply been used to trigger the setting up of a voice call. We know that any action on the Web can be represented as a URL [5] - data needed for that action can either be encoded into the URL used for an HTTP GET request, or can be stored in a descriptor that is used to generate an HTTP POST request to the appropriate server. A phone number can be associated with such a URL by simply maintaining a lookup table mapping the phone number to the URL. If the system can detect when the phone number is called, it can use the lookup table to retrieve the associated URL and trigger the appropriate Web action (sketched at the end of this subsection). This fundamentally changes what users can do by calling a number!

The behavior of such a phone number is different from that of a regular phone number used to set up a voice call. We refer to this kind of number as a "Phone Signaling Number" (more concisely, a "Signaling Number"). Any Web action can be associated with a Signaling Number, and thus be triggered by a simple phone call from any phone - wired or mobile. Importantly, this requires no new software in the phone, and thus works out-of-the-box with every phone in use on every network. If the associated Web action causes the generation and transmission of, e.g., an SMS/MMS message to the phone, the calling user can dynamically receive "feedback" for the call – possibly containing the requested information!

The numeric form of this new kind of URL is ideal for easy input on the phone. Its compact and familiar form makes it easy to display and disseminate via any medium - books, magazines, newspapers, flyers, packaging, receipts, TV, radio, video and audio content, Web sites, email, SMS/MMS messages etc. And conventional letters-to-number mnemonics (1-777-4PIZZA to represent 1-777-474992) can make some subset of the numbers self-describing. These numeric URLs – "dial-able hyperlinks" – can easily be direct-dialed from an idle phone screen. They can be stored in phone address books and can be speed-dialed and voice-dialed. They can be communicated from user to user. They can be embedded in SMS/MMS/email messages or in Web content and can be easily clicked by the receiving user.
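The following Python sketch illustrates the lookup-table dispatch described above. It is an illustration only, not the TringIt implementation: the Signaling Numbers, URL templates and the GET-only trigger are hypothetical.

    # Illustrative only: a minimal Signaling Number -> Web action dispatcher.
    # The numbers and URL templates below are made up for the example.
    import urllib.parse
    import urllib.request

    SIGNALING_NUMBERS = {
        "+1777474992": "https://example.com/pizza/coupon?caller={caller}",
        "+1777555010": "https://example.com/weather/today?caller={caller}",
    }

    def handle_phone_click(called_number: str, calling_number: str) -> bool:
        """Map a received call (DNIS + ANI) to its registered Web action."""
        template = SIGNALING_NUMBERS.get(called_number)
        if template is None:
            return False  # not a Signaling Number; handle as an ordinary call
        url = template.format(caller=urllib.parse.quote(calling_number, safe=""))
        with urllib.request.urlopen(url) as response:  # trigger the Web action (HTTP GET)
            response.read()
        return True

In a real deployment the dispatch would be driven by call signaling events received from the phone network rather than by a direct function call, but the table lookup itself is this simple.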
3.3 Tring: A Phone Click

As described earlier, when a call is being set up in the phone network, the calling party's network transmits a small amount of information to the called party. This information can be used to create tremendous value. If the called party is associated with multiple phone numbers, the calling party can signal different things to the called party, based on the number it calls. If the calling party and called party can maintain state - i.e., remember information about previous transmissions - the interaction gets richer; e.g., if the calling party calls different numbers in some sequence, it can convey additional information to the called party. Additionally, if the calling party and called party have shared context - i.e., some persistent (non-transient) data that both parties share (possibly transmitted and shared between the calling party and the called party using some separate offline mechanism) - the interaction gets even richer. The shared context allows specific meaning to be attributed to calls to and from different numbers, and the calling and called parties can be aware of that meaning. The ability to exchange user-to-application signals in the presence of state as well as shared context enables very compelling applications.

TringIt enables the act of placing a call to a Signaling Number to be interpreted as the intent of the calling party to trigger the associated action registered by the called party. The associated Web action can equivalently be triggered by the called party or by any entity along the signaling path between the calling party and the called party. This makes the humble phone call much more versatile. A simple call can be used to trigger any Web action that has been associated with the number – a "Phone Click."
Fig. 1. The Versatile "Phone Click"
Figure 1 depicts the flow of a typical Phone Click. In Step 1, signaling information travels over the mobile phone network and the PSTN to a TringIt Server when a call is placed to a Signaling Number. The TringIt Server looks up and triggers the corresponding Web action in Step 2, over the Internet. In Step 3, the TringIt Server transmits a message to the calling phone via the Internet and mobile phone network. The TringIt Server is a network-based server that interfaces to the phone network to receive Phone Clicks - calls that are used to trigger Web actions. Details of the TringIt Server are outside the scope of this paper, and are being published separately
[3]. At a high level, the TringIt Server interconnects to the PSTN via physical and logical connections. It receives and terminates phone calls aimed at Signaling Numbers and triggers appropriate Web actions in response. Any phone number can potentially be used as a Signaling Number – the TringIt Server just needs to be able to receive the call. Calls can be directed at the TringIt Server by routing or forwarding. For best usability, however, Signaling Numbers should come from a separate part of the phone number space so that they are visually identifiable as numbers that trigger Web actions (as opposed to those that set up calls). This can be achieved, e.g., by using numbers from a separate unused area code or country code, to create a concept like 1-800 numbers – users "know" that calls to 1-800 numbers are toll free. An appropriate prefix can be allocated and assigned by the phone number administration authority that controls and manages the phone numbering space.

3.4 Network-Based Context and Personalization

One of the key challenges for non-voice use of mobile phones is usability. Portability and power constraints impose the requirement that the screen be small. A majority of phones only have a small set of buttons (typically 15-20), most representing digits for dialing numbers (and, via techniques that use multiple taps per key, allowing users to input letters and symbols as well). While mobile phones have been optimized for numeric and voice input, non-voice and non-numeric input is still cumbersome. Usability of interaction from such constrained devices can be greatly improved by maintaining as much information as possible in the network and allowing users and application servers to use that information easily as needed. We refer to this user-relevant information in the network as "Network Context" or simply context.

TringIt stores commonly used user profile information like email addresses, postal addresses, preferences etc. in the network. The user maintains information in the context through a rich, non-phone mechanism – a Web portal accessed via an Internet-connected PC. The user can also maintain context information directly from the mobile phone - via a voice portal, SMS portal, phone client etc. if desired.

The mobile phone differs from a Web browser in a PC in a very important way. It is the only device that most users carry with them almost all the time yet rarely share with other users - making it an ideal device to deliver a personalized interactive experience. As described earlier, the signaling information that is transmitted across the phone network to set up a phone call typically includes the calling party number and the called party number. TringIt uses information derived from the calling party number as part of the data used to trigger the Web action. Awareness of the calling party's "identity" is used to provide personalized services triggered by the Phone Click – TringIt uses ANI information as a key to look up profile data in the context database. Importantly, ANI information is transmitted by default - and requires no manual intervention on the part of the calling user. The calling user not only indicates to TringIt what action he wants to trigger (based on the called number) but also simultaneously, and with no extra effort, provides information that can be used to personalize his request. This greatly improves usability for the calling user.
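As a hedged illustration of this ANI-keyed personalization (the profile fields, store layout and numbers below are assumptions, not TringIt's actual schema), the lookup can be as simple as:

    # Illustrative only: the calling number (ANI) is the key into a
    # network-hosted context store; stored profile data is merged into the
    # parameters of the Web action being triggered.
    CONTEXT_STORE = {
        "+15551230001": {"zip": "55455", "lang": "en", "email": "user@example.com"},
    }

    def personalize_action(calling_number, base_params):
        """Return the Web action parameters enriched with the caller's stored profile."""
        profile = CONTEXT_STORE.get(calling_number, {})
        params = dict(base_params)
        for key in ("zip", "lang"):
            params.setdefault(key, profile.get(key, ""))
        return params

    # Example: personalize_action("+15551230001", {"item": "weather"})
    # -> {"item": "weather", "zip": "55455", "lang": "en"}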
TringIt uses well-known and understood techniques and best practices for maintaining context information in the cloud, and for dealing with issues of scalability, reliability, security, access control etc. Maintaining personalizable network context has a significant additional benefit – it enables multi-modal interaction. TringIt allows the user to dynamically intermix interaction via multiple input channels - e.g., via voice, by sending an SMS/MMS message, by clicking a link on a Web page or by using an application, and via Phone Clicks - so the user can choose the channel that works best based on dynamic usability and price considerations.

3.5 Internet Access Not Required

The mobile phone differs from a Web browser in a PC in some other key ways. By virtue of built-in technologies (e.g., a messaging client, an address book), a mobile phone provides "data persistence" and is also "spontaneously reachable" from the network – making it an ideal portable receiver of information delivered by Web applications. TringIt leverages these features to make the mobile phone much more useful, even without Internet access. TringIt enables any mobile phone user to easily use the phone as an input device to trigger Web applications – like a programmable universal remote control – that then deliver information via messaging. The Phone Click is also useful for users with mobile Internet access. The incoming message can be a rich email message containing hyperlinks to Web sites, or a WAP Push that drives an in-phone Web browser to a particular mobile Web site to start a browsing session, or it can wake up an in-phone application, triggering it to perform some action.

A key contribution is the elimination of the need for end-to-end HTTP from the phone. Users can request information via a Phone Click. Application servers can retrieve Web documents on behalf of the user, transform the results appropriately to fit into an SMS or MMS message, and transmit the message to the user (a sketch of this step closes this subsection). Users can allow received information to persist in the mobile phone for as long as required, possibly eliminating the subsequent need to re-fetch that information. TringIt makes it possible for a user to request news, weather and traffic updates, stock quotes etc. by simply dialing the corresponding number. The information request is personalized using profile data stored in the network. By simply adding numbers and content providers, TringIt makes it possible for users to easily request directions to a hotel, information about a product, daily specials at a restaurant, order status etc. Like a Web click, the possibilities are limitless.
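A minimal sketch of the server-side fetch-and-deliver step is shown below; the send_sms gateway call is a placeholder rather than a real API, and the single-SMS limit of 160 characters is taken from Section 2.2.

    # Illustrative only: fetch a Web document on the user's behalf and trim it
    # to fit a single 160-character SMS before handing it to a messaging gateway.
    import urllib.request

    SMS_LIMIT = 160  # 7-bit characters in a single SMS

    def send_sms(to, body):
        print(f"SMS to {to}: {body}")  # stand-in for a real SMS gateway call

    def fetch_and_deliver(url, msisdn):
        with urllib.request.urlopen(url) as resp:
            text = resp.read().decode("utf-8", errors="ignore")
        body = " ".join(text.split())[:SMS_LIMIT]  # collapse whitespace, truncate
        send_sms(msisdn, body)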
4 Conclusion

The Phone Click uniquely enables a class of applications that work by simply calling numbers – without requiring any additional input over the telephony media channel. These applications automatically trigger Web actions and possibly send information to users in response to a call. By using information that is delivered over the signaling channel, as well as any context information available in the network (relevant to the calling number, the called number and the application server), very compelling applications can be created - the caller simply needs to be aware of the action that will be triggered by the application server when he calls that number. The Phone Click can
be employed to enable useful services for all phone users, not just users with mobile Internet access. Importantly, the Phone Click complements all existing channels for communication from a mobile phone - voice, messaging, USSD and mobile Web - making them all more powerful. This user-to-application signaling capability is a broadly useful enabler. By eliminating key usability and cost "friction," it enables all mobile phone users to economically and easily initiate contact and interact with businesses, brands and information providers via their mobile phone. Users simply need to know the number to dial, and numbers can easily be discovered via traditional offline and online channels. TringIt makes the mobile phone a more powerful tool and unlocks its potential as the ultimate information, interaction and participation device.

Future Work. While the act of dialing numbers to initiate a voice call is very familiar to users, the notion of triggering Web actions by doing so is not. The authors plan to study the usability barriers associated with the Phone Click via experiments and focus groups. The authors are working on simplifying discovery of Signaling Numbers and on simplifying browsing/review of information delivered by applications – existing software in mobile phones offers significant scope for improvement – and are designing an in-phone client application that will further simplify the user experience. The authors are also developing cloud-based infrastructure that will enable businesses, brands and content providers to easily integrate the Phone Click into their applications.
References

1. 3GPP TS 03.40 – Technical Realization of the Short Message Service (SMS)
2. 3GPP TS 03.90 – Unstructured Supplementary Service Data (USSD)
3. Anupam, V.: Using Phone Signaling for Application Signaling (in preparation)
4. Automatic Number Identification, http://en.wikipedia.org/wiki/Automatic_number_identification
5. Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H.F., Secret, A.: The World-Wide Web. Comm. ACM 37(8), 76–82 (1994)
6. Dryburgh, L., Hewitt, J.: Signaling System No. 7 (SS7/C7): Protocol, Architecture and Services. Cisco Press (2004)
7. Interactive Voice Response, http://en.wikipedia.org/wiki/IVR
8. Kocan, K.F., Roome, W.D., Anupam, V.: A Novel Software Approach for Service Brokering in Advanced Service Architectures. Bell Labs Technical Journal 11(1), 5–20 (2006)
9. Nielsen Mobile: Critical Mass - The Worldwide State of the Mobile Web (July 2008)
10. OMA Multimedia Messaging Service V1.3
11. Dialed Number Information Service, http://en.wikipedia.org/wiki/DNIS
12. W3C Voice Extensible Markup Language (VoiceXML) 2.1
13. Yahoo Go!, http://mobile.yahoo.com/go
Context Awareness and Perceived Interactivity in Multimedia Computing

Xiao Dong1 and Pei-Luen Patrick Rau2

1 Industrial & System Engineering Department, University of Minnesota, Minnesota, USA
[email protected]
2 Industrial Engineering Department, Tsinghua University, Beijing, P.R. China
[email protected]
Abstract. Context awareness and perceived interactivity are two factors that might benefit mobile multimedia computing. This research takes mobile TV advertisements as a scenario and verifies the impacts of perceived interactivity and its interaction with context awareness. Seventy-two participants were recruited and an experiment was conducted in order to identify those impacts. The main findings were as follows: (1) the effect of high perceived interactivity advertisements is significantly better than the effect of low perceived interactivity advertisements; (2) the interaction of context awareness and perceived interactivity has a significant influence on the effect of mobile TV advertising.

Keywords: Context awareness, perceived interactivity, mobile TV advertising.
1 Introduction

Mobile multimedia has emerged as the hottest growth area in wireless services. It brings a host of new features and functions into the wireless market, providing advanced forms of communication, entertainment and productivity [1]. Mobile operators are investing considerably in broadcasting mobile TV, with fully fledged services in various countries throughout Asia as well as other large-scale trials around the world [2]. Although mobile advertising and TV advertising have been studied for many years, few researchers have extended their studies to include TV advertising on the platform of mobile devices. With the development of network and hardware capacity, mobile devices can be used to watch TV programs. The purpose of this study was to identify user perception of and response to mobile multimedia services. The results can serve as guidance for advertisement designers and as a basis for further studies.
2 Literature Review

2.1 Mobile Multimedia

Mobile TV fulfills the growing need for entertainment and staying informed on the move. Mobile TV builds on established consumer behavior: end-users are familiar
with the concept of television and, with the continued need for mobility, the benefits of this new medium are clear. Mobile TV is enhanced by the element of interactivity, which adds value to the user experience and makes it a richer entertainment option.

2.2 Mobile Advertising

Compared with traditional advertising media, mobile advertising can promote sales of goods and services, create brand images and product awareness (branding), disseminate information using a personally relevant and context-aware approach, support direct business transactions and encourage customer interaction [3,4]. In recent years, mobile advertising has been enhanced by the processing capability of handheld devices and the development of networks. Consequently, some innovative advertising methods are found in everyday life. For instance, mobile games and MMS advertising have emerged as creative advertisement venues. However, mobile advertisers must be very careful not to risk privacy issues and exhaust customer tolerance. We are of the opinion that it is the users, and not the media designers or the market, that are the ultimate determinants of effectiveness.

2.3 Context

The context is frequently determined by user location [5,6]. Hence, depiction and association of user location is pivotal to a context-sensitive advertising system. Potential application fields can be found in areas such as travel information, shopping, entertainment, event information and different mobile professions [5]. Another reason for the importance of location is that it is easier to identify and measure compared with other context components. It can be measured with different positioning systems, such as embedded GPS modules, mobile phones which can be located by the telecom operator of a network, or service points utilizing WLAN, Bluetooth, or infrared technologies.

2.4 Perceived Interactivity

Interactivity is generally believed to be a multi-dimensional construct [7,8,9,10], but there is no general consensus regarding the nature and the content of the dimensions. Based on constructs identified in Internet studies and analysis of the characteristics of mobile communication, a model of interactivity for mobile advertisements was constructed by Gao et al. [11], comprising user control, direction of communication, synchronicity, connectedness, playfulness, and interpersonal communication. They also stated that different mobile advertising tools might differ in these dimensions due to the different communication style each tool has. For example, message push-ads might allow less user control, but an included reply option will give customers a convenient channel to respond; mobile banners are less intrusive compared with push-ads, but they might be ignored or assumed to be only decorative images.

User control is conceptualized as the degree of user intervention that is required to operate the system [12]. Dholakia et al. refer to user control as the extent to which an individual can choose the content, timing and sequence of a communication to change his/her viewing experience [13]. It is taken as the core component of interactivity by some researchers [10,14]. Two constructs identified by Steuer [10], range and
mapping, actually describe two aspects of control. The former refers to the number of options the environment provides the user to modify the task flow and the environment, and the latter refers to the extent to which the controls and manipulations in a computer-mediated environment are similar to controls and manipulations in the real world. This paper manipulates interactivity by adding different user controls to the advertisement.
3 Hypotheses and Methodology

3.1 Hypotheses

Hypothesis 1: High perceived interactivity advertisements will have better advertising effectiveness (better memory, better attitudes towards the ads and brand, higher purchase intention) than low perceived interactivity advertisements.

For message advertising, user control options are important. Control choice/range and mapping have traditionally been considered fundamental constructs of interactivity [15,10]. Users can respond by replying to the message directly, calling back with a provided telephone number, or visiting another source linked in the message. Users also want the control and manipulation in computer-mediated environments to be similar to those in the real world. The more a user can control the options provided, the more interactive the customer perceives the advertisement to be. The more similar the mediated environment is to the real world, the more interactive the customer perceives the advertisement to be. Studies on Internet advertising interactivity have found that there is a strong correlation between perceived interactivity and advertising effectiveness in terms of attitude towards the brand, attitude towards the ads, and purchase intention [16,17,18]. Previous studies have also suggested that higher interactivity helps the customer experience "flow" during the interaction [19], and the consequences of the "flow" experience are increased learning and perceived behavioral control. Therefore it was hypothesized that high perceived interactivity has a positive influence on advertising effectiveness.

Hypothesis 2: The interaction between interactivity and context awareness will have an influence on mobile advertising effectiveness.

While context-aware ads give users higher involvement and make the ads more relevant, interactivity provides the customer a chance to communicate with the company, to search for further information or to disseminate information to others conveniently and quickly. As Kannan et al. [20] have already pointed out, it is critical to provide the customer a chance to respond at the point of purchase or usage immediately when sending context-aware advertisements. Immediately redeemable m-coupons, callback numbers, or simply a message requiring a reply from the customer are hypothesized as most likely to exert influence when sent in a context-aware manner compared to when sent in a context-irrelevant manner. Context awareness and interactivity are therefore hypothesized to have an interaction effect on advertising effectiveness.
3.2 Experiment

Participants. Seventy-two participants (36 female and 36 male) from universities in Beijing voluntarily took part in the experiment. They were randomly assigned to six groups formed by the combination of two-level context awareness and three-level perceived interactivity. The participants were all undergraduate and graduate students with no prior knowledge about the tasks to be performed during the experiment. The participants' ages ranged from 20 to 36 years (mean=24, S.D.=2.26) and 55% of the participants had used mobile phones for more than five years. In addition, 98% of the participants had previously received SMS advertisements, and 32% of the participants had received MMS advertisements. 90% of the participants had more than five years of Internet experience, while 32% of the participants had experience connecting to the Internet via mobile devices.

Experimental design and variables. The independent variables were perceived interactivity and its interaction with context awareness. The dependent variable was advertising effectiveness, which consisted of memory of the advertisement, attitude towards the advertisement and brand, and purchase intention. Memory was measured by a free recall and recognition test. Ad attitude [8], brand attitude [21], user involvement [22] and purchase intention were measured using scales from other research.

Procedures. Each participant was tested individually. They were asked to complete a demographic and technology (Internet, mobile services and TV advertisements) usage questionnaire. Then the participants were given an introduction to the experiment's procedure and their tasks. A practice task was provided to let the participants view sample mobile TV advertisements and make sure they knew how to use the experiment devices (PDA ASUS A620, KONKA TPC880). During the experiment, all participants visited five different scenarios (mall, bookstore, cell phone market, McDonald's, and the IE building at Tsinghua University) in a predefined sequence. In each scenario, they did two information-seeking tasks and viewed mobile TV advertisements on the experiment devices after each task. According to the different groups, context awareness and perceived interactivity features were embedded into the mobile TV advertisements. Upon completion of all tasks, a free-recall test and an advertisement recognition test were given; the participants were not informed prior to the task that these tests would be applied. Then participants were asked to finish a post-test questionnaire, which measured the user's attitude towards the ads, brand attitude, perceived interactivity, as well as purchase intention.
4 Results and Discussions

In this section we present the reliability of each measure and the results of testing hypotheses one and two. The internal consistencies of the questionnaire responses, measured using Cronbach's α, were 0.95 for the advertisement attitude questionnaire, 0.95 for the brand attitude questionnaire, 0.87 for the involvement with the advertisement questionnaire, and 0.78 for the perceived interactivity questionnaire.
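For reference, Cronbach's α can be computed from the per-participant item scores as in the sketch below; this is a generic illustration with hypothetical data, not the authors' analysis script.

    # Generic sketch of Cronbach's alpha; `scores` is a hypothetical matrix with
    # one row per participant and one column per questionnaire item.
    def cronbach_alpha(scores):
        k = len(scores[0])  # number of items in the scale
        def variance(xs):
            m = sum(xs) / len(xs)
            return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        item_vars = [variance([row[j] for row in scores]) for j in range(k)]
        total_var = variance([sum(row) for row in scores])
        return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

    # Example with made-up ratings for a 3-item scale:
    # cronbach_alpha([[5, 4, 5], [4, 4, 4], [2, 3, 2], [5, 5, 4]])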
4.1 The Effect of Perceived Interactivity on Advertisement

It was hypothesized in this study that high perceived interactivity advertisements would have better advertising effectiveness than low perceived interactivity advertisements. High range and high mapping are two different kinds of high-level perceived interactivity. They are compared with low perceived interactivity separately to identify whether this hypothesis is true. After the experiment, the data showed that there were significant differences between exposure times to mobile TV advertisements in each group. This factor was used as a covariate in the ANCOVA process. The results can be found in Tables 1 and 2.

Table 1. Data for Testing Hypothesis One (High range × Low interactivity)

Variable           | High range Mean | SD   | Low interactivity Mean | SD   | P value
Ad free recall     | 2.93            | 1.73 | 1.77                   | 1.05 | 0.003*
Ad recognition     | 7.33            | 1.88 | 6.13                   | 2.25 | 0.04*
Ad attitude        | 4.80            | 0.52 | 4.24                   | 0.53 | 0.00*
Brand attitude     | 4.87            | 0.53 | 4.28                   | 0.41 | 0.00*
Purchase intention | 4.69            | 0.69 | 4.08                   | 0.49 | 0.00*
Table 2. Data for Testing Hypothesis One (High mapping × Low interactivity)

Variable           | High mapping Mean | SD   | Low interactivity Mean | SD   | P value
Ad free recall     | 2.82              | 1.46 | 1.77                   | 1.05 | 0.006*
Ad recognition     | 7.46              | 2.43 | 6.13                   | 2.25 | 0.02*
Ad attitude        | 4.60              | 0.65 | 4.24                   | 0.53 | 0.02*
Brand attitude     | 4.62              | 0.57 | 4.28                   | 0.41 | 0.01*
Purchase intention | 4.35              | 0.70 | 4.08                   | 0.49 | 0.10
From these results we can see that hypothesis one is supported: high perceived interactivity advertisements have better advertising effectiveness than low perceived interactivity advertisements. This is consistent with past studies. Cho and Leckenby [16] measured participants' intention to interact with a target (banner) ad and found positive relations (correlation coefficients ranging between .30 and .75) between intention to interact with the ad and attitudes toward the ad, attitudes toward the brand, and purchase intention. Yoo and Stout [18] also achieved similar results. McMillan and Hwang's [17] study demonstrated that interactivity and involvement
with the subject of a site were two possible predictors of positive attitude towards the Web site, and perceived interactivity accounted for more of the variance in attitude than did involvement. Analysis of relationships among the variables in the study suggested that the control sub-dimension of perceived interactivity had the strongest correlation with attitude toward the Web site.

4.2 Joint Effect of Context Awareness and Perceived Interactivity

It was hypothesized in this study that the interaction between interactivity and context awareness influences mobile advertising effectiveness. As mentioned above, we also included exposure time to the mobile TV advertisements in each group as the covariate in the ANCOVA test. The results showed that the interaction between perceived interactivity and context awareness significantly influenced the subjects' attitude towards mobile TV advertisements (F=4.183, p=0.019), their attitude towards brands (F=5.011, p=0.009), and their purchase intention (F=7.732, p=0.001). Although the interaction between context awareness and perceived interactivity had no significant influence on advertisement free recall and advertisement recognition, the p values are quite close to the significance level of p=0.05 (F=2.22, p=0.117 for free recall; F=2.793, p=0.068 for recognition).

Table 3. The Effect of Interactivity under the Context-aware Condition

Variable           | High range Mean | SD   | High mapping Mean | SD   | Low interactivity Mean | SD   | F value | P value
Ad free recall     | 3.47            | 2.07 | 3.31              | 1.68 | 1.55                   | 0.87 | 5.16    | 0.01
Ad recognition     | 7.92            | 1.51 | 8.50              | 2.02 | 5.75                   | 2.42 | 6.20    | 0.01
Ad attitude        | 5.14            | 0.40 | 4.68              | 0.72 | 4.13                   | 0.52 | 9.72    | 0.00
Brand attitude     | 5.20            | 0.38 | 4.64              | 0.66 | 4.19                   | 0.43 | 11.86   | 0.00
Purchase intention | 5.13            | 0.52 | 4.28              | 0.83 | 3.88                   | 0.46 | 12.45   | 0.00
Table 4. The Effect of Interactivity under the Context-irrelevant Condition

Variable           | High range Mean | SD   | High mapping Mean | SD   | Low interactivity Mean | SD   | F value | P value
Ad free recall     | 2.39            | 1.15 | 2.33              | 1.03 | 1.99                   | 1.20 | 0.43    | 0.65
Ad recognition     | 6.75            | 2.09 | 6.42              | 2.43 | 6.50                   | 2.11 | 0.07    | 0.93
Ad attitude        | 4.46            | 0.40 | 4.52              | 0.58 | 4.35                   | 0.54 | 0.34    | 0.71
Brand attitude     | 4.54            | 0.44 | 4.61              | 0.49 | 4.37                   | 0.39 | 0.94    | 0.40
Purchase intention | 4.26            | 0.57 | 4.41              | 0.57 | 4.27                   | 0.46 | 0.30    | 0.75
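The per-effect F and p values above come from ANCOVAs with exposure time as a covariate; a sketch of such a model in Python/statsmodels is given below, with synthetic data and hypothetical column names rather than the study's actual data set.

    # Sketch of an ANCOVA (exposure time as covariate) on synthetic data;
    # column names and values are hypothetical.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 72  # same group sizes as the experiment, but synthetic values
    df = pd.DataFrame({
        "interactivity": np.repeat(["high_range", "high_mapping", "low"], n // 3),
        "context": np.tile(np.repeat(["aware", "irrelevant"], n // 6), 3),
        "exposure_time": rng.normal(40.0, 5.0, n),
    })
    df["ad_attitude"] = rng.normal(4.5, 0.5, n)

    model = smf.ols("ad_attitude ~ C(interactivity) * C(context) + exposure_time",
                    data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))  # F and p values for each effect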
After examining the effect of interactivity in different context conditions (Tables 3 and 4), we found that when mobile advertisements are distributed in a context-aware manner, the interactivity of ads has a positive influence on the advertising effectiveness in terms of advertisement memory, advertisement attitude, brand attitude, and purchase intention. However, when mobile ads are distributed in a context-irrelevant way, the interactivity of advertisements has no significant influence on advertisement memory,
advertisement attitude, brand attitude and purchase intention. This finding partially contradicts previous literature on Internet advertising, where it is generally agreed that perceived interactivity has positive influences on advertising effectiveness [16,17,18]. The main reason for the lack of significant effects of interactivity in the context-irrelevant situation may be the lower message involvement in the context-irrelevant condition. First, low involvement leads to a lower information processing level, which may cause the subject to neglect the interaction options of the advertisement or even the advertisement itself. Second, lower involvement with the advertisement content also results in lower motivation to respond to the ad [23]. The lower motivation to respond makes it less critical to provide interaction options, since the subjects do not seek interaction options from the beginning. When the same advertisements were sent in a context-aware manner, however, the effects of the interactivity of mobile advertisements on advertising effectiveness became significant, as expected in hypothesis one.

Table 5. The Effect of Context Awareness on High Range Advertisements

Variable           | Location relevant Mean | SD   | Location irrelevant Mean | SD   | F value | P value
Ad free recall     | 3.47                   | 2.07 | 2.39                     | 1.15 | 2.46    | 0.13
Ad recognition     | 7.92                   | 1.51 | 6.75                     | 2.09 | 2.46    | 0.13
Ad attitude        | 5.14                   | 0.40 | 4.46                     | 0.40 | 17.55   | 0.00
Brand attitude     | 5.20                   | 0.38 | 4.54                     | 0.44 | 15.50   | 0.00
Purchase intention | 5.13                   | 0.52 | 4.26                     | 0.57 | 15.13   | 0.00
Table 6. The Effect of Context Awareness on High Mapping Advertisements

Variable           | Location relevant Mean | SD   | Location irrelevant Mean | SD   | F value | P value
Ad free recall     | 3.31                   | 1.68 | 2.33                     | 1.03 | 2.97    | 0.10
Ad recognition     | 8.50                   | 2.02 | 6.42                     | 2.43 | 5.21    | 0.03
Ad attitude        | 4.68                   | 0.72 | 4.52                     | 0.58 | 0.37    | 0.55
Brand attitude     | 4.64                   | 0.66 | 4.61                     | 0.49 | 0.02    | 0.90
Purchase intention | 4.28                   | 0.83 | 4.41                     | 0.57 | 0.19    | 0.67
Table 7. The Effect of Context Awareness on Low Interactivity Advertisements

Variable           | Location relevant Mean | SD   | Location irrelevant Mean | SD   | F value | P value
Ad free recall     | 1.55                   | 0.87 | 1.99                     | 1.20 | 1.06    | 0.31
Ad recognition     | 5.75                   | 2.42 | 6.50                     | 2.11 | 0.66    | 0.43
Ad attitude        | 4.13                   | 0.52 | 4.35                     | 0.54 | 0.98    | 0.33
Brand attitude     | 4.19                   | 0.43 | 4.37                     | 0.39 | 1.05    | 0.32
Purchase intention | 3.88                   | 0.46 | 4.27                     | 0.46 | 4.14    | 0.05
The results in Tables 5, 6 and 7 show that when the interactivity of mobile ads is low, sending them in a context-aware manner could result in even worse advertising effectiveness than sending them in a context-irrelevant manner, in terms of memory, advertisement attitude, brand attitude and purchase intention. However, when the interactivity of mobile ads is high, it is better to send them in a context-aware manner so as to promote advertising effectiveness in terms of memory, advertisement attitude, brand attitude and purchase intention. In this study, we also found that the high range effect was greater than the high mapping effect as an indicator of perceived interactivity.

The finding that context awareness with low interactivity ads has a negative influence on advertisement effectiveness is surprising. We propose two reasons for this. (1) Location-based services and advertisements should be provided with great care so as not to invade users' privacy, since handsets are very personal devices. The low interactivity ads only "broadcast" information about products and services. It is possible that the feeling of being invaded by such ads might become even stronger when users detect that their cell phones are being spammed just because they happen to walk past a particular store. (2) Mobile TV advertisement is characterized by its rich media features. Context awareness may influence advertisement effectiveness through other factors, such as interactivity, but further study is required to identify their relationship.
5 Conclusion and Future Study

The effect of perceived interactivity is clear in this study: high perceived interactivity advertisements have better advertising effectiveness than low perceived interactivity advertisements. This finding is consistent with former research on Internet and mobile message advertising. A design guideline for perceived interactivity based on this research ought to be formulated to give the mobile TV advertising business market instructions.

The effects of context awareness on mobile TV advertising effectiveness depend on the level of interactivity. With highly interactive advertisements, contextual advertising information does increase user response effectively and results in a more accepting attitude. Therefore, when the goal of a mobile advertising campaign is to generate responses, the context in which the response options are given to the users is of importance and must be taken into consideration by the mobile marketer. However, with "broadcasting" advertisements, the user's attitude towards the brand and the consequent purchase intention were impaired by context-aware advertisements rather than improved. This interaction style needs to be studied thoroughly in future work.
References

1. Robert, W.S.: Mobile multimedia goes Net-centric. Electronic Engineering Times, Manhasset, Mar 5(1156), 78–79 (2001)
2. Kenton, O., April, S.M., Alex, V.: Consuming Video on Mobile Devices. In: Proc. CHI 2007, pp. 857–866. ACM Press, New York (2007)
3. Dickinger, A., Haghirian, P., Murphy, J., Scharl, A.: An Investigation and Conceptual Model of SMS Marketing. In: Proc. 37th Hawaii International Conference on System Sciences 2004, p. 10031.2. IEEE Computer Society Press, Los Alamitos (2004)
4. Yunos, H., Gao, J.: Wireless Advertising's Challenges and Opportunities. Computer 36, 30–37 (2003)
5. Kaasinen, E.: User needs for location-aware mobile services. Personal and Ubiquitous Computing 7, 70–79 (2003)
6. Younghee, J., Per, P., Jan, B.: DeDe: Design and Evaluation of a Context-Enhanced Mobile Messaging System. In: Proc. of CHI 2005. ACM Press, New York (2005)
7. Wu, G.: Perceived Interactivity and Attitude Toward Web Sites. In: Proc. Conference of the American Academy of Advertising 1999. American Academy of Advertising (1999)
8. Ha, L., James, E.: Interactivity Reexamined: An Analysis of Business Web Sites. In: Proc. Conference of the American Academy of Advertising (1998)
9. Liu, P., Shrum, L.: What is interactivity and is it always such a good thing? Implications of definition, person, and situation for the influence of interactivity on advertising effectiveness. Journal of Advertising 31, 53–64 (2002)
10. Steuer, J.: Defining Virtual Reality: Dimensions Determining Telepresence. Journal of Communication 42(4), 73–93 (1992)
11. Gao, Q., Rau, P.L.P., Salvendy, G.: Measuring perceived interactivity of mobile advertisements. Behaviour & Information Technology (2006)
12. Van der Heijden, H.: Ubiquitous computing, user control, and user performance: conceptual model and preliminary experimental design. In: Proc. Tenth Research Symposium on Emerging Electronic Markets, pp. 107–112 (2003)
13. Dholakia, R., Zhao, M., Dholakia, N., Fortin, D.: Interactivity and Revisits to Websites: A Theoretical Framework, http://ritim.cba.uri.edu/wp/
14. Bezjian-Avery, A., Calder, B., Lacobucci, D.: New Media Interactive Advertising vs. Traditional Advertising. Journal of Advertising Research 38(94), 23–32 (1998)
15. Coyle, J., Thorson, E.: The Effects of Progressive Levels of Interactivity and Vividness in Web Marketing Sites. Journal of Advertising 30(3), 65–78 (2001)
16. Cho, C.-H., Leckenby, J.: Interactivity as a measure of advertising effectiveness: Antecedents and Consequences of Interactivity in Web Advertising. In: Proc. Conference of the American Academy of Advertising (1999)
17. McMillan, S., Hwang, J.S.: Measures of Perceived Interactivity: An Exploration of Communication, User Control, and Time in Shaping Perceptions of Interactivity. Journal of Advertising 31(3), 41–54 (2002)
18. Yoo, C.Y., Stout, P.: Factors Affecting User's Interactivity with the Web site and the Consequences of User's Interactivity. In: Proc. Conference of the American Academy of Advertising (2001)
19. Hoffman, D., Novak, T.: Marketing in Hypermedia Computer-Mediated Environments: Conceptual Foundations. Journal of Marketing 60(3), 50–68 (1995)
20. Kannan, P., Chang, A., Whinston, A.: Wireless commerce: marketing issues and possibilities. In: Proc. International Conference on System Science 2001. IEEE Computer Society Press, Los Alamitos (2001)
21. Li, H., Bukovac, J.L.: Cognitive Impact of Banner Ad Characteristics: An Experimental Study. Journalism and Mass Communication Quarterly 76(2), 341–353 (1999)
22. Norris, C.E., Colman, A.M.: Context effects on recall and recognition of magazine advertisements. Journal of Advertising 21(3), 37–46 (1992)
23. Petty, R.E., Cacioppo, J.T.: Attitudes and Persuasion: Classic and Contemporary Approaches. Westview Press (1981)
Human Computer Interaction with a PIM Application: Merging Activity, Location and Social Setting into Context

Tor-Morten Grønli and Gheorghita Ghinea

The Norwegian School of Information Technology, Schweigaardsgt. 14, 0185 Oslo, Norway
School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge UB8 3PH, London, United Kingdom
[email protected],
[email protected]
Abstract. Personal Information Managers exploit the ubiquitous paradigm in mobile computing technology to integrate services and programs for business and leisure. Recognizing that every situation is constituted by information and events, this context will vary depending on the situation users are in and the tasks they are about to commit to. The value of context as a source of information is highly recognized, and for individual dimensions context has been both conceptually described and implemented in prototypes. The novelty in this paper is a new implementation of context that integrates three dimensions of context: social information, activity information and geographical position. Based on an application developed for Microsoft Windows Mobile, these three dimensions of context are explored and implemented in an application for mobile telephone users. The experiment conducted shows the viability of tailoring contextual information in three dimensions to provide users with timely and relevant information.

Keywords: PIM, context, context-aware, Microsoft Pocket Outlook, ubiquitous computing, HCI.
1 Introduction

Personal Information Managers (PIMs) exploit the ubiquitous paradigm in mobile computing technology to integrate services and programs for business and leisure. Activities performed with PIMs range from plotting appointments and tasks in a calendar to the automatic information exchange between mobile devices, different device synchronizations and base station communication. In every situation in our daily life, an independent context of information and events is defined. This context will vary depending on the situation users are in and the tasks they are about to commit to. Nonetheless, despite the fact that the challenge of defining context has already been addressed [1-3] and that its value as a source of information is recognised, the link between context and PIMs is still an immature and unexplored area of research. This paper addresses the use of context in an everyday mobile phone based PIM application which makes use of context-based information to enhance the user experience.
The unique use of context in this PIM application combines activities, social information and geographical information.
2 Background

Developers and researchers agree that context is an important factor when designing new applications. With PIM devices becoming increasingly widespread and in daily use by a large population, this opens interesting possibilities for development. Such applications would potentially be used daily by people in their homes or at their workplace, especially bearing in mind that people carry mobile devices, and thereby the application, with them almost 24 hours a day. Recent arguments state the possibility for business travelers and other mobile workers to leave the laptop at home and shift entirely to mobile devices because of their increased capacity and convenient size.

Context in mobile applications has been looked at by more than a few researchers [1,10]. For example, Ludford et al. [8] looked at the use of context to provide useful information to the user on their mobile phone. In their work, context is based on the location and/or time of the day. This partly makes use of daily available context information, which, however, is not instantly fed back into the system as parameters for information retrieval. Efforts have also been made to make use of context as a tool for supporting business travelers [8]. The definition of context here as the user's planned activity in combination with the location is quite interesting. This is because it generates quite a lot of information about the user; however, that information is of reduced interest if we have no way of making use of it. Zhou et al. [11] have also demonstrated the use of context-sensitive information for the real estate business. These are just two examples out of many, and one could imagine many other possible scenarios. On an overall basis, though, the use of context in applications is often missing or single-dimensional. This focus should be changed, since automated PIM applications which take into account the total context of the user would possibly be able to not only support the everyday tasks of the user, but also improve efficiency and ease the work of the user by automatically tailoring information to the user's needs and/or adapting the application to the user's current setting. The CityFlocks application [1] is one step in this direction; however, it falls short of offering a full solution to the problem.

The widespread use of small mobile devices (PIMs) has, as shown, forced researchers and developers to consider context as an important criterion for highly mobile systems. The notion of context-aware computing is generally the ability of devices to adapt their behavior to the surrounding environment, hence enhancing usability [7]. Towards this goal, Dey and Abowd [2] state that if we understand context fully in a given environment and setting, we would then be able to better choose what context-aware behaviours to sustain in applications. This could lead to more realistic applications and thereby applications more meaningful to users. This is also exemplified by Edwards [4] when he uses context information to build an application. In his application different layers represent different sources of information and they can be reused in later settings. Edwards argues that context is a major part of our daily life and that
computing with support for sharing and using contextual information (context-aware computing) would improve user interaction. Indeed, when viewing people rather than systems as consumers of information, a new infrastructure is needed.
3 Design

Our application interacts with the user by presenting information relevant to the user's context. To be able to do this, system design and functionality are split into three main modules, each of which generates context-aware information and responds to different user interactions. This enables a precise definition of elements and thereby tailoring of the information to be displayed according to the user's actual needs. One of these main modules handles activity, one handles social context and the third handles the geographical location. The input from all three sources together provides the foundation for the user context computation. By this operationalization of the context and context-aware concepts we are able to create a user context. This user context is then the foundation upon which the application takes action to present information. We now proceed to provide further details of each of the three modules involved in our PIM application.

3.1 Social Context

This module computes the foundation upon which the social context is determined. Social context will naturally differ tremendously based upon each situation and for each individual. Still, it is possible for several people to use one social context in common by choosing concepts that are interpreted the same way by most people. This is achieved through building a taxonomy of concept terms, illustrated in Table 1 below.

Table 1. Taxonomy of social context
Categories | Sub categories
Leisure    | Shopping, Cinema, Spare time, Food, Culture
Work       | Meeting, Preparation, Own time, Travelling, Phone meeting
Travel     | Train, Car, Tube, Foot, Transport
In our approach, building on the description of activities from Prekop and Burnett [9], information about the social context is stored in the application and meta-tagged as Pocket Outlook activities / appointments. These tags are based on the above taxonomy and are implemented for the user by extending the standard Pocket Outlook category interface (Figure 1). The user can thus enrich (tag) an activity with category tags through a familiar interface, which greatly increases the familiarity of the system.
Fig. 1. Taxonomy interface
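A language-neutral sketch of this tagging step is shown below; the prototype itself does this in C# against the Pocket Outlook category interface, so the data structure and function names here are illustrative assumptions only.

    # Illustrative only: the Table 1 taxonomy as a data structure, plus a helper
    # that attaches a validated category tag to an appointment record.
    TAXONOMY = {
        "Leisure": {"Shopping", "Cinema", "Spare time", "Food", "Culture"},
        "Work": {"Meeting", "Preparation", "Own time", "Travelling", "Phone meeting"},
        "Travel": {"Train", "Car", "Tube", "Foot", "Transport"},
    }

    def tag_appointment(appointment, category, sub_category):
        """Attach a (category, sub-category) tag after checking it against Table 1."""
        if sub_category not in TAXONOMY.get(category, set()):
            raise ValueError(f"{sub_category!r} is not a sub-category of {category!r}")
        appointment.setdefault("tags", []).append((category, sub_category))
        return appointment

    # Example: tag_appointment({"subject": "Lunch"}, "Leisure", "Food")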
3.2 Geographical Location Module

This module calculates location based on input from the internal Global Positioning System (GPS) receiver in the device. When a valid connection through the GPS device to a GPS satellite occurs, it returns location information to the application. The input from the GPS is then parsed and the actual location retrieved by inspecting the longitude and latitude coordinates. These coordinates are then mapped to one of 16 specific zones (in our case, in Oslo, Norway). As the user is moving, information about the current zone is updated and stored in the application. This is done by letting the device interact with the running application's data sources; although the user is not actually feeding any information into the device, physically moving around is sufficient for context-aware exploitation of user information.

3.3 Activity Module

This module communicates with the Microsoft Pocket Outlook storage on the mobile device and retrieves appointments and activities. The module accesses the Pocket Outlook storage directly and also listens to system events generated in this storage. The user interacts with the activity module through the familiar Pocket Outlook interface, and attaches one or more of the category terms as described previously. In doing this, almost unknowingly, the user improves the quality of the activity information and thus eases the use of the PIM application.
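To make the three-module design concrete, the sketch below shows one way the coordinate-to-zone mapping and the combined user context could be computed. The 4x4 grid, the approximate Oslo bounding box and all names are assumptions for illustration, not the prototype's actual C# implementation.

    # Illustrative only: map GPS coordinates onto one of 16 zones (a 4x4 grid over
    # an approximate Oslo bounding box) and assemble the three-part user context.
    OSLO_BBOX = (59.88, 10.65, 59.97, 10.85)  # lat_min, lon_min, lat_max, lon_max

    def zone_for(lat, lon, grid=4):
        lat_min, lon_min, lat_max, lon_max = OSLO_BBOX
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            return None  # outside the covered area
        row = min(int((lat - lat_min) / (lat_max - lat_min) * grid), grid - 1)
        col = min(int((lon - lon_min) / (lon_max - lon_min) * grid), grid - 1)
        return row * grid + col  # zone id in the range 0..15

    def user_context(lat, lon, current_appointment, social_tags):
        """Combine the outputs of the three modules into one context record."""
        return {"zone": zone_for(lat, lon),
                "activity": current_appointment,
                "social": social_tags}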
4 Implementation The application prototype is designed for and implemented on a Pocket PC device [6] (HTC 3600 phone) using the Microsoft Windows Mobile 6.0 operating system. The application is programmed in Microsoft .NET Compact Framework with C# as
implementation language. The geographical position is acquired through GPS, and activities and appointments are acquired through Microsoft Pocket Outlook. All data on the device are kept continuously up to date by synchronization with Microsoft Outlook 2007. This device was also chosen because its hardware is powerful enough and its storage capacity large enough to be suitable for software development.
5 User Evaluation The PIM application was evaluated with a test group of 15 users who undertook a set of social and work-related activities whilst navigating a route through central Oslo (Figure 2).
Fig. 2. Suggested route through city
After the test, users had to complete the evaluation questionnaire shown in Table 2 below. For each question, users were asked to state the extent to which they agreed with the statement on the scale Strongly Disagree (SD), Disagree (D), Mildly Disagree (MD), Mildly Agree (MA), Agree (A) and Strongly Agree (SA). Each possible answer from SD to SA was mapped to a number from 1 to 6, respectively, and the responses thus obtained were analyzed using a t-test (Table 3). In the following sections we elaborate on the implications of our evaluation exercise.
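The analysis step can be sketched in C# as follows. This assumes a one-sample t-test of each question's coded responses against the scale midpoint of 3.5; the reference value and the sample data in the example are assumptions for illustration, since the paper reports only the resulting means, t-values and p-values.

using System;

static class LikertAnalysis
{
    // One-sample t-test: compare the mean of coded responses (1..6) with mu0.
    public static void OneSampleTTest(int[] responses, double mu0, out double mean, out double t)
    {
        int n = responses.Length;
        double sum = 0;
        foreach (int r in responses) sum += r;
        mean = sum / n;

        double ss = 0;
        foreach (int r in responses) ss += (r - mean) * (r - mean);
        double sd = Math.Sqrt(ss / (n - 1));            // sample standard deviation

        t = (mean - mu0) / (sd / Math.Sqrt(n));         // compare with a t(n-1) table for p
    }

    static void Main()
    {
        int[] q1 = { 5, 5, 4, 6, 5, 4, 5, 5, 4, 5, 6, 5, 4, 5, 5 };   // hypothetical data
        OneSampleTTest(q1, 3.5, out double mean, out double t);
        Console.WriteLine("mean={0:F2}  t={1:F3}", mean, t);
    }
}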
Table 2. Questionnaire

1. The information provided by the reminder system correctly matched my current location
2. The information provided when I was “Sightseeing in old town” was incorrect
3. The summary of blocked events I received after appointment “DNB Nor Solli plass” was useful
4. The system provides duplicated information
5. I liked the fact that the application is integrated with Outlook
6. The reminder system is useful
7. I would use this application in my daily life
User responses to the evaluation questionnaire are summarized in Table 3 below.

Table 3. T-Test results

Question   Mean Response   T-Value   P
1          4.80             8.088    0.000
2          2.47            -1.054    0.310
3          3.73             1.585    0.135
4          2.73            -0.654    0.524
5          5.47            12.854    0.000
6          5.20            12.602    0.000
7          3.93             3.287    0.005
From Table 3 we can see that all answers display a positive bias (Questions 2 and 4 are in fact negative statements about the application, so the negative bias here reflects positive user statements). For question 1, 14 out of 15 users answered that the information displayed did match their current location, and the responses are statistically significant. This indicates that, on the whole, the context was computed correctly and the correct information was displayed. For question 2, 11 out of 15 respondents disagreed with the statement that the information displayed in one appointment was incorrect. This indicates that 11 users were shown, at least partly, correct information, while four were shown incorrect information or none at all. There is thus a strong bias towards negative answers, a few (4) positive answers and no middle values (MA / MD). This polarization of results, however, means that the data for this question are not statistically significant.
Fig. 3. Questionnaire results
As described, the application prototype is adaptable to different scenarios and user settings, but a context-dependent application needs to be tailored to the users' needs when deployed in a real-life setting; e.g., the initial categories and their weights need to be configured in accordance with the findings of Prekop and Burnett [9]. When these issues are taken care of, the user experience might improve and further strengthen the positive trend in the answers to questions six and seven. Moreover, as shown by Zhou et al. [11], information tailoring is an important means of helping users interpret data. Our application achieves tailoring by displaying only a minimal amount of information at any one time when new messages are shown, thereby easing the user's interpretation. Current calendars and the applications based on them do not take multi-contextual information into account [7], as they often only reproduce the information already available there. At worst this can lead to an incorrect display of data, and at best to a reproduction of data in a new interface. Our PIM application differs greatly from this by only displaying information based on the computed user context, given by the three factors social context, location and activity / appointment. Earlier approaches that have made use of calendar data from Pocket Outlook often end up using the Pocket Outlook data together with a simple timeline (e.g., [5]). In our approach, the use of Pocket Outlook data is extended not only to retrieve and display the data, but also to add extra meta-information to the appointments. Results for question five show that all respondents stated they liked the integration of the developed PIM application with the Outlook calendar. This is important because it shows they had no problems entering an appointment in one application and having the information displayed in another application (the prototype). In the evaluation exercise, the generation of information was tightly connected with the actual task at hand and participants were asked to judge whether or not they found the application useful (question six). Our results show that all 15 users involved in the evaluation thought the
application provided useful value. This indicates that the reminder system is an application of practical use to its users. Another side of usefulness is the behaviour of the device and of the application, so each participant was asked to evaluate these aspects as well in question six. As a final question, after the test, the users were asked to state whether or not they would like to use this application in their daily life. As the t-test shows, the results for this question are statistically significant, indicating that the users saw value in using the application in their daily lives.
6 Concluding Remarks Context and context-awareness have long been acknowledged as important and have generated considerable research effort. However, their integration into PIMs has so far been limited and the perspective has often been single-dimensional. In this article, we have presented the main aspects of the design, implementation and evaluation of an application prototype that integrates context / context-awareness into a PIM from a novel three-dimensional perspective combining social, geographical and activity information. User evaluation of the proof of concept displayed a strong positive bias, highlighting its potential usefulness and applicability. Based on the developed prototype, we have shown the viability and usefulness of our approach, and we believe that tailoring information in the manner described in this paper takes the PIM concept one step further towards the ideal of providing tailored and timely information to mobile information users everywhere.
References 1. Bilandzic, M., Foth, M., Luca, A.: CityFlocks: Designing Social Navigation for Urban Mobile Information Systems. In: Proceedings ACM Designing Interactive Systems (2008) 2. Dey, A.K., Abowd, G.: Towards a Better Understanding of Context and Context-Awareness. In: Gellersen, H.-W. (ed.) HUC 1999. LNCS, vol. 1707, p. 304. Springer, Heidelberg (1999) 3. Dourish, P.: What we talk about when we talk about context. Journal of Personal and Ubiquitous Computing 8, 19–30 (2004) 4. Edwards, K.: Putting Computing in Context. ACM Transactions on Computer-Human Interaction 12(4), 446–474 (2005) 5. Hertzog, P., Torrens, M.: Context-aware Mobile Assistants for Optimal Interaction: a Prototype for Supporting the Business Traveler. In: Proceedings of the 9th international conference on Intelligent User Interfaces, pp. 256–258 (2004) 6. HTC, The HTC 3600 (2007), http://www.htc.com/product/03-product_p3600.htm (June 1, 2007) 7. Kaasinen, E.: User needs for location-aware mobile services. ACM Personal and Ubiquitous Computing 7(1), 70–79 (2003) 8. Ludford, P., Rankowski, D., Reily, K., Wilms, K., Terveen, L.: Because I carry my cell phone anyway: functional location-based reminder applications. In: Proceedings of Conference on Human Factors in Computing Systems, April 2006, pp. 889–898 (2006)
9. Prekop, P., Burnett, M.: Activities, Context and Ubiquitous Computing. Journal of Computer Communications, Special Issue on Ubiquitous Computing 26, 1168–1176 (2003) 10. Rodden, T., Cheverest, K., Davies, K., Dix, A.: Exploiting context in HCI design for mobile systems. In: Workshop on Human Computer Interaction with Mobile Devices (1998), http://www.dcs.gla.ac.uk/~johnson/papers/mobile/HCIMD1.html 11. Zhou, M., Houck, K., Pan, S., Shaw, J., Aggarwal, V., Wen, Z.: Enabling Context-Sensitive Information Seeking. In: Conference of Intelligent User Interfaces, January / February 2006, pp. 116–123 (2006)
CLURD: A New Character-Inputting System Using One 5-Way Key Module Hyunjin Ji1 and Taeyong Kim2 1 CLURD, 211-903, Hyundai Apt. Guui 3-dong, Kwangjin-gu, Seoul, Korea 2 School of Journalism & Communication, Kyunghee University, Hoegi-dong, Dongdaemun-gu, Seoul, Korea
[email protected],
[email protected]
Abstract. A character inputting system using one 5-way key module has been developed for use in mobile devices such as cell phones, MP3 players, navigation systems, and remote controllers. All Korean and English alphabet characters are assembled by two key clicks, and because the five keys are adjacent to each other and the user does not have to monitor his/her finger movements while typing, the speed of generating characters can be extremely high and its convenience is also remarkable. Keywords: Character Input, Typing, 5-way Key Module, Mobile Device, Keyboard, Wearable Computer.
1 Background People use cell phones, MP3 players, navigation systems, and remote controllers almost every day. Since these devices have to be small enough to hold and carry, only a limited number of keys can be installed on them. A challenge arises because there is an increasing need to input text data using these devices. Therefore, with a few exceptions like the Blackberry, which has 26 separate keys assigned to the 26 English alphabet characters, device manufacturers have employed various methods that make it possible for users to input all characters conveniently with a small number of keys. The oldest and most popular method is probably the one found on traditional telephones. This is a sort of 'toggle' method, in which each numerical key from '2' to '9' corresponds to 3-4 characters, so that a desired character can be selected and input. Fig. 1 shows the key layout of this method.
1        2 ABC    3 DEF
4 GHI    5 JKL    6 MNO
7 PQRS   8 TUV    9 WXYZ
         0

Fig. 1. This may be the most widely applied character inputting system as it is installed in most of the traditional telephones
A serious limitation of this method is experienced when a user tries to input a word like 'feed.' Since the characters 'd,' 'e,' and 'f' are assigned to the same key, the user has to either wait a while or input a splitter signal in order to input 'e' after 'f.' This is because the phone cannot tell whether the user intends to input a new character that happens to be assigned to the same key, or to change the character just input into the next character assigned to that key. In Korea, since the 'toggle' method requires too many key manipulations, and therefore too much time, when characters are input, other input methods have been developed. The 'chun-ji-in' method and the 'naragul' method are two well-known examples. However, even with these widely adopted methods, complicated and burdensome key manipulations are needed, and therefore characters cannot be input speedily. One source of the problem is that, like the old method of inputting English alphabet characters mentioned above, the two methods use 11 or 12 keys that are spread widely over the surface of the device. Another is that, in the case of chun-ji-in, a splitter is required for the same reason that the old English input method is inconvenient, and in the case of naragul, two modifier keys have to be used very frequently to change an input character into others that have the same root. Because of these limitations, a user has to move his/her fingers busily around the 11 or 12 keys and click as many as three keys located apart from each other to input one character.
2 The CLURD System In order to overcome the limitations of the methods mentioned above, a new method of inputting characters using a 5-way key module has been developed and named “CLURD,” which stands for Center-Left-Up-Right-Down. A 5-way key module comprises a center key and an upper key, a lower key, a left key and a right key, which are arranged above, below, to the left of and to the right of the center key, respectively. Typical examples of the module are shown in Fig. 2.
Fig. 2. Typical 5-way key modules that can adopt the CLURD system
Even though there are only five keys in the key module, the CLURD system does not use the 'toggle' method. Instead, the system makes it a rule to assemble a character from two key clicks. Theoretically, a total of 25 combinations can be created with two clicks of five keys (5 x 5). If these 25 combinations are assigned to the 24 Korean characters and a space datum, respectively, it should be feasible to input Korean words
with no additional splitter or modifier keys. The combinations for the 24 Korean characters are illustrated in Figs. 3 & 4. The only unassigned combination is Center-Center, and this is used to input a space datum. Incidentally, the data generator turns 'ㄱ', 'ㄷ', 'ㅂ', 'ㅅ', and 'ㅈ' into 'ㄲ', 'ㄸ', 'ㅃ', 'ㅆ', and 'ㅉ', respectively, when the second key is pressed longer than a predetermined time. As shown in Figs. 3 & 4, the combinations are not randomly assigned to characters. Rather, the combination for each character is determined based on the geometric shapes of the keys. (See the first round-shape key module in Fig. 2.) The shapes of the five keys are clearly different, and if the shapes of the two keys to be clicked are combined, they resemble the shape of each of the 24 Korean characters fairly closely.
Fig. 3. CLURD Combinations for 10 Korean Vowels
Fig. 4. CLURD Combinations for 14 Korean Consonants
The CLURD system works for English alphabet characters according to the same logic. As in the case of Korean, the combinations are thoughtfully assigned to characters so that they resemble the shapes of the characters as closely as possible. As shown in Fig. 5, the data generator assembles 'n', 'D', 'U', and 'C' among the English alphabet characters if a signal generated by clicking the key combination Up-Center, Right-Center, Down-Center, or Left-Center, respectively, is input. The data generator assembles 'm', 'B', 'W', and 'E' if a signal generated by clicking the key combination Up-Up, Right-Right, Down-Down, or Left-Left is input.
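The two-click assembly rule can be sketched in C# using only the combinations spelled out in the text (the space datum and the eight letters of Fig. 5, plus the long-press rule for 'V'); the remaining letters and the Korean tables, which are defined in the figures, are omitted here. This is an illustrative sketch under those assumptions, not the authors' implementation.

using System;
using System.Collections.Generic;

enum Key { Center, Left, Up, Right, Down }

class ClurdDecoder
{
    // Combinations named explicitly in the text; the full alphabet is defined in the figures.
    static readonly Dictionary<(Key, Key), char> Map = new Dictionary<(Key, Key), char>
    {
        { (Key.Center, Key.Center), ' ' },
        { (Key.Up, Key.Center), 'n' }, { (Key.Right, Key.Center), 'D' },
        { (Key.Down, Key.Center), 'U' }, { (Key.Left, Key.Center), 'C' },
        { (Key.Up, Key.Up), 'm' }, { (Key.Right, Key.Right), 'B' },
        { (Key.Down, Key.Down), 'W' }, { (Key.Left, Key.Left), 'E' }
    };

    Key? first;                                       // first click of the current pair

    // Feed one key click; returns a character once a pair is complete, otherwise null.
    public char? Click(Key k, bool secondKeyLongPress = false)
    {
        if (first == null) { first = k; return null; }

        Map.TryGetValue((first.Value, k), out char c);
        first = null;
        if (c == 'U' && secondKeyLongPress) c = 'V';  // long press on the second key
        return c == '\0' ? (char?)null : c;
    }
}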
Fig. 5. CLURD Combinations for English characters – Group 1
Fig. 6. CLURD Combinations for English characters – Group 2
Fig. 7. CLURD Combinations for English Characters – Group3
Figs. 6 and 7 illustrate the key combinations for 'A', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'O', 'P', 'Q', 'R', 'S', 'T', 'X' and 'Z'. The data generator assembles 'V' and 'Y' in the same way as 'U' and 'T', respectively, except that the second key is pressed longer than a predetermined time. In the case that the language mode is set to Chinese Pinyin, 'Ü' is generated instead of 'V'.
3 Functionality and Usability of the CLURD System In Table 1, the CLURD system is compared with the existing methods, that is, the 'chun-ji-in' Korean character input method used in mobile phone terminals produced by Samsung Electronics Co., Ltd., the 'naragul' Korean character input method used in mobile phone terminals produced by LG Electronics Co., Ltd., and the Korean character input method used in mobile phone terminals produced by SK Telecom Co., Ltd. The numeric values in Table 1 are theoretical input times expressed in milliseconds (ms); the values in parentheses are the number of key presses (typographical hits). These data have been
Table 1. Theoretical comparison with the existing methods in light of typing speed and number of key clicks
Sentence to be input               Samsung      LG           SK           CLURD
사랑해                             8975(13)     6740(9)      6805(10)     6480(16)
지금 전화해줘                      19800(28)    19070(27)    18935(24)    14580(36)
늦을 것 같으니까 조금만 기다려     40115(54)    36435(46)    37350(50)    29970(74)
cited from the paper of Kim, S., Kim, K., and Myung, R. [1], and the data for CLURD were calculated in the same manner by the present authors. The CLURD system uses five keys that are adjacent to one another and located below the user's thumb. Thus, the time consumed in moving the fingers is greatly reduced compared with the existing methods, and the user's gaze need not be shifted between the screen and the keys, because the user can easily locate the five adjacent keys forming a '+' shape under his/her thumb. Consequently, although the number of typographical hits increases by about 50%, the total input time decreases greatly in comparison with the existing methods. Since the above times are theoretical input times calculated by a formula and are presented for the purpose of comparison, actual input can take a shorter or longer time in all of the systems described. The CLURD system also has the merit that the user can input characters with the same hand that holds the phone. It can be used, though some caution is necessary for safety reasons, even while the user walks, drives a car, or keeps the hand holding the mobile device under a table or in a pocket. The CLURD system does not have the kinds of problems that the existing methods have; that is, it needs neither a splitter nor a modifier key. The user only has to recall the combinations and enter them on the device using the key module. Also, the fact that the space datum is included in the key combinations improves usability, according to testers who used the system for a few days.
4 Limitation As the CLURD system is based on predetermined key combinations for Korean and English characters, users need to invest some time and effort to
Table 2. Results of memory test using an instructional animation (Korean)
memorize the combinations. Since the combinations are not random but associated fairly closely with the shapes of the characters, the memory burden does not seem significant. Table 2 shows the results of a memory test with junior high school students.
5 Augmentation of the CLURD System If the user of the CLURD system decides, for any reason, not to monitor the screen while typing, it would be useful to add some function that notifies him/her when an error has occurred. Because the user's vision is likely to be allocated to another task in such a situation, it is wise to utilize another sensory modality; a short vibration may be a good choice in the case of a cell phone. Also, if the CLURD system is used as the main communication device by speech-handicapped persons, it can be combined with a text-to-speech (TTS) module which converts text into sound.
6 Devices Using the CLURD System In the case of a cell phone, PDA or wearable computer, the CLURD system can be used as a substitute for a regular keypad/keyboard. In the case of a remote controller for a digital TV set, the CLURD system can be used for searching program titles, inputting an identification and password, or entering the user's address to order a product. Some example devices (cell phone, TV remote, door lock, typing mouse) are shown in Fig. 8. Other examples, though not shown, may include watch phones, wearable computers, and so on. For a watch phone, the 5-way key module can be installed on the surface of the watch window in the form of a transparent thermal touch pad. A watch phone can then be used as an input device for
Fig. 8. Cell phone, remote controller for digital TV set, door lock, typing mouse with the CLURD system installed (Conceptual illustrations or prototype models)
Fig. 9. The on-screen keypad and the palm-size full-function keyboard with the CLURD system installed (Available in market); The palm-size keyboard was developed with a financial support from the Korea Agency for Digital Opportunity and Promotion
wearable computers with a wireless connection to the CPU. The on-screen keypad program and the palm-size keyboard shown in Fig. 9 are the first official products on the market (introduced in April 2009) that adopt the CLURD system.
7 Conclusion As described above, CLURD is a character inputting system suitable for mobile devices on which a sufficient number of keys cannot be arranged due to space restrictions. Besides taking up little space on the surface of a device, the system has remarkable advantages that other existing methods cannot match. First, it requires only one or two fingers for operation, which allows the user to allocate the other hand and/or the remaining fingers to other concurrent tasks. This also means that physically handicapped people who can use only one hand or one finger can type characters speedily. In fact, a Korean lady who is partly paralyzed and can therefore use only one hand has passed an official computer literacy test that requires a fair typing speed using only this palm-size keyboard. Another advantage of the CLURD system is that the user does not have to look at the keys while typing, since his/her finger knows where the five keys are located. This allows him/her to monitor any typos by keeping his/her gaze fixed on the screen. The CLURD system is also useful to the blind, especially when they take notes while reading a book in Braille. They may read a book with the left hand and type into a computer using the CLURD keyboard. (Taking notes on paper is meaningless for them because they cannot read them later.) A possible application for the mute is a hand-held device in which the CLURD system is installed together with a TTS system, so that when the user presses the TTS button the device reads out what he/she has just typed using the CLURD system. The CLURD system can also be nicely installed in the form of a virtual keypad in a touch-screen interface. This has been confirmed with the on-screen keypad program designed for tablet computers and the prototype model of the typing mouse. The time and effort needed to become familiar with the key combinations may be the only limitation of the CLURD system. A good sign was that most of the testers (over 2,000 people) who have used the add-on software designed for cell phones stated
that the time and effort needed to learn the combinations were quite minimal and that the system's functionality and usability were remarkable enough for them to willingly make such an investment.
Reference 1. Kim, S., Kim, K., Myung, R.: Hangul input system’s physical interface evaluation model for mobile phone. Journal of Korean Institute of Industrial Engineers 28(2), 193–200 (2002)
Menu Design in Cell Phones: Use of 3D Menus Kyungdoh Kim1, Robert W. Proctor2, and Gavriel Salvendy1,3 1
School of Industrial Engineering, Purdue University, 315 N. Grant St., West Lafayette, IN 47907 USA
[email protected] 2 Department of Psychological Sciences, Purdue University, 703 Third St., West Lafayette, IN 47907 USA
[email protected] 3 Department of Industrial Engineering, Tsinghua University, Beijing, P.R. China
[email protected]
Abstract. The number of mobile phone users has been steadily increasing due to the development of microtechnology and human needs for ubiquitous communication. Menu design features play a significant role in cell phone design from the perspective of customer satisfaction. Moreover, small screens of the type used on mobile phones are limited in the amount of available space. Therefore, it is important to obtain good menu design. Review of previous menu design studies for human-computer interaction suggests that design guidelines for mobile phones need to be reappraised, especially 3D display features. We propose a conceptual model for cell phone menu design with 3D displays. The three main factors included in the model are: the number of items, task complexity, and task type. Keywords: cell phones, menu design, 3D menu, task complexity, task type.
1 Introduction The number of mobile phone users has been steadily increasing due to the development of microtechnology and human needs for ubiquitous communication. People use mobile phones to communicate with their friends, family, and business partners, and also to obtain information through the mobile Internet. Moreover, people use embedded mobile phone features such as games, cameras and wireless Internet for various entertainment and shopping purposes. Due to the increasing number of features, the mental workload of using cell phones has increased. Ling et al. [1] prioritized the design features and aspects of cell phones based on users' feedback to optimize customers' satisfaction. Although physical appearance and body color of cell phones had considerable influence on overall user satisfaction, menu design features also played a significant role. Therefore, obtaining a good menu design in cell phones is an important issue. There has been a lot of research about menu design for computers. When it comes to menu dimensions, many researchers have concluded that performance time and errors increase as the hierarchical levels of the menu structure increase [2, 3]. With
regard to menu type, hierarchical menus are more accurate and faster than fisheye menus [4]. Three-dimensional (3D) displays show many items of a menu at the same time, so they may give the same effect as a broader menu [5]. With regard to adaptability, computer menus that can be customized by users have been shown to be better than ones that adapt automatically [6]. Research on cell phone menu design is relatively recent. Geven, Sefelin, and Tscheligi [7] concluded that narrow hierarchies performed better than broader hierarchies in mobile devices, contrary to menu design in computers. With respect to menu type, Gutwin and Fedak [8] found that people were able to carry out a web navigation task better with the fisheye view than with alternatives. For adaptability, results have been similar to those for computer displays. Customized menus produced better performance and evaluation than the traditional static menu [9]. However, studies of 3D displays for cell phones are lacking, and in this paper 3D design research is therefore investigated in more detail. At this point, there are no standard interaction devices or interfaces used in 3D environments, and there is a lack of specific best practice guidelines to develop these 3D designs. 3D design is able to convey more information than text or two-dimensional (2D) images, and it enhances the usability of the limited screen on a typical wireless device. Interactive 3D can therefore be used to remove some of the complexity and clutter present on menu systems of today's handsets. 3D icons can be animated to show activity or changes in status, and the depth dimension can be utilized to show urgency or relative importance [10]. Therefore, new standards should be developed to allow personal digital assistants (PDAs) and mobile devices to render 3D applications. Review of previous menu design studies for human-computer interaction suggests that design guidelines for mobile phones need to be reappraised, especially with respect to 3D display features. To this end, the main objective of this paper is to propose an overall framework to develop mobile phone menu design guidelines regarding 3D displays. We review menu design components for computers in section 2 and investigate menu design factors for cell phones in section 3. Strengths and weaknesses of 3D design factors are considered in section 4. We compare menu design factors in section 5 and conclude after explaining a model of cell phone menu design in section 6.
2 Menu Design in Computers 2.1 Menu Dimension Many of the early studies of menu design for computers focused on the cognitive factors of a menu’s hierarchical structure and the structure’s impact on end users’ behaviors and performance in retrieving information. Out of this research, studies about whether it is better to have a broad or deep design have been conducted. Jacko et al. [2] suggested three components of hierarchical menu design: menu dimension, task complexity, and user knowledge structure. The results about the menu dimension supported that both performance time and errors increased as the levels of the menu structure increased. That is, depth in an information structure increases the likelihood of navigational errors and also increases performance time [11].
Seppala and Salvendy [3] also drew the conclusion that a broader mode of data presentation is more effective than a deeper one. Because searching back and forth through the menu system decreases the speed and accuracy of performance, the broader menu has better performance in the case of a personal computer. This is because increased depth involves additional visual search and decision-making, and greater uncertainty as to the location of target items due to the increased number of menu frames [12]. In other words, as the depth increases and the number of responses needed while going through a menu tree increases, more time for decision making and responding is required [3]. 2.2 Menu Type Menu structure can be classified as hierarchical and fisheye [4]. Fisheye is a menu display method that shows a region of the menu at high magnification, while items before and after that region are shown at gradually reduced sizes. Hornbaek and Hertzum [4] provided evidence that, for finding known items, conventional hierarchical menus were more accurate and faster than fisheye menus. Also, participants rated hierarchical menus as more satisfying than fisheye menus. For browsing tasks, the menus did not differ with respect to accuracy or selection time. Fisheye interfaces have an advantage in that they can accommodate many menu items in a limited amount of screen space by showing part of an information space at high magnification, while other parts are shown at low magnification to provide context. However, performance remained worse with fisheye menus than with hierarchical menus because the latter impose lower mental demands on users [4]. Within a hierarchical menu, cascading and indexed menus can be compared [13]. Participants searched three types of menu layouts: categorical index; horizontal cascading; vertical cascading. Search time differences between the three menu layouts were detected that strongly favored the index menu. One possible reason for this result is that the items in the index menus were in closer proximity. Another is that the index menus were centrally located on the screen, and thus would have been easier to see and acquire. 2.3 Adaptability Some commercial applications now have adaptable interfaces. For example, the Start Menu in Microsoft Windows XPTM has an adaptive function that provides automatically generated shortcuts to frequently used applications. Microsoft Office also provides Smart Menus, which are an adaptive mechanism where infrequently used menu items are hidden from view. Understanding these interfaces through strong empirical and theoretical studies is particularly important, because adaptive interfaces are now being introduced into productivity software and used by an increasing number of people [14]. Mitchell and Shneiderman [15] compared dynamic vs. static menus using a menu-driven computer program. Subjects who used adaptive dynamic menus for the first set of tasks were significantly slower than those who used static menus. Moreover, 81% of the subjects preferred working with static menus to working with dynamic menus. This preference
is likely because dynamic menus can slow down first-time users, at least until they become accustomed to this interaction style. Findlater and McGrenere [6] compared the measured and perceived efficiency of three menu conditions: static, adaptable and adaptive. They found that users generally preferred the customizable version to the adaptive menus. In terms of performance, adaptive menus were not faster than either of the other conditions. User-driven customization is a more viable approach for personalizing user interfaces than system-driven adaptation. The static menu was found to be significantly faster than the adaptive menu, and the adaptable menu was found to be significantly faster than the adaptive menu under certain conditions. In terms of accuracy, however, there were no differences. The majority of users preferred the adaptable menu overall and ranked it first for perceived efficiency. Therefore, this study suggests that system-driven adaptation is not helpful.
3 Menu Design in Cell Phones 3.1 Menu Dimension As screens become smaller, the information they display changes more extensively with each scrolling action, making it more difficult to refocus on the page. In this way, screen size affects the navigation behavior and perceptions of mobile phone users [11]. Therefore, the breadth of information structures should be adapted to anticipated screen size. The advantage of depth is that it encourages funneling; the disadvantage is that it induces errors and increases the number of page transactions. On the other hand, the advantage of breadth is that it reduces navigation errors and the number of page transactions; the disadvantage is that it leads to crowding. Therefore, a user encountering greater depth has fewer options to process on a single page. Thus, the cognitive load on the user is reduced. Findings consistently have suggested an advantage of employing a deeper menu structure to achieve better user performance and accuracy. Geven et al. [7] showed that people perform better with narrow hierarchies than with broader hierarchies on small screens. Contrary to computers, where many options are usually presented at once, it is better to use a layered design in cell phones. Huang [16] showed that users prefer a less extensive menu structure on a smallscreen device. This result supports the recommendation of not having a broad menu structure on a small screen. With less space to display information, designers of cell phones tend to chunk menu items of a broader menu into several pages or screens. This chunking requires end-users to employ more scrolling operations, maintain more information in working memory, and engage in more searching and navigation behaviors. The consequence is to reduce the speed and accuracy in use of the menus. The following describes the two suggestions that Huang [16] developed: (1) Reduce both breadth and depth of the menu. (2) Instead of displaying only a limited number of items on one screen, include more menu items and options in one page.
Dawkins [9] also suggests that filling the screen as much as possible without requiring scrolling should be the ideal breadth of the menu. 3.2 Menu Type and Adaptability Many of the current visualization methods aimed at small screens rely on distorting the view. The viewpoint information is manipulated in a way that enables seeing important objects in detail, and the whole information space can be displayed at once with very low amount of detail [17]. The rubber sheet is one of view distortion techniques that allow the user to choose areas on the screen to be enlarged. Zooming and zoomable user interfaces (ZUI) are another way of presenting large information spaces even on a small screen. Combs and Bederson [18] studied image browsers and found that their system, based on a ZUI method (as well as 2D thumbnail grid), outperformed 3D browsers in terms of retrieval time and error rate. Displaying the overview and the detail at the same time is also more beneficial than the traditional linear format because the global context allows faster navigation [8]. Gutwin and Fedak [8] found that people were able to carry out a web navigation task better with the fisheye view. Some phones are already being designed with a fisheye display for selected items to be salient and clear. Therefore, a fisheye menu may be better than a 2D hierarchical menu. In computers, users can create folders, reorder the layout, and make shortcuts. But a mobile phone has limited screen size and a small input device. Moreover, telecommunication carriers want the buttons to be used for their wireless Internet service. They are therefore reluctant to offer many customization functions to users. In other words, mobile phones do not provide enough adaptation functions. Dawkins [9] evaluated personalized menus alongside a traditional static menu structure based on user preference and performance. He concluded that customized menus had better performance and evaluation than the traditional static menu. Therefore, customers seem to want more customization functions in their cell phones from the perspectives of performance and satisfaction.
4 3D Design 4.1 Benefits of 3D Design Human information-processing has evolved to recognize and interact with a 3D world. And the 3D design space is richer than the 2D design space, because a 2D space is part of 3D space. It is always possible to flatten out part of a 3D display and represent it in 2D [19]. Therefore, it is unsurprising that 2D interfaces have performed relatively poorly. For example, Ware and Franck [20] conducted an experiment that was designed to provide quantitative measurements of how much more (or less) can be understood in 3D than 2D. Results showed that the 2D interface was outperformed by 3D interfaces. These results provide strong reasons for using advanced 3D graphics for interacting with a large variety of information structures [20].
The 3D interfaces make it possible to display more information without incurring additional cognitive load, because of pre-attentive processing of perspective views (e.g., smaller size indicates spatial relations at a distance). An ability to recognize spatial relations based on 3D depth cues makes it possible to place pages at a distance (thereby using less screen space) and understand their spatial relations without effort [21]. As described before, there are many 3D depth cues that can be provided to facilitate spatial cognition. The most obvious of these are perspective view and occlusion. Using these cues, the user gets the advantages of a 3D environment (better use of space, spatial relations perceived at low cognitive overhead, etc.). 3D allows larger menu items than the screen size. This would be a desirable feature for small screens that have a restricted screen resolution and size [22]. The effect of 3D is to increase the effective density of the screen space in the sense that the same amount of screen can hold more objects, which the user can zoom into or animate into view in a short time. It seems reasonable that 3D can be used to maximize effective use of screen space [23], especially in cell phones for which the screens are small screens. The use of 3D models on the Internet is gaining popularity, and the number of 3D model databases is increasing rapidly because 3D interfaces enable a more natural and intuitive style of interaction [24]. Since the use of 3D models is becoming more common on various cellular phone web sites, development of algorithms that retrieve similar information will be important in cell phone menu design [25]. 4.2 Weaknesses of 3D Design Creating a 3D visualization environment is considerably more difficult than creating a 2D system with similar capabilities. As the study of Cockburn and McKenzie [26] suggests, one should not assume that use of 3D provides more readily accessible information. In determining whether to implement a 3D display, designers should decide whether there are enough subtasks that would benefit from 3D representations. The complexity and the consistency of the user interface for the whole application must also be weighed in the decision. In the study of Ware [19], 3D navigation methods took considerably longer than 2D alternatives. Even if somewhat more information can be shown in 3D than in 2D, the rate of information access may be slower, and 3D applications may have greater visual complexity than 2D applications [27]. People often find it difficult to understand 3D spaces and to perform actions in them. It is clear that simply adapting traditional WIMP (windows, icons, menus, and pointers) interaction styles to 3D does not provide a complete solution to this problem. Rather, novel 3D user interfaces, based on interactions with the physical world, must be developed. Jones and Dumais [28] have suggested that little significant value is provided by adding physical location information to the storage and subsequent retrieval of a document over and above simply providing a semantic label for the same purposes. 4.3 Direct Comparison between 2D and 3D Few prior studies have directly compared 2D and 3D interactive systems. Also, there is a surprising lack of empirical research into the benefits (or costs) that are produced
by moving from 2D to 3D. Cockburn and McKenzie [29] compared subjects' efficiency in locating files when using Cone-Trees (a 3D technique for exploring hierarchical data structures) and when using a 'normal' folding tree interface similar to that used in Windows Explorer. Results showed that the subjects took longer to complete their tasks when using the cone interface. They rated the cone interface as poorer than the normal one for seeing and interacting with the data structure. Also, Cockburn and McKenzie [26] showed no significant difference between task performance in 2D and 3D, but a significant preference for the 3D interfaces. Recently there has been a growth of interest in 3D interactive systems for everyday 'desktop' computing applications, such as document and file management. However, the relative value of the third visual dimension in cell phone menu design has not previously been evaluated.
5 Models for Cell Phone Menu Design Jacko et al. [2] proposed modifications to an information-processing model developed by Salvendy and Knight [30]. In this model, three constructs of hierarchical menu retrieval were proposed: menu dimension, task complexity, and knowledge structure. Figure 1 illustrates a version of Jacko et al.’s [2] information-processing model extended to cell phone menu retrieval operation. The model takes advantage of the natural 3D human information-processing capabilities for cell phone menu interfaces, with distinctions similar to those identified by Jacko et al. The three main factors for cell phone menu design within 3D display included in the model are: the number of items, task complexity, and task type. Cell phones support more features such as broadcasting, mobile wallet and health condition sensor, etc. This is consistent with an issue raised by Norman [31], which is “a tendency to add to the number of features that a device can do, often extending the number beyond all reasons” (p. 173). With human cognitive limitations, a cell phone with too many features may overwhelm users due to its complexity [1]. Under these circumstances, it is important to investigate how the number of items can influence 3D menu design in cell phones. The number of items could influence menu dimensions, resulting in effects on perception, cognition, and motor response time. In this way, the number of items is an important characteristic of a virtual menu that will influence the item selection time. Moreover, inclusion of many menu items may decrease the usability of a 2D display solution. Therefore, deciding whether or not to use 3D design should depend on the number of items per menu screen. Task complexity can impact performance and satisfaction of 3D menu design because in a 3D environment the spatial relationships are perceived at low cognitive overhead [22]. Thus, performing a complex task may be better in a 3D environment than in a 2D environment. On the other hand, a 3D display sometimes has greater visual complexity. Therefore, direct comparisons between 2D and 3D menus for different levels of task complexity are needed. Task type influences the perceptual information required, the cognition operations involved in using that information, and necessary motor responses. Experiments need to be conducted to validate the proposed conceptual model.
Fig. 1. Modified Information-processing Model for Cell Phone Menu Operation
6 Conclusion The widespread use of cell phones for a variety of purposes provides evidence that they are shifting from just a communication tool to being an integral part of people’s everyday life. It is important to study cell phone menu design because, though menu design plays a crucial role in cell phone usability, little work exists on developing cell phone menu design. Three factors were identified that may influence performance of menu retrieval tasks with 2D and 3D displays in cell phones: the number of items, task complexity, and the type of tasks. These three factors are included in the proposed conceptual model for cell phone menu design with 3D displays. Research designed to validate this model should provide insights into the human information-processing requirements of various cell phone menu interfaces.
References 1. Ling, C., Hwang, W., Salvendy, G.: A survey of what customers want in a cell phone design. Behaviour & Information Technology 26, 149–163 (2007) 2. Jacko, J.A., Salvendy, G., Koubek, R.J.: Modelling of menu design in computerized work. Interacting with Computers 7, 304–330 (1995) 3. Seppala, P., Salvendy, G.: Impact of depth of menu hierarchy on performance effectiveness in a supervisory task: computerized flexible manufacturing system. Human Factors 27, 713–722 (1985)
4. Hornbaek, K., Hertzum, M.: Untangling the usability of fisheye menus. ACM Trans. on Computer-Human Interaction, Article 6, 1–32 (2007) 5. Dachselt, R., Ebert, J.: Collapsible cylindrical trees: a fast hierarchical navigation technique. In: Information Visualization, INFOVIS 2001, pp. 79–86 (2001) 6. Findlater, L., McGrenere, J.: A comparison of static, adaptive, and adaptable menus. In: Proceedings of the 2004 conference on Human Factors in Computing Systems, pp. 89–96 (2004) 7. Geven, A., Sefelin, R., Tscheligi, M.: Depth and breadth away from the desktop: the optimal information hierarchy for mobile use. In: Proceedings of the 8th conference on Human-Computer Interaction with Mobile Devices and Services, pp. 157–164 (2006) 8. Gutwin, C., Fedak, C.: Interacting with big interfaces on small screens: a comparison of fisheye, zoom, and panning techniques. In: Proceedings of the 2004 conference on Graphics Interface, pp. 145–152 (2004) 9. Dawkins, A.L.: Personalized Hierarchical Menu Organization for Mobile Device Users. Vol. Master. North Carolina (2007) 10. Beardow, P.: Enabling Wireless Interactive 3D. article retrieved from the Superscape Plc (June 2004), http://www.superscape.comin 11. Chae, M., Kim, J.: Do size and structure matter to mobile users? An empirical study of the effects of screen size, information structure, and task complexity on user activities with standard web phones. Behaviour and Information Technology 23, 165–181 (2004) 12. Jacko, J.A., Salvendy, G.: Hierarchical menu design: Breadth, depth, and task complexity. Perceptual and Motor Skills 82, 1187–1201 (1996) 13. Bernard, M., Hamblin, C.: Cascading versus Indexed Menu Design. Usability News 5 (2003) 14. Gajos, K.Z., Czerwinski, M., Tan, D.S., Weld, D.S.: Exploring the design space for adaptive graphical user interfaces. In: Proceedings of the working conference on Advanced Visual Interfaces, pp. 201–208 (2006) 15. Mitchell, J., Shneiderman, B.: Dynamic versus static menus: an exploratory comparison. ACM SIGCHI Bulletin 20, 33–37 (1989) 16. Huang, S.C.: Empirical Evaluation of a Popular Cellular Phone’s Menu System: Theory Meets Practice. Journal of Usability Studies, 136–150 (2006) 17. Hakala, T., Lehikoinen, J., Aaltonen, A.: Spatial interactive visualization on small screen. In: Proceedings of the 7th international conference on Human Computer Interaction with Mobile Devices & Services, pp. 137–144 (2005) 18. Combs, T.T.A., Bederson, B.B.: Does Zooming Improve Image Browsing? In: Proceedings of the fourth ACM Conference on Digital Libraries, pp. 130–137 (1999) 19. Ware, C.: Information Visualization: Perception for Design. Morgan Kaufmann, San Francisco (2004) 20. Ware, C., Franck, G.: Evaluating stereo and motion cues for visualizing information nets in three dimensions. ACM Transactions on Graphics 15, 121–140 (1996) 21. Robertson, C.M., Larson, K., Robbins, D.C., Thiel, D., van Dantzich, M.: Data mountain: using spatial memory for document management. In: Proceedings of the 11th annual ACM Symposium on User Interface Software and Technology, pp. 153–162 (1998) 22. Rekimoto, J.: Tilting operations for small screen interfaces. In: Proceedings of the 9th annual ACM Symposium on User Interface Software and Technology, pp. 167–168 (1996) 23. Robertson, C.S.K., Mackinlay, J.D.: Information visualization using 3D interactive animation. Communications of the ACM 36, 57–71 (1993)
24. Molina, J.P., Gonzalez, P., Lozano, M.D., Montero, F., Lopez-Jaquero, V.: Bridging the Gap: Developing 2D and 3D User Interfaces with the IDEAS Methodology. In: Jorge, J.A., Jardim Nunes, N., Falcão e Cunha, J. (eds.) DSV-IS 2003. LNCS, vol. 2844, pp. 303–315. Springer, Heidelberg (2003) 25. Suzuki, M.T., Yaginuma, Y., Sugimoto, Y.Y.: A 3D model retrieval system for cellular phones. In: IEEE International Conference on Systems, Man and Cybernetics, vol. 4, pp. 3846–3851 (2003) 26. Cockburn, A., McKenzie, B.: 3D or not 3D?: evaluating the effect of the third dimension in a document management system. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 434–441 (2001) 27. van Dam, A.: Post-WIMP user interfaces. Communications of the ACM 40, 63–67 (1997) 28. Jones, W.P., Dumais, S.T.: The spatial metaphor for user interfaces: experimental tests of reference by location versus name. ACM Transactions on Information Systems 4, 42–63 (1986) 29. Cockburn, A., McKenzie, B.: An Evaluation of Cone Trees. People and Computers XIVUsability Or Else! 425–434 (2000) 30. Salvendy, G., Knight, J.: Psychomotor work capabilities. In: Salvendy, G. (ed.) Handbook of Industrial Engineering, pp. 1–5 (1982) 31. Norman, D.A.: The Psychology of Everyday Things. Basic Books, New York (1988)
Mobile Interfaces in Tangible Mnemonics Interaction Thorsten Mahler, Marc Hermann, and Michael Weber Institute of Media Informatics University of Ulm, Ulm, Germany {thorsten.mahler,marc.hermann,michael.weber}@uni-ulm.de
Abstract. The Tangible Reminder Mobile brings together tangible mnemonics with ambient displays and mobile interaction. Based on the Tangible Reminder project, we present a new interface for mobile devices that is capable of viewing and editing data linked to real-world objects. An intelligent piece of furniture equipped with RFID sensors and digitally controlled lighting keeps track of appointments linked to real-world objects placed in its trays. The mobile interface makes classic computer interaction with this ambient shelf entirely unnecessary. Instead, by implementing the toolglass metaphor, the mobile interface can be used to view and edit the data linked to objects.
1 Introduction
In 1991, Mark Weiser [1] formulated his vision of ubiquitous computing, stating the goal of bringing together the virtual world and the real world in a constant intertwining. The direct result for human-computer interaction is that interaction with real-life artefacts can affect their virtual representations, thus making virtual objects easily graspable. According to Holmquist et al. [2], a lot of research is done in this domain, focussing on different aspects depending on the primary goal pursued: graspable interfaces, tangible interfaces, physical interfaces, embodied interfaces, to name just a few. Whatever the name, they are all unified by their common goal: to enrich the interaction with virtual objects through physicality. One of the first projects to describe this linkage between physical objects and virtual representations is Ishii's and Ullmer's paper "Tangible Bits" [3]. Based on their observations on coupling everyday objects and virtual information, "Bits and Atoms", they tackle an interesting question: how can physical artefacts become interfaces for virtual objects and, from our point of view even more important, how can these interfaces be recognized? Their solution to this problem is the introduction of virtual light or virtual shadow. Both concepts show the information in a glow or shadow around the physical object in question. For example, the software on the new Microsoft Surface [4] table makes use of this concept. However, the visualization of information in the immediate vicinity of real objects requires a technically very well equipped environment. The Project Urp
[5] uses a large display underneath the objects in order to show the virtual shadows. The SenseBoard uses a projector to display detailed information on a "View Details" command puck when it is placed over an item on the SenseBoard [6]. The AuraOrb provides news-ticker information when an eye-contact sensor detects the user's focus on it [7]. But if neither the environment nor the object itself is equipped with display technology, the augmentation of the objects remains invisible. The solution to this problem is the use of other metaphors, for example the see-through metaphor or the magic lens [8] [9]. The information linked to a real object is not projected into the real world; rather, a mobile device shows the linked information directly on its screen. A mobile device also compensates for the non-ubiquity of intelligent environments through mobility [10]. In the Tangible Reminder project we now make use of this technique. The next section gives an overview of the first prototype of the Tangible Reminder. After that we focus on the concept of the mobile interface and finally present the whole system with this new interface.
2 The Tangible Reminder
The Tangible Reminder is a device combining the benefits of ambient displays and tangible interaction with personal objects. It is designed as a tool to keep track of appointments and deadlines and, in particular, to remind the user of upcoming events. As shown in Fig. 1, the ambient display subsystem of our prototype consists of three trays in which freely chosen RFID-tagged objects can be placed. According to the urgency of the appointment linked to each object, the trays are colored green, yellow or red. Showing the appropriate color, the display remains calm, avoiding distraction from other tasks until the deadline is due. After that it begins flashing, grabbing the user's attention [11]. A main idea was to let the user choose personal objects with specific associations to the topic of the appointments. We use RFID technology to identify the objects in the trays. As RFID tags have become cheaper and smaller, a user can tag any object he wants to. Choosing a personal object makes it easier to mentally link it with appointments. A study by van den Hoven et al. showed that nearly all personal souvenirs bear a mass of information for the owner [12]. The memories associated with the object can give a clue when spotting it in the tray, a mnemonic reminding the user of the topic of the appointment. If more information is needed, the user has to interact with the system. In our first prototype we established a non-ubiquitous way of allowing this "easy transition to more in-depth information", one of the heuristics both for graphical user interfaces [13] and ambient displays [14]. If additional information was desired, the object had to be removed from the tray and put on an interaction platform connected to an ordinary computer or laptop. The information was then displayed on the monitor. Also, before use, an object had to be technically linked to an appointment and to additional information like addresses, images, etc. The inspection system and the input system were the same in the first prototype,
60
T. Mahler, M. Hermann, and M. Weber
Fig. 1. The ambient display subsystem of the Tangible Reminder, with an object in each tray, showing the different colors for different states
so the user was able to change linked information when an object was placed on the platform. This form of inspection and input on traditional computers is not very ubiquitous, so we will present an alternative to that approach here.
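The mapping from appointment urgency to tray state can be summarized in a few lines. The following is a minimal sketch, not the prototype's actual code; the thresholds separating green, yellow and red are assumptions, since the text only specifies calm colors before the deadline and flashing afterwards.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; the paper does not state the exact boundaries
# between the green, yellow and red states.
YELLOW_WINDOW = timedelta(days=1)
RED_WINDOW = timedelta(hours=1)

def tray_state(deadline: datetime, now: datetime):
    """Return (color, flashing) for the tray holding the object
    linked to an appointment with the given deadline."""
    remaining = deadline - now
    if remaining <= timedelta(0):
        return "red", True        # deadline passed: start flashing
    if remaining <= RED_WINDOW:
        return "red", False
    if remaining <= YELLOW_WINDOW:
        return "yellow", False
    return "green", False         # calm, peripheral display

# An appointment two hours away is shown as yellow and does not flash.
print(tray_state(datetime(2009, 7, 20, 14, 0), datetime(2009, 7, 20, 12, 0)))
```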
3 A Concept for Mobile Interfaces for Tangible Mnemonics
One goal of tangible interaction is the masking of computer interaction through implicit actions with real objects. But this masking is also a problem for the application of tangible interaction systems in real life. It hinders the use of ready-mades, i.e., objects present in everyday life, as it obscures the linkage between the real object and the virtual data. Neither the linkage itself, i.e., which data an object represents, can be seen, nor is it clear how this data can be edited. Both issues were solved in the original Tangible Reminder system by breaking with the paradigm of implicit interaction and simply using a laptop as an editing and viewing station. While this is still true for objects linked to absolute appointments, a first step towards pushing the computer further into the background has been taken: the introduction of relative appointments supersedes explicit computer interaction. Instead, the simple act of putting an object in the tray already triggers the
appointment. For instance, the simple act of putting a special tea cup into the Tangible Reminder shelf results in an alert three minutes in the future. Clearly, this way of editing linked data is an improvement. Nevertheless, the visualization of the linkage, and thus the display of the data, still involves the computer. To overcome this drawback, Ishii and Ullmer [3] have proposed the use of digital shadows or digital light. Whilst this metaphor is clearly interesting and integrates nicely into the real world, it is also very demanding, as it needs a lot of special hardware and sensor integration that is only present in specially equipped rooms today. Instead, we decided to tackle the challenges of displaying the linkage and editing the linked data through the use of mobile devices. Not only can they be easily carried to the real object in question, but they also perfectly support the see-through metaphor of Bier et al. [8]. A portable device in this case acts as a magic toolglass, showing the link to a real object when placed over it. On the display the user can see the linked data and manipulate it if desired. This metaphor is easy to understand and to use. Nevertheless, the decision in favor of small and portable devices does not completely hide the computer. Rather, it shifts the interaction to a small device. This device, though a complete computer as well, is perceived as being much simpler and easier to use. It does not hide the computer as such, but it hides its complexity [15]. With the widespread adoption of small computing devices, especially cell phones and increasingly smartphones, this approach becomes even more appealing.
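The relative appointments described above can be handled without any explicit editing step: placing a tagged object in a tray is itself the trigger. A minimal sketch of this idea follows; the tag identifiers and all offsets other than the three-minute tea-cup example are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical tag-to-offset table; only the tea cup's three-minute delay
# comes from the example in the text.
RELATIVE_OFFSETS = {
    "tag:teacup": timedelta(minutes=3),
    "tag:laundry-peg": timedelta(minutes=45),
}

def on_object_placed(tag_id, placed_at):
    """Placing an object in a tray schedules its relative reminder, if any."""
    offset = RELATIVE_OFFSETS.get(tag_id)
    if offset is None:
        return None               # object carries an absolute appointment instead
    return placed_at + offset     # time at which the tray should start flashing

print(on_object_placed("tag:teacup", datetime(2009, 7, 20, 15, 0)))
# -> 2009-07-20 15:03:00
```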
4 The Tangible Reminder Mobile
The ambient display subsystem (see Fig. 1) of the original Tangible Reminder follows Weiser's vision of implicit computer interaction and fits nicely into its surroundings. It brings together small personal objects acting as mnemonics with an interface that stays calm but demands attention when needed. Therefore, we decided to keep this part of the Tangible Reminder unchanged but to completely replace the input and inspection subsystem.
4.1 Interaction with the Tangible Reminder Mobile
To interact with an object in the Tangible Reminder Mobile system, it is sufficient to simply put the PDA, which contains the new mobile input and inspection subsystem, near an object equipped with an RFID tag. Via the integrated RFID reader, the Tangible Reminder Mobile recognizes that the nearby object can be associated with virtual data. It queries the database and retrieves the stored data, which is shown on the display pane. Fig. 2 depicts the scanning of a globe, which is in this case associated with a journey to San Diego. Besides the time, the reminder period is also shown. The simple act of moving the Tangible Reminder Mobile near an object reveals its linkage and capabilities. Holding the Tangible Reminder Mobile over an object shows the virtual content of the real object as if viewed through a magic lens for digital data.
Fig. 2. Scanning the globe for the associated appointment
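The interaction just described follows a simple loop: read a tag, look up the linked record, and either render it or offer the edit screen. The sketch below illustrates that loop under stated assumptions; the reader interface, the store and the field names are hypothetical stand-ins, not the actual implementation.

```python
# Hypothetical stand-ins for the SD-card RFID reader, the appointment store
# and the display pane; the record fields are assumptions.
def inspect_nearby_object(reader, store, screen):
    tag_id = reader.read_tag()          # None unless a tag is within the ~5 cm range
    if tag_id is None:
        return
    record = store.get(tag_id)          # query the appointment data for this object
    if record is None:
        screen.show_edit_form(tag_id)   # nothing linked yet: offer the edit screen
    else:
        screen.show_appointment(record["name"], record["due"],
                                record["remind_before"], record.get("photo"))
```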
4.2 The Mobile Input and Inspection Subsystem
The subsystem on the PDA serves two purposes: it shows the data linked to a certain object, and it allows editing this data or adding data if none is yet stored with an object. Suitable objects contain an RFID chip that simply sends an ID whenever a reader comes into close vicinity. With the readers we are using, this reading distance is limited to about 5 cm, which is very small. The reader therefore has to be held directly over the object and its RFID tag to automatically bring up the inspection screen. As we are dealing with appointments and reminders, the inspection screen shows the name of the appointment together with the exact date and the period after which a reminder should occur. Additionally, a photograph of the associated real object can be shown. Fig. 3 shows this dialog in detail.
Fig. 3. The mobile device running the Tangible Reminder Mobile program, displaying the appointment view
In order to add or edit data stored with an object, the edit screen has to be opened. There, all relevant data can be entered. To keep the interface usable, we opted for special input fields such as date pickers to keep the pen interaction simple. The only field accepting free text is the appointment name; all other data can only be modified via controls. This better suits the accuracy of pen interaction on mobile devices and also reduces input errors. Fig. 4 shows an example of the input screen. Here the appointment linked with the globe is changed from an appointment on the 24th of December to a journey in June.
Fig. 4. The appointment form allows changing the associated appointment and the way of reminding
4.3 System Design and Changes Compared to the Original Tangible Reminder
To allow for the simple magic-lens approach, we decided to port only the input and inspection subsystem to the PDA, while leaving the underlying concept unchanged so as to keep supporting the display subsystem. The original Tangible Reminder makes use of RFID chips to recognize objects and to link them to appointments. There are different ways of making a small device capable of reading RFID tags. Instead of using an extra device, we equipped our PDA with an SD-card RFID reader. This solution integrates the reader into the PDA and keeps the system small, instead of relying on separate devices. The laptop in the original Tangible Reminder not only worked as an input device, it also stored the data for the linked objects. This data was retrieved by the display system to control the reminder functions of the shelf. The new Tangible Reminder system separates this earlier intertwining and divides the system according to its functions. The virtual data is now stored on a server
that provides all domain-related functions via a web service. This service can be contacted and controlled by the display subsystem of the shelf as well as by the mobile magic-lens subsystem. Thus, each subsystem has to deal only with exactly the functions it has to fulfill. We thereby make the virtual space underlying the real world accessible independently of environmental hardware by lending real objects a mobile interface, in our case rendering the Tangible Reminder system completely independent of traditional computer interaction.
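Conceptually, the server exposes one shared service that both clients use: the shelf asks for reminder states, and the PDA reads and writes the linked data. The following sketch illustrates such an interface; the operation and field names are assumptions, not the actual web-service API.

```python
# Hypothetical in-memory stand-in for the appointment web service; the shelf
# (display subsystem) and the PDA (input/inspection subsystem) are its only clients.
class ReminderService:
    def __init__(self):
        self._appointments = {}   # RFID tag ID -> appointment record

    def get_appointment(self, tag_id):
        """Called by the PDA to populate the inspection screen."""
        return self._appointments.get(tag_id)

    def set_appointment(self, tag_id, name, due, remind_before, photo=None):
        """Called by the PDA's edit screen to link or change data."""
        self._appointments[tag_id] = {"name": name, "due": due,
                                      "remind_before": remind_before, "photo": photo}

    def is_due(self, tag_id, now):
        """Called by the shelf to decide whether a tray should start flashing."""
        record = self._appointments.get(tag_id)
        return record is not None and record["due"] <= now
```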
5 Conclusion
With the Tangible Reminder Mobile system we have presented a mobile interface for an ambient mnemonic system, the Tangible Reminder. This system brings together ambient displays and tangible interaction by reminding the user of appointments previously linked to everyday objects that function as mnemonics. In the original system, a classic graphical computer interface was needed to establish the linkage between real-life objects and virtual data. The Tangible Reminder Mobile system now provides an interface on a PDA, making the laptop as an input device unnecessary. It thereby turns the Tangible Reminder into an intelligent piece of furniture that is not perceived as a computer interface. The mobile interface itself makes the usage of the Tangible Reminder more natural by implementing the magic lens metaphor. The smaller device not only makes the classic interface disappear, it also keeps the interface seemingly simpler. It combines the tools for information display and data manipulation and makes the interface as mobile as the user-chosen real-life mnemonic objects. The interface therefore bridges the gap between the real and the virtual world and solves the problems of data manipulation and display in an elegant way. No classic computer interface is needed; only a PDA is used, a device that is becoming more and more common. The switch to this mobile interface is another step towards truly implicit interaction with everyday objects with no computers visible, towards the vision of natural interaction with virtual data and ubiquitous computing alike.
6 Future Work
The Tangible Reminder Mobile system is just one step on the way to implicit and seemingly computer-less machine interaction. Further development is needed and planned in mobile interface improvement and in the field of implicit interaction. The integration of camera images to literally implement a see-through tool could make the interface more intuitive. However, from our point of view the benefit is marginal unless the system recognizes and marks the detected object, which is a hard problem without the use of markers. The next step therefore would be to recognize objects visually and superimpose digital data directly on the video image, implementing the digital shadow metaphor of Ishii and Ullmer [3].
Another direction in which the Tangible Reminder can evolve is the use of real-life artefacts that allow programming by combining and handling tool and mnemonic. This way, the computer could be made completely invisible and virtually superfluous. Yet the problem of information display still has to be solved. This could be done by switching to another medium, for instance sound and voice feedback, or by attaching a display or projector to give visual feedback. Both extensions would make the Tangible Reminder an even more integrated intelligent piece of furniture that is not recognized as a common computer interface, lowering the inhibitions for computer usage.
Acknowledgments Many thanks to the programming team of the Tangible Reminder Mobile project, Sung-Eun Kang and Michele Pinto, who worked on different aspects of porting to the mobile device and hardware integration as part of a practical course during their studies at the Institute of Media Informatics, University of Ulm.
References 1. Weiser, M.: The computer for the twenty-first century. Scientific American 265, 94–104 (1991) 2. Holmquist, L.E., Schmidt, A., Ullmer, B.: Tangible interfaces in perspective: Guest editors' introduction. Personal Ubiquitous Comput. 8(5), 291–293 (2004) 3. Ishii, H., Ullmer, B.: Tangible bits: towards seamless interfaces between people, bits and atoms. In: CHI 1997: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 234–241. ACM Press, New York (1997) 4. Microsoft Corporation: Microsoft Surface (2008), http://www.microsoft.com/surface/ 5. Underkoffler, J., Ishii, H.: Urp: a luminous-tangible workbench for urban planning and design. In: CHI 1999: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 386–393. ACM Press, New York (1999) 6. Jacob, R.J.K., Ishii, H., Pangaro, G., Patten, J.: A tangible interface for organizing information using a grid. In: CHI 2002: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 339–346. ACM Press, New York (2002) 7. Altosaar, M., Vertegaal, R., Sohn, C., Cheng, D.: AuraOrb: using social awareness cues in the design of progressive notification appliances. In: OZCHI 2006: Proceedings of the 20th Conference of the Computer-Human Interaction Special Interest Group (CHISIG) of Australia on Computer-Human Interaction: Design: Activities, Artefacts and Environments, pp. 159–166. ACM Press, New York (2006) 8. Bier, E.A., Stone, M.C., Pier, K., Buxton, W., DeRose, T.D.: Toolglass and magic lenses: the see-through interface. In: SIGGRAPH 1993: Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, pp. 73–80. ACM, New York (1993) 9. Stone, M.C., Fishkin, K., Bier, E.A.: The movable filter as a user interface tool. In: CHI 1994: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 306–312. ACM Press, New York (1994)
10. Mahler, T., Weber, M.: Mobile Device Interaction in Ubiquitous Computing. In: Advances in Human-Computer Interaction, pp. 311–330. In-Tech Education and Publishing (October 2008) ISBN 978-953-7619-15-2 11. Hermann, M., Mahler, T., de Melo, G., Weber, M.: The tangible reminder. In: 3rd IET International Conference on Intelligent Environments, IE 2007, pp. 144–151 (September 2007) 12. van den Hoven, E., Eggen, B.: Personal souvenirs as ambient intelligent objects. In: sOc-EUSAI 2005: Proceedings of the 2005 Joint Conference on Smart Objects and Ambient Intelligence, pp. 123–128. ACM Press, New York (2005) 13. Nielsen, J., Molich, R.: Heuristic evaluation of user interfaces. In: CHI 1990: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 249–256. ACM Press, New York (1990) 14. Mankoff, J., Dey, A.K., Hsieh, G., Kientz, J., Lederer, S., Ames, M.: Heuristic evaluation of ambient displays. In: CHI 2003: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 169–176. ACM Press, New York (2003) 15. Maeda, J.: The Laws of Simplicity (Simplicity: Design, Technology, Business, Life). MIT Press, Cambridge (2006)
Understanding the Relationship between Requirements and Context Elements in Mobile Collaboration
Sergio Ochoa1, Rosa Alarcon2, and Luis Guerrero1
1 Universidad de Chile, Computer Science Department
{sochoa,luguerre}@dcc.uchile.cl
2 Pontificia Universidad Catolica de Chile, Computer Science Department
[email protected]
Abstract. The development of mobile collaborative applications involves several challenges, one of the most important being to deal with the ever-changing work context. This paper describes the relationship between these applications' requirements and the context elements that are typically present in mobile collaborative work. The article presents a house of quality which illustrates this relationship and shows the trade-offs involved in several design decisions. Keywords: Context Elements, Software Requirements, Mobile Collaboration, House of Quality.
1 Introduction
One of the most challenging activities when developing software applications is to understand how user needs are satisfied by the application's functionality. Groupware applications are not an exception, and although determining groupware users' needs in advance and creating the corresponding software may seem uncomplicated, experience proves that this is not the case and that groupware users will refuse to use collaborative systems that do not support their needs in a wider context [Gru94, Luf00, Orl92, Suc83]. In addition, typical groupware applications have been criticized for being decontextualized: they do not consider the complex conditions (e.g., social, motivational, political or economic factors) under which the software will be executed, causing users to reject the system [Gru94, Lju96, Bar97]. The problem is that most groupware developers usually focus almost exclusively on the analysis and specification of functional requirements, neglecting the non-functional ones. Furthermore, users may be unaware of their needs in a wider context (e.g., social, physical, etc.), so that, by simply asking them to identify their requirements, software developers may not obtain the appropriate information [Ack00]. Users' needs are translated into functional requirements and some functions are implemented to satisfy such requirements. However, the appropriateness of such functionality is mostly evaluated later, when the software has been built. Misconceptions at that
point in the software life cycle are costly, which adds to the costs of groupware testing. Hence, new techniques are required in order to understand in advance the impact of the intended functionality on users' needs in a wider context. One such technique is Software Quality Function Deployment (QFD). QFD can be considered as a set of methods that allows contrasting the customer requirements with the product functionality, and it can be applied at each product development stage [Cha03, Haa96]. QFD aims to understand how users' needs are intended to be satisfied; it is a consumer-driven process. Typically, QFD is applied in four stages: product planning, parts deployment, process and control planning, and production planning. Each step produces a matrix that serves as an input to the subsequent steps. Due to its shape, the matrix is called a House of Quality (HOQ) [Kus99]. A HOQ relates customer requirements with technical descriptions of the product, including design activities. That is to say, the HOQ captures the essentially non-linear relationships between the functions offered by the system and the customers' needs. Conflicting features become apparent and trade-offs are identified early in the design process. QFD has been used successfully for identifying customers and their needs [Mar98], considering user satisfaction as a measure of product quality. The use of formal tools accompanying software development has proven to be significant in various industries, and some research has reported on the usefulness of QFD for the design of groupware applications [Ant05, Gle04]. However, due to the complexity of groupware design, we believe that technical descriptions are not enough for analyzing whether users' requirements are met or not. As stated by Grudin [Gru94] and Ackerman [Ack00], among others, groupware users' needs go beyond technical functionality and involve various contexts of analysis, such as the social, technological, and physical context. Based on a framework for contextual design derived by the authors [Ala06], the QFD technique and the authors' experience, we present a correspondence matrix that shows the relationship between typical mobile groupware requirements and the inclusion of context elements. The analysis shows that trade-offs appear early during design, and that some context elements have up to nine relationships. Our aim is to provide groupware developers with formal software techniques that help them to reduce software costs while enriching product quality. We believe that such quality is strongly related to the contextualization degree of the application. Section 2 presents the context elements that are typically involved in mobile groupware applications. Section 3 describes the groupware requirements involved in the development of these tools. Section 4 presents the derived HOQ as well as an analysis of the relationships between users' requirements and context elements. Finally, Section 5 presents the conclusions and future work.
2 Context Elements
The authors defined a set of context elements that are briefly explained below. These context elements should be considered during the development of collaborative mobile applications and are represented in the columns of the HOQ shown in Fig. 1.
Readiness to use IT. This context element allows determining the group members' preparation for using Information Technology tools. Users' experience, readiness to use technology and learning will influence the kind of interaction dialogues, interfaces, protocol design options and even the project feasibility.
Previous formal context (e.g., rules and regulations). This context element assists in characterizing users' information needs, as well as the actions the group should perform in order to conform to such regulations.
Previous informal context (e.g., social conventions and protocols). Unlike formal contexts, social conventions naturally emerge during everyday users' interactions. They cannot be imposed, and they constitute a frame for understanding each other's behavior and purposes.
Work practice tools. Every work practice community usually develops its own tools. These tools are not necessarily supported by technology. Provided they mediate social interactions, they can assist the analyst in understanding the current underlying workflow.
Group members' interaction. This context element helps identify general interaction scenarios among group members in order to determine which of them require mobile support. Such interaction must consider users' communication needs for data and/or voice transfer.
Mobility Type: mobile coupled. This context element represents a type of mobility that can be present in a collaboration scenario. Group members performing mobile collaboration activities in a synchronous way are considered to be carrying out mobile coupled activities.
Mobility Type: mobile uncoupled. This context element represents the asynchronous work carried out by mobile workers during a collaboration process.
Communication requirements. Communication can be direct or mediated; public, private or a mixture; broadcast or multicast. This context element represents the communication requirements of a mobile collaboration activity. Communication strategies constrain the coordination strategies that can be applied.
Coordination requirements. Coordination elements and policies are contextual elements that need to be identified. Some of these elements are: support for session management, floor control administration, user roles support, and shared information handling.
Activity criticality. It is important to determine the urgency of achieving the activity goals and the activity's importance for the user. These criteria may influence the choice of communication and coordination strategies.
Activity duration. Except for the case of mobile phones, activity duration in mobile collaboration based on PDAs, notebooks or Tablet PCs can be critical as it could be restricted by battery life. This context element identifies the activity duration and the requirement of a power supply.
Organizational structure (rigid/flexible). The organizational structure will influence the group's needs for coordination and control policies. A rigid organization requires formal coordination with strict control, whereas flexible organizations must react quickly to environmental changes. This context element represents the type of structure of the organization that hosts the mobile workers.
Collaboration policies/rules/norms/conventions. Every organization develops a series of social protocols, policies, rules and norms that regulate its workflow. It is important to identify which social rules may be relevant for the intended collaborative application.
Group size. Group size matters. Research in groupware has pointed out the importance of group size for the success of the coordination, communication and collaboration strategies. Most groupware design elements will be affected by the group size.
Roles. An appropriate identification of roles will help developers to design useful applications. Otherwise, the collaborative mediation process may not be well supported. Clearly, it may have a meaningful impact on the group performance.
Group structure. The relationships among roles will define the group structure. An understanding of the group structure and the relationship between it and the organizational structure can be useful for designing the interaction policies to support collaboration.
Demographics. It is also important to take into account the users' characteristics; e.g., their age, gender, race, and language may influence the application design. The usability of the application will probably be improved when this context element is considered.
Physical space. This element represents the available space for deploying and operating the collaborative mobile application. The smaller, less comfortable or less stable the available physical space is, the less likely it is that large or heavy computing devices can be used.
Adverse environmental conditions. This context element represents physical conditions such as noise, light, the number of people around and distracting factors. These factors impose restrictions on the type of user interface to be used for interacting with the collaborative application.
Safety and privacy. These are two important context elements to consider during the application design when mobile applications are used in public spaces. Handheld devices are especially appropriate for use in public spaces.
User location (positioning). Traditionally in groupware, this refers to the users' location within the virtual environment and is known as location awareness. Current technology lets users locate their partners in the physical world.
Power supply. The activity duration is in direct relation with this context element. The analysis of this element helps developers to identify whether the power autonomy of the selected mobile device is enough to support each activity.
Communication capability. This context element represents the availability of networking infrastructure in the work scenario. It also includes the communication bandwidth that can be obtained in the physical scenario to support the mobile collaboration activity.
Uptime effort. A mobile device may need a short start-up time, e.g., when users have only short time periods to carry out work or when quick reactions are required. This element represents the effort required to make the mobile application available.
Transportability. It is important to identify those activities requiring mobility and to estimate the effort the users are willing to spend transporting the devices.
Computing power. This element represents the processing and storage capacities required of a mobile computing device. Based on that, more than one device type can be selected to support activities with different requirements.
3 Computer Supported Mobile Collaboration Requirements
Based on a literature review and the authors' experience, this section describes general requirements of collaborative mobile solutions. These issues help to understand the type of applications and capabilities required to work in a specific scenario. Next, a brief explanation of each requirement is presented.
Interoperability. The interoperability of a collaborative mobile application has two facets: the communication capability and the interaction services of the mobile units. Communication capability involves the threshold, protocols and infrastructure used to support the communication process among mobile units [Ald06, Kor01, Tsc03]. The structure and meaning of all information shared among the applications should be standardized in order to support any kind of interoperability.
Multimedia support. If the application requires capturing, managing and/or transmitting heavyweight data types, such as image, video or audio, the smaller the device, the more limited the solution will be. The features of each device limit the quality and quantity of data that it is able to capture, store and transmit.
All-road. Nomadic work typically makes the work context change periodically; therefore the groupware application has to be context-aware and has to consider as many work scenarios as possible.
Robustness. Nomadic work requires an important effort from the persons using the computer-based applications. Several distracting events and interruptions happen around them. Therefore, if the mobile groupware application is not robust and able to cope with these distracting factors, users will not be able to utilize the application to support their nomadic work.
Autonomy. Typically, nomadic workers carry out loosely-coupled work. This means they work autonomously and collaborate on demand. Such autonomy must be provided by the software tool; therefore it must avoid using centralized resources.
Usability or usefulness. The functionality provided by the tool, the design of the user interfaces, and the mobile computing device utilized to perform a mobile collaborative activity influence the usability of the solution in the field. These three elements must be considered during the application design in order to improve the impact of the solution.
Synchronous/asynchronous work. Mobile collaborative applications require synchronous/asynchronous communication capabilities depending on the type of activity to be supported (synchronous or asynchronous). If asynchronous communication is required, every mobile computing machine is able to provide such support based on minimal network availability. On the other hand, if synchronous communication is required, a permanent and stable communication service should be provided regardless of the environment in which the user is located [Sar03]. Mobile phones supported by cellular networks are typically the best option for synchronous communication, given their large coverage range and good signal stability [Mal02]. However, these networks have a limited bandwidth. Another option is to provide synchronous communication capabilities to mobile applications using a Wi-Fi communication infrastructure [Rot01, Kor01]. Although the bandwidth is better than in cellular networks, Wi-Fi signal stability depends on the physical environment where it is deployed [Ald06]. Furthermore, this type of network has a limited coverage range [Mal02].
Portability (transportability). If the application needs to be used on the move, transportability is a strong requirement. Typically, the way to address this issue is through the mobile computing device chosen to support the collaborative work. The smaller the device, the more transportable it is. However, reducing the device size implies restrictions at least on the screen size and input capability [Kor01].
Privacy. If privacy is an important requirement, handheld devices are at an advantage: they usually have small screens and thus provide better privacy protection than notebooks and tablet PCs when data displayed on the screen needs to be hidden from other people in public spaces. Furthermore, the physical distance between the user and the handheld device during the interaction is shorter than the distance between a user and his/her notebook or tablet PC. Another privacy consideration in mobile collaboration is the visibility of the users and the users' actions in MANETs or public networks [Kor01]. Ensuring the accuracy of location information and users' identities, and establishing private communication, could be a critical issue in some cases [Che00].
Long time support (battery life). Activity duration in mobile collaboration imposes a strong requirement on the type of device that can be used to support it. Many researchers have identified battery life as critical for supporting mobile collaboration [Kor01, Gue06]. However, the use of context information provides a way to optimize the use of the power supply, resulting in a longer lasting battery [Che00, Hak05]. On the other hand, it is always possible to carry extra batteries when PDAs, notebooks or Tablet PCs are used. Activity duration is not so critical in the case of mobile phones because these devices are able to work for many hours without being re-charged [Hak05].
Capability to be deployed. Handheld devices are easy to deploy and carry; they also require little of the user's attention and have a short start-up time. These features allow fast reactions from the users; such speed could be critically needed in some physical environments.
Mobility. Users' mobility in a physical environment depends on the features of the physical environment where the users are located and the current environmental conditions. A user equipped with a mobile computing device can be traveling, wandering or visiting [Kri00]. Traveling is defined as the process of going from one place to another in a vehicle. Wandering, in turn, refers to a form of extensive local mobility where an individual may spend considerable time walking around. Finally, visiting refers to stopping at some location and spending time there, before moving on to another location. Sarker and Wells report that "the optimal size of a device associated with wandering was necessarily lower than an acceptable device size when visiting or traveling" [Sar03].
Performance. The processing power needed for certain mobile applications can exceed what handheld devices can currently offer [Kor01, Gue06]. However, in the case of PDAs, it is possible to find commercial devices with CPU speeds higher than 500 MHz. The processing power limitation of these devices becomes visible, e.g., while processing multimedia information. Although every mobile computing device is able to address basic multimedia needs, only notebooks and tablet PCs are able to handle strong multimedia requirements, such as support for 3D games.
Storage. Storage restrictions have been reported in the literature, especially in relation to handheld devices [Kor01]. However, these devices keep improving their storage and memory capacities. The latest versions of these devices allow mobile applications to manage and store complex data types, even simple multimedia information.
Data input. A possible requirement for a mobile collaborative application is the need for massive data entry. Typically, the mobile computing device used to support the solution will play a key role. PDAs and mobile phones use pen-based data input, which is slow but also useful for short annotations [Buy00, Sar03]. On the other hand, notebooks and tablet PCs are the most appropriate devices to support data-intensive processes using the keyboard.
4 House of Quality
The correspondence matrix, also called the House of Quality (HOQ), typically has three parts (Fig. 1): customer requirements (leftmost rectangle), technical descriptions (upper rectangle), and the relationships between customer requirements and technical descriptions (center rectangle). In addition, the grey line shows the direction in which each relationship should be enhanced in order to improve the application's capability to support mobile work.
Fig. 1. House of Quality
Analyzing the matrix, it is possible to see that around 30% of the intersections between rows and columns have some kind of relationship. This means that each design decision should be made carefully. The positive relationships must be increased and enhanced, and the negative ones should be minimized and neutralized. In addition, applications with a high degree of interoperability among various software tools, as well as coupled interaction, pose the major challenges, as they consume several resources (storage, bandwidth, battery) and may compromise the application's robustness, mobility, and performance. The authors expect this matrix to help developers make fast and accurate decisions during the development process. When a design decision has to be made, the designer can evaluate the alternatives against the HOQ in order to determine which is the most appropriate. Therefore, the proposed tool not only systematizes and facilitates the decision-making process, but also makes it cheaper and more expedient.
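Read as data, the HOQ is a sparse matrix whose rows are the requirements of Section 3, whose columns are the context elements of Section 2, and whose cells mark positive or negative relationships. The sketch below shows how a designer could tally the relationships touched by a design decision; the few cell values listed are illustrative assumptions, not the actual entries of Fig. 1.

```python
# Illustrative fragment of the correspondence matrix: +1 marks a positive
# relationship, -1 a negative one. The real cell values appear in Fig. 1.
HOQ = {
    ("Interoperability", "Communication capability"): +1,
    ("Synchronous/asynchronous work", "Communication capability"): +1,
    ("Long time support", "Activity duration"): -1,
    ("Portability", "Computing power"): -1,
    ("Performance", "Computing power"): +1,
}

def impact_of(context_elements):
    """Count the positive and negative relationships a decision touches,
    so trade-offs surface before any code is written."""
    positives = negatives = 0
    for (_requirement, element), sign in HOQ.items():
        if element in context_elements:
            positives += sign > 0
            negatives += sign < 0
    return positives, negatives

# A decision affecting computing power touches one positive and one negative
# relationship, i.e., one trade-off to resolve.
print(impact_of({"Computing power"}))   # -> (1, 1)
```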
5 Conclusions and Future Work
The article presents the typical users' requirements and work contexts that are present in the development of mobile groupware applications. The paper also presents and analyzes the relationships between these two sets of components.
The analysis shows that trade-offs appear early during the application design. In addition, such an analysis allows designers to easily identify the context variables that should be monitored in order to detect a work context change, or to improve the users' interaction paradigm. Our aim was to provide a tool (the HOQ) that allows mobile groupware developers to improve software quality, in terms of usability and effectiveness, by improving the decision-making process at design time. We believe that product quality is strongly related to the contextualization degree of the mobile application. As a next step, we are analyzing three mobile groupware applications in detail, in order to show how the HOQ can be used to support particular design decisions, and also to show the impact these decisions have on the products' usefulness. If the authors' assumptions hold, this proposal could have an important impact on the development of mobile groupware applications.
Acknowledgements This work was partially supported by Fondecyt (Chile), grant Nº 11060467, and LACCIR grants No. R0308LAC001 and No. R0308LAC005.
References 1. Ackerman, M.S.: The Intellectual Challenge of CSCW: The Gap Between Social Requirements and Technical Feasibility. Human Computer Interaction 15(2/3), 179–204 (2000) 2. Alarcón, R., Guerrero, L., Ochoa, S., Pino, J.: Analysis And Design of Mobile Collaborative Applications Using Contextual Elements. Computers and Informatics 25(6), 469–496 (2006) 3. Aldunate, R., Ochoa, S., Pena-Mora, F., Nussbaum, M.: Robust Mobile Ad-hoc Space for Collaboration to Support Disaster Relief Efforts Involving Critical Physical Infrastructure. ASCE Journal of Computing in Civil Engineering. American Society of Civil Engineers (ASCE) 20(1), 13–27 (2006) 4. Ramirez, J., Antunes, P., Respício, A.: Software Requirements Negotiation Using the Software Quality Function Deployment. In: Fukś, H., Lukosch, S., Salgado, A.C. (eds.) CRIWG 2005. LNCS, vol. 3706, pp. 308–324. Springer, Heidelberg (2005) 5. Bardram, J.: I Love the System -I just don’t use it! In: Proc. of International ACM SIGGROUP Conf. on Supporting Group Work, Phoenix, US, pp. 251–260 (1997) 6. Buyukkokten, O., Garcia-Molina, H., Paepcke, A.: Focused Web Searching with PDAs. Computer Networks. International Journal of Computer and Telecommunications Networking 33(1-6), 213–230 (2000) 7. Chan, L.K.V., Wu, M.L.V.: Quality Function Deployment: A Comprehensive Review of Its Concepts and Methods. Quality Engineering 15(1), 23–36 (2003) 8. Chen, G., Kotz, D.: A Survey of Context-aware Mobile Computing Research. Dept. of Computer Science, Dartmouth College, Tech. Rep. TR2000-381 (2000), ftp://ftp.cs.dartmouth.edu/TR/TR2000-381.ps.Z
9. Glew, P., Vavoula, G.N., Baber, C., Sharples, M.: A ‘learning space’ Model to Examine the Suitability for Learning of Mobile Technologies. In: Attewell, J., Savill-Smith, C. (eds.) Learning with Mobile Devices: Research and Development, London, pp. 21–25. Learning and Skills Development Agency (2004) 10. Grudin, J.: Groupware and social dynamics: Eight challenges for developers. Communications of the ACM 37(1), 92–105 (1994) 11. Guerrero, L., Ochoa, S., Pino, J., Collazos, C.: Selecting Devices to Support Mobile Collaboration. Group Decision and Negotiation 15(3), 243–271 (2006) 12. Haag, S., Raja, M.K., Schkade, L.L.: Quality Function Deployment: Usage in Software Development. Communications of the ACM 39(1), 41–49 (1996) 13. Hakkila, J., Mantyjarvi, J.: Collaboration in Context-Aware Mobile Phone Applications. In: Proc. of HICSS 2005. IEEE Computer Society Press, Los Alamitos (2005) 14. Kortuem, G., Schneider, J., Preuitt, D., Thompson, T., Fickas, S., Segall, Z.: When peerto-peer comes face-to-face: collaborative peer-to-peer computing in mobile ad-hoc networks. In: Proc. of First Int. Conf. on Peer-to-Peer Computing, pp. 75–91 (2001) 15. Kristoffersen, S., Ljungberg, F.: Mobility: From stationary to mobile work. In: Braa, K., Sorensen, C., Dahlbom, B. (eds.), Planet Internet, Lund, Sweden, pp. 137–156 (2000) 16. Kusiak, A.: Engineering Design: Products, Processes, and Systems. Academic Press, San Diego (1999) 17. Ljungberg, J., Holm, P.: Speech acts on trial. Scandinavian Journal of Information Systems 8(1), 29–52 (1996) 18. Luff, P., Hindmarsh, J., Heath, C.: Workplace studies: Recovering work practice and informing system design. Cambridge University Press, Cambridge (2000) 19. Malladi, R., Agrawal, D.: Current and future applications of mobile and wireless networks. Communications of the ACM 45(10), 144–146 (2002) 20. Martin, M.V., Kmenta, S., Ishii, K.: QFD and the Designer: Lessons from 200+ Houses of Quality. In: Proc. of World Innovation and Strategy Conference (WISC 1998), Sydney, Australia (1998) 21. Orlikowski, W.: Learning from notes: Organizational issues in groupware implementation. In: Proceedings of the ACM Conference on Computer-Supported Cooperative Work, CSCW 1992, pp. 362–369. ACM Press, New York (1992) 22. Roth, J., Claus Unger, C.: Using Handheld Devices in Synchronous Collaborative Scenarios. Personal and Ubiquitous Computing 5(4), 243–252 (2001) 23. Sarker, S., Wells, J.: Understanding Mobile Handheld Device Use and Adoption. Communications of the ACM 46(12), 35–40 (2003) 24. Suchman, L.A.: Office Procedures as Practical Action: Models of Work and System Design. ACM Transactions on Office Information Systems 1(4), 320–328 (1983) 25. Tschudin, C., Lundgren, H., Nordström, E.: Embedding MANETs in the Real World. In: Conti, M., Giordano, S., Gregori, E., Olariu, S. (eds.) PWC 2003. LNCS, vol. 2775, pp. 578–589. Springer, Heidelberg (2003)
Continuous User Interfaces for Seamless Task Migration
Pardha S. Pyla, Manas Tungare, Jerome Holman, and Manuel A. Pérez-Quiñones
Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, USA
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. In this paper, we propose the Task Migration framework, which provides a vocabulary and constructs to decompose a task into its components and to examine issues that arise when it is performed using multiple devices. In a world of mobile devices and multiple computing devices, users are often forced to interrupt their tasks, move their data and information back and forth among the various devices manually, recreate the interaction context, and then resume the task on another device. We refer to this break from the task at hand as a task disconnect. Our objective is to study how software can bridge this task disconnect, enabling users to seamlessly transition a task across devices using continuous user interfaces. The framework is intended to help designers of interactive systems understand where breaks in task continuity may occur, and to proactively incorporate features and capabilities to mitigate their impact or avoid such task disconnects altogether.
1 Introduction and Motivation
Today, with the advent of mobile devices and the deployment of computing in various form factors, the paradigm of a single user interacting with a single computer on a desk is losing its dominance. Even though the massive storage and computational power of the desktop computer has helped it remain a central part of our daily work, most people interact with more than one device for their everyday tasks. In a recent survey of knowledge workers at a large software development company and a major university [15], almost all participants reported that they use at least two computational devices for their day-to-day activities. The desktop computer and the notebook computer are still the two primary devices that people use to accomplish their daily work. This proliferation of multiple computing devices burdens the user with overheads for transferring information among different devices. Often, users are forced to copy files back and forth, open and close applications, and repeatedly recreate task context as they move from one device to another. In this paper we provide a theoretical foundation to describe the extraneous actions users need to perform as they switch from one device to another, and we propose the idea of Continuous User Interfaces (CUIs) to facilitate seamless task migration across multiple devices. We describe the development and evaluation of a proof-of-concept continuous user interface and observations from a preliminary usability study.
1.1 Tasks, Activities, Units and Cost
Before formally defining a task disconnect, we first describe the vocabulary and terminology for tasks and the various associated parameters.
Fig. 1. Tasks, User actions, Units and Instructions
A task can be defined as a goal to be attained in given conditions [11]. Leplat expresses these conditions using three points of view: the states to be covered, the permitted operations, and the procedure [12]. At a lower level, we decompose tasks into user actions. A user action is what the subject puts into operation (cognitive operations, behavior) in order to meet task demands. We also make use of Leplat's definition of elementary units as the elementary tasks and elementary states or operations. Leplat uses these definitions to describe task complexity; however, we use the term units to further subdivide user actions to their lowest level of granularity. For non-trivial tasks (i.e., tasks that involve multiple activities), we define a procedure to be an operation execution sequence of multiple units. We also associate with each unit a parameter required for its successful execution: an instruction. Instructions are the knowledge directions necessary to execute units; they can exist in the user's understanding of the world or in the aids and artifacts of the task environment. The cost of a unit is a multidimensional attribute set that is incurred during the execution of the unit [12]. These dimensions can be cognitive, physical, memory-intensive, resource-intensive, or a combination, depending on the nature of the unit and the expertise of the user. Another important parameter of a task is time. In the words of Leplat, every task takes place in time and may be described by the temporal dimensions of its organization [12]. Of the temporal dimensions that Leplat describes, temporal ruptures are of particular importance to our work. We adapt and modify Leplat's definition of temporal ruptures to mean interruptions by activities that do not directly contribute to the successful execution of the task at hand.
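This vocabulary maps directly onto simple data structures: units carry instructions and a multidimensional cost, and a procedure is an ordered sequence of units whose costs can be summed. The following is a minimal sketch under the assumption that each cost dimension is a single number; it is illustrative rather than a definitive formalization.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Unit:
    """Lowest-granularity step of a user action."""
    name: str
    instructions: str                                      # knowledge needed to execute it
    cost: Dict[str, float] = field(default_factory=dict)   # e.g. cognitive, physical, memory

@dataclass
class Procedure:
    """Operation execution sequence of units for a non-trivial task."""
    units: List[Unit] = field(default_factory=list)

    def total_cost(self) -> Dict[str, float]:
        total: Dict[str, float] = {}
        for unit in self.units:
            for dimension, value in unit.cost.items():
                total[dimension] = total.get(dimension, 0.0) + value
        return total
```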
1.2 Task Disconnects
With the increasing proliferation of mobile and ubiquitous computing devices, people are forced to adapt their daily workflows to try to keep their work products and other data available to themselves at all times. For example, Jones et al. [10] found that people often emailed themselves the URLs of the websites they visited, or their favorites list, when they needed to access them from a different location. In scenarios such as this, where a user attempts to work with two devices, the need to transfer tasks and information back and forth between them is a burden. Users are forced to use workarounds like USB key drives, remote desktop software, e-mail, network file storage, and other means. These attempts to orchestrate a migration of data back and forth between the two devices create a gap in task execution. When a task is to be migrated from one device to another, the process involves more than just breaking it into two chunks along a selected boundary. It also involves the insertion of extra units into the task procedure that are entirely absent when the same task is executed on a single device. It is the inclusion of these extra units that hinders the seamlessness of task migration. Depending upon the exact nature of the task and the two devices in question, this process may involve simply adding more instructions to the procedure (low additional cost), or it may involve an entirely new set of user actions (high additional cost). A task disconnect is a temporal task rupture arising due to extraneous user actions required when performing a task using multiple devices. Such extraneous user actions are required to accomplish the task across multiple devices, but do not directly aid in the completion of the task at hand. This raises issues such as how to help the user switch from one task condition to another in such a way that the demands on the user's attentional resources, cognitive workload, reaction time, and memory are minimized.
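In these terms, migrating a task to another device splices extra units (copy files, re-open applications, recreate context) into the procedure, and the disconnect's cost is exactly the cost of those units. A small self-contained illustration follows, with made-up cost values chosen only to show the comparison.

```python
# Made-up unit costs, in arbitrary units per cost dimension.
single_device_units = {"edit requirements": {"cognitive": 3.0}}
extra_migration_units = {
    "copy file to USB drive": {"physical": 1.0, "memory": 2.0},
    "re-open file and restore context on the tablet": {"cognitive": 2.0},
}

def total_cost(units):
    total = {}
    for cost in units.values():
        for dimension, value in cost.items():
            total[dimension] = total.get(dimension, 0.0) + value
    return total

print(total_cost(single_device_units))   # cost without a device switch
print(total_cost({**single_device_units, **extra_migration_units}))
# The difference between the two totals is the cost of the task disconnect.
```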
2 Related Work
Bellotti and Bly [2] observed information workers to be mobile within the confines of their office; this local mobility existed mainly to enable the use of shared resources and for communication with other staff members. While mobile, users were seen to use secondary computational devices in addition to traditional devices such as desktops. A few strands of research have tried to address the problem of migrating tasks or applications over multiple devices. However, most of these studies have focused primarily on the technological aspects of the problem. Chu et al. [6] take the approach of migrating an entire application to support seamless task roaming, but with considerable latency (thus interrupting the user's task sequence); they do not discuss the implications for the user's tasks and goals. Bandelloni et al. [1] describe user interaction with an application while moving from one device to another, in three levels of migration: total, partial and mixed. Chhatpar and Pérez-Quiñones [5] call this migration dialogue mobility and propose a requirement for the application data and logic to be separate from the user interface. Neither of these projects takes the task perspective we propose in this paper. Florins et al. [9] describe rules and transformations that attempt to provide graceful degradation of user interfaces while an application is migrated from one device to
another; even though their work is based on the same principle of continuity, their focus is on user interface generation and not on task migration. ARIS [4] is a window management framework that relocates running applications from one display to another; TERESA [14] helps design and develop model-based nomadic applications. Toolkits and tools such as TERESA have utility in rapidly deploying applications that can be migrated over multiple devices, but they do not address the task semantics that users wrestle with while trying to interact with a multi-device interface. Denis and Karsenty [7] discuss a conceptual framework for the inter-usability of multiple devices. They provide an analysis of the different cognitive processes in inter-device transitions and postulate two dimensions required for seamless interaction: knowledge continuity and task continuity. We base our work and the definition of Continuous User Interfaces on this requirement of seamlessness. We take this task-centered approach to solving the problem, and we provide a definition, description, parameters, requirements, and a prototype to demonstrate seamless interaction over multiple devices without task disconnects. Interruptions are events that break the user's attention on a particular task to cater to another task that is in need of attention, and are the focus of a whole area of study by themselves [13]. Task disconnects can be envisioned as analogous to interruptions, but occurring over multiple devices, locations, contexts, and, most importantly, over a much longer time interval. Interruptions happen often in normal conversations [3,8]. However, in the field of linguistics [3,8], not all interruptions are disruptive or need repair. Even in cases where the interruptions are disruptive, the costs associated with repair are low, because humans have an inherent ability to repair, recover and proceed with most conversations using their ingrained social and cultural aids.
3 Study Design
We targeted a specific application domain with sufficient complexity to allow us to observe clearly the different parameters responsible for task disconnects. Software development is a domain that involves work on several tasks with different affinities for devices. Software engineers perform requirements gathering and early design at client locations, away from their own offices, with laptop computers. They bring the artifacts of this stage to their office and continue to create a complete design using their desktop computers. The software that is designed must finally run on clients' machines. We chose this application domain because of the need to use several tools, such as text editors, drawing packages, scheduling programs, etc., when accomplishing a task, and because the nature of the task requires the use of multiple devices. We built a prototype to support the preliminary design phase of software engineering, where developers collect client requirements and generate initial design prototypes, diagrams, and models.
3.1 Prototype
We built a prototype that incorporated knowledge continuity and task continuity to provide a seamless multi-device user experience. The Task Explorer (Figure 2(a)) allowed users to create and demarcate a task, track the activities they performed in an included to-do list tool, and provided constant visual feedback on the status of the connected devices in range. Opening a task in the Task Explorer launched the Task Viewer (Figure 2(b)), a place to view documents and files related to a task.
Fig. 2. Task Explorer and Viewer
In our prototype application domain, these are requirements documents, diagrams and prototypes, e-mail addresses, and people related to the project, shown in a unified view. Each task is uniquely color-coded to establish a visual identity with the task. Opening a document such as a requirements specification launched that file in an external editor. This was implemented as an application on a tablet interface. The interface leveraged spatial organization, shape, color, partitioning of data and function, recovery of data state, and recovery of activity context in its user interface. Tasks were migrated from the desktop computer to the tablet interface either automatically (because the task was open) or manually (by dragging and dropping). For each task, we displayed the task parameters in the same color gradient on the tablet and the desktop. The last drawing that was being accessed on the desktop computer was automatically loaded to maintain task continuity and activity context. If the drawing was cleared and replaced by another, the new diagram was synchronized automatically with the desktop. This obviated the need to open and save documents, making the interface more like paper. As artifacts were generated, they were populated into the task tree on the right side of the screen. The task tree on the tablet brought together artifacts related to the task on the desktop computer, e.g., requirements documents, people, the to-do list, and email messages.
3.2 Evaluation
Interviews and user surveys were conducted to gather insights into the example task of prototyping and the existence of disconnects when using multiple devices to prototype. Six professional software developers were asked open-ended questions targeting the technologies and devices they used to prototype and any insights into disconnects arising due to the mediation by these technologies. In addition, we received N=32 responses to a survey that targeted software developers, graduate students with software development experience, and researchers in HCI who were familiar with computing and performed prototyping tasks.
The prototype was evaluated with a group of graduate students with a software engineering background. A total of six participants took part in the evaluation. Three of the six participants constituted a control group and were given tasks that required switching between a tablet and a desktop computer. The other three participants comprised our test group and were asked to perform the same tasks using our prototype. Each participant was assigned a total of seven tasks. Each task required drawing simple low-fidelity user interface prototypes using our custom drawing tool, updating requirements specifications using a text editor, or a combination of these two activities. Participants were provided with a background scenario explaining the context of a software development project for a fictitious client and the need to transfer documents between the tablet and the desktop. The participants were asked to use a tablet when meeting the client. Their interaction with the client was scripted in the scenario provided. The participants were asked to think aloud while they worked, and the evaluator prompted them when they stopped talking during a task.
3.3 Tasks
The first task required the participant to make changes to an existing requirements document based on a fictitious client's new insights into the project, at the client's location (i.e., using a tablet). The second task required the participant to prepare a low-fidelity prototype for the new requirements specification on the desktop. The third task asked the participant to visit the client to demo the prototype that had been created on the desktop in the participant's office. The fourth task required the participant to work on the desktop and to add more description to some requirements based on the client's feedback. The participants were asked to assume they were at home for the fifth task (i.e., they were to use a tablet); when they thought of a design feature, they were to create a new prototype with that insight to demo to the client the next day. The sixth task asked the participant to visit the client, demo the new prototype and get feedback. Based on the feedback, they were required to change the prototype and the requirements specification. The last task was set in the participant's office, where they were asked to update their desktop files with the latest prototype and requirements specifications. These tasks were designed with the goal of making the participants transfer information between the two devices as they progressed through the tasks. In the test group, this transfer was automatic because the participants used our prototype. In the control group, the participants had to move the files themselves using their choice of a USB pen drive, email, or other server-based technologies. The control group participants were provided with the tablet and the desktop, both connected to the Internet. They were given one task at a time with the associated scenario to provide the context of the interaction. At the end of the session, all participants were asked to fill out a questionnaire.
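The transfer was automatic for the test group because task state lived in one place that both devices read and wrote, rather than in files the participant had to copy. The sketch below illustrates that pattern with hypothetical names; it is not the prototype's actual code.

```python
# Hypothetical shared task store; the desktop and the tablet are both clients,
# so no manual copying, versioning or re-opening of files is needed.
class TaskStore:
    def __init__(self):
        self._tasks = {}   # task id -> {"artifacts": {...}, "focus": ...}

    def save(self, task_id, artifact_name, content, focus=None):
        task = self._tasks.setdefault(task_id, {"artifacts": {}, "focus": None})
        task["artifacts"][artifact_name] = content
        if focus is not None:
            task["focus"] = focus          # e.g. the drawing last worked on

    def open_on(self, task_id, device):
        """Opening a task on any device restores the same artifacts and focus."""
        task = self._tasks.get(task_id, {"artifacts": {}, "focus": None})
        return {"device": device, **task}

store = TaskStore()
store.save("client-project", "requirements.txt", "R1: ...", focus="lofi-sketch-3")
print(store.open_on("client-project", "tablet"))   # same task context, no manual transfer
```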
4 Results and Discussion In this study, we found several aspects of multi-device interaction that current systems do not adequately support. Specifically, several users reported dropping the use of multiple devices in favor of a single computer, to avoid the costs of task migration. They
also reported that migrating files across devices taxes their short-term memory, is often frustrating, and likely to result in errors. We examine each of these in turn, based on our observations and responses from our study participants. 4.1 Use of a Single Computer Another interesting observation that one participant made was: “this [migrating data] almost makes me use the tablet alone for all the tasks and forget about my desktop if I had the choice”. When asked if she would do that even if the task at hand required more processing power (such as that available in a desktop), she responded affirmatively. Several survey respondents in another study [15] also confirmed that they chose to use only a single computer for convenience rather than have to copy/move files among several computers. This illustrates that the high costs of user actions associated with a task switch from one device to another prompt users to forgo computational power in favor of eliminating the switch entirely. 4.2 Consistency (or the Lack Thereof) of File Locations One common complaint from participants was that they needed to remember file locations and the state of a file on each device. As one participant put it, “this version control is getting irritating”. Remembering such extraneous information increases the short-term memory costs of this activity tremendously. Given that short-term memory is unreliable, it is difficult for the user to remember which device had the latest version of the data if temporal ruptures in a task take place over a longer period of time. This is another observation that directly supports our hypothesis that transferring activity context is important. The experimental group, who performed the same tasks using our prototype, were instantly able to locate documents and the information they needed. When switching from one device to the other, they reported being able to restart their task immediately and to be productive because the environment was already automatically loaded with the appropriate task context. Because information was re-displayed using less screen real estate, users were immediately able to focus on their work while keeping related information in their peripheral vision. The only limitation of the system was that users spent time moving and resizing the requirements window to enable them to easily see both and work between them. The act of copying files manually involves two steps: copying the data over, and placing it in the correct location on disk. Most current data copying tools and media (e.g. USB drives, email to self, etc.) assist in performing the first task, but not in the second. Thus, to perform the second step, users are forced to rely on their short-term memory, increasing cognitive workload and scope for error. Automatic system support for the second step therefore was viewed as a distinct advantage of our prototype. The related issues of version control and conflict management were also automatically handled.
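As a rough illustration of what automating that second step could look like (a sketch under assumed names and paths, not the prototype's actual implementation), the receiving device can derive the destination of an incoming artifact from its task instead of leaving that decision to the user's short-term memory:

from pathlib import Path

def place_artifact(task_id: str, artifact_name: str, payload: bytes,
                   device_root: Path) -> Path:
    """Write an incoming artifact into the task's folder on this device,
    so the user never has to remember where the copy belongs."""
    destination = device_root / task_id / artifact_name   # hypothetical layout
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(payload)
    return destination

if __name__ == "__main__":
    import tempfile
    root = Path(tempfile.mkdtemp())
    path = place_artifact("crm-redesign", "requirements.doc", b"...", root)
    print("stored at", path)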
4.3 Fear, Frustration
In the questionnaire, all three control group participants reported a fear of making errors due to the overheads associated with the migration of information across devices. They also reported difficulty in keeping track of document version information. One participant commented that if a scenario such as the one in the evaluation were to occur in real life, the costs would be higher because of longer temporal ruptures. One of the participants forgot to copy the files from the desktop to the tablet before visiting the client. When she realized this, she remarked, "Wow! In real life, this would mean I'd have to go back to my office to get my updated files or redo the prototype that I did in the last task." On the questionnaire, members of the experimental group reported that they were less likely to make errors accomplishing the tasks. Also, because file state and application state were transferred automatically, the experimental group only had to worry about finding the appropriate location in the UI to begin work again. There were comments by some users that it would be nice to have a better view of all the files related to a project, but creating a new file system view was not the purpose of our prototype. Overall, participants of the experimental group responded that the application was more satisfying and easier to use (per Likert scale ratings). This suggests that as the task procedure lengthens because of the extraneous actions required for task switching, and the associated costs increase, there is a corresponding rise in the likelihood of user error. In continuous user interfaces, such costs are reduced because of built-in system support for task migration.
4.4 Use of Mobile Computers as Secondary Displays
For the second task, where the participants were required to create prototypes based on the requirements specification document, all three participants in the control group preferred using the tablet as an information display. They opened the specification document on the tablet and referred to it as they sketched the prototype on the desktop. When asked about this, they said that having the information on a secondary display was good as it did not make them switch between different windows on one device. This might mean that CUIs should leverage the capabilities of the various devices even when they are co-located.
5 Discussion and Summary We explored the issues that arise when users use multiple devices to execute a single task. We found that current systems lend inadequate support for several user actions that must be performed during a task migration between/among devices. Among the problems reported were: dropping the use of multiple devices in favor of a single computer; increased short-term memory costs while migrating files across devices; higher frustration; and a higher likelihood of errors. We proposed and designed a prototype Continuous User Interface that ensured a seamless task migration for users attempting to perform a requirements specification
and gathering task, using a tablet computer and a desktop computer. This system provided support for automatic migration of task context (e.g. the applications that were in use; pages and objects such as diagrams that were selected and active; etc.) between the two devices. In an evaluation conducted, participants reported that it helped mitigate the disruptive effects of task disconnects to a high degree. An interesting observation was that users expected to be able to annex existing collocated devices when performing their tasks (i.e., using the tablet computer at their desk along with their primary desktop computer.) They also reported that the automatic availability of necessary data on mobile computers directly contributed to higher perceived reliability and lower likelihood of error.
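To make the notion of migrated task context concrete, the following minimal sketch shows the kind of per-task record such a system might keep synchronized between the desktop and the tablet; the field names and synchronization rule are assumptions for illustration, not the authors' implementation.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class TaskContext:
    """Hypothetical per-task record migrated between desktop and tablet."""
    task_id: str
    color: str                                            # each task is uniquely color-coded
    artifacts: List[str] = field(default_factory=list)    # requirements docs, diagrams, e-mails, people
    active_drawing: Optional[str] = None                  # last drawing opened on the desktop
    open_on_desktop: bool = False

def migrate_open_tasks(tasks: Dict[str, TaskContext]) -> List[TaskContext]:
    """Return the tasks that would be pushed to the tablet automatically,
    i.e. those currently open on the desktop (others are dragged manually)."""
    return [t for t in tasks.values() if t.open_on_desktop]

if __name__ == "__main__":
    tasks = {
        "crm-redesign": TaskContext("crm-redesign", "#7fb3d5",
                                    artifacts=["requirements.doc", "wireframe-3.png"],
                                    active_drawing="wireframe-3.png",
                                    open_on_desktop=True),
        "billing": TaskContext("billing", "#f5b041"),
    }
    for t in migrate_open_tasks(tasks):
        print(f"migrating task {t.task_id} with drawing {t.active_drawing}")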
References
1. Bandelloni, R., Paternò, F.: Flexible Interface Migration. In: IUI 2004: Proceedings of the 9th International Conference on Intelligent User Interfaces, pp. 148–155. ACM Press, New York (2004)
2. Bellotti, V., Bly, S.: Walking away from the desktop computer: distributed collaboration and mobility in a product design team. In: CSCW 1996: Proceedings of the 1996 ACM Conference on Computer Supported Cooperative Work, pp. 209–218. ACM Press, New York (1996)
3. Bennett, A.: Interruptions and the interpretation of conversation. Discourse Processes 4(2), 171–188 (1981)
4. Biehl, J., Bailey, B.: ARIS: An interface for application relocation in an interactive space. In: Proc. 2004 Conference on Graphics Interface, pp. 107–116 (2004)
5. Chhatpar, C., Pérez-Quiñones, M.: Dialogue mobility across devices. In: ACM Southeast Conference (ACMSE), Savannah, Georgia (2003)
6. Chu, H.-h., Song, H., Wong, C., Kurakake, S., Katagiri, M.: Roam, a seamless application framework. Journal of Systems and Software 69(3), 209–226 (2004)
7. Denis, C., Karsenty, L.: Inter-usability of multi-device systems - a conceptual framework. In: Seffah, A., Javahery, H. (eds.) Multiple User Interfaces: Cross-Platform Applications and Context-Aware Interfaces, pp. 373–384. John Wiley and Sons, Chichester (2004)
8. Drummond, K.: A backward glance at interruptions. Western Journal of Speech Communication 53(2), 150–166 (1989)
9. Florins, M., Vanderdonckt, J.: Graceful degradation of user interfaces as a design method for multiplatform systems. In: IUI 2004: Proceedings of the 9th International Conference on Intelligent User Interfaces, pp. 140–147. ACM Press, New York (2004)
10. Jones, W., Bruce, H., Dumais, S.: Keeping found things found on the web. In: CIKM 2001: Proceedings of the Tenth International Conference on Information and Knowledge Management, pp. 119–126. ACM Press, New York (2001)
11. Leontiev, A.: Le Développement du Psychisme. Editions Sociales, Paris, France (1972)
12. Leplat, J.: Task complexity in work situations. In: Tasks, Errors and Mental Models, pp. 105–115. Taylor & Francis, Inc., Philadelphia (1988)
13. McFarlane, D.C.: Interruption of people in human-computer interaction. Doctoral dissertation, The George Washington University (1998)
14. Mori, G., Paternò, F., Santoro, C.: Tool support for designing nomadic applications. In: IUI 2003: Proceedings of the 8th International Conference on Intelligent User Interfaces, pp. 141–148. ACM Press, New York (2003)
15. Tungare, M., Pérez-Quiñones, M.: It's not what you have, but how you use it: Compromises in mobile device use. Technical report, Computing Research Repository, CoRR (2008)
A Study of Information Retrieval of En Route Display of Fire Information on PDA Weina Qu1, Xianghong Sun1, Thomas Plocher2, and Li Wang1 1
State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China {quwn,sunxh,wangli}@psych.ac.cn 2 Honeywell ACS Labs, Minneapolis MN 55418, USA
[email protected]
Abstract. This study investigated which display mode is most convenient for firefighters to obtain information, comparing audio display, text display, and combined multi-modal displays. Can fire commanders effectively obtain key fire information while they are en route to the fire, especially when they are sitting in a moving and bumpy car? The tasks included free browsing, free recall, and information searching. The results showed that: (1) Audio-only display always took firefighters the longest time to browse and search, but adding audio to text made the two combined displays faster for accessing information and easier to remember. (2) Searching in a moving car took a little longer than searching in the lab. (3) Text display remained a necessary and indispensable way to present information. Keywords: Information retrieval, Display, PDA, Free-browse, Free-recall, Search.
1 Introduction
The rapid growth of the IT industry during the last few decades has increased demand for mobile devices such as PDAs, cellular phones, and GPS navigation systems. With emerging concepts of context-aware computing, mobile devices can provide mobile users with timely information by using not only common knowledge but also environmental context such as the current time and location [1]. PDAs have been applied in many systems, for example alerts in healthcare applications [2] and navigation systems [3]. The auditory system is another important sensory channel for obtaining information and is the major complement to the visual system; moreover, humans respond to auditory stimuli faster than to visual stimuli [4]. For firefighting, time means lives saved. The en route display system is a handheld device with mobile communication that aims to help fire commanders access the most current fire information as quickly as possible. The purpose of the experiment was to answer the following questions: can fire commanders effectively obtain key fire information while they are en route to the fire, especially when they are sitting in a moving and bumpy car? Comparing audio display, text display, and combined multi-modal displays, which way is the most convenient for firefighters to get information?
2 Method
2.1 Experimental Environment
Test equipment. An en route display prototype (installed on a handheld PDA device) was developed as the experimental platform. It shows current fire-related information to fire commanders, such as the address of the building that triggered the fire alarm, the location of the first alarm in the building, and so on. The text size is based on a previous experiment [5]. Based on the results of a card-sorting experiment, all the fire information was clustered into a three-level menu structure. The first and second levels present overview information (see Fig. 1); the third level presents specific information.
Fig. 1. Content of the first and the second level menu
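As a rough illustration of this three-level organization (not the prototype's code), the menu can be thought of as a nested mapping from first-level categories to second-level items to the specific third-level detail; the category and item names below are taken from Tables 6 and 7 later in the paper, while the detail strings are invented placeholders.

# First level: categories; second level: items; third level: the specific
# detail text displayed or read out for the selected item.
EN_ROUTE_MENU = {
    "Facility phone contact": {
        "Building owner": "Name and phone number of the building owner",      # placeholder detail
        "Facility manager": "Name and phone number of the facility manager",  # placeholder detail
    },
    "Firefighting equipment": {
        "Hose": "Location of hose connections",                               # placeholder detail
    },
    "Alarms": {
        "1st alarm": "Location and time of the first alarm",                  # placeholder detail
    },
}

def third_level(category: str, item: str) -> str:
    """Return the specific information shown at the third level."""
    return EN_ROUTE_MENU[category][item]

if __name__ == "__main__":
    print(third_level("Alarms", "1st alarm"))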
Fire scenarios. In total, 16 fire scenarios drawn from a previous 3D fire information display prototype [6] were pre-installed in the prototype as the scenario pool. They were set in three buildings: the Camden building, a 45-floor tower building with a simple structure, and an 8-floor hospital building with a complex, irregular structure. Two of the scenarios were Camden building fires (single fire). Seven were tower building fires, including 2 SS fires (single fire seed spreading on a single floor), 3 SM fires (single fire seed spreading on multiple floors), and 2 MM fires (multiple fire seeds spreading on multiple floors). The other seven were hospital fires, including 2 SS fires, 2 MM fires, and 3 SM fires.
Experimental places. The experimental tasks were completed both in a lab (indoor test part) and in a moving car (outdoor test part). A standard usability test room was used as the indoor test place. A Volkswagen Sagitar was used as the test car, driven at about 45 km/h. In the test room, firefighters were asked to complete two kinds of test tasks (free recall and information searching) using the en route display prototype. In the moving car, a firefighter sitting in the co-driver's seat performed a dual task: the main task was to count the street lamps passing by on the right side of the road and call out the number aloud; the secondary task was information searching with the en route display prototype.
2.2 Participants
Twelve firefighters participated in our experiment. Eleven of them were aged 21 to 29, and one was 35 years old. Six had a bachelor's degree, and nine had more than five years of firefighting experience. All participants completed all the experimental tasks.
3 Procedure
Each test involved one firefighter and two experimenters. During the test, one experimenter acted as moderator and the other was in charge of video recording and note taking. The whole test consisted of three parts: training, test, and interview.
Training. The experimenter took a couple of minutes to explain the functions of the five keys on the PDA device (up: previous item; down: next item; left: previous menu level; right: next menu level; middle key: updated information) so that subjects knew clearly how to use them to browse the system and get information.
Test. This part was divided into two stages: a test in the lab and a test in the moving car.
Test in lab
1. Task 1, free browse and free recall: the firefighter was asked to explore the en route system for a few minutes (3-5 minutes), stopping when he felt familiar with the prototype, and was then asked to recall whatever he could remember. Each firefighter tried only one of the four display modes to explore the en route system.
2. Task 2, information searching: the moderator showed four items to the firefighter one by one and asked him to find each item with the en route prototype as quickly as possible. To ensure that the firefighter had not only found the location of the item but also remembered its content, he was asked, after finding the item, to repeat its content without looking at the screen.
Test in moving car
3. Task 2, information searching: same as the test in the lab.
Interview. After the test part, firefighters were asked to evaluate the menu structure and to state their preference among the four ways of displaying fire information. The experimenter then asked the firefighter several questions.
3.1 Data Analysis of Free-Browse and Free-Recall Task
The free-recall task tested what firefighters actually obtained from the en route display system and what they memorized.
Experimental design. A one-way between-subjects design was used in this task. There were four ways of displaying information on the en route system: audio only, text only, combined text + audio, and text + third-level audio. The 12 subjects were randomly divided into four groups of 3 firefighters; each group tried only one of the four display modes. They were asked to browse and operate the PDA device freely until they felt familiar with the system and knew the fire-related information. They were then asked to recall the key information they had browsed. Browsing time, the number of items browsed, and the items recalled were recorded. All subjects used the same scenario (Camden building fire).
Experimental result. Table 1 shows the number of browsed items and the average free-browsing time. The first-level menu had 6 items, giving a maximum score of 6; browsing one item earned 1 score. The second-level and third-level menus each had 24 items. The "text only" and "combined text + audio" displays were better than the other two displays, and subjects spent the least time with the combined display. Table 2 shows the free-recall scores under the four display conditions; correctly recalling one item earned 1 score. Subjects scored higher with the "text only" and "combined text + audio" displays than with the other two display styles.

Table 1. Free-browse item number and average time
Display condition            1st level menu (6 scores,   2nd level menu   3rd level menu   Average time
                             incl. updated inf.)         (24 scores)      (24 scores)      for exploring
audio only                   5.3                         12.3             6                0:05:35
text only                    5.3                         17               13.3             0:04:40
combined text + audio        5.3                         16               13.3             0:03:17
text + the 3rd level audio   5.3                         12               10               0:03:41
mean                         5.3                         14.3             10.67            0:04:18
Table 2. Free-recall item numbers
Display condition            1st level menu (6 scores,   2nd level menu   3rd level menu   Total number
                             incl. updated inf.)         (24 scores)      (24 scores)
audio only                   0.7                         2.7              2                5.4
text only                    1                           2.3              5                8.3
combined text + audio        3                           4.7              1                8.7
text + the 3rd level audio   1                           2.3              3                6.3
From the data in Tables 1 and 2, we can say that firefighters browsed all the items at the first level, most items at the second level, and about half of the items at the third level. The more items they browsed, the more items they could correctly recall, but the amount they remembered was still around the limit of short-term memory, 7 +/- 2.
3.2 Data Analysis of Searching Task
Experimental design. The searching task tested the efficiency of the en route system for the firefighters' operations. A 2x4 between-subjects design was used. The two factors were the place where the en route system was used and the way the fire information was displayed. The lab and the moving car were the two experimental places, and the four display modes (the same as above) were tested to find which one is easier and more convenient for searching information. The 12 subjects were randomly divided into four groups of 3 firefighters; each group used the same display mode as in the browse-and-recall task. In the lab, each subject was asked to search for four items: facility manager, hose, security passage, and the name and address of the building that triggered the fire alarm. All subjects used the same scenario (hospital fire). In the moving car, each subject searched for four different items: power company, control room, road information, and building structure. All subjects used the same scenario (Camden fire). Subjects were asked to find each item as quickly as possible and to repeat its detailed content without looking back at the PDA screen. The searching time and the accuracy of the repeated content were recorded.
Experimental result. Tables 3 and 4 show the percentage of correct repetition and the searching time in the lab and in the moving car. Searching time does not include the subjects' repetition time. The percentage of correct answers refers to the accuracy with which subjects repeated the detailed content: 0 means the subject could not repeat the detailed content that was found; 1 means the subject could repeat part of it; 2 means the subject could repeat all of it. The audio-only display took the longest time to search, and the "text + 3rd level audio" display took the shortest, both in the lab and in the moving car.
Table 3. Percentage of correct repetition and searching time in the lab

Percentage of correct answers   audio only   text only   combined text + audio   text + 3rd level audio
0                               16.70%       0           0                       0
1                               33.30%       25%         16.70%                  0
2                               50%          75%         83.30%                  100%
Searching time                  0:00:55      0:00:17     0:00:25                 0:00:13
Table 4. Percentage of correct repetition and searching time in moving car

Percentage of correct answers   audio only   text only   combined text + audio   text + 3rd level audio
0                               33.30%       8.30%       8.30%                   8.30%
1                               33.30%       8.30%       0.00%                   16.70%
2                               33.30%       83.30%      91.70%                  75%
Searching time                  0:01:03      0:00:31     0:00:31                 0:00:23
Fig. 2. Comparison of searching time between lab situation and car situation
The results showed that the combined text + audio display was repeated better by firefighters than the other two ways, although no statistically significant difference was found among the four displays. Fig. 2 compares the searching times in the lab situation and the moving car situation. The same trend appeared across the four ways of displaying information, and searching in the moving car took about 8 seconds longer than searching in the lab.
4 Interview
The short interview tried to answer the following questions:
1. What do you think of the menu structure? At each level of the menu, which item do you think is the most important? Could you sort the items at the same level by their significance? Do you have any suggestions for improving the current menu structure?
2. Which way of showing fire information would you prefer when using the en route prototype on the way to the fire scene? Why?
3. Do you think the information shown in the system is easy to remember? How much information can you remember each time?
4. Do you think the fire information is easy to find?
5. What do you think of the button for updated information: is it useful or not?
6. What do you think of the en route information display: is it useful or not? Why?
7. Do you have any other suggestions for the prototype?
4.1 Subjective Ratings of the Four Display Styles
From Table 5, we can say that most users prefer the combined visual and audio display (answer to question 2).

Table 5. Preference to the four display styles
                 audio only   text only   combined text + audio   text + the 3rd level auditory display
in the lab       0            16.70%      66.70%                  16.70%
in moving car    8.33%        33.30%      41.70%                  16.70%
4.2 Subjective Ratings of the Menu Sequence
In the ratings, 1 means the most important and 5 the least important. Some firefighters said the most important item should be put on the first line, yet in the first-level menu the alarms were placed on the bottom line although they were ranked the most important by everyone (see Table 6).

Table 6. Preference to the first level menu sequence
        1 Facility       2 Firefighting   3 Site         4 General building   5 Alarms
        phone contact    equipment        information    information
mean    3.5              3.2              3.8            2.6                  1.9
For the sorting of the 2nd- and 3rd-level menu items, the firefighters' ratings in each category were consistent with the current ordering shown in Table 7. Combining the data in Tables 6 and 7 to answer the first question about the menu structure, we think the current structure is good enough except for two things: 1) alarms should be put on the first line at the first level; 2) security passage should be moved from "4 General building information" to "3 Site information".

Table 7. Preference to the second level menu sequence
(mean ratings in parentheses)
1 Facility phone contact: 1.1 building owner (1.8), 1.2 facility manager (2.6), 1.3 hazard coordinator (3.2), 1.4 power company (3.9), 1.5 gas company, 1.6 water department (4.3)
2 Firefighting equipment: 2.1 fire equipment (1.9), 2.2 equipment shutoff (3.3), 2.3 hose (3.4), 2.4 outdoor standpipe (3.4), 2.5 control room (3.9), 2.6 power room (5)
3 Site information: 3.1 security passage (1.4), 3.2 road limit (2.3), 3.3 road information (2.3)
4 General building information: 4.1 occupants (2.8), 4.2 general inf. of building (2.4), 4.3 inf. of surrounding buildings (4.6), 4.4 building structures (3.5), 4.5 keybox location (4.5), 4.6 name and address (2.4)
5 Alarms: 5.1 1st alarm (1.4), 5.2 alarm list (2.2), 5.3 hazard (2.4)
5 Conclusion
To determine how useful the en route information display system is for firefighters' information access, situation understanding, and decision making, we carried out a series of tests to investigate the efficiency of the system and to compare different display modes, including audio, text, and their combinations, in order to find the most appropriate one. Based on the data and the subjective ratings, our findings can be summarized as follows:
1. The en route information display system was useful in helping firefighters obtain critical fire information and make decisions more quickly and accurately.
2. Comparing the four display modes (audio only, text only, audio + text, and text + 3rd-level audio), audio only always took firefighters the longest time to browse and search, but adding audio made the two combined displays (text + audio, and text + 3rd-level audio) faster for accessing information and easier to remember.
The reasons the audio-only display performed worst could be: a. the voice message was not clear enough for the firefighter to hear, especially in the moving car; b. a three-level information structure is difficult to understand just by listening. The data also showed, however, that after training, once the firefighter has the information structure in mind, operating the system by audio becomes as convenient as operating it in the other ways.
3. Comparing the two situations of using the en route system, in the lab and in the moving car, searching in a moving environment took a little longer than searching in the lab.
4. Text display proved to be a necessary and indispensable way to show information. Even while a voice message was playing, people still needed to look at the text to make sure that what they heard and understood was correct, especially for the building address, the alarm location, the contacts' names, and so on, because this information is critical for firefighting and life saving.
References
1. Kim, N., Lee, H.S., Oh, K.J., Choi, J.Y.: Context-aware mobile service for routing the fastest subway path. Expert Systems with Applications 36, 3319–3326 (2009)
2. Chiu Dickson, K.W., Kwok Benny, W.-C., Kafeza, M., Cheung, S.C., Eleanna, K., Hung Patrick, C.K.: Alerts in healthcare applications: process and data integration. International Journal of Healthcare Information Systems and Informatics 2, 36–56 (2009)
3. Lee, W.C., Cheng, B.W.: Effects of using a portable navigation system and paper map in real driving. Accident Analysis and Prevention 40, 303–308 (2008)
4. Quan, P.: Design of application interface based on human cognition. Computer Engineering and Applications 19, 148–150 (2001)
5. Sun, X.H., Plocher, T., Qu, W.N.: An empirical study on the smallest comfortable button/icon size on touch screen. In: Aykin, N. (ed.) HCII 2007. LNCS, vol. 4559, pp. 446–454. Springer, Heidelberg (2007)
6. Qu, W., Sun, X.H.: Interactive Style of 3D Display of Buildings on Touch Screen. In: Harris, D. (ed.) HCII 2007 and EPCE 2007. LNCS (LNAI), vol. 4562, pp. 157–163. Springer, Heidelberg (2007)
A Mobile and Desktop Application for Enhancing Group Awareness in Knowledge Work Teams Timo Saari1, Kari Kallinen2, Mikko Salminen2, Niklas Ravaja2, and Marco Rapino2 1
Temple University, 1801 N. Broad Street, Philadelphia, PA, USA, and Center for Knowledge and Innovation Research (CKIR), Helsinki School of Economics, Finland, and Helsinki Institute for Information Technology (HIIT), Finland
[email protected] 2 Center for Knowledge and Innovation Research (CKIR), Helsinki School of Economics, Finland
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. In this paper we present a first prototype of a mobile and desktop system and application for enhancing group awareness in knowledge work teams. The prototype gathers information from the interactions of the group within the application and analyses it. The results are displayed to members of the group as key indexes describing the activity of the group as a whole and of its individual members. The expected advantage of using the prototype is increased awareness within the group, possibly leading to positive effects on group performance. Keywords: Group awareness, emotional awareness, knowledge work, mobile application, desktop application.
1 Introduction
We see knowledge work as consisting of the capacity to act in intelligent ways in one's context and environment. Senge [1] suggests that while information implies knowing "about" things, and is received and passed on, knowledge implies knowing "how", thereby giving people the capacity for effective action. Davenport et al. [2] define knowledge work as "the acquisition, creation, packaging, or application of knowledge. Characterized by variety and exception rather than routine, it is performed by professional or technical workers with a high level of skill and expertise." Consequently, knowledge work includes the creation of knowledge, the application of knowledge, the transmission of knowledge, and the acquisition of knowledge. McGrath and Hollingshead [3] have proposed that technologies, as they have been applied to groups, can be placed along a dimension of increasing and decreasing richness of social cues. Face-to-face groups have access to a rich variety of social cues that they can then use to determine the preferences and positions of other group members. On the other hand, computer-mediated groups do not have access to nonverbal cues, and must rely simply on the written word. That is, computer-mediated groups, and hence computer-mediated group work, are low in the richness of social cues.
There are different types of group awareness, some of which are relevant to work-like tasks. According to Greenberg [4], there are several types of group awareness needed for effective collaboration:
• Workspace awareness is "the up-to-the-minute knowledge a person requires about another group member's interaction with a shared workspace if they are to collaborate effectively".
• "Group-structural awareness involves knowledge about such things as people's roles and responsibilities, their positions on an issue, their status, and group processes."
• "Informal awareness of a work community is basic knowledge about who is around in general (but perhaps out of sight), who is physically in a room with you, and where people are located relative to you."
• "Social awareness is the information that a person maintains about others in a social or conversational context: things like whether another person is paying attention, their emotional state, or their level of interest." Other information can be the special skills a co-worker has.
Also, emotional awareness within a group has been discussed in relation to knowledge work groups [see 5]. Within this article we regard emotional awareness, i.e. awareness of the emotion and mood states of the members of the group, as part of social awareness. Often in knowledge work situations, awareness of others provides information that is essential for frictionless and effective collaboration. Even though group awareness is taken for granted in face-to-face work, it is rather difficult to maintain in distributed settings. Hence, there is a considerable challenge in designing technology to support the types of group awareness that may actually lead to increased performance or other beneficial effects at work. We propose an application to increase awareness in knowledge work groups. Our application enhances group awareness by making explicit the implicit interaction patterns in a group. We expect our application to have beneficial effects on performance in knowledge processes and tasks.
2 Use Scenario and System Design
Supporting general knowledge work processes. Knowledge work tasks can roughly be classified into job-specific tasks and general processes [see 6]. Job-specific tasks differ greatly as a function of the type of work; examples are preparing a budget, analyzing results in terms of estimated and actual costs, planning and scheduling a project, eliciting and documenting system requirements, and writing application software. There are also many general processes at work, such as goal setting, communication, updates, maintaining group cohesion and synchrony, and informal group communication and coordination. "Mobile" knowledge work differs from "normal" knowledge work mostly in that it takes place in distributed settings with the use of mobile technologies. Mobile knowledge work can be mostly mobile, such as when a person is communicating with others
and accessing files in the field while conducting work tasks. Mobile knowledge work is naturally intertwined with non-mobile knowledge work as people move in and out of their offices, and mobile technologies mix with desktop computing environments as workers carry their mobile phones to the office. Our use scenario is enhancing the general communication processes of knowledge work teams. We feel that the general communication and coordination processes of knowledge work, as opposed to task-focused processes, are not well supported by current technologies. In short, we propose a system in which various data are collected on the status of individual users and transmitted to the other users through an easy-to-use mobile and desktop application. The application supports better group awareness in terms of workspace awareness, group-structural awareness, informal awareness, and social awareness.
Group performance and effectiveness. We hypothesize that our system and applications will influence group performance. However, there is more to groups than task performance alone. For instance, Andriessen [7] has divided the key aspects of group interaction into i) performance, related to activities and specific tasks aimed at reaching a common goal for the group, and ii) group maintenance, i.e. activities aimed at enhancing and building cohesion and trust in the group or at gaining status and power in the group. Regarding the types of group awareness discussed above, we expect that "better" group awareness (workspace awareness, group-structural awareness, informal awareness, and social awareness) will lead to positive outcomes in group interactions, in both performance and group maintenance. In addition to group performance, another concept, group effectiveness, has been proposed to consist of three components: 1. the production function: effectiveness is here defined as "the degree to which the productive output meets or exceeds the performance standards set by the clients"; criteria used to define this dimension are product quality, product quantity, efficiency, and innovativeness; 2. the group well-being function: "the degree to which the attractiveness and vitality of the group is strengthened"; 3. the member support function: "the degree to which participating in the group results in rewards for the individual group members" [6, pp 100]. We propose that the different types of group awareness are linked to group performance and group maintenance and are prerequisites for group effectiveness. Previous studies have shown that several possibly central concepts are related to social interaction in our focus area of group performance; we use the examples of group cohesion, cooperativeness or reciprocity, and convergence below. Group cohesion has been referred to as "the extent to which group members perceive and feel attracted to the group" [6, pp 118]. Cohesion in a group has also been described as "a tendency to stick together or to be united either physically or logically" [8, pp 130]. Another important dimension in defining the quality of a social interaction is cooperativeness, a kind of reciprocity of communication. It is described as the behavior of people towards others in the group with whom they share common interests and tasks. The behavior is also characterized by the fact that each person strives towards their goals within the group and this progress is facilitated by the other persons' actions, leading each to expect reciprocation [9].
There is also another interesting approach to seeing how groups communicate: conveyance and convergence. Conveyance can be seen as the exchange of information among participants of the group in which the interpretation of the messages is done by the receiving individual [10]. This refers to the individual act of receiving and interpreting messages. Convergence, on the other hand, is about producing and facilitating shared social meanings among participants, rather than individual interpretations. In previous studies it has been suggested that these two different dimensions of meaning-making in a group may emerge differently based on the different mediating technologies used. People in face-to-face interactions tend to reach consensus (high level of convergence) faster than groups using online chatrooms [11]. It has been proposed that synchronous media are better in facilitating convergence whereas asynchronous media are better in supporting conveyance [12]. It then seems that it may be sensible to think of solutions to enhancing cohesion, cooperativeness, the efficient emergence of shared meanings and convergence as part of group awareness in the group in the context of technological support for group work. We have tried to address these issues in our prototypes with several ways of creating and sharing group awareness. Activity indexes for a working group For our prototypes we have preliminarily selected a number of key activity indexes that describe the state of the group or its activities. The indexes are formed by analyzing the user’s activities through our system. The indexes are displayed to users via our applications in visual representations. We have preliminarily chosen among many potential indexes which would be useful in group work. It should be noted that we have not yet tested our chosen indexes in field tests within the context of our applications and that the indexes after our first field tests could be altered, rejected or changed. Our chosen indexes for group activity are: group reciprocity, group centralization, my participation, my reciprocity, my popularity and my influence power. These indexes are hypothesized to be related to different types of group awareness. We expect that making such indexes available in a group work situation therefore enhances certain types of group awareness. In more detail the indexes and their meanings are as follows divided into group and individual levels:
A. Group level indexes:
1. Cooperation level
Description: Describes the collaboration level in the group, based on the tendency to contact others within the group in work tasks. A group could be highly collaborative at certain times and much less so at others. This can be used to give an understanding of the activity level and "tightness" of the group.
Types of group awareness supported: Workspace awareness: What is the activity level of the group, what is the context into which I am sending my messages or requests? Social awareness: What is the level of "attention" of the group, how intensively does the group work together?
2. Communication hierarchy
Description: Describes whether the group activity reflects a tendency to be hierarchical (top-down) around some individuals or whether it is democratic (many-to-many). In other words, it is like the "power structure" of the group.
Types of group awareness supported: Group-structural awareness: Who is the intellectual leader of the group? Who provides the best ideas? Is the group really collaborative and democratic, or top-down and led by a single individual, perhaps the corporate supervisor of the group? What are the different roles of people in the group?
B. Individual indexes:
1. My influence power
Description: Describes how much a person is involved in the actions of the group. It can be based on how many contacts or messages the user receives, and reflects the importance of the user in the group.
Types of group awareness supported: Group-structural awareness: How central am I as a person in the group? How central are others in the group? Do I have a lot of influence over the group? Who has the most and the least influence over the group?
2. My popularity
Description: Describes the user's popularity, based on the idea that a user is more popular the more messages he/she receives.
Types of group awareness supported: Group-structural awareness and social awareness: Am I popular? Who is the most popular person in the group? How popular is the person I am sending my message to? How should I craft my negative feedback message to this person, as he is so popular in the group?
3. My participation
Description: Describes how actively the user participates in the activity of the group. It is assessed from how much activity the user inputs into the system.
Types of group awareness supported: Group-structural awareness: How actively do I participate in the group's activities? How active is my boss or co-worker? Social awareness: What is the level of interest of a person in the task we are doing, based on his participation index?
4. My reciprocity
Description: Describes how much the user is contacted in return when the user contacts others. It reflects a kind of symmetry or mutuality of communication as seen from the point of view of a single user. For instance, a user could be communicating a lot but not receiving much feedback, or the user's sent and received messages could reflect a balance of communication activities.
Types of group awareness supported: Social awareness: What is the attention level of the other users I am contacting regarding my messages? Do they reply to my messages actively or not at all? Is there someone I am ignoring? Am I isolated in this group and, if so, why? Should I talk more to this person, as I have not really communicated with him?
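The paper does not give the formulas behind these indexes, but as a hypothetical sketch they could be derived from a log of (sender, recipient) message events roughly as follows; the concrete definitions here are illustrative stand-ins, not the system's actual computations.

from collections import Counter
from itertools import combinations
from typing import Iterable, Tuple

def activity_indexes(messages: Iterable[Tuple[str, str]], members: list):
    """Compute illustrative per-user and group indexes from (sender, recipient) pairs.
    These are plausible stand-ins, not the formulas used by the actual system."""
    sent, received, pairs = Counter(), Counter(), Counter()
    for s, r in messages:
        sent[s] += 1
        received[r] += 1
        pairs[(s, r)] += 1

    total = sum(sent.values()) or 1
    per_user = {}
    for m in members:
        mutual = sum(min(pairs[(m, o)], pairs[(o, m)]) for o in members if o != m)
        per_user[m] = {
            "my_participation": sent[m] / total,        # share of all messages sent
            "my_popularity": received[m],               # how often others contact me
            "my_influence_power": received[m] / total,  # involvement in the group's traffic
            "my_reciprocity": mutual / (sent[m] or 1),  # how mutual my contacts are
        }

    # Group level: how many of the possible member pairs exchanged messages at all,
    # and how strongly the received messages concentrate on one person.
    possible = list(combinations(members, 2))
    active = sum(1 for a, b in possible if pairs[(a, b)] or pairs[(b, a)])
    cooperation_level = active / (len(possible) or 1)
    top = max(received[m] for m in members) if members else 0
    communication_hierarchy = top / total
    return per_user, {"cooperation_level": cooperation_level,
                      "communication_hierarchy": communication_hierarchy}

if __name__ == "__main__":
    log = [("ann", "bob"), ("bob", "ann"), ("ann", "eve"), ("eve", "ann"), ("bob", "eve")]
    users, group = activity_indexes(log, ["ann", "bob", "eve"])
    print(group, users["ann"])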
3 Prototype System and Application
Mobile Application
We constructed two prototypes based on our approach to designing the system: a mobile application running on mobile phones and a desktop application running on top of Microsoft Outlook. These two applications are the first ones constructed and reflect the status of the project at the time they were built. First we will discuss the mobile application.
Technical description: client. The client is based on Flash and Python technologies. The Flash part of the application essentially handles the vector graphics and shows the results that come from the server, while Python handles hardware devices such as Bluetooth and the camera. We used Py60 for the Symbian platform and PyCE for WM. The Flash/Python application is able to run on many platforms, such as Symbian phones (S60 2nd and 3rd editions), WM, desktops (Linux/Windows/Mac OS X), and Pocket PC.
Technical description: server. The server is based on Python/PHP. The main computations for the social network analysis (SNA) indexes are done by the PSE, special software that analyses the data and generates the social indexes discussed earlier. The interaction with the PSE happens through a special protocol, SOAP, and the SOAP client is PHP based.
Interface description. When starting, the application updates its data, and this may take some time. The application opens to an orbit view (see Figure 1, column A). In this view the members of the group are placed on three concentric circles with the user in the middle. Other group members are placed on the orbits around you by either the "My Participation" or the "My Reciprocity" index. A person on the innermost orbit has index values similar to the user's, whereas a person on the outermost orbit has very dissimilar values. The user is given explanations of all the icons shown in Figure 1. For example, My Participation is explained as follows: "My participation refers to how extensively involved a person is in the communication among the group's members." Similar explanations and icons are used for the other indexes. The left and right buttons of the mobile phone are used for rolling the circles to select a user; the selected user's name is highlighted. A press of the firing button opens a details window for the group member that has been selected. In the details window the following data are shown (Figure 1, panel D): user name, availability and time tag; activity: a free text field for the current activity; last used: a time tag for when the user last launched the mobile application; nearby: a list of nearby group members, based on the mobile phone's Bluetooth proximity; and a photo taken by the user, with a time tag. The details window can be closed by pressing the firing button again. The Actions menu under the right soft key (Figure 1, panel E) offers sending a private message to the selected user, calling the selected user, or sending him/her a sound message. From the actions menu the application can also be put into the background or closed.
For some phone models it is not possible to put applications into the background; in these cases, if other phone functionality is needed, the only way is to shut the application and then restart it afterwards.
Fig. 1. Different views of the first version of the mobile application. A) The orbit view. B) Main menu. C) List view. D) Details pop-up window for a single user data. E) Details pop-up window with the actions menu opened. F) My details view for inputting availability information etc. G) Chat view.
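The placement rule for the orbit view, nearer orbits for members whose index values resemble the user's own, could be sketched as follows; the similarity thresholds are invented for illustration.

def orbit_for(member_value: float, my_value: float,
              thresholds=(0.1, 0.3)) -> int:
    """Return 0 (innermost), 1 (middle) or 2 (outermost) orbit depending on how
    similar a member's index value is to the current user's value."""
    difference = abs(member_value - my_value)
    if difference <= thresholds[0]:
        return 0
    if difference <= thresholds[1]:
        return 1
    return 2

if __name__ == "__main__":
    my_participation = 0.40
    group = {"bob": 0.35, "eve": 0.62, "joe": 0.05}
    for name, value in group.items():
        print(name, "-> orbit", orbit_for(value, my_participation))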
Pressing the Main with the left soft key opens the Main menu (Figure 1, panel B). Here the user can choose between the previously described orbit view, the List view, Input menu, and the Chat area. In the list view the list of group members is sorted with the same criteria as in the orbit view. Pressing the firing button opens the details pop-up window for a single user. In the My Details the user can input his/her status, mood, reason, action and launch the camera for taking pictures (Figure 1, panel F). On some phone models it is not possible to take pictures through the application. In the chat area the messages are preceded with a time tag (for example 1h = the message was sent 1 hour ago) and a user name (Figure 1, panel G). Desktop Application The desktop application works through an Outlook plug-in with Microsoft Outlook 2003 (and Outlook 2007 to a limited degree) and provides augmentation to the mail client on the basis of emails sent within the application. Additionally, Internet Explorer 7 or Mozilla Firefox 2.x is required for the Outlook tab to work properly.
After the installation the user should be able to see the Pasion button (i.e. the button for launching our application) below the standard command bar (where there are icons to create a new document, open, save, etc.) in Outlook. The Outlook plug-in collects data about e-mail traffic between trial group members and sends it to the Pasion server (i.e. our application server). The collected data are the sender, recipient, time, and date; it should be noted that no information on the content of e-mails is collected. The data collected by our server are used in the calculation of the chosen indexes. These indexes are visualized both in the Outlook plug-in and in the mobile application, and are calculated from pooled e-mail and chat messages. In the upper left panel of the Outlook tab is a list of group members (Figure 2). Pressing Sort list in the upper left panel opens the sorting criteria menu. The user list can be sorted by Username, My Participation (an automatically calculated index, described above), My Reciprocity, My Popularity, My Influence Power, Mood (a manually set variable), and Availability (a manually set variable).
Fig. 2. First version of desktop application interface in Outlook
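A minimal sketch of the privacy-preserving traffic record described above (sender, recipient, time and date, but never message content) as it might be stored on the server side; the field and function names are assumptions, not the actual server code.

from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class MailEvent:
    """One e-mail observed by the Outlook plug-in: metadata only, no body or subject."""
    sender: str
    recipient: str
    sent_at: datetime

def to_index_input(events):
    """Reduce stored events to the (sender, recipient) pairs the index calculation needs."""
    return [(e.sender, e.recipient) for e in events]

if __name__ == "__main__":
    events = [MailEvent("ann@example.org", "bob@example.org", datetime(2009, 3, 2, 9, 15))]
    print(to_index_input(events))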
Placing the mouse pointer over a user icon in the list view opens a pop-up window. From this window it is possible to send the user a private instant message or view the user's detailed information. Below the group panel is the MyStatus panel, where the user can set his/her mood, reason, and activity, or download a picture, which is then shown in the pop-up details window. It is also possible to ask other group members to update their mood, reason, activity, or picture by placing the cursor over the Ask button; the request to update is sent to the chat area. After editing the values, the Set button must be pressed for the values to update. A logout button is also located there. Pressing the details button opens a view where the user can change the password, language (English, Finnish, Italian, and Spanish), phone number, Bluetooth address, and e-mail address.
The orbit view operates similarly to the one in the mobile application, described above. Bringing the cursor over a user shows the details window on the right. A private instant message can be sent to a user by clicking on him/her; a separate tab is opened for private messaging with each group member, and these can be closed from the chat area's title bar. The details pop-up window shows the user name, a picture taken by the user, the reason for the mood, the activity, the last-used time tag, and nearby users determined by Bluetooth proximity. The indexes shown are My Participation, My Popularity, My Influence Power, and My Reciprocity.
4 Discussion
One of the key future challenges for our work is to identify the tasks that benefit most from the use of our system. After identifying these tasks we can redesign our system to support them and optimize the use of increased group awareness to fit the purpose. We hope that our initial emphasis on general communication processes across several types of tasks in knowledge work teams will produce results that help to focus our work further. There may also be challenges in the area of privacy when using our system. For instance, we hypothesize that no single user would wish to publicly transmit their social information. Similarly, problems of social comparison may arise. This has at least two sides. First, if a user is always in the periphery of the group, as indexed by our visualizations in different knowledge work tasks, it may indicate unpopularity, not being liked, or not being valued. Second, as this is a system for work, there is always the question of one's value to the employer. If the social network analysis visualizations show one's boss that one is constantly not producing much input and is at the periphery of interactions in a task in which one should perhaps not be at the periphery, this may create doubts about the role and value of such a person to the employer. On the other hand, both of the problems described above regarding social comparison also cut the other way. For instance, if someone is very central and active, as indexed by our system in some knowledge work task, this informs the person as well as the boss of his popularity and value to the employer. In this way one could perhaps have "hardcore data" on one's performance within the group and in the eyes of the employer. At the group level, it could be hypothesized that people would tend to pump up their performance relative to others, and this would have positive outcomes for group performance in total. However, the main problem may be the social mirror effect. The indexes used create an individual mirror image in terms of: "Who am I in the context of this group? What is my worth? Am I popular and liked? Am I effective at my tasks?" Our system can also create an image of the others in the system, such as: "Who is this popular person? Why is this person the center of our communication? Why is this person at the periphery of our discussions when I expected more?" Our system creates a mirror image of the group, on some dimensions, for those using the system. This image can be either flattering or not. Despite the obvious challenges we feel that users will benefit from the use of our system by gaining a more holistic view of the group they are working with, in addition to gaining insights about themselves in
relation to the group. Our rationale is to enable users to transmit and receive enriched social cues to enhance their communication processes while working. The intrusiveness, resolution and accuracy of gathering the information as well as the understandability of visual representations of various indexes are naturally critical issues. The next stage of our work is to run field tests of our application in real-life working environments. Based on this research a new, refined version of the system will be built with tested and selected functionalities and improved visualization schemes.
References
1. Senge, P.: Sharing knowledge. Executive Excellence 14(11), 17–18 (1997)
2. Davenport, T., Jarvenpaa, S., Beers, M.: Improving knowledge work processes. Sloan Management Review 37(4), 53–65 (1996)
3. McGrath, J.E., Hollingshead, A.B.: Putting the "group" back in group support systems: Some theoretical issues about dynamic processes in groups with technological enhancements. In: Jessup, L.M., Valacich, J.S. (eds.) Group Support Systems: New Perspectives, pp. 78–96. Macmillan, New York (1993)
4. Greenberg, S., Gutwin, C., Cockburn, A.: Using distortion-oriented displays to support workspace awareness. Technical report, Dept. of Comp. Science, Univ. of Calgary, Canada (January 1996)
5. Saari, T., Kallinen, K., Salminen, M., Ravaja, N.: A System for Facilitating Emotional Awareness in Mobile Knowledge Work Teams. In: 41st Hawaii International Conference on System Sciences (HICSS-41 2008), Proceedings, Waikoloa, Big Island, HI, USA, January 7-10, 2008. IEEE Computer Society, Los Alamitos (2008)
6. Woodman, R.W., Sawyer, J.E., Griffin, R.W.: Towards a theory of organizational creativity. Academy of Management Review 18(1), 293–321 (1993)
7. Andriessen, J.H.E.: Working with Groupware. Understanding and Evaluating Collaboration Technology. Springer, London (2003)
8. Reber, A.S.: The Penguin Dictionary of Psychology. Penguin, London (1985)
9. Raven, B.H., Rubin, J.Z.: Social Psychology: People in Groups. Wiley, New York (1976)
10. Wheeler, B.C., Dennis, A.R., Press, L.I.: Groupware comes to the Internet: charting a new world. ACM SIGMIS Database 30(3-4), 8–21 (1999)
11. Dennis, A.R., Valacich, J.S.: Beyond media richness: an empirical test of media synchronicity theory. In: Proceedings of the Thirty-Second Hawaii International Conference on System Sciences, vol. 1 (1999)
12. Hung, Y.T.C., Kong, W.C., Chua, A.L., Hull, C.E.: Reexamining media capacity theories using workplace instant messaging. In: Proceedings of the 39th Annual Hawaii International Conference on System Sciences, vol. 1, 19.2 (2006)
A Study of Fire Information Detection on PDA Device Xianghong Sun1, Weina Qu1, Thomas Plocher2, and Li Wang1 1
State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China {quwn,sunxh,wangli}@psych.ac.cn 2 Honeywell ACS Labs, Minneapolis MN 55418, USA
[email protected]
Abstract. This study examined how useful an en route information display system is for firefighters' information access, situation understanding, and decision making. We conducted a series of tests to investigate the efficiency of the system and to compare different display modes, including audio, text, and their combinations, to find the most appropriate one. The results showed that (1) the audio-only display always took firefighters the longest time to detect information, but adding audio to the two combined displays (text + audio, and text + third-level audio) made information faster to access and easier to remember; and (2) the en route system can be used well either in a quiet, static environment or in a moving, slightly bumpy environment, provided the user receives some training before using it. Keywords: Information detection, PDA, fire.
1 Introduction Fire incidents came into being with the discovery and utilization of fire and are closely linked to the advancement of human civilization [1]. Today, firefighters typically learn the details of a fire only after arriving at the scene. To save time, a handheld PDA device has been developed that can show current fire-related information to fire commanders. Fire alarm systems are an essential part of modern high-rise buildings; they help firefighters detect fires more efficiently and reduce casualties [2]. Ko's research proposed a robust fire-detection algorithm installed in a home network server [3]. To assess how useful the en route information display system is for firefighters' information access, situation understanding, and decision making, we conducted a series of tests to investigate the efficiency of the system and to compare different display modes, including audio, text, and their combinations, to find the most appropriate one. The purpose of the experiment was to answer the following questions: Is the information firefighters can get from the en route display system really helpful for firefighting decision making? In other words, compared with the situation without the en route display system, does the en route information help the incident commander assess the situation and make decisions faster or more accurately? Could firefighters be "primed" by en route information?
2 Method 2.1 Experimental Environment A handheld PDA device was developed as the experimental platform; it could show current fire-related information to fire commanders. The text size was based on a previous experiment [4]. In total, 16 fire scenarios drawn from the previous 3D fire information display prototype [5] were pre-installed in the prototype. The experimental tasks were completed both in a lab and in a moving car. Since the PDA device was not dedicated to the en route display system, the only keys and buttons used in the experiment were the four arrow keys, for moving up and down between menu levels, and one middle key, for getting updated information; the other keys served as distracters. In order to avoid interrupting the en route prototype during the test, all other functions were disabled and all other keys and buttons were masked with plaster (see Fig. 1).
Fig. 1. PDA device as the experimental platform
The experiment took place in two settings: a lab and a moving car. In the test room, firefighters were asked to complete a fire information detection task using both the PDA prototype and the 3D fire information display prototype. In the moving car, firefighters performed a dual task: the main task was to count the street lamps passing by on the right side of the road and to call out the number aloud; the secondary task was to complete the fire information detection task using the PDA prototype. 2.2 Participants 12 firefighters aged 21 to 35 years participated in our experiment. Nine of them had more than five years of firefighting experience, and six held a bachelor's degree.
3 Procedure The test was divided into three stages: a test in the lab, a test in a moving car, and a further test in the lab with the 3D prototype.
Test in Lab. 1. First alarm and fire spread finding (first on the PDA, then on the PC touch screen): each firefighter was exposed to four different scenarios (hospital fires: SS fire and SM fire) one by one. Each scenario corresponded to one kind of display: audio only, text only, combined text + audio, and partly combined text + audio (text + third-level audio). The display order was counterbalanced. For each scenario, the firefighter was asked to find the location of the first fire alarm and the fire spread using the en route display as quickly as possible; task completion time was recorded as the performance of the en route system. He was then asked to go to the 3D fire information display system, which was used to compare efficiency with the 2006 experiment [5], and to find the location of the first alarm and the fire spread again. Task completion time was also recorded as the performance of the 3D fire information display prototype. Test in Moving Car. 2. First alarm and fire spread finding on the PDA: the procedure and task requirements were the same as for task 3 in the lab. Again, four scenarios (tower building fires: SS fire and SM fire) were presented to each firefighter one by one, but the firefighter was only asked to find the first alarm and the fire spread on the PDA device. Task completion time was recorded as the performance measure. Test in Lab Again. 3. First alarm and fire spread finding on the touch screen: immediately after the test in the car and returning to the lab, only one scenario (the last of the four scenarios used at the fifth step) was presented to the firefighter. The time to find the location of the first alarm and the fire spread was recorded as the performance measure. 4. First alarm and fire spread finding: an MM fire (multiple fire seeds spreading on multiple floors) was presented to the firefighter, and the task completion time was recorded both with the en route system and with the 3D information display prototype. 3.1 Data Analysis of Fire Detection Task This task tested whether the en route system helps the incident commander assess the situation and make decisions faster or more accurately, and which way of displaying information is more helpful. Experimental design. A 2×4 within-subjects design was used in this task. The two factors were the place where the en route system was used and the way the fire information was displayed. The lab and the moving car were the two experimental places. The four display ways were the same as mentioned above. Each firefighter was asked to complete 8 scenario fire detections (each including locating the first fire alarm and judging the fire spread). Four of them (hospital fires: SS or SM) were detected in the lab, and the other four (tower building fires: SS or SM) were detected in a moving car. Each scenario corresponded to one kind of display style. The display order of the four scenarios and the pairing between scenarios and ways of information display were counterbalanced. For each subject, the test in the lab came first, followed by the test in the moving car.
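The paper does not specify the counterbalancing scheme; a balanced Latin square is one common way to counterbalance four display conditions across participants. The Python sketch below is only an illustration of that assumption; the condition names are taken from the tables in this section, while the function itself is hypothetical and not part of the authors' method.

def balanced_latin_square(conditions):
    # Balanced Latin square for an even number of conditions: each condition
    # appears once in every ordinal position, and each condition follows
    # every other condition equally often across participants.
    n = len(conditions)
    first = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first) < n:
        first.append(low if take_low else high)
        if take_low:
            low += 1
        else:
            high -= 1
        take_low = not take_low
    rows = [[(c + shift) % n for c in first] for shift in range(n)]
    return [[conditions[i] for i in row] for row in rows]

displays = ["audio only", "text only",
            "text + 3rd level audio", "combined text + audio"]
orders = balanced_latin_square(displays)
for subject in range(12):                      # 12 firefighters
    print(subject + 1, orders[subject % len(orders)])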
During the test in the car, as described in the experimental environment section, the firefighter had to perform a dual task: the counting task served as the main (distracter) task and fire detection was the secondary task. Experimental Results. Test in Lab. Fire detection with the en route system. Table 1 shows the mean and standard deviation of task completion time for finding the first alarm when using the en route prototype. Task completion time was measured from the moment the moderator chose the scenario until the first alarm item was found. The "audio only" display was the worst way: it took the longest time (F(3) = 25, p = 0.00). Among the other three display styles there was no significant difference.

Table 1. Task completion time in finding first alarm task with PDA

                                combined        text + 3rd     audio       text
                                text + audio    level audio    only        only
Mean of task completion time    0:00:50         0:00:11        0:00:09     0:00:06
Std. Dev.                       0:00:26         0:00:11        0:00:07     0:00:02
N                               12              12             12          12
Table 2 shows the mean and standard deviation of task completion time for finding the fire spread when using the en route prototype. Task completion time was measured from the moment the first alarm was found until the fire spread item was found. Both Table 1 and Table 2 show that the audio display took the longest time to complete the fire detection task, but for the fire spread task there was no significant difference among the four display styles.

Table 2. Task completion time in finding fire spread task with PDA

                                combined        text + 3rd     audio       text
                                text + audio    level audio    only        only
Mean of task completion time    0:00:15         0:00:03        0:00:04     0:00:09
Std. Dev.                       0:00:24         0:00:01        0:00:04     0:00:18
N                               12              12             12          12
Fire detection with the 3D information display prototype. Table 3 shows the mean and standard deviation of task completion time for finding the first alarm when using the 3D information display prototype; there was no significant difference among the four display styles. Table 4 shows the mean and standard deviation of task completion time for finding the fire spread when using the 3D information display prototype; again, there was no significant difference among the four display styles.
Usefulness of the en route information display system. In our previous experiment conducted in 2006 [5], we found that without the help of the en route system participants did not finish the fire detection task within 30 s. In this experiment, combining the data in Table 3 and Table 4, we found that all subjects could finish all the tasks in no more than 30 s when using the "text + third-level auditory" display for SS or SM fires.

Table 3. Task completion time in finding first alarm task using 3D prototype

                                combined        text + 3rd     audio       text
                                text + audio    level audio    only        only
Mean of task completion time    0:00:24         0:00:17        0:00:24     0:00:11
Std. Dev.                       0:00:25         0:00:18        0:00:25     0:00:05
N                               12              12             12          12
Table 4. Task completion time in finding fire spread task using 3D prototype

                                combined        text + 3rd     audio       text
                                text + audio    level audio    only        only
Mean of task completion time    0:00:45         0:00:22        0:00:24     0:00:19
Std. Dev.                       0:00:50         0:00:14        0:00:21     0:00:16
N                               12              12             12          12
Test in Moving Car. Sitting in a moving car, the firefighter was asked to perform a dual task: counting street lamps aloud plus fire detection. In this situation, the movement and bumping made the text on the screen difficult to read, and the dual task left very limited attention for the fire detection task. One hypothesis was that an audio display would help the firefighter access and understand information easily and quickly, especially in situations where the user's hands were not available for operating the en route system, which rarely happened in the lab. Therefore, in addition to the four kinds of displays used in the lab, an auto-play mode was added as a fifth information display style. Auto-play meant that the fire information was played automatically as voice messages and no key or button operation was necessary, ensuring that the firefighter could get all the information even with no free hand or no time to activate the en route system. So, in this part of the test, each firefighter went through 5 fire scenarios, each corresponding to one of the 5 display styles, including the auto-play audio mode. The order of the 5 scenarios was counterbalanced. The top half of Table 5 shows the mean and standard deviation of task completion time for finding the first alarm when using the en route system. The time was measured from the moment the moderator chose the scenario until the first alarm item was found. Finding the first alarm using the "audio only" display was the worst way: it took the longest time (F(3) = 4.4, p = 0.01).
Table 5. Task completion time in fire detection using en route system in car

                                      combined        text + 3rd     audio       text
                                      text + audio    level audio    only        only
the first fire alarm    Mean          0:00:40 **a     0:00:13        0:00:05     0:00:05
                        Std. Dev.     0:00:54         0:00:10        0:00:02     0:00:03
                        N             12              12             12          12
the fire spread         Mean          0:00:08 **b     0:00:03        0:00:03     0:00:03
                        Std. Dev.     0:00:05         0:00:02        0:00:01     0:00:01
                        N             12              12             12          12
a: F(3) = 4.4, p = .01; b: F(3) = 9.2, p = .00.
The bottom half of Table 5 shows the mean and standard deviation of task completion time for finding the fire spread when using the en route system. Using the "audio only" display to find the fire spread was also the worst way: it took the longest time (F(3) = 9.2, p = 0.00). Counting Task Performance. This task was meant to simulate an attention distracter that could interrupt how the subject performed the detection task. All the firefighters were required to count how many street lamps passed by on the right side of the road and to call out the number aloud. Participants began the counting task when the recorder asked them to. After the experiment, the experimenter and the recorder together counted the total number of street lamps. From Table 6 we can see that S1 and S10 had the worst counting task performance, 67% and 69%, respectively. Most of the participants performed the counting task well; some even counted the number exactly right. The mean percentage of correct answers was 89%.

Table 6. Percentage of correct counting street lamps in moving car

Subject                        S1   S2   S3   S4   S5   S6   S7   S8    S9   S10  S11  S12
Percentage of correct answer   67%  90%  97%  96%  89%  85%  92%  100%  99%  69%  88%  91%
Comparison of the two test situations: in the test room vs. in the moving car. Fig. 2 and Fig. 3 compare the task completion times for finding the first alarm and the fire spread between the lab situation and the car situation. They show the same trend in the lab and in the moving car. Subjects seemed to take less time to fulfill the fire detection task in the car than in the lab. There could be two reasons for this result: 1) The en route system was not complicated to operate and understand, so even in a moving and bumping situation it was
Fig. 2. Task completion time of finding first alarm (audio only, text only, text + third-level auditory display, combined text + audio), in the lab vs. in the driving car
Fig. 3. Task completion time of finding fire spread (audio only, text only, text + third-level auditory display, combined text + audio), in the lab vs. in the driving car
still easy for firefighters to use. 2) There may have been a training effect: in our experiment the firefighters always completed the tasks in the lab first and then went through the remaining tasks in the car, which probably made them more and more familiar with the PDA device.
Comparison of the two auditory displays: auto-play vs. play by manual control. For the auto-play display, the mean task completion time was 2 minutes 52 seconds. Eight subjects felt that the automatic style wasted much time before they found useful information. Test in Lab Again (Complicated Fire Test). Most scenarios used in the lab test and the car test were SS or SM fires, which made it easy for firefighters to understand the current fire situation. In order to find the advantages and disadvantages of the en route system, and to find the conditions under which it can be put to best use, after completing the 8 fire scenarios and returning to the lab each firefighter was asked to perform the fire detection task on a complicated fire (hospital or Camden fires: MM fire). The task completion time was recorded. The 12 subjects were divided into four groups, and each group did one scenario using one of the four kinds of display styles. After they finished finding the first alarm and the fire spread with the en route system, they were asked to use the 3D FirstVision graphical display prototype to do the fire detection task. Table 7 shows the mean and standard deviation of task completion time for locating the fire when using the en route system. The result was similar to the above: the audio display took the longest time (F(3) = 71.03, p = 0.00), and among the other three display styles there was no significant difference. For the fire spread performance, there was no significant difference among the four display styles. Table 8 shows the task completion time of fire detection when using the 3D prototype; there was no significant difference among the four display styles.
Table 7. Task completion time in fire detection using en route system

                                      audio         text + 3rd     text        combined
                                      only          level audio    only        text + audio
the first fire alarm    Mean          0:00:12 **    0:00:03        0:00:04     0:00:02
                        Std. Dev.     0:00:1        0:00:0         0:00:01     0:00:0
                        N             3             3              3           3
the fire spread         Mean          0:00:4        0:00:2         0:00:2      0:00:3
                        Std. Dev.     0:00:4        0:00:1         0:00:0      0:00:1
                        N             3             3              3           3
** F(3) = 71.03, p = .00.

Table 8. Task completion time in fire detection using 3D prototype

                                      audio         text         combined        text + 3rd
                                      only          only         text + audio    level audio
the first fire alarm    Mean          0:00:16       0:00:12      0:00:26         0:00:14
                        Std. Dev.     0:00:12       0:00:10      0:00:16         0:00:4
                        N             3             3            3               3
the fire spread         Mean          0:00:19       0:00:53      0:01:01         0:00:31
                        Std. Dev.     0:00:07       0:00:09      0:00:43         0:00:16
                        N             3             3            3               3
For complicated fires, subjects could not finish all the tasks within 30 s, and few of the 12 firefighters discovered that there were two fire seeds in the test scenario.
4 Conclusion In order to determine how useful the en route information display system is for firefighters' decision making, we conducted a series of tests to investigate the efficiency of the system and to compare different display modes, including audio, text, and their combinations, to find the most appropriate one. Based on the data, we can summarize our findings as follows: 1. The en route information display system was useful in helping firefighters get critical fire information and make decisions more quickly and accurately. 2. Comparing the four information display modes (audio only, text only, audio + text, and text + third-level audio), the audio-only display always took firefighters the longest time to understand the information, but the introduction of audio made the two combined displays (text + audio, and text + third-level audio) faster for accessing information and easier to remember. 3. Comparing the two situations in which the en route system was used (in the lab and in the moving car), the fire information detection task took less time in the moving car than in the lab, but the difference was not statistically significant. This does not mean that people do a better job in the car than in the lab. Rather, the en route system can be used well either in a quiet, static environment or in a moving, slightly bumpy environment, provided the user receives some training before using it.
References 1. Guo, T.N., Fu, Z.M.: The fire situation and progress in fire safety science and technology in China. Fire Safety Journal 42, 171–182 (2007) 2. Fang, Z.J.: Development of human-machine interaction: multimedia and multisensory. Human Factors 2, 34–38 (1998) 3. Ko, B.C., Cheong, K.H., Nam, J.Y.: Fire detection based on vision sensor and support vector machines. Fire Safety Journal 44, 322–329 (2009) 4. Sun, X.H., Plocher, T., Qu, W.N.: An empirical study on the smallest comfortable button/icon size on touch screen. In: Aykin, N. (ed.) HCII 2007. LNCS, vol. 4559, pp. 446– 454. Springer, Heidelberg (2007) 5. Qu, W., Sun, X.H.: Interactive Style of 3D Display of Buildings on Touch Screen. In: Harris, D. (ed.) HCII 2007 and EPCE 2007. LNCS (LNAI), vol. 4562, pp. 157–163. Springer, Heidelberg (2007)
Empirical Comparison of Task Completion Time between Mobile Phone Models with Matched Interaction Sequences Shunsuke Suzuki1, Yusuke Nakao2, Toshiyuki Asahi1, Victoria Bellotti3, Nick Yee3, and Shin'ichi Fukuzumi2 1
NEC Corporation, Common Platform Software Research Laboratories, 8916-47, Takayama-Cho, Ikoma, Nara 630-0101, Japan {s-suzuki@cb,t-asahi@bx}.jp.nec.com 2 NEC Corporation, Common Platform Software Research Laboratories, 2-11-5, Shibaura, Minato-ku, Tokyo 108-8557, Japan {y-nakao@bp,s-fukuzumi@aj}.jp.nec.com 3 Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA {bellotti,nyee}@parc.com
Abstract. CogTool is a predictive evaluation tool for user interfaces. We wanted to apply CogTool to an evaluation of two mobile phones, but, at the time of writing, CogTool lacks the necessary (modeling baseline) observed human performance data to allow it to make accurate predictions about mobile phone use. To address this problem, we needed to collect performance data from both novice users’ and expert users’ interactions to plug into CogTool. Whilst novice users for a phone are easy to recruit, in order to obtain observed data on expert users’ performance, we had to recruit owners of our two target mobile phone models as participants. Unfortunately, it proved to be hard to find enough owners of each target phone model. Therefore we asked if multiple similar models that had matched interaction sequences could be treated as the same model from the point of view of expert performance characteristics. In this paper, we report an empirical experimental exercise to answer this question. We compared identical target task completion time for experts across two groups of similar models. Because we found significant differences in some of the task completion times within one group of models, we would argue that it is not generally advisable to consider multiple phone models as equivalent for the purpose of obtaining observed data for predictive modeling. Keywords: Cognitive Model, CogTool, Evaluation, Human Centered Design, Human Interface, Mobile Phone, Systematization, Usability Test.
1 Introduction Usability evaluation should be performed in the early phase of a product development process [1]. In addition, commercial enterprises demand that evaluations do not incur high costs. To satisfy these requirements, some of the authors of this paper have been
developing systematized evaluation methods and tools that can be applied early and economically [2]. CogTool [3] is a user interface evaluation tool for predicting task execution process and task completion time, using a given interface. In CogTool, a user model based on ACT-R cognitive architecture [4] mimics execution of a task by using graphical specification data extracted from frames of a storyboard for the task, which is input into CogTool in advance. CogTool offers a low-cost evaluation approach for the early part of a product development process. Just a sketch as the storyboard, which need not be functionally implemented, is enough for evaluation with CogTool. This small requirement allows us to evaluate the user interface early, cutting the cost of developing the system in which the user interface works. As a computational user model, not an actual human, executes tasks in CogTool, costs such as recruiting, organizing and paying participants and use of a usability lab are avoided. To apply CogTool to an evaluation for a new system, it is necessary to refine the user model to improve the accuracy of its predictions, using observed data of actual experts’ and novice’s task execution (observed performance data). In this refinement, we planned to incorporate observed performance data into CogTool’s user model and then compare its predictions with additional observed data [5]. The user model in CogTool can represent a novice who explores how they should interact with the target system, or an expert who can quickly execute the most efficient interaction sequences. In order to collect enough observed data to both incorporate in CogTool and to compare with its predictions, we needed to recruit a considerable number (approximately twenty) of experts, who had owned the specific model of the product to be evaluated for longer than two months. This research was part of an effort that also included comparing CogTool’s predictions in mobile phone evaluation to subjective user impressions in order to see if there were any correlations between these two different evaluation approaches.
2 Challenge of Recruiting Owners of Specific Models Recruiting owners of specific mobile phone models is very hard because the number of owners of a given model is low, due to the fact that new models are released frequently. Also, recruiting is expensive because "owner of a specific mobile phone model" is a stricter qualifying condition for a recruiter than general conditions such as age and gender. Even if cost is not an obstacle, the recruiting may take a long time. Of course, an alternative way to make the recruiting easier would have been to reduce the number of participants. However, we wanted to keep the required number (twenty) because we wished to refine the user model to high accuracy and to analyze the correlation between completion time and subjective impression statistically. Thus, it was clear that it would be quicker and cheaper to find owners of several similar, related models that have matched interaction sequences for the target tasks than only owners of one specific model. We defined a "matched interaction sequence" as the same sequence of key presses required to complete a given task.
As mentioned above, our main objective in this research was a planned comparison between CogTool model predictions and observed user performance data. Our planned method was to capture the duration of each interaction event on the mobile phone by analyzing video frames and recording real user key presses [5] as they perform task execution steps in the same order as is specified for the CogTool model (thus excluding idiosyncratic user performance). This protocol was what drove the demand that all mobile phone models for the observed data have to have matched interaction sequences. In this paper, we report on a preliminary procedure that was conducted prior to our main observed user performance data collection effort. This was an empirical validation to clarify whether we could treat multiple mobile phone models with matched interaction sequences as equivalent for the purposes of predictive modeling.
3 Experiment This section describes our experiment in which we collected user task completion times (not individual key presses as is planned for our future study) across phone models in two groups (A and B), each defined by its members being a target phone that we wished to evaluate with CogTool or having matched interaction sequences to the target phone. This meant that within each group, the user interfaces were similar and tasks could be executed using exactly the same steps across models. The main difference between the models was simply their physical form factor (they had equivalent but differently sized and spaced keys). Participants executed a set of the same tasks with the same key press sequences across all the models in the group. After collecting data on participant performance, we compared the mean completion time for each task between the phone models. We explain this method in more detail below. 3.1 Mobile Phone Models We defined 2 mobile phone model groups. In Group A, there was the N905i, which was a target model for CogTool, and the N905iμ, which has the same interaction sequences as the N905i (Fig. 1). In Group B, there was the W61CA, another target model for CogTool, and the W61H and W53H, which had the same interaction sequences as the W61CA (Fig. 2). The models in each group have a matched key layout. For example, both the N905i and the N905iμ have a mail key above a main menu key. However, the size, form, press depth, and spacing of the keys vary by model. Because Fitts' law [6] [7], which is used in CogTool, predicts movement time from the logarithm of the ratio of key distance to key size, we assumed that a small difference in distance or size between the models would not affect the time to move from one key to another. In this experiment, we selected only tasks with matched interaction sequences on the phone models within a group. For instance, in the case where a user has to select a target item to go to the next frame in a task, if the number of items above the target is different between the models, the number of key presses also differs between them. This difference means that their interaction sequences are not matched. Therefore, we
Fig. 1. N905i (left) and N905iμ (right) in Group A
Fig. 2. W61CA (left), W61H (center), and W53H (right) in Group B
did not use either these models or tasks in the experiment. If the difference in number of items did not affect the interaction sequence (e.g., the items which are below the target item), the models and the task were usable in this experiment. Also, in cases where displayed labels of the items except the target one differed between the models, we used the models and the tasks in this experiment, because the participants trained to be experts, who already knew the interaction sequences for the task, could find which item they should select without comparing its label with other items’. 3.2 Participants 20 participants (16 males and 4 females, age: 20-40s) for Group A and 24 participants (18 males and 6 females, age: 20-40s) for Group B took part in this experiment. We did not select participants based on prior experience with specific mobile phone models. Instead, we provided all participants with extensive time to learn specific target tasks on specific phone models as described in 3.6 Learning.
Table 1. Task list for Group A

         Content                                                                The number of key presses
Task 1   After inputting 000-3454-1111, store the number in a phonebook.
         Enter the name "Nihondenki" in Kana letters.                           39
Task 2   Set a schedule named "Nomikai" in Kana letters from 19:00 to
         23:00 tomorrow.                                                        43
Task 3   Turn on/off "the auto keypad lock after folded".                       24
Task 4   Check the newest mail sent to Aoki-san in the "Friend" folder.         8
Task 5   Take a picture soon after launching a camera. Then save the picture
         in the "Camera" folder in the "My Picture" folder.                     11
Table 2. Task list for Group B

         Content                                                                The number of key presses
Task 1   After inputting 000-3454-1111, store the number in a phonebook.
         Enter the name "Nihondenki" in Kana letters.                           39
Task 2   Set a schedule named "Nomikai" in Kana letters from 19:00 to
         23:00 tomorrow.                                                        40
Task 3   Check the newest mail sent to Aoki-san in the "Friend" folder.         6
Task 4   Take a picture soon after launching a camera. Then save the picture
         as an idle screen, with a sub menu.                                    13
3.3 Tasks We used 5 tasks for Group A and 4 tasks for Group B. The tasks are listed in Table 1 and Table 2. They are common mobile phone functions. At the same time, we selected tasks with various numbers of key presses. Although there were alternative interaction sequences for each task, we instructed all participants to use the same sequence for each task in this experiment. 3.4 Task Assignment to the Participants For Group A, we assigned two of the five tasks (see Table 1) to each participant. Thus, for each task in Group A, we had data from eight participants. For Group B, we
again assigned two of the four tasks (see Table 2) to each participant. Thus, for each task in Group B, we had data from twelve participants. Each participant completed these two tasks across the phone models within the group. 3.5 The Number of Trials for Data Collection During the study, each participant repeated each task five times. We will refer to this portion of each task as the "main trials". In addition, when participants switched from one phone model to the next, they performed two practice trials before the main trials for each task. These practice trials are not included in the data analysis. 3.6 Learning We set aside time for participants to learn the assigned interaction sequences. In the learning phase, they executed the tasks assigned to them with the assigned interaction sequences. In this experiment, we had to compare data generated by experts, because the observed data required for the refinement of CogTool also needed to come from experts (as well as novice users). Another purpose of the practice was to reduce variance in completion time, because the more trials a person does, the smaller the gap in completion time between one trial and the next, along a general learning curve [8]. For these purposes, we set as many practice trials as possible so that participants were able to learn the interaction and develop as much expertise as possible. The practice for both models took place before the main-trial part, making the participant's learning level for each model more similar, since we would expect transfer effects. If the order of the experiment had been "1. practice with N905i, 2. main trials with N905i, 3. practice with N905iμ, 4. main trials with N905iμ", the participant's learning level for the N905iμ could have been higher than for the N905i, because participants benefit from experience with the phone they use first and would therefore have far more experience during the main trials with the N905iμ than during the main trials with the N905i. Therefore, we set the order as "1. practice with N905i, 2. practice with N905iμ, 3. main trials with N905i, 4. main trials with N905iμ". By alternating the order for each task, we avoided a large gap between the learning levels for the two models. The number of practice trials for each task was 46 (23 per phone model) in Group A and 36 (13 per phone model) in Group B. These numbers were dictated by the practical concern that the entire session for each participant should be completed within 90 minutes to avoid participant fatigue. For example, in Group A there were 120 trials in all: 20 main trials for the 2 tasks (5 trials × 2 models × 2 tasks), 8 trials to get used to the model when switching models (2 trials × 2 models × 2 tasks), and 92 practice trials (23 trials × 2 models × 2 tasks). If it takes at most 45 seconds to execute one task, it takes 90 minutes to execute all 120 trials (45 seconds × 120 trials = 5,400 seconds).
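As a quick check of this trial budget (using only the numbers in the paragraph above; the 45-second figure is the paper's own worst-case estimate per trial), a few lines of Python:

models, tasks   = 2, 2                    # Group A: 2 phone models, 2 tasks per participant
main_trials     = 5  * models * tasks     # 20 main trials
switch_practice = 2  * models * tasks     # 8 warm-up trials when switching models
practice        = 23 * models * tasks     # 92 practice trials (23 per model per task)
total = main_trials + switch_practice + practice
print(total)                              # 120 trials
print(total * 45 / 60)                    # 90.0 minutes at 45 s per trial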
4 Results We conducted a one-way repeated-measures ANOVA for each task with phone model as the factor. In Group A, there was no effect of phone model in any of the tasks (p’s > .16), see Figure 3. In Group B, we found two significant differences. There was
a significant effect of phone model in task 2 (F[2,22] = 5.86, p < .01) and task 4 (F[2,22] = 18.72, p < .01), see Figure 4. The other two tasks in Group B were not significant (p’s > .11). Post-hoc comparisons showed that in task 2, W53H was significantly different from the other two models (p’s < .05). And in task 4, W61CA was significantly different from the other two models (p’s < .05).
Fig. 3. Average completion time for each task with each model in Group A
Fig. 4. Average completion time for each task with each model in Group B
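The paper does not include its analysis script; the sketch below is a hypothetical illustration of how a one-way repeated-measures ANOVA of this form can be run, assuming per-participant mean completion times arranged in a long-format table and using the AnovaRM class from the Python statsmodels package. The data values here are made up for illustration.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long format: one row per participant x phone model for a single task,
# holding that participant's mean completion time (seconds) on that model.
df = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2, 2, 3, 3, 3],        # illustrative values only
    "model":       ["W61CA", "W61H", "W53H"] * 3,
    "time":        [21.3, 22.1, 24.8, 19.7, 20.2, 23.5, 22.0, 21.4, 25.1],
})

# One-way repeated-measures ANOVA with phone model as the within-subject factor.
result = AnovaRM(data=df, depvar="time", subject="participant",
                 within=["model"]).fit()
print(result)   # reports the F statistic and p value for the model factor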
5 Discussion Based on the results discussed above, for our planned data collection exercise to gather mobile phone interaction performance data to incorporate into CogTool, we will use only one model as the target model. Thus, even though it is likely to be more time consuming, difficult, and expensive, we should recruit only owners of, and do our evaluations on, the specific target phone model that we plan to model with CogTool. One possible concern with the study design was that we had 12 participants for each task in Group B, but only 8 participants for each task in Group A. Thus, it may be the case that we only found significant differences in Group B because we had more statistical power from the larger sample size. To examine this concern, we reanalyzed the data from Group B with 4 participants randomly removed from each task. We found that both tasks were still significant at p < .05. This suggests that the difference in sample size alone is not why we found significant differences in Group B but not Group A. One possible reason for the significant differences between phones is the physical characteristics of the keys, because many of the participants commented that these characteristics had affected their subjective performance. For example, some of the participants commented that flat keys had been difficult to distinguish from adjacent keys because of the lack of tactile cues. Others commented that keys with a deeper key press feel made it easier to distinguish multiple repeated key presses by touch. Indeed, the W61H and W53H have flatter and shallower keys than the W61CA.
6 Conclusion The study suggests that we should not consider multiple mobile phone models with matched interaction sequences as equivalent to the same model, because we found significant differences in the mean task completion time between the models in Group B. Even though we found no significant differences between the two models in Group A, the findings from Group B suggest that a more conservative approach overall, using only one model, may be warranted for developing cognitive models, to minimize potential noise from usage variations across phone models. In Group B, there were one or two tasks with a significant difference in completion time between the models, even though only four tasks out of a total of 10 target tasks were executed in this experiment. Based on the differences found in this preliminary study, we expect that it would be hard to find 10 tasks that have matched interaction sequences but do not exhibit significant differences in completion time, as would be needed for our planned main objective of collecting valid observed data on which to base modeling of mobile phone interaction. With more tasks, more participants, and more trials in the main study, we would expect the number of significant differences between models to increase and make our observed data less reliable. As mentioned in the Discussion section, based on the participants' comments we expect that one possible reason for the significant difference between phones is the
difference in tactile key press sensation due to hardware differences between different phone models.
References 1. Nielsen, J.: The Usability Engineering Life Cycle. Computer 25(3), 12–22 (1992) 2. Bellotti, V., Fukuzumi, S., Asahi, T., Suzuki, S.: User-Centered Design and Evaluation The Big Picture. In: Proceedings of Human Computer Interaction International. Springer, Heidelberg (to appear, 2009) 3. John, B.E., Prevas, K., Salvucci, D.D., Koedinger, K.: Predictive Human Performance Modeling Made Easy. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2004, pp. 455–462. ACM, New York (2004) 4. Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., Qin, Y.: An integrated theory of the mind. Psychological Review 111(4), 1036–1060 (2004) 5. Teo, L., John, B.E.: Comparisons of Keystroke-Level Model predictions to observed data. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2006, pp. 1421–1426. ACM, New York (2006) 6. Fitts, P.M.: The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology 47, 381–391 (1954) 7. Fitts, P.M., Peterson, J.R.: Information capacity of discrete motor responses. Journal of Experimental Psychology 67, 103–112 (1964) 8. Newell, A., Rosenbloom, P.S.: Mechanisms of skill acquisition and the law of practice. In: Rosenbloom, P.S., Laird, J.E., Newell, A. (eds.) The Soar papers. Research on integrated intelligence, vol. 1, pp. 81–135. MIT Press, Cambridge (1993)
Nine Assistant Guiding Methods in Subway Design – A Research of Shanghai Subway Users Linong Dai School of Media and Design, No. 500 Dongchuan Road, Shanghai Jiaotong University, Shanghai, China
[email protected]
Abstract. In big cities, it often happens that passengers (users) have great difficulty recognizing subway stations. Beyond improving station signage, and based on a large amount of field research, we have found nine practical and effective methods to help passengers identify subway stations. These nine methods include visual design, aural design, and tactile design, among others. This paper also tries to apply some theories of cognitive psychology about human memory to the study of subways. The methods are also applicable to other subway spaces and even to underground space design in general. Keywords: User Research, Subway Station, Quick-Identification.
1 Introduction The following photos of the 4 stations were taken in Shanghai Subway Line 1 from the same point of view from the train.
Fig. 1. 4 different stations in Shanghai subway
From the photos taken on site and a large number of interviews, we find that most stations in the Shanghai Subway look similar. If we omit the signs that read the name
of a station, users can hardly identify stations quickly just by looking at the platforms. In fact, the crowded conditions in the train and the varying user viewpoints all lead to missing the signs. In addition, from research on subway users' familiarity and behavior, we found something very interesting: the more familiar a user is with the subway, the less he relies on signs. Furthermore, the flood of advertisements also disturbs users' sight. We therefore propose to develop assistant methods that help users obtain guiding information from the environment. Theories of biological cognition emphasize what the environment offers: affordances related to human survival can be perceived instinctively, or learned without much effort. This inspired us to make good use of the human instinct for recognizing environments. Through proper design, we can make it less effortful and more convenient for users to guide themselves in underground environments. The way people guide themselves on streets shows that people seldom depend on signs to identify familiar environments, such as the way home; people tend to build their cognition directly from information provided by the environment. In order to transfer this good above-ground experience to underground environments, we need to add more characteristic information for users to memorize. This may be a new direction in the field of subway guiding. Research on the behavior and cognition of subway users can inspire designers to build humane subway environments. The research techniques include extensive observation and interviews, eye-tracking experiments and psychological experiments, questionnaires, and some literature review. We tried to find new kinds of guiding methods in subways by engaging users' various senses, such as vision, hearing, and touch, and by combining them with research on human memory and human experience.
2 Nine Assistant Guiding Methods for Subways 2.1 Visual Series – Spatial Design Aids Guiding in Subways A voice from a Beijing subway user: "I absolutely won't miss Yonghe Palace Station!" Even passengers who have been there only once feel the same. Compared with the featureless stations, Yonghe Palace Station is easy for users to recognize and remember.
Fig. 2. Yonghe Palace Station, Beijing (left) VS the featureless Gulou Street Station (right)
In subway construction, we can add features to the subway space and thus effectively help passengers recognize stations and guide themselves. Common methods include setting several levels of floor or ceiling, creating intersections in the horizontal layout, contrasting spatial areas of different sizes, and arranging the columns along corridors differently. Even adding features to only part of the space is effective, e.g. a clerestory in the ceiling or a raised plant container on the floor. Furthermore, we can imitate the way people recognize their surroundings above ground, i.e. build characteristic landmarks in underground spaces, which is also a good guiding method for users. Spatial design can effectively leave strong memories in users' minds, because the ability to recognize three-dimensional spaces is a human instinct that has continually developed in the course of human evolution. A person can store huge amounts of spatial memory; if we make good use of the human brain, the results can be remarkable. 2.2 Visual Series – Color Design Aids Guiding in Subways Applying a unique color to each station works well in the design of the Hongkong subway lines. As long as the passenger recognizes the color of the destination station, he will get off at the right station. Our research shows that three aspects deserve attention in color design for subways: 1. Use colors with high saturation and avoid compound colors, making it easy for users to identify the hue. 2. The colored area should be large enough for users to identify it through the window from every corner of the train. 3. Do not use similar colors in two neighboring stations.
Fig. 3. Hongkong Subway: Zhonghuan Station is red (see the left picture), while Jiulongtang Station is blue
Chicago subway lines are named by color. Stations of the same line all bear a continuous ribbon of the same color inside and outside the stations. This makes it easy for a user to know which line he is taking, and thus strengthens the guiding effect of color. Of course, when colors are used to mark different subway lines, large areas of highly saturated color are unnecessary; otherwise, with all the stations of one line in a similar color, users' memories will be disturbed. 2.3 Visual Series – Decoration Design Aids Guiding in Subways The results of eye-tracking experiments show that participants are able to identify a station by its decoration. Another interesting finding is that memory for decoration does not accumulate gradually over time but appears to be a saltatory process: once a user has been attracted by some decoration once or twice, he soon forms a vivid memory of it, and if he sees the same decoration again he will recognize it immediately. Considering the limited space users pass through, when applying this method for guiding it is better to use large, global decoration in the station, or to place the decorations along the only path users take. Furthermore, as users need to identify a station quickly from a train window, the decorations on the platform should be distinctive and eye-catching; otherwise their resolving power is reduced. As for large decorations in other places, even if they are not so eye-catching, users will identify them when passing by. 2.4 Visual Series – Lighting Design Aids Guiding in Subways Lighting is indispensable in underground spaces, and using the lighting system to guide users is a good idea. Especially in transfer stations, it is sometimes too crowded for users to see the features of the floor and walls; using overhead lights for guiding is then another effective way. There are two kinds of lighting design that can aid guiding in subways: the shape of the lights and the color of the lights. As most lights in subways are fluorescent lamps, we suggest arranging these lamps in different patterns, thus taking advantage of their shape. Fig. 4 shows that, at low cost, the guiding effect can be improved. Where conditions permit, lighting system designs like those in Japan are even more effective. Using the color of lights for guiding requires colored light sources such as LEDs.
Fig. 4. The different arrangement of fluorescent lamps in Beijing subways
Fig. 5. Lidabashi Station, Japan, designed by Makata Sei
2.5 Visual Series – Product Design Aids Guiding in Subways In our interviews, some observant passengers said that they use seats, billboards, newspaper boards, lamps, and advertisements to identify their destination stations. Products that can aid guiding in subways are diverse rather than uniform, and they can help users who pay close attention to details identify stations quickly. Fig. 6 shows two subway stations in Vienna with different multi-functional billboards, which also serve as symbols for station identification.
Fig. 6. Different multi-functional billboards in Vienna subways
2.6 Visual Series – Material Design Aids Guiding in Subways In interviews with Hongkong subway users, the guiding effect of materials was confirmed. Different materials are used on different lines of the Hongkong subway. Thus Lijing Station, the interchange station between the Quanwan Line and the Dongyong Line, uses mosaic on one wall and aluminum-plastic panels on another.
Fig. 7. Lijing Station of Hongkong subway
In our research on Shanghai subway users, some users said: "Shanghai South Railway Station is relatively new." This shows that some passengers do use materials to help identify stations, but most passengers have not noticed the different materials used in the Shanghai subway. Experiments show that the guiding effect of materials is relatively weak; only when the material feature is enlarged and forms a global impression does the effect become obvious. 2.7 Aural Design Aids Guiding in Subways Some Shanghai subway users identify stations by sound. We collected the following comments in our interviews: "I know which station is People's Square even with my eyes closed, because almost all the people on the train get off at this station with a big noise." "Even if I miss the broadcast of the station name, it is fine, because the following broadcast introduces some tourist sites for passengers. Once I hear a familiar name, I know it's time to get off." What is more interesting is the musical broadcast in the Pusan subway in Korea. If the train is about to stop at a station near the sea, you hear the sound of ocean waves and sea gulls; if the station is near mountains, you hear birds twittering in the woods. Commuters can easily identify stations by these sounds of nature. Subway users take advantage of all kinds of sound to help identify stations. According to memory theory, music is easier for people to memorize than other sounds, so using background music to guide passengers is feasible. Especially for people with poor eyesight, a familiar tune is friendlier than any other guiding method. 2.8 Scent Design Aids Guiding in Subways In our research we did not find any example of scent helping Shanghai subway users identify stations, but some passengers did mention that sometimes a
special scent reminds them of a place. For example, a user who frequently gets off at Xujiahui Station mentioned that he can even smell the path to the Pacific Bazaar because there is a W.C. on the way. These interesting reports encourage us to explore how scent memories can aid guiding in subways. As we all know, the human nose helps store large amounts of information in memory. Compared with other sense organs, the nose relies more on intuition, and once a scent memory is formed, it is hard to forget. 2.9 Experiential Memory Aids Guiding in Subways From some interesting interviews we find that some passengers guide themselves by a special experience they had at a certain station in the past, i.e. they remember the station because they did something special in that place. "Yes, I can identify Shanghai Railway Station because there is a W.C. on the platform." "That's true. I have been there, too. Haha~" Our interviews suggest that if we properly provide additional functions in subways, e.g. commerce, entertainment, and exhibitions, these colorful experiences will also help users memorize and recognize subway spaces. Subways in Paris and Japan, built many years ago, also have many commercial areas. Underground commerce not only brings great profits but also creates more stories about the subway and turns a dull trip into a colorful experience. 2.10 Other Aspects of Design Aid Guiding in Subways There are many more interesting ways in which users guide themselves in subways, beyond our list. Only by immersing ourselves in real sites can we hear them, see them, and feel them. For example, one passenger said he identifies stations by watching which side the doors open on. Another passenger said that we only need to remember one distinctive station and then count the stops to the destination. Other passengers remember the interchange station by noticing other passengers' behavior. We also met a passenger who notes the number of the door he boards through in the morning. Some passengers are so attuned that even if they have fallen asleep on the train, they wake up automatically at their destination station.
3 Conclusion The nine methods mentioned above are rooted in cues from different human senses. We hope that, through proper design, underground spaces can provide adequate cues for users to memorize, letting users identify their destination and get there quickly and conveniently, making it easy for passengers to form a cognitive map in their minds, and improving the efficiency of underground traffic. We should note, however, that many users have their own unique ways of identifying underground environments, and construction conditions vary between cities, so it is not possible for designers to consider every detail. How effective these methods are therefore depends on the development of subway design and on the aims of the builders.
What’s more, the reorganization to environment of human beings is a global system. Based on theories of Gestalt psychology, perception of human beings is an integrated Gestalt, which is inseparable. Thus the 9 methods mentioned above are guide lines for designers in their exploration. It was the comprehensive effect of all the methods that make users identify underground environment quickly. A really good designer is who can guide users to use facilities conveniently and efficiently, thus bring pleasant and humane experiences.
Reference Li, J., Tang, Y., Qu, L., Zhang, D., Chen, R., Yao, Z., Xi, W., Zhou, X., Ren, H., Yu, L., Zhao, Z.: All the data in this paper are from the PRP Creative Project of Shanghai Jiaotong University (Serial Number: PRP-C10067) and the National Creative Experience Project of College Students (Serial Number: ITP040). The project team members include the graduate and undergraduate students listed above.
Pull and Push: Proximity-Aware User Interface for Navigating in 3D Space Using a Handheld Camera Mingming Fan and Yuanchun Shi Department of Computer Science & Technology, Tsinghua University, P.R. China
[email protected],
[email protected]
Abstract. In 3D object control and virtual space navigation tasks, it is necessary to provide an efficient zoom operation. The common method is to use a combination of mouse and keyboard, which requires users to be familiar with the operation and takes time to practice. This paper presents two methods to recognize the zoom operation by sensing the user's pull and push movements. People only need to hold a camera in their hand; when they pull or push the hand, our approach senses the change in proximity and translates it into a zoom operation in the task. Through user studies, we compared the correct-recognition rates of the different methods and analyzed the factors that affect the approach's performance. The results show that our methods are real-time and highly accurate.
1 Introduction Many 3D interaction tasks need a zoom operation. Suppose we want to wander around a 3D campus: we may need to move forward to look at the scenery. To satisfy this requirement, we can use the mouse to control the moving direction and the up-arrow key to move ahead. The disadvantages of this method are as follows. First, the operation needs relatively complex combinations of keyboard shortcuts with mouse movement and clicks, and is usually performed with two hands. Second, it offers a low level of naturalness and is not a good choice for children or for people who are not familiar with keyboard and mouse operation. To address these two disadvantages, we propose a method in which users simply pull or push the hand holding the camera to move in or out. When they want to go ahead, they just push the hand forward; when they want to go back, they just pull the hand back. Our approach relies only on people's natural movements and requires almost no learning. Besides its naturalness, the operation needs only one hand, leaving the other hand free for other operations. Several studies [2, 5, 6] have addressed similar problems. For example, Harrison and Dey [2] try to recognize a person's proximity using the camera in a computer; however, in their setting the camera is stationary, and the approach is not suitable for 2D or 3D interaction tasks such as object control or virtual space navigation. IseeU [5] is similar to our approach: it calculates the change in the standard deviation of the positions of feature points selected in the image captured by the camera and transforms it into a zooming message. However, our analysis suggests that this is not enough to achieve high accuracy.
Having studied the previous work, we first present two methods to recognize the zooming motion. We then test their accuracy and analyze the factors that affect it. Finally, taking further factors into consideration, such as how to support zooming over a large distance, we modify the methods to make them more efficient.
2 Framework of the Algorithm The handheld camera is simply a tool for interaction. Because the camera is held steadily in the user's hand, the camera's movement reflects the hand's movement. To detect the camera's movement, we first detect corner points in the image frames captured by the handheld camera; then, by analyzing the geometric characteristics of the corner points' positions, we decide whether the movement is a zoom or not. In the following sections we propose two methods to detect the zoom and compare them to see which one performs better.
Fig. 1. The whole processing framework: corner point detection, corner point tracking, zoom detection, and application control
3 Corner Point Detection and Tracking Corner-like points [4], i.e., corners with large eigenvalues in the image, are easy to find in incoming frames and remain relatively stable while being tracked. Tracking the points means finding, in the current frame, the new positions of the corner points that appeared in the previous frame. Our approach tracks the feature points with a sparse iterative version of the pyramidal Lucas-Kanade optical flow algorithm [1].
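As a rough illustration of this step, the sketch below uses OpenCV's Shi-Tomasi corner detector and pyramidal Lucas-Kanade tracker; the parameter values are illustrative assumptions, not the ones reported by the authors.

```python
import cv2

def detect_corners(gray):
    """Detect Shi-Tomasi corner-like points (assumed parameters)."""
    return cv2.goodFeaturesToTrack(gray, maxCorners=100,
                                   qualityLevel=0.01, minDistance=10)

def track_corners(prev_gray, gray, prev_pts):
    """Track corner points with pyramidal Lucas-Kanade optical flow."""
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                      prev_pts, None)
    ok = status.ravel() == 1          # keep only successfully tracked points
    return prev_pts[ok], next_pts[ok]
```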
Fig. 2. The green points are the corner points
4 Zoom Detection Algorithms In this section we describe two algorithms for detecting the zoom in detail and then compare their performance. 4.1 Algorithm One: Sensing the Distance As Figure 3 shows, A, B, C, D are the positions of the corner points in the previous frame, and A', B', C', D' are their positions in the current frame. The average distance between the corner points and their center becomes larger when the camera zooms in, since the distance between the camera and the background is shortened. Based on this observation, we first compute the positions of the corner points in the previous and current frames, then compute the average distance between the points and their center in each frame, and finally compute the ratio of the new average distance to the old one. If the ratio is greater than 1.0, the camera has been pushed forward (zoom in); if it is less than 1.0, the camera has been pulled back (zoom out). In the real experiment, the camera may move slightly because of hand jitter. To reduce the interference of this jitter, we compare the ratio against a threshold band around 1.0 instead of against 1.0 exactly.
Fig. 3. Corner point positions before and after the camera zooms in. o and o' are the centers of the corner points in the previous frame and in the current frame, respectively.
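A minimal sketch of this distance-ratio test (Algorithm One) is given below; the jitter threshold of 5% is an assumption for illustration, not a value taken from the paper.

```python
import numpy as np

def detect_zoom_by_distance(prev_pts, next_pts, threshold=0.05):
    """Compare average point-to-centroid distances between two frames."""
    prev = prev_pts.reshape(-1, 2)
    nxt = next_pts.reshape(-1, 2)
    d_prev = np.linalg.norm(prev - prev.mean(axis=0), axis=1).mean()
    d_next = np.linalg.norm(nxt - nxt.mean(axis=0), axis=1).mean()
    ratio = d_next / d_prev
    if ratio > 1.0 + threshold:
        return "zoom in"      # camera pushed forward
    if ratio < 1.0 - threshold:
        return "zoom out"     # camera pulled back
    return "none"             # within the jitter band
```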
4.2 Algorithm Two: Sensing the Change of the Area As Figure 3 shows, the corner points form a polygon ABCDE. After the zoom-in operation, the polygon becomes A'B'C'D'E', and its area grows larger. By sensing the change of the area, we can decide whether a zoom in or a zoom out has happened.
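The sketch below implements this area test using the convex hull of the tracked points and its area; the convex-hull choice and the threshold are our own illustrative assumptions rather than details taken from the paper.

```python
from scipy.spatial import ConvexHull

def detect_zoom_by_area(prev_pts, next_pts, threshold=0.1):
    """Compare the convex-hull areas spanned by the corner points."""
    a_prev = ConvexHull(prev_pts.reshape(-1, 2)).volume  # for a 2-D hull, volume == area
    a_next = ConvexHull(next_pts.reshape(-1, 2)).volume
    ratio = a_next / a_prev
    if ratio > 1.0 + threshold:
        return "zoom in"
    if ratio < 1.0 - threshold:
        return "zoom out"
    return "none"
```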
4.3 The Accuracy of the Two Algorithms Participants. Seven participants, six male and one female, took part in the test. They used a web camera with a frame rate of 30 fps (frames per second) and a Pentium 4 PC running at 3.2 GHz. Each participant took the experiment for about five minutes and was given no more than five minutes beforehand to become familiar with the camera. Experiment. We rendered a 3D virtual space with DirectX 3D (see Figure 4). Participants were asked to use the camera to move forward or backward in the virtual space. We counted the total decisions and the correct decisions, and then calculated the accuracy rate. (For example, we asked the testers to perform zoom-in operations, then counted the total judgments and the actual zoom-in detections.)
Fig. 4. The left image is the previous frame. When the user pushes the camera forward, the view moves forward and the house appears larger than before.
Accuracy Rate. Participants were asked to perform zoom-in and zoom-out movements to test the two algorithms' accuracy rates. According to Fitts' law [3], the distance between the camera and the real-world positions of the corner points affects our algorithms. To test how this distance factor influences performance, we ran the experiments at different distances, such as 0.6 m, 2~2.5 m, and 5 m. We computed the seven participants' results and report the average accuracy rates at the different distances in Figure 5. Discussion. From Figure 5 we can conclude the following: • The average accuracy rate of Algorithm 2, which senses proximity from the change of area, is higher than that of Algorithm 1, which detects the zoom from the change of distances. • As the distance between the camera and the corner points' real-world positions increases, the accuracy rate declines rapidly. At a distance of about five meters, the accuracy of Algorithm 1 drops below 50% and that of Algorithm 2 is roughly 50%. • The zoom detection algorithms are clearly sensitive to distance. Within five meters, the algorithms keep the accuracy rate above 50%. Within a
Fig. 5. The average accuracy rates of seven participants’ results
distance of 1~2 m, the two algorithms keep the accuracy rate above 80%. This result suggests that the hand should push or pull the camera toward a direction in which there are objects within 1~2 m. • The experiments were carried out by seven participants who had used the camera for less than five minutes. The results show that they could perform the zoom operation easily with little practice. 4.4 Large-Distance Zooming Support Using a Finite State Machine In the questionnaire, participants reported that the approach is not suitable for moving a long distance at one time: to move forward in the virtual space for a long time, they must keep pushing or pulling the camera for a long time, which is impossible because the user's movement space is limited. To solve this problem, we adopt the following strategy. We first detect the movement; if we detect the zoom-in movement twice in a row, we assume that the user wants to zoom in and keep outputting "zoom in". If the user wants to stop zooming in, they can pull the camera back; if our approach detects the zoom-out movement twice in a row, we assume the user wants to pull back. The whole procedure can be described as a finite state machine (FSM) (Figure 6). While in the zoom-in / zoom-out state, our approach outputs the corresponding "zoom in" / "zoom out" decision. Suppose the current state is "zoom in" and the current judgment of the algorithm is "zoom out": the counter Count1 is incremented, and we check whether Count1 has reached two. If it has, the state changes to "zoom out" and the output is "zoom out"; if not, the state remains "zoom in" and the output is "zoom in". If the current state is "zoom in" and the current judgment is also "zoom in", our approach outputs "zoom in" and resets Count1 to zero. The counters are used during state changes to make the algorithm stable.
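A minimal sketch of this two-counter state machine is shown below; the per-frame judgment is assumed to come from one of the zoom detection functions sketched above, the handling of the "none" (jitter-band) judgment is our assumption, and the names are illustrative.

```python
class ZoomFSM:
    """Debounced zoom state: a state change requires two consecutive
    opposite judgments, as described for the FSM in Figure 6."""

    def __init__(self):
        self.state = None   # "zoom in", "zoom out" or None (idle)
        self.counter = 0    # counts consecutive opposite judgments

    def update(self, judgment):
        if judgment == "none":
            return self.state            # assumed: keep emitting the current state
        if self.state is None:
            self.state = judgment        # first detected movement initializes the state
            self.counter = 0
        elif judgment == self.state:
            self.counter = 0             # same direction: reset the counter
        else:
            self.counter += 1
            if self.counter == 2:        # two opposite judgments: switch the state
                self.state = judgment
                self.counter = 0
        return self.state
```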
Fig. 6. The finite state machine of zoom in and zoom out: the two states "zoom in" and "zoom out", with counters Count1 and Count2 guarding the transitions (a transition fires when the corresponding counter reaches two)
Because the camera is held in the hand, which naturally jitters while suspended in the air, spurious zoom-in or zoom-out motions can occur; without the state machine and the two counters, even slight noise would cause false decisions. Experiment for Testing the Effect of the Finite State Machine. The participants and hardware conditions are the same as in the previous experiment. We implemented two zoom detection variants, one using the finite state machine and one without it. All participants were required to perform zoom-in and zoom-out movements alternately for about five minutes. Each participant ran the experiment twice, once without the finite state machine and once with it. The average distance between the corner points' real-world positions and the users' hands was about one meter, which suits our algorithm well. The average accuracy rates were then calculated.
Fig. 7. The accuracy rates of the two variants: 0.93 without the finite state machine and 0.95 with it. The result shows that the FSM improves the algorithm's performance.
After the experiments, participants gave us valuable feedback, from which we conclude the following: • With the finite state machine, they can zoom in or out continuously. When used for object control, they can magnify or reduce the size of objects; when
used for virtual-space navigation, they can move forward or backward in the scene continuously. • They can switch between the zoom-in and zoom-out movements with a higher accuracy rate. • Before the hand's motion state changes to the other one, the counter must count to two. Since the frame rate is 30 fps, this corresponds to a delay of roughly 67 milliseconds (two frames), which appears essentially real-time to our eyes. By using the finite state machine and the counters, the accuracy rate is improved.
5 Applications As stated above, the proposed proximity-aware algorithm can be used for object control and for virtual-space navigation. In the object-control task, people can magnify or reduce a virtual object by pushing or pulling the camera; this application is shown in Figure 8. In virtual-space navigation tasks, the algorithm can be used to move forward or backward in the scene, which is especially useful for games (Figure 4).
Fig. 8. The left image shows the cube before the interaction; when the user pushes the camera forward, the cube's size increases, as shown in the right image
6 Conclusions In this paper we have proposed and compared two proximity-aware algorithms. From the experimental results we conclude that Algorithm 2 performs better. To support large-distance zooming and improve the accuracy rate, we introduced a finite state machine. Compared with traditional mouse and keyboard operation, our methods are more natural and easier to learn. Our approaches run in real time with a high accuracy rate and can be used to provide the zoom function in object-control and virtual-space navigation tasks.
Acknowledgements This research is supported by the Specialized Research Fund for the Doctoral Program of Higher Education, China, No. 20050003048, and by the Nokia Research Center.
References 1. Bouguet, J.Y.: Pyramidal Implementation of the Lucas-Kanade Feature Tracker: Description of the Algorithm. Intel Corporation Microprocessor Research Labs (1999) 2. Harrison, C., Dey, A.K.: Lean and Zoom: Proximity-Aware User Interface and Content Magnification. In: Proc. CHI 2008, pp. 507–510 (2008) 3. ISO: Ergonomic requirements for office work with visual display terminals (VDTs): Requirements for non-keyboard input devices. ISO 9241-9 (2000) 4. Shi, J., Tomasi, C.: Good Features to Track. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 593–600 (1994) 5. Sohn, M., Lee, G.: ISeeU: Camera-based User Interface for a Handheld Computer. In: Proc. ACM MobileHCI 2005, pp. 299–302 (2005) 6. Wang, J., Canny, J.: TinyMotion: Camera Phone Based Interaction Methods. In: Proc. CHI 2006, pp. 339–344 (2006)
A Study on the Design of Voice Navigation of Car Navigation System Chih-Fu Wu, Wan-Fu Huang, and Tung-Chen Wu Graduate School of Industrial Design, Tatung University 40 JhongShan North Road, 3rd Section Taipei 10452, Taiwan
[email protected]
Abstract. This study tries to identify the design blind spots of the voice prompt function in current car navigation systems and to make suggestions for improvement. The experimental plan was built on videotape analysis of the voice-prompt mode, with reference to the Urban Road Classification Regulations and the results of a questionnaire survey. Driving simulation tests were conducted with 15 subjects, 13 road combinations, 3 running speeds, and several prompt modes run synchronously. Compared with the present mode (prompt timing determined by distance), the newly designed mode (prompt timing determined by running speed) significantly improved driving performance and reduced mental workload. When driving on a main artery with fast and slow lanes, adding a lane-change prompt with a clear sound to the system helps increase the driving accuracy rate. Keywords: navigation systems, voice prompt function, driving accuracy rate.
1 Introduction With the development of science and technology, car navigation systems are used by more and more drivers. Car navigation systems provide drivers with information on how to get from one place to another in a turn-by-turn format, including the distance to the turn, the name of the street to turn onto, and the turn direction [1]. However, whether a car navigation system is pre-installed in the dashboard or mounted on the windshield, the driver inevitably needs to move the eyes from the road ahead to the system's 3.5- to 8-inch LCD display. Such distraction from the road is one of the main causes of traffic danger [2, 3]. Nowadays, car navigation systems provide information not only through the display but also through voice prompt messages, so as to reduce the time drivers spend looking at the monitor and thus the danger of driving. Nevertheless, the voice prompt function of current car navigation systems still has drawbacks that prevent drivers from using it as the only way to receive navigation information. Noise on the road, chattering passengers, sounds made by other vehicles, as well as music, radio, and other factors may all interfere with the navigation information, and overlapping voice playbacks may confuse drivers and cause accidents. Unclear
navigation information may cause accidents, especially when drivers need to switch lanes from one multi-lane road section to another, or to enter special terrain such as tunnels or overpasses. Because of the limits of human information processing, attention must be allocated when a person is multitasking, and mental workload increases as a result. For analytical purposes, drivers' attention resources can be categorized into visual resources, operative resources, mental-workload resources, acoustic resources, and information ranking [4]. Drivers depend largely on the visual modality for driving-related information [5]. While mental workload varies with changes in acoustics and operation, several studies suggest that a message that is instant, simple, and requires an immediate reaction is best delivered by voice prompt, whereas a message that is complicated, lengthy, and does not require an immediate reaction is better delivered visually [6]. Some studies also suggest that road names need not be provided by voice prompt, because drivers cannot easily comprehend them immediately [7]. Given that driving is mentally demanding, the timing of the information must be chosen carefully: if it is provided too early, drivers may forget it; if it is provided too late, drivers may not have enough time to execute the relevant maneuver. Although mobile navigation systems have become increasingly popular, navigation-related problems have appeared as well, and foreign and domestic studies in this area are rather scarce. This study therefore first points out the issues regarding the comprehension of displayed and voice-prompted information in currently available car navigation systems through a driver questionnaire. Then, with driving simulation tests, the driving patterns of drivers passing intersections on different kinds of roads are recorded and compared with the questionnaire results. Finally, by combining variables such as monitor display, driving reminders, and timing of voice prompts, further designs are created and verified so that the results can be applied to the design of voice prompt systems.
2 A Survey of Current Car Navigation Systems The study started by investigating products on the market. Three types of car navigation system from large manufacturers were selected and installed in suitable positions above the air-conditioner outlet of a real car. After the route had been set, the car was driven along the same route in Taipei City and the prompt playback was video-recorded for three runs. From the video content, the syntax of the driving reminders and the broadcasting timing, matched against the road classification principles, were analyzed in order to design the questionnaire. Questionnaires were used in this early analysis to understand the issues regarding the provision of information to drivers by currently available car navigation systems. They are described below. • Sample Setting: The three most representative car navigation systems (Mio CT720, PaPaGo R12, TomTom ONE) were selected from the products available on the market. Two CCD cameras were used to simultaneously record the road ahead and the graphics displayed and voice played by the navigation systems. In total, three rounds of tests were conducted. Grammar analysis was then performed on
the abovementioned samples. According to the analysis, Mio-Tech's product is the most complicated one: driving information is provided before a decision point in the pattern "a intersection ahead + b please drive close to X + c please turn + d enter Y + e leave Y + f drive on Z", in which "a" and "f" are broadcast before every intersection, "b" is broadcast when driving straight is required, "c" is broadcast when a turn needs to be made at an intersection, "d" and "e" mean that a special road such as an overpass or tunnel must be entered or left, "X" means turning left or right, "Y" means special terrain such as overpasses and tunnels, and "Z" means the name of the road to drive on. The information provided by PaPaGo is less complicated than that of Mio-Tech but follows a similar pattern; the difference is that it does not provide the names of the roads to drive on (i.e., it omits item f). TomTom uses the simplest form of broadcasting and presents only one simple sentence, such as "PLEASE TURN LEFT" or "PLEASE DRIVE STRAIGHT". • Categorization of and Regulations on Downtown Roads: The analysis of the voice prompt functions of the abovementioned navigation systems shows that the navigation scripts do not adapt their content to different kinds of roads; although additional content exists for special roads such as overpasses and tunnels, it is too limited. To fully understand the issues regarding the provision of driving information by mobile navigation systems, the roads themselves need to be understood. In accordance with the Urban Road Classification Regulations and related regulations in Taiwan [8], and based on road-service properties such as speed limit, number of traffic lanes, lane width, and traffic control devices, downtown roads can be categorized into expressways, major roads, sub roads, and service roads. Combinations of roads and questionnaires were used to study the comprehensibility of the voice prompts of mobile navigation systems. • A Questionnaire Survey: The information from the navigation system was presented on paper in the questionnaires while the voice prompts were played. For the graphic information, 13 representative images were captured during actual use of the current navigation systems. Through cluster analysis based on the Urban Road Classification Regulations, and with road names changed to prevent effects of the subjects' memories, these images were re-made with CorelDraw into the test images for the questionnaires. There were 12 questions in 3 categories: awareness, UI preference, and habit; only those who had used a navigation system before needed to answer the habit-related questions. The investigation was carried out with forty-three subjects (20-55 years old, M=28.4) who held driving licenses and had driving experience; fifteen of them had driven an automobile with a car navigation system. From the above analysis, the findings can be summarized as follows: (1) The products of Mio and PaPaGo are more comprehensible than that of TomTom for those who have not used car navigation systems and need more detailed driving information. (2) On roads with fast and slow lanes, starting at about 20~30 meters from intersections, there may be a double solid white line as a "keep in lane" marking to
divide the traffic flows heading in the same direction, for the sake of safety. If a right turn has to be made at this kind of intersection, drivers must switch to the outer slow lane before passing the point where the double solid white line starts. (3) The voice prompts in the current products are usually first played 500 meters before critical intersections, and the same content is played four times, the last time beginning about 20~30 meters before the intersection, as also shown graphically in Fig. 1. Drivers usually pay attention only to the last prompt and ignore the first three; as a result, actions such as braking and slowing down occur right after the last prompt.
Fig. 1. Some voice prompt models of the currently available products (on expressway)
3 Driving Simulation Test The key point in the design of human-machine interface is to discuss the interaction relationship among practical users, objects and environment. As to this study, the real experience in car driving was the best in the sense of reality. However, due to lots of factors interfering the road driving and experimental variables not easy to control, simulation driving was carried out to assure the consistency of test conditions in the experiment. Virtual reality scenes were constructed by thirteen combinations of turns in the road that created from the roads classification and regulation on downtown roads. 3.1 Participants A total of 15 graduate students of Tatung University at the age of 20 or more (20-46 years, M=26, SD=3.2) and were paid to participate in the experiment. All participants
have valid driving licenses and driving experience, and all have normal or corrected visual acuity. 3.2 Apparatus In the present study, the simulated vehicle cab, a 1990 Ford Telstar car, included all the normal automotive displays and controls found in an automatic-transmission vehicle. The simulator used two Pentium IV PCs to control the simulator and the scenario, respectively, and one notebook to present the visual in-car navigation system. The scenario graphics were projected onto a screen located about 2 m in front of the driver's seat, producing a 60×40 degree field of view. The steering, throttle, and brake inputs were connected to the PC that controlled the simulator software. The car navigation system serving as the visual display was set up to the right front of the driver, at about the height and location of the air-conditioning ventilator. For a driver of normal height, the in-vehicle display was 15 degrees below the straight-ahead plane and 20 degrees to the right. A speaker in front of the passenger seat provided auditory information in the form of a digitized human female voice with a speech rate of ~150 words/min, together with sound effects.
Fig. 2. One scene of the driving simulation & structure of the virtualization control software/hardware
The GW InstruNet data-acquisition system was used for the transformation and retrieval of data, which were then imported into Excel and SPSS for further analysis. To analyze the broadcasting timing of the navigation systems, the speakers were connected to the GW Instruments data-acquisition box at Vin+ and Vin-, so that the voltage values vary as sounds are played. The 3DSTATE World Editor developed by Morfit was used to generate the virtual environment; scene data can be loaded by using Visual Basic together with the animation engine of 3D Developer Studio for Visual Basic. The structure of the virtualization control software/hardware is depicted in Fig. 2. 3.3 Experimental Designs Two different tests were assessed: one was a driving simulation test of currently available car navigation systems with a modified switch-lane reminder (hereafter referred to
as Test1), and the other was a driving simulation test evaluating the newly designed car navigation system (hereafter referred to as Test2). Test1 used a 3 × 2 × 2 × 2 mixed-factors design comparing results by driving speed (three levels: 70 km/h on the expressway, 50 km/h on the major road, and 20 km/h on the service road), voice prompt style (two levels: complicated broadcasting pattern, Mio-Tech; simple broadcasting pattern, TomTom), monitor display (two levels: on, off), and switch-lane reminder (two levels: on, off). When the last factor is set to "off", the mode being tested is the currently available voice-prompt mode; when it is set to "on", the mode being tested is a new trial in which a clear, short sound is played before a voice prompt to remind drivers to pay attention to the voice content about to be played, similar to the chime before a public announcement in a railway station. The evaluation of the newly designed mode (prompt timing determined by running speed instead of by distance) was performed in Test2. In this voice-prompt mode, the reaction distance should be [speed (km/h) × 2.5 seconds + 30] meters. The number of tests was reduced based on the experience from Test1: only the factor "driving speed" was kept as originally planned; the factors "monitor display" (one level: on) and "voice prompt style" (one level: simple broadcasting pattern, TomTom) were reduced to one level each; and the factor "switch-lane reminder" was increased to three levels (display, sound, voice). The display mode shows a red arrow icon on the navigation screen as a guide for switching to the left or right lane; the sound mode is the same as the "on" mode in Test1; the voice mode gives voice prompts for switching to the left or right lane. Dependent variables were based on both objective and subjective measures: whether the driver successfully turned onto the scheduled road was the objective criterion, and subjective measures were obtained using a modified five-point NASA TLX workload assessment. 3.4 Procedure First, a ten-minute explanation of the test purpose and procedure was given, and the subjects were instructed on how to operate the driving simulation environment with the wheel, accelerator, and brake pedal (connected to the serial port of the personal computers). The subject then entered the test phase and received the 36 navigation scenarios in pre-planned sequences that were counterbalanced to prevent learning and order effects; a short break was taken after each scenario trial if necessary. In each navigation test, all subjects were required to drive through the designated routes to reach an identical destination. All participants were told to reach the destination in accordance with the driving information provided by the navigation models, to follow the route as accurately as possible, and to keep the car moving at a controllable speed. Charts of the voltage changes were drawn in Excel from the retrieved data and analyzed to determine whether subjects had driven in accordance with the rules of the test. Finally, each subject filled out a NASA TLX subjective evaluation questionnaire to describe his or her subjective feelings toward the different experimental conditions.
It took ~120 min for participants to complete the present study.
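As a quick illustration of the speed-dependent prompt timing described above, the sketch below computes the reaction distance [speed × 2.5 s + 30] m; converting the speed from km/h to m/s is our assumption about how the formula is meant to be applied, not something stated explicitly in the paper.

```python
def prompt_distance_m(speed_kmh: float, reaction_time_s: float = 2.5,
                      no_lane_change_zone_m: float = 30.0) -> float:
    """Distance before the decision point at which the final prompt
    should finish playing: speed * reaction time + 30 m buffer."""
    speed_ms = speed_kmh * 1000.0 / 3600.0   # km/h -> m/s (assumed conversion)
    return speed_ms * reaction_time_s + no_lane_change_zone_m

# Example: the three tested driving speeds
for v in (70, 50, 20):
    print(f"{v} km/h -> prompt {prompt_distance_m(v):.0f} m before the intersection")
```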
4 Results Data collected in the present study were analyzed with inferential statistics (ANOVAs), and LSD treatment contrast tests were used for post-hoc comparisons. Table 1. Test1 results 1 (currently available products)
Driving correctness (%) by driving speed and broadcasting pattern (Mio, TomTom), monitor display on / off:
70 km/h expressway: 13.3 / 0.0, 26.7 / 0, 40.0 / 0.0
50 km/h major road: 93.3 / 26.7, 96.7 / 36.7, 100 / 46.7
20 km/h service road: 100 / 93.3, 100 / 90, 100 / 86.7
Table 1 shows that only 26.7% of drivers could correctly leave the expressway when relying solely on the final navigation prompt played 20~30 meters before the event decision point. On major roads, 96.7% of drivers made turns correctly with the monitor on, and 36.7% with it off. On service roads the correctness rates were above 90% whether the monitor was on or off. Because TomTom's navigation content is shorter and takes less time to broadcast over the same distance, it tends to be heard in full more often than Mio-Tech's. Table 2. Test1 results 2 (switch-lane reminder with sound effect)
Driving speed: switch-lane reminding sound on / off, driving correctness; chi-square inspection (value, DOF, sig.)
70 km/h expressway: on 78.3%, off 68.3%; chi-square 1.534, DOF 1, sig. 0.215
50 km/h major road: on 83.3%, off 58.3%; chi-square 9.076, DOF 1, sig. 0.003**
20 km/h service road: on 80.0%, off 85.0%; chi-square 0.519, DOF 1, sig. 0.472
Table 2 shows that when driving on major roads with median strips, the driving correctness rate is 83.3% with switch-lane reminders and 58.3% without them, with p less than 0.05 under the chi-square test. The table also gives descriptive statistics and chi-square results for driving with the different broadcasting patterns; there is no significant difference between the two broadcasting patterns, nor between monitor display on and off. Two major disadvantages of currently available car navigation systems follow from the early analysis: (1) the timing of the final voice prompt
cannot be adapted to the car's speed; as a result, when the driving speed is high, the navigation information may not be fully broadcast and drivers may not act in time. (2) The same problem arises if drivers do not fully understand the switch-lane prompt while driving. On some road sections, however, the reminding sound may be misinterpreted as "please make a turn", so that driving correctness is actually higher when no switch-lane reminder is provided. Table 3. Test2 results (switch-lane reminding presented in different ways)
Driving speed: switch-lane reminding (sound / voice / display), driving correctness; chi-square inspection (value, DOF, sig.)
70 km/h expressway: sound 93.3%, voice 100%, display 73.3%; chi-square 5.850, DOF 2, sig. 0.05*
50 km/h major road: sound 86.7%, voice 100%, display 53.3%; chi-square 10.833, DOF 2, sig. 0.004**
20 km/h service road: sound 73.3%, voice 80.0%, display 93.3%; chi-square 2.218, DOF 2, sig. 0.345
Table 3 shows descriptive statistics of correctness on the three kinds of tested roads. The correctness rates on the three road types are generally similar, while the different driving reminders lead to changes in the correctness rate. Based on the significance derived from the chi-square tests, the three kinds of reminders affect driving correctness significantly on expressways and major roads, while there is no significant difference on service roads. Subjective measures. The subjective grading items include quality of information provision, suitability of broadcasting timing, necessity of the monitor display, and comprehensibility of information. Single-variable analyses of variance show that the broadcasting patterns designed in this study are better than those of the currently available products and reduce drivers' mental workload.
5 Discussion and Conclusions During driving and navigating, drivers have to monitor the car by searching the environmental information and shifting attention from one information source to another [10,11], and drivers depend largely on the visual modality for driving-related information [5]. According to the multiple resource theory, in a heavily loaded visual display environment, an auditory display will improve time-sharing performance [12]. When driving conditions and information are complicated, drivers may have more difficulty in filtering and remembering useful information presented by an auditory display because of the memory interference problem [13]. Similar results were found in the present study.
Although voice information aims to reduce the occasions on which drivers take their eyes off the road, deficiencies in voice navigation functions may nevertheless become a safety concern, and drivers may have difficulty paying attention to the auditory display all the time. The timing of prompt messages is a key point. Four findings are worth summarizing: • Prompting issues of voice navigation systems: Insufficient prompted information and unsuitable prompt timing lead to misunderstanding of voice navigation information. Unclear information, such as failing to remind drivers to switch lanes, means that driving actions are not completed when event decision points, such as turns and lane changes, are passed. In terms of timing, drivers usually neglect the first three prompts before the decision, while the final prompt is not timed according to driving speed and may not be heard by drivers in time. • Switch-lane reminder: With regard to reminders for switching lanes: (1) on major roads, such reminders can effectively improve driving correctness (p<0.05), with voice reminders being the best (100%), sound reminders second (93.3%), and display reminders third (73.3%); (2) on expressways, although there is no significant difference, driving correctness is still improved (from 68.3% to 78.3%), with voice reminders being the best (100%), sound reminders second (86.7%), and display reminders third (53.3%); (3) on service roads there is no significant difference, with display reminders being the best (93.3%), voice reminders second (80.0%), and sound reminders third (73.3%). • Timing of prompt: According to the voice prompt pattern used in the tests, the distance before the event decision point should be [speed × 2.5 seconds] meters so that the broadcast can finish in time. According to [9, 14], the 2.5 seconds may be further reduced to 1.93 seconds, because reaction time to acoustic information is shorter than to visual information. As switching lanes is prohibited within 30 meters of intersections in downtown areas, the reaction distance for major roads and expressways should be [speed × 2.5 seconds + 30] meters. • Further suggestions: A proper amount of navigation information and its optimization have not been discussed in this study, and the tests did not simulate interfering events or other vehicles in the driving environment.
Acknowledgements This research was supported by the National Science Council under contract number NSC 97-2221-E-036-032-MY2, Taiwan (R.O.C.).
References 1. Perez, W.A., Mast, T.M.: Human factors and advanced traveller information systems (ATIS). In: Proceedings of the Human Factors Society 36th Annual Meeting, pp. 1073– 1077. Human Factors Society, Santa Monica (1992) 2. Wierwille, W.W.: Development of an initial model relating driving in-vehicle visual demands to accident rate. In: Third Annual Mid-Atlantic Human Factors Conference Proceedings. Virginia Polytechnic Institute and State University, Blacksburg (1995)
3. French, R.L.: In-vehicle navigation: status and safety impacts. Technical Papers from ITE's 1990, 1989, and 1988 Conferences (1990) 4. Wierwille, W.W., Hulse, M.C., Fischer, T.I., Dingus, T.A.: Visual adaptation of the driver to high-demand driving situations while navigating with an in-car navigation system. In: Vision in Vehicles III, pp. 79–87. Elsevier Press, Amsterdam (1991) 5. Lansdown, T.C.: Visual allocation and the availability of driver information. In: Rothengatter, T., Carbonell, E. (eds.) Traffic & Transport Psychology: Theory and Application, pp. 215–223. Pergamon Press, Amsterdam (1997) 6. Sanders, M.S., McCormick, E.J.: Human Factors in Engineering and Design, 7th edn. McGraw-Hill Press, Singapore (1993) 7. Rasker, P.C., Post, W.M., Schraagen, J.M.C.: Effects of two types of intra-team feedback on developing a shared mental model in command & control teams. Ergonomics 43, 1167–1189 (2000) 8. Construction and Planning Agency, Ministry of the Interior (CPAMI) Information, http://www.cpami.gov.tw/web/index.php 9. McGehee, D.V., Mazzae, E.N., Baldwin, G.H.S.: Driver Reaction Time in Crash Avoidance Research: Validation of a Driving Simulator Study on a Test Track. In: Proceedings of the IEA 2000/HFES 2000 Conference, vol. 3 (2002) 10. Dewar, R.E.: In-vehicle information and driver overload. International Journal of Vehicle Design 9, 557–564 (1988) 11. Dingus, T.A., Hulse, M.C.: Some human factors design issues and recommendations for automobile navigation information systems. Transportation Research 1, 119–131 (1993) 12. Wickens, C.D., Sandry, D., Vidulich, M.: Compatibility and resource competition between modalities of input, central processing, and output. Human Factors 25, 227–248 (1983) 13. Liu, Y.-C.: Comparative study of the effects of auditory, visual and multimodality displays on drivers' performance in advanced traveller information systems. Ergonomics 44(4), 425–442 (2001) 14. Oglesby, C.H., Hicks, R.G.: Highway Engineering. John Wiley & Sons, Chichester (1982)
Front Environment Recognition of Personal Vehicle Using the Image Sensor and Acceleration Sensors for Everyday Computing Takahiro Matsui, Takeshi Imanaka, and Yasuyuki Kono Graduate School of Science and Technology, Kwansei Gakuin University, 2-1 Gakuen, Sanda, Hyogo, Japan {acl27598,bbv84420,kono}@kwansei.ac.jp
Abstract. In this research we propose a method for detecting moving objects in front of a Segway by detecting the Segway's running state. The running state of the personal vehicle is detected with both an image sensor and an acceleration sensor mounted on the Segway. When objects are moving in front of the Segway, the image sensor captures the motion while the acceleration sensor shows a different result; by analyzing this difference, our method successfully distinguishes moving objects from the environment. Keywords: Segway, Image Sensor, Acceleration Sensor, Optical Flow.
1 Introduction This research examines the safety and comfort of the personal vehicle Segway [1], which uses electrical energy instead of gasoline, in everyday environments. Unlike other vehicles such as cars and bicycles, the Segway can be used not only on outdoor roads but also indoors, for example inside buildings or factories, as shown in Fig. 1 (right). Safety must be considered especially carefully when driving the Segway in such places, since they have high pedestrian traffic. To improve its safety, a system that notifies the rider of danger or avoids incidents automatically is necessary. This paper proposes a method for detecting moving objects in front of the Segway. A rider can easily recognize and avoid static obstacles, but it is difficult to avoid dynamic obstacles such as unexpected pedestrians and other personal vehicles. If a system can recognize these obstacles and notify the rider or avoid them automatically, the safety of the Segway can be improved. Most research that uses vehicle video systems to recognize the frontal environment is intended for cars and is not applicable to a slow vehicle like the Segway, which must recognize dynamic obstacles at close range. Moreover, the operation of the Segway differs from that of conventional vehicles in that it has no accelerator pedal and no brakes: the rider stands on the footplate between the two wheels and shifts the body to adjust the speed, movement, and direction of the Segway. Vehicles other than the Segway do not have these features.
Fig. 1. A difference between a car and the Segway
Our proposed method detects the running state of the Segway by employing both a forward-facing image sensor (camera) and an acceleration sensor, and detects moving objects in the frontal direction by integrating the sensor readings. In this research we focus on the tilting of the handlebar while driving the Segway. The system analyzes the video captured by the image sensor mounted on the handlebar and computes the optical flow [2, 3] to estimate the running state among the following six states: forward, backward, right turn, left turn, acceleration, and deceleration. When a moving object such as a walking person is within the image sensor's field of view in front of the Segway, it affects the trend of the optical flow, whereas the readings of the acceleration sensor are not affected by the object. Using the data obtained from both sensors, we analyze whether a moving object actually exists within the range where the incompatible information has been detected. In the future, this method will contribute to safety by notifying riders so that they can avoid moving obstacles in front of the Segway.
2 Research Background 2.1 Related Work Most related work mounts the image sensor on a car. While the posture of the camera is largely stable in those settings, on the Segway it changes with the running state, i.e., with the rider's center of gravity. For example, when the rider wants to turn right, he tilts the handlebar to the right, and the image sensor tilts to the right as well; the rider leans forward to gain speed, so the handlebar and camera also lean forward. It is therefore necessary to analyze the running state of the Segway by detecting the tilt of the image sensor, which is affected by the change of the Segway's posture.
A method for detecting the three-dimensional position of moving objects with an in-vehicle camera has been proposed [4], but it is not suitable for the Segway because the posture of the camera is not stable while driving. Another approach detects approaching vehicles by classifying horizontal edges in an image into two classes, those on the ground and those above the ground [5]; this method is also difficult to apply to complicated everyday scenes. 2.2 Preparation The image sensor is mounted on the handle bar as indicated in Fig. 2(A), and the acceleration sensor is mounted on the footplate. The image sensor is a Grasshopper from Point Grey Research Inc., shown in Fig. 3, and the acceleration sensor is a WAA-001 from Wireless Technologies Inc., shown in Fig. 4. The specifications of the sensors are given in Tables 1 and 2 [6, 7].
Fig. 2. Placement of sensors on the Segway
Fig. 3. Grasshopper
Fig. 4. WAA-001
Table 1. Specifications of the Grasshopper
Table 2. Spec of the WAA-001
3 Moving Object Detection by Sensor Fusion In this research, the image sensor is mounted on the handle bar of the Segway and the acceleration sensor is mounted on the footplate between two wheels. Running states of the Segway are detected from these sensors, and moving objects are recognized from the discrepancy of the detection results.
Fig. 5. Optical flows change when a moving object is recognized
The image sensor detects and analyzes the optical flow pattern of each running state, while the acceleration sensor measures the acceleration of the Segway to determine the running state. The optical flow shows the movement of feature points between
video frames, expressed as vectors. Taking the forward running state as an example, when the image sensor detects no moving objects, the optical flow appears as shown in Fig. 5 (left); when a moving object is within the field of view, the optical flow is disrupted as shown in Fig. 5 (right). The moving object, on the other hand, has no influence on the acceleration sensor, which measures the acceleration of the Segway to detect the running state.
4 Running State Detection Using Each Sensor 4.1 Posture Change and Running State of the Segway The method for detecting the running state is described here for the six states: forward, backward, right turn, left turn, acceleration, and deceleration.
Fig. 6. Forward
Fig. 9. Deceleration
Fig. 7. Backwards
Fig. 10. Right turn
Fig. 8. Acceleration
Fig. 11. Left turn
The optical flow pattern for each running state is described as follows. In the forward state, a vanishing point and optical flows appear as shown in Fig. 6; in the backward state, as shown in Fig. 7. In the acceleration state, the flows rise from the bottom to the top as shown in Fig. 8, because the handle bar leans forward, the camera faces downward, and the image sensor's field of view shifts downward. In the deceleration state, the flows descend from top to bottom as shown in Fig. 9, because the handle bar is pulled back and the camera faces upward; note that after the Segway has decelerated, its posture returns to the upright position. In the right-turn state the optical flows appear as shown in Fig. 10, and in the left-turn state as shown in Fig. 11.
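A rough sketch of how such per-frame flow patterns could be mapped to the six running states is given below; the decision rules, sign conventions, and thresholds are illustrative assumptions based on the qualitative descriptions above, not the authors' actual classifier.

```python
import numpy as np

def classify_running_state(prev_pts, next_pts, thresh=1.0):
    """Heuristic running-state guess from sparse optical flow vectors.
    prev_pts / next_pts: arrays of shape (N, 2) of tracked feature points."""
    flow = next_pts - prev_pts
    mean_dx, mean_dy = flow.mean(axis=0)          # average horizontal / vertical motion
    spread_prev = np.linalg.norm(prev_pts - prev_pts.mean(axis=0), axis=1).mean()
    spread_next = np.linalg.norm(next_pts - next_pts.mean(axis=0), axis=1).mean()
    divergence = spread_next - spread_prev        # points spread apart when moving forward

    if abs(mean_dx) > max(abs(mean_dy), thresh):
        return "right turn" if mean_dx < 0 else "left turn"   # image-coordinate sign assumed
    if abs(mean_dy) > thresh:
        # handlebar tilt shifts the whole image up or down (acceleration / deceleration)
        return "acceleration" if mean_dy < 0 else "deceleration"
    if divergence > thresh:
        return "forward"
    if divergence < -thresh:
        return "backward"
    return "idle"
```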
4.2 Running State Detection Using the Acceleration Sensor When the running state of the Segway changes, the acceleration sensor data change characteristically. Fig. 13 shows one example of the acceleration data: line X shows the acceleration along the traveling direction of the Segway, and line Y shows the acceleration along the horizontal direction perpendicular to the X axis, as shown in Fig. 12. The acceleration data change markedly when the running state changes. For example, while the vehicle is moving forward at constant speed, line X shows no visible changes, but large positive values appear when the vehicle starts moving from an idling state and large negative values appear during deceleration. Similarly, line Y shows large positive values when turning right and large negative values when turning left. By detecting these significant changes in the acceleration data, we can determine the running state independently of the image sensor. Fig. 13 shows readings from the acceleration sensor as the vehicle starts from the idling state, turns right, moves forward for a while, and then turns left.
Fig. 12. Three axial directions
Fig. 13. An example of an acceleration sensors data
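The sketch below shows one way the characteristic acceleration changes described in Section 4.2 could be turned into a state label; the axis sign conventions and the threshold follow the qualitative description above and are assumptions, not calibrated values from the paper.

```python
def classify_from_acceleration(acc_x, acc_y, thresh=0.5):
    """Map acceleration readings (traveling direction X, lateral direction Y)
    to a coarse running-state label. Units and threshold are assumed."""
    if acc_y > thresh:
        return "right turn"        # large positive lateral acceleration
    if acc_y < -thresh:
        return "left turn"         # large negative lateral acceleration
    if acc_x > thresh:
        return "acceleration"      # starting to move / speeding up
    if acc_x < -thresh:
        return "deceleration"      # slowing down
    return "steady"                # cruising forward or idling
```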
4.3 Integrating the Sensor Information This section compares image sensor data and acceleration sensor data obtained simultaneously while riding the Segway. Fig. 14 shows the readings while the Segway is moving forward: when no moving object is in sight, the image sensor simply detects the optical flow for "moving forward" as explained in Section 4.1, and no remarkable changes appear in the acceleration sensor data. In contrast, Fig. 15 shows the case in which a moving object is in front of the Segway: the optical flow shows a different trend from that in Fig. 14, although the acceleration sensor produces data similar to Fig. 14. A moving object that affects the optical flow can therefore be detected by examining the moments at which such a discrepancy occurs. Figs. 16 and 17 show corresponding examples while the vehicle is turning right: the acceleration sensor data show the same trend, while the optical flow shows different trends.
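A minimal sketch of this discrepancy check, built on the two classifiers sketched above and assuming both are mapped onto a common set of state labels, might look as follows; treating a disagreement that persists over a few frames as evidence of a moving object is our illustrative interpretation of the fusion step.

```python
def detect_moving_object(flow_states, accel_states, min_frames=3):
    """Flag frames where the flow-based and acceleration-based running
    states disagree for at least `min_frames` consecutive frames."""
    flagged, run = [], 0
    for i, (fs, ast) in enumerate(zip(flow_states, accel_states)):
        run = run + 1 if fs != ast else 0      # count consecutive disagreements
        if run >= min_frames:
            flagged.append(i)                  # likely a moving object in view
    return flagged
```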
Fig. 14. Data from the two sensors when no moving object appears (moving forward)
Fig. 15. Data from the two sensors when a moving object appears (moving forward)
Fig. 16. Data from the two sensors when no moving object appears (turning right)
Fig. 17. Data from the two sensors when a moving object appears (turning right)
5 Conclusion This paper described our method for detecting moving objects in front of the Segway by using the incompatibility between the readings of an image sensor and an acceleration sensor, along with the running-state detection methods for the two sensors on which it is based. As future work, it is necessary to examine whether the safety of the personal vehicle can actually be improved with our method.
References 1. Segway, http://www.segway.com 2. Horn, B.K.P., Schunck, B.G.: Determining Optical Flow. Artificial Intelligence 17, 185–204 (1981) 3. Bouguet, J.Y.: Pyramidal Implementation of the Lucas-Kanade Feature Tracker. Intel Corporation, Microprocessor Research Labs (2000) 4. Onoguchi, K.: Shadow Elimination Method for Moving Object Detection. In: Proceedings of the International Conference on Pattern Recognition (ICPR 1998), pp. 583–587 (1998) 5. Okada, R., et al.: Obstacle Detection using Projective Invariant and Vanishing Lines. In: ICCV, pp. 330–337 (2003) 6. Point Grey Research, http://www.ptgrey.com/products/grasshopper/index.asp 7. ATR-Promotions, http://www.atr-p.com/sensor01.html
Common Interaction Schemes for In-Vehicle User-Interfaces Simon Nestler, Marcus Tönnis, and Gudrun Klinker Fachgebiet Augmented Reality Technische Universität München Fakultät für Informatik Boltzmannstraße 3, 85748 Garching bei München, Germany {nestler,toennis,klinker}@in.tum.de
Abstract. In this paper, different interaction schemes currently implemented by major automotive manufacturers are identified and analyzed. Complete overviews of all in-vehicle user-interface concepts are rarely available; this paper gives a deeper insight into the interaction schemes and user-interface concepts implemented in current cars. Additionally, an expert review with 7 experts was performed to get a first impression of which user-interface interaction schemes work well in the in-vehicle context. To assess the suitability of the interaction schemes for the development of usable in-vehicle user-interfaces, we performed different tests; the results are reported in text and tables. Keywords: User Interface Design, In-vehicle information systems, IVIS.
1 Introduction The variety of driver information systems (IVIS) continuously increases, and the integration of IVIS into a consistent, human-centered interaction concept becomes more and more important. We have identified and analyzed different human-computer interaction schemes that are currently implemented by major automotive manufacturers. The identification of these HCI interaction schemes and the determination of their suitability for use in cars is the basis for the future design and development of intuitive and easily learnable in-vehicle user-interfaces. The identification of the interaction schemes and the evaluation of their suitability were performed in a two-stage expert review: in the first step the general usability was tested (with SUS / SEA), and in the second step the usability was tested more specifically (with HMI). This paper starts with an overview of related work in the field of in-vehicle user-interface concepts, then gives an overview of the different interaction schemes implemented in current automobiles, and finally presents the evaluation and its results.
2 Related Work Ablassmeier et al. proposed new search techniques for in-car interfaces [1]; their search agent has a high potential to increase concentration on the primary driving task. Burnett et al. focus on the usability of car navigation systems and give a comprehensive overview of the issues concerning human-machine interfaces [2]. Research on in-vehicle information systems is quite often limited to the presentation of navigation information and warnings [5]. Burnett et al. also identified a rapid growth of interest in the development and utilization of tactile interfaces in cars [3]. In their opinion, the human skin surface offers an important means for presenting information to users, even if their other senses may already be overloaded. Finally, they summarized the arguments for and against allowing drivers to enter a destination into a vehicle system while driving; in their opinion, inhibiting this functionality while on the move is not an ideal solution. Consequently, research in user interfaces and human factors has to investigate the potential of novel in-car user-interfaces [4]. Another field of in-vehicle human-computer interaction research is the consideration of driver distraction. The risk caused by the usage of mobile devices is commonly taken into account [6]; although this study describes different types of driving distraction caused by mobile devices, the distraction caused by in-vehicle information systems is analyzed only to a limited extent. Cell-phone dialling tasks in cars have been analyzed by Salvucci as well [16], who generated a model for a priori predictions of total times for different tasks. Stevens et al. published a checklist for the assessment of in-vehicle systems, which contains a questionnaire, instructions and additional supporting information [17, 18]; nevertheless, it remains unclear whether following the proposed procedures leads to systems with better design and supports the identification of design errors. Additionally, the European Commission published a statement on user-interface design principles for in-vehicle information and communication systems [7]. When focusing on in-vehicle user-interface design, additional related work has to be considered as well. Green et al. proposed design guidelines for driver information systems by establishing the resumption lag as a factor in predicting IVIS-style task times [8, 9]. Dedicated design handbooks such as the European HARDIE report from Ross et al. contain guidelines on how information should be presented to the car driver [13]. Ito et al. analyzed eyes-off-the-road times caused by the manipulation of IVIS [11]; the maximum eyes-off times measured in their evaluation were between 4 s and 5 s. Libuda presents an example of the potential of multimedia user interfaces [10], describing the development of an in-vehicle user interface with different input options: language, manual mode and signs. The in-vehicle presentation of navigational information was analyzed by Narzt et al., who found that typically either a flat arrow or a virtual bird's-eye view is best for visualizing the current position [12]. Presenting information on head-up displays (HUD) leads to reduced access costs and increased time with the eyes on the road [19]; this way of presenting information can improve the detection of objects in the outside world, lane tracking and velocity control [20].
3 Interaction Schemes The overview on related work shows that a lot of research has been performed in the field of driver distraction, display technologies and in-vehicle applications. Publications which give a deeper insight into interaction schemes are rarely spread. It is difficult to get a broader overview on user-interface concepts which are implemented in current cars. The underlying interaction schemes play an important role in the estimation of the usability and performance of the in-vehicle user-interface. We identified different interaction schemes when taking a closer look at seven different in-vehicle user-interfaces: A, B, C, F, L, M and T. As a result of this analysis, several interaction schemes were identified. These are: integrated interaction, logical connections, information distribution, information presentation in the HUD, menu manipulation, short cuts, independent state transitions. 3.1 Integrated Interaction Concept Two different interaction schemes regarding the general interaction concept exist in current cars. In most cars interaction is based on a central multi-functional controller with equal functionality (rotating, shifting and pressing) as shown in Figure 1. In some cars, however, interaction bases on touch screen devices. Both of these interaction concepts show advantages and disadvantages when used in a car. Due to considerations of driver distraction the functionality of touch screen devices is reduced during motion, while the controller based concepts offer the full functionality even during motion. Furthermore the hard-keys beside the central controllers differ significantly between the different manufacturers. Whereas the central controller concept at userinterface B bases on a controller which can be pushed pressed and rotated, the userinterface A includes many additional hard keys. A composition of these into controller concepts can be found at user-interface M as shown in Figure 1.
Fig. 1. The integrated interaction concepts at user-interface B (left), user-interface M (middle) and user-interface A (right) are based on a multi-functional controller and one or more additional buttons
3.2 Logical Connections
The existence or absence of a logical connection between the central information display (CID) and the digital instrument panel (DIP) defines two further interaction schemes implemented in current user-interfaces. Some concepts connect the CID and DIP logically, as shown in Figure 2a: the modification of the CID's system state
Fig. 2. (a) The digital instrument panel (left) and the central information display (right) are connected logically at user-interface F. (b) The digital instrument panel (left) and the central information display (right) are not connected logically at user-interface M.
changes the DIP's system state and vice versa. Most current cars, however, use two completely independent system states, as shown in Figure 2b.
3.3 Information Distribution
The information distribution across the different display areas in the cockpit follows two different interaction schemes. The first interaction scheme is the equal distribution of all information across all available displays, typically CID and DIP. In the second interaction scheme one display is dominant while the other display provides only sparse information, as shown in Figure 3. Both the CID and the DIP are used as the central display in current user-interface concepts for cars.
Fig. 3. In the digital instrument panel (left) very sparse information is presented whereas in the central information display (right) extensive information is available at user-interface T
3.4 Information Presentation in the HUD
The introduction of HUDs for cars leads to new interaction schemes for the information distribution between the DIP and the HUD. One interaction scheme is the redundant visualization of the most relevant information in the DIP and in the HUD: the relevant information is not distributed between these two displays, it is duplicated, as shown in Figure 4. The other interaction scheme is the consistent distribution of information, which leads to the removal of information from the DIP.
Fig. 4. In the head-up display at user-interface B (left) more information is presented than in the head-up display at user-interface C (right)
The concept of user-interface B includes the presentation of important detailed information; the concept at user-interface C is limited to speed information and a rather schematic navigation hint. Because the information at user-interface B is quite detailed, the driver has to look at the DIP less frequently.
3.5 Menu Manipulation
Two different interaction schemes for the manipulation of menus exist. In some cars the main menu is realized completely in software; no hard keys for the direct access of menu items are available. In the contrary interaction scheme all items in the main menu are represented by hard keys: the main menu is built in hardware.
Fig. 5. In user-interface A the main menu is represented by hard keys (left), in user-interface B (middle) the main menu is completely realized in software, and in user-interface T (right) the main menu is represented by hard keys at the side of the screen (dark buttons) as well as on the screen (light buttons)
In some current cars a combination of these two interaction schemes exists: the menu items can be selected via a soft menu as well as via a hard menu. Figure 5 shows the different menu types: user-interface A uses a menu with hard keys, user-interface B uses a soft menu, and user-interface T uses a combination of both principles.
3.6 Shortcuts
Shortcuts are a common interaction scheme for accessing frequently used functionality more easily. Two different interaction schemes are implemented in current cars. Some cars are equipped with a large number of controls for the direct access of frequently used functionalities. Other cars, however, use shortcut controls which can be defined freely by the user, as shown in Figure 6.
Fig. 6. In the shortcut concept at user-interface C a large number of controls for direct access is provided (left), whereas in the dynamic shortcut concept at user-interface B (right) only a limited amount of dynamic shortcuts is available
3.7 Independent State Transitions
Distraction of the driver's attention is always an issue when discussing different interaction schemes for user-interfaces in cars. Especially common interaction schemes such as returning to a neutral system state have an influence on the distraction of the driver's attention. In some cars the system returns to the initial state after a certain timeout if no further interaction has been performed, as shown in Figure 7. Consequently, this concept
Fig. 7. The user-interface F automatically switches from the menu (left) to the initial state after a certain timeout (right)
forces the driver to perform his task quite fast if he wants to continue it at the same point at which he was interrupted. Besides an increase in interaction times, the distraction of the driver might increase as well.
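The timeout-based reset described above can be captured in a few lines of code. The following sketch is purely illustrative and not taken from any of the examined user-interfaces; the state names and the timeout value are assumptions made for the example.

```typescript
// Illustrative sketch of the "independent state transition" scheme: every
// driver input restarts an inactivity timer, and when the timer expires the
// menu silently falls back to the initial (neutral) state.
type MenuState = "initial" | "radio" | "navigation" | "settings";

class MenuController {
  private state: MenuState = "initial";
  private timer: ReturnType<typeof setTimeout> | null = null;

  // The 10 s timeout is an assumed value; real systems differ.
  constructor(private readonly timeoutMs: number = 10_000) {}

  /** Called for every driver input (rotate, shift, press, touch). */
  handleInput(target: MenuState): void {
    this.state = target;
    this.restartTimer();
  }

  get current(): MenuState {
    return this.state;
  }

  private restartTimer(): void {
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => {
      // Automatic return to the initial state -- the behaviour that forces
      // the driver to act quickly if the task is to be resumed in place.
      this.state = "initial";
      this.timer = null;
    }, this.timeoutMs);
  }
}
```

A driver who pauses longer than the timeout therefore has to navigate back to the interrupted menu item manually, which is the source of the additional interaction time and distraction noted above.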
4 Evaluation
Ross et al. give recommendations for the evaluation of IVIS. They state that small-scale expert evaluations in a task-based context lead to good results [14]. Since a rigorous and comprehensive evaluation of technology is quite expensive, the proposed method was used in our evaluation as well. In our expert review a group of seven experts evaluated seven different user-interfaces: A, B, C, F, L, M and T. First of all we tried to get an impression of the usability of the different in-vehicle user-interfaces with the SUS (System Usability Scale) test [21]. The results of the SUS are shown in Figure 8 (left). The small sample size made it impossible to interpret the results statistically. We used this SUS test to get an initial estimation of the usability without explicitly distinguishing between usable and less usable in-vehicle interfaces. The test revealed, however, that there is still room for improvement in all user-interfaces, because none of the user-interfaces is clearly above 50 points (which means that the users could not decide between the two antipoles "the user-interface is highly usable" and "the user-interface is not usable at all"). The workload was measured with the SEA test [22]. The users performed two tasks: manipulation of the radio (Figure 8, middle) and of the navigation system (Figure 8, right). Again these tests reveal room for improvement: whereas the workload of the radio manipulation (selecting a radio station and changing the volume) was not very high, the workload of the navigation system manipulation (entering a destination) was rather high.
Fig. 8. The usability of the different user-interfaces (left) was evaluated with the SUS test; the cognitive workload of the radio manipulation (middle) and of the navigation system manipulation (right) was evaluated with the SEA test
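For readers unfamiliar with the SUS, the standard scoring procedure from Brooke [21] explains why 50 points marks the neutral midpoint. The following sketch implements that procedure; the example responses are invented purely for illustration and are not data from our evaluation.

```typescript
// Standard SUS scoring [21]: ten statements, each answered on a 1-5 scale.
// Odd-numbered items are positively worded (contribution = response - 1),
// even-numbered items are negatively worded (contribution = 5 - response).
// The sum of the contributions (0-40) is multiplied by 2.5 to give 0-100.
function susScore(responses: number[]): number {
  if (responses.length !== 10) {
    throw new Error("SUS requires exactly 10 responses");
  }
  const sum = responses.reduce((acc, r, i) => {
    const contribution = i % 2 === 0 ? r - 1 : 5 - r; // index 0,2,... = items 1,3,...
    return acc + contribution;
  }, 0);
  return sum * 2.5;
}

// A participant who answers "3" (undecided) throughout scores exactly 50,
// the neutral midpoint referred to in the text.
console.log(susScore([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])); // 50
```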
To get an impression of the suitability of the different interaction schemes in the in-vehicle user-interface context, we performed a more specific HMI test from [15]. Whereas SUS and SEA are general tests for the evaluation of user-interfaces, the HMI test focuses on user-interfaces in vehicles. This HMI test is suitable for evaluating
numerous aspects of in-vehicle user-interfaces. We selected the aspects that are connected with the identified interaction schemes (regarding form and content): occlusion, visibility, grouping, shortcuts, overview, layout, design, consistency and cancelling. Some of these aspects were evaluated separately for each display (CID, DIP, HUD). The user had to rate each attribute of the user-interface from 1 (very poor) to 5 (very good).
Table 1. In the HMI test we focused on occlusion, visibility, grouping, shortcuts, overview, layout, design, consistency and cancelling. The users rated from 1 (very poor) to 5 (very good). The first value shows the mean, the second value in brackets shows the standard deviation.
5 Results and Discussion
When taking a closer look at the lowest and highest values in each category in Table 1 (these values are printed in bold), we were able to connect these results with the interaction schemes identified above. Occlusions occurred most often at user-interface T (touch-screen concept), whereas they occurred quite seldom at user-interface C (central controller concept). A large number of hard keys made it quite difficult for the driver to see the right key and reach it easily. Consequently, the visibility and accessibility of the hard keys in user-interface C (static shortcut concept) was rated rather poor, whereas the visibility and accessibility of user-interface B (dynamic shortcut concept) was rated best. The spatial grouping of the two displays and the logical connection of DIP and CID were rated poor for user-interface F. The cars with no logical connection of the displays (user-interfaces A and M) were rated best regarding the logical and spatial connection. The shortcut concept at user-interface C (a hard key for every function) was rated quite poor, whereas the shortcut concept at user-interface A (a hard key for each menu) was rated best. The overview was best at user-interface M, which uses a balanced
information distribution between CID and DIP. With user-interface T the users lost the overview, owing to the combination of a rather sparse DIP and an overloaded CID. Deducing functionality and the way of interaction from the design worked best at user-interface B (single controller) and worst at user-interface C (many different controllers with different functionality and ways of interaction). The information presentation in the HUD was rated best at user-interface B (detailed information in the HUD), as opposed to the HUD at user-interface C, which offers only limited information (speed and navigation hints). User-interface T was rated poor regarding similar screen layouts, which is a consequence of its redundant menu structure (soft menu and hard-key menu). The menu manipulations on the basis of hard keys (user-interface A) and on the basis of a soft menu (user-interface B) were both rated equally well. Cancelling settings was most difficult with user-interface F, where independent state transitions are performed after a certain timeout. User-interface A contained the best concept for aborting interactions, using a hard key for cancelling. The analysis of the questionnaire revealed the differences between the HCI models in the different cars. The questionnaire then qualified the different models by showing advantages and disadvantages in interaction. Through the analysis we can identify how different HCI models fit together and can be integrated into one common concept to reduce the diversity and complexity of HCI in cars. These results enable research to gain insight into opportunities for highly combined and integrated IVIS for optimized driver workload and preference.
Acknowledgements This work was performed in collaboration with BMW Forschung & Technik GmbH. The authors would like to thank Dr. Klaus-Josef Bengler and Mariana Rakic for their support in the identification of the interaction schemes. Furthermore we would like to thank Kerstin Sommer for her support in the design of the different questionnaires. Finally we thank Tony Poitschke for his support in the examination of the evaluation results.
References 1. Ablassmeier, M., Poitschke, T., Rigoll, G.: A new approach of a context-adaptive search agent for automotive environments. In: CHI 2006 extended abstracts on Human factors in computing systems, pp. 1613–1618 (2006) 2. Burnett, G.E.: Usable vehicle navigation systems: are we there yet? In: Proceedings of Vehicle Electronics Systems 2000, European Conference and Exhibition, Leatherhead, UK (2000) 3. Burnett, G.E., Porter, J.M.: Ubiquitous computing within cars: designing controls for nonvisual use. International Journal Human-Computer Studies 55, 521–531 (2001) 4. Burnett, G.E., Summerskill, S.J., Porter, J.M.: On-the-move destination entry for vehicle navigation systems: Unsafe by any means? Behaviour & Information Technology 23(4), 265–272 (2004) 5. Campbell, J.L., Richman, J.B., Carney, C., Lee, J.D.: In-vehicle display icons and other information elements. Guidelines, Federal Highway Administration, vol. I (2004)
6. Chittaro, L., De Marco, L.: Driver Distraction Caused by Mobile Devices: Studying and Reducing Safety Risks. In: Proceedings 1st International Workshop Mobile Technologies and Health: Benefits and Risks (2004) 7. Godthelp, H., Haller, R., Hartemann, F., Hallen, A., Pfafferott, I., Stevens, A.: European Statement of Principles on Human Machine Interface for In-Vehicle Information and Communication Systems (May 1998) 8. Green, P.: Estimating Compliance with the 15-Second Rule for Driver-Interface Usability and Safety. In: Human Factors and Ergonomics Society Annual Meeting Proceedings, Surface Transportation, pp. 987–991 (1999) 9. Green, P., Levison, W., Paelke, G., Serafin, C.: Preliminary human factors design guidelines for driver information systems, Tech report: FHWA-RD-94-087. US Government Printing Office, Washington, DC (1995) 10. Libuda, L.: Improving Clarification Dialogs in Speech Command Systems with the Help of User Modeling: A Conceptualization for an In-Car User Interface. In: GI-Workshop ABIS-Adaptivität und Benutzermodellierung in interaktiven Softwaresystemen (2001) 11. Ito, T., Miki, Y.: Japan’s safety guideline on in-vehicle display systems. In: Proceedings of the Fourth ITS World Congress, Brussels, Belgium, VERTIS (1997) 12. Narzt, W., Pomberger, G., Ferscha, A., Kolb, D., Müller, R., Wieghardt, J., Hörtner, H., Lindinger, C.: A New Visualization Concept for Navigation Systems. In: Stary, C., Stephanidis, C. (eds.) UI4ALL 2004. LNCS, vol. 3196, pp. 440–451. Springer, Heidelberg (2004) 13. Ross, T., Vaughan, G., Engert, A., Peters, H., Burnett, G., May, A.: Human factors design guidelines for information presentation by route guidance and navigation systems. CEC DRIVE II Project V2008 HARDIE, Deliverable 19, May 1995, 79 p. (1995) 14. Ross, T., Burnett, G.: Evaluating the human-machine interface to vehicle navigation systems as an example of ubiquitous computing. International Journal of Human-Computer Studies 55(4), 661–674 (2001) 15. Rottner, R.: Entwicklung einer praxistauglichen Methode zur Relativbewertung von modernen Anzeige- und Bedienkonzepten im Kraftfahrzeug, Diploma Thesis, Technische Universität München (2002) 16. Salvucci, D.D.: Predicting the effects of in-car interface use on driver performance: an integrated model approach. International Journal of Human-Computer Studies 55(1), 85–107 (2001) 17. Stevens, A., Board, P.A., Quimby, A.: A Safety Checklist for the Assessment of inVehicle Information Systems: Scoring Proforma, Project Report PA3536-A/99, Crowthorne, UK, Transport Research Laboratory (1999) 18. Stevens, A., Quimby, A., Board, A., Kersloot, T., Burns, P.: Design Guidelines for Safety of In-Vehicle Information Systems, TRL Limited (2004) 19. Sojourmer, R.J., Antin, J.F.: The effects of a simulated head-up display speedometer on perceptual task performance. Human Factors 32(3), 329–339 (1990) 20. Weintraub, D.J.: Human Factors Issues in Head-Up Display Design: The Book of HUD Descriptive, State-of-the-art report, Dayton University Ohio Research Institute (1992) 21. Brooke, J.: SUS: A quick and dirty usability scale. In: Jordan, P., Thomas, B., Weerdmeester, B., McClelland, I. (eds.) Usability evaluation in industry, pp. 189–194. Taylor & Francis, London (1996) 22. Eilers, K., Nachreiner, F., Hänecke, K.: Entwicklung und Überprüfung einer Skala zur Erfassung subjektiv erlebter Anstrengung. Zeitschrift für Arbeitswissenschaft 40(4), 215– 224 (1986)
Dynamic Maps for Future Navigation Systems: Agile Design Exploration of User Interface Concepts
Volker Paelke1 and Karsten Nebe2
1 Leibniz Universität Hannover, IKG, Appelstrasse 9a, 30167 Hannover, Germany, +49-511-7622672, [email protected]
2 University of Paderborn, C-LAB, Fürstenallee 11, 33100 Paderborn, Germany, +49-5251-606132, [email protected]
Abstract. Maps have traditionally been used to support orientation and navigation. Navigation systems shift the focus from printed maps to interactive systems. The key goal of navigation systems is to simplify specific tasks, e.g. route planning or route following. While users of navigation systems need fewer skills in navigation-specific activities, e.g. reading maps or manual route planning, they must now interact with the user interface of the navigation device, which requires a different set of skills. Current navigation systems aim to simplify the interaction by providing interfaces that use basic interaction mechanisms (e.g. button based interfaces on a touch-screen), exploiting the fact that many users are already familiar with such techniques. In the presentation of the information most navigation systems employ map-like displays, possibly combined with additional information, again to exploit familiarity. While such an approach can help with early adoption, it can also limit usefulness and usability. There is, however, a large opportunity to improve the input, output and functionality of navigation systems. In this paper we expand a model of classical map based communication to identify possibilities where "dynamic maps" can enhance map based communication in navigation systems. We report on how an agile design exploration process was applied to examine the design space spanned by the new model and to develop system probes. We discuss the user feedback and its implications for future interface concepts for navigation systems.
1 Motivation
Navigation in unfamiliar environments is a common task for many users in a variety of contexts. With the availability of low-cost positioning technologies like GPS, the implementation of navigation systems as a commodity item for large user groups has become viable. The market for so-called personal navigation devices (PNDs) has experienced rapid growth in recent years. Until about 2001 the only example of
navigation systems aimed at non-expert users was the automotive navigation market, which was dominated by embedded navigation systems. Since then portable PNDs have taken over a large percentage of that market. While marketed as personal devices that are "portable" and suitable for "hand-held" use, the current generation of PNDs is designed specifically for in-car use on the road network. Efforts to address the specific requirements of pedestrians, cyclists, bikers or off-road navigation have been limited so far. While there is a lot of variety in the details of the interaction mechanisms and presentation styles, most PND systems use a common design, employing a touch screen (and sometimes speech recognition) for input and a 2D map or 2.5D perspective map view combined with audio output to convey guidance information. With increasing competition, developers are currently looking at a variety of innovations to distinguish their products from the competition. Typical examples include the use of wide-screen displays, the use of textured 3D models in the visualization or the integration with on-line services to provide "intelligent" routing. While many of these new features are effective from a marketing perspective, they add little to the usability of the systems, as experience reports of users show. Developers of novel approaches to the navigation problem face a complex design problem. While significant experience has been gathered in the domain of car navigation, there is a lack of knowledge about the requirements of users in other usage contexts. Developers and designers often restrict themselves to established technologies that may not be optimal for other usage contexts. This is even more true for end-users who may participate in a design process. While end-users can provide valuable information about their usage goals and the context of use, they are typically not able to provide novel design ideas, especially for concepts that differ significantly from established standards. Addressing these shortcomings requires developing a better and more detailed understanding of the navigation requirements of users in a variety of usage situations, exploring possible implementation approaches without limiting the technology selection to currently available standard technologies, and producing alternative design solutions that are not limited to the emulation and automation of previous technologies.
2 Approach
To explore the design space of novel navigation systems we have pursued an approach that builds on a classical model of cartographic communication, to identify possible modifications and additions enabled by interactive systems. This provides a foundation for the systematic identification of design options. In a combined design and user study we have applied an agile prototyping process to explore the resulting design space and to establish user requirements for a variety of non-standard navigation applications. Following a user-centred participatory design process, prototypes of possible user interface concepts for the domains of pedestrian navigation and on-/off-road car navigation were developed. The initial stage of the project was conducted as a user study of current navigation systems and approaches to navigation. The findings showed that users are still easily frustrated with the available user interfaces even when using current best-in-class
devices. We also found that users rate many of the current innovations lower after practical experience than in pre-use interviews. In the second stage of the project an agile iterative process was used to prototype and evaluate novel user interface concepts for different use scenarios. Several promising designs were refined into system probes that are suitable for real-world evaluation, providing practical feedback on usability and user preferences with novel user interface paradigms. The results indicate that there is a large opportunity to provide users of navigation systems with improved experiences if developers are willing to modify systems significantly compared to current navigation systems.
3 A Communication Model for Dynamic Maps
The primary function of maps is the communication of information in a spatial context. Communication refers to the exchange of information between a source and a destination through a common system of signs and symbols. While the ultimate goal of communication is usually the exchange of information between persons, communication is effected through the use of intermediary media. In the case of conventional printed maps for navigation purposes, the ultimate intention of the user (destination) is not known at the time of production. Thus, when cartographers (source) prepare the map they have to include all information that is potentially required in a number of different usage situations. Since this information must be graphically encoded (signs) on a limited surface, a careful balance is required to avoid a cluttered presentation while still including all necessary information. The information used in the preparation of a map must be acquired from the real spatial environment.
Fig. 1. Cartographic communication model (physical environment → raw data → model → map → user)
Figure 1 shows a communication model that captures the information processing stages when preparing and using paper maps. Information from the physical environment is captured (measured). The resulting raw data is then processed (simplified, unified) into a model that captures the essential spatial information. In the next stage the map is produced as a static graphical representation of the information in the model. Finally, during use at a later time the user has to read and interpret the printed map to extract relevant information and to apply it in his current context. The move from static printed maps to dynamic interactive systems can improve communication by making the mapping process from real world environment to the perceivable presentation dynamic. Such a dynamic “map” can respond at run-time to
changes in the environment and data as well as to user interaction. Figure 2 illustrates how the communication model can be extended to cover these dynamic aspects.
Fig. 2. Extended cartographic communication model for dynamic maps: the pipeline from the physical environment via raw data and model to the map is complemented by direct perception of the real environment (e.g. augmented reality), presentation interaction (web mapping, navigation systems), model interaction (e.g. GIS), data interaction (e.g. real-time monitoring systems) and physical interaction (e.g. tangible user interfaces)
1. The first extension of the model addresses the fact that dynamic maps are in fact multi-media systems. In addition to static 2D visual content, other modalities can be employed: 3D graphics can be used, the display can be animated, and visual output can be complemented by other modalities such as audio output. The fact that the content can be selectively adapted to the user requirements at run-time makes it possible to use richer and more elaborate visual presentations that may be easier to interpret or at least more attractive for users. In printed maps the static presentation has to be optimized for the uncluttered presentation of a maximum of information that may not even be required at the time of use. Because the information in dynamic maps can be limited to what is actually required in a given situation, more bandwidth (e.g. screen area) can be used to display this information. Typical examples of this extension in current navigation systems can be found in the use of animated 3D views or audio output. A central difference between dynamic and static maps is the possibility to interact with the system and influence its presentation and content. Extensions 2-5 address the possibilities enabled by such interaction. 2. The simplest extension is interaction that changes the map presentation. In such a system the production of the model from real-world data can still be a static process. The interaction influences only the generation of the media output (e.g. the map image) from the model. This interaction could be the selection of the content coverage (area), the selection of themes (which information to display), adjusting the scale of the display (zoom) and the presentation style (e.g. choosing different color schemes for night driving). Examples in navigation systems include pan and zoom
interaction, the selection of POI (points of interest) themes or switching between map and satellite image views. 3. More complex is interaction that involves modification or extension of the model itself. These changes can be as simple as the annotation of locations with text or images (e.g. the push-pins in Google Maps) or involve complex changes of data in the model, e.g. to record changes in the environment (as typical functions of GIS systems). Traditionally, changes to the model were the domain of specialized digital mapping companies. However, with developments like openstreetmap.org the ability to effect such changes becomes more widespread. Use in navigation systems at this point in time is still very limited, though some new navigation systems include functionality for basic extensions of the model through the definition of personal POIs or the annotation of changes in the road network. 4. Even more involved are changes to the acquisition of raw data at run-time, e.g. by configuring sensor settings. A central challenge for systems that aim to pursue such an approach is that the processing of the raw data into a model must be completely automated. In general the acquisition of raw data can be achieved through arbitrary sensor systems. A typical example of such an interaction is setting the controls of a weather radar or an air/space-borne sensor. In navigation systems the incorporation of real-time raw data has seen only limited use so far. Navigation systems that incorporate current traffic data from a variety of sensors are an example where a limited real-time data set is integrated. 5. In addition to interaction with stages of the data processing pipeline, users can interact directly with the real environment. In a real-time system the resulting changes can impact the whole processing/display pipeline. Tangible user interfaces are an example of such an approach, where the manipulation of physical objects is used to control a software system [7]. Direct interaction with a real environment is used only in a very limited sense in current navigation systems. The change of the user's location (as tracked by a GPS receiver) constitutes an interaction in the real environment and is typically used to adapt the presentation content in most navigation systems. The sensing of other context parameters could potentially be useful for automatic adaptation of the presentation content and style to the user's current context. A common characteristic of all these extensions is that the ultimate presentation is still completely determined by the model (which may incorporate real-time sensor data). However, dynamic maps can also incorporate the real environment directly: 6. Direct integration of the real environment means that the presentation generated from the model is integrated in a coherent way with the perception of the real environment. Typical approaches are mixed-reality and augmented-reality systems [1], where computer graphics objects are integrated into the view of the user. Several experimental navigation systems employ augmented-reality views, where the guidance information is realized as a graphical overlay on the environment, alleviating the perceptual effort of a context switch between the use of the navigation device and the actual navigation task, e.g. driving a car. As can be seen from the examples, the use of features from extensions (1) and (2) is quite common in current navigation devices, while the possibilities enabled by extensions (3)-(6) are just beginning to be explored.
Furthermore, a wide variety of
control mechanisms are available to effect the interactions, opening up a further design dimension that is orthogonal to the model. The extended model assists with the design of future interactive navigation systems by indicating possible modifications and additions. The following sections describe how an exploratory study of this design space was conducted for several non-standard navigation applications.
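One possible way to express the extended model in code is sketched below; it is purely illustrative and not part of the model itself, and all type names are assumptions chosen for readability. Each stage corresponds to a box in Figure 2, and the comments indicate which of the extensions (1)-(6) hooks in at that point.

```typescript
// Illustrative typing of the dynamic-map pipeline and its interaction points.
interface PhysicalEnvironment { description?: string } // the real world; extension (5) acts here directly
interface RawData { samples: unknown[] }                // sensor output; extension (4) reconfigures acquisition
interface SpatialModel { features: unknown[] }          // unified model; extension (3) annotates or edits it
interface Presentation { render(): void }               // multimodal output; extension (1) widens the media used

// Extension (2): presentation preferences the user can change at run-time.
interface PresentationPreferences {
  area?: string;
  themes?: string[];     // e.g. POI categories
  zoom?: number;
  nightColors?: boolean;
}

interface DynamicMapPipeline {
  acquire(env: PhysicalEnvironment): RawData;   // measurement
  process(raw: RawData): SpatialModel;          // simplification, unification
  present(model: SpatialModel, prefs: PresentationPreferences): Presentation;
  // Extension (6): compose the generated content with the user's direct view
  // of the environment (mixed/augmented reality) instead of replacing it.
  overlay?(model: SpatialModel, liveView: unknown): Presentation;
}
```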
4 Agile Design Exploration Process
The initial stage of the design exploration was conducted as a user study of current navigation systems and approaches to navigation, specifically car navigation, inner-city pedestrian navigation and off-road driving. The findings showed that users are still easily frustrated with the available user interfaces even when using current best-in-class devices. We also found that users rate many innovations in current PNDs like "intelligent routes" or "3D views" lower after practical experience than in pre-use interviews. One problem that became obvious in the initial user study is that while end-users can provide valuable information and are essential in the evaluation of existing approaches, they are typically not able to contribute to the generation of novel design ideas, especially not for concepts that differ significantly from established standards. In the second stage we therefore focused our effort on the generation of testable prototypes for novel navigation systems. Several promising designs were refined into system probes that are suitable for evaluation with real users, providing practical feedback on usability and user preferences with novel user interface paradigms such as the use of dynamic maps and augmented reality output. The key principle behind the agile exploration process is to iteratively develop refinements of a system in rapid succession, as advocated by agile software engineering practices like scrum [6]. These prototypes are then used to evaluate the system with users. The results guide the refinement in the next iteration. In general, development starts with a rough approximation and then proceeds towards components that are increasingly refined. Scrum is a popular agile process in which development activities are organized into short 30-day iterations, called sprints. Each sprint starts with a planning meeting in which the functionality to be developed is selected from the product backlog, a flexible requirements repository that evolves with the product. In the beginning it contains only high-level requirements, and its content gets more and more precise with each sprint. The scrum team and its manager – the scrum master – meet in short daily meetings, called the daily scrum, to report progress, impediments and further proceedings. Every sprint ends with a sprint review, where the current product increment is demonstrated to project stakeholders. The flexibility of scrum allows the integration of user-centred design activities and the handling of technology constraints, and it is therefore well suited for our purposes. Figure 3 illustrates the agile exploration process. The key extension to the software-focused scrum process is the application of the same organization principle to user-centred design activities [3] and the inclusion of an extended design exploration phase prior to actual development. The initial design exploration is conducted in the iteration on the left of Figure 3. Once a promising design has been identified, the implementation proceeds in a scrum-based software development process, shown in
Fig. 3. Extended scrum process with added exploration phase (left)
the iteration on the right of figure 3. The initial exploration phase is organized according to scrum principles, but focuses on the exploration of the available design space from the functional, interface and hardware perspective. Central to this exploration is the generation of potential design alternatives, which can be guided by the dynamic maps communication model. A central difference between the exploration phase and usual scrum activities is the use of non-code based design representations like paper prototypes and mock-ups that are quick and cheap to generate and can be discarded without a high cost penalty. A detailed description of the extended scrum process to incorporate design exploration and user-centred activities during the implementation phase is published in [4].
5 Prototypes
One specific navigation problem is the navigation of pedestrians in inner-city environments. Much recent research in pedestrian guidance has addressed the communication of routes and the associated wayfinding instructions. However, most currently available commercial systems that claim to support pedestrian navigation are simply adaptations of existing car navigation systems. Research has shown that the requirements of pedestrians are very distinct from those of users of a car navigation system. The integration of landmarks into routing instructions is essential for effective pedestrian navigation [5], and a number of experiments have already been conducted to examine various forms of information presentation. In several prototypes and a number of user tests we have examined landmark and route visualization styles [2], ranging from classical maps through route descriptions to stories that are designed to give a memorable account of a path. One focus of the PedNav project is the study of different visual representations for pedestrian navigation. In particular we have developed prototypes that use concrete representations of landmarks (photographic images of landmark buildings at decision points along the way) and compared them to
abstract visualizations (using non-photorealistic rendering of 3D geometry models of landmarks embedded in a 3D city model) and the use of augmented reality visualizations (that incorporate visual path indications directly into the field of view of the user). Figure 4 shows examples of the different presentation styles.
Fig. 4. Exemplary dynamic map presentation styles for pedestrian navigation (augmented map, 3D world viewer, augmented reality view)
Depending on the actual context of use, the presentations have quite distinct properties: the use of photorealistic images works well if the actual view is similar. Changes in lighting (e.g. use of the system by night) or a different season can severely affect the ability to recognize landmarks. Abstracted non-photorealistic models are not subject to this variability, but require a distinct geometry to be useful. No clear benefit of a 3D perspective view could be identified so far. Augmented reality views completely remove the need for landmark recognition but are subject to reservations due to the need to permanently interact with the display (either on a portable device, e.g. a PDA, or a head-mounted display) and the limitations of available positioning technologies. A clear benefit of dynamic map based systems is that they can support various presentation styles and can adapt to changes in context or user preferences. A second use case that we studied, with a focus on the interaction aspect, is off-road navigation. In some sense off-road navigation is similar to pedestrian navigation: movement is not bound to a well-defined road network, and distinct street names or well-defined decision points may be lacking. The central goal in off-road navigation is to reach a specific destination. The routing process is subject to obstacles that may (e.g. geographic obstacles) or may not be captured in an existing database (e.g. flooding). To address this problem the user must be able to specify additional constraints in the interaction with the system (e.g. to route around obstacles), and the display of guidance information must take the specifics of the environment into account (e.g. turn indications on tracks, course information in open environments). A key requirement that we addressed in our prototype is the need for different interaction mechanisms in different usage situations. A very precise interaction concept can be challenging while driving off-road, but would be suitable while standing and results in faster interaction when applicable. If the context changes
Fig. 5. Navigation test on-road (a) and off-road (b) with adapted visualization and interaction techniques
frequently, a context-adaptive interface offers potential benefits in comparison to non-adaptive systems. Our prototype off-road navigation system (Figure 5) therefore supports different ways of entering data by switching input devices and also adapts the information display depending on context parameters. It was used in experiments to determine whether users were able to follow the adaptation. Our observations show a strong contrast in acceptance between adaptive information display and adaptive interaction mechanisms: users did not like to change interaction mechanisms, especially if this was enforced by the system. While users liked to have multiple mechanisms and devices to interact with the system, they were frequently irritated if they had not been notified by the system about the adaptation. The automatic adaptation of the information display (e.g. on-road turn indication vs. off-road direction indication), on the other hand, was well accepted and easily understood by all test users.
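A minimal sketch of the adaptation logic just described is given below. It is not the code of the prototype; the thresholds, mode names and notification text are assumptions made for illustration. It encodes the lesson from the tests: adapt the display automatically, but only offer (and announce) a change of input mechanism rather than enforcing it.

```typescript
// Context-driven adaptation: the display mode follows the environment, while
// input mechanisms are only suggested, with an explicit notification.
interface DrivingContext {
  speedKmh: number;
  onRoadNetwork: boolean;
}

type DisplayMode = "turn-by-turn" | "course-heading";
type InputMode = "coarse-controller" | "precise-touch";

// Automatic display adaptation was well accepted in the tests.
function adaptDisplay(ctx: DrivingContext): DisplayMode {
  return ctx.onRoadNetwork ? "turn-by-turn" : "course-heading";
}

// Input adaptation is never forced: precise input is offered when the vehicle
// is (nearly) standing, but the coarse mechanism stays available.
function availableInputs(ctx: DrivingContext, notify: (msg: string) => void): InputMode[] {
  if (ctx.speedKmh < 1) {
    notify("Precise touch input is now available."); // announce the adaptation
    return ["precise-touch", "coarse-controller"];
  }
  return ["coarse-controller"];
}
```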
6 Conclusions In summary, a large opportunity exists to improve the usability of future navigation systems by improving input, output and functionality and adapting them better to users and the task at hand. A user-centred process is required to develop innovations in these domains that are of actual benefit to the user. In the work reported here we have applied an agile design process to explore the design space for navigation systems without limiting the process to established standard hardware platforms. Selected system probes were developed and tested to validate key assumptions and inform future design decisions. The results suggest interesting opportunities for the development of future navigation systems.
Acknowledgments We would like to thank our colleagues Birgit Elias, Claus Brenner, Sascha Tönnies, Stefan Radomski, Marcel Chaouali (Leibniz Universität Hannover), Julian Masuhr (D-LABS GmbH, Potsdam), Markus Düchting, Florian Klompmaker (C-LAB, Paderborn) and Christian Reimann (Siemens SiS, Paderborn) for their contributions in developing and evaluating the various system probes.
References 1. Azuma, R.: A Survey of Augmented Reality. Presence: Teleoperators and Virtual Environments 6(4) (1997) 2. Elias, B., Paelke, V.: User-centered Design of Landmark Visualization. In: Meng, Z., Winter (eds.) Map-based Mobile Services. Lecture Notes in Geoinformation and Cartography. Springer, Heidelberg (2008) 3. Mayhew, D.J.: The Usability Engineering Lifecycle. Morgan Kaufmann, San Francisco (1999) 4. Paelke, V., Nebe, K.: Integrating agile methods for mixed reality design space exploration. In: Proc. 7th ACM Conference on Designing interactive Systems, DIS 2008, Cape Town, South Africa (2008) 5. Raubal, M., Winter, S.: Enriching wayfinding instructions with local landmarks. In: Egenhofer, M.J., Mark, D.M. (eds.) GIScience 2002. LNCS, vol. 2478, p. 243. Springer, Heidelberg (2002) 6. Schwaber, K., Beedle, M.: Agile Software Development with Scrum. Prentice Hall, Upper Saddle River (2002) 7. Ullmer, B., Ishii, H.: Emerging Frameworks for Tangible User Interfaces. In: Carroll, J.M. (ed.) HCI in the New Millenium. Addison-Wesley, Reading (2001)
Flight Searching – A Comparison of Two User-Interface Design Strategies
Antti Pirhonen1 and Niko Kotilainen2
1 Department of Computer Science and Information Systems
2 Department of Mathematical Information Technology
P.O. Box 35, FI-40014 University of Jyväskylä, Finland
{pianta,npkotila}@jyu.fi
Abstract. The most usable user-interface is not necessarily the most popular. For example, the extent to which an interaction is based on graphics can depend highly on convention rather than usability. This study compares contemporary flight search applications in order to investigate whether a more extensive use of graphics can enhance usability. Two user-interfaces are compared: one follows the ideal principles of graphical user-interfaces and direct manipulation, while the second interface requires text to be entered with a keyboard. The results of the comparison indicate that even an early prototype of the graphics based alternative performed better than the typical formula based search application for several measurements of usability. Keywords: Flight search, direct manipulation, graphical user interface.
1 Introduction
Why did the graphical user-interface (GUI) become the standard in personal computing after an era of command line interfaces (CLI)? This is a contentious question which has no simple answer. When the Microsoft Windows GUI was introduced, its superiority over CLIs was not at all clear [2, 5]. Some 20 years later, it can be argued that the first sceptical comments were caused by the relatively slow and clumsy graphics of the computers of that time. However, the critique against GUIs was not based on this kind of simple technical argument – everyone knew that the graphics would inevitably become more fluent over time. Therefore the target of critical statements was not the first clumsy implementations of the GUI but the underlying logic and principles, and recently also trust [8]. Much of this critique can be understood in terms of "user profiles". Before the advent of the GUI for personal computers, a typical computer user was a technically oriented computing specialist. The CLIs and their cryptic commands had been developed to satisfy the needs of expert users, for whom the essence of computing was effective code and underlying logic. The only value of the user interface was its ability to reflect the underlying computation.
The introduction of the GUI coincided with a major revolution in computing, which was not technical by nature. When the personal computer became mass-marketed, there was a clear need to develop the computer with ordinary people and their practical needs in mind. From this point on, a typical user was not considered interested in the effectiveness of the code or other technical issues, as long as a given task could be performed effectively. Meanwhile, the development of microprocessors has provided so much computational capacity that the optimisation of code and other traditional virtues of computing no longer hold the same value. A computer can even be programmed with the help of graphical tools, which may not produce the neat and compact code that a human programmer could produce, but code that is sufficient for a contemporary processor. At the start of the GUI revolution, applications were bursting with elaborate graphics. It is easy to argue that these graphical fireworks were designed to impress prospective computer buyers, and that most of the new features did not have much to do with the actual tasks to be performed within the application. A famous rational argument for GUIs was based on the notion of direct manipulation [7]. The idea was that with a GUI, the user has direct access to the entities to be worked with, without the need to remember complex syntaxes. In this study, we first briefly analyse the current usage of user-interfaces (UIs), and the arguments for and against the various types. We then examine the differences in the approaches by comparing a formula-based and a graphics-based UI for a flight search application. The pros and cons of each strategy are analysed, based on a usability evaluation.
2 Differences in UIs in Terms of the Usage of Graphics and Input Devices
The use of graphics divides user interfaces into different genres. In addition to the use of graphics as part of a visual display unit (VDU), the use of a graphical input device also creates a distinction between the main categories of user interfaces. Interaction in a CLI relies on text input, and therefore the dominating input device is a keyboard. Input for a GUI largely relies on a mouse or another two-dimensional pointing device. After the heyday of an elaborate use of graphics in GUIs, the novelty effect of the new interaction style has faded. The GUI has established itself as the dominant paradigm for interaction design, but the use of graphics alone is no longer considered the hallmark of an effective program. At the same time, the borderline between GUIs and CLIs has become somewhat ambiguous. A CLI is often implemented within a GUI. For example, many GUIs require text input in certain text fields, making the interaction with the application similar to a CLI. Therefore, it is not necessarily appropriate to make a distinction between the categories anymore. Rather, a CLI and a GUI in their original meaning represent two ends of a continuum. Most user interfaces fall somewhere between these extremes. In addition to the use of output graphics and the input device, a third property is often connected with GUIs: the simulation of real-world objects. These kinds of user interface elements are sometimes referred to as metaphors. Although the use of the
word metaphor in this context is contentious1, it is clear that the imitation of real-world entities is one of the most typical interaction design strategies for creating intuitive mappings within an application. However, the imitation of physical objects with digital technology can sometimes hinder the development of technology. For example, the physical design of first-generation digital cameras closely resembles that of film cameras. Only a few manufacturers had the courage to completely redesign the camera. Indeed, most digital cameras continue to carry the outdated physical restrictions of the cameras of the past with them. This could be due to conservative consumers who want their digital cameras to look like cameras. Presumably, these design inefficiencies will be gradually reduced as new designs gain mainstream acceptance. In computer applications, the same kind of evolution can be observed. For instance, the first graphics-based self-service banking applications imitated the familiar paper forms. After a while, they were replaced with more efficient forms based on a clear step-by-step procedure instead of the completion of one single form. A related debate from the early 1990s was evoked by Bonnie Nardi and the notion of visual formalisms [4]. She strongly opposed 'metaphors' in design, i.e., the imitation of real-world objects such as the famous 'desktop metaphor'. Nardi implies that not everything in a user-interface needs to have a direct counterpart in the physical world. The concept of visual formalism was intended to combine direct manipulation with a user interface designed in terms of the capabilities of the computer rather than the technology of the past. While it is possible to discuss different UI styles and their properties endlessly, it appears that the borderline between the concept of a GUI and a CLI is usually technically defined. A GUI is graphical because the output is graphics. For instance, rather than using letters as the atomic units of words, GUIs draw each letter on the VDU with a large number of pixels. However, in terms of interaction design this kind of technical definition is not necessarily appropriate. Even in GUIs, CLI-like interaction can be implemented. Therefore, more interesting than comparing a GUI and a CLI is to analyse how current practices of using GUIs fulfil the original ideas of direct manipulation. Do UIs provide direct access to underlying entities, or is interaction based on rules and syntaxes which need to be learned?
1 GUIs are sometimes said to be based on metaphors. However, precisely what is meant by metaphors in the context of user interfaces is questionable [6]. In metaphor theories, metaphors are the mental entities on which human conceptualisation processes are built. On the contrary, in the context of UIs the term metaphor denotes the imitation of real-world objects. Since we find the latter use of the term metaphor inaccurate, we reject the use of the term in this study in favour of more accurate expressions. In other words, we do not argue that the difference between a GUI and a CLI is that GUIs are based on metaphors.
3 User-Interface for a Database Query
Databases have inherited much of their terminology from the technology of the past, like files and folders. This nomenclature has made it easy to adopt highly abstract computing concepts. What is new compared to paper files and folders is the ease and
speed of managing the database. For example, making a query to a large database is radically easier than with paper files. Perhaps the superiority of digital databases has made it easy to accept a rather 'techy' interaction with them. For example, in most applications the queries require text input similar to that of a CLI. The user enters text, and confirms this input by pressing a physical push-button (e.g. 'Enter' on a keyboard) or a virtual one. Text-based data entry has become so widely accepted in database queries that it has been implemented in all types of database applications and in a wide variety of contexts. However, in the following sections we describe an application design project which made us consider whether a formula-based data query is necessarily the most usable strategy.
3.1 Searching for a Flight: The Creation of FlightMapper
Air traffic has continued to grow in recent years, partly because of the emergence of budget airlines that have made flying a possibility for many travellers who could not previously afford it. In an effort to cut costs, many airlines have embraced online services. In particular, flight searches and booking are widely accomplished via the Internet. The quality of online services is a major competitive issue among airlines, and this should motivate investment into the development of these services [3]. Flight search facilities and their quality have been cited as the most important feature of online service from the users' point of view [1]. Online flight search applications are now available for most flight operators. In addition, there are a number of services which have access to the databases of several different operators. This has made it possible to easily search and compare different flight operators. All of the flight search applications we found were based on the same kind of user-interface design concept. The user chooses the departure airport, destination and date. This information is typed in the fields of a formula and then the search is launched by clicking a push-button. An intermediate screen shows that the search is being performed. After a while, the search results are presented. The kinds of database query formulas described above are so familiar that we rarely question their appropriateness. However, in the context of a flight search, a much easier and more illustrative process can be implemented with a fairly simple2 web-application.
2 The system consists of a web server replying to users' queries for flights and a JavaScript client running in the users' web browser. Queries are made using the AJAX technique. The Google Maps API was used to implement the map functionality.
The creation of a new kind of flight search application, FlightMapper, arose from very practical needs. It was found that the traditional formula-based queries, which resemble a CLI, might be satisfactory when the exact time, date, and departure location are fixed and known. However, consumers who have a flexible schedule but a small budget for travelling are often happy to choose the destination and the whole itinerary according to the most affordable option rather than be fixed to a specific date and route. In UI-design terms, these consumers present a different use
Fig. 1. Screenshot of search results with FlightMapper
scenario than the ordinary business traveller. In this scenario, the traveller is trying to rapidly figure out the available flight routes from somewhere here to somewhere there. Rather than having a certain airport, city or even country in mind, the user in our scenario scans opportunities to travel to different areas and is content to continue or start the journey by train, bus, or some other vehicle to reach an interesting travel destination. The starting point for the creation of the user-interface of FlightMapper was an ordinary map. This map is the abstraction of geography that best meets the requirements of a nomadic traveller. It illustrates the physical directions and distances, major towns and the borders of countries. When providing the application with information about the preferred place of departure, the map is utilised by simply pointing and clicking on a place on the map with a mouse pointer. The same is done for the preferred destination. FlightMapper then starts searching for flights from roughly the selected departure point (within a given radius) to roughly the selected arrival point. The application then draws the available routes on the map, and prints the names of the airports and operators for each alternative. If no flights exist, no routes are drawn on the map. The user must then try another place of departure and/or destination. This change again utilises familiar GUI routines. The marked point of departure and the point of destination can be dragged and dropped elsewhere on the map. In this way, the user can effortlessly scan a large number of travel plans. Figure 1 illustrates the search results when setting the departure point near Helsinki, Finland and the destination point near Amsterdam, The Netherlands. The application returns several options. The pins indicating the departure and destination points can be dragged to new locations. Once the pins are dropped again, the application immediately returns new results.
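To make the behaviour just described concrete, the sketch below shows the kind of radius-based query that could sit behind the map interaction. It is only an illustration under our own assumptions: the data structures, field names, endpoint and radius are invented for the example, and fetch() merely stands in for the AJAX round trip of the original implementation; the drawing of routes with the Google Maps API is omitted.

```typescript
// Hypothetical server-side search: return all flights that depart within
// `radiusKm` of the clicked departure point and arrive within `radiusKm`
// of the clicked destination point.
interface LatLon { lat: number; lon: number }
interface Airport { code: string; name: string; location: LatLon }
interface Flight { from: string; to: string; operator: string }

// Great-circle (haversine) distance in kilometres.
function distanceKm(a: LatLon, b: LatLon): number {
  const rad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

function findRoutes(
  departure: LatLon,
  destination: LatLon,
  radiusKm: number,
  airports: Map<string, Airport>,
  flights: Flight[],
): Flight[] {
  return flights.filter((f) => {
    const from = airports.get(f.from);
    const to = airports.get(f.to);
    return (
      !!from && !!to &&
      distanceKm(from.location, departure) <= radiusKm &&
      distanceKm(to.location, destination) <= radiusKm
    );
  });
}

// Hypothetical client-side query: every time a pin is dropped, the client
// re-queries the server and redraws the returned routes.
async function queryRoutes(departure: LatLon, destination: LatLon): Promise<Flight[]> {
  const params = new URLSearchParams({
    dlat: String(departure.lat), dlon: String(departure.lon),
    alat: String(destination.lat), alon: String(destination.lon),
  });
  const response = await fetch(`/flights?${params}`); // assumed endpoint
  return (await response.json()) as Flight[];
}
```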
The user-interface of FlightMapper relies on graphics and mouse operations such as pointing, choosing, and drag-and-drop. It can therefore be seen as a representative of GUI and direct manipulation. We decided to contrast it with a typical, formula based flight search application. We hypothesised that for the needs of the use scenario described above, FlightMapper would be faster and generally more appropriate. To verify this assumption, we organised a usability evaluation to compare the two design strategies. 3.2 Usability Evaluation Setting In the usability evaluation, we wanted to compare the overall usability, and performance time in particular, of FlightMapper and a typical flight search application. From the typical, formula based flight search applications we chose www.amadeus.net as the representative of this kind of application. Amadeus was chosen because it is widely used and has all the typical properties and functionalities of flight search applications. Twelve people from the University of Jyväskylä took part. Eight participants were male and four female. Five of them were staff and seven were students. The ages ranged from 21 to 48. The completion of the tasks took about 10-15 minutes per participant, and the participants were rewarded with a movie ticket. The sessions were videotaped so that both the actions of the participant and the screen events could be traced. To enable this, we used an external TFT display facing the camera. Each of the participants had a set of ten simple tasks to be performed with both FlightMapper and Amadeus. Six of the participants started with Amadeus, and six with FlightMapper. The tasks were to find out whether flight routes were available between given areas. The participants had a printed table of 10 pairs of cities. After each search, the participant was supposed to mark ‘yes’ or ‘no’ in the table. Before the first task with each application, each participant was given a very short demonstration of the use of the application. In practice, this meant that the researcher demonstrated how the application works. No hands-on trials were allowed before the first task, because we wished to get data about the learnability of the applications. Concerning usability, our focus was on performance time. To measure the time taken for each task, we did not create any separate log file but relied on the time code of the digital video. After the sessions, we used the video to record the starting and completion time of each task. However, in this kind of task, the definition of the starting time might be ambiguous: is it when the gaze moves to the next task printed on the paper, or perhaps when the first physical movement toward the departure city is made with the mouse? To maintain reliability, we defined the starting time of one task as the completion time of the previous task. In practice, this meant the point in time when the participant wrote the search result on the paper. In the case of the first task, the starting time was defined as the first movement of the mouse. After the completion of all the search tasks, each participant was asked to complete a usability evaluation form. The form was a slightly modified (and translated) version of James Lewis' Post-Study System Usability Questionnaire (PSSUQ). The modifications included the elimination of questions concerning help options, since none were available for FlightMapper. This is because FlightMapper is only a
prototype and lacks many of the functionalities of a final application. The problem of comparing a prototype with a widely used application is discussed later. Another modification was due to the comparative setting: the participants were asked the same questions concerning both applications. 3.3 The Results of the Usability Evaluation Performance Time We performed statistical analysis on the recorded performance times. Figure 2 illustrates the summary of the task completion times.
[Figure 2 chart: mean task completion time in seconds (y-axis) for each of the ten city pairs (x-axis), plotted separately for Amadeus and FlightMapper]
Fig. 2. Mean performance time, with the fastest and slowest removed
As described in the caption of Figure 2, we rejected the fastest and slowest performances. This is because, according to our observations, both exceptionally fast and exceptionally slow times were due to experimental shortcomings. For instance, in some cases when the participant was searching for a flight, the route of the next task could be seen in the same view, resulting in a zero performance time. This occurred when FlightMapper was looking for airports within a certain range and, by chance, two airports related to consecutive tasks were within range of a single search. Figure 2 illustrates the differences in the use of the applications. First of all, it confirms our hypothesis that FlightMapper is more efficient than a traditional form based application. As can be seen from the curves, performance times shorten towards the end of the set with both applications, but with FlightMapper the drop is much clearer. However, the more drastic drop in performance time with FlightMapper cannot be interpreted as a higher level of learnability. Presumably, the participants were familiar with Amadeus, or at least with some similar kind of application. Interacting with FlightMapper, on the contrary, was not that familiar, although it was based on familiar GUI routines. It is therefore natural that the drop in performance times at the beginning of the session is pronounced. In other words, the use of the familiar Amadeus did not require learning to the same extent as FlightMapper did, and therefore the learning curves should not be compared.
The drop in performance times is not steady with either of the applications. This is easy to understand, having followed the sessions: it simply took different amounts of time either to find a place on the map (FlightMapper) or to type the name of the city (Amadeus). The other common feature of these curves is that the second task took more time than the first one, but this is probably due to a difference in time measurement, as reported above: the starting point of the first task was defined differently from the starting points of the other tasks. Also, especially with FlightMapper, it took more time to complete a task when no flights were found; the user waited for a while before noticing that there were no search results. In order to summarise the results of the performance time measurement, we performed a t-test and a Wilcoxon test (because of the small number of participants) on the whole data set. In the t-test for medians, t = -3.519, df = 11, p = .005 (FlightMapper's times were shorter), and for mean performance times, t = -4.339, df = 11, p = .001 (FlightMapper's times were shorter). In the Wilcoxon test for medians, Z = -2.275, p = .023 (FlightMapper's times were shorter), and for mean values, Z = -2.903, p = .004 (FlightMapper's times were shorter). Subjective Evaluation The post-study questionnaire (PSSUQ) provided subjective observations on essential usability issues. The questionnaire measured overall usability, system usefulness, information quality, and user interface quality on a scale from 1 to 7, where 1 indicated the highest level of usability. In the modified and translated version of the questionnaire there were 14 questions concerning each application. The mean values of each factor are presented in Table 1. From the table it can be seen that there were no large differences in the experienced usability. For all factors other than interface quality, FlightMapper was assessed slightly better. The difference in favour of FlightMapper was clearest for usefulness. Table 1. Average usability scores in subjective evaluation
                                       Overall     System      Information  Interface
                                       usability   usefulness  quality      quality
FlightMapper                           2.38        2.13        2.61         2.86
Amadeus                                2.59        2.49        2.70         2.81
Difference in favour of FlightMapper   0.21        0.36        0.09         -0.05
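For reference, the paired comparisons reported in the performance-time analysis above can be reproduced with standard statistical routines. The sketch below uses hypothetical per-participant mean times rather than the actual measurements, and note that SciPy returns the Wilcoxon signed-rank statistic W together with a p-value, not the Z value reported in the text.

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant mean completion times in seconds (n = 12).
amadeus      = np.array([28.1, 31.4, 25.9, 33.0, 27.2, 30.5, 26.8, 29.9, 32.1, 24.7, 28.8, 30.2])
flightmapper = np.array([21.3, 24.0, 19.8, 26.5, 20.9, 23.1, 22.4, 21.7, 25.2, 18.9, 22.8, 23.5])

t, p_t = stats.ttest_rel(flightmapper, amadeus)   # paired t-test, df = n - 1 = 11
w, p_w = stats.wilcoxon(flightmapper, amadeus)    # Wilcoxon signed-rank test
print(f"t({len(amadeus) - 1}) = {t:.3f}, p = {p_t:.3f}")
print(f"Wilcoxon W = {w:.1f}, p = {p_w:.3f}")
```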
4 Conclusions and Discussion We started this report by contrasting CLIs and GUIs. We then compared two user-interfaces, of which one had inherited essential properties from CLIs, while the other was clearly a GUI, with its extensive use of graphics, the mouse as the dominant input device, and the application of the principles of direct manipulation. What did we learn? From a single case study, obviously, no universal conclusions can be drawn. However, we argue that this case study illustrates issues which are worth consideration in several contexts. On the basis of the results of this study, which one is better, a formula based (analogous to a CLI) or a graphics based (analogous to a GUI) user-interface? Let us have a look at the quantitative results of this study: FlightMapper, the representative of the GUI style, was significantly faster, it was found more useful, slightly more usable and slightly better in terms of information quality. In terms of user-interface quality, the scores were practically equal. Are these results a clear enough basis for arguing that FlightMapper is a better tool for finding a flight, not to speak of the comparison of text based and graphics based user-interfaces? In order to interpret the results, we need to remember the setting. First, the form based application (Amadeus) is widely used and has probably had a respectable evolution with numerous re-design iterations, whereas FlightMapper is still an early prototype. Taking this David vs. Goliath setting into account, the results were relatively positive for FlightMapper. Second, FlightMapper was based on the use scenario of a flexible traveller. The tasks in the evaluation can be argued to be a direct reflection of that scenario. Undoubtedly, with tasks that did not reflect the FlightMapper use scenario, the results would have been very different. Third, being a prototype, FlightMapper lacks many functionalities that are common in database queries. Therefore the role of this evaluation should be seen only as a test of which application best enables a fast scan of available routes. In the further development of FlightMapper, more functionality will emerge, and a broader evaluation will be carried out. This evaluation also showed that there are database query tasks in which a ‘traditional’ GUI is faster and generally more usable than a text input based formula. It is quite understandable that, as more and more public services go online, there are things to which people simply become accustomed through regular use. These become de facto standards. However, from the point of view of usability, the result is not necessarily ideal. The comparison of the two UIs in this study does not follow the traditional division between the pro users' CLI and the consumers' GUI. The graphics version was faster for all of the participants, whether technically oriented or not. This shows that different interaction designs should be considered for all user groups. The purpose of this study was not to prove one user-interface design strategy superior to another. Rather, we are illustrating that in terms of usability, current GUI design conventions do not always propose the best design practice but can still rely on the interaction paradigm of CLIs.
References 1. Benckendorff, P.: An exploratory analysis of traveler preferences for airline website content. Information Technology & Tourism 8, 149–159 (2006) 2. Hazari, S.I., Reaves, R.R.: Student preferences toward microcomputer user interfaces. Computers & Education 22(3), 225–229 (1994) 3. Long, F., Poskitt, H.: Aerlingus.com – A Usability Case Study. In: Proceedings of the Irish Ergonomics Society Annual Conference, pp. 42–47 (2003) 4. Nardi, B.A., Zarmer, C.L.: Beyond models and metaphors: Visual formalisms in user interface design. Journal of Visual Languages and Computing 4, 5–33 (1993) 5. Petre, M., Green, T.R.G.: Is graphical notation really superior to text, or just different? Some claims by logic designers about graphics in notation. In: Proceedings of the Fifth Conference on Cognitive Ergonomics, Urbino, Italy, September 3-6 (1990) 6. Pirhonen, A.: To simulate or to stimulate? In search of the power of metaphor in design. In: Pirhonen, A., Isomäki, H., Roast, C., Saariluoma, P. (eds.) Future Interaction Design, pp. 105–123. Springer, London (2005) 7. Shneiderman, B.: Designing the user interface: Strategies for effective human-computer interaction. Addison Wesley Longman, Reading (1998) 8. Takayama, L., Kandogan, E.: Trust as an underlying factor of system administrator interface choice. In: Extended abstracts of CHI 2006, pp. 1391–1396. ACM Press, New York (2006)
Agent-Based Driver Abnormality Estimation Tony Poitschke, Florian Laquai, and Gerhard Rigoll Technische Universität München Institute for Human-Machine Communication Theresienstrasse 90, 80333 Munich, Germany {poitschke,laquai,rigoll}@tum.de
Abstract. To enhance current driver assistance and information systems with regard to their capability to recognize an individual driver's needs, we conceive a system based on fuzzy logic and a multi-agent framework. We investigate how useful information about the driver can be gained from typical vehicle data and apply this knowledge in our system. In a pre-stage, the system learns the driver's regular steering manner with the help of fuzzy inference models. By comparing his regular and current manner, the system recognizes whether the driver is possibly impaired and getting into a risky situation. Furthermore, the steering behavior and the traffic situation are continuously observed for similar patterns. According to the obtained information, the system tries to adapt its assistance functionality to the driver's needs.
1 Introduction and Motivation Nowadays, customers of automobiles are often confronted with new systems which are meant to provide more comfort and safety. However, such complex assistance and information systems also play a significant role in car accidents. To ensure higher safety for all traffic participants, modern systems have to involve the human state and behavior as essential factors. Driving is a complex interaction process between driver, vehicle and environment. However, engineers have not yet included the human factor in the design of such systems; e.g., current assistance systems such as the anti-lock braking system (ABS) only consider vehicle information. Current systems analyze the environment to warn the driver about risky situations, e.g., lane departure warning. This system tracks the road markings and calculates the course of the vehicle using wheel angle and velocity. Further, the system only considers the usage of the turn signal to distinguish an intended lane change from a driving mistake. In reality, the driver often oscillates within the lane or sometimes carries out a lane change without setting a turn signal. As a result, the driver suffers from unnecessary interventions and warnings from the system. Further, every user receives the same level of support regardless of his regular driving characteristics, experience, skills or age. The desired modality or grade of assistance varies according to the driving behavior. For optimal assistance, it is desirable to create systems with high consideration of driver state and behavior.
2 Driver State Parameters The general driver state can be regarded as an information tuple comprising all driver-relevant information about the person who is currently engaged in a driving task. The consideration of the driver's state includes physical as well as psychological aspects. According to [9], all information parameters affecting the driver's actual condition can be distinguished with regard to how quickly they can change: non- or long-term variable (driving experience and skills, personality, etc.), mid-term variable (fatigue, circadian rhythm, individual driving strategy, restriction due to current health, influence of alcohol and drugs, etc.), and short-term variable (emotion, vigilance, intention, situation awareness, etc.). Distraction denotes the disturbance of the ability to maintain focus on the essential object due to lack of concentration, lack of interest or attraction by another object. Sources of distraction can be both internal and external. Physical urges, one's own thoughts or even emotions are internal factors that can cause distraction. In contrast, possible external influences are physical stimuli for any of the human senses. If the driver's focus is attracted by some visual stimulus, e.g., a blinking light within his field of view, or if a dominant stimulus prevents the perception of the essential stimulus in the effective field of view, then it can be assumed that the driver is distracted. Here, attention should be paid to the differentiation between externally directed attentiveness and self-directed attentiveness. The reassignment of resources from the primary and secondary driving tasks to diverse tertiary tasks is not distraction but rather turning away: the decision as to which resources are assigned to which tasks at what time is made deliberately by the driver himself. 2.1 Acquisition of Driver State Factors Determining driver state factors for proper safety-relevant assistance presents a problematic challenge. Due to the mental nature of most factors, direct access for measurement is in principle not possible. Thus, most studies and approaches use specific manifestations as metrics for inferring the current level of the respective parameter. Therefore, it is necessary to have indicators which can be determined continuously during a drive. In this work, we investigate the representativeness of various typical vehicle data as a basis for inferring individual behaviors or the attentiveness of drivers, and we aim to reduce the mentioned handicaps of current assistance systems with the help of that information. Hence, only the relevant metric types and their relevance for a reliable acquisition of mid- and short-term variable driver state factors are presented in the following. 2.1.1 Longitudinal Control Parameters Longitudinal control parameters represent all variables relating to longitudinal vehicle steering such as acceleration, velocity, headway distance, brake, throttle position, etc. Their relevance for a driver's state was investigated in numerous studies. A very common assumption in most studies is that the driver tries to reduce the main task load, i.e. the stress from driving, while he is engaged in other tasks at the same time. This was primarily noticeable in the driver's speed regulation. In general, the test persons in [16] slowed down while performing auxiliary tasks. The essential indicator
was the throttle position. About 80% of the subjects showed altered behavior in fine throttle corrections during secondary activities. While focused on multiple tasks, drivers could not maintain all activities continuously and tended to temporarily pause their speed adjustments. Furthermore, [7,16] stated that longitudinal control measures are more appropriate indicators of distraction than lateral variables. It should be noted that the tests were performed both on straight roads and in curves. Similar results could previously be seen in [1,4]. Moreover, [5] observed that before such compensatory behavior occurred, there was an increase in headway variation and speed that already hinted at a higher demand on the driver. In [6] it was also noticed that during concurrent mobile phone use and cognitive tasks a driver's stopping behavior is significantly affected. In addition to more intense braking, the stopping distance, i.e. the headway distance to stop lines or intersections, is shorter. The results of the studies regarding the relation between longitudinal steering performance and strain, as a representative of all driver state factors, can be summarized by the following points: (a) additional workload mainly reduces the driver's capability of interacting with the traffic environment, (b) increasing task complexity and cognitive demands induce reduced speed control, (c) typical indicators are increased speed variation, increased distance variation and harder decelerations, and (d) compensatory behavior takes the form of a speed decrease. 2.1.2 Lateral Control Parameters Lateral control parameters are all variables relating to a vehicle's lateral movement such as steering wheel angle, steering frequency, lateral position, lateral deviation, lateral acceleration, etc. [15] showed that the additional strain due to several visual auxiliary tasks caused an increase in the steering wheel reversal rate. The number of steering motions in the higher frequency range increased significantly. According to [10], the percentage of high-frequency steering motions can be interpreted as an objective metric of strain. Based on this knowledge, a steering entropy was introduced to quantify a driver's effort to maintain a lateral safety clearance [11]. A significant alteration in lateral driving behavior as a result of phone usage could be seen in [14]. In addition, according to the statistical analysis in [13], the lateral position standard deviation and the steering wheel angle seem to represent two of the most important variables for driver impairment detection. The following points are essential findings from the surveys: (a) additional workload during driving can induce variations in lateral steering behavior, and (b) steering frequency presents an adequate objective metric for strain.
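As an illustration of one of the lateral metrics discussed above, the following sketch computes a simple steering wheel reversal rate from a sampled steering angle signal. The gap threshold, sampling rate and test signal are arbitrary assumptions; the published methods (e.g. [10, 11]) differ in their details.

```python
import numpy as np

def steering_reversal_rate(angle_deg, sample_hz, gap_deg=2.0):
    """Count direction changes of the steering angle that exceed gap_deg and
    normalise them to reversals per minute (the very first direction change
    is counted too, so the result may be high by at most one event)."""
    reversals = 0
    last_extreme = angle_deg[0]
    direction = 0  # +1 turning right, -1 turning left, 0 unknown
    for a in angle_deg[1:]:
        delta = a - last_extreme
        if direction >= 0 and delta <= -gap_deg:
            reversals += 1
            direction, last_extreme = -1, a
        elif direction <= 0 and delta >= gap_deg:
            reversals += 1
            direction, last_extreme = 1, a
        elif (direction >= 0 and a > last_extreme) or (direction <= 0 and a < last_extreme):
            last_extreme = a          # track the running extreme in this direction
    minutes = len(angle_deg) / sample_hz / 60.0
    return reversals / minutes

# Hypothetical 60 s of a noisy steering signal sampled at 10 Hz.
t = np.linspace(0, 60, 600)
signal = 3.0 * np.sin(0.5 * t) + np.random.normal(0, 0.5, t.size)
print(f"{steering_reversal_rate(signal, sample_hz=10.0):.1f} reversals/min")
```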
3 Concept and Implementation Our system acts as a virtual fellow passenger in a purely advisory capacity. It informs the driver about potential traffic conflicts depending on the driver's current performance and recommends suitable behaviors to handle different situations. Therefore, this system shall be able to dynamically model the driver's steering behavior and detect abnormality with regard to the driving style. The basic idea is the direct usage of regular vehicle information to build up knowledge about the relation between environment and driver. All relevant
information (e.g., present steering behavior and traffic conditions) is logged continuously in the history database. Significant discrepancies between the current and previous steering behavior are determined by a network of agents to estimate safety-relevant impairment of the driver's state. 3.1 Architecture Information Acquisition. The estimation of the current driver condition is based on ordinary vehicle steering data. Changes in a driver's state are noticeable in his manner of steering. What is essential is whether the driver is actually impaired and to what extent this disturbance is currently affecting his driving performance. The availability of an inter-vehicle communication infrastructure is assumed to provide more information about the present traffic state. The structural conditions are given by the driving simulator. Modular Design. The basis of the entire system is the software agent platform Java Agent Development Framework (JADE) [2]. It provides basic agent and behavior structures, communication functions, management tools and a respective agent runtime environment. The information processing is done by various agents. Functional Structure. The entire system is divided into several functional parts, primarily in the form of software agents. The system comprises a total of seven agents, each of them assigned different behaviors. Five agents are responsible for the processing and analysis of the raw data with regard to the following scopes: Drive Behavior: This agent continuously gathers all relevant steering data and models the driver's present steering style, primarily longitudinal steering. Significant unusual variations in the manner of driving are detected and reported to other relevant agents as an indicator of impairment. Driver Type: This agent rates the driver's steering manner and classifies the driver into three different groups according to his headway distance and approaching behavior (careful, normal, and aggressive). All agent units are controlled and supervised by a Chief Executive Agent (CEA). This agent is the last instance of the entire system. All information provided by the processing agents is transmitted to the CEA for the final decision. The last agent is the user interface agent. Its tasks include the timing of warnings as well as their scheduling to an appropriate display area. The warning is implemented using the known metaphor of a traffic light (i.e. a red, orange or green light, depending on the current risk level) in the central information display. A further essential component of the system is the database in which all raw information and results from the agents are stored. 3.2 Drive Behavior Modeling The relation between the traffic situation and the driver's operations is essential knowledge for determining whether or not the driver is currently deviating from his usual driving manner. Generally, humans process information in the form of vague statements, e.g., the distance is too short. A driver decides his further actions according to his knowledge and experience, such as: If the headway vehicle is closing too fast, I should slow down. Such uncertain mental processes of a human driver are represented by fuzzy inference systems (FIS). The implementation is based on the approach introduced in [8]. For the detection of a driver's impairments, especially due to additional tasks
during driving, the agents primarily focus on the longitudinal steering behavior during car-following and lane-keeping. The reason for this is the assumption that a driver tends to counter additional load and would not increase it further by changing lanes. The longitudinal steering model is separated into two parts, each represented by an adaptive FIS. Both rule bases are formed as follows:

R^l: IF x_1 is F_1^l AND ... AND x_n is F_n^l THEN v is G^l,   l = 1, ..., k

where k and n are the number of rules and input variables, respectively. The fuzzy set F_i^l for input variable x_i is given by the following Gaussian membership function:

\mu_{F_i^l}(x_i) = \exp\left( -\frac{(x_i - c_i^l)^2}{2 (\sigma_i^l)^2} \right)

where \sigma_i^l is the standard deviation and c_i^l is the mean. To simplify the optimization process, singleton membership functions are used for the output. The output value v is computed according to the center of singletons method (COS/COGS), where s_{G^l} is the position of the singleton G^l:

v = \frac{\sum_{l=1}^{k} \left( \prod_{i=1}^{n} \mu_{F_i^l}(x_i) \right) \cdot s_{G^l}}{\sum_{l=1}^{k} \prod_{i=1}^{n} \mu_{F_i^l}(x_i)}

Each output membership function is optimized at run time by applying a gradient descent algorithm with a small learning rate \eta:

\Delta s_{G^l} = -\eta \, \frac{\partial E}{\partial s_{G^l}}
E=[vact−v] is the model error, determined by comparing the model output v with the actual value vact. The optimization is paused if E falls below a certain threshold. The following sub-behaviors are modeled by the two fuzzy inference systems: Headway control behavior: The primary aim of steady car-following is to maintain an adequate distance to the headway vehicle. The distance is usually decided by the driver with regard to the current speed and his own understanding of what is adequate. The first FIS reflects this assumption by using the current speed v of the ego vehicle to estimate the steady-state distance de. The agent determines the estimation error by comparing the estimated value with the actual distance d. The error Ed=d−de is interpreted as the deviation from the regular headway control manner. Furthermore, the error ed is used by the agent to calibrate the output membership functions of the FIS. For this purpose, the agent observes the incoming velocity and distance measures. Acceleration behavior: During car-following, a driver generally controls his car depending on his sense of the distance to the vehicle ahead d, his own speed v and the relative speed vrel=vhw−v. The driver tries to adapt to the situation by accelerating or braking. In the case of no external influences, e.g. if there is no vehicle ahead, a person normally drives only according to a self-chosen speed vdes. This is modeled by the second FIS. The estimated acceleration ae is described as a function of d, v, vrel and vdes. The error Ea=a−ae is calculated and used for model calibration. Here, vdes is an average value based on empirical data from previous periods of the run.
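The adaptive FIS described above can be illustrated with a minimal sketch: Gaussian input memberships, singleton outputs, center-of-singletons defuzzification, and gradient-descent calibration of the singleton positions. The class name, parameter values and toy data below are illustrative assumptions, not taken from the actual system.

```python
import numpy as np

class SingletonFIS:
    """k rules over n inputs; Gaussian input memberships, singleton outputs."""

    def __init__(self, centers, sigmas, singletons, eta=0.1):
        self.c = np.asarray(centers, float)     # shape (k, n): Gaussian means
        self.s = np.asarray(sigmas, float)      # shape (k, n): Gaussian std devs
        self.g = np.asarray(singletons, float)  # shape (k,): singleton positions s_Gl
        self.eta = eta                          # small learning rate

    def _firing(self, x):
        # Rule firing strengths: product of the Gaussian membership degrees.
        mu = np.exp(-((x - self.c) ** 2) / (2.0 * self.s ** 2))
        return mu.prod(axis=1)

    def infer(self, x):
        w = self._firing(np.asarray(x, float))
        return float(w @ self.g / w.sum())      # center-of-singletons output

    def calibrate(self, x, v_actual):
        # One gradient step on E = 0.5 * (v_actual - v)^2 w.r.t. the singletons.
        w = self._firing(np.asarray(x, float))
        v = float(w @ self.g / w.sum())
        self.g += self.eta * (v_actual - v) * (w / w.sum())
        return v_actual - v                     # model error, cf. E in the text

# Toy headway model: estimate the steady-state distance d_e from speed v (m/s).
fis = SingletonFIS(centers=[[10.0], [20.0], [30.0]],
                   sigmas=[[5.0], [5.0], [5.0]],
                   singletons=[15.0, 30.0, 45.0])
for speed, observed_distance in [(22.0, 38.0), (18.0, 30.0), (25.0, 44.0)]:
    err = fis.calibrate([speed], observed_distance)
    print(f"v={speed:.0f} m/s  d_e={fis.infer([speed]):.1f} m  error={err:+.1f} m")
```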
Fig. 1. Behavior models; the left part shows the headway control behavior model, and the right part shows the acceleration behavior model
3.3 Drive Abnormality Rating Through the modeling process described above, the software agent gains two essential pieces of information from the driver. By using small learning rates (here, values η≤0.5), the agent is able to determine a driver's mean longitudinal steering behavior over a certain period. By comparing the mean behavior with the actual measures, the system acquires the present deviation. Since the human is not a high-precision sensor and actuator, there will always be some deviation. The variations in the steering actions could be treated as noise for the modeling process. However, exactly these variations characterize a driver's natural steering manner. Basically, the extent of those variations reflects a driver's capability to maintain steady driving. A driver's performance changes due to physical and mental influences. Especially in the case of influences due to additional tasks, the effects become manifest in increased variations of velocity, distance and acceleration. Hence, changes in a driver's regular steering variations also denote changes in his state. The agent models the natural steering deviation by accumulating all estimation errors of a certain period in which the driver is following a car without any disturbance and steady driving is given. From these data, a corresponding frequency distribution is generated for each FIS. For determining the level of performance deviation and impairment, the responsible agent uses an additional FIS which is tuned periodically according to the distribution. By comparing the actual deviation with attributes of the distributions, this FIS rates the current longitudinal steering behavior in the range between no and extreme abnormality. The inputs of the system are the scaled distance estimation error ed=(d−de)/de and the scaled acceleration estimation error ea=(a−ae)/ae. The input spaces are partitioned as shown in Fig. 2. The partition parameters are given by the error frequency distribution and some constraints of the ego vehicle. The value ed,P0 represents the positive border of the driver's regular headway control deviation. It is defined by the interval [0, ed,P0] of the corresponding distribution in which 90% of the smallest positive values are located, provided that the limits 0.05 ≤ ed,P0 ≤ 0.20 are not exceeded; otherwise the respective limit is used. The positive space boundary ed,P2=1.0 corresponds to double the regular headway range. ed,P1 is the midpoint between ed,P0 and ed,P2. In contrast, ed,N0 is the negative border of the driver's regular headway control deviation and is defined in a similar way to ed,P0, but from the negative half of the distribution and with −0.15 ≤ ed,N0 ≤ −0.05 as the limitation. The negative space boundary describes the error at the critical distance at which immediate full braking is necessary to avoid a collision and is defined as follows:

e_{d,N2} = \frac{1}{d_{e,avg}} \left( \frac{v_{avg}^{2}}{2 \cdot a_{max-}} - d_{e,avg} \right)
where vavg and de,avg are the average velocity and the average distance estimate in the regarded period of the run.
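The partition boundaries described above can be derived from the accumulated error distribution. The sketch below is one possible reading of that procedure: the 90% interval is implemented as a simple percentile of the positive (or negative) errors, the critical-distance term follows the reconstructed formula above, and the sample data are hypothetical.

```python
import numpy as np

def headway_error_boundaries(ed_samples, v_avg, d_e_avg, a_max_brake):
    """Derive the e_d partition boundaries from accumulated scaled distance
    errors e_d = (d - d_e)/d_e collected during undisturbed car-following."""
    ed = np.asarray(ed_samples, float)
    pos, neg = ed[ed > 0], ed[ed < 0]

    # 90% of the smallest positive errors, clamped to [0.05, 0.20].
    ed_P0 = float(np.clip(np.percentile(pos, 90), 0.05, 0.20))
    ed_P2 = 1.0                                  # double the regular headway range
    ed_P1 = 0.5 * (ed_P0 + ed_P2)

    # Negative counterpart, clamped to [-0.15, -0.05].
    ed_N0 = float(np.clip(np.percentile(neg, 10), -0.15, -0.05))
    d_crit = v_avg ** 2 / (2.0 * a_max_brake)    # full-braking distance
    ed_N2 = (d_crit - d_e_avg) / d_e_avg         # error at the critical distance
    ed_N1 = 0.5 * (ed_N0 + ed_N2)

    return ed_N2, ed_N1, ed_N0, ed_P0, ed_P1, ed_P2

# Hypothetical error samples and averages for one undisturbed period.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 0.06, 500)
print(headway_error_boundaries(samples, v_avg=25.0, d_e_avg=45.0, a_max_brake=8.0))
```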
Fig. 2. Abnormality rating. The left part of the figure shows the partition of the input spaces and the right part shows the partition of the output space.
The value amax− corresponds to the vehicle's maximum brake power. ed,N1 is the midpoint between ed,N0 and ed,N2. The range of the driver's regular acceleration deviation is defined by ea,N0 and ea,P0. Both values are determined similarly to ed,P0, but without any limitation. The values ea,N1 and ea,P1 correspond to the driver's average maximum braking and acceleration. ea,N2 is the midpoint between ea,N1 and the error value corresponding to the maximum brake power amax−. The value ea,P2 is the midpoint between ea,P1 and the error value corresponding to the maximum acceleration amax+. The fixed partition of the output space can be seen in Fig. 2. 3.4 ChiefExecutiveAgent The ChiefExecutiveAgent (CEA) evaluates the results of all the other agents. According to the linguistic rules, the CEA calculates the risk levels for the longitudinal and lateral steering behavior and displays them in the user interface. The assistance functionality is designed in such a way that, at the end of the entire processing, the output recommendation is based solely on the recommended acceleration from the CEA. All other relevant information has already been taken into account in advance. In critical situations, this acceleration amount is unusually large.
4 System Evaluation The evaluation of the system was carried out in a driving simulator experiment. The simulator is equipped with several freely configurable displays. The simulation software platform is based on the computer game Unreal Tournament 2004 [12]. A total of 20 test persons (TP) participated. The average age was 25 years. About 80% of the TPs rated themselves as at least experienced drivers. All TPs were very interested in technical innovations, and the majority favored relieving the driver by means of assistance systems.
The test course was composed of a freeway section and an urban section. While the freeway section consists mainly of straight road segments with two lanes per direction, the urban section contained straight and crossroad segments with one lane. External vehicles (EV) are positioned on the course and move on ideal trajectories. Their speed was mostly constant and set to a low value to provoke the TPs to overtake. Their speed also varies at only a few periods to induce the desired traffic situations. At the start, the TP is instructed to follow an EV on the freeway. The range-clearance and speed are chosen by the TP. At a certain period, variations of the headway vehicle's driving velocity are abruptly induced to observe the TP's regular performance. This scenario is repeated later while the driver is performing auxiliary tasks. The driver is instructed to enter a navigation destination. Afterwards, the test person is allowed to drive freely. The whole sequence is repeated in the urban section. 4.1 Experimental Results: Abnormality Rating During the experiment, it could be observed that the modeling procedure used can sufficiently approximate the TP's mean longitudinal steering behavior. The abnormality rating system could indicate whether a TP was changing his driving style. Especially high abnormality values of long duration occurred while the driver was performing an auxiliary task. The detected periods of abnormality were largely concordant with the subjective impressions of the TPs. Fig. 3 shows an output sequence. The upper part shows the distance between the ego and headway vehicle.
Fig. 3. Example sequence: Abnormality rating raw outputs
The progress of the acceleration is shown in the middle part. The blue and green graphs represent the actual and estimated mean steering values. The plot at the bottom shows the trend of the abnormality. Fig. 3 shows raw output values which are further processed before they are used for any decisions. The sequence comprises three periods. The first part represents the car-following in which the TP approached and followed an EV with a self-chosen range-clearance. In this period, the agent analyzed the incoming data and learned the regular steering manner. In the second phase, the TP was instructed to input a navigation destination while following the headway vehicle. This period is marked by a red box in the abnormality part. Afterwards, the TP was allowed to drive as desired. A higher and denser abnormality signal is conspicuous while the auxiliary task is being performed. This was caused by an increased variation in the TP's steering manner compared to the previous periods. This phenomenon could be seen in 70% of all valid experimental runs. It seemed that most TPs could not maintain their regular driving manner due to the auxiliary task. The reactions of the system in the last phase occurred because nearly all TPs were driving more aggressively and overtook every EV. This is noticeable in the sawtooth-like progress at the end of the distance plot. The corresponding abnormality results are caused by the abrupt changes of headway distance and are regularly filtered out in the next processing level. Short abnormality signals are also suppressed when an EV enters or leaves the predefined sensory region.
5 Outlook Currently we are working on expanding the presented setup. We plan to integrate physiological parameters in order to further tune the individual agents with additional parameters such as visual attention, mental state, etc. We are also working on integrating the presented framework into real interaction concepts to gain further from the findings of this contribution.
References 1. Alm, H., Nilsson, L.: Changes in driver behaviour as a function of handsfree mobile phones - a simulator study. Accident Analysis and Prevention 26, 441–451 (1994) 2. Bellifemine, F., Caire, G., Trucco, T., Rimassa, G.: JADE programmer’s guide (June 2007) 3. Boer, E., Rakauskas, M., Ward, N., Goodrich, M.: Steering entropy revisited. In: Proceedings of the 3rd International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, pp. 25–32 (2005) 4. Brookhuis, K., Vries, G.D., Waard, D.D.: The effects of mobile telephoning on driving performance. Accident Analysis and Prevention 23, 309–316 (1991) 5. Dragutinovic, N., Twisk, D.: Use of mobile phones while driving - effects on road safety. Technical Report SWOV report R-2005-12, SWOV Institute for Road Safety Research, The Netherlands (2006) 6. Forsman, A., Nilsson, L., Törnos, J., Östlund, J.: Effects of cognitive and visual load in real and simulated driving. Technical Report VTI report 533A, VTI Swedish National Road and Transport Research Institute (2006)
7. Jamson, A.H., Merat, N.: Surrogate in-vehicle information systems and driver behaviour: Effects of visual and cognitive load in simulated rural driving. Transportation Research Part F: Traffic Psychology and Behaviour 8, 79–96 (2005) 8. Kamal, M., Kawabe, T., Murata, J., Mukai, M.: Driver-adaptive assist system for avoiding abnormality in driving. IEEE Transactions on Control Applications, 1247–1252 (2007) 9. Kopf, M.: Was nützt es dem Fahrer, wenn Fahrerinformations- und -assistenzsysteme etwas über ihn wissen? In: Fahrerassistenzsysteme mit maschineller Wahrnehmung, pp. 117–139. Springer, Heidelberg (2005) 10. Macdonald, W., Hoffmann, E.: Review of relationships between steering wheel reversal rate and driving task demand. Human Factors 22, 733–739 (1980) 11. Nakayama, O., Futami, T., Nakamura, T., Boer, E.: Development of a steering entropy method for evaluating driver workload. Society of Automotive Engineers Technical Paper Series: 1999-01-0892 (1999) 12. Poitschke, T., Ablassmeier, M., Reifinger, S., Rigoll, G.: Multifunctional VR-Simulator Platform for the Evaluation of Automotive User Interfaces. In: Proceedings of the 12th International Conference on Human-Computer Interaction, HCI International 2007, Beijing, P.R. China (2007) 13. Santana-Diaz, A., Hernandez-Gress, N., Esteve, D., Jammes, B.: Discriminating sensors for driver's impairment detection. In: 1st Annual International IEEE-EMBS Special Topic Conference on Microtechnologies in Medicine & Biology (2000) 14. Tornros, J., Bolling, A.: Mobile phone use - effects of handheld and hands-free phones on driving performance. Accident Analysis and Prevention 37, 902–909 (2005) 15. Verwey, W., Veltman, J.: Detecting short periods of elevated workload: A comparison of nine workload assessment techniques. Journal of Experimental Psychology: Applied 2, 270–285 (1996) 16. Zylstra, B., Tsimhoni, O., Green, P., Mayer, K.: Driving performance for dialing, radio tuning, and destination entry while driving straight roads. Technical Report UMTRI-2003-35. The University of Michigan Transportation Research Institute, Ann Arbor, MI (2003)
Enhancing the Accessibility of Maps with Personal Frames of Reference Falko Schmid Transregional Collaborative Research Center SFB/TR 8 Spatial Cognition, University of Bremen, P.O. Box 330 440, 28334 Bremen, Germany
[email protected]
Abstract. The visualization of geographic information requires large displays. Even large screens can be insufficient to visualize, e.g., a long route at a scale such that all decisive elements (like streets and turns) and their spatial context can be shown and understood at once. This is critical as the visualization of spatial data is currently migrating to mobile devices with small displays. Knowledge based maps, such as Maps, are a key to the visual compression of geographic information: those parts of the environment which are familiar to a user are compressed, while the unfamiliar parts are displayed in full detail. As a result, Maps consist of elements of two different frames of reference: a personal and a geographic frame of reference. In this paper we argue for the integration of personally meaningful places in Maps. Their role is to clarify the spatial context without enlarging the visual representation, and they serve as an experience-based key to the different scales (the compressed and uncompressed parts of the environment) of Maps.
1 Motivation
The visualization of complex geographic information is resource-intensive as it requires large display areas. In the wayfinding domain, even large screens can be insufficient to visualize a route at a scale such that all decisive elements can be shown and understood at once. Internet based route planners typically choose a scale that displays the complete route at once. This practice entails significant interaction: users have to zoom in and out to understand the details of the course to follow. Besides the inconvenience, [1] recently showed that the understanding of fragmented maps leads to corrupted spatial knowledge; zooming in and out of parts of the route only offers a certain view and results in fragmented mental processing and disturbed compilation. This is increasingly critical as the visualization of spatial data is currently migrating to mobile devices with small displays and limited interaction possibilities. I.e., in order to limit fragmentation and interaction, we have to develop new visualization methods for geographic information on mobile devices. [2] postulates task- and context-dependent maps, since general maps contain too much information. However, not many approaches have been proposed for the wayfinding task. [3] proposed an early turn-by-turn directions approach: they do not depict the whole route, but only the crucial
steps. [4] propose Halo, a method to integrate remote locations in maps of partial views. By means of rings having their center at the remote location they point to, Halo preserves a sense of spatial context. However, Halo cannot adapt the visualization of complex spatial data to a small screen. In [5] the authors propose a fish-eye based map transformation: the area of interest is in the center of the fish-eye and the context is in the surrounding, with the distortion depending on the scale of the surrounding and the curvature of the lens function. The interaction with route information is still problematic, as the environment is constantly transformed and the single views are always integrated in a different environment. In [6] the authors demonstrate a method to visually compress routes by schematizing parts where no activity is required (like long parts on a highway). This effective idea only works on linear information (it shortens or stretches links), but does not integrate spatial context beyond the route.
1.1 Why Maps at All?
Turn-by-turn assistance challenges maps as wayfinding aids: why should one still use a complex representation to extract a rather small amount of information? The strongest argument is the fact that users of GPS based turn-by-turn systems do not learn the environment properly ([7, 8, e.g.]). Studies showed that users of turn-by-turn instructions made more stops than map users and direct-experience participants, made larger direction estimation errors, and drew sketch maps with poorer topological accuracy. These are strong indicators that people do not learn the environment properly and seem not to trust the assistance. We are currently at the edge of a technological evolution and can observe a significant change in how people access geographic information: cars are delivered with built-in navigation devices, and geographic information is accessed via Internet services. So far it is unclear how a possible life-long learning of the environment with rather context-free representations will affect the formation of a mental map. The results available so far suggest poor individual mental representations.
1.2 The Visual Compression of Geographic Information
Independent of the type of spatial assistance we use, supporting the cognitive processing of the required information has to be a priority; this is the key to understanding and learning our world. The ideal spatial representation is one that reduces the cognitive effort to a minimum, but still enables the understanding of all information necessary to solve a task (e.g. wayfinding). I.e., when we cope with small screens, we have to visually compress the information, but at the same time preserve the semantic accessibility. However, due to manifold topological and conceptual constraints, the algorithmic transformation of geographic data is a hard task; we have to preserve the consistency of all constraints between all visual elements. E.g., straightening a curvy road might disturb topological relations of other entities (e.g. wrong placements of buildings afterwards). Furthermore, a transformation does not automatically guarantee visual compression
- this can only be achieved by task-specific maps: only the context-dependent selection of features and minimization of constraints allows an effective reduction of the size of a representation.
1.3 Personalized Maps
One possible solution is personalized maps such as Maps [9]. By analyzing the movements of users (with GPS), a spatial user profile is compiled. This profile consists of the places and paths a user regularly visits [10]. The profile is used to compute routes along personally meaningful places and paths. Maps then compress the familiar parts (FP) of the route and highlight the unfamiliar parts (UP), see Figure 1. The results are visually compressed maps, which are well suited for mobile devices [9]. Depending on the configuration of the FP and UP of the route, Maps can achieve very effective visual compression rates. Due to the encoded individual knowledge, Maps still provide full semantic accessibility. Maps are furthermore a constructive link between turn-by-turn assistance and map-based assistance: spatial learning is supported by relating new places to existing knowledge, which makes future assistance dispensable. At the same time, this does not only provide route knowledge, but also offers the full spatial configuration of the FP and UP of the environment. However, the reduced representation of Maps requires the clarification of the spatial embedding of the route to anchor a map unambiguously within the environment. The key is addressing the intrinsic personal frame of reference of the familiar parts of a route: personally meaningful places (e.g., ”home”, ”work”, ”friend's place”, etc.). These places also serve as a cognitive decompression code for the minimized familiar parts of the environment; they
Fig. 1. Generating a Map: a) depicts the original map annotated with prior knowledge (bold magenta lines), the shortest path from S to D, and the path across previous knowledge. b) shows the corresponding Map: the path across the prior knowledge is schematized and minimized, the unfamiliar parts are visualized in detail. Note the different space requirements of a) and b).
are the key to understanding the varying scales and frames of reference of Maps, and allow Maps to be anchored correctly within the real environment. In the following, we consider the elements of the UP to be part of the geographic frame of reference and the elements of the FP to be part of the personal frame of reference.
2 Place Selection and Visualization
Maps relate a significant part of a route to existing knowledge, but they still need the clarification of the relation between the FP and the UP of the route. A route across familiar environments does not automatically guarantee the recognition of its course and scale: it is extracted, schematized, minimized, and so far does not contain contextual information. Additionally, the user might not have traveled the selected route in the proposed sequence before. [11] showed that people rely on relative distance information when they learn places. They are able to find a place even if the distances between landmarks that are related to a place are altered. I.e., if we preserve the relative distances between places, users are able to decode the course and scale of the familiar part. If we assume places to be anchor points, i.e. individually meaningful landmarks ([12, e.g.]), we can utilize them as self-contained frames of reference. Due to the spatial meaning of a place (a user knows how it is spatially related to the surrounding environment and to other familiar places), a pair of familiar places along a route is sufficient to clarify their mutual spatial relations and those between the FP and UP (they are relative to the familiar places and constrained by their sequence enforced by the route).
2.1 Spatial Disambiguation: Selection of Suitable Place References
We now have a look at the selection of suitable places for a given route. We are interested in places that do not (significantly) increase the size of a Map and are at the same time meaningful. Places can be located on the route (on-route places), or they can be located near the route and connected via paths across the FP. We call these places remote places and their links to the route branch-offs. If we select places located on the route, we do not need to add pointers to remote places, which potentially increase the size of the representation (see c) in Figure 5). Places are meaningful for the specific route when they clarify the embedding of a particular route within the environment, and when they clarify the course of the route across the FP. In the following we describe the algorithm to identify suitable places for a FP. 1. In the first step we segment the FP of a route (see illustration I in Figure 2) into n parts and compile all places located on the route. See illustrations II and III in Figure 2 for details. The selection of n has a great influence on the resulting size of a map: the more segments we create, the more places we have to visualize (see Figure 5). However, as places serve as cognitive decompression codes, we have to identify a reasonable number of segments for a route.
Fig. 2. Route segmenting and place selection: The black lines illustrate a familiar part of the route (red and black) with the entrance and exit points E1 , E2 , see illustration I. II shows only the familiar part with identified places: P1 , P2 , P6 , P7 are located on the route, P3 , P4 , P5 are located near the road in the familiar environment. The off-route places are linked on the route by means of their branch-off points in the street network (dashed gray circles). Illustration III shows the integration of the remote places in the route for the place selection algorithm.
2. For each segment we check if there is a place located on the route. If this is not the case, we check at every branch along the route if there is a branch-off into a familiar environment. (a) For every familiar branch, we follow this path and every further branch-off until we reach the closest remote place. We mark every traversed edge and place as visited to avoid loops and multiple selections of one place from different contexts. Illustration II in Figure 2 shows the selection of remote places for the second segment of the FP. We do not select the same place as a reference for different FPs or for different segments in one FP, as this can entail representational conflicts (see Figure 3). (b) If we identified a place, we insert the branching point as a dedicated place in the FP (see Illustration III in Figure 2).
Fig. 3. Conflict due to the selection of the same remote place: the familiar parts of the route can have individual schematizations and minimizations; the pointers to the same place (see a)) can be conflicting and contradictory afterwards, see b)
Fig. 4. The place selection process: I shows the initial situation with the places P2, P6 at significant locations. II shows the segmentation of the route, and III the result of the selection process.
3. In this step we select places according to their significance for a segment: (a) If there are places located on the route and at a significant location (a decision point), we select it to clarify the required action (see c) in Figure 5 for an example). If there are equal choices we select the place with the highest familiarity measure. If there are still equal choices we select the place which is located most central. If there is only one place on the route we select it, independent from the significance. (b) For the segments with no place at a significant location, if there are n ≥ 1 places located on-route in the segment, we select places according to an even distribution amongst the neighboring segments: we select the first pair of subsequent segments Si , Si+1 and the respective place candidates S S P1Si , ..., PnSi and P1 i+1 , , ..., Pmi+1 (see illustration III in Figure 2). Places at significant locations are treated as fixed points. We treat them just as the entrance and exit points E1 , E2 , which are naturally fixed points (they are the transition between the geographic and the personal frame of reference). To optimize the distribution of places we apply following distance maximization: x1 = max(dist(E1 , PiS1 )) 1
Si xi = max(dist(Pi−1 , PiSk )) i
xn+1 = dist(xn , E2 )
n>i>1 (1)
Under this condition, places are selected when they maximize the distance to the previous and the subsequent place. Figure 4 illustrates the algorithm: illustration I is the initial situation, a FP with the elements E1, E2 and the places P1, ..., P8. P2 and P6 are at significant locations and are considered fixed places. In illustration II we can see the segmentation of the FP into three parts. The algorithm now selects the fixed places P2 and P6 as representatives for the first and the third segment; only the middle segment has a choice of optimizable places. The algorithm maximizes the distance between
P2 and P3, P4, P5, and between P3, P4, P5 and P6. In this case, P4 is selected (see illustration III in Figure 4).
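A compact sketch of the selection step for segments without a fixed place is given below, under the assumption that distances between places are known. Equation (1) is read here as "maximize the distance to both the previous and the subsequent fixed place", implemented as choosing the candidate with the largest minimum of the two distances; the positions and names are hypothetical.

```python
def select_places(segments, dist, entry="E1", exit_="E2"):
    """segments: per segment either a fixed place (string, at a significant
    location) or a list of on-route candidates. dist(a, b) gives the distance
    between two places; the entry/exit points are treated like places.
    Returns one selected place per segment."""
    anchors = [entry] + [s if isinstance(s, str) else None for s in segments] + [exit_]
    selected = []
    for i, seg in enumerate(segments, start=1):
        if isinstance(seg, str):                  # fixed place, keep it
            selected.append(seg)
            continue
        prev_anchor = next(a for a in reversed(anchors[:i]) if a)   # nearest fixed to the left
        next_anchor = next(a for a in anchors[i + 1:] if a)         # nearest fixed to the right
        # Maximize the distance to the previous and the subsequent place:
        # here read as maximizing the smaller of the two distances.
        best = max(seg, key=lambda p: min(dist(prev_anchor, p), dist(p, next_anchor)))
        selected.append(best)
    return selected

# Hypothetical 1-D positions along the familiar part (situation of Figure 4).
pos = {"E1": 0, "P2": 2, "P3": 4, "P4": 6, "P5": 7, "P6": 10, "E2": 12}
dist = lambda a, b: abs(pos[a] - pos[b])
print(select_places(["P2", ["P3", "P4", "P5"], "P6"], dist))  # ['P2', 'P4', 'P6']
```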
3 Visualization
Maps are visual representations of the environment, i.e. we need to visualize the personal frame of reference defined by the selected places. Maps are intended to support the wayfinding process dynamically, i.e. they have to cover the typical requirements of wayfinding assistance during all phases: the communication of survey knowledge and the support during navigation. To provide cognitively adequate support, we require specialized representations reflecting the task with matching visualizations ([13, e.g.]). This does not only hold for principal configurational issues, but also for the incorporated symbols. Entities on maps should either follow a cartographic convention or, where no convention exists, new cartographic symbols have to be created ([14, e.g.]). To the knowledge of the author, there are no available symbols for personally meaningful places and pointers to them. It is beyond the scope of this work to analyze the requirements of this new kind of visual element. We decided to use a straightforward visualization: in our examples and illustrations we will depict places as circles (illustrations) and solid dots (generated maps), and the pointers to them as lines.
Visualization of Places on the Route
The course of the FP of the route is schematized by means of the discrete curve evolution (DCE), see [15]. DCE simplifies the geometry by successively removing geometric control points from the shape information. If we apply the DCE without explicitly considering the places, the coordinates of the places are no longer guaranteed to be located on the course of the route. That is, we have to compute the schematization of the FP differently; the schematization has to consider and preserve the position of places, as the route is described in relation to them. In the following algorithm we sketch the positioning of places (and branches to remote places) on a schematized path (a code sketch follows the list):

1. In the first step we segment the route at the points where the selected places (or the branching points) are located. Illustration I in Figure 6 shows the initial situation. Illustration II depicts the segmentation of the route at the places P1, P2, P3 into self-contained units.
2. In the second step, we schematize each segment by means of the DCE (see [15]). This turns the places into fixed points of the curve, so they are not removed by the DCE. This step is important, as we do not consider any other constraints required by the DCE to declare fixed points.
3. In the third step we compile all segments again into one coherent FP. This can be done straightforwardly, as the positions of the contact points (places) are not altered in any segment (see Illustration III in Figure 6).
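The sketch below illustrates the three steps. The function dce_simplify stands in for the discrete curve evolution of [15] and is only assumed here; the point of the example is that splitting the path at the places keeps their coordinates untouched.

```python
def schematize_with_places(route, places, dce_simplify, keep=5):
    """Split the familiar path at the selected places, simplify each segment
    separately, and re-join the segments so the place coordinates survive.

    route: ordered list of (x, y) control points of the familiar path (FP).
    places: set of points of `route` that must be preserved (selected places
            and branching points).
    dce_simplify: callable simplifying one open polyline down to `keep`
            control points (hypothetical stand-in for the DCE of [15]).
    """
    place_set = set(places)
    # 1. segment the route at the selected places / branching points
    segments, current = [], [route[0]]
    for point in route[1:]:
        current.append(point)
        if point in place_set:
            segments.append(current)
            current = [point]
    segments.append(current)
    # 2. schematize each segment; its endpoints (the places) stay fixed
    simplified = [dce_simplify(segment, keep) for segment in segments]
    # 3. compile all segments into one coherent FP again
    result = list(simplified[0])
    for segment in simplified[1:]:
        result.extend(segment[1:])  # do not duplicate the shared place point
    return result
```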
Fig. 5. Selecting places. a) The map of Figure 1 with the places 1, 2, 3 (bold black dots). Note the different schematization of the FP in b), c), d) due to the integration of places. b) The FP is only one single segment: place 1 is selected, as it is on-route. c) The FP consists of two segments: place 1 (first segment) and place 2 (second segment) are selected; place 2 branches off at a significant location. d) The FP consists of three segments: all places are selected (each is within one of the three segments). Note the different compression rates: b) is the most compact map as it utilizes the on-route place 1. c) requires more space as it points to place 2 (although the FP is compressed with the same ratio as in b)). d) is significantly larger, because place 3 would intersect the unfamiliar part of the map at the bottom if we applied the same minimization as in b) and c). This illustrates the effect of local rendering constraints on map compression (see Section 3.2).
3.2 Visualization of Branch-Off Places
The question now is how we can visualize places which are not located on the route. In this case we need to differentiate between the two basic assistance types: communication of veridical survey knowledge and navigation support. In the following, we differentiate between the two scenarios and show some examples of the respective Maps. Furthermore, we have to propagate the new local visualization constraints to the global map rendering.
Fig. 6. Schematization with places as fixed points: illustration I shows the initial situation, II the segmentation with the places as start and endpoints of the segments, III the result of the schematization and compilation
Reference Frame Visualization for Survey Maps. Survey maps are a means to visualize the embedding of the route within the environment in a geographically veridical manner. I.e., the real geographic relations amongst the elements of the route, and between the route and the surrounding environment, have to be represented according to an allocentric (geographic) frame of reference. Survey maps are intended to communicate overview information for a certain route. However, in Maps, the familiar part of the route is always schematized and minimized (as otherwise no compression could be achieved), but the configuration of all elements is not altered. The schematization of the known paths works as described in Section 3.1: the places (and the branches to remote places) serve as constrained supporting points of the familiar part of the route. The crucial step for the veridical visualization of remote places is the path to them: we depict the path within the familiar environment with the same degree of schematization and minimization as the route, starting at the branching point on the route and ending at the configurable street network depth k, which is the number of expanded vertices from the branching point towards the place (see place 2 in Figure 5).

Reference Frame Visualization for Navigation Maps. Navigation maps are intended to support the wayfinder during the wayfinding process. As discussed in [9], the maps follow the egocentric, bottom-up approach of mobile wayfinding maps: the part of the route which is "in front" of the wayfinder (in terms of travel direction) is at the top of the display, the remaining parts at the bottom. A number of studies showed that people usually encode turning actions as 90 degree angles ([16, 17, e.g.]). The mental representations of turning actions are termed wayfinding choremes (see [17] and Figure 7 for an illustration). Branchings to remote places are, due to the egocentric and direct experience in the real environment, mentally encoded as wayfinding choremes [17]. For this reason we depict the branch to the remote place by means of a choreme. We replace the real angle α with the angle α′ of the respective choreme. However, as the spatial configuration at the particular point can be complex, the choreme holds between the segment of the route before the branch and the branch in travel direction (see Figure 7). This reflects the perception and the expectation of the wayfinder in the FP.
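To make the replacement step concrete, the following minimal sketch snaps a real branching angle to a prototype angle. The prototype set (45-degree steps, negative values meaning left) is an assumption chosen for illustration only; the actual choreme inventory is defined in [17].

```python
# Assumed choreme prototype angles in degrees; negative = branch to the left.
CHOREME_ANGLES = (-135, -90, -45, 0, 45, 90, 135)

def chorematize(alpha_deg):
    """Return alpha' (the prototype angle) for a real branching angle alpha."""
    alpha = (alpha_deg + 180) % 360 - 180               # normalize to [-180, 180)
    return min(CHOREME_ANGLES, key=lambda a: abs(a - alpha))
```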
Fig. 7. The chorematization of places for the navigation perspective: a) depicts the set of wayfinding choremes. b) depicts a turn at a place within the FP (left is the initial configuration); on the right we see the navigation perspective of the intersection. The intersection is rotated in travel direction and the angle α is replaced by the angle α′ of the corresponding wayfinding choreme.
Fig. 8. Communication of local rendering constraints to the global minimization procedure: a) depicts the global minimization distance h. b) illustrates the minimization constraints of the visual elements of the FP; it is not possible to apply the global minimization factor to the FP. In c) we see the global minimization based on the local minimal distance k. See also d) in Figure 5 for an example.
Communicating Local Rendering Constraints for Global Rendering. Maps minimize the familiar part of the route by moving the closest points of the convex hulls of the unfamiliar environment Ui, Ui+1 towards each other; so far the distance-to-keep was determined by a threshold h (see Figure 8). Now, with the integration of places, we have additional visualization constraints: a visual intersection of the used symbols has to be avoided, thus a distance threshold k between all elements has to be preserved. We can resolve the constraints by the following procedure (a code sketch follows the list):

1. In the first step we determine the global minimization factor min(h) for the FP between Ui, Ui+1, such that dist(Ui, Ui+1) = h.
2. In the second step, we determine the closest pair of elements by means of the Euclidean distance (in Figure 8 it is E1, P1).
3. We then compute the minimization factor min(k) for the familiar part, such that dist(E1, P1) = k.
4. If min(k) ≥ min(h), we apply min(h) to the familiar part, min(k) otherwise.
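The sketch below illustrates how the four steps could be resolved in code. It models minimization as a uniform scale factor, which the paper does not prescribe; all names are illustrative assumptions.

```python
from math import dist
from itertools import combinations

def familiar_part_scale(u_i, u_next, fp_elements, h, k):
    """Resolve local vs. global minimization for the familiar part (FP).

    u_i, u_next: closest points of the convex hulls of the two unfamiliar
                 regions, as (x, y) tuples.
    fp_elements: coordinates of all visual elements of the FP (entry/exit
                 points, places, branch points).
    h: global distance-to-keep between the unfamiliar regions.
    k: minimal distance that must remain between any two FP symbols.
    """
    # scale that shrinks the gap between the unfamiliar regions down to h
    scale_h = h / dist(u_i, u_next)
    # the closest pair of FP elements determines the local constraint ...
    e1, e2 = min(combinations(fp_elements, 2), key=lambda pair: dist(*pair))
    # ... and the smallest scale that still keeps them at least k apart
    scale_k = k / dist(e1, e2)
    # step 4: apply the global minimization only if it does not violate the
    # local constraint; otherwise fall back to the (weaker) local minimization
    return scale_h if scale_h >= scale_k else scale_k
```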
4 Conclusions
Maps are personalized wayfinding maps for devices with small displays like mobile phones. By relating a route to familiar parts of the environment, Maps can achieve significant visual compression rates while at the same time preserving individual accessibility. The clarification of the embedding in the environment is based on the integration of a personal frame of reference, the places and paths a user usually visits and travels. However, due to the schematization of the familiar parts of a route, the integration of personally meaningful places requires basic considerations about the selection of places, as well as about their visualization within Maps. The selection process for places is based on three considerations: structural significance, segmentation and distribution, and minimalistic visual appearance. The visualization considers the support of two basic requirements for wayfinding maps: the communication of geographically veridical survey knowledge and navigation support. We introduced the selection algorithm, as well as the visualization primitives for both map use conditions. Additionally, we discussed the requirements to communicate the additional rendering constraints for integrated places and how we can resolve the conflict between local and global minimization attempts.
Acknowledgments This work has been supported by the Transregional Collaborative Research Center SFB/TR 8 Spatial Cognition, which is funded by the Deutsche Forschungsgemeinschaft (DFG).
References [1] Dillemuth, J.: Spatial cognition with small-display maps: Does the sum of the parts equal the whole? In: Association of American Geographers Annual Meeting, Boston (April 2008) [2] Reichenbacher, T.: Mobile Cartography Adaptive Visualization of Geographic Information on Mobile Devices. PhD thesis, University of Munich, Institute of Photogrammetry and Cartography, Munich, Germany (2004) [3] Rist, T., Brandmeier, P.: Customizing graphics for tiny displays of mobile devices. Personal Ubiquitous Computation 6(4), 260–268 (2002) [4] Baudisch, P., Rosenholtz, R.: Halo: a technique for visualizing off-screen objects. In: CHI 2003: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 481–488. ACM, New York (2003) [5] Harrie, L., Sarjakoski, L.T., Lehto, L.: A variable-scale map for small-display cartography. In: Proceedings of the Joint International Symposium on GeoSpatial Theory: Processing and Applications, Ottawa, Canada, July 8-12 (2002) [6] Agrawala, M., Stolte, C.: Rendering effective route maps: improving usability through generalization. In: SIGGRAPH, pp. 241–249 (2001) [7] Parush, A., Ahuvia, S., Erev, I.: Degradation in spatial knowledge acquisition when using automatic navigation systems. In: Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.) COSIT 2007. LNCS, vol. 4736, pp. 238–254. Springer, Heidelberg (2007)
[8] Ishikawa, T., Fujiwara, H., Imai, O., Okabe, A.: Wayfinding with a gps-based mobile navigation system: A comparison with maps and direct experience. Journal of Environmental Psychology 28(1), 74–82 (2008) [9] Schmid, F.: Knowledge based wayfinding maps for small display cartography. Journal of Location Based Systems 2(1), 57–83 (2008) [10] Schmid, F., Richter, K.F.: Extracting places from location data streams. In: UbiGIS 2006 - Second International Workshop on Ubiquitous Geographical Information Services (2006) [11] Waller, D., Loomis, J.M., Golledge, R.G., Beall, A.C.: Place learning in humans: The role of distance and direction information. Spatial Cognition and Computation 2(4), 333–354 (2001) [12] Couclelis, H., Golledge, R.G., Gale, N., Tobler, W.: Exploring the anchor-point hypothesis of spatial cognition. Journal of Environmental Psychology 7(2), 99–122 (1987) [13] Klippel, A., Richter, K.F., Barkowsky, T., Freksa, C.: The cognitive reality of schematic maps. In: Meng, L., Zipf, A., Reichenbacher, T. (eds.) Map-based Mobile Services - Theories, Methods and Implementations, pp. 57–74. Springer, Berlin (2005) [14] MacEachren, A.M.: How maps work: representation, visualization, and design. Guilford Press, New York (1995) [15] Barkowsky, T., Latecki, L.J., Richter, K.F.: Schematizing maps: Simplification of geographic shape by discrete curve evolution. In: Freksa, C., Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial Cognition II - Integrating abstract theories, empirical studies, formal models, and practical applications, pp. 41–53. Springer, Berlin (2000) [16] Tversky, B., Lee, P.U.: Pictorial and verbal tools for conveying routes. In: Freksa, C., Mark, D.M. (eds.) COSIT 1999. LNCS, vol. 1661, pp. 51–64. Springer, Heidelberg (1999) [17] Klippel, A.: Wayfinding choremes. In: Kuhn, W., Worboys, M.F., Timpf, S. (eds.) COSIT 2003. LNCS, vol. 2825, pp. 320–334. Springer, Heidelberg (2003)
Augmented Interaction and Visualization in the Automotive Domain

Roland Spies1, Markus Ablaßmeier1, Heiner Bubb1, and Werner Hamberger2

1 Institute of Ergonomics, Technical University of Munich, Boltzmannstraße 15, 85747 Garching
{spies,bubb}@lfe.mw.tum.de, [email protected]
2 AUDI AG, Development HMI, 85045 Ingolstadt
[email protected]
Abstract. This paper focuses on innovative interaction and visualization strategies for the automotive domain. To keep the increasing amount of information in vehicles easily accessible and also to minimize the mental workload for the driver, sophisticated presentation and interaction techniques are essential. In this contribution a new approach to interaction, the so-called augmented interaction, is presented. The idea is an intelligent combination of innovative visualization and interaction technologies to reduce the mental transfer effort the driver needs between displayed information, control movement and reality. Using contact-analog head-up displays, relevant information can be presented exactly where it is needed. For control, touch technologies deliver a very natural and direct way of interaction. However, to keep the eyes on the road, the driver needs haptic feedback to handle a touchpad blindly. Therefore, the touchpad presented in this contribution is equipped with a haptically adjustable surface. Combining both technologies delivers a genuinely innovative way of in-vehicle interaction: it enables the driver to interact in a very direct way by sensing the corresponding environment on the touchpad.
Keywords: head-up display, touch, haptic feedback, interaction, automotive, augmented reality.
1 Introduction

To keep the increasing amount of information in modern vehicles easily accessible and controllable for the driver, and also to minimize his mental workload, sophisticated presentation and interaction techniques are of major importance. In the car domain, error-prone situations often occur in the human-machine interaction with different in-car applications, as the driver is already under a certain mental workload [1] while combining displayed information, interacting with different input devices and transferring it to reality.
Automobile manufacturers have already introduced a number of innovative visualization and interaction techniques. The head-up display, for example, provides information matching the situation in reality, such as navigation information or the current vehicle speed. This visualization method enables the driver to keep his gaze on the road and reduces accommodation problems. In the recent past a lot of scientific work has been done on presenting information in a contact-analog way in the HUD [2,3,4,5,6]. A further challenge for car manufacturers today is the increasing number of comfort functions in modern vehicles, e.g. navigation, media and communication systems and, soon, internet services. To keep all these systems controllable while driving, car producers have integrated these functions into menu-based central infotainment systems which are mostly controlled by one multifunctional input device. Currently, many different solutions for such control devices are available on the market. These solutions can be divided into two groups: first, an integrating approach represented by touchscreens, and second, an approach separating display and control element, e.g. turning knobs or joysticks in the center console. These systems are often extended by voice control, and many recent research publications deal with multimodal in-vehicle infotainment interaction [7,8,9]. Further research activities deal with innovative ideas for control elements [10,11], e.g. flexible touch-sensitive surfaces for in-vehicle interaction [12,13]. This contribution presents a new approach for intuitive and workload-reduced in-vehicle interaction by combining innovative technologies for visualization and control. In the following chapter, the theoretical background concerning controlling and displaying menu systems in dual-task situations is discussed and the required technical background is given. Afterwards, the new approach of augmented interaction is explained and a couple of demonstrative examples for future realization are given.
2 Background

The following chapter reflects the relevant theoretical background for analyzing the parallel tasks of driving and menu operation. For this purpose, the driver as well as the vehicle are considered as parts of the human-machine control loop. This consideration should uncover the need for action.

2.1 Ergonomic Reflection of the Driving Task

The driver-vehicle interaction can be described as a conventional control loop as shown in Fig. 1 [14]. The left side of the picture shows the task as the input of the loop. The system components are the driver and the vehicle equipped with an infotainment system. The output of the whole system is permanently verified by the driver and adjusted if necessary. Menu control while driving is a dual-task situation which can cause interference between the two parallel tasks. Inattention, distraction and irritation occur, besides the mental treatment of internal problems,
as a consequence of the high workload resulting from a superposition of the tasks, which becomes manifest in an increased error potential and in erroneous operation of the systems [15]. According to Bubb, the driving task can be classified into primary, secondary and tertiary tasks [16]. The primary task consists only of the actually required driving operations. These are segmented into navigation, steering, and stabilization. Choosing the route from departure to destination corresponds to the navigation task. Steering includes, for example, lane changes due to the current traffic situation. User interaction with the car to navigate and steer is called stabilization. These tasks are essential for safe control of the car, and therefore have the highest priority while driving. Secondary tasks are operations that are not essential to keep the vehicle on track. Examples are the turn signal, honking, and turning the headlights up and down. Tasks not concerning the actual driving itself are categorized as tertiary tasks, e.g. convenience tasks like adjusting the temperature of the air conditioning or communication and entertainment features.
Fig. 1. Control loop of driver vehicle interaction
While working on a desktop PC, the user can predominantly execute his or her operations in a concentrated way, as there is no dual-task competition. In the car domain, however, error-prone situations often occur in the human-machine interaction with different in-car applications, as the driver already carries a certain mental workload. This basic stress level is due to the execution of the so-called primary and secondary tasks, and may be increased by environmental impacts, like a conversation with a co-driver. If the driver interacts with, e.g., a communication and infotainment system in such a stress phase (tertiary task), he will probably be distracted from driving. A reason for this lies in human information acquisition. The fact that the primary as well as the tertiary task mainly share the visual channel [17] (Fig. 1) leads to gaze movements away from the road, and consequently traffic blindness occurs. Here it is necessary either to provide additional channels (e.g. haptic) to transfer information or to merge it with the reality to avoid gaze movements away from the driving scene.
Another reason for the interference between driving and tertiary tasks is the mental overload caused by compatibility problems between displayed information, control movements and reality [18]. The following part explains a theory of human information processing and presents ergonomic rules to avoid such incompatibilities.

2.2 Human Information Processing

During system design it is important to identify workload bottlenecks and overload. As the human operator is a central part of a human-machine system, the correction of these workload problems is necessary for safe and efficient operation. The Multiple Resource Theory of Wickens [19] allows predicting when tasks will interfere with each other or can be performed in parallel. If the difficulty of one task increases, a loss of performance in another task will be the result. Wickens describes several processing resources. All tasks can be divided into the following components (see Fig. 2): there are the encoding, central processing and responding stages. The visual and auditory components are input modalities. The central processing component describes the level of information processing required. The responding component consists of manual or vocal actions.
Fig. 2. Wickens’ Multiple Resource Theory (MRT) Model [20]
Wickens postulates that using multiple information channels increases the mental capacity and reduces interference effects [19]. To avoid additional mental effort, a compatible design between information visualization, control element and reality is important [21]. To adjust the user interface according to the task, the task itself has to be analyzed. The ergonomic analyses according to Bubb provide a method to identify the content of the task according to space (dimensionality) and time [22]. Menu control is a two-dimensional task, which means that in principle movements on the surface are possible in two directions (e.g. selecting items on a display). To guarantee
a compatible interaction, ergonomic solutions require a two-dimensional input device for such a task. Turning knobs in some current automotive solutions are one-dimensional, which means that, e.g., moving maps on a display is very inconvenient and requires a mental transfer effort from the user. Concerning the information presentation, ergonomic solutions also require compatibility between information content, the user's mental model and reality. Summing up the theoretical facts mentioned above, it is possible to derive visualization as well as control concepts.

2.3 Conclusion

To meet the requirement of a two-dimensional input device which can be controlled blindly, provides additional information via another channel and can be mounted in the center console for ergonomic reachability, the following concept proposes a touchpad with a haptically adjustable surface. The idea is that every kind of structure which is displayed on a screen can be felt on the touchpad surface for orientation. Elevated elements on the touchpad (e.g. buttons) can be sensed and pressed. This input device enables a direct, intuitive interaction with a menu-based system. For a direct, non-distracting information presentation this concept suggests a contact-analog head-up display. This technology makes it possible to project information into the reality. The technical realization of both concepts is described below.
3 Technical Solutions

The following chapter describes the technologies and current application fields of the preliminary considerations and consequences mentioned above.

3.1 The Contact-Analog Head-Up Display

As described, a very innovative display technology for cars is the head-up display (HUD). The HUD projects information directly into the driver's visual field. HUDs were pioneered for fighter jets in the early 1970s and later for low-flying military helicopter pilots, for whom information overload was a significant issue, and for whom altering their gaze to look at the aircraft's instruments could prove to be a fatal distraction. In the future, HUDs are likely to become more common in vehicles. Recent developments aim to give the driver contact-analog information, and a number of recent research works show its potential. The spectrum ranges from speed and distance information [23] and night vision information [3] to contact-analog navigation information [24]. There exist several technical approaches for contact-analog HUDs. An early solution is delivered by Bubb [4], where a spatial impression is obtained by bending the display in a certain way according to the first optical imaging law, which produces a virtual display lying on the ground. Further developments given by
Schneid and Bergmeier bring this principle closer to automotive capability [2,3,6]. A completely different approach is delivered, e.g., by DENSO [5]. Their suggested realization is based on the effect of stereoscopy via two monocular head-up displays covering an image distance of 2 m. The drawback of such a solution is that either the head has to be in a fixed position or a highly accurate eye-tracking system has to be used, which makes this solution extremely cost-intensive. Moreover, if there is a delay in the system caused by the data-transferring computing system, sickness will be the consequence. Fig. 3 shows an example of contact-analog driver information.
Fig. 3. Example for contact-analog visualization
3.2 The Haptic Touchpad

Touchpads are currently used exclusively in the notebook domain to move a cursor over a display and select items. To make them usable for automotive applications, the idea is to provide an additional haptic structure on the surface for orientation. Some similar approaches with just a few mechanical, movable elements or simulated vibration feedback have already been published [11,12,13]. Another possibility to realize such an adjustable haptic touchpad surface is to use the so-called Braille technology. Fig. 4 shows a few examples of haptic displays realized in the Braille sector for blind people, using piezo-activated pins.
Fig. 4. Examples for Braille displays [25,26]
4 Augmented Interaction – A New Approach for In-Vehicle Interaction

This chapter explains a new approach for in-vehicle interaction and presents demonstrative use cases for the combination of the presented display and control techniques.

4.1 Definition of Augmented Interaction

A new way of interaction in the automotive domain can be reached by combining both innovative technologies introduced in chapter 3. The structured surface of the touchpad (see section 3.2) enables a direct mapping of the displayed information to the virtual objects represented by the contact-analog HUD (see section 3.1). The driver interacts with the touchpad by sensing the corresponding environment, and activates and manipulates functions directly by pressing and moving on the sensed elevated objects. As a consequence, the mental workload can be reduced through the simple and direct cognitive mapping (see chapter 2). Real and virtual objects are fused together. This kind of interaction will be called augmented interaction.

4.2 Illustrative Use Cases for Augmented Interaction

In the following, two examples of direct augmented interaction are given to illustrate the potential benefit of the suggested concept.

4.2.1 POIs along the Route
A typical use of navigation systems is that the driver wants to get further information about a point-of-interest (POI) in his direct surrounding area while driving. With state-of-the-art interfaces, the driver notices an interesting building outside his vehicle, then searches for the corresponding POI on the digital map on the central display inside the car, and finally selects and activates the information with the central control element of the infotainment system. The concept of augmented interaction in this contribution enables the driver to feel his surrounding environment, including relevant points-of-interest, on the haptic touchpad surface. The driver places his finger on the objects of interest and then gets the real object highlighted contact-analogously by the HUD. To avoid an information overflow, only the currently touched objects are highlighted. After the relevant object is selected, the driver can directly activate further information by pressing the sensed elevated element on the touchpad (Fig. 5).

4.2.2 Adaptive Cruise Control
The second example described in this context concerns interaction with an adaptive cruise control (ACC). The potential of contact-analog information for distance and speed control has already been shown in several recent research contributions (e.g. [24]).
Fig. 5. Examples for highlighting POIs along the route
Fig. 6. Examples for adjusting the distance bar
Combined with ACC, the contact-analog HUD can give direct system feedback merged with the reality. The drawback of current systems lies in adjusting speed and distance. Currently, there are a lot of different HMI variants available on the market. Some control elements are mounted at a drop arm; some are integrated in the steering wheel. All these solutions require a certain mental transfer effort for control and are hard to handle while driving. The augmented interaction solution presented here projects the environment in front of the vehicle in bird's-eye view onto the touchpad surface, so that the driver can feel the distance bar with his finger and directly adjust the distance to the front vehicle by moving this bar. Direct visual feedback is given by the
contact-analog HUD in the form of a green brake bar (Fig. 6). As a consequence, the vehicle adjusts the distance to the front vehicle according to the new position of the brake bar. If the user tries to choose an illegal distance, the system can give feedback via the HUD, for example by coloring the bar red.
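As an illustration of how such an adjustment could be wired up, the sketch below maps a drag of the haptic distance bar to an ACC time gap and to the HUD colour feedback described above. The hud and acc objects, the gap limits and the bar-to-gap mapping are all assumptions; the paper does not specify an implementation.

```python
# All constants and interfaces below are assumptions for illustration only.
MIN_GAP_S = 1.0   # assumed smallest legal time gap to the front vehicle (s)
MAX_GAP_S = 3.0   # assumed largest selectable time gap (s)

def on_distance_bar_moved(bar_position, speed_mps, hud, acc):
    """Handle a drag of the haptic distance bar.

    bar_position: position of the felt bar on the touchpad, 0.0 (very close)
        to 1.0 (far); speed_mps: current vehicle speed; hud/acc: hypothetical
        interfaces to the contact-analog HUD and the ACC controller.
    """
    requested_gap_s = bar_position * MAX_GAP_S        # touchpad -> time gap
    distance_m = requested_gap_s * speed_mps          # shown as the brake bar
    if requested_gap_s < MIN_GAP_S:
        hud.draw_brake_bar(distance_m, color="red")   # signal an illegal distance
        return                                        # keep the previous ACC setting
    hud.draw_brake_bar(distance_m, color="green")     # confirm the new setting
    acc.set_time_gap(requested_gap_s)                 # vehicle adapts the gap
```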
5 Summary and Conclusion

In this contribution a challenging new way of in-vehicle interaction is presented. The so-called augmented interaction is expected to reduce the mental effort for the driver while interacting. This is achieved by the intelligent combination of innovative display methods with new control technologies. For the driver-adjusted display of information a contact-analog HUD is used to present the information directly where it is needed. For an adequate control of this information a haptically adjustable touchpad is used that maps the reality to its surface. As a result the driver can now interact directly by touching the corresponding haptic surface. To realize the presented approach of augmented interaction for vehicles, a lot of further aspects have to be considered. Contact-analog HUDs are still not automotive-capable with respect to packaging size, sensor technologies and field of view. Also, the haptically configurable touchpad is a very complex element and still very space- and cost-intensive. Additionally, a lot of effort has to be invested to map the reality to the haptic surface and to realize the presented use cases; therefore, elaborate computer processing is required. After the prototype is finished, studies are necessary to evaluate this approach in driving scenarios and to prove the presented theoretical benefits of the combination of these new technologies.
References 1. Praxenthaler, M.: Experimentelle Untersuchung zur Ablenkungswirkung von Sekundäraufgaben während zeitkritischer Fahrsituationen, Dissertation, Universität Regensburg (2003) 2. Bergmeier, U.: Methode zur kontaktanalogen Visualisierung von Fahrerassistenzinformationen unter automotive-tauglichen Gesichtspunkten. In: Produktund Produktions- Ergonomie – Aufgabe für Entwickler und Planer. Kongress der Gesellschaft für Arbeitswissenschaft, vol. 54, pp. 125–128. GfA Press, München (2008) 3. Bergmeier, U., Bubb, H.: Augmented Reality in vehicle – technical realization of a contact analogue head-up display under automotive capable aspects; usefulness exemplified through night vision systems. FISITA World Automotive Congress (F2008-02-043), Munich (2008) 4. Bubb, H.: Untersuchung über die Anzeige des Bremsweges im Kraftfahrzeug, BMVg – FBWT 76-7, pp. 198–202 (1979) 5. Koji, N., Hiroshi, A., Nobuaki, K.: Denso Corporation, Windshield display for active safety, FISITA World Automotive Congress (F2006D105), Yokohama (2006) 6. Schneid, M.: Entwicklung und Erprobung eines kontaktanalogen Head-up-Displays im Fahrzeug, Dissertation, TU München (2009)
7. Geiger, M.: Berührungslose Bedienung von Infotainment-Systemen im Fahrzeug, Dissertation, TU München (2003) 8. Mischke, M., Hamberger, W.: Multimodalität im Dualtask - eine Lösung für die Probleme der Sprachbedienung. In: Prospektive Gestaltung von Mensch-Technik-Interaktion, vol. 7. Berliner Werkstatt Mensch-Maschine-Systeme, Berlin (2007) 9. Hummel, S.: Akzeptanzentwicklung bei multimedialen Bedienkonzepten, Dissertation, TU München (2008) 10. Sendler, J.: Entwicklung und Gestaltung variabler Bedienelemente für ein Bedien- und Anzeigesystem im Fahrzeug, Dissertation, TU Dresden (2008) 11. Vilimek, R.: Gestaltungsaspekte multimodaler Interaktion im Fahrzeug Ein Beitrag aus ingenieurspsychologischer Perspektive, Dissertation, Universität Regensburg (2007) 12. Doerrer, C.: Entwurf eines elektromechanischen Systems für flexible konfigurierbare Eingabefelder mit haptischer Rückmeldung, Dissertation, TU Darmstadt (2003) 13. Hayward, V.: Change of Height: An Approach to the Haptic Display of Shape and Texture Without Surface Normal. In: Experimental Robotics III. Springer Tracts in Advanced Robotics, pp. 570–579. Springer, New York (2003) 14. Bubb, H., Seiffert, R.: Struktur des MMS. In: Bubb, H. (ed.) Menschliche Zuverlässigkeit, pp. 18–20. ecomed – Fachverlag, Landsberg (1992) 15. McGlaun, G., Lang, M., Rigoll, G., Althoff, F.: Kontextsensitives Fehlermanagement bei multimodaler Interaktion mit Infotainment- und Kommunikationseinrichtungen im Fahrzeug. In: Nutzergerechte Gestaltung technischer Systeme, Tagungsband VDIFachtagung USEWARE, VDI-Bericht 1837, pp. 57–65. VDI-Verlag Düsseldorf, Darmstadt (2004) 16. Bubb, H.: Fahrerassistenz primär ein Beitrag zum Komfort oder für die Sicherheit? VDI Nr. 1768, pp. 25–44. VDI-Verlag (2003) 17. Rockwell, T.H.: Eye Movement analyses of visual information acquisition in driving: an overview. Paper presented at the North Carolina State University, Raleigh (1971) 18. Bullinger, H.J.: Ergonomie Produkt und Arbeitsplatzgestaltung. B.G. Teubner Verlag, Stuttgart (1994) 19. Wickens, C.D.: Engineering Psychology and Human Performance. Columbus, Merrill (1984) 20. Wickens, C.D.: Attention and Situation Awareness, Ph.d. thesis, Univ. Illinois (1996) 21. DIN EN ISO 10075-2, Ergonomische Grundlagen bezüglich psychischer Arbeitsbelastung, Teil 2: Gestaltungsgrundsätze (2000) 22. Bubb, H., Schmidtke, H.: Systemstruktur. In: Schmidtke, H. (ed.) Ergonomie, vol. 3. Auflage, Hanser Verlag, München (1993) 23. Assmann, E.: Untersuchung über den Einfluss einer Bremsweganzeige auf das Fahrverhalten, Dissertation, TU München (1985) 24. Tönnis, M., Lange, C., Klinker, G., Bubb, H.: Transfer von Flugschlauchanzeigen in das HUD von Kraftfahrzeugen. In: Proceedings 22. Internationale VDI/VW Gemeinschaftstagung Integrierte Sicherheit und Fahrerassistenzsysteme, Wolfsburg (2006) 25. See by Touch, http://see-by-touch.sourceforge.net/index.html 26. TIM - Der Blindenmonitor, http://www.blindenmonitor.de
Proposal of a Direction Guidance System for Evacuation

Chikamune Wada, Yu Yoneda, and Yukinobu Sugimura

Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Hibikino 2-4, Wakamatsu-ku, Kitakyushu, Fukuoka 808-0196, Japan
[email protected]
Abstract. In this paper, we propose a device that indicates the direction in which to evacuate. Our proposed system, which presents the direction through tactile sensation on the head, could be used in zero-visibility environments such as rooms filled with smoke. This paper describes the feasibility of our proposed system and indicates problems still to be solved. Keywords: Evacuation, Smoke, Direction, Guidance, Tactile sensation.
1 Introduction

Generally speaking, there are electric escape signs in case of fire at a hotel. However, if there is a massive pall of smoke we will not be able to see the signs and we will feel fearful during evacuation. Moreover, if a sign cannot be seen, or is turned off because of a flat battery during the night, we will not be able to know which way to go in the dark. Under such environmental conditions, a sighted person becomes similar to a blind person because visual information is not available. We previously set out to develop a new device which presents a visually impaired person with the direction and distance of obstacles. As for direction, we proposed a new method to present the direction of an obstacle by tactile stimulation and revealed its effectiveness [1]. Based on these experimental results, we hypothesize that a person could easily learn an escape direction from tactile stimulation. We also hypothesize that a person will be able to escape without using visual information in a smoky, dark environment. However, our previous experimental results [1] were obtained under a condition in which subjects were seated on a chair and were not allowed to walk. Therefore, in this paper, we investigate whether subjects can be guided by tactile stimulation in a designated direction while walking, and we report on the feasibility of our system for supporting evacuation.
2 Direction Displaying Method

Figure 1 shows one of the experimental results obtained when our direction displaying method was used. In this experiment, tactile stimulation was first presented, next the subjects were
Fig. 1. Direction displaying method [1]: stimulation angle α and response angle β in the horizontal and vertical planes
asked to imagine the direction of the stimulated point, and lastly the subjects were asked to point in that direction with their fingers. In this experiment, we used air stimulation as the tactile stimulation because air stimulation does not cause discomfort. The left part of the figure shows the angle between the tactile stimulation and the face-forward direction (α), and the right part shows the angle between the pointed direction and the face-forward direction (β). As the figure shows, α is equal to β. That is, if there is an obstacle at 30 degrees to the right, for example, tactile stimulation should be presented 30 degrees to the right on the head. Therefore, a blind person will be able to imagine where the obstacle is. Likewise, if you intend to guide the blind person 30 degrees to the right, tactile stimulation should be presented 30 degrees to the right on the head. However, these results were obtained when the subjects' heads were fixed; that is to say, it was not known whether these results would hold when the head moves while walking. Therefore, we investigated the feasibility.
3 Feasibility Study

In order to present a designated direction, the tactile stimulation point has to be changed according to the movement and rotation of the head while walking. First, we built a head movement measuring unit by combining a gyro sensor and a digital magnetic compass. The gyro sensor and digital magnetic compass are small and lightweight, so the unit does not become an obstruction during evacuation. The optimal gyro sensor and compass were selected on the basis of the head movement speed.
Fig. 2. Our guidance system: seven vibratory motors (#1 left 90 degrees, #2 left 60, #3 left 30, #4 face forward = 0, #5 right 30, #6 right 60, #7 right 90) with gyro sensors and a digital magnetic compass mounted on the head
In the previous experiment, air stimulation was used as the tactile stimulation. However, it is impossible to carry that experimental setup, which includes an air compressor and electrical valves, because of its weight and size, and we think the experimental results are applicable to any tactile stimulation, such as vibration. We therefore used vibration produced by vibratory motors, because a vibratory motor is lightweight and easy to control. The arrangement of the vibratory motors on the head was decided from a preliminary psychophysiological experiment. We then built a system combining vibratory motors, gyro sensors and a digital magnetic compass. The outline of our system is shown in Figure 2. In this system, seven vibratory motors are arranged every 30 degrees on the head, and three gyro sensors and a compass are mounted on the head. A guided walking experiment was carried out in order to investigate whether the subjects could be guided in a designated direction. In this experiment, the four
Fig. 3. Experimental setup
Fig. 4. Experimental protocol
subjects, who wore blindfolds, were asked to walk in the direction indicated by the vibration of a vibratory motor. Figure 3 shows the experimental setup. The head position while walking was measured by a magnetic three-dimensional positioning sensor (Fastrak). The Fastrak transmitter was mounted on a wooden frame and two receivers were placed on the front and back of the head so that the head center could be calculated. Each subject was asked to walk following the vibration on the head. Figure 4 shows the experimental protocol. First, the subject started to walk at the start position (indicated by (1) in Figure 4 and by "Start" in Figure 3). At this time, vibratory motor #4 was vibrating, that is to say, vibration was applied to the center of the forehead and the subject walked in the forward direction. After walking for about 2 or 3 meters, another vibratory motor started to vibrate ((2) in Figure 4 and "Indication" in Figure 3). The subject was asked to turn in the direction indicated by the vibration; in other words, the subject was asked to turn until the vibration moved to the center of the forehead. The subjects were not trained at all before the experiment.
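A minimal sketch of the motor selection logic implied by this protocol is given below: the motor closest to the target direction, expressed relative to the current face-forward direction, is activated. The 30-degree motor layout follows Fig. 2; the fused head yaw is assumed to be available from the gyro/compass unit.

```python
# Motor layout from Fig. 2: indices 1-7, 30-degree spacing, negative = left.
MOTOR_ANGLES = {1: -90, 2: -60, 3: -30, 4: 0, 5: 30, 6: 60, 7: 90}

def motor_for_direction(target_bearing_deg, head_yaw_deg):
    """Return the number of the vibratory motor to activate.

    target_bearing_deg: designated walking direction, in the same geographic
        reference as the compass heading.
    head_yaw_deg: current face-forward direction from the gyro/compass unit
        (assumed to be fused elsewhere).
    """
    # direction of the target relative to the face-forward direction
    relative = (target_bearing_deg - head_yaw_deg + 180) % 360 - 180
    # the prototype only covers -90..+90 degrees; clamp for simplicity
    relative = max(-90.0, min(90.0, relative))
    # activate the motor whose mounting angle is closest to the relative angle
    return min(MOTOR_ANGLES, key=lambda m: abs(MOTOR_ANGLES[m] - relative))
```

As the wearer turns, head_yaw_deg changes and the active motor migrates towards #4 (the forehead center), which corresponds to the stopping criterion used in the experiment.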
Proposal of a Direction Guidance System for Evacuation
120 0
30 degrees
100 0 80 0 60 0
]m [m 後 前
40 0
Vibration starts
20 0 0 0
-20 0
20 0
400
600
800
1000
-40 0 -60 0
Walking starts
-80 0 -100 0 [m m ] Movement [mm] 左右 (Leftward-Rightward)
(a) 30 degrees rotation 1200 1000
60 degrees
800 600
]m 400 [m 200 後前 0 -200
Vibration starts 0
200
40 0
600
8 00
1000
-400
Walking starts
-600 -800
右 [mm] Movement [mm] 左 (Leftward-Rightward)
(b) 60 degrees rotation 800 600
Vib ration starts
400
m]m [ 後前
90 degrees
200 0
0
200
400
600
800
-200 -400 -600
Walk ing starts
-800 [mm] Movement [mm]左右 (Leftward-Rightward)
(c) 90 degrees rotation Fig. 5. Track of guided walking
1000
225
4 Problems to Be Solved

The experimental results showed that our system may be able to guide a person in a designated direction without training. However, integration error becomes a problem if the head moves in a complicated way. For example, Figure 6 shows the angle difference between the angle measured by our head movement measuring unit and the angle obtained from the two Fastrak receivers placed on the front and back of the head. Figure 6 shows one result obtained while a subject walked freely for 30 seconds. The value fluctuated between positive and negative, and the maximum difference was about 50 degrees. Needless to say, this angle difference depends on the walking condition, but it is difficult to imagine that there would be no angle difference at all. We therefore have to devise a method which will decrease the angle difference.
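One standard way to limit such drift is to blend the integrated gyro yaw with the noisy but drift-free compass heading, for example with a complementary filter as sketched below. This is not the method proposed by the authors, only an illustration of a possible correction; the blending constant is an assumed value that would need tuning.

```python
ALPHA = 0.98  # assumed weight of the gyro path; would need tuning

def fused_yaw(prev_yaw_deg, gyro_rate_dps, dt_s, compass_yaw_deg):
    """Return a drift-corrected yaw estimate in degrees."""
    gyro_yaw = prev_yaw_deg + gyro_rate_dps * dt_s        # integrate the gyro
    # wrap the compass/gyro difference so blending works across +/-180 degrees
    diff = (compass_yaw_deg - gyro_yaw + 180) % 360 - 180
    return gyro_yaw + (1.0 - ALPHA) * diff                # pull slowly toward compass
```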
Fig. 6. Change of angle difference while walking
Fig. 7. Head movement and change of acceleration value (head angle [deg] and acceleration [m/s^2] over time for head shaking, forward movement, right rotation and turning)
The reason for the angle difference is thought to be the accumulation of integration error from the gyro sensor. If small head movements, such as head shaking, which are not related to walking can be excluded, the angle difference may become small. We therefore investigated whether small head movements could be detected by using two acceleration sensors placed on both sides of the head. A subject executed the following four actions: shaking the head, walking forward, rotating rightward and turning right. Figure 7 shows one result. Dots indicate the head direction, while the two lines indicate the values of the two acceleration sensors placed on the left and right of the head. The graph indicates that neither acceleration value changed much for the forward movement, whereas they changed in a similar way for the rightward rotation, changed in a similar, periodic way for the right turn, and showed no relationship to each other for the head shaking. We therefore think that small head movements during forward walking might be detected, but more research is necessary.
5 Conclusion

In order to enable evacuation in zero-visibility environments, we proposed a method which presents a direction through tactile sensation. Our results showed the feasibility of our proposal but also revealed problems to be solved. We would like to solve these problems and build a useful evacuation system in the near future.
Reference 1. Asonuma, M., Matsumoto, M., Wada, C.: Study on the use Air Stimulation as the Indicator in an Obstacle Avoidance System for the Visually Impaired. In: SICE 2005, MA2-14-2 (CD-ROM) (2005)
A Virtual Environment for Learning Airport Emergency Management Protocols

Telmo Zarraonandia, Mario Rafael Ruiz Vargas, Paloma Díaz, and Ignacio Aedo

Universidad Carlos III de Madrid
{tzarraon, mrrvarga, pdp}@inf.uc3m.es,
[email protected]
Abstract. This paper presents a virtual environment designed to enhance the learning of airport emergency management protocols. The learning is performed in an informal manner, with each learner playing a different role in a particular emergency simulation. Learners interact within the virtual environment, managing the available information and following the steps prescribed for each type of emergency in the Airport Emergency Plan of the Spanish Civil Defence Organization. The simulation can be run in different modes of difficulty, and can be used as a learning tool as well as an evaluation tool to measure the accuracy of the learner's actuation with respect to the protocol. It can also support standalone training, with some of the emergency roles played by the computer. The virtual environment has been built using DimensioneX, an open source multi-player online game engine. Keywords: Virtual environment, emergency, game engine, simulation.
1 Introduction

Airports should always guarantee a fast and effective response to any kind of emergency. All efforts and decisions should be perfectly coordinated to minimize the consequences whenever an airplane accident, natural disaster or any other emergency interferes with the normal progress of the aeronautical operations. Following this objective, airport emergency plans are specified in order to compile all the norms, measures and procedures that should rule all the actions taken by each of the actors involved in the emergency management, before, during and after the emergency. Learning such protocols and plans is therefore crucial. The use of games and simulation in the field of training has been widely explored [1, 2] due to the facilities they provide to recreate virtual environments that can support situated learning. Situated learning happens when knowledge and skills are applied in a realistic context [3], in our case a real emergency. Situation games or simulations bring a great level of realistic immersion and promote situated learning, which, according to the literature, presents the following advantages:

• Apprentices are aware of the actual conditions in which they should apply their knowledge and abilities. If the situation is a simulation of a real incident, apprentices can dive into the real problems that they are going to have to face.
• Real situations create an atmosphere that produces greater motivation and engagement of users.
• Apprentices understand better the implications of their knowledge or ignorance.
• There is a better understanding of the knowledge structure, which facilitates its application in real situations.

For instance, in the military area an adapted version of the commercial game DOOM has been used to train US Marine fire teams [4], while the first-person shooter game Unreal Tournament serves to implement a simulation of a first responder at a mass casualty airline accident scene [5]. Simulation and virtual environments are also common tools for training and education in the aeronautical area, and they have also been applied to the area of emergency response training [6, 7]. This paper presents a virtual environment designed to enhance the learning of airport emergency management protocols. An open source multiplayer game engine has been used to implement a virtual world where different types of airport emergencies can be simulated. Simulation participants play the role associated with their position, interact with each other and manage the emergency information as it becomes available. The virtual environment can be used as a learning tool as well as an evaluation tool to measure the accuracy of the learner's actuation according to the protocol. From the wide range of aspects of an airport emergency, our application focuses on the management of the emergency information and the communication between the different roles involved. The rest of the paper is organized as follows: first, the objectives and scope of airport emergency plans are outlined. Second, the Airport Emergency Management simulator (AEM-Simulator) is presented, its interface is described and an example of use is provided. Next, the use of the simulator with different interaction devices is analyzed, and the characteristics of the game engine used for implementing the virtual environment are also detailed. Finally, some conclusions and future work lines are presented.
2 Airport Emergency Plans

The Spanish Directorate of Civil Defense and Emergencies (DGPCE - Dirección General de Protección Civil y Emergencias) of the Ministry of the Interior defines an Airport Emergency Plan as "a set of pre-defined norms, measures and coordinated procedures which aim to minimize the impact of an emergency situation that could take place in the airport or in the response areas defined in the emergency plan" [8]. The emergency plan predetermines the degree of participation of all of the dependencies and services involved in an emergency by clearly stating their functions and the procedures to follow before, during and after the emergency. The operability of the plan is guaranteed by defining each of their responsibilities, the chain of command, the communication procedures, the coordination system and the specific actuation procedures. An emergency plan defines the set of methods and procedures to follow for a particular number of emergency types, which are classified depending on whether they involve aeroplanes or not, whether the aeroplane is flying or not, or by the airport zone where the emergency takes place. Any other type of emergency which differs from
the ones considered in the plan will be treated using the procedures of the closest typified emergency. The plan defines the actuation directives before (phase 1), during (phase 2) and after (phase 3) an emergency takes place. Phase 2 is the very essence of the plan, and for each of the emergency types and each of the services involved in it a directive record is defined. The directive record defines the hierarchy, the person in charge, the pseudonym to be used in radio communications, the means of communication and an ordered explanation of the tasks to be performed until the emergency situation is under control. In order to guarantee the efficacy of the plan, the staff involved are regularly trained in their specific functions, and the reliability and effectiveness of the plan is evaluated through periodic emergency exercises and practices. As a result of those experiences the plan is constantly reviewed, and new norms and improvements are introduced whenever it is considered necessary. The aim of our work is the development of a tool to facilitate the training and learning of the different plans of actuation of an airport emergency plan, reducing the number of real simulations in order to decrease costs. Moreover, the simulation can also be used to test the efficacy of the plan itself and to detect flaws or inconsistencies, since the behaviour during the emergency can be recorded and analysed in the aftermath to learn from errors and to help build an institutional memory [9].
3 AEM Simulator

Currently, training on the actuation plans is performed through tabletop exercises and full-scale practices. During the former, each of the participants plays the role associated with their position, following the procedures established in the plan of actuation of a particular emergency. Participants use phones to communicate to each other the decisions adopted, ask for information, confirm data, etc. This scenario can be improved by making use of a graphic and interactive multimedia environment. The virtual environment can be used to support the communication between the different participants/players while keeping track of all the actions adopted during the emergency procedure. The actions performed by one actor at a particular stage of the emergency can be compared to the ones established by the specific plan of actuation. This can serve both as a training tool, providing the participants with suggestions or feedback on the appropriateness of their actions, and as an evaluation tool, providing a measure of how well each of the participants has followed the procedure. Moreover, the whole procedure can be recorded to be studied afterwards in order to learn from errors, something that cannot be done using full-scale exercises and phone-based communication. Following this idea, a virtual environment for training in the emergency management protocols has been implemented. Trainees connect to the virtual environment and play the emergency role associated with their position. Currently the protocols implemented are the ones for "Control Tower Unit", "Advance Command Post", "Principal Command Post", "Airport Fire and Rescue Service" (AFRS), "Airport Health Service" (AHS) and "Airport Coordination Centre" (ACC) for an emergency of type "Incident in a flying aeroplane". It is expected that in the future all the roles
could be played either by a real user or by the computer, allowing individual and whole-team training on the protocols. At present users can only play the roles of "Control Tower Unit", "Advance Command Post" and "Principal Command Post". The emergency simulation can be played in different modes providing a different range of feedback and tips to the user. In the easiest mode, "Step by step training", users are provided with feedback for each of the actions taken, indicating compliance with the actual action specified in the actuation plan. On the other hand, in the "Evaluation" mode, no feedback is provided until the end of the simulation. Once the simulation is finished, the participant is presented with a score based on the deviation between his/her actuation and the one specified in the actuation protocol. To produce this score, the relevance of the information gathered, the accuracy of the communications with the other participants, unnecessary contacts, time spent, etc., are also considered. When more than one participant plays the simulation, a team score is also computed to evaluate the actuation of the whole group. This simulator has been designed following a participatory design process in which experts on civil protection took part. The experts provided descriptions of the functionalities and characteristics of an ideal application for airport emergency management. Taking their opinions and the traditional tabletop exercises used for training as a starting point, an initial design of the simulator was produced. The same experts collaborated on its refinement during subsequent meetings until the version described here was produced. Keeping the use of the simulator simple was a major goal of the process.

3.1 Interface Description

Fig. 1 shows a screenshot of the virtual environment. The numbers depict the different sections into which the simulator screen is divided: information section (1), role section (2), communication section (3), action section (4), auxiliary screen section (5), message section (6), event section (7) and feedback section (8). The participants use the information section to store relevant information about the emergency as it becomes available. Whenever they consider a piece of information relevant, they can assign one of the slots of this section to save it. This is achieved by selecting the keyword corresponding to that information from the drop-down menu on the left side of the slot. Once a slot has been assigned, the user can type the value of the data directly in the right-hand-side box, or obtain the information from other participants. The status colour can be used to indicate whether the data has been computed by the user, received from another role or confirmed. The role section provides a list of the roles involved in the emergency management. Different colours are used to specify whether a particular role is played by the computer or by a real player. The communication section provides five options to support the communication between the emergency roles: send, request, confirm, not available and error. Buttons in this section are used in combination with the information and role sections to compose and send messages. For instance, whenever a user needs to send data, he/she will select the data slot in the information section, select the role or roles he/she wants to receive the data, and finally push the "Send" button.
The messages section will then display the composed message and the information section of the selected roles will be refreshed to show the data received. A similar process will be followed when a user requires information or confirmation from another role.
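As a hypothetical illustration of the send/request/confirm flow just described (the class and field names below are invented and do not correspond to the simulator's actual code), a message composed from the information and role sections might be modeled as follows.

import json
from dataclasses import dataclass, field

@dataclass
class Message:
    # one of the five communication options: send, request, confirm, not_available, error
    kind: str
    sender: str
    receivers: list
    slot: str            # keyword of the information slot, e.g. "incident type"
    value: str = ""      # filled in for 'send' messages

@dataclass
class InformationSection:
    slots: dict = field(default_factory=dict)   # keyword -> (value, status colour)

    def receive(self, message: Message):
        if message.kind == "send":
            # data received from another role is stored and marked accordingly
            self.slots[message.slot] = (message.value, "received")
        elif message.kind == "confirm":
            value, _ = self.slots.get(message.slot, ("", "received"))
            self.slots[message.slot] = (value, "confirmed")

# composing a 'send': select a slot, select the target roles, push "Send"
msg = Message(kind="send", sender="Control Tower Unit",
              receivers=["AFRS", "AHS", "ACC"],
              slot="incident type", value="fire in cabin")
acc_info = InformationSection()
acc_info.receive(msg)
print(json.dumps(acc_info.slots))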
Fig. 1. Screen sections of the Airport Emergency Management Protocols Simulator
In the action section the user will find buttons specific to the actions of the role he/she plays. For instance, a user who plays the "Tower Control Unit" role will find one button to fire the emergency alarm and another to order the custody of the recording units. On the other hand, the one who plays the "Principal Command Post" role will find buttons to establish the grade of the emergency and the coordinates of the command position. The auxiliary screen section is used to support these role-specific actions, being populated with role-specific maps, templates for generating reports, etc. Depending on the action performed, the event section may display a message to inform some or all of the other users of the action taken. Finally, the feedback section provides participants with appropriate hints and advice about the actions they should undertake at each step of the emergency. Hints are more frequent and precise depending on the game mode selected.

3.2 Example

Figure 2 shows a screenshot of a particular moment of the simulation, which can be used to track the users' actions. The screenshot corresponds to the view of the "Tower Control Unit" during the first stages of the "Incident in a flying aeroplane" emergency, playing the simulation in "Step by step" mode. The emergency procedure starts when the air traffic control unit receives a "fire in cabin" notification message from an aeroplane commander (1). The AEM simulator automatically assigns a slot of the information section to store the data received (2). Following the plan of actuation, the user asks the commander for confirmation of the incident (3), and the corresponding confirmation message is then received (4). The red colour next to the incident data in the information section reflects this confirmation. Next, as can be seen in the events section, the person responsible for the Tower Control Unit has fired the alarm to alert the Airport Fire and Rescue Service (AFRS), the Airport Health Service (AHS) and the Airport Coordination Centre (ACC) of the incident (5). The three services should send confirmation of the alarm
Fig. 2. Screenshot of the Airport Emergency Management Protocols Simulator
notification before communication with the Tower Control Unit is triggered. At this moment only the ACC and the AHS have sent acknowledgments (6). While waiting for those confirmations, the traffic controller started to gather information about the incident from the commander. He assigned slots for data about the identification and type of aeroplane, the coordinates and time of the incident and the number of passengers (7), and composed a "Request" message (8). The commander sent the requested information (9), which the simulator populated into the information section (10). As the game mode selected was "Step by step", the feedback section of the screen reflects the different tips provided at each step of the emergency (11).

3.3 AEM-Simulator Interaction Devices

Work has been carried out to explore different possibilities of interaction with the AEM-Simulator. On the one hand, given that the only requirement to run the AEM-Simulator is a common web browser, and that the actions are carried out by simply clicking on the screen, the use of mobile devices or PDAs to follow the training becomes an obvious option to be investigated. In practical terms, this means that the application can be used in a wide range of environments. On the other hand, to identify drawbacks and inconsistencies in the emergency plans it would be useful to offer a turn-based game mode in which all the main players work together using the same device, thus facilitating the free exchange of views and opinions. Interactive whiteboards lend themselves to this type of use of the AEM-Simulator, as an ideal medium for collaborative activities such as these (Figure 3).

3.4 The DimensioneX Game Engine

The AEM-Simulator has been implemented using DimensioneX [10], a free, open-source multiplayer game engine provided under the GNU General Public License.
Fig. 3. Use of the AEM-Simulator with an interactive whiteboard
DimensioneX provides a software kit for developing and running adventure games, multiplayer games, multi-user real-time virtual worlds and turn-based games. Among many other features, DimensioneX provides online multiplayer capability, multimedia support, game maps and player tracking, game saving and event handling. The game engine is actually a Java servlet. The games developed for this engine can therefore be run in any servlet container, typically Tomcat. Players connect to the game via a conventional web browser without any additional software being required. DimensioneX provides a script language for specifying virtual worlds, that is, for describing the rooms, links, themes, characters and events that can take place during the game. World descriptions are stored in plain text files with the ".dxw" extension and processed by the game engine to produce an HTML document. Players interact via the browser with each other as well as with the rest of the game elements. As a result of these interactions the world state is modified, which in turn triggers the creation of new HTML documents reflecting the changes. The world state is stored in the server and continually updated. We have chosen this engine due to the facilities it provides for implementing the communication between the different players, which is a most important requirement in our domain. Modifying the source code of the engine would be relatively straightforward if that were ever necessary, and it is also worth noting that the programming language is simple.
4 Conclusions and Future Work

This paper presents a virtual environment to facilitate the training and learning of protocols for the management of emergencies in airports. The virtual environment has been implemented making use of an open-source game engine named DimensioneX. Users can connect to the simulator via a browser, without installing additional software on the players' machines. The project is in its first stages. Currently, the "Incident in a flying aeroplane" emergency plan has been implemented to be played in the "Step by step" mode, that is, providing participants with total guidance about each of the actions they should perform
at every moment. The next step will be to complete the implementation of the "Evaluation" mode. This will make it possible to start testing the application with real users and to validate its usefulness as a training tool. At the same time, work is being carried out to implement the rest of the airport emergency plans defined by the Spanish Civil Defense Department. Moreover, the system has to be tested with real users. Even though it has been designed with domain experts, the efficacy of any interaction device can only be assessed when end users try to perform their tasks with it. Future lines of work consider integrating role-specific simulators within the AEM-Simulator, for instance an Airport Fire and Rescue Services Team Management Simulator. The participant who plays the role of the person in charge of these units could then be trained in the emergency protocols and, at the same time, in actions specific to his/her role, such as emergency analysis, strategy selection and leading the unit personnel. The output information from one simulator could serve as input to the other, providing a more realistic experience.

Acknowledgments. This work is part of the MoDUWEB project (TIN2006-09678), funded by the Ministry of Science and Technology, Spain.
References 1. Rolfe, J., Saunders, D., Powel, T.: Simulations and Games for Emergency and Crisis Management: Simulations and Games for Emergency and Crisis Management. Routledge (1998) 2. McGrath, D., McGrath, S.P.: Simulation and network-centric emergency response, Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), Orlando, Florida (2005) 3. Brown, J.S., Collins, A., Duguid, P.: Situated cognition and the culture of learning. Educational Researcher 18(1), 32–41 (1989) 4. Prensky, M.: True Believers: Digital Game-Based Learning in the Military LearningCircuits.com (February 2001), http://www.astd.org/LC/2001/0201_prensky.htm (11/02/09) 5. McGrath, D., Hill, D.: UnrealTriage: A Game-Based Simulation for Emergency Response. In: The Hunstville Simulation Conference, Sponsored by the Society for Modeling and Simulation International (October 2004) 6. Virtual preparation for ARFF emergencies, Industrial Fire Journal (October 2008), http://www.hemmingfire.com/news/fullstory.php/ Virtual_preparation_for_ARFF_emergencies.html (11/02/09) 7. McGrath, D., Hunt, A., Bates, M.: A Simple Distributed Simulation Architecture for Emergency Response Exercises. In: Proceedings of the 2005 Ninth IEEE International Symposium on Distributed Simulation and Real-Time Applications (DS-RT 2005) (2005) 8. Dirección General de Protección Civil Española. Subdirección General de Planes y Operaciones. Plan de Emergencia de Aeropuertos (1993) 9. Klenk, J.: Emergency Information Management and Communications. Disaster Management Training Programme. DHA (1997) 10. DimensioneX Online Multiplayer Game Engine, http://www.dimensionex.net
User Profiling for Web Search Based on Biological Fluctuation Yuki Arase, Takahiro Hara, and Shojiro Nishio Department of Multimedia Engineering Graduate School of Information Science and Technology, Osaka University 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan {arase.yuki,hara,nishio}@ist.osaka-u.ac.jp
Abstract. Because of the information flood on the Web, it has become difficult to search for necessary information. Although Web search engines assign authority values to Web pages and show ranked results, this is not enough to find information of interest easily, as users have to comb through reliable but out-of-focus information. In this situation, personalization of Web search results is effective. To realize such personalization, a user profiling technique is essential; however, since users' interests are not stable and are versatile, it should be flexible and tolerant to changes in the environment. In this paper, we propose a user profiling method based on a model of organisms' flexibility and environmental tolerance. We review previous user profiling methods and discuss the adequacy of applying this model to user profiling. Keywords: User profile, Web search, biological fluctuation.
1 Introduction

As our everyday life is surrounded by Internet-enabled devices such as computers, cellular phones, PDAs and game consoles, the highly advanced information society allows us to collect information of concern far more easily than in the past. However, the larger the amount of information on the Web grows, the harder it becomes to find information of interest. According to a report by Google in July 2008, the number of unique URLs on the Web has already exceeded a trillion, and the number of Web pages is practically uncountable. Furthermore, the number of Web pages is still growing rapidly every second. Currently, people use Web search engines to find Web pages containing their information of interest. Most search engines use the link structure of Web pages to decide which Web pages are authoritative, based on the idea that authoritative Web pages contain more reliable information than minor ones. When people query a search engine, these authoritative pages are ranked high in the search result. This criterion has been very effective in enabling people to find reliable information without the bother of browsing hundreds of junk Web pages. However, authority-based ranking is not enough in the current situation of information flood, since the information on the Web has become too diverse in its semantic meaning to recommend based on
only their reliability. As a result, people have to access many reliable but unnecessary Web pages to find pages that exactly match their interests. To solve this problem, it is effective to personalize Web search results based on users' interests in addition to the current authority-based ranking. For this aim, user profiling is essential. However, since users' interests are unstable and easy to change, user profiling is not an easy task. This is apparent from the fact that although user profiling methods have been actively researched for decades, tracking users' versatile interests is still difficult. In this paper, we propose a novel approach to realize flexible and dynamic user profiling. We adopt a model called Attractor selection, based on biological fluctuation, to detect users' intentions. Since Attractor selection is tolerant to changes of the assumed environment, it is suitable for modeling users' versatile interests. Meanwhile, the ambient information environment is a recent hot topic, in which surrounding computers and sensors embedded in the environment detect users' situations and provide functions to satisfy users' needs without the users' explicit declaration. In the preceding ubiquitous environment, users actively access computers to satisfy their requirements; in the ambient environment, by contrast, the environment itself takes actions to satisfy users' needs. The concept of the ambient environment is applicable to various fields, including user interfaces, on which we are working. We define an ambient user interface as an interface that detects users' intentions and provides information according to them. We regard our personalization method as a realization of an ambient interface for Web search. The remainder of this paper is organized as follows. In Section 2, we introduce previous user profiling methods and discuss the differences from our approach. In Section 3, we briefly introduce the biological Attractor selection model. In Section 4, we propose our user profiling method based on Attractor selection and discuss its potential to solve problems that previous user profiling methods could not settle. Finally, in Section 5, we conclude this paper and describe our future work.
2 Related Work

There have been two directions for user profiling: one makes use of explicit feedback from users, and the other uses implicit feedback. A popular method in the former approach is asking users to input their interests, which is adopted by some portal and news sites. Another method is to ask users to assign a score to the Web pages they browse according to the strength of their interest in the contents of the pages. As represented by News Dude [1], a user can specify i) whether they think the content is interesting or not, ii) whether they would like to see more similar information, or iii) whether they have already seen the information previously. An advantage of this approach is that the extracted user profiles tend to be reliable, since users themselves input their interests. A disadvantage is that it troubles users to input their interests and, moreover, users have to change their profiles each time their interests change. The latter approach uses users' browsing behaviors to extract user profiles. SUGGEST [2] adopts a two-level architecture composed of an offline creation of historical knowledge and an online engine that understands the user's behavior. As the
requests arrive at this system module, it incrementally updates a graph representation of the Web site based on the active user sessions and classifies the active session using a graph partitioning algorithm. Gasparetti et al. [3] proposed an algorithm with which the system can identify users' interests by exploiting Web browsing histories. Claypool et al. [4] investigated different kinds of user behaviors, such as scrolling, mouse clicks, and time on page, for extracting user profiles. These methods are based on learning user interests; thus, a large amount of training data is needed and they seem to take a long time to converge to reasonable profiles. Billsus et al. [1] proposed that users' interests can be classified into two kinds, long-term and short-term interests. The long-term interests can be regarded as the users' intrinsic interests and thus seem to be stable over time. Therefore, they are easier to extract by directly asking users or by using learning-based methods. On the other hand, the short-term interests can be regarded as interim, reflecting users' current interests. This feature makes it difficult to track changes of profiles because of their versatile nature over a short period. Especially for Web search, users' interests can be classified as short-term interests, since users usually search for information which they need, are concerned about, or are interested in at that time. Therefore, a user profiling method that is flexible and tolerant to environmental change is suitable. For this aim, we adopt the Attractor selection mechanism, which is based on the fluctuation of organisms and realizes flexible and environment-tolerant solutions, to track changes of users' interests in Web search.
3 Attractor Selection

In this section we outline the principle of Attractor selection, which is a key component of our method. The original model for adaptive response by Attractor selection is given by Kashiwagi et al. [5]. Attractor selection defines each possible situation as an attractor, and evaluates the current situation to select one of the better attractors in a dynamic environment. The goodness of the attractor is estimated by the activity value. While the activity is high, the system keeps staying in the current attractor. On the other hand, when the situation changes and the activity gets low, the system performs a random walk to find a more suitable attractor. Because of this random walk, the system exhibits fluctuation.

We can basically outline the attractor selection model as follows. Using a set of differential equations, we describe the dynamics of an M-dimensional system. Each differential equation has a stochastic influence from an inherent Gaussian noise term. Additionally, we introduce the activity α, which changes the influence of the noise terms. For example, if α comes closer to 1, the system behaves rather deterministically and converges to attractor states defined by the structure of the differential equations. On the other hand, if α comes closer to 0, the noise terms dominate the behavior of the system and essentially a random walk is performed. When the input values (nutrients) require the system to react to modified environment conditions, the activity α changes accordingly, causing the system to search for a more suitable state. This can also mean that α causes the previously stable attractor to become unstable. The random walk phase can be viewed as a random search for a new solution state, and when it is found, α increases and the system settles in this solution. This behavior
is similar to the well known simulated annealing [6] optimization method, with the main difference that the temperature is not only cooled down, but also increased again when the environment changes. The biological model describes two mutually inhibitory operons where m1 and m2 are the concentrations of the mRNA that react to certain changes of nutrient in a cell. The basic functional behavior is described by a system of differential equations, as the following equations show.

\[ \frac{dm_1}{dt} = \frac{syn(\alpha)}{1 + m_2^2} - deg(\alpha)\,m_1 + \eta_1 \]
\[ \frac{dm_2}{dt} = \frac{syn(\alpha)}{1 + m_1^2} - deg(\alpha)\,m_2 + \eta_2 \]
The functions syn(α) and deg(α) are the rate coefficients of mRNA synthesis and degradation, respectively. They are both functions of α, which represents cell activity or vigor. The terms ηi are independent white noise inherent in gene expression. The dynamic behavior of the activity α is given as follows.
\[ \frac{d\alpha}{dt} = prod \prod_{i=1}^{2} \frac{(nutrient_i + m_i)^{n_i}}{(nutrient_i + m_i)^{n_i} + nutr\_thread_i^{\,n_i}} - cons \cdot \alpha \]
Here, prod and cons are the rate coefficients of the production and consumption of α. The term nutrient_i represents the external supplementation of nutrient i, and nutr_thread_i and n_i are the threshold of the nutrient to the production of α and the sensitivity of nutrient i, respectively. A crucial issue is the definition of the proper syn(α) and deg(α) functions. To have two different solutions, the ratio between syn(α) and deg(α) must be greater than 2 when there is a lack of one of the nutrients. When syn(α) / deg(α) = 2, there is only a single solution at m1 = m2 = 1. The functions syn(α) and deg(α) as given in [5] are as follows.

\[ syn(\alpha) = \frac{6\alpha}{2+\alpha}, \qquad deg(\alpha) = \alpha \]

The system reacts to changes in the environment in such a way that when it lacks a certain nutrient i, it compensates for this loss by increasing the corresponding m_i value. This is done by modifying the influence of the random term η_i through α, as Figure 1 shows. When α is near 1, the equation system operates in a deterministic fashion. However, when α approaches 0, the system is dominated by the random terms η_i and it performs a random walk. In Figure 1, an example is given over 20000 time steps. We can see the following behavior. When both m_i values are equal, the activity is highest and α = 1. As soon as there is a lack of the first nutrient (2000 ≤ t < 8000), m_i compensates this by increasing its level. When both nutrient terms are fully available again (8000 < t ≤ 10000), the activity α becomes 1 again. An interesting feature of this method can be observed between 10000 < t < 13000. Here, the random walk causes the system to search for a
Fig. 1. Biological attractor selection model
new solution; however, it first follows a wrong "direction", causing α to become nearly 0, where the noise influence is highest. As soon as the system approaches the direction toward the correct solution again, α recovers and the system becomes stable again. Such phases may always occur in the random search phase. As shown above, Attractor selection cannot always find the best answer. Instead, it tries to find a better attractor with quite a simple strategy. By this mechanism, Attractor selection ensures robustness in exchange for some efficiency. Organisms can survive in such an unstable and dynamically changing natural environment owing to their inherent Attractor selection.
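As an illustration, the following Python sketch numerically integrates the two-operon model as reconstructed above (Euler-Maruyama steps, with syn(α) = 6α/(2+α) and deg(α) = α). The activity update here is a simplified stand-in rather than the exact equation used by Kashiwagi et al., and all step sizes, noise levels and nutrient schedules are assumed values.

import numpy as np

def syn(alpha):
    # synthesis rate coefficient, syn(alpha) = 6*alpha / (2 + alpha)
    return 6.0 * alpha / (2.0 + alpha)

def deg(alpha):
    # degradation rate coefficient, deg(alpha) = alpha
    return alpha

def simulate(steps=20000, dt=0.01, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    m = np.array([1.0, 1.0])
    alpha = 1.0
    trace = []
    for t in range(steps):
        # assumed nutrient schedule: nutrient 1 is lacking between t=2000 and t=8000
        nutrients = np.array([0.0 if 2000 <= t < 8000 else 1.0, 1.0])
        # mutually inhibitory operon dynamics with additive noise
        dm = np.empty(2)
        dm[0] = syn(alpha) / (1.0 + m[1] ** 2) - deg(alpha) * m[0]
        dm[1] = syn(alpha) / (1.0 + m[0] ** 2) - deg(alpha) * m[1]
        m += dt * dm + np.sqrt(dt) * noise * rng.standard_normal(2)
        m = np.clip(m, 0.0, None)
        # simplified activity: high when each lacking nutrient is compensated by its m_i
        supply = np.minimum(nutrients + m, 2.0) / 2.0
        target = float(np.prod(supply))
        alpha += dt * (target - alpha)
        alpha = float(np.clip(alpha, 0.0, 1.0))
        trace.append((m[0], m[1], alpha))
    return trace

trace = simulate()
print(trace[-1])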
4 User Profiling Method Based on Attractor Selection

As we discussed in Section 2, the problem of previous user profiling methods is that they take a long time to follow changes of users' interests. As Billsus et al. [1] proposed, users' interests can be classified into two kinds, long-term and short-term interests. The long-term interests are easier to extract by directly asking users or by using learning-based methods owing to their stable nature. On the other hand, the short-term interests are difficult to track because of their versatile nature over a short period. To detect the short-term interests, we adopt the Attractor selection scheme, which is suitable for finding solutions in a dynamically changeable environment.

4.1 Design of User Profiling Method

We model users' interests as attractors and their changes as environmental changes in the Attractor selection scheme. We define a user's profile as a ranking of thirteen pre-defined topics and detect the ranking using the Attractor selection scheme.

4.1.1 Definition of User's Profile

According to the definition of categories of Web sites at YAHOO! Japan [7], we selected the following thirteen topics as the candidates for users' interests.
1) News
2) Entertainment
3) Sports
4) Art
5) Health
6) Politics
7) Economics
8) Life
9) Computer
10) Education
11) Technology
12) Local
13) Others
A user profile is a ranking of these thirteen topics, as User profile = {(1 | topic_1), ..., (13 | topic_13)}, where (rank_k | topic_k) represents that rank_k is the rank of the topic and topic_k is one of the candidate interest topics. For example, a profile of {(1 | Technology), (2 | Computer), …, (13 | Education)} means that this user is most interested in technology- and computer-related Web pages, while not interested in pages relating to educational information. Here, we assume that the categories of Web pages are given. Since our definition of users' interest topics is based on the Web sites' categories, we can expect to use the Web sites' categories assigned by portal sites. Additionally, since there are many previous works on automatic Web page categorization, it is also possible to adopt these methods to categorize Web pages as a pre-processing step.

4.1.2 Definition of Activity

In Attractor selection, the value of the activity α represents the goodness of the current attractor. In our case, α represents how well the current user profile matches the user's real interests. To evaluate this, we adopt the essence of DCG (discounted cumulative gain) to evaluate the current profile. DCG is a measure of the effectiveness of a Web search engine, often used in information retrieval. The premise of DCG is that highly relevant items appearing lower in a search result list should be penalized, as the graded relevance value is reduced logarithmically proportionally to the rank of the result. The DCG accumulated at a particular rank p is defined as follows:

\[ DCG_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i} \]
where rel_i is the graded relevance of the result at rank i. In our case, we cannot obtain explicitly graded values for each interest topic. However, it is reasonable to assume that users frequently browse Web pages of topics of interest and that their browsing times will be longer compared with other topics. Therefore, we define rel_i using users' cumulative browsing times for each topic once they have browsed a certain number of Web pages. The desired behavior of α is summarized as follows. If we have no information about a user, the candidate topics should be evaluated uniformly. A low α means that the current profile does not match the user's interests and a new one should be
detected. We keep the value of α in the range 0.0 ≤ α ≤ 1.0. The larger α is, the better the detected profile matches the user's interests. As a whole, after browsing N pages, the activity α is determined as follows. (1)
(2)
Here, δ and λ are constant parameters that adjust the adaptation of α, the ratio term is the fraction of the evaluation value of the previously found best-matching profile over that of the current profile, f(n) is the normalization factor, and p is the rank down to which topics should be considered. For example, if p = 3, the method evaluates only the top three interests of the user profile instead of examining the whole profile, which results in stressing the topics of greatest interest.

4.1.3 Calculation of Interest Ranking

As we showed in Section 3, the original form of Attractor selection is two-dimensional. Leibnitz et al. extended the form to multiple dimensions [8]. We adopt the multi-dimensional form since we have to deal with the thirteen topic candidates. For each topic, we decide its weight using Attractor selection, and rank the topics according to their weights. Specifically, we use the following multi-dimensional form:
\[ \frac{dm_{i,j}}{dt} = \frac{syn(\alpha)}{1 + \max_k(m_{k,j}^2) - m_{i,j}^2} - deg(\alpha)\,m_{i,j} + \beta\,\eta_{i,j} \qquad (3) \]
\[ syn(\alpha) = \alpha\left(\gamma\,\alpha + \frac{1}{\sqrt{2}}\right), \qquad deg(\alpha) = \alpha \]
Here, m_{i,j} is the weight of topic i when it is assigned rank j, η_{i,j} is white noise, and β and γ are constant parameters that adjust the effect of the noise term and of the activity α, respectively.

4.1.4 Flow of User Profiling Detection

We can now summarize the basic algorithm for detecting the user profile after a user has browsed N Web pages:

1. Calculate the activity α based on Equation (1).
2. Initialize the set of topics whose rank has already been determined to the empty set.
3. Conduct the following process for each rank j = 1, 2, …, 13:
   a. For each topic i = 1, 2, …, 13, calculate the weight m_{i,j} of the topic based on Equation (3).
   b. Set max_i as the i with the maximum value of m_{i,j}, i = 1, 2, …, 13.
   c. Set j as the rank of the topic max_i and add the topic to the set of rank-determined topics.
4. Calculate the feedback of the decided user profile based on Equation (2).
5. Update the previously found best-matching profile if necessary.

4.2 Discussion

The Attractor selection scheme has been applied in several research fields [8][9][10]. The first application is multi-path routing in overlay networks, conducted by Leibnitz et al. [8]. They showed that their method based on Attractor selection is noise-tolerant and capable of operating in a very robust manner under changing environment conditions. The authors also applied the Attractor selection scheme to the routing problem in a mobile ad-hoc/sensor network environment [9]. They proved that their Attractor selection based method can operate entirely in a self-adaptive manner and that it can easily compensate for sudden changes in the topology of the network. On the other hand, Kitajima et al. applied the Attractor selection scheme to set parameters for filtering contents in data broadcasting services [10]. They assume an environment in which broadcasting service providers broadcast enormous amounts of varied data to users, and user clients have to filter out unnecessary data for the users. By using the Attractor selection scheme to decide the order of filters, they can reduce the time for filtering in such a dynamic environment. These three previous works show the robustness of the Attractor selection scheme to changes of the environment. In addition, another advantage of the Attractor selection scheme is that it operates without explicit rules and is simply implemented by numerical evaluation of the differential equations. In our case, users' interests are not stable in nature and are versatile in a quite dynamic manner. Therefore, we can expect that our Attractor selection based method successfully tracks changes of users' interests in a self-adaptive manner. Furthermore, because of its simplicity of implementation and the fact that it does not need to store users' histories to learn their profiles, we can implement and distribute our method as a light-weight plug-in for Web browsers, which means that users can receive the benefit of personalization very easily. It also has the advantage that our method does not violate users' privacy, since it only takes into account the current Web page the user is browsing and does not need to store their browsing histories and behaviors.
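Since Equations (1)–(3) are not reproduced in full above, the following Python sketch only illustrates the overall detection loop of Sect. 4.1.4 under stated assumptions: the DCG-based evaluation uses the standard formula, the activity and weight updates are simplified stand-ins for Equations (1) and (3), and the browsing-time figures and parameter values are hypothetical.

import math
import random

TOPICS = ["News", "Entertainment", "Sports", "Art", "Health", "Politics", "Economics",
          "Life", "Computer", "Education", "Technology", "Local", "Others"]

def dcg(relevances, p):
    # DCG over the top-p ranks: rel_1 + sum_{i>=2} rel_i / log2(i)
    rels = relevances[:p]
    return rels[0] + sum(r / math.log2(i) for i, r in enumerate(rels[1:], start=2))

def evaluate(profile, browse_time, p=3):
    # graded relevance of each ranked topic = cumulative browsing time of that topic
    rels = [browse_time.get(topic, 0.0) for topic in profile]
    return dcg(rels, p)

def activity(profile, best_value, browse_time, p=3):
    # simplified stand-in for Equation (1): current evaluation normalised by the
    # best evaluation seen so far, clipped to [0, 1]
    if best_value <= 0.0:
        return 0.0
    return max(0.0, min(1.0, evaluate(profile, browse_time, p) / best_value))

def rank_topics(browse_time, alpha, noise=1.0, seed=None):
    # simplified stand-in for Equation (3): each topic's weight mixes its observed
    # browsing time (deterministic when alpha is high) with random noise
    # (dominant when alpha is low); topics are then ranked by weight
    rng = random.Random(seed)
    total = sum(browse_time.values()) or 1.0
    weights = {t: alpha * browse_time.get(t, 0.0) / total
                  + (1.0 - alpha) * noise * rng.random()
               for t in TOPICS}
    return sorted(TOPICS, key=lambda t: weights[t], reverse=True)

# example: profile detection after batches of browsed pages (hypothetical seconds per topic)
browse_time = {"Technology": 300.0, "Computer": 180.0, "News": 60.0}
best_value = 1e-9
profile = list(TOPICS)
for batch in range(5):
    alpha = activity(profile, best_value, browse_time)
    profile = rank_topics(browse_time, alpha, seed=batch)
    best_value = max(best_value, evaluate(profile, browse_time))
print(profile[:3], round(alpha, 2))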
5 Conclusion and Future Work

In this paper, we reviewed previous work on user profiling and discussed its problems. Since most user profiling methods require a considerable amount of users' browsing histories as well as information on their behavior on Web pages, it seems difficult for them to converge to reasonable user profiles in a practical time, as users' interests frequently change. We briefly introduced the Attractor selection scheme, which models the fluctuation inherent in organisms, and proposed an Attractor selection based method for user
profiling in Web browsing. We defined a user's profile as a ranking of pre-defined topics and determined the ranking using the Attractor selection scheme. As future work, we will implement a practical application and conduct user experiments to examine the effectiveness of our method. Since users' interests might be quite unstable and versatile, we should confirm how quickly our method can converge to each attractor.
Acknowledgement This research was partially supported by “Global COE (Centers of Excellence) Program” and Grant-in-Aid for Scientific Research on Priority Areas (18049050) of the Ministry of Education, Culture, Sports, Science and Technology, Japan.
References 1. Billsus, D., Pazzani, M.J.: A Personal News Agent that Talks, Learns and Explains. In: The Third Annual Conference on Autonomous Agents, Seattle, pp. 268–275 (1999) 2. Baraglia, R., Silvestri, F.: Dynamic Personalization of Web Sites Without User Intervention. Communication of the ACM 50(2), 63–67 (2007) 3. Gasparetti, F., Micarelli, A.: Exploiting Web Browsing Histories to Identify User Needs. In: International Conference on Intelligent User Interfaces (IUI 2007), Hawaii, pp. 28–31 (2007) 4. Claypool, M., Le, P., Waseda, M., Brown, D.: Implicit Interest Indicators. In: The Sixth International Conference on Intelligent User Interfaces (IUI 2001), USA, pp. 33–40 (2001) 5. Kashiwagi, A., Urabe, I., Kaneko, K., Yomo, T.: Adaptive Response of a Gene Network to Environmental Changes by Fitness-Induced Attractor Selection. PLos ONE 1(1), e49 (2006) 6. Aarts, E., Korst, J.: Simulated Annealing and Boltzmann Machines. Wiley, New York (1989) 7. Yahoo! Japan, http://www.yahoo.co.jp/ 8. Leibnitz, K., Wakamiya, N., Murata, M.: Resilient Multi-Path Routing Based on a Biological Attractor-Selection Scheme. In: Ijspeert, A.J., Masuzawa, T., Kusumoto, S. (eds.) BioADIT 2006. LNCS, vol. 3853, pp. 48–63. Springer, Heidelberg (2006) 9. Leibnitz, K., Wakamiya, N., Murata, M.: Self-Adaptive Ad-Hoc/Sensor Network Routing with Attractor-Selection. In: IEEE GLOBECOM, San Francisco, pp. 1–5 (2006) 10. Kitajima, S., Hara, T., Terada, T., Nishio, S.: Filtering Order Adaptation Based on Attractor Selection for Data Broadcasting System. In: International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2009), Fukuoka (2009)
Expression of Personality through Avatars: Analysis of Effects of Gender and Race on Perceptions of Personality Jennifer Cloud-Buckner, Michael Sellick, Bhanuteja Sainathuni, Betty Yang, and Jennie Gallimore Department of Biomedical, Industrial and Human Factors Engineering Wright State University, Dayton, OH, 45435, USA
[email protected]
Abstract. Avatars and virtual agents are used in social, military, educational, medical, training, and other applications. Although there is a need to develop avatars with human-like characteristics, many applications include avatars based on stereotypes. Prabhala and Gallimore (2007) conducted research to develop collaborative computer agents with personality. Using the Big Five Factor Model of personality, they investigated how people perceive personality based on the actions, language, and behaviors of two voice-only computer agents in a simulation. However, these computer agents included no visual features in order to avoid stereotypes. The current research extends the work of Prabhala and Gallimore by investigating the effects of personality, race, and gender on the perceived personality of avatars with animated faces. Results showed that subjects were able to distinguish the different personalities, and race and gender significantly affected perceptions on a trait-by-trait basis. Keywords: avatar, virtual agent, personality, Big Five Factor.
1 Introduction

Avatars are frequently used in social networking activities, educational contexts, and in medicine for training, telemedicine, collaboration among providers, rehabilitation, and group counseling [1-5]. For the military, avatars have been used to aid in various simulations, including the navigation of unmanned aerial vehicles. In the research of Prabhala and Gallimore [6], the avatar was actually represented by just a voice with personality rather than a visual representation. The voice guided the user through different navigation activities. However, in a comparison of text-only, audio-only, audio-video, and audio-avatar chat contexts, the video-assisted and avatar-assisted chats resulted in higher ratings for subject attention than audio-only or text-only chats [7]. With the growing applications of avatars, it is important to note that the perception of avatars' personality and credibility can be affected by their appearance, behavior, language, and actions. The need for research and development of computer agents or avatars that more closely mimic human behaviors has been noted by many researchers [6, 11]. Using the Big Five Factor model of personality, Prabhala and Gallimore (2007) found that people could perceive personality from avatars through their
actions, language, and behavior [6]. Studies show that appearance, gender, and ethnicity can affect the user's reaction to the avatar [9-10]. Facial actions like head tilting, nodding, eyebrow raising, and blinking are also used as visual cues in perceiving inherent personality [11].
2 Methods

2.1 Objectives, Experimental Design, and Hypothesis

The objective was to evaluate if people can perceive the personality of computerized avatars based on actions, language, and behaviors and to determine if race and gender as represented in facial and voice features would affect perceptions. Independent variables are the avatar's race (dark or white), gender (male or female), and personality (P1 or P2). These were arranged in a 2x2x2 within-subject experiment. Data were collected on the subjects' ratings of 16 personality subtraits in the Big Five Factor Model and 4 subjective questions on a 10-point scale (1: trait is not very descriptive of the avatar and 10: trait is very descriptive). The hypotheses are as follows:

Hypothesis 1: There will be no significant differences in subjects' subtrait ratings by personality.
Hypothesis 2: Race and gender will not affect perception of personality.

2.2 Subjects and Apparatus

The subjects were recruited from Wright State University via e-mail and class announcements. Thirty-five subjects completed the study (16 female and 19 male). The subjects were not paid to participate and had the option to leave at any time. Test scenarios were presented with two 17-inch computer monitors, a Windows XP-equipped computer, and headphones. Scripts were recorded with Audacity audio software from sourceforge.net, and avatars were developed in Haptek's PeoplePutty software. Statistical analysis was performed in JMP 7.0 and Microsoft Excel.

2.3 Stimuli

Avatar Appearance. Shown in Figure 1, the avatars consisted of: 1) dark female, 2) white female, 3) dark male, 4) white male. Each used the same face structure.
Fig. 1. Avatar Appearance: dark female, white female, white male, dark male
The only modifications to the basic face were changing the skin color (dark or white), broadening of the male face, and lengthening of the females' hair.

Avatar Personality. The two personality levels were constructed to be completely opposite to each other on 16 of the 30 Big Five Factor model subtraits. These were selected for what could be emulated in the scenarios; also, insignificant factors from [6], such as artistic interests and liberalism, were omitted. Traits emphasized in each personality type were given an expected rating of 10, as shown in Tables 1 and 2. Personality 1 was designed to be friendly, outgoing, and self-sufficient with a high activity level and some anger. Personality 2 was designed to be introverted, self-conscious, cooperative, orderly, modest, disciplined, and sympathetic to others.

Table 1. Emphasized Traits and Actions, Language, and Behavior to Define Personality 1
Subtrait | Expected Rating | Behavior
Friendliness | 10 | Casual greeting, positive comments, friendly tone, jokes, smiles with teeth showing and wide cheeks
Gregariousness | 10 | Is happy to see a crowd, talkative, likes activities with crowds, makes eye contact
Assertiveness | 10 | No hesitation, expresses ideas openly, talkative, serves as a leader of groups, makes eye contact
Activity level | 10 | Involved in many activities, quick information pace
Cooperation | 1 | Discusses confrontation about issue that angers them
Modesty | 1 | Talks about being superior to professors or other students, brags about achievements
Sympathy | 1 | Jokes about people being sick
Self-efficacy | 10 | Uses confident, self-focused phrases (e.g., confident can do anything after graduation), never looks at notes
Orderliness | 1 | Might jump from point to point in their talk because they don't know the order
Self-discipline | 1 | Procrastinates on assignments / studying for tests
Cautiousness | 1 | Mentions campus security because "it's required," mentions outdoor adventure activities
Anxiety | 1 | Does not seem nervous about talking
Anger | 10 | Makes snide remarks; has a tirade about something on campus (e.g., parking ticket, bad cafeteria food)
Self-consciousness | 1 | Bold opinions no matter what others think
Adventurousness | 10 | Mention study abroad, outdoor adventure activities
Intellect | 1 | Prefers people to ideas; does not like long puzzles
Table 2. Emphasized Traits and Actions, Language, and Behavior to Define Personality 2
Subtrait | Expected Rating | Behavior
Friendliness | 1 | Formal greeting, negative comments, avoids eyes
Gregariousness | 1 | Shy, avoids crowds, avoids eye contact
Assertiveness | 1 | Hesitates while speaking, not talkative
Activity level | 1 | Slower pace of info, involved in fewer activities
Cooperation | 10 | Avoids confrontation, offers individual help, talks about helping on teams rather than leading them
Modesty | 10 | Does not brag about things, works hard in class but does not take credit for being on top
Sympathy | 10 | Offers extra guidance, condolences on problems
Self-efficacy | 1 | Has no control in their life (e.g., parents made them go to that school); unsure about notes
Orderliness | 10 | Keeps organized schedule / notes for classes, regular study times, meeting with professor on regular basis
Self-discipline | 10 | Has regular study schedule; finishes assignments
Cautiousness | 10 | Talks about campus security, carrying pepper spray
Anxiety | 10 | Apologizes for nervousness about talking
Anger | 1 | Nice about everything, even when it is bad
Self-consciousness | 10 | Shy about people; worried that they look stupid
Adventurousness | 1 | Mention others doing study abroad, going outdoors
Intellect | 10 | Describes solving puzzles; talks about ideas
A, L, B in Avatar Scripts. Each avatar script welcomed visitors and introduced a college campus as an online tour guide. The avatar personality was exhibited verbally through actions, language, and behavior (A, L, B) as well as the tone of the avatar. The A, L, B for each trait are shown in Tables 1 and 2. Eight gender-neutral scripts were randomly assigned to the faces, and each face had one script from P1 and one from P2. One female recorded all of the female scripts, and one male recorded all the male scripts. Each script was 1.5 to 2 minutes long and followed the same order of items: initial facial expression, greeting, university name, purpose of the talk, why the speaker came to the school, the year the school began, number of students, academics including the speaker's major, professors and classes, study abroad opportunities, facilities, dorms, security, clubs and organizations, and a conclusion.

2.4 Procedure

Each subject was randomly assigned to an "order code" that specified the random order in which the 8 scripts would be viewed. After each script, subjects filled in a
randomized spreadsheet ratings questionnaire with their rating, on a 1-10 scale, of how well each subtrait described the avatar, and what A, L, B led to that rating. Each question included the definition of the subtrait based on the Big Five Factor model, such as "Friendliness describes the quality of being amicable and having no enmity." This was repeated for all 8 scripts, and the entire process generally took 30-45 minutes. Subjects were also asked four subjective questions about the guide.
3 Results

3.1 General Model and ANOVA Information

Each of the 35 subjects answered 160 total rating questions, resulting in 5600 data points. The ANOVA model included Personality, Race, Gender and Question (20), resulting in a 2x2x2x20 within-subject analysis. Question was included because each question addressed a different subtrait. Since traits like Anger and Friendliness are on opposite scales, performing an analysis across questions can result in an averaging of those traits. We are interested in evaluating how subjects perceived the different subtraits embedded in the personality. The alpha level was set to 0.05 and resulted in 11 significant factors. To control for sphericity, a conservative Greenhouse-Geisser (G-G) correction resulted in 7 significant factors. Significant interactions were analyzed with simple-effects F-tests, and main effects were analyzed with Tukey-Kramer Honestly Significant Difference (Tukey HSD) tests.

3.2 Interactions

Gender x Personality x Race (GxPxR). This interaction must be viewed by subtrait. Without the G-G correction the RxGxPxQ interaction was significant, and with the G-G correction the p-value is 0.0692. Given that the G-G correction is conservative, we looked at the effects of R and G for each Personality and each subtrait, and we found significant main effects of R, G and RxG depending on the trait, as indicated in Table 3.

Personality x Question (PxQ). PxQ was significant (F(1,34) = 37.71, p < 0.0001). Simple-effects F-tests show P is a significant factor for every Q except the one measuring Cooperation. Figures 2 and 3 illustrate the PxQ interaction for P1 and P2, respectively. The traits designed into P1 are rated as more descriptive for P1 than P2. For the traits designed into P2 there are two traits that are more descriptive for P1, Orderliness and Intellect. Cooperation was perceived equally for both P1 and P2.

Race x Question (RxQ) was significant (F(1,34) = .0211, p = 0.0211). When analyzed by each question, ratings for race were significantly different for the following traits: orderliness, anger, cooperation, self-consciousness, modesty, self-discipline, intellect, sympathy. For all of these traits, subjects gave higher ratings (more descriptive) when the race was Dark, except on the subtrait Anger, in which case the rating was lower (less descriptive). Additionally, the Dark avatars received a significantly higher mean rating on the question asking subjects if they would be willing to trust information from that avatar. Figure 4 shows the significant traits.
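For readers who wish to reproduce this kind of analysis, the sketch below shows how a four-way within-subject ANOVA of this design could be set up with the statsmodels AnovaRM class. The column and file names are hypothetical, the data frame is assumed to hold exactly one rating per subject per cell, and the Greenhouse-Geisser correction reported above is not computed by AnovaRM and would have to be applied separately.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# hypothetical long-format data: one row per subject x personality x race x gender x question
# columns: subject, personality (P1/P2), race (dark/white), gender (male/female),
#          question (1..20), rating (1..10)
df = pd.read_csv("avatar_ratings.csv")  # assumed file name

model = AnovaRM(
    data=df,
    depvar="rating",
    subject="subject",
    within=["personality", "race", "gender", "question"],
)
result = model.fit()
print(result)  # F and p values for main effects and interactions such as PxQ and RxQ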
Fig. 2. Traits designed into P1 are rated on average as more descriptive than P2

Table 3. Instances of Significant Factors for Each Question, where x indicates occurrence of statistical significance of that factor (R, G, RxG) for a given personality and question
[Table 3 lists the 16 subtraits (Friendliness, Self-Efficacy, Anxiety, Gregariousness, Orderliness, Anger, Assertiveness, Activity Level, Cooperation, Self-Consciousness, Adventurousness, Modesty, Self-Discipline, Intellect, Sympathy, Cautiousness) and the four subjective questions (willingness to trust, satisfaction with the tour guide, choosing this guide for a later tour, choosing a school with this disposition) as rows, with columns Race, Gender, and R*G under each of Personality 1 and Personality 2; an x marks statistical significance of that factor for the given question. Column totals: 11, 7, and 5 significant instances under Personality 1 and 7, 5, and 5 under Personality 2.]
Fig. 3. Traits designed into P2 are more descriptive for P2 than P1 for 6 of the 9 traits
3.3 Descriptions of Actions, Language, Behavior That Led to Ratings

Out of 5600 rating points, 571 comments were provided by subjects. They centered on these areas: the tour guide's direct quotes, the guide's behaviors, tone of voice or accent, stuttering, pauses, forgetfulness, and lack of organization. Most comments referred to verbal rather than visual elements. Comments on appearance mentioned lack of eye contact, looking "boring," "comfortable" facial expressions, and head movement. Every script element except for school size was mentioned at least once in the comments. Some elements had up to 15 comments. The script lines were specifically designed to represent certain personality traits, but these were not always interpreted as intended. For example, when one avatar mentioned living at home with parents, some subjects interpreted that as intelligent budgeting while others saw it as a lack of self-efficacy.
Fig. 4. Two-Way interaction of Race x Question (Significant questions only)
4 Discussion

4.1 General Conclusions on Hypotheses

The results rejected the null hypothesis of no difference in subtrait ratings by Personality. The two avatar personalities were specifically designed to be different, and the ratings indicate that subjects perceived differences in the personalities. These findings are similar to those of Prabhala and Gallimore (2007), who found that subjects could perceive differences in personality even without a face. P1 was rated higher (traits more descriptive) on all 7 traits specifically designed into it. For P2, only 6 of the 9 traits that were built into the personality were perceived as "more descriptive" than for P1. Two traits were rated as being more descriptive in P1 than in P2 (Orderliness and Intellect). There was no significant difference in ratings for Cooperation between P1 and P2. The magnitude of the differences in the mean ratings varied across traits. For example, the difference in means for friendliness between P1 and P2 is 0.85, while the differences for all other traits built into P1 vary from 1.64 to 3.17. In P2, the difference in mean ratings for self-discipline is only 0.47, while the other significant traits range from 0.67 to 2.79. It appears that the traits for P2 are more difficult to model into the personalities. Traits that are related to emotion have larger differences (Anger and Anxiety). Adventurousness and activity level also have larger differences, because scripts for P1 had notable sports and activity information, such as rock climbing or study abroad, rather than P2's swimming laps or walking.

The null hypothesis that there would be no difference in ratings based on race and gender was rejected. We expected no differences, but the race and gender of the avatar did play some role in perceptions of personality. For example, for P1 and P2, when there was a difference on Race, the dark skin had a higher rating for every item except anger. When there was a difference on Gender, in P1 the male had a higher score, but in P2 the female had a higher score. The interaction of RxG had mixed results across all subtraits. Overall, dark males had higher ratings on many personality traits except anger, which showed the lowest rating. It is difficult to understand the cause of these effects; we could not use race and gender as subject blocking variables, so no comparison can be made between the subject's race or gender and their ratings for avatars of different races. Approximately 75% of the subjects were white.

4.2 Implications for Future Research

The different ways that some script elements were interpreted by subjects indicate that future research should include a personality profile of the subject to see if the person's preferences affect their favorable ratings of other personality types. Additionally, implications of racial bias in the RxQ results mean that future studies should track the race of the subject to see how the person's race affects responses. Subjects should clearly understand that their answers are anonymously combined with others' so that no judgments are made on their individual racial preferences. Some comments mentioned voice tone and accent, so future research should consider using different human and computer-generated voices to address this, especially when non-native English speakers are involved. Personality can also affect interpretation of tone; in one study extroverts were more attracted to extroverted voices and, similarly, introverts were more attracted to introverted voices [12].
The 571 comments about A, L, B showed that most comments referred to the avatars' verbal elements rather than their visual appearance, possibly indicating that subjects' biases for avatar appearance were revealed only through ratings, not through the comments. The scenario of avatars presenting a campus tour was a good vehicle for conveying personality preferences. Differences between the schools were minimal but still conveyed some personality preferences of the avatars. In future studies, it would be interesting to see how different scenarios affect ratings. With a growing need for computer agents that resemble human behavior, this research confirms that personality can be both modeled and perceived, and it provides insight into how gender and race affect the perception of personality.
References 1. Kang, H.-S., Yang, H.-D.: The visual characteristics of avatars in computer-mediated communication: Comparison of internet relay chat and instant messenger as of 2003. Int. J. Hum.-Comput. St. 64(12), 1173–1183 (2006) 2. Gorini, A., Gaggioli, A., Vigna, C., Riva, G.: A second life for eHealth: Prospects for the use of 3-D virtual worlds in clinical psychology. J. Med. Internet Res. 10(3), e21 (2008) 3. Monahan, T., McArdle, G., Bertolotto, M.: Virtual reality for collaborative e-learning. Computers 50(4), 1339 (2008) 4. Heinrichs, W., Youngblood, P., Harter, P., Dev, P.: Simulation for team training and assessment: Case studies of online training with virtual worlds. World J. Surg. 32(2), 161– 170 (2008) 5. Hilty, D., Alverson, D., Alpert, J., Tong, L., Sagduyu, K., Boland, R.: Virtual reality, telemedicine, web and data processing innovations in medical and psychiatric education and clinical care. Acad. Psychiatr. 30(6), 528–533 (2006) 6. Prabhala, S.V., Gallimore, J.J.: Designing computer agents with personality to improve human-machine collaboration in complex systems. Wright St. Univ. (2007) 7. Bente, G., Rüggenberg, S., Krämer, N.C., Eschenburg, F.: Avatar-mediated networking: Increasing social presence and interpersonal trust in net-based collaborations. Hum. Commun. Res. 34(2), 287–318 (2008) 8. Rizzo, P., Veloso, M., Miceli, M., Cesta, A.: Personality-Driven Social Behaviors in Believable Agents. In: Proceedings of the AAAI Fall Symposium on Socially Intelligent Agents (1997) 9. Nasoz, F., Lisetti, C.L.: MAUI avatars: Mirroring the user’s sensed emptions via expressive multi-ethnic facial avatars. J. Visual Lang. Comput. 17, 430–444 (2006) 10. Masuda, T., Ellsworth, P., Mesquita, B., Leu, J., Tanida, S., De Veerdonk, E.: Placing the face in context: Cultural differences in the perception of facial emotion. J. Pers. Soc. Psych. 94(3), 365–381 (2008) 11. Arya, A., Jefferies, L.N., Enns, J.T., DiPaola, S.: Facial actions as visual cues for personality. Comput. Animat. Virt. W 17(3-4), 371–382 (2006) 12. Lee, K., Nass, C.: Social-psychological origins of feelings of presence: Creating social presence with machine-generated voices. Media Psychol. 7(1), 31–45 (2005)
User-Definable Rule Description Framework for Autonomous Actor Agents Narichika Hamaguichi1, Hiroyuki Kaneko1, Mamoru Doke2, and Seiki Inoue1 1
Science & Technical Research Laboratories, Japan Broadcasting Corporation (NHK) 1-10-11, Kinuta, Setagaya-ku, Tokyo, 157-8510, Japan {hamaguchi.n-go, kaneko.h-dk, inoue.s-li}@nhk.or.jp 2 NHK Engineering Services, Inc. 1-10-11, Kinuta, Setagaya-ku, Tokyo, 157-8540, Japan
[email protected]
Abstract. In the area of text-to-video research, our work focuses on creating video content from textual descriptions, or more specifically, the creation of TV-program-like content from script-like descriptions. This paper discusses a description framework with which rough action instructions, specified in the form of a script, can be turned into detailed instructions controlling the behavior and actions of autonomous video actor agents. The paper also describes a prototype text-to-video system and presents examples of instructions for controlling an autonomous actor agent with our descriptive scheme. Keywords: Autonomous Actor Agent, Digital Storytelling, Text-to-Video, TVML, Object-Oriented Language.
1 Introduction

Research into digital storytelling has attracted considerable interest in recent years, and one approach, producing computer graphics (CG) video content from textual descriptions, has inspired a number of studies around the world. (We refer to this approach as "text-to-video" [1].) In text-to-video production, figuring out how to make the actor agents (CG characters) in the video act and behave naturally is critically important. In big-production animated films, the mannerisms and behavior of actor agents can be manually edited on a frame-by-frame basis, which is extremely costly in terms of man-hours, time, and budget. But for smaller-scale or personal video productions such lavish and costly production techniques are impractical, thus creating a demand for an autonomous method of controlling actor agents. A number of studies have addressed this issue of autonomous actor agents [2]. Researchers have also investigated language-based descriptive methods of producing video content and controlling the actions and behavior of actor agents, including a specially designed scripting approach [3] and a method of controlling the behavior of actor agents using natural language instructions [4]. Most of these
studies of autonomous actor agent actions, and of ways to describe them, can only express such actions with a limited vocabulary, based on rules set up in advance under limited conditions. The problem is that if the user wants to add a new autonomous action rule or modify an existing one, very few schemes give users access to the rules, and the expandability of those that do is quite limited. This led us to design a system that is functionally separated into two parts: a video content production part and an object-oriented description part. Instead of using a special proprietary language, the object-oriented description part uses a dynamic programming language that can be run as is from the source code, without compiling the code beforehand. Users are thus able to add, modify, and reuse actor agent action rules by directly accessing the source code. Moreover, because this approach is based on an existing programming language, it can be extended almost without limit to match the needs of the user. In the next section, we lay out the principal issues addressed in this paper.
2 Requirements
In this section we consider the requirements needed to enable users to represent rules that control the behavior and actions of actor agents.
2.1 Openly Modifiable Object-Oriented Language
The object-oriented approach permits functions to be encapsulated, which is highly beneficial in terms of reusability, and today many advanced programming languages have adopted it. There are essentially two types of object-oriented languages: languages that are executed after the source code is compiled (compiler languages), and languages that are not pre-compiled but are interpreted at run time (dynamic programming languages or interpretive languages). In compiler languages, the source code is separated from the executable file, and because the executable file is a black box, the user is unable to modify functions or copy and reuse portions of them, even if he or she is able to use functions created by others. To achieve our objectives, we need a language that enables users to add, modify, and reuse rules for controlling the behavior and actions of actor agents. In short, we need an object-oriented dynamic programming language that gives the user access to the source code.
2.2 Versatile Layered Structure for Different Types of Users
With the goal of using a descriptive language to control the actions of actor agents, our first concern is to achieve the desired behavior or action using the simplest possible expressions, without inputting detailed instructions. The problem with the object-oriented languages described earlier is that the more encapsulation and reuse is involved, the more complicated the expressions become, far beyond what a simple script language can handle.
The level of language used also varies depending on the type of user and the intended use. For example, if one wants to produce a simple video clip with a minimum of time and effort, the level of language abstraction and the types of data manipulated are very different than if one wants to produce content in which very detailed actions and timing of the actor agents are critically important. To address these issues, we adopted a three-layer structure that can be tailored to different kinds of users and different intended uses. The lower layer (detailed description instructions) is for professionals, enabling detailed descriptions for producing professional-grade video content. The upper layer (simple description instructions) is for beginners and amateurs: it hides the complexity of the detailed descriptions and allows amateur users to produce video content using relatively simple descriptions. The rules that control the actions and behaviors of actor agents are described in the middle layer, sandwiched between the upper and lower layers. Essentially, this layer converts the relatively simple instructions received from the upper layer into the more detailed expressions required by the lower layer.
3 Language Design and Prototype System Based on the requirements outlined in the previous section, we designed a prototype system consisting of two parts as illustrated in Fig.1: an Object-Oriented Description Module and a Presentation Module. The Object-Oriented Description Module is a three-layer structure as described above. It consists of a Simple Scripting Layer, an Automatic Production Layer, and a Detailed Description Layer, and runs using the dynamic programming language Python. A series of rough sequential instructions similar to the script for a TV program are described in the upper Simple Scripting Layer. The instructions are in a format that even someone with little or no experience with programming languages can understand and edit. The rules controlling the behavior and actions of the actor agents are described in the middle Automatic Production Layer. The layer receives the rough instructions described in the upper Simple Scripting Layer, acquires the situation using the functions of the lower Detailed Description Layer, then automatically determines the specific actions and behavior of the agents based on rules in the middle layer, which are sent to the lower Detailed Description Layer for execution. The lower Detailed Description Layer provides a simple wrapped interface with a TVML Player [5] in the Presentation Module, and through this intermediary wrapper, the Detailed Description Layer obtains the video states and delivers the instructions. Instructions are created using a descriptive language called TVML (TV program Making Language), and the states are acquired using an external application program interface called TvIF. TVML is a self-contained language featuring all the capabilities needed to create TV program-like video content, including detailed control over the speech and movement of actor agents, cameras, display of subtitles, and so on. Essentially, complex descriptive instructions are substituted for detailed control. The TVML Player uses software to interpret a TVML Script, then generates video content using 3D computer graphics, synthesized speech, and other production
Fig. 1. Layered structure of the object-oriented video content production framework
techniques. The TVML Script only provides the TVML Player with a one-way stream of instructions, and the states are returned by way of the TvIF interface. Moreover, the TVML Player was developed using C++. This relieves the user of dealing with the technically challenging aspects of production such as handling 3D computer graphics which is done by the TVML Player, but the internal operation of the TVML Player itself is unalterable. By adopting the layered approach outlined above, users can employ whichever layer is best suited to their skills and objectives. And because users have direct access to the source code of each layer, they can add, modify, reuse and inherit classes of rules controlling the actions of actor agents.
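To make the wrapping idea concrete, the following is a minimal sketch of our own (not code from the actual system): a hypothetical Detailed-Description-Layer class that turns Python method calls into TVML-style instruction strings for the Presentation Module and fetches states back through a TvIF-like query function. The class, method names, and command-string format here are illustrative assumptions only.

# Minimal sketch of a Detailed Description Layer wrapper (assumed names and formats).
class Character:
    def __init__(self, send, query, name):
        self._send = send        # callable that forwards an instruction string to the TVML Player
        self._query = query      # callable that reads a state back through the TvIF interface
        self.name = name

    def walk(self, x, z):
        # Wrap one detailed instruction as a TVML-style command string.
        self._send(f"character: walk (name={self.name}, x={x}, z={z})")

    def location(self):
        # State acquisition, e.g. something like getCharacterLocation in Table 1.
        return self._query(f"getCharacterLocation {self.name}")

# Stand-ins for the real player/interface connections, just to make the sketch runnable.
sent = []
bob = Character(send=sent.append, query=lambda q: (0.0, 0.0), name="bob")
bob.walk(x=4, z=0)
print(sent)          # ['character: walk (name=bob, x=4, z=0)']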
4 Application Examples for Each Layer
In this section we will provide description and application examples for each layer.
4.1 Simple Scripting Layer
The Simple Scripting Layer is the upper layer. It is based on a very intuitive format: a sequential string of instructions without any inherent control structure, much like the script of a TV program. It is thus transparent and easily manipulated by anyone, even people with little or no experience with programming languages.
Application Example. Fig. 2 shows a typical example of an application in which a description in the Simple Scripting Layer is edited; the internal descriptions, written in Python, are shown below. These descriptions in the Simple Scripting Layer consist of a simple line-by-line sequence of unstructured instructions. So, using a tool such as the one illustrated in Fig. 2, the user can easily edit the script in much the same way as
Fig. 2. Application example for editing the descriptions in the Simple Scripting Layer
using a word processor. Any user capable of using a word processor is thus capable of producing video content!
Description Examples

import apetest                       # Import of Automatic Production module
ape = apetest.APETest()              # Constructor
ape.title("Script Example")
ape.text("I wanna go over there!")   # Speech
ape.action_walk_to_goal()            # Action
ape.subimage("goal.jpg")             # Show image
ape.action_look_at_camera()
ape.text("You see?")
ape.end()
4.2 Automatic Production Layer
The specific rules that control the actions and the behavior of actor agents on the basis of instructions received from the Simple Scripting Layer are described in the Automatic Production Layer.
Description Examples. Here is an example of descriptions in the Automatic Production Layer. Action rules are represented as classes that inherit from a base class called APEBase. Basic action rules are defined in APEBase, so in order to create a
new action rule, a user only needs to create or describe a rule that is different from that in APEBase. The module for producing new action rules in this way is called the Automatic Production Engine (APE) [6].

goal.x = 4
goal.z = 0

class APETest(APEBase):
    ...
    def setup(self):                            # Initialization
        self.A = tvml.Character(filename="bob.bm", x=-4)
        self.obst1 = tvml.Prop(filename="tsubo.obj", x=1, ...)
        ...
    def text(self, value):                      # Speech
        self.A.talk(text=value)
    ...
    def subimage(self, value):                  # How to show image
        self.img = tvml.Prop()
        self.img.openimageplate(filename=value, platesizeh=3.6, platesizev=2.7)
        self.img.position(y=2, pitch=270)
    ...
    def action_walk_to_goal(self):              # How to walk to goal
        props = getPropNameList()               # Get all prop names
        for prop in props:
            loc = findPath(obstacle=prop)       # Find a path per prop
            self.A.walk(x=loc.x, z=loc.z)       # Simple walk
        self.A.walk(x=goal.x, z=goal.z)

As one can see, several method subroutines are defined: the setup method deals with initialization, the text method relates to the speech of actor agents, and the subimage method relates to how images are presented. The action_walk_to_goal method is a new subroutine created by the user that instructs the actor agent to avoid obstacles as it proceeds to a goal (the actual description has been simplified in this example). Previously, the only walk-related method defined in the lower Detailed Description Layer instructed the actor agent to proceed in a straight line from its current position to the goal. The new action_walk_to_goal subroutine calculates a path from the position and size of obstacles (3D bounding box), thus
Fig. 3. Operation of the action_walk_to_goal method
enabling the user to define an action rule permitting the agent to proceed to a goal without bumping into things.
Output Example. Fig. 3 shows an example of how the action_walk_to_goal method is run based on descriptions in the Simple Scripting Layer. Users are thus able to add and modify the rules controlling the actions and behavior of agents in the Automatic Production Layer. Significantly, newly added action rules can be used and manipulated in an easy, user-friendly format from the upper Simple Scripting Layer, just like any other instruction. Here we have discussed a user-defined action rule enabling agents to avoid obstacles, but all sorts of powerful rules can be created in the same way, such as:
− Actor agent actions playing to a particular camera: the actions of an agent can be controlled to play to a particular camera by acquiring the names, positions, and angles of the cameras.
− Actor agent actions synched to a movie file: the actions of an agent can be synchronized to the playback timing of a movie by acquiring the playback timing of the movie file.
− Actor agent behavior synched to speech:
the expressions and gestures of an agent can be synched to the character strings of the synthesized speech lines spoken by the agent.
− Evaluation of the output screen layout: the layout or composition of the output video screen can be evaluated, and the agent's actions adjusted to the layout, by acquiring an on-screen 2D bounding box.
These various types of automatic production rules are actually executed by the functional capabilities of the Detailed Description Layer. Let us next take a closer look at the Detailed Description Layer, which must be endowed with powerful capabilities in order to execute these rules.
4.3 Detailed Description Layer
TvIF and TVML are wrapped by the method subroutines incorporated in the Detailed Description Layer. This layer can obtain a comprehensive range of states from the TVML Player, including the prop bounding box, camera information, movie playback timing, and a host of other states. Table 1 shows some of the state acquisition methods that are incorporated in the Detailed Description Layer. Note too that all of the states incorporated in the TVML Player (orientation angle and coordinates of the actor agents, speed, timing, and so on) can be directly controlled by the TVML Script. This allows more experienced users who want direct control over the production and editing of their video content to work directly at this layer.
Description Examples. Here are some typical examples of Detailed Description Layer descriptions. These descriptions enable detailed control over the movement of actor agent joints, camera movements, and a host of other variables.

buddy.gaze(pitch=-30, wait=NO)
buddy.turn(d=-120, speed=0.5)
buddy.definepose(pose=GetWhisky, joint=LeftUpperArm, rotx=-105.00, roty=25.00)
buddy.definepose(pose=GetWhisky, joint=Chest, rotx=5.00, roty=-15.00, rotz=0.00)
buddy.pose(pose=GetWhisky, speed=0.25, wait=NO)
tvml.wait(time=0.7)
cam1.movement(x=-0.49, y=1.57, z=1.75, pan=400, tilt=-5.00, roll=1.00, vangle=45.00, transition=immediate, style=servo)
whisky.attach(charactername=buddy, joint=RightHand, switch=ON)

Output Example. Fig. 4 illustrates how the agent actually moves based on the Detailed Description Layer descriptions listed above.
Table 1. Typical state acquisition methods incorporated in the Detailed Description Layer

getCharacterLocation      Current position of an actor agent
getCharacterTalkingText   Character string currently spoken by an actor agent
getCameraCurrent          Name of camera currently selected
getCameraLocation         Current location and angle of camera
getPropNameList           List of prop names
getPropBoundingSolid      3D bounding box of a prop
getPropBoundingBox        2D bounding box of an on-screen prop
getMovieCurrentTime       Playback position of a movie file
Fig. 4. Movement based on Detailed Description Layer descriptions
5 Conclusions
In this work we designed an object-oriented scheme for building a descriptive language framework that enables users to add, modify, and reuse rules controlling the behavior and actions of autonomous actor agents. By dividing the object-oriented description into three layers (the Simple Scripting Layer, the Automatic Production Layer, and the Detailed Description Layer), we have implemented a structure that can be tailored to different kinds of users and different intended uses. This scheme allows users themselves to describe rules for controlling the behavior and actions of autonomous actor agents by editing the Automatic Production Layer. Leveraging this Automatic Production Layer based scheme, we plan to design a wide range of autonomous actor agents and develop applications that use the agents.
References 1. Bindiganavale, R., Schuler, W., Allbeck, J., Badler, N., Joshi, A., Palmer, M.: Dynamically Altering Agent Behaviors Using Natural Language Instructions. In: The 4th International Conference on Autonomous Agents, Proceedings, Barcelona, Spain, pp. 293–300 (2000) 2. Funge, J., Tu, X., Terzopoulos, D.: Cognitive Modeling: Knowledge, Reasoning and Planning for Intelligent Characters. In: The 26th International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 1999), Proceedings, Los Angeles, USA, pp. 29–38 (1999) 3. Hamaguchi, N., Doke, M., Hayashi, M., Yagi, N.: Text-based Video Blogging. In: The 15th International World Wide Web Conference (WWW 2006), Proceedings, Edinburgh, Scotland (2006) 4. Hayashi, M., Doke, M., Hamaguchi, N.: Automatic TV Program Production with APEs. In: The 2nd Conference on Creating, Connecting and Collaborating through Computing (C5 2004), Kyoto, Japan, pp. 20–25 (2004) 5. http://www.nhk.or.jp/strl/tvml/ 6. Perlin, K., Goldberg, A.: IMPROV: A System for Scripting Interactive Actors in Virtual Worlds. In: The 26th International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 1996), Proceedings, New Orleans, USA, pp. 205–216 (1996)
Cognitive and Emotional Characteristics of Communication in Human-Human/Human-Agent Interaction
Yugo Hayashi and Kazuhisa Miwa
Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8601, Japan
{hayashi,miwa}@cog.human.nagoya-u.ac.jp
Abstract. A psychological experiment was conducted to capture the nature of Human-Human and Human-Agent Interactions where humans and computer agents coexist in a collaborative environment. Two factors were manipulated to investigate the influence of the 'schema' about the partner and of the 'actual partner' on the characteristics of communication. The first factor, the expectation about the partner, was controlled by the experimenter's instruction, manipulating with which partner (human or computer agent) participants believed themselves to be collaborating. The second factor, the actual partner, was controlled by manipulating with which partner (human or computer agent) participants actually collaborated. The results of the experiment suggest that the degree of refinement of the conversation, controlled as the actual-partner factor, affected both the emotional and the cognitive characteristics of communication; the schema about the opponent, however, affected only the emotional characteristics of communication. Keywords: Collaboration, Human-Human Interaction, Human-Agent Interaction, Communication.
1 Introduction
Communication across different computers connected by the Internet continues to increase due to the development of computer network technologies. In this situation, research on technologies for supporting such collaboration using computer agents has appeared [6]. In the field of Human-Computer Interaction (HCI), there are studies focusing on the nature of interaction between humans and computer agents [3]. In our study, we conduct a psychological experiment to capture the nature of Human-Human Interaction (HHI) and Human-Agent Interaction (HAI) in a setting where humans and computer agents coexist. In daily life, we make inferences and decisions about an opponent based on information received from him or her. For example, when the opponent reacts politely, we may infer his or her character and attitudes from those reactions. This indicates that in communication the contents of conversation are important for determining the characteristics of the communication. In contrast, in the initial stage of communication, information about an opponent is limited. Therefore, people rely on the related knowledge of an opponent obtained in advance and
infer him based on this knowledge. Actually, social psychological research has indicated the importance of top-down processing based on the knowledge about the speaker such as schema and stereotypes in interpersonal cognition [1]. In communication through the Internet where we do not face an opponent directly, the above two perspectives, "actual conversation" and "schema," function as follows: (1) either the opponent is believed to be a human or computer agent (based on the schema), and (2) either the actual opponent's conversation is sophisticatedly adaptive or simply machine-like. We performed a psychological experiment in which the two factors were manipulated to investigate the characteristics of communication where humans and computer agents coexist.
2 Method
2.1 Task
We assume that the above interesting aspects of communication emerge most remarkably in a situation where interpersonal conflicts arise during collaboration. To establish such a situation, we used an experimental paradigm designed by Hayashi, Miwa, and Morita [2], in which two participants, each having a different perspective, communicate with each other. As shown in Figure 1, a stimulus is constructed in which black and white unit squares are randomly arranged on a 6-by-6 grid. We call each contiguous surface of same-colored unit squares an 'object'. In Figure 1, there are a total of ten objects: five black and five white. This stimulus is displayed on either a black or a white background: it is presented to one participant on a black background and to the other on a white background. Based on Gestalt psychological principles, each participant acquires a single perspective, focusing on the objects whose color is the opposite of the background color. Each participant informs the other of the sequence of the number of objects he/she perceives during the task (Figure 2). In the initial stage of the experiment, one experiences difficulty understanding the perspective of the other; with miscommunication, interpersonal conflict occurs, which the participants must resolve to complete the task.
Fig. 1. Example of stimuli
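For concreteness, the short Python sketch below (our own illustration, not the authors' stimulus generator) shows one way such a 6-by-6 black/white stimulus could be produced and how the 'objects' of each color could be counted as connected regions; the use of scipy.ndimage and 4-connectivity is an assumption.

# Illustrative stimulus generation and object counting (assumed method, synthetic data).
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(6, 6))       # 1 = white unit square, 0 = black

# On a black background the white squares group into objects, and vice versa;
# count 4-connected components of each color.
_, n_white = ndimage.label(grid == 1)
_, n_black = ndimage.label(grid == 0)
print(n_white, n_black)                      # e.g. five white and five black objects, as in Fig. 1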
Fig. 2. Series of stimuli presentations
2.2 Experimental System
Figure 3 shows an example screenshot. The stimulus is presented in the center. Below it, there is a text field where the participants input their messages and receive their partner's messages. Just one sentence per trial is permitted, and at most 30 words are accepted. Buttons for changing the slides, sending messages, and terminating the experiment are placed at the bottom of the screen. We developed a natural language conversation agent whose responses are generated based on scripts. The agent is constructed to respond to the sentences input by the participants. It has mechanisms for conversation such as extracting keywords, activating scripts, and generating responses that reuse keywords drawn from the partner's input sentences.
Fig. 3. Example screenshot
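As an illustration only (the actual scripts and matching rules of the authors' agent are not given in the paper), a keyword-driven responder of the kind just described could look like the following sketch; all keywords and response templates here are hypothetical.

# Hypothetical keyword-script responder, sketching the mechanism described above.
import re

SCRIPTS = {
    "object": "How many objects do you see now?",
    "white":  "I am counting the white squares on my side.",
    "black":  "On my screen the black squares form the objects.",
}
DEFAULT = "Please tell me the sequence of numbers you perceive."

def respond(message: str) -> str:
    words = re.findall(r"[a-z]+", message.lower())   # crude keyword extraction
    for word in words:
        if word in SCRIPTS:                          # activate the matching script
            return SCRIPTS[word]
    return DEFAULT                                   # fallback when no keyword matches

print(respond("I see three white objects"))          # -> response from the "white" script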
3 Experiment Design 3.1 Summary The experiment has a 2 x 2 between-subjects factorial design. The first factor was controlled by the experimenter who manipulated with which partner (human or computer agent) the participants believed themselves to be collaborating. This represents the manipulation for the schema about the opponent. The second factor was controlled by manipulating with which partner (human or computer agent) the participants actually collaborated. This represents the manipulation of the degree of refinement of the conversation as the actual partner. In the following, we use small letters to express characters in the first factor, human and agent, and capital letters to express characters in the second factor, HUMAN and AGENT. For example, in the agent/HUMAN condition, the participants were instructed that the collaborative partner was a computer program; however they conversed with a human partner (details are discussed below). We constructed four conditions: human (instruction)/HUMAN (actual partner), agent/HUMAN, human/AGENT, and agent/AGENT. 3.2 Experiment Situation One hundred and three undergraduates participated in the experiment (male = 57, female = 46, M age = 18.82 years). They were set up to always speak first in the AGENT conditions where the participants conversed with the agent. On the other hand, the participants were set up to speak both first and second in the HUMAN conditions where the participants conversed with real people. Therefore, twice as many participants were assigned to the HUMAN conditions. Table 1 shows the number of participants assigned to each condition. The experiment was performed in small groups consisting of eight to twelve participants. Two types of computers were set up in a laboratory: machines connected to the Internet by wireless LAN and those running independently of other computers. These computers were placed so that no participant could peek at other screens (Figure 4). For manipulating the first factor, the participants were instructed that the collaborative partner was either: (1) someone in the room or (2) a program installed in the computer. For manipulating the second factor, the actual partner was controlled by assigning either: (1) a computer connected to someone in the room through wireless LAN or (2) one in which the conversation agent was running independently from others. Table 1. Experimental design and number of assigned participants
                          Instruction
                          humans    agents
Actual partner   HUMAN    34        34
                 AGENT    18        17
Fig. 4. Experimental situation
3.3 Questionnaires
In our study we utilized a questionnaire developed by Tsuduki and Kimura [4], which the participants answered after the conversations for solving the task were terminated. This questionnaire, which asks about the psychological characteristics of the communication medium, comprised 16 questions scored on a five-point scale. We classified the 16 questions into three measures. The first measure denotes the "interpersonal stress" factor, consisting of five questions about such feelings as tension, severity, and fatigue. The second measure denotes the "affiliation emotion" factor, consisting of eight questions about such feelings as friendliness toward the opponent, ability to discuss personal matters, and happiness. The third measure denotes the "information propagation" factor, consisting of three questions about such feelings as purposefulness and effectiveness in collecting information. For each of the three measures, the rating scores were totaled and divided by the number of subordinate questions. These rating scores (i.e., mean values) were used for analysis.
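The per-measure scoring just described amounts to a simple mean over each measure's items; a small sketch follows, with hypothetical item groupings, since the paper does not list which of the 16 questions belongs to which measure.

# Hypothetical item-to-measure assignment; only the totaling/averaging mirrors the text.
ratings = {f"q{i}": r for i, r in enumerate([3, 4, 2, 5, 3, 4, 4, 3, 2, 5, 4, 3, 4, 2, 3, 5], start=1)}
measures = {
    "interpersonal_stress":    [f"q{i}" for i in range(1, 6)],    # 5 items
    "affiliation_emotion":     [f"q{i}" for i in range(6, 14)],   # 8 items
    "information_propagation": [f"q{i}" for i in range(14, 17)],  # 3 items
}
scores = {name: sum(ratings[q] for q in items) / len(items) for name, items in measures.items()}
print(scores)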
4 Results Figure 5 indicates the results. The vertical axis represents the mean value of the ratings, and the horizontal axis represents each measure. For participants who talked first, a 2 x 2 ANOVA was conducted on each measure with the factor of instruction (human vs. agent) and the factor of actual partner (HUMAN vs. AGENT) as a between-subject factor. For participants who talked second, a t-test was conducted on each measure. These participants were assigned only to two conditions: human/HUMAN and agent/HUMAN; therefore, only the effect of the instruction factor was examined.
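As an illustration of the analysis described above (not the authors' actual SPSS output or data), a 2 x 2 between-subjects ANOVA of one measure could be run in Python with statsmodels as follows; the data frame values are synthetic.

# Illustrative 2 x 2 ANOVA with instruction and actual partner as between-subjects factors.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "score":       [3.1, 2.8, 4.0, 3.5, 2.9, 3.7, 3.2, 2.6],
    "instruction": ["human", "human", "agent", "agent"] * 2,
    "actual":      ["HUMAN", "AGENT"] * 4,
})
model = ols("score ~ C(instruction) * C(actual)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # main effects and the interaction term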
Fig. 5. Mean questionnaire ratings for the three measures (interpersonal stress, affiliation emotion, information propagation) in each condition: (a) participants who talked first (human/HUMAN, agent/HUMAN, human/AGENT, agent/AGENT); (b) participants who talked second (human/HUMAN, agent/HUMAN)
4.1 Interpersonal Stress For participants who talked first, interaction on interpersonal stress was significant (F(1,65)=7.34, p<.01). An analysis of the simple main effect was conducted. Focusing on the actual partner factor, the rating score of the human condition was significantly higher than the agent condition in the HUMAN condition (p<.01). On the other hand, the rating score was not significantly different in the AGENT condition (p=.63). Focusing on the instruction factor, the rating score in the AGENT condition was significantly higher than the HUMAN condition in the human condition (p<.01). On the other hand, the rating score was not significantly different in the agent condition
(p<.01). In addition, there was a main effect of both factors of instruction and actual partner (F(1,65)=4.07, p<.05; F(1,65)=4.96, p<.05). For participants who talked second, the rating score in the agent/HUMAN condition was significantly higher than in the human/HUMAN condition (F(1,32)=4.25, p<.05).
4.2 Affiliation Emotion
For participants who talked first, the interaction of affiliation emotion was not significant (F(1,65)=0.06, p=.8). There was a marginal main effect of the factor of instruction (F(1,65)=3.66, p=.06), and the rating score of the human condition was marginally higher than the agent condition. There was a main effect of the factor of actual partner (F(1,65)=15.12, p<.01), and the rating score of the HUMAN condition was significantly higher than the AGENT condition. For participants who talked second, the rating score of the human/HUMAN condition was significantly higher than the agent/HUMAN condition (F(1,32)=6.70, p<.05).
4.3 Information Propagation
For participants who talked first, the interaction of information propagation was not significant (F(1,65)=0.01, p=.92). There was not a main effect of the factor of instruction (F(1,65)=0.48, p=.49). There was a significant main effect of the factor of actual partner (F(1,65)=10.49, p<.01), and the rating score of the HUMAN condition was significantly higher than the AGENT condition. For participants who talked second, the rating score was not statistically significant (F(1,32)=0.10, p=.75).
4.4 Summary
Table 2 summarizes the statistical results. The asterisk represents a significant difference, the plus sign represents a marginal difference, and the minus sign represents no differences.

Table 2. Summary of results

                          Talking first                  Talking second
                          Instruction   Actual partner   Instruction
Interpersonal stress      *             *                *
Affiliation emotion       +             *                *
Information propagation   -             *                -
5 Discussion and Conclusion We assumed that the interpersonal stress and affiliation emotion scores are related to the 'emotional' features of communication, whereas the information propagation score is related to the 'cognitive' features of communication. The overall results of the experiments suggest the following: (1) the degree of the refinement of the conversation
controlled as the actual partner factor affected the emotional and cognitive characteristics of communication, and (2) the schema about the opponent only affected the emotional characteristics of communication. In Yamamoto, Matsui, Hiraki, Umeda, and Anzai [5], the participants played Shiritori, a popular Japanese word game, with a partner by computer. Even though the actual identity of their partners was a computer agent, the participants, who were informed that they were facing a human player, gave significantly higher pleasure ratings than those who were informed that they were facing a computer player. Pleasure ratings are representative emotional measures. Therefore, the result of the preceding study is consistent with our finding: i.e., the instruction effect, relying on the participants' schema about their partners, appears in the emotional characteristics of communication. However, as a whole, the effect of the actual partner factor was dominant, and it was detected in all measures. This contradicts the findings of a study with a simple computer program called ELIZA, a counselor-like agent program [7]. Even though the responses were very simple, people felt empathy throughout their interaction with ELIZA. This study indicates that the correlation is not simple between the elaboration of a partner's conversation and the quality of interaction. What caused the difference? In the preceding study, the participants conversed with the computer without any specific task goals. However, in our study, a relatively complicated situation was given to the participants, who had to find a rule as a task goal. These characteristics of our task probably caused the difference: i.e., the actual partner factor was dominant in our study, while in the previous study the effect of actual partner was limited.
References
1. Fiske, S.T., Taylor, S.E.: Social cognition. McGraw-Hill Education, New York (1991)
2. Hayashi, Y., Miwa, K., Morita, J.: A laboratory study on distributed problem solving by taking different perspectives. In: Proceedings of the 28th Annual Conference of the Cognitive Science Society, pp. 333–338 (2006)
3. Parise, S., Kiesler, S., Sproull, L., Waters, K.: Cooperating with life-like interface agents. Computers in Human Behavior 70, 123–142 (1999)
4. Tsuduki, T., Kimura, Y.: Characteristics of media communication of college students: comparing face to face, mobile phone, mobile mail, and electronic mail. Applied Sociology Studies 42, 15–24 (2000) (in Japanese)
5. Yamamoto, Y., Matsui, T., Hiraki, K., Umeda, S., Anzai, Y.: Interaction with a computer system: a study of factors for pleasant interaction. Cognitive Studies 1, 107–120 (1994) (in Japanese)
6. Ye, Y., Churchill, E.F.: Agent supported cooperative work. Kluwer Academic Publishers, Dordrecht (2003)
7. Weizenbaum, J.: A computer program for the study of natural language communication between man and machine. Communications of the Association for Computing Machinery 9, 36–45 (1966)
Identification of the User by Analyzing Human Computer Interaction Rüdiger Heimgärtner IUIC – Intercultural User Interface Consulting, Lindenstraße 9, 93152 Undorf, Germany
[email protected]
Abstract. This paper describes a study analyzing the interaction of users with a computer system to show that users can be identified solely by analyzing their interaction behavior. The identification of the user can be done with a precision of up to 99.1% within one working session. This classification rate can be improved using additional interaction indicators. Moreover, this kind of protection method, based on analyzing the user's interaction with the system, cannot be deceived because of the uniqueness of the user's interaction patterns. The method and the results of the study are presented and discussed. Keywords: user interface, user identification, HCI analysis, interaction analysis, interaction indicator, tool, theft protection, computer protection, culture, user interface design, personalization, identification, recognition.
1 Background and Significance
The identity of a user can be checked with access-oriented protection methods on computer systems (e.g., entering passwords). However, as soon as the access credentials of a user are known, another user can attain unauthorized permanent use of the computer. Moreover, the loss of personal access data can lead to the denial of system functionality even for actually authorized users. In addition, most access-oriented user identification methods (such as matching a password, fingerprint, iris, or face) do not ensure access security that cannot be cracked. Until now, recognizing differences in user interaction behavior in HCI has mainly been used to gather knowledge for designing or adapting the HCI appropriately to user needs. For example, there are cultural differences between Chinese and German users of a computer system, which lead to different user interaction profiles (cf. [1]). However, analyzing the interaction behavior can also be used to identify the user currently working with the computer. For example, from user interaction events such as keyboard strokes, individual user profiles can be generated by the computer system (cf. [2]). Furthermore, usage-oriented protection and user identification methods (such as the one presented in this paper) can check the user identity permanently at runtime, i.e., while the user is working with the computer. In this paper, the focus lies on analyzing mainly mouse events. The method, study, and results are presented and discussed.
2 Method and Related Work
User profiles can be created and recognized by permanently logging and evaluating human-computer interaction. The interaction samples can be analyzed while the system is in use (e.g., by analyzing interaction frequency and style, cf. [1]). The most relevant interaction means available in computer systems today are the mouse and the keyboard (cf. [3]). Bartmann and Wimmer investigated user interaction with the keyboard very extensively (cf. [2]); they stated that every user has his or her own style of entering characters via the keyboard. Röse and Heimgärtner researched user-system interaction in the cultural context (cf. [4] and [1]). Heimgärtner recorded and analyzed the values of some of the most significant "interaction indicators" for HCI design: the number of mouse moves and mouse clicks, the number of mouse wheel uses, and the number of keystrokes (cf. [1]). The objective of the study presented in this paper is to find out whether the user can be identified simply by analyzing the events the user produces while interacting with a computer system.
2.1 Test Setting
To record the user interaction behavior while interacting with the computer, a test program was developed in C#.1 Several working sessions of two professional software developers were recorded in September 2008 at IUIC.2 The test environment was normalized as far as possible by using similar working conditions, such as the same operating system and integrated development environment and similar experience in programming and computer usage (cf. Table 1). Hence, most of the characteristic aspects of the test setting, except nationality and age, are very similar. However, user 1 had already studied in China, and user 2 works and lives in Germany. Furthermore, their education and experience in programming are similar. Hence, in this case, nationality and age do not matter very strongly, because the interaction with the system also depends heavily on context and situation (cf. [1]). This is also supported by the fact that, according to Inglehart [6], the secondary and tertiary cultural imprint can cover or even change the primary cultural imprint.3 Furthermore, this effect is small in contrast to the high robustness of the mean values of the interaction parameters obtained in this study, which indicates that the difference between the mean values of users with the same nationality is still big enough to identify the individual user by his or her interaction behavior.4
1 Using the IDE "Visual Studio 2003", cf. [5].
2 Intercultural User Interface Consulting (IUIC) is a service provider advising and supporting the designers of intercultural user interfaces. The enterprise came into being due to the steady demand for guidelines for the design of international products in the course of globalization. IUIC offers corresponding knowledge to user interface designers, software developers, and project managers through trainings, workshops, consulting, and research. This provides knowledge for becoming a market leader, with shorter development times and higher product conformity with foreign markets.
3 However, the influence of a long stay abroad on the primary cultural imprint is controversially discussed in cultural studies (cf. [7]).
4 The Welch test of the robustness of the equality of means is significant for all measurement variables used in this study, which indicates reliable results.
Table 1. Comparing the test setting

                            User 1                             User 2
Nationality                 German                             Chinese
Education                   University                         University
Function                    Software Developer                 Software Developer
Age                         41                                 23
Gender                      Male                               Male
Working place               Aside of those of User 2           Aside of those of User 1
Computer                    Same as those of User 2            Same as those of User 1
Experience in Programming   Same as those of User 2            Same as those of User 1
Experience with PC          Same as those of User 2            Same as those of User 1
Tasks                       Programming, Project management    Programming, Documentation
Duration of session         8 hours                            8 hours
2.2 Test Procedure and Data Collection
The test tool supported the test procedure. To begin a session, the user has to register with the system using a login procedure, specifying user name and password. In the learning stage, this lets the system know which user is currently interacting with it. During the user's interaction with the system, the events are recorded and sequentially stored in a history file, including a date and time stamp. During the sessions, the following system events of the two test participants were recorded simultaneously every second using the test program:
• Date and time stamp
• Mouse movements
• Mouse clicks
• Wheel movements
• Keyboard entries
• Process number
• Application number
• Current process
• Active window
• Application list
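The original test tool was written in C#; purely as an illustration of the once-per-second record format described above, a Python outline might look like the following, where the event counters are assumed to be incremented elsewhere by OS-level hooks (not shown here).

# Sketch of a once-per-second logging loop; counter updates and window/process
# queries are stand-ins for the real OS hooks used by the C# tool.
import csv
import datetime
import time

counters = {"mousemoves": 0, "mouseclicks": 0, "mousewheels": 0, "keystrokes": 0}

def write_record(writer, process="", active_window=""):
    row = [datetime.datetime.now().isoformat()] + list(counters.values()) + [process, active_window]
    writer.writerow(row)
    for key in counters:          # reset the per-second counts after logging
        counters[key] = 0

with open("history.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for _ in range(10):           # in the study this ran for whole 8-hour sessions
        write_record(writer)
        time.sleep(1)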
Table 2 shows the descriptive statistics of the variables from the data collection of one session recorded in September 2008.
2.3 Data Analysis
The data analysis was carried out manually using the statistics tools SPSS 16.0 and Statistica 8.5, applying the statistical methods "jack-knife" (discriminant analysis)
Table 2. Descriptive Statistics of the Data Collection

                         User 1                                 User 2
                         Mean      Std. Error  Std. Deviation   Mean      Std. Error  Std. Deviation
Mousemoves_total         159089    653         98983            177674    693         108797
Mousemoves_second        15,61     0,15        22,48            17,64     0,20        31,12
Mousemoves_per_second    13,49     0,04        5,88             17,05     0,03        4,65
Mousewheels_total        2267,62   9,65        1462,48          1063,22   4,45        699,05
Mousewheels_second       0,22      0,00        0,53             0,11      0,00        0,36
Mousewheels_per_second   0,21      0,00        0,05             5,50      0,03        4,32
Mouseclicks_total        2290,67   9,88        1497,23          3406,90   14,49       2274,22
Mouseclicks_second       0,22      0,01        1,33             0,26      0,01        1,91
Mouseclicks_per_second   0,21      0,00        0,07             0,46      0,00        0,68
Keystrokes_total         1596,11   9,87        1495,71          1298,67   4,30        675,18
Keystrokes_second        0,20      0,01        1,14             0,09      0,00        0,66
Keystrokes_per_second    0,11      0,00        0,07             0,14      0,00        0,05
and ANOVA (cf. [8]). For the interaction analysis, 12 computed parameters out of the collected data set regarding mouse and keyboard events have been used and stored chronologically in a data log file:
• Number of mouse movements per second on average since start of data logging
• Number of mouse movements in the last second
• Number of mouse movements in total since start of data logging
• Number of mouse clicks per second on average since start of data logging
• Number of mouse clicks in the last second
• Number of mouse clicks in total since test program installation
• Number of wheel movements per second on average since start of data logging
• Number of wheel movements in the last second
• Number of wheel movements in total since start of data logging
• Number of keyboard entries per second on average since start of data logging
• Number of keyboard entries in the last second
• Number of keyboard entries in total since start of data logging
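For readers who want to reproduce this kind of analysis outside SPSS/Statistica, the following sketch (our illustration, run on synthetic data rather than the recorded sessions) performs a linear discriminant analysis with leave-one-out ("jack-knife") cross-validation over twelve such indicators using scikit-learn.

# Illustrative jack-knife (leave-one-out) discriminant classification of two users
# from 12 interaction indicators; the data here are synthetic.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 12)),    # indicator vectors of user 1
               rng.normal(0.8, 1.2, size=(200, 12))])   # indicator vectors of user 2
y = np.array([1] * 200 + [2] * 200)

lda = LinearDiscriminantAnalysis()
accuracy = cross_val_score(lda, X, y, cv=LeaveOneOut()).mean()
print(f"cross-validated identification rate: {accuracy:.3f}")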
3 Results
3.1 Classification Power and Classification Parameters
Table 3 shows the classification of the cases of the collected data using discriminant analysis. The users could be identified 99.1% correctly just by analyzing the data of one working session.
Table 3. Correct User Identification of 99.1%

Classification Results b, c
                                      Predicted Group Membership (User)
                           User       1          2          Total
Original          Count    1          19573      366        19939
                           2          17         22931      22948
                  %        1          98,2       1,8        100,0
                           2          ,1         99,9       100,0
Cross-validated a Count    1          19571      368        19939
                           2          17         22931      22948
                  %        1          98,2       1,8        100,0
                           2          ,1         99,9       100,0
a) Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b) 99.1% of original grouped cases correctly classified.
c) 99.1% of cross-validated grouped cases correctly classified.
Source: Table "User Lock 4.sav"
The interaction indicators for this high classification rate are presented in Table 4.

Table 4. Combination of interaction indicators exhibiting a classification rate of 99.1%

Classification Function Coefficients (Source: Table "User Lock 4.sav")
                          User 1     User 2
Mousemoves_total          ,000       3,047E-5
Mousemoves_second         -,003      ,012
Mousemoves_per_second     ,761       ,918
Mousewheels_total         -,028      ,023
Mousewheels_second        -,055      ,871
Mousewheels_per_second    1,677      ,531
Mouseclicks_total         -,004      ,000
Mouseclicks_second        ,048       ,308
Mouseclicks_per_second    8,695      -,352
Keystrokes_total          -,005      -,025
Keystrokes_per_second     238,001    52,729
The classification matrix in Table 5 (rows represent the observed classifications and columns the predicted classifications) shows the results of evaluating the data of another session. The performance measurements yielded only insignificant deviations from the classification rates computed before.

Table 5. User identification (97.4% successful)

(Source: Table "UserLock5.sav")
            Classification function
            % correct    User 1 - p=,56517    User 2 - p=,43483
User 1      96,73450     28853                974
User 2      98,47045     351                  22597
Total       97,48934     29204                23571
On the one hand, this small difference in the classification rates can be caused by the shorter session length (5 hours). Hence, the environmental conditions have
Table 6. Combination of interaction indicators providing a classification rate of 97.4%

Classification function (Source: Table "UserLock5.sav")
                          Groups
                          User 1 - p=,56517    User 2 - p=,43483
Mousemoves_total          0,0002               0,00003
Mousemoves_second         0,0076               0,00207
Mousemoves_per_second     0,8519               0,59341
Mousewheels_total         -0,0066              0,00300
Mousewheels_second        0,1597               0,71770
Mousewheels_per_second    1,0160               0,41765
Mouseclicks_total         -0,0022              -0,00145
Mouseclicks_second        0,1419               0,06763
Mouseclicks_per_second    0,3847               0,24561
Keystrokes_total          -0,0019              -0,00102
Keystrokes_second         0,1089               0,08073
Keystrokes_per_second     11,5202              3,85298
been slightly different. Therefore, other values of the classification variables in the discriminant analysis resulted in another combination of interaction parameters being included in the classification model (cf. Table 6). Since not all interaction parameters classify similarly well, mutual "disturbances" can arise in the calculation of the classification rate. Although the classification rate of approx. 97% is very good, a further increase can be expected using additional well-classifying interaction parameters. More relevant parameters must still be researched to obtain proper information about user behavior and information that is more suitable for user identification. This can be done by combining the data with further interaction data, such as interaction breaks, and with sensor data, which can also be used for user identification. This is necessary for early identification, i.e., so that the user can be recognized, at best, immediately after beginning the session.
3.2 Learning Curve and User Profiles
The learning curve until the user can be identified (ordinate) over time in hours (abscissa) is shown in Figure 1. The simplest interaction parameters were selected, i.e., interaction counts per second: mouse movements per second, mouse clicks per second, mouse wheel movements per second, and keystrokes per second. The analysis of the measured data shows a recognition precision of approx. 98% after a learning stage of about 2.5 hours. The diagram in Figure 1 shows that the learning stage could already have been ended after a considerably shorter time. The curves for user 1 lie higher than the curves for user 2 (except for mousemoves_per_second) already shortly after the beginning of the session, and they show approximately the same trend in the distances between the mean values as over the complete session. At approx. 0.25 to 0.5 hours, all mean values already intersect the users' curves at the right distance from each other. Therefore, it can be assumed that the learning stage can be shortened correspondingly. The user profile is the more correct the more it agrees with the real interaction behavior of the user. Hence, results such as the shortening of the learning stage, the
Fig. 1. Learning curve of the user profile for some interaction parameters as well as the duration for the user identification (indicated by the similar course of the learning curves of both users after about 2.5 hours)
rise in precision and in the distance between the mean values, as well as the correctness of the user profiles, are expected to improve further as the number of measured and analyzed relevant interaction parameters increases.
4 Discussion
4.1 Problems and Restrictions
On the one hand, the high classification rate can be explained by the fact that there are cultural differences in HCI, as shown in [1], because one of the participants was a Chinese and the other a German user. Employing persons who are even more "homogeneous", as well as using bigger samples, should minimize this effect in future studies. On the other hand, the kinds of applications used by the test persons can vary very much. This may explain the differences not only in the number and kind of used applications but also in the values of the interaction indicators representing the usage of different applications. The varying classification rates can also be caused
by methodological issues: since, for example, the incorrectly classified cases were located immediately behind each other in the database, faulty data resulted. However, these can easily be corrected by factoring them out of the "correct" data by transforming the database. The classification rate then climbs to almost 100%, and it becomes available to the system within a short time, i.e., after approx. 15 minutes to 2.5 hours of observation. The discriminative sharpness of the individual parameters for user identification, and the connections between them, still have to be investigated in detail. Moreover, there are at least two problems arising from the interaction analysis approach. First, the means of the interaction indicator values of all analyzed users can, in the worst case, approach the same value if observed over a longer time or over many sessions. Hence, the user profiles become similar and their discriminative strength regarding different users decreases over time. To avoid this problem, only time slices or single sessions should be compared. Second, it is very difficult to obtain mean values for the different users, especially at the beginning of the analysis phase, because the system does not know whether the observed values can be used as reference values for comparison with other users. Hence, a learning stage is necessary. As long as a user cannot yet be clearly identified by his or her interaction with the computer, the protection system cannot work effectively. Nevertheless, the means of the interaction indicators regarding the interaction behavior of the participants stabilized within about two hours. Hence, user identification can in any case be done by the system using the data of only one session.
4.2 Benefits and Expansions
The algorithm derived from the user identification method of analyzing the user's interaction behavior with the system cannot be deceived or bypassed, because it is not possible to perfectly imitate the complex and dynamic interaction behavior of a certain user over time. This "dynamic" user identification method, applied while the computer is in use, is therefore fundamentally safer than "static" user identification methods. Furthermore, first statistical examinations show that the duration of the learning stage decreases as the number of interaction criteria increases. In particular, combining the analysis of interaction data from the user interface with sensor data available from sensors inside the system (e.g., temperature, acceleration, pressure, etc.) results in reliable identification criteria within very short learning times, from some hours down to some seconds of object use (depending on the number, frequency, and combination of available object data). The measurement equipment can be expanded with sensors of any type (e.g., movement, pressure, temperature, light, etc.) to measure, for example, the distance to the computer, keystroke strength, seat compression distribution, or the pressure on all objects at the HMI such as hand and arm bearing surfaces, state and seat weight, or typing speed. Thereby, the user's behavior during the interaction with the system can be analyzed in various combinations and almost arbitrarily exactly.
5 Conclusion and Outlook User identification that is not deceivable is possible by analyzing the user interaction behavior with the computer system using relevant interaction indicators. The classification rates
of 97-99% from data captured within 2.5 hours indicate a high quality of user identification. The analyses support the assumption that these classification rates can be improved with additional variables. The biggest differences were found using the following interaction indicators: number of mouse moves, mouse wheel usage, and number of keystrokes, followed by the number of mouse clicks. The cross-validated classification rate for recognizing the correct user using these four interaction indicators is 99.1%. The differences in interaction behavior between the test persons are highly significant (p < 0.001). Furthermore, user profiles representing the user needs can be generated from these data, which helps to design suitable HCI. However, much research still has to be done to find new interaction indicators and to improve their user identification rate. The next step is to conduct a study with many more "equal" users to check the reliability of the identification rate and to work out the identification algorithm in detail.
Acknowledgments. Many thanks go to all persons who contributed to the study presented in this paper.
References 1. Heimgärtner, R.: Cultural Differences in Human Computer Interaction: Results from Two Online Surveys. In: Oßwald, A. (ed.) Open innovation, pp. 145–158. UVK, Konstanz (2007) 2. Bartmann, D., Wimmer, M.: Kein Problem mehr mit vergessenen Passwörtern. Datenschutz und Datensicherheit - DuD 31(3), 199–202 (2007) 3. Ratzka, A.: Steps in Identifying Interaction Design Patterns for Multimodal Systems. In: Engineering Interactive Systems, pp. 58–71 (2008) 4. Röse, K.: Methodik zur Gestaltung interkultureller Mensch-Maschine-Systeme in der Produktionstechnik, vol. 5, p. 244. Kaiserslautern: Univ. (2002) 5. Smith-Ferrier, G.: .NET internationalization: The developer’s guide to building global Windows and Web applications, p. 639. Addison-Wesley, Upper Saddle River (2007) 6. Inglehart, R.: Culture shift in advanced industrial society, p. 484. Princeton Univ. Press, Princeton (1990) 7. Thomas, A. (ed.): Psychologie interkulturellen Handelns. 2., unveränd. Aufl. ed., p. 474. Göttingen, Hogrefe (2003) 8. Bortz, J., Döring, N., Bortz-Döring: Forschungsmethoden und Evaluation: Für Human- und Sozialwissenschaftler. 4., überarb. Aufl. ed., p. 897. Springer, Berlin (2006)
The Anticipation of Human Behavior Using "Parasitic Humanoid" Hiroyuki Iizuka, Hideyuki Ando, and Taro Maeda Department of Bioinformatics Engineering, Graduate School of Information Science and Technology, Osaka University, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan {iizuka, hide, t_maeda}@ist.osaka-u.ac.jp
Abstract. This paper proposes the concept of the Parasitic Humanoid (PH), a wearable robot that establishes intuitive interactions with its wearer rather than relying on conventional, counter-intuitive input such as key-typing. It requires a different paradigm of interface technology, called a behavioral or ambient interface, that can harmonize human-environment interactions and naturally lead them to a more suitable state by integrating information science and biologically inspired technology. We re-examine the use of wearable computers and devices from the viewpoint of behavioral information. Then, a possible way to realize the PH is shown as a set of integrated wearable interface devices. In order for the PH to establish harmonic interaction with wearers, a mutually anticipated interaction between computer and human is necessary. To establish this harmonic interaction, we investigate social interaction through experiments on human interactions in which the inputs and outputs of the subjects are restricted to a low dimension at the behavioral level. The results of the experiments are discussed in terms of attractor superimposition. Finally, we discuss an integrated PH system for human support. Keywords: Ambient interface, parasitic humanoid, behavior-based Turing test, attractor superimposition.
1 Introduction
Most wearable computers today derive their usage from desktop-computing concepts such as data browsing, key-typing, device control, and operating graphical user interfaces. If wearable computers are expected to be worn continuously like clothing, we must re-examine their use from the viewpoint of behavioral information. In this paper, we consider the role of wearable computers as a behavioral interface. The Parasitic Humanoid (PH) is a wearable robot for modeling nonverbal human behavior. This anthropomorphic robot senses the behavior of the wearer and has internal models for learning the process of human sensory-motor integration; thereafter it begins to predict the wearer's next behavior using the learned models. When the reliability of the prediction is sufficient, the PH outputs the errors from the real behavior as a request for motion to the wearer. Through this symbiotic interaction, the internal model and the process of human sensory-motor integration approximate each other asymptotically.
In this paper, we consider wearable computing applications that exploit embodiment as the primary medium of the interface. The term Parasitic Humanoid (PH) refers to wearable robotics adapted to such applications. For the PH to establish harmonic interaction with wearers, we investigate social interaction through experiments on human interactions. The experimental results are discussed in terms of attractor superimposition. Finally, we discuss an integrated PH system for supporting humans.
2 Wearable Robotics as Behavioral Interface
Wearable computing and wearable robotics have separate histories. Most recent research on wearable robotics is motivated by interest in powered assist devices [8]. These devices are typically too heavy and consume too much energy for mobile use. On the other hand, research in mobile wearable robotics [12] does not take advantage of the embodiment of wearable devices.
2.1 Usage of Anthropomorphic Robots
Consider the usage of anthropomorphic robots as an interface for human behavior. This is perhaps the only pragmatic usage, because the anthropomorphic shape is usually at a disadvantage compared with designs optimized for other purposes. One successful example is the Telexistence system [19]. However, such a robot is too complicated and expensive to serve as an interface for human behavior, and therefore too socially unacceptable to be a pragmatic solution except for specific purposes (although commercial systems are quickly driving down the cost of such devices). A more serious concern is user safety under the common circumstances of modern life. There may be failure situations in which the robot's control system keeps the robot moving when it should stop to avert a collision. One solution is a lightweight, low-power design such that a person nearby can easily prevent undesirable motion. However, this strategy makes it difficult for the robot to support its own weight.
2.2 Wearable Robotics without Powered Assist
Wearable technologies provide a solution to this problem. Wearable sensory devices can constitute a wearable humanoid without muscles and skeletons, if they are of the proper type and in sufficient number (Fig. 1(a)). Such a robot may be too weak to move by itself and cannot assist the wearer with mechanical power. However, it is safe and light for the wearer, and can assist him or her with rich behavioral information when the worn robot continuously captures, models, and predicts the wearer's behavior.
2.3 Parasitic Humanoid
We refer to such a wearable robot as a Parasitic Humanoid (PH). The PH is a wearable robot for modeling nonverbal human behavior. This anthropomorphic wearable robot
Fig. 1. (a) Wearable sensory devices construct a wearable humanoid without muscle or skeleton. (b) Symbiotic relationship between the wearer and the Parasitic Humanoid.
senses the behavior of the wearer and has internal models for learning the process of human sensory-motor integration; thereafter it begins to predict the wearer's next behavior using the learned models (e.g., [20]). When the reliability of the prediction is sufficient, the PH outputs the errors from the real behavior as a request for motion to the wearer. Through this symbiotic interaction, the internal model and the process of human sensory-motor integration approximate each other asymptotically (Fig. 1(b)). As a result, the PH acts as a symbiotic subject for information about the environment. The relationship is similar to that of a horse and its rider: the symbiotic relationship between these partners functions as a single high-performance organism.
3 Basic Elements Constructing the Parasitic Humanoid
A typical use of the PH as a behavioral interface is capturing daily or best behaviors and retrieving the most suitable actions. This is not only useful for scientific analysis of human behavior, but can also enhance daily life. For instance, when you play golf, you may want to capture and retrieve your best shot, or you may download swing data from the PH of Tiger Woods. Behavioral interfaces will enable a new style of communicating behaviors and skills. To realize this, the PH prototype consists of 3-axis postural sensors, fingernail sensors [9,14], eye-movement sensors, shoe-shaped sensors, audio and visual sensors, vibration motors [1], a head-mounted display (HMD), and galvanic vestibular stimulation (GVS). All sensors are used for capturing the wearer's behavior, while the motors, HMD, and GVS are used for giving feedback to the wearer. The prototype of the PH is shown in Fig. 2. A key feature of the prototype is the use of haptic illusions to give pseudo-force feedback to wearers in an intuitive way. One is a pseudo-attraction-force display that exploits the difference in human perceptual sensitivity between rapid and slow accelerations: although the device is not grounded anywhere but simply mounted on the body, the wearer feels an attraction force during the high-acceleration phase. Pseudo-haptics is another haptic display that induces pseudo-force by manipulating the visual image of the wearer's own body without any physical contact. GVS is used to control the wearer's posture by applying low-level current signals that induce sensations of tilting. By giving wearers sensation and motion simultaneously in an intuitive manner, the PH attempts to give rise to a "feeling" that can be shared or communicated between humans.
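The pseudo-attraction-force display works by pairing a brief, strong acceleration in the desired direction with a longer, gentler return that stays below the perceptual threshold, so that only the pull is felt. As a rough illustration of this idea only (not the authors' implementation; the sampling rate, phase durations, and amplitude below are arbitrary assumptions), such an asymmetric drive waveform could be generated as follows.

```python
# Illustrative sketch (not the authors' implementation): an asymmetric
# acceleration profile for a pseudo-attraction-force display. A short,
# high-acceleration phase is followed by a longer, gentle return phase,
# so the net velocity change per cycle is zero but only the fast phase
# is clearly perceived, yielding an illusory directional pull.
import numpy as np

def pseudo_force_cycle(fs=1000, fast_ms=10, slow_ms=40, peak=1.0):
    """One drive cycle sampled at fs Hz; 'peak' is the fast-phase amplitude (assumed units)."""
    n_fast = int(fs * fast_ms / 1000)
    n_slow = int(fs * slow_ms / 1000)
    fast = np.full(n_fast, peak)                     # rapid acceleration (perceived)
    slow = np.full(n_slow, -peak * n_fast / n_slow)  # slow return (below threshold)
    return np.concatenate([fast, slow])              # integrates to ~zero net change

cycle = pseudo_force_cycle()
signal = np.tile(cycle, 20)  # repeated cycles give a sustained pulling sensation
print(f"samples per cycle: {cycle.size}, mean acceleration: {signal.mean():.2e}")
```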
Fig. 2. (a) Prototype of the Parasitic Humanoid. (b) Galvanic vestibular stimulation device equipped with an HMD. (c) Pseudo-attraction-force and tactile display that produces rapid and slow acceleration for the pseudo-attraction force and high-frequency vibrations for an illusion of tactile sensation. (d) Manipulation of the visual image for pseudo-haptics. (e) Inducing sensations of tilting by GVS.
4 Behavior-Based Turing Test
Our PH prototype mainly uses illusions to give feedback to users, which means that the PH exploits the low-level nature of our sensorimotor mechanisms. At a higher level of processing, it would be important and useful for the PH to read the wearer's intention and motivation in order to support his or her behavior. However, the question of how the system can harmonically support or interact with humans, or how devices can behave like a human so as to establish natural communication, remains open. Support from the PH should be naturally superimposed on the user's intentions and behaviors. Some studies have investigated human interaction with a small humanoid robot [17,18]. However, such interaction is too complicated and difficult to analyze because it involves the full range of modalities. We therefore take a minimalist approach in our experiments [3]. The Turing test is a test of a computer's ability to demonstrate human intelligence in natural conversation. As a first step toward understanding how harmonic interaction can be established, we study a Turing test at the behavioral level. To keep the experiments simple, minimal action and sensation are set up in the same way as in Lenay's experiment [2]. Our study explores the ongoing dynamical aspects of mutuality in social interaction. In our experiments, the behavior-based Turing test is defined as the ability to distinguish a partner's moving avatar from a dummy object that merely replays motions recorded while two real players were interacting. The motions of the dummy object and the avatar are therefore identical under passive observation; only active mutual coupling makes it possible to pass the behavior-based Turing test. The original idea comes from Trevarthen's double-monitor experiment, which shows that a baby can detect that its mother is not reacting to its motions in an online manner when the mother's recorded motions are displayed [13,16].
4.1 Human Experiments
A schematic view of the experimental setup is shown in Fig. 3. Subjects move a finger left or right along a rail on the desk, which restricts the motion precisely to one dimension and allows a CCD laser displacement sensor to track the finger position without loss. A voice coil attached to the
Fig. 3. A schematic view of the experimental environment for the behavioral Turing test. The CCD laser displacement sensor measures the finger position, and the voice coil gives the subject tactile sensations.
subject’s nail gives input signals by vibrations when the avatar touches objects in the virtual space, i.e., subjects receive only all-or-none information. This experimental environment is close to the original perceptual crossing experiments but different in the following important points. The first one is a reliability of the space. The positions in the physical space correspond to the positions in the virtual space. On the other hand the correspondence is easily broken in the case of using a computer mouse as the original experiment. The second point is that the motion and sensation are not separated because both happen on a single finger different from the original one where motion is controlled by the right hand and sensation is given to the left hand. The subjects are required to answer if the partner has been controlled by human or not after 30 seconds interactions with an unknown object. They know that a single object exists and it does not disappear during the experiment. In the experiment, the subjects interact with two kinds of objects. One is the avatar that is controlled by another subject. Another one is an object that replays the partner’s previous motions as in the double monitor experiments. A trial consists of 20 interactions, 10 of them are interactions with the human avatar and another 10 are with the recorded motions. After each trial, the subject can know how much he/she can correctly answer human or not-human. The recoded motions are made from previous trial results where the subject has answered that the partner is a human. Emergence of Turn-taking. At the beginning of the experiment, two subjects are not used to this low-level environment. However, the rates of correct answers increase and they can significantly detect the human player from the recorded motion (Fig. 4).
Fig. 4. Rate of correct answers during each trial, which consists of 10 interactions with a human and 10 with recorded motion
Fig. 5. Behavioral patterns of a pair of human subjects at the (a) 13th, (b) 21st, (c) 41st, and (d) 81st interaction. After each of these interactions both subjects answered that the partner was controlled by a human. Input signals are also shown in the graphs.
Figure 5 shows the behavioral patterns and input signals during interactions between two human players. At the beginning of the experiment, their motions are not organized, and it is difficult to tell how they recognize that the partner is human (Fig. 5(a) and (b)). Indeed, the rates of correct answers are still around chance level, and the subjects report that they are not sure whether their answers are correct. After 40 interactions, the behavioral patterns change and the players appear to probe their partner's reactions; the behaviors are clearly different from the irregular motions of the initial stage. At the 81st interaction, both players' motions are organized and turn-taking is visible: roughly, the players hold two fixed roles, moving and staying, which are alternately exchanged. Player 1 touches player 2 by oscillating while player 2 stays at a certain place and observes player 1's behavior; after a while the roles are exchanged, player 1 stops oscillating and stays in place, and player 2 starts oscillating. The players establish this role switching at the beginning of the interaction, and it continues throughout the experiment. To quantify the emergence of turn-taking, we calculate a turn-taking performance as follows. First, the behavioral pattern of each subject is classified into moving and staying by a simple threshold. The joint dynamics of the two subjects is then classified into three integrated roles: both-moving, both-stopping, and either-moving. Next, we estimate a normal distribution of how long each integrated role lasts in successful human–human interactions. Since preliminary experiments showed that the either-moving role is important for turn-taking, the performance increases linearly while the either-moving role persists. When the role changes, e.g., from both-stopping to either-moving, the performance is multiplied by a factor obtained from the normal distribution: if the role changes at a timing often observed in successful interactions, the performance does not change, but if the timing is unfamiliar, the performance decreases. Figure 6 shows the evolution of the average turn-taking performance at the end of the human–human interactions of each trial. It is clear that the human subjects establish turn-taking behaviors over trials. It is interesting that turn-taking spontaneously emerges in our behavior-based Turing test without any instruction about turn-taking.
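A compact sketch of this scoring procedure, as we read it from the description above, is given below; the movement threshold, the fitted role-duration parameters, and the exact form of the transition penalty are assumptions for illustration only.

```python
# Illustrative scoring of turn-taking performance (assumed parameters).
import numpy as np
from scipy.stats import norm

MOVE_THRESHOLD = 2.0          # |velocity| above which a subject counts as "moving"
ROLE_DURATION_MODEL = {       # mean/std of role durations (s) fitted on successful trials
    "either-moving": (3.0, 1.0),
    "both-moving": (0.5, 0.3),
    "both-stopping": (0.8, 0.4),
}

def role(v1, v2):
    m1, m2 = abs(v1) > MOVE_THRESHOLD, abs(v2) > MOVE_THRESHOLD
    if m1 and m2:
        return "both-moving"
    if not m1 and not m2:
        return "both-stopping"
    return "either-moving"

def turn_taking_performance(vel1, vel2, dt=0.01):
    perf, current, duration = 0.0, None, 0.0
    for v1, v2 in zip(vel1, vel2):
        r = role(v1, v2)
        if r == current:
            duration += dt
        else:
            if current is not None:
                mu, sd = ROLE_DURATION_MODEL[current]
                # factor is ~1 for typical role durations and < 1 for unfamiliar ones
                perf *= norm.pdf(duration, mu, sd) / norm.pdf(mu, mu, sd)
            current, duration = r, dt
        if r == "either-moving":
            perf += dt            # performance grows linearly while roles alternate
    return perf
```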
Fig. 6. Average performances of turn-taking over human-human interactions in each trial
The importance of turn-taking has been argued in social-interaction experiments and simulation models [4,7,15]. This emergence of turn-taking behavior was also observed in another pair of subjects as their rate of success in the task increased.
Coordination behaviors modeled by attractor superimposition. The turn-taking behavior observed in the behavior-based Turing test is not simple cooperation but is caused by a mutual dependency between the subjects. We try to understand this coordination through a dynamical-systems model called attractor superimposition. The concept of attractor superimposition was proposed by the Global COE program for founding an ambient information society infrastructure at Osaka University, Japan [21], as a novel cooperative and self-adaptive algorithm for the self-organization of multiple dynamics; it is the next step after the "attractor selection" dynamics modeled on individual bacterial adaptation [5,10]. Both attractor selection and attractor superimposition are based on the idea that a system adapts by means of a parameter termed activity, which controls the stability of attractors against fluctuations. For example, if a biological system is not well adapted to its environment, the activity becomes small and the attractor becomes less stable, so that the relative effect of noise increases. The dynamics then escapes from the attractor and searches for another attractor that can adapt to the environment. In attractor superimposition, different (biological) networks share the activity and create a symbiotic system that is adaptive as a whole, by essentially the same mechanism as attractor selection [6]. The equations are expressed as follows:

  dx_A/dt = f_A(x_A, x_B) · activity + η_A
  dx_B/dt = f_B(x_A, x_B) · activity + η_B        (1)
where each x denotes the dynamics of a different (biological) network determined by f, and η denotes noise. Depending on the activity, the dynamics becomes relatively deterministic or indeterministic. The concept is originally derived from biological phenomena, but it is also likely to be applicable to information networks [11]. In the case of our human experiments, the turn-taking performance is nothing other than the activity shared between the subjects, and each subject's behavioral dynamics corresponds to x. The subjects try to establish a mutual interdependence that increases the turn-taking performance and thereby lets them pass the behavioral Turing test. To confirm whether attractor superimposition can explain our results, two subjects who were already well practiced and performed the task well were re-tested under a noisy interaction environment.
Fig. 7. Changes of activity during a human–human interaction in which the noise strength is controlled by the experimenter. The dashed lines indicate the ongoing answers of each subject: the upper state means that the subject answers "non-human" and the lower state means that the subject feels that the partner is human. The noise is introduced into the on–off sensations.
By externally controlling the strength of the noise, we investigate the stability of the ongoing interaction that induces the feeling of interacting with a human. To do so, the subjects are required to continuously indicate whether they feel that the partner is human. Figure 7 shows the dynamics of the activity, the noise strength, and the subjects' answers. Around 100 seconds, where the noise strength is 0.08, the subjects do not consider the partner to be human. After that, they interact with each other under 0.04-noise for a relatively long time. Then the noise strength returns to 0.08; this time the interaction is not broken, and they consider the partner to be human. Furthermore, even when the noise strength increases, the subjects keep the feeling that the partner is human. Around 480 seconds the same phenomenon can be seen. Although the coordination was exposed to the same noise strength, its stability differed, and this can be interpreted with attractor superimposition: the first 0.08-noise breaks the interaction because the activity is not high enough for the coordinated behaviors to overcome the noise, whereas when the second and third 0.08-noise are applied, turn-taking has been well established as a whole dynamics, the shared activity is high, and the coordination is stable enough against the noise.
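As a toy illustration of how a high shared activity stabilises the coupled dynamics of Eq. (1) against the same noise amplitude, the following sketch (our own; the choice of f, the constants, and the fixed activity values are arbitrary assumptions) integrates two mutually attracting variables and reports how tightly they stay coordinated under 0.08-noise for low and high activity.

```python
# Toy integration of the attractor-superimposition equations (Eq. 1).
# f_A and f_B pull x_A and x_B toward each other (a simple stand-in for
# coordinated behaviour); 'activity' is shared and is set by hand here
# to mimic poorly vs. well established turn-taking. Constants are arbitrary.
import numpy as np

def simulate(activity, noise_sigma, steps=5000, dt=0.01, seed=0):
    rng = np.random.default_rng(seed)
    xa, xb = 1.0, -1.0
    gap = []
    for _ in range(steps):
        fa = -(xa - xb)                  # f_A: attraction toward the partner's state
        fb = -(xb - xa)                  # f_B: attraction toward the partner's state
        xa += (fa * activity) * dt + noise_sigma * np.sqrt(dt) * rng.normal()
        xb += (fb * activity) * dt + noise_sigma * np.sqrt(dt) * rng.normal()
        gap.append(abs(xa - xb))
    return np.mean(gap[-1000:])          # residual mismatch once settled

for activity in (0.2, 2.0):
    print(f"activity={activity}: mean |x_A - x_B| under 0.08-noise =",
          round(simulate(activity, noise_sigma=0.08), 3))
```

With low activity the same noise keeps the two trajectories far apart (coordination breaks), while with high activity they remain locked together, which mirrors the interpretation of Fig. 7.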
5 Discussion and Conclusion
We have presented a survey of several interaction techniques and interfaces that are useful for the design of the Parasitic Humanoid. Desktop computers come equipped with large monitors providing high-bandwidth output and essentially obviate the need for nonverbal output; similarly, the keyboard leaves no room for nonverbal input. Wearable computing changes the picture dramatically, since traditional interface components are not incorporated as easily. We suggest that this is an ideal area in which to adopt behavioral information as a primary interface medium. Most of the work reviewed in this paper is ongoing research. To conclude, we would like to address a particularly challenging but important point. Given the personal nature of wearable computers, symbiotic interfaces
that can automatically anticipate the wearer's behavior will be especially important. The paradigm of conventional interfaces asks users to interpret the devices' outputs and to absorb the differences between users; the paradigm we propose instead builds on behaviorally intuitive interactions to establish natural interaction. Users might not even notice the wearable computers while the computers naturally support their behaviors. Users have their own habits and styles of motion; by anticipating their behavior through the PH, we aim to realize a human-centered interface. That is the goal of the PH. There are few studies that can explain such dynamic, symbiotic types of coordination. Attractor superimposition may become a novel theory for doing so and may give us new perspectives on coordination and communication. When both subjects feel that a human player controls the avatar, the activity parameter increases for both at the same time. This can be regarded as something shared between them that enhances their behaviors and makes them more stable; we consider this the emergence of the kind of proto-communication we use. Further analyses and studies are needed to close the gap between our Parasitic Humanoid interface technologies and the practical issue of how they work with humans through communication.
Acknowledgments This work was supported by “Global COE program: Founding Ambient Information Society Infrastructure” of the Ministry of Education, Culture, Sports, Science and Technology in Japan.
References 1. Ando, H., Miki, T., Inami, T., Maeda, T.: SmartFinger: Nail-mounted tactile display. In: ACM SIGGRAPH 2002 Conference Abstracts and applications, p. 78 (2002) 2. Auvray, M., Lenay, C., Stewart, J.: The attribution of intentionality in a simulated environment: the case of minimalist devices. In: Tenth Meeting of the Association for the Scientific Study of Consciousness, Oxford, UK, June 23-26 (2006) 3. Beer, R.D.: Toward the evolution of dynamical neural networks for minimally cognitive behavior. From Animals to Animats 4. In: Proceedings of the 4th International Conference on Simulation of Adaptive Behavior, pp. 421–429. MIT Press, Cambridge (1996) 4. Di Paolo, E.A.: Behavioral coordination, structural congruence and entrainment in a simulation of acoustically coupled agents. Adaptive Behavior 8, 25–46 (2000) 5. Furusawa, C., Kaneko, K.: A generic Mechanism for Adaptive Growth Rate Regulation. PLoS Computational Biology 4 (2008) 6. Hosoda, K., Mor, K., Shiroguchi, Y., Ymauchi, Y., Kashiwagi, A., Yomo, T.: Synthetic ecosystem of Escherichia coli for discovery of novel cooperative and self-adaptive algorithms. In: The 3rd International Conference on Bio-Inspired Models of Network, Information, and Computing Systems (2008) 7. Iizuka, H., Ikegami, T.: Adaptability and diversity in simulated turntaking behavior. Artificial Life 10, 361–378 (2004) 8. Jacobsen, S.: Wearable Energetically Autonomous Robots, DARPA Exoskeletons for Human Performance Kick Off Meeting (2001)
9. Johansson, R.S., Westling, G.: Role of glabrous skin receptors and sensorimotor memory in automatic control of precision grip when lifting rougher or more slippery objects. Exp. Brain Res. 56, 550–564 (1984) 10. Kashiwagi, A., Urabe, I., Kaneko, K., Yomo, T.: Adaptive response of a gene network to environmental changes by attractor selection. Plos One 1, e49 (2006) 11. Leibnitz, K., Wakamiya, N., Murata, M.: Biologically inspired self-adaptive multi-path routing in overlay networks. Communication of the ACM 49(3), 62–67 (2006) 12. Mayol, W.W., Tordoff, B., Murray, D.W.: Wearable Visual Robots. In: International Symposium on Wearable Computing (2000) 13. Murray, L., Trevarthen, C.: Emotional regulations of interactions between two-month-olds and their mothers. In: Field, T.M., Fox, N.A. (eds.) Social perception in infants, pp. 177– 197. Ablex, Norwood (1985) 14. Mascaro, S., Asada, H.: Distributed Photo-Plethysmograph Fingernail Sensors: Finger Force Measurement without Haptic Obstruction. In: Proceedings of the ASME Dynamic Systems and Control Division, vol. DSC-67, pp. 73–80 (1999) 15. Nadel, J.: Imitation and imitation recognition: their social use in healthy infants and children with autism. In: The imitative mind: Development, evolution and brain bases, pp. 42– 62. Cambridge University Press, Cambridge (2002) 16. Nadel, J., Carchon, I., Kervella, C., Marcelli, D., Reserbat-Plantey, D.: Expectancies for Social Contingency in 2-Month-Olds. Developmental Science 2, 164–174 (1999) 17. Robins, B., Dickerson, P., Dautenhahn, K.: Robots as embodied beings – Interactionally sensitive body movements in interactions among autistic children and a robot. In: Proc. IEEE Ro-man 2005, 14th IEEE International Workshop on Robot and Human Interactive Communication, pp. 54–59 (2005) 18. Scassellati, B.: Imitation and Mechanism of Joint Attention: A developmental structure for building social skills on a humanoid robot. In: Nehaniv, C.L. (ed.) CMAA 1998. LNCS (LNAI), vol. 1562, p. 176. Springer, Heidelberg (1999) 19. Tachi, S., Arai, H., Maeda, T.: Tele-Existence Simulator with Artificial Reality(1) - Design and Evaluation of a Binocular Visual Display Using Solid Models. In: IEEE International Workshop on Intelligent Robot and Systems, IROS 1998 (1988) 20. Taga, G.: A model of the neuro-musculo-skeletal system for human locomotion. Biolog. Cybern. 73, 97–111 (1995) 21. Global COE Program, Center of Excellence for Founding Ambient Information Society Infrastructure by Osaka University, http://www.ist.osaka-u.ac.jp/GlobalCOE
Modeling Personal Preferences on Commodities by Behavior Log Analysis with Ubiquitous Sensing
Naoki Imamura1, Akihiro Ogino2, and Toshikazu Kato3
1 Graduate School of Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
2 Kyoto Sangyo University, Kamigamomotoyama, Kita-ku, Kyoto 603-8555, Japan
3 Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
{imamura,kato}@indsys.chuo-u.ac.jp, [email protected]
Abstract. Consumers take specific actions toward preferred or favorite items while shopping in order to get more information, such as the material and the price. We have been developing a smart room that estimates their preferences and favorite items by observing them with ubiquitous sensors such as RFID and Web cameras. We assume that the decision-making process in shopping follows the AIDMA rule, and we detect specific behaviors, namely "See", "Touch" and "Take", to estimate the user's interest. We found that consumers can be classified by behavior patterns based on the frequency and duration of these behaviors. In our experiment, we tested twenty-eight subjects on twenty-four T-shirts. By applying discriminant analysis to each subject's behavior log together with the behavior-pattern classification above, we obtained a better precision ratio in estimating preferences and favorite items.
1 Introduction
In recent years, not only have products diversified, but consumers' interests in those products have diversified as well. As a result, when a consumer visits a store, it is difficult for the store to respond to the consumer's needs. It is therefore necessary to provide information to the consumer and to predict the consumer's interests and preferences. Previous research shows that when consumers are interested in a product, they stare at it or pick it up to take a closer look [5][6]. Therefore, if we collect consumers' preferences by observing their behavior with ubiquitous equipment such as RFID and sensor cameras [4], we can predict consumer behavior and provide appropriate item recommendations in the shop space. Our earlier work also suggests that male consumers tend to decide to buy when the specification of an item matches their desire, while female consumers carefully consider, choose, and compare many items before deciding to buy [3]; that is, consumers exhibit various behavior patterns. In this paper, we study consumer preferences by observing the differing behavior of individual consumers with ubiquitous equipment such as RFID and sensor cameras in the store. We collect log files and predict each consumer's interests by analyzing his or her behavior from the logs.
2 Approach to Understand the Consumer Behavior Pattern
2.1 The Definition of the Consumer's Actions in the Shop Space
In this research, we observe the actions that a consumer takes toward an item, and their frequency, in order to estimate whether it is a favorite item, and we then classify these actions into behavior patterns. We propose three steps of action from the moment a consumer becomes aware of an item until its purchase:
1. Watching an item of interest and keeping it in mind.
2. For an interesting item, checking the material visually and touching it to look at the price tag.
3. For an interesting item, examining it by hand to get more information and to choose a suitable size.
Accordingly, we define three consumer actions toward an item: "See", "Touch" and "Take". "See" is the state of looking at an item; "Touch" is the state of touching the item to check, for example, the feel of the material or the price tag; and "Take" is the state of picking the item up to check whether it suits oneself.
2.2 Comparison with the AIDMA Rule
Discussion of consumption behavior has grown, and the AIDMA rule from behavioral economics has become popular. The AIDMA rule [2] is a hypothesis about the process a consumer goes through between recognizing an item and purchasing it. AIDMA is composed of "Attention (A)", "Interest (I)", "Desire (D)", "Memory (M)", and "Action (A)". Because the three actions we define capture the progression "See" → "Touch" → "Take" and the corresponding strength of interest, they follow the process of the AIDMA rule (Fig. 1). However, since these three actions concern the intention of watching or touching items, the "Memory" stage of AIDMA is difficult to measure, and we do not use it in this paper. Based on the above, this research defines the three actions "See", "Touch" and "Take" within the process from perception to purchase, focuses on the degree of interest these actions express, and estimates consumer preference from them. It is difficult, however, to measure how much a consumer likes an item from a single action, because some consumers often touch or take items they do not purchase. We therefore use the kinds and frequencies of actions collected from the behavior log both to classify shopping patterns and to estimate which items are liked.
Fig. 1. Definition of action process
In this research, we observe the kinds and frequencies of the consumer's "See", "Touch" and "Take" actions toward items, and we classify consumers into purchaser patterns, such as those who frequently handle items and those who do not, according to the kind and frequency of their actions.
2.3 Definition of Actions
We observe the consumer's "See", "Touch" and "Take" actions, together with their time and frequency, using ubiquitous sensors such as Web cameras and RFID in the store. The actions are defined as follows. "See: the person is looking at the item" is assumed when the sensors detect the presence of a person in front of the item shelf. "Touch: the person is feeling the item" is assumed when the sensors detect both a person and a change in the items on the shelf. "Take: the item is picked up" is assumed when the sensors detect that an item has disappeared from the shelf. In this way, the movements of the person, the hand, and the item are turned into quantities that the sensors can observe.
2.4 Classification of Shopping Patterns by Clustering
As noted in Chapter 1, consumers exhibit various behavior patterns, such as people who frequently handle the items they like and people who do not. By first dividing consumers into behavior patterns based on their action logs and then analyzing each pattern separately, we can estimate consumer preferences with higher accuracy. As the technique for dividing consumers into behavior patterns, we use cluster analysis. Let x_See, x_Touch, and x_Take denote the frequencies of "See", "Touch" and "Take" that a consumer performed for one item, and let x_SeeAve, x_TouchAve, x_TakeAve, x_SeeVar, x_TouchVar, and x_TakeVar denote, respectively, the average and variance of each action over two or more commodities. Using these six variables for each consumer, we divide the consumers into shopping patterns by cluster analysis; that is, consumers with many actions and consumers with few actions are separated using the average and variance of each action frequency as variables.
2.5 Judging Favorite Items by Discriminant Analysis
After dividing consumers into behavior patterns by cluster analysis, we investigate how much each item is liked by means of a questionnaire, and we perform a discriminant analysis in which the objective variable is the degree of liking for the item and the explanatory variables are the action frequencies x_See, x_Touch, and x_Take for that item. We can thus estimate liked and disliked items by calculating the correct discrimination rate and the misjudgement probability from the frequencies of each action.
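A minimal sketch of this two-stage analysis is shown below. It is only an illustration: the paper does not specify the clustering algorithm or the discriminant variant, so k-means with two clusters and linear discriminant analysis are assumptions, and the data are synthetic.

```python
# Hypothetical sketch of Sections 2.4-2.5: cluster consumers into shopping
# patterns from the mean and variance of their See/Touch/Take frequencies,
# then, within each cluster, discriminate liked from not-liked items using
# the per-item action frequencies. Data shapes and values are invented.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# counts[c, i] = (x_See, x_Touch, x_Take) of consumer c for item i
counts = rng.poisson(lam=[3, 2, 1], size=(28, 24, 3))
liked = rng.integers(0, 2, size=(28, 24))       # 1 = rated 3-5, 0 = rated 1-2

# Stage 1: per-consumer features = mean and variance of each action frequency
features = np.hstack([counts.mean(axis=1), counts.var(axis=1)])   # shape (28, 6)
pattern = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# Stage 2: discriminant analysis of liked items within each shopping pattern
for p in (0, 1):
    X = counts[pattern == p].reshape(-1, 3)     # (consumer, item) pairs x 3 frequencies
    y = liked[pattern == p].reshape(-1)
    lda = LinearDiscriminantAnalysis().fit(X, y)
    print(f"pattern {p}: training discrimination rate = {lda.score(X, y):.3f}")
```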
3 An Evaluation Experiment
3.1 Experimental Overview
We classified consumers' behavior patterns in a shop space equipped with ubiquitous sensors using the technique proposed in this research, and we tested how useful it is for estimating favorite items. Below we describe the sensors used to observe the purchase space and the "See", "Touch" and "Take" actions, the processing method, the content of the experiment, and the experimental results.
3.2 Experiment Environment
3.2.1 Construction of a Real-World Purchasing Space Using Ubiquitous Sensors
We constructed an observation and analysis system, which we call the Smart Sphere System (SSS), for a real shop space equipped with various sensors that we call ubiquitous sensors (Fig. 2). The ubiquitous sensors consist of sensor cameras and RFID; the system observes the actions (Take, Touch, See) on each item, together with their time and frequency, and computes the relation between these actions and the interest of the consumer exhibiting each behavior pattern. For the experiment, a store with six item shelves, labeled point A to point F, was constructed in the laboratory. A Panasonic Web camera (BL-C31), which captures an image when a change occurs in its field of view and otherwise captures images at regular intervals, was installed on the ceiling and on the shelf at each point, and an OMRON RFID reader (V720 series) was hidden under the floor in front of each item shelf (Fig. 2). The subjects shopped freely in the shop space wearing slippers containing an RFID card that stored an ID (a customer number) identifying each of them. The SSS is connected to all ubiquitous sensors in the store, analyzes the data sent from them, and accumulates each consumer's behavior log.
Fig. 2. Smart Sphere System
Fig. 3. Observation image with ceiling camera
3.2.2 Personal Identification Using RFID
The experiment identifies each consumer using RFID. When a consumer wearing the slippers with the RFID tag comes in front of an item shelf, the SSS identifies who is there using the RFID reader hidden under the floor in front of the shelf. When the RFID reader observes a subject's RFID tag, the system stores the observation time, the customer number, and the location of the item shelf in a database named "RFID" on the SSS as an observation log. However, with this RFID reader, interference can occur when three or more people with RFID tags are at the same shelf, and the customer numbers can then not be read correctly. Although it is therefore difficult for the system to observe several subjects at one item shelf, each action can still be observed even when two or more customers shop at the same time, as long as there are at most two people per shelf.
3.2.3 Person Detection Using a Web Camera
The system detects people with the Web cameras. When a person comes in front of a shelf, an image is acquired with the Web camera installed in the ceiling above that shelf (right of Fig. 3), and the difference from a background image acquired beforehand (left of Fig. 3) is computed [1]. The SSS remains robust to environmental changes by updating the background image whenever no data have been sent from the Web camera for a fixed time. The system judges that a person is present when the difference between the acquired image and the background image is large; in that case it stores the acquisition time of the observed image and the location of the item shelf in a database named "CEILING" on the SSS as an observation log.
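A background-difference check of the kind described here could be sketched as follows. This is illustrative only: the actual SSS image processing, its thresholds, and its camera interface are not given in the paper, and the use of OpenCV, the threshold values, and the db argument (a stand-in for the SSS database) are our assumptions.

```python
# Illustrative background-difference detection for the ceiling camera
# (Sect. 3.2.3). A person is assumed present when the current frame
# differs strongly enough from a stored background image; the thresholds
# are assumed values.
import cv2
import numpy as np
from datetime import datetime

DIFF_THRESHOLD = 30        # per-pixel intensity difference considered "changed"
PRESENCE_RATIO = 0.05      # fraction of changed pixels that signals a person

def person_present(frame_bgr, background_gray):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, background_gray)
    changed = np.count_nonzero(diff > DIFF_THRESHOLD)
    return changed / diff.size > PRESENCE_RATIO

def log_ceiling_observation(db, shelf, frame_bgr, background_gray):
    """Append an observation-log row (time, shelf) when someone is detected.
    'db' stands in for the CEILING database (here, e.g., a plain list)."""
    if person_present(frame_bgr, background_gray):
        db.append({"table": "CEILING", "time": datetime.now(), "shelf": shelf})
```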
Fig. 4. Observation image and picture processing with shelf camera
environment changes. Second, this system acquires the image when the item of the shelf is being taken up by the consumer using another Web camera set up in each item shelf. At this time, the image from which the item is not put on the shelf is assumed to be a background image. The system assumes the item was taken by the consumer when there is no difference between the acquisition image and the background image, this system stores the data of acquisition time of observed image and the place of the item shelf in the database named "SHELF2" on SSS when there is no item, and leaves it as an observation log. 3.2.5 The Definition of the Action by the Observation History We combine the observation histories left for each database of "RFID" "CEILING" "SHELF1" “SHELF2”, and define the action to observe "See" "Touch" "Take" defined to make each action taken in the store an amount with RFID and Web camera (Table 3). First, this system assumes the state to see the item when the place of the shelf is corresponding to the observation time from the observation log of database "RFID" and "ceiling", and accumulates in the database named "SEE" as a behavior log. When the consumer was halting in front of the item shelf, we assumed the consumer would see the item. Moreover, this system assumes the state to feel after the item when the place of the item shelf is corresponding to the observation time from the observation log of database "RFID" "SHELF1", and accumulates in the database named "touch" as a behavior log. In addition, this system assumes the state taken up when "Take" is corresponding to the observation time the place of the item shelf from the observation log of database "RFID" "SHELF2". 3.3 A Precision Evaluation Experiment We experimented the evaluation of accuracy to which is "See", "Touch", and "Take" were observed by using SSS. The experiment puts one T-shirt per one item shelf of SSS (six shelves in total) and five subjects freely see, touch, and take T-shirt, finally take up and choose one T-shirt taking. At this time, the subject takes some actions for all items. Moreover, we took a picture of all processes from the experiment beginning to the end with the fixed point video camera. The frequency of "See", "Touch", and "Take" that the system of SSS had obtained and the frequencies of "See", "Touch", and "Take" when seeing with the video camera with the unassisted eye were comparing verified, and the accuracy of SSS was evaluated. Table 1 shows the accuracy of the average of each subject of See, Touch, and Take.
Table 1. Result of the accuracy evaluation of the SSS

  Accuracy of "See" observation    0.747
  Accuracy of "Touch" observation  0.699
  Accuracy of "Take" observation   0.633
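Returning to the log fusion of Sect. 3.2.5, combining the observation logs could look like the sketch below (our illustration; the table layouts, column names, and the one-second matching window are assumptions). A See, Touch, or Take event is recorded whenever an RFID observation and the corresponding camera observation agree on the shelf within the time window.

```python
# Hypothetical sketch of the log fusion in Sect. 3.2.5 using pandas.
# Each source table holds (time, shelf) rows with datetime timestamps;
# RFID rows also carry the customer number. A behavior event is logged
# when an RFID row and a camera row match on shelf within the window.
import pandas as pd

TOLERANCE = pd.Timedelta(seconds=1)   # assumed matching window

def fuse(rfid: pd.DataFrame, camera: pd.DataFrame, action: str) -> pd.DataFrame:
    """rfid: columns [time, shelf, customer]; camera: columns [time, shelf]."""
    cam = camera.sort_values("time").assign(cam_time=lambda d: d["time"])
    merged = pd.merge_asof(
        rfid.sort_values("time"), cam,
        on="time", by="shelf", direction="nearest", tolerance=TOLERANCE,
    )
    merged = merged.dropna(subset=["cam_time"])   # keep only matched observations
    merged["action"] = action
    return merged[["time", "shelf", "customer", "action"]]

# see   = fuse(rfid_log, ceiling_log, "See")
# touch = fuse(rfid_log, shelf1_log,  "Touch")
# take  = fuse(rfid_log, shelf2_log,  "Take")
# behavior_log = pd.concat([see, touch, take]).sort_values("time")
```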
3.4 The Item Evaluation Experiment in the Purchasing Space
We ran an evaluation experiment with the SSS to examine the utility of the technique described in Section 2. The items used were all T-shirts or polo shirts made by the UNIQLO company (Fig. 5). The features were restricted to two, color and neck shape: crew necks (hereafter C), V-necks (hereafter V), and polo shirts (hereafter P), each in six colors (white, black, pink, yellow, light blue, and navy blue), all size M and all cotton, giving 18 pieces in total. From these, six shirts, two of each neck shape and with no color repeated within a set, were selected at random to form each of three experimental sets. In addition, border (striped) shirts (hereafter T) were prepared: six pieces in six colors, all of the same dry-type fabric and size M, forming a fourth set. Four experimental sets in total were thus prepared:
UNIT1: C = 2, V = 2, P = 2; UNIT2: C = 2, V = 2, P = 2; UNIT3: C = 2, V = 2, P = 2; UNIT4: T = 6
We placed the items of experiment UNIT 1 on the commodity shelves of the SSS (A to F) at random, and the SSS observed the actions each subject took toward each item while shopping. Experiment UNITs 2, 3, and 4 were run in the same way. Before the experiment we told each subject, "Please choose one or more favorite pieces of clothing from the experimental set placed on the shelves in the experimental store," and the subject then actually chose the items he or she liked in the SSS. The experiment ended when the subject had made all choices. To make the subjects choose items they really wanted, we told them that the chosen items would later be given to them free of charge. To learn the reasons for the actions the subjects took toward the items, we also conducted a questionnaire survey immediately after the experiment. In addition, three days after the end of the experiment, each subject directly inspected the twenty-four pieces of clothing used and rated each item on a 5-point scale. This was done three days later so that the questionnaire would capture the subject's essential preference after he or she had forgotten the actions taken in the experiment.
Fig. 5. Sample items used in the experiment
Table 2. Discriminant analysis results for the estimation of favorite items

                                    All subjects   Comparison type   Noteworthy type
  Correct discrimination rate (%)       57.5            58.0              59.3
  Misjudgement probability (%)          40.4            37.7              39.2
We averaged the frequencies of the "See", "Touch" and "Take" actions observed by the SSS for each subject and computed their variances for this experiment. Using these averages and variances as variables, we classified the twenty-eight subjects into two shopping patterns by cluster analysis. As a result, five of the twenty-eight subjects acted on the commodities frequently, while twenty-three subjects hardly acted on the items at all. We call the subjects who act on the items frequently the "Comparison type" and the subjects who hardly act at all the "Noteworthy type". Next, taking the items rated 3 to 5 in the questionnaire as the liked group and the items rated 1 or 2 as the not-liked group, we calculated, by discriminant analysis, how well the evaluation from the observations matches the evaluation from the questionnaire. Table 2 shows the discrimination results for all subjects together and after dividing them into the "Noteworthy type" and "Comparison type" groups. The correct discrimination rate is higher after the classification than before it, and the misjudgement probability is lower.
4 Consideration
4.1 Basic Performance
The accuracy evaluation of the SSS constructed in this research showed that "See", "Touch" and "Take" are observed with an accuracy of about 70 percent. Although See, Touch, and Take were observed using image processing based on background differencing, the accuracy for Touch and Take is thought to drop because mis-observations occur when the difference from the background image is taken in a whitish scene, the color of the item shelf and of the T-shirt on the shelf both being white. In future work we will improve the background-differencing technique and examine more accurate observation of actions, as well as the pursuit of the meaning of the actions, with new sensors.
4.2 Utility of the Behavior-Pattern Classification Technique
We constructed a ubiquitous shop space with RFID and Web cameras as ubiquitous sensors, observed the three consumer actions of seeing, touching, and taking an item, and classified the subjects into the "Noteworthy type" and "Comparison type" shopping patterns by cluster analysis of the action frequencies.
In addition, to examine the utility of the classification, we measured the correct discrimination rate for favorite items using discriminant analysis. Compared with the case without classification, classifying the subjects into "Noteworthy type" and "Comparison type" raised the correct discrimination rate and lowered the misjudgement probability. From the questionnaire conducted immediately after the experiment (see the appendix at the end of the paper), we see that "Comparison type" subjects, who performed many actions on the items, tended to choose items more carefully; for example, they selected the item that best matched the clothes they already owned, or touched all items and then selected the one they liked most. Conversely, "Noteworthy type" subjects, who performed few actions on the items, judged liked and disliked items from their appearance and tended to choose an item as soon as they found the one they wanted most. Overall, actions on the items tended to be few, probably because the number of commodities was small and all commodities were easy to recognize and choose from in the actual experiment; this is thought to be why the "Noteworthy type", the purchaser pattern with low action frequency, was the larger group.
5 Conclusion
In this paper, aiming to predict interest in and preference for items from consumers' action logs in a shop space, we focused on the existence of individual behavior patterns and proposed a technique for estimating whether an item a consumer acted on is liked. In the experiment, we tested twenty-eight subjects on twenty-four T-shirts and obtained a better precision ratio for each subject in estimating preferences and favorite items by combining discriminant analysis of his or her behavior log with the behavior-pattern classification above. We therefore consider the technique to be useful. In the future we will further improve the accuracy of the SSS and, after understanding each consumer's individual behavior pattern with this technique, build a system that recommends items estimated to be favorites via a Web browser on displays installed in the store, aiming at an algorithm that tailors the recommendation display to differences in behavior pattern.
References 1. Kazuhide, H., Yo-iti, W., Sae-ueng, S., Katayama, T., Nobuyoshi, H., Nagase, H., Toshikazu, K.: Design of Ubiquitous Information Environment with Real World Interaction and Modeling KANSEI Facilities, vol. 103, ISSN:09135685 2. http://ja.wikipedia.org/wiki/AIDMA 3. Onodera, K., Yusaku, K.: Why would someone buy in this store. Publisher Nihon Keizai Shimbun (2005)
4. Koshizuka, S.: Ubiquitous ID technology and its applications. IEICE 87(5), 374–378 (2004) 5. Sae-ueng, S., Pinyapong, S., Ogino, A., Kato, T.: Consumer friendly Shopping Assistance by Personal Behavior Log Analysis on Ubiquitous Shop Space. In: Proceeding of 2nd IEEE Asia-Pacific Services Computer Conference, APSCCS 2007, pp. 496–503 (2007) 6. Sae-ueng, S., Pinyapong, S., Ogino, A., Kato, T.: Modeling personal preference using shopping behaviors in ubiquitous information environment. In: Proceeding of 18th IEICE data engineering workshop /5th DBSJ annual meeting (DEWS 2007) (2007)
A System to Construct an Interest Model of User Based on Information in Browsed Web Page by User
Kosuke Kawazu1, Masakazu Murao2, Takeru Ohta3, Masayoshi Mase3, and Takashi Maeno2
1 Keio University, Graduate School of System Design and Management, 4-1-1 Hiyoshi, Kouhoku, Yokohama, Kanagawa 223-8526, Japan
2 Keio University, Graduate School of System Design and Management, Japan
3 Keywalker Inc., 6th floor, Sprit Building, 3-19-13 Toranomon, Minato-ku, Tokyo 105-0001, Japan
[email protected]
Abstract. Nowadays, computers are expected to comprehend characteristics of the user, such as interests and likings, in order to interact with the user. In this study, we constructed a system that builds an interest model of the user based on the information in the Web pages the user has browsed, by extracting words and interword relationships. In this model, metadata is appended to the words and to the interword relationships. There are six kinds of word metadata: personal name, corporate name, site name, name of commodity, product name, and location name. Metadata for interword relationships is prepared to clarify the relationships between these words. The system visualizes the model as a map and provides functions for zooming and modifying this map. We show the efficacy of the system by an evaluation experiment.
1 Introduction
In recent years, the high functionality of computers has made it possible to do a great deal of work with them. As a result, it has become difficult for computers to comprehend user requirements. Modeling characteristics of the user, for example interests and likings, in order to comprehend user requirements through interaction between the user and the computer, is therefore attracting attention. Several studies have previously modeled characteristics of the user. There are two approaches: collaborative filtering, which models characteristics of a user by using information about many users, and information filtering, which models characteristics of a user by using information about that user alone [1]. Collaborative filtering extracts the action patterns of user communities [2]-[5], so detailed characteristics of an individual user cannot be modeled with it. In this study, information filtering is chosen to model detailed characteristics of the user. There are two methods of information filtering. One is explicit: the user states his or her fields of interest and keywords. The other is implicit: the computer automatically extracts the keywords the user is interested in from the user's
actions at the time of browsing [6]. The explicit method places a heavy load on the user, because the user must specify the analysis targets every time, and the resulting model of the user is limited because the analysis targets are limited [7]. In this study, the implicit method is chosen. Among studies using the implicit method are Corin's and Murata's. Corin constructed a personalized start page using Web access history [8]. Murata proposed the Site-Keyword graph to visualize and extract the user's interests from log data [9]; it is a graph whose vertices are the titles of Web pages and search keywords and whose edges represent time ordering. Thus, in existing models, the vertices have been the titles of Web pages and search keywords, and the edges time ordering or occurrence rate. However, if the vertices are page titles and search keywords, the characteristics of the user are not clear, and if the edges are time ordering or occurrence rate, it is difficult to infer the objects of the user's interest, because the user's actions are not always consistent. The purpose of this study is therefore to establish a system that constructs an interest model of the user with two features, using the information in the Web pages the user has browsed:
• a detailed model of the user; and
• a model of the user that is easy to use.
2 Concept Design
In this chapter, in order to fulfill the demands stated in the previous chapter, the concept design is clarified.
2.1 An Analysis Object for Making the Model
As stated in the previous chapter, the proposed system builds a detailed model of the user in order to present the information the user wants. It is necessary to decide what to analyze in order to build such a detailed model. In this study, a detailed interest model of the user is defined as one that clarifies the interword relationships, using each word as a vertex. Hence, the analysis object is defined as the sentences of the Web pages browsed by the user. By extracting the objects of the user's interest word by word from the sentences of Web pages, the system can clarify what the user is interested in, which is effective for presenting the information the user wants. Concretely, the proposed system obtains words and interword relationships by extracting the title and body text of each Web page browsed by the user.
2.2 Addition of Metadata
As stated in the previous chapter, the proposed system builds an interest model of the user that is easy to use for presenting the information the user wants. In this study, an easy-to-use model of the user is defined as one in which the objects of the user's interest can be arranged and categorized easily; the proposed system can then infer the user's requirements from the model. It is therefore effective to arrange and categorize the words and interword relationships, and the proposed system adds metadata to each word and each interword relationship for this purpose.
Table 1. Examples of words and their metadata
Table 2. Examples of interword relationships and their metadata
Metadata of Words
First, the metadata of words is defined. As stated above, to arrange and categorize words it is necessary to attach a superordinate concept (for example, that a given word is a corporate name) to a word such as Google. Adding these superordinate concepts as metadata to words increases the ease of use of the user's interest model. There is, however, an enormous variety of words and therefore of possible superordinate concepts, and it is difficult to enumerate all kinds of metadata and have the proposed system acquire them. We therefore categorized the Web search terms published by several Internet search sites [10]-[13]. As a result, we found that the search terms most frequently used by users are:
• Personal names, such as George Washington and Marilyn Monroe;
• Corporate names, such as SONY and NASA;
• Site names, such as Google and Yahoo; and
• Commodity names, such as Xbox360 and Freelander.
Therefore, in this system the metadata of words consists of personal name, corporate name, site name, and commodity name. The proposed system extracts words that correspond to these metadata types and attaches the metadata to them. Table 1 shows examples of words that can be obtained and the kinds of metadata; these words are obtained by analyzing the sentences of the Web pages browsed by the user.
Metadata of Interword Relationships
Next, the metadata of interword relationships is defined. Previously, interword relationships were defined from word co-occurrence rates, as represented by Takashiro's research [14]. However, a method based only on co-occurrence rates cannot distinguish between different kinds of relationships. To arrange and categorize the model, it is effective to define relationships using metadata as well as co-occurrence rates. Interword relationships are determined by semantic analysis based on the situational context of the sentences in the Web pages browsed by the user. The proposed method therefore defines interword relationships by combining two kinds of analysis: one based on word co-occurrence and one based on the semantic relation between words. In this study, the words the proposed system extracts are defined by four kinds of metadata: personal name, corporate name, site name, and commodity name. It is therefore necessary to define interword relationships that connect these kinds of words. Since the words can be related in many ways and it is difficult to define all possible relationships, only those relationships that can be specified and extracted easily from the sentences of Web pages are defined. Table 2 shows examples of interword relationships that can be obtained and the kinds of metadata.
2.3 Outline of the Proposed System
The outline of the proposed system, based on the methods stated above, is as follows. The system has two functions. The first extracts words and interword relationships, adds metadata to each of them, and constructs the user's interest model by analyzing the sentences of the Web pages browsed by the user. The second visualizes the interest model so that the user can browse his or her own model and, as a result, modify it appropriately. Fig. 1 shows the outline of the proposed system's processes. First, the user visits a Web page, and the system obtains the user's ID, the time, the URL of the page, and the sentences it contains. Second, the system analyzes the sentences of the Web page.
Fig. 1. Outline of the proposed system
As a result, the sentences are decomposed into words, and the system adds metadata to each word and interword relationship. Third, the system constructs a map by arranging these words and interword relationships in a two-dimensional space as vertices and edges. Finally, the map is visualized so that the user can modify it.
3 Detailed Design and Implementation
The system consists of two subsystems: a database and a user interface. This chapter describes the algorithms of both.
3.1 Algorithm of the Database
In this study, the URL and text of each Web page are captured by a plug-in of our own design installed in Firefox. The database defines the vertices, the layout of the vertices, and the edges of the map by analyzing the information of the Web pages browsed by the user. Processing follows these steps:
1. The URL of a Web page browsed by the user is received from Firefox.
2. The HTML is received.
3. The document text is extracted.
4. Situation Analysis (SA) is performed on the acquired text.
5. A map of the user is built in a two-dimensional space.
SA has two features:
• metadata of words is appended by performing morphological and syntactic analysis on the acquired text; and
• metadata of interword relationships is appended by using a semantic network and analyzing the text of the browsed Web pages.
As described in the previous chapter, the words extracted are those that receive the metadata personal name, corporate name, site name, or commodity name. A rule base is established to extract these words; each rule specifies how a metadata type such as personal name or corporate name is identified in the acquired text. Fig. 2 shows the flow from input to output (a simplified sketch of this step follows the figure caption). Words and their metadata are output whenever the acquired text matches a rule.
Fig. 2. Flow from input to output
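The rule-based extraction step can be pictured as a small dictionary of patterns per metadata type. The following Python sketch is only an illustration of that flow under our own assumptions: the rule lists, the pattern format, and the function name are invented stand-ins for the morphological and syntactic analysis the system actually performs.

```python
import re

# Hypothetical rule base: each metadata type maps to a few example surface
# patterns. A real system would rely on morphological analysis and a much
# larger, curated rule set.
RULES = {
    "personal name": [r"George Washington", r"Marilyn Monroe"],
    "corporate name": [r"SONY", r"NASA", r"SUZUKI", r"MITSUBISHI MOTORS"],
    "site name": [r"Google", r"Yahoo!?"],
    "commodity name": [r"Xbox ?360", r"Freelander", r"SWIFT"],
}

def situation_analysis(document_text):
    """Return (word, metadata) pairs found in the text of a browsed page."""
    hits = []
    for metadata, patterns in RULES.items():
        for pattern in patterns:
            for match in re.finditer(pattern, document_text):
                hits.append((match.group(0), metadata))
    return hits

# Example: words and their metadata extracted from one page.
print(situation_analysis("The SWIFT is a compact car sold by SUZUKI."))
```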
Table 3. Color of metadata

Fig. 3. Visualized map (the sample shows vertices such as HONDA, SUZUKI, SWIFT, MITSUBISHI MOTORS, CAR and Yahoo!, linked by edges labeled "connected corporation" and "inclusion")
In this study, interword relationships are defined by combining two algorithms. The first derives interword relationships from word co-occurrence rates, expressed as weights. The weight W of an interword relationship is calculated by:
\[ W_{W1 \& W2} = \log(N_{W1 \& W2}) \times \log\!\left(1 + \frac{N_{W1 \& W2}\, N_{URL}}{N_{W1}\, N_{W2}}\right) \tag{1} \]
Here, W_{W1&W2} is the weight of the relationship of word 2 to word 1; N_URL is the total number of Web pages browsed by the user; N_{W1&W2} is the number of browsed pages that contain word 1 and word 2 at the same time; and N_W1 and N_W2 are the numbers of browsed pages that contain word 1 and word 2, respectively. This formula is applied to the words acquired by SA. The average weight and the standard deviation of all words related to word 1 are then computed, and the words whose weight exceeds the sum of the average and the standard deviation are regarded as related to word 1. The second algorithm derives interword relationships from the meaning of the documents; these meaning-based relationships are added on top of the co-occurrence-based ones. As with word acquisition, a rule base is established to extract meaning-based relationships: metadata of an interword relationship, for example "connected commodity", is appended when the rule matches the documents of the Web pages. Finally, the layout of the vertices is computed with the Graphviz software, which arranges the vertices in a single layer in two-dimensional space so that each vertex has an x- and a y-coordinate. This information is output as XML.
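As a rough illustration of formula (1) and of the mean-plus-standard-deviation cutoff described above, the sketch below computes pairwise weights from per-page word sets. The data layout and names are our assumptions; the real system works on the pages logged by the Firefox plug-in.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def pair_weights(pages):
    """pages: list of sets of extracted words, one set per browsed page."""
    n_url = len(pages)
    df = {}   # number of pages containing each word
    co = {}   # number of pages containing each unordered word pair
    for words in pages:
        for w in words:
            df[w] = df.get(w, 0) + 1
        for a, b in combinations(sorted(words), 2):
            co[(a, b)] = co.get((a, b), 0) + 1
    # Formula (1): W = log(N_w1&w2) * log(1 + N_w1&w2 * N_URL / (N_w1 * N_w2))
    return {(a, b): math.log(n_ab) * math.log(1 + n_ab * n_url / (df[a] * df[b]))
            for (a, b), n_ab in co.items()}

def related_words(word, weights):
    """Keep partners whose weight exceeds mean + standard deviation (the paper's rule)."""
    partners = {}
    for (a, b), w in weights.items():
        if a == word:
            partners[b] = w
        elif b == word:
            partners[a] = w
    if not partners:
        return []
    cut = mean(partners.values()) + pstdev(partners.values())
    return [p for p, w in partners.items() if w > cut]
```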
3.2 Algorithm of the User Interface
The user interface acquires the information prepared by the database, and the user interacts with the computer through it. First, the contents of the vertices, their coordinates, and the interword relationships are obtained by loading into Flash the information that was output as XML. Second, the computer draws the map by defining the shapes and colors of the vertices and the edges. Finally, the user can interact with the computer by modifying his or her model. The map provides functions for this, for example zooming the map and moving its vertices. Fig. 3 shows a sample of a map that was actually produced, and Table 3 shows the correspondence between word metadata and colors. In this sample map the Yahoo vertex is drawn in yellow because site names are sometimes wrongly recognized as corporate names.
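A minimal sketch of how the color coding of Table 3 and the Graphviz layout step could be wired together is given below. Only "site name = yellow" is stated in the paper; the remaining colors, and the choice of emitting DOT text, are illustrative assumptions.

```python
# Hypothetical colour assignment per metadata type (only "site name = yellow"
# is stated in the paper; the rest are placeholders).
COLORS = {
    "personal name": "lightblue",
    "corporate name": "lightgreen",
    "site name": "yellow",
    "commodity name": "orange",
}

def to_dot(words, relationships):
    """words: {word: metadata}; relationships: [(word1, word2, relation_label)].
    Returns DOT text that Graphviz can lay out into x/y coordinates."""
    lines = ["graph interest_model {"]
    for word, metadata in words.items():
        color = COLORS.get(metadata, "gray")
        lines.append(f'  "{word}" [style=filled, fillcolor={color}];')
    for w1, w2, label in relationships:
        lines.append(f'  "{w1}" -- "{w2}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot({"SUZUKI": "corporate name", "SWIFT": "commodity name"},
             [("SUZUKI", "SWIFT", "connected commodity")]))
```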
4 Verification Experiments
Two experiments were conducted to verify the effectiveness of the proposed system.
4.1 Verification of the Proposed System
The first experiment consisted of two procedures and verified the validity of the chosen kinds of metadata and of the word extraction. First, each examinee browsed Web pages for thirty minutes. The second task was to pick out the words he or she had been interested in from the browsed pages and to write down these words and their interword relationships freely, with the words as vertices and the relationships as edges. Eight examinees in their twenties and thirties took part.
Verification of the Kinds of Word Metadata
In this study, the metadata of words consisted of personal name, corporate name, site name, and commodity name, so it is necessary to verify the coverage of these four kinds of metadata. The content rate of word metadata, R_inclusion, where N_ALL and N_TAG are the total number of words the examinees wrote down and the number of those words classified into the four kinds of metadata, respectively, is expressed by
\[ R_{inclusion} = \frac{N_{TAG}}{N_{ALL}} \tag{2} \]
Fig. 4 shows the relationship between the content rate of word metadata and the number of kinds of metadata. As shown in Fig. 4, the system built on four kinds of metadata could express only 37.4% of the words the examinees were interested in. Consequently, two further kinds of word metadata, location name and product name, were added to the proposed system, together with two further kinds of interword relationship metadata, inclusion and location. As a result, also shown in Fig. 4, the system built on six kinds of metadata could express 57.1% of the words the examinees were interested in.
Verification of the Words Extracted by the Proposed System
The words extracted by the proposed system were verified by comparison with the words the examinees wrote down.
The concordance rate of words, R_match, where N_SYS is the number of words extracted by the proposed system that coincide with the words the examinees wrote down, is expressed by
\[ R_{match} = \frac{N_{SYS}}{N_{TAG}} \tag{3} \]
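Both evaluation measures reduce to simple set ratios. The following sketch assumes the word lists are available as Python sets (our naming, not the authors'):

```python
def content_rate(examinee_words, tagged_words):
    """R_inclusion: share of examinee words covered by the metadata categories."""
    n_all = len(examinee_words)
    n_tag = len(examinee_words & tagged_words)
    return n_tag / n_all if n_all else 0.0

def concordance_rate(tagged_words, system_words):
    """R_match: share of metadata-covered words that the system also extracted."""
    n_tag = len(tagged_words)
    n_sys = len(tagged_words & system_words)
    return n_sys / n_tag if n_tag else 0.0
```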
Fig. 5 shows the relationship between the concordance rate of words and the number of kinds of metadata. As shown in Fig. 5, the concordance rate was 69.9% with four kinds of metadata and 75.2% with six kinds.
4.2 Verification of the Effectiveness of the Proposed System
A sensory evaluation experiment was conducted to verify the effectiveness of the proposed system, which constructs the user's interest model using metadata of words and interword relationships. First, the examinees browsed Web pages freely while the proposed system constructed each examinee's interest model. The task was then to compare two maps: one built without metadata and one built with it. The map without metadata shows the words without coloring the vertices by word type (such as personal name or corporate name) and shows the interword relationships without labeling their semantics, so the examinees could not tell what kinds of relationship they were. In contrast, as shown in Fig. 3, the map with metadata colors the vertices and labels the semantics of the relationships. The examinees compared these two maps in the sensory evaluation. The five evaluation items were:
(a) ease of reading the map;
(b) degree of similarity between the objects of your interest and the map;
(c) degree to which the map contains the keywords you were interested in;
(d) amount of extraneous words in the map; and
(e) adequacy of the interword relationships.
Eight examinees took part in this sensory evaluation. Fig. 6 shows the mean and standard deviation of the ratings for each evaluation item.
Fig. 4. Content rate of metadata of words (four vs. six kinds of metadata)

Fig. 5. Concordance rate of words (four vs. six kinds of metadata)
Fig. 6. Result of the sensory evaluation (no metadata vs. proposed system)
In a two-sample t-test, evaluation item (a) showed a significant difference at the 1% significance level, and items (b) and (e) showed significant differences at the 5% level.
5 Discussion and Future Work
Fig. 4 shows that the system built on six kinds of metadata can express 57.1% of the words the examinees were interested in. In the future it would be effective to add further kinds of metadata to model the user's interest more adequately. However, increasing the kinds of metadata would also increase the number of extracted words the user is not interested in. To solve this problem, an algorithm is needed that distinguishes the words the user is interested in from those he or she is not. Developing this algorithm and raising R_inclusion are left for future work. The words extracted by the proposed system were verified using R_match. Fig. 5 shows no significant difference between the concordance rate with four kinds of metadata and with six; that is, the concordance rate is almost constant and high for each kind of metadata. Raising the concordance rate further is also left for future work. As shown in Fig. 6, using metadata of words and interword relationships increases the ease of reading the user's interest model. However, the examinees sometimes felt that the metadata of an interword relationship was not appropriate. Solving this problem by adding appropriate kinds of interword relationship metadata is a further task for the future.
6 Conclusion
In this study we built a system that constructs an interest model of the user from the information in the Web pages he or she has browsed. The system is a foundation for presenting appropriate information to the user. The evaluation experiments showed that it can model 57% of the objects of the user's interest using personal names, corporate names, site names, commodity names, product names, and location names. By appending metadata to words and interword relationships, we also obtained a model that is easy to use. In the future we will build on this system to present appropriate information to the user.
References
1. Huangr, H., Fujii, A., Ishikawa, T.: The individualized technique of the information retrieval based on the Web community. In: The Association for Natural Language Processing Annual Conference, vol. 11, pp. 1006–1009 (2005)
2. Niwa, S., Doi, T., Honiden, S.: Web Page Recommender System Based on Folksonomy Mining. Transactions of Information Processing Society of Japan 47(5), 1382–1392 (2006)
3. Kazienko, P., Kiewra, M.: Integration of relational databases and Web site content for product and page recommendation. In: Database Engineering and Applications Symposium, IDEAS (2004)
4. Golovin, N., Rahm, E.: Reinforcement Learning Architecture for Web Recommendation. In: Proceedings of the International Conference on Information (2004)
5. Claypool, M., Gokhale, A., Miranda, T., et al.: Combining Content-Based and Collaborative Filters in an Online Newspaper. In: Proc. ACM SIGIR 1999 Workshop on Recommender Systems: Algorithms and Evaluation, Berkeley, California (1999)
6. Hijikata, Y.: User Profiling Technique for Information Recommendation and Information Filtering. Journal of Japanese Society for Artificial Intelligence 15(5), 489–497 (2003)
7. Kantor, P.B., Boros, E., Melamed, B.: Capturing Human Intelligence in the Net. Communications of the ACM 43(8), 112–115 (2000)
8. Anderson, C.R., Horvitz, E.: Web Montage: A Dynamic Personalized Start Page. In: Proceedings of the 11th World Wide Web Conference (WWW 2002) (2002)
9. Murata, T., Saito, K.: Extraction and Visualization of Web Users' Interests Using Site-Keyword Graphs. Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 18(5), 701–710 (2006)
10. http://www.google.com/press/zeitgeist2005.html
11. http://searchranking.yahoo.co.jp/ranking2008/general.html
12. http://searchranking.yahoo.co.jp/ranking2008/general.html
13. http://www.technorati.jp/ranking2006/
14. Takashiro, T., Takeda, H.: Acquisition and Organization of Personal Knowledge through WWW Browsing. Institute of Electronics, Information, and Communication Engineers J85-D-1(6), 549–559 (2002)
Adaptive User Interfaces for the Clothing Retail Karim Khakzar, Jonas George, and Rainer Blum Hochschule Fulda – University of Applied Sciences Marquardstrasse 35, D-36039 Fulda, Germany
[email protected]
Abstract. This paper presents the results of a research project that identifies the most important concepts for adaptive user interfaces in the context of ecommerce, such as online shops, and evaluates these concepts using a formalized method and standardized criteria. As a result, recommendations for the design of adaptive user interfaces are derived. Keywords: Adaptive User Interfaces, Concepts, Evaluation, Retail Shops.
1 Introduction The success of online shops is very much influenced by the quality of the user interface. Crucial criteria such as consistency and use of standards, flexibility and ease of control, transparency and robustness against maloperation, efficiency and help functionalities, clear and aesthetic design, and credibility of the shop determine whether users in the end purchase a product or not. In particular, complex products that normally require specialized service and consultancy by the sales person require a well designed user interface, in order to support the sales process. In this context, user interfaces that dynamically adapt to the needs and expectations of different customers seem to provide substantial advantages.
2 Approach This paper presents the results of a research project that identifies the most important concepts for adaptive user interfaces [1] in the context of e-commerce, such as online shops, and evaluates these concepts using a formalized method and standardized criteria. As a result, recommendations for the design of adaptive user interfaces are derived. First user tests performed on a prototype clothing retail online shop seem to confirm these results.
3 Applied Concepts for Adaptive User Interfaces
Among the known and applied concepts for adaptive user interfaces [2], [3] that we have considered in our evaluation are:
1. recommenders,
2. intelligent menus,
3. automatic localization,
4. adaptation of content to the users' knowledge,
5. assistant for the structuring of information,
6. intelligent help and support functionality, and
7. user-oriented optimization of navigation.
Systems that include recommenders dynamically propose products or content to the customer based on the knowledge that the system has collected about the customer. The user may then decide whether to make use of the recommendation. Intelligent menus dynamically adapt to the needs of the users; for example, a system may add relevant items to a menu list or drop irrelevant ones from it. One of the most frequently used methods of automatic localization is the selection of a suitable language, which of course requires information on where the user is currently located. Besides language, other characteristics may also be adapted automatically based on the user's location. Adaptation of content to the user's knowledge implies that a model of the user and information on the user's knowledge are available to the system, which may then react according to a predefined internal algorithm or strategy. Many applications require the user to structure information for further processing, storage, or presentation; an assistant for the structuring of information can support this process very efficiently. The benefit of this functionality depends very much on the complexity of the information to be processed by the user. A widespread way to improve the usability of applications is the implementation of intelligent help and support functionality: depending on the history and the context, additional information is provided to the user, who should be able to configure and optimize this functionality for his or her needs. The structure of content and the navigation through a complex store may sometimes create problems and confusion for the user; a user-oriented optimization of navigation is intended to improve the user's orientation within the application. For the time being this collection represents the main concepts for adaptive user interfaces, but it will probably not cover all existing and future concepts. The evaluation is currently not based on user tests or questionnaires but on a formalized way of grading different aspects that seem important for an implementation; in the future it has to be combined with practical usability tests in order to confirm the conclusions and results.
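As a concrete, deliberately simplified illustration of the automatic localization concept, a shop front end might derive the interface language from the browser's Accept-Language header. The snippet below is purely illustrative and is not part of the evaluated prototype:

```python
SUPPORTED_LANGUAGES = {"de", "en", "fr"}   # assumed shop languages
DEFAULT_LANGUAGE = "en"

def pick_language(accept_language_header):
    """Pick the first supported language from an Accept-Language header,
    e.g. 'de-DE,de;q=0.9,en;q=0.8' -> 'de'."""
    for part in accept_language_header.split(","):
        code = part.split(";")[0].strip().lower()[:2]
        if code in SUPPORTED_LANGUAGES:
            return code
    return DEFAULT_LANGUAGE

print(pick_language("de-DE,de;q=0.9,en;q=0.8"))   # -> de
```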
4 Evaluation of Adaptive User Interfaces The main evaluation criteria for adaptive user interfaces that have been applied to the above concepts are
a. potential for the improvement of usability,
b. support of different modes of needs,
c. applicability to online shops, and
d. availability of user information.
The first criterion relates to the question of whether the application of one or several of the adaptive concepts will improve the usability of the interface. The second criterion considers the fact that different users may have totally different needs [4]. Three types of users have been considered: users who know exactly what type of product they would like to buy (pre-knowledge driven), users who have a clear idea of the functional requirements a product has to meet (function driven), and users who react impulsively to certain stimuli (impulse driven). Furthermore, it has to be clarified whether the proposed concepts are in general applicable to online shops. Finally, many adaptive user interfaces need some information on the user or the user's behaviour; it is therefore necessary to investigate whether this information is available in an online retail context. A formalized, more theoretical approach is proposed in order to evaluate the different concepts with an acceptable, limited effort; the recommendations shall be validated in the near future using results from prototype implementations and user tests. For each concept a rating with respect to the four above-mentioned criteria is applied using normalized values between 0 and 1, where 0 indicates that the criterion is not fulfilled at all and 1 that it is completely fulfilled. In order to improve the relevance of the results, the four main criteria have been divided into sub-criteria. For example, the criterion "applicability to online shops" consists of 9 sub-criteria representing separate, typical functional areas, namely:
• door page,
• product catalogue,
• search function,
• detailed representation of products,
• configuration of products,
• shopping cart,
• administration of customer accounts,
• payment functionality, and
• evaluation and comment functionality.
In a similar way sub-criteria for the above mentioned main criteria “potential for the improvement of usability”, “support of different modes of needs”, and “availability of user information” have been defined. Based on these sub-criteria and the degree of fulfilment a total value has been calculated for each of the potential concepts for adaptive user interfaces (see Table 1) [5]. The corresponding main criteria are listed in Table 2. Our evaluation results can be summarized as follows:
Table 1. Summary of evaluation results

Adaptation concept                            | a    | b | c    | d | total
recommenders                                  | 0.43 | 1 | 0.56 | 1 | 2.99
intelligent menus                             | 0.57 | 0 | 0.44 | 1 | 2.01
automatic localization                        | 0.71 | 1 | 1    | 1 | 3.71
adaptation of content to the users' knowledge | 0.43 | 1 | 0.22 | 1 | 2.65
assistant for the structuring of information  | 0.57 | 0 | 0    | 1 | 1.57
intelligent help and support functionality    | 0.57 | 1 | 1    | 1 | 3.57
user-oriented optimization of navigation      | 0.29 | 1 | 0.22 | 1 | 2.51

Table 2. Criteria

a | potential for the improvement of usability
b | support of different modes of needs
c | applicability to online shops
d | availability of user information
Depending on the priorities of the requirements and the relevance of the criteria, Table 1 can provide an indication of which concepts should be followed and implemented. According to Table 1, the two concepts automatic localization and intelligent help and support functionality seem to provide the highest benefit. In particular, with respect to the improvement of usability, automatic localization shows clear advantages; it is obvious that a user-friendly interface has to consider the individual needs of the user, which can be derived from an automatic localization function. Both automatic localization and intelligent help and support functionality fully support the different modes of needs and are applicable to online shops. Furthermore, from a technical point of view the required user information can be made available in most environments to support these concepts. On the other hand, in the context of a shop application the adaptation concept of an assistant for the structuring of information is obviously not very relevant: online shops typically use existing catalogues which are already well structured. However, this concept may gain importance, e.g. for complex and configurable products that require detailed information and description. In this case, a well-structured presentation of relevant product information could significantly improve the usability of an application.
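The totals in Table 1 can be read as sums of the four criterion scores a-d, each of which is an average over its normalized sub-criteria. The sketch below reproduces that arithmetic for automatic localization; the number of sub-criteria assumed for criterion a is inferred from the reported values (0.71 ≈ 5/7) and is therefore an assumption, not something stated in the paper.

```python
def criterion_score(sub_ratings):
    """Average of normalized sub-criterion ratings (each in [0, 1])."""
    return sum(sub_ratings) / len(sub_ratings)

def concept_total(ratings_per_criterion):
    """Sum the four criterion scores a-d into the totals shown in Table 1."""
    return sum(criterion_score(r) for r in ratings_per_criterion)

# Invented sub-criterion ratings for "automatic localization":
# a: 7 assumed usability sub-criteria, b: 3 modes of needs,
# c: 9 shop functional areas, d: availability of user information.
a = [1, 1, 1, 1, 1, 0, 0]   # ~0.71
b = [1, 1, 1]               # 1
c = [1] * 9                 # 1
d = [1]                     # 1
print(round(concept_total([a, b, c, d]), 2))   # ~3.71, matching Table 1
```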
Clearly, for the time being, the evaluation is not based on empirical tests but on theoretical considerations and information derived from literature. Nevertheless, we believe that the results are well suited to design online shops with adaptive user interfaces.
5 Guidelines for the Development and Implementation
A set of guidelines has been derived from the results of the evaluation, which should help to introduce adaptive user interfaces in online shops.
5.1 Conformity
Customers who visit a shop expect a high degree of conformity while navigating through the shop. Adaptive concepts should never create confusion for the user. If necessary, two alternative modes may be offered to the customer, one using persistent components and the other using adaptive concepts.
5.2 Unobtrusiveness
Adaptive concepts should never disturb the sales process in an online shop. Adaptive user interfaces that are not understood by the customer may distract the customer. Ideally, the user should not notice that the user interface has adapted to his or her needs.
5.3 Confidentiality
Adaptive user interfaces that make use of individual user data have to ensure that the confidentiality of this data is absolutely guaranteed. If this is not the case, adaptive concepts will be counterproductive, since users will avoid such shops due to a lack of trust in the trading partner.
5.4 Controllability
Even with adaptive user interfaces, the user shall keep control of the interface. It should be possible to prevent adaptation and to fall back to the previous state. It shall be avoided that users lose access to information or details due to automatic adaptation of the interface.
5.5 Transparency
It shall always be clear to the user why and how an adaptation has been performed. This allows the user to understand the logic behind the user interface. In the case of multiple and complex adaptations, additional information may be provided to improve understanding of the adaptation.
5.6 Flexibility and Scalability
The concept and architecture of the adaptive interface should allow flexible and scalable modification and extension of an adaptive user interface.
It may be necessary to optimize the functionality of the interface in a continuous process; for example, different languages may be implemented at a later point in time in order to adapt to new markets and users.
6 Conclusions The conclusions that can be derived from this evaluation of adaptive user interfaces can be summarized as follows. • Most of the listed concepts are suitable for the improvement of usability. However, adaptive menus and navigation structures are critical, because users might miss consistency and controllability. • All concepts except intelligent menus seem to offer benefits for the support of different modes of needs. • In principle, all concepts except assistants for structuring information are applicable to online shops. However, the effectiveness very much depends on the concrete application. Availability of user information should be given for all the presented concepts. Relevant user information can be collected in a typical web application. Additional information on the location of users can be derived from the user’s system or connection, respectively. We are currently undertaking practical user tests in order to receive confirmation for the above evaluation results, guidelines and conclusions.
References
1. Ardissono, L., Goy, A.: Tailoring the Interaction with Users in Electronic Shops. In: Proceedings of the Seventh International Conference on User Modeling, pp. 35–44. Springer, Heidelberg (1999)
2. Galitz, W.O.: The Essential Guide to User Interface Design: An Introduction to GUI Design, Principles and Techniques, 3rd edn. Wiley Publishing Inc., Chichester (2007)
3. Jameson, A.: Adaptive Interfaces and Agents. In: Jacko, J.A., Sears, A. (eds.) Human-Computer Interaction Handbook, 2nd edn. Erlbaum, Mahwah (2006)
4. Ambaye, M.: A Consumer Decision Process Model for the Internet. Dissertation, Brunel University, UK (2005)
5. George, J.: Adaptive User Interfaces in Electronic Commerce Applications (Adaptive Benutzungsschnittstellen im elektronischen Handel), in German. Final Thesis, Fulda University of Applied Sciences (2008)
Implementing Affect Parameters in Personalized Web-Based Design Zacharias Lekkas1, Nikos Tsianos1, Panagiotis Germanakos2,3, Constantinos Mourlas1, and George Samaras3 1
Faculty of Communication and Media Studies, National & Kapodistrian University of Athens, 5 Stadiou Str, GR 105-62, Athens, Hellas
[email protected], {ntsianos, mourlas}@media.uoa.gr 2 Department of Management and MIS, University of Nicosia, 46 Makedonitissas Ave., P.O. Box 24005, 1700 Nicosia, Cyprus 3 Computer Science Department, University of Cyprus, CY-1678 Nicosia, Cyprus {pgerman, cssamara}@cs.ucy.ac.cy
Abstract. Researchers used to believe that emotional processes are beyond the scope of a scientific study. Recent advances in cognitive science and artificial intelligence, however, suggest that there is nothing mystical about emotional processes. Affective neuroscience and psychology have reported that human affect and emotional experience play a significant, and useful, role in human learning and decision making. Emotions are considered to play a central role in guiding and regulating learning, performance, behaviour and decision making, by modulating numerous cognitive and physiological activities. Our purpose is to improve learning performance and, most importantly, to personalize web-content to users’ needs and preferences, eradicating known difficulties that occur in traditional approaches. Affect parameters are implemented, by constructing a theory that addresses emotion and is feasible in Web-learning environments. Keywords: affect, emotions, mood, disposition, regulation, personalization, decision-making, learning.
1 Introduction
Web-based information systems are increasingly being used for learning and decision support applications. Computers are becoming better and more sophisticated every day, and they can already perceive information related to user needs and preferences. One possible implementation of a Web-based system's interface that can appraise human needs is the use of a set of parameters which influence the environment according to the emotional condition of the user [1]. An emotionally tense or unstable individual will be able to receive the contents of a webpage based on what he or she considers appropriate for his or her working or learning profile. A certain emotional condition demands a personalization of equivalent proportions. The user will have the capability to respond emotionally, either after being asked or after a decision by the system, and to inform the content presentation module about his or her preferences and inclinations.
In order for a personalization system to work, it is necessary to have a solid, grounded theory and a set of personalization rules that truly respond to user needs and change the environment to their benefit. Affective processing is a mechanism that is not fully researched, and the implications of the various studies in the field are often contradictory [2]. It is therefore of great importance to formulate a theory, and especially a model of affect, and then to implement a platform that takes into consideration the traditional profile as well as the cognitive and affective data of the user, together with a suitable system architecture and its personalization rules.
2 Proposed Model of Affect
Affect is a term that covers a range of feelings that individuals experience, including discrete emotions, moods, and traits such as positive and negative affectivity. There is an ongoing debate on whether emotions play a vital role in people's performance, judgement, and decision-making processes [3], [4]. There are, of course, also the notions of mood and disposition. The borders between the three dimensions are blurred, and on many occasions we cannot be certain about the nature of the affective process: emotions can sometimes turn into a mood, and moods over time can be indicative of a person's dispositional affect. An in-depth model that grasps the complexity of these underlying concepts is the first purpose of our research. Instead of selecting one area of implementation, we combine these three levels of analysis and form a typology that helps us capture the affective mechanisms of the brain. In order to apply a purely psychological construct to a digital platform based on personalization rules, we adjust the various theories concerning emotions with the aim of making our model flexible and applicable to users' profiles, needs, and preferences. Our model has three base elements:
a) Emotional arousal is the capacity of a human being to sense and experience specific emotional situations. Constructing a model that predicts the role of specific emotions is beyond the scope of our research, due to the complexity and the numerous confounding variables that would make such an attempt practically impossible. We focus on arousal as an emotional mechanism, rather than on a set of basic emotions, because it provides an indirect measurement of general emotional mechanisms: it subsumes a number of emotional factors such as anxiety, boredom effects, anger, tension, and sadness.
b) Mood is an affective state that lasts longer than an emotion and is not as specific as an emotion can be. Moods generally have either a positive or a negative valence.
c) Dispositional affect is a stable trait and tendency towards positivity or negativity. Individuals with positive affectivity tend to be cheerful and energetic and to experience positive moods across a variety of situations, compared to people who tend to have low energy and be melancholic. Individuals with negative affectivity have a negative view of self and tend to be distressed and upset, in contrast to people who are calm and relaxed.
These basic elements, which constitute the affective state of an individual, play an important role in how emotionally charged information is received. Our model would be problematic without a regulatory mechanism of affect.
For this reason we constructed the measure of emotion regulation, which comprises concepts such as emotional intelligence, self-efficacy, emotional experience, and emotional expression. Emotion regulation is the way in which an individual perceives and controls his or her emotions: individuals attempt to influence which emotions they have, when they have them, and how they experience and express them.
3 Incorporating Affective Factors in the Personalization Process
In order to manipulate the parameters of our adaptive system [5] according to user characteristics, our research has to go through the stage of extracting quantified elements that represent deeper psychological and emotional abilities. The latter cannot be used directly in a web environment, but a numerical equivalent can define a personalization parameter. Our main objective is to quantify the terms of emotional arousal and emotion regulation in our first experiment and dispositional affect in the second, and to see their effect on performance and the value of personalizing along these categories. In our study, we are interested in the way that individuals process their emotions and how these interact with other elements of their information-processing system. We conducted two consecutive experiments to test part of our theory of affect. In the first phase of our research we examined the immediate, synchronous affective user reactions and behaviour, which are covered in our model by the terms of emotional arousal and emotion regulation [6]. We hypothesized that, by combining the level of arousal of an individual with the moderating role of emotion regulation, it is possible to clarify, to some extent, how the individual's affective responses hamper or promote learning procedures. Thus, by personalizing the educational content that our already developed adaptive system provides [7] on this concept of affect, we can avoid stressful instances and take full advantage of the user's cognitive capacity at any time. At a practical level, our personalization rules were based on the assumption that users with high arousal levels who lack the moderating role of emotion regulation are in greater need of enhanced aesthetic aspects of our system, while users with low arousal levels focus more on usability issues. A further hypothesis is that emotion regulation and arousal are negatively correlated: an individual with high emotion regulation would usually have low arousal levels because of his or her ability to control and organize emotions. In the second phase we are interested in clarifying the role of dispositional trait affect, which is a global and general mood (positive or negative), and its relationship with the construct of emotion regulation. After the construction and standardization of our instruments, we are currently trying to determine the weighting, the importance, and the implications of dispositional affect. Our hypothesis is that a user with negative affect and low regulation potential will be keener to accept and make greater use of the personalization tools that we offer. The personalization is based on the aesthetic enhancement of the interface and on the better provision of content: the former tool aids the user's informational needs and the latter his or her processing needs. These two tools are embedded in the interface, and our goal is to measure how users with specific profiles perform with or without personalization. Participants were allocated to four categories of affect that arose from a combination of dispositional affect (positive or negative) and emotion regulation (high or low), and we assumed that users with positive affect and high regulation would perform better than users with negative affect and low regulation.
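A minimal sketch of the kind of rule table implied by these hypotheses is given below: high arousal without regulation triggers aesthetic and navigation support, and negative affect with low regulation additionally triggers content re-allocation. The rule names and categories are our own reading of the description above, not the system's actual rule base.

```python
def personalization_rules(arousal, regulation, affect):
    """arousal: 'low'|'medium'|'high'; regulation: 'low'|'high';
    affect: 'positive'|'negative'. Returns the adaptations to apply."""
    adaptations = set()
    if arousal in ("medium", "high") and regulation == "low":
        adaptations.update({"aesthetic enhancement", "navigation support"})
    if affect == "negative" and regulation == "low":
        adaptations.update({"aesthetic enhancement", "content re-allocation"})
    if not adaptations:
        adaptations.add("plain usability-oriented interface")
    return adaptations

print(personalization_rules("high", "low", "negative"))
```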
Emotional arousal and emotion regulation are easily generalized, inclusive concepts that provide an indirect measurement of general emotional mechanisms. These sub-processes cover a number of emotional factors such as anxiety, boredom effects, anger, feelings of self-efficacy, user satisfaction, etc. Among these, our current research on emotional arousal emphasizes anxiety, which is probably the most indicative, while other emotional factors are to be examined in a further study. Anxiety is an unpleasant combination of emotions that includes fear, worry, and uneasiness, and it is often accompanied by physical reactions such as high blood pressure, increased heart rate, and other body signals [8], [9]. Accordingly, in order to measure emotion regulation we use a construct that includes the concepts of emotional intelligence, self-efficacy, emotional experience, and emotional expression [10]. There is, moreover, a considerable body of literature on the role of emotion and its implications for academic performance (or achievement) in terms of efficient learning [11]. Emotional intelligence seems to be an adequate predictor of the aforementioned concepts and is a sufficiently grounded construct, already supported by the academic literature [12], [13]. Dispositional affect is a general term used more or less interchangeably with various others, such as emotion, emotionality, feeling, and mood. It can be used as a label for the pleasantness-unpleasantness dimension of feeling, and it can usually be differentiated from mood, which properly refers to more pervasive and sustained emotional states [14]. In our research we treated dispositional affect as a separate construct and investigated its relationship with emotion regulation as well as its effect on performance. For this reason we developed a ten-item questionnaire that follows the typology of positive and negative affect and allocates users to one of the two categories. Combined with emotion regulation, this gives four categories in total, which we used in our second experiment to investigate user performance and personalization efficiency.
4 Experimental Evaluation
4.1 Sampling and Procedure
All participants were students of the University of Athens. The first part of the study concerning affect was conducted with a sample of 92 students; 35% of the participants were male and 65% female, and their age ranged from 17 to 22 with a mean of 19. In the second experiment 124 students participated, 40% of whom were male and 60% female, with the same age mean and variance. The environment in which the procedure took place was an e-learning course on algorithms; the factor of prior experience was controlled for. In order to evaluate the effect of the matched and mismatched conditions, participants took an online assessment test on the subject they were taught (algorithms). This exam was taken as soon as the e-learning procedure ended, in order to control for long-term decay effects. The dependent variable used to assess the effect of adaptation to users' preferences was the participants' score on the online exam.
The sample was divided into two groups: almost half of the participants were provided with information matched to their preferences, while the other half were taught in a mismatched way. We expected that users in the matched condition would outperform those in the mismatched condition. In the first experiment, users in the matched condition with moderate or high levels of anxiety received aesthetic enhancement of the content and navigational help, whereas in the mismatched condition users with moderate or high anxiety received no additional help or aesthetics. In the second experiment, again half of the participants were provided with information matched to their affective preferences (aesthetic and processing facilitation), while the other half were taught in a mismatched way. Apart from investigating the role of personalization in general, we measured performance in four categories of affect derived from a combination of dispositional affect (positive or negative) and emotion regulation (high or low). Our hypothesis was that, as with emotional arousal, dispositional affect would be negatively correlated with emotion regulation, and that the personalization tools would help users raise their performance, especially those with negative affect and low regulation skills. In this second phase, in addition to the aesthetic enhancement tool, participants in the negative affect category received the additional help of personalized content, because according to theory individuals with negative affect process information in a different (usually worse) manner and therefore have extra processing needs.
4.2 First Experiment Results
The results of the experiments conducted within the actual learning environment (Tables 1 and 2) show, as we hypothesized, that users with high or medium anxiety levels (core and specific), lacking the moderating role of emotion regulation, are in greater need of enhanced aesthetics and additional help in order to perform as well as low-anxiety users. Users with low anxiety levels focus more on usability aspects.
Table 1. Analysis of variance between emotion regulation groups and core anxiety means
Table 2. Analysis of variance between emotion regulation groups and specific anxiety means
Table 3. Multifactorial ANOVA (Factors - Core Anxiety, Application Specific Anxiety and Aesthetics)
All types of anxiety are positively correlated with each other and negatively correlated with emotion regulation. These findings support our hypothesis, and it can be argued that our theory concerning the relationship between anxiety and regulation is coherent. Tables 1 and 2 show an even stronger relationship between emotion regulation and core and specific anxiety, respectively: a statistically significant analysis of variance for each anxiety type shows that, if we categorize the participants according to their emotion regulation ability, the anxiety means vary significantly, with the high-regulation group scoring much higher than the low-regulation group. Finally, Table 3 shows that the two conditions (matched vs. mismatched aesthetics) differentiate the sample significantly with respect to performance. Participants in the matched category scored higher than those in the mismatched one, and less anxious participants (core, specific, or both) scored higher than highly anxious ones, always in relation to the match/mismatch factor. We also found that participants with low application-specific anxiety perform better than participants with high specific anxiety in both matched and mismatched environments. Moreover, when a certain amount of anxiety exists, the match-mismatch factor is extremely important for user performance: participants with matched environments scored highly, while participants with mismatched environments performed poorly. Emotion regulation is negatively correlated with current anxiety: high emotion regulation means low current anxiety and low emotion regulation means high current anxiety. Finally, current anxiety is indicative of performance, with high current anxiety associated with below-average test scores and low current anxiety with high scores. Graph 1 shows the scores that participants achieved in each experimental condition.
4.3 Second Experiment Results
The results of the second experiment show, again as hypothesized, that users with negative affect who lack the moderating role of emotion regulation take advantage of the aesthetic aspects of our system and of the additional (processing) help, and thereby perform similarly to users with positive mood and regulation skills (see Graph 2).
Graph 1. Mean scores (performance) in each experimental condition
Additionally, as can be seen in Table 4, the two notions of dispositional affect and emotion regulation were found, as hypothesized, to differ significantly: a user with high regulation ability has a tendency towards positive mood, and a user with low regulation ability tends towards negative mood. A further significant finding is that the affective state of the user has an effect on his or her score (Table 5 and Graph 2): participants with positive affect perform better than participants with negative affect in both matched and mismatched environments.

Table 4. Analysis of variance (ANOVA) between dispositional affect (positive or negative) and emotion regulation (reg_means)

               | Sum of Squares | df  | Mean Square | F      | Sig.
Between Groups | 2.245          | 1   | 2.245       | 13.171 | .000
Within Groups  | 20.796         | 122 | .170        |        |
Total          | 23.041         | 123 |             |        |

Table 5. Analysis of variance (ANOVA) between affective state (based on dispositional affect and regulation) and scores (Score %)

               | Sum of Squares | df  | Mean Square | F     | Sig.
Between Groups | 2203.378       | 3   | 734.459     | 2.699 | .049
Within Groups  | 32649.589      | 120 | 272.080     |       |
Total          | 34852.968      | 123 |             |       |
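The comparison in Table 5 is a one-way ANOVA of exam scores across the four affect/regulation groups. A minimal reproduction of that kind of test with SciPy is sketched below; the score lists are placeholders, not the experimental data.

```python
from scipy import stats

# Placeholder score lists for the four affect categories
# (positive/negative affect x high/low regulation); not the real data.
pos_high = [78, 82, 75, 88, 80]
pos_low  = [70, 74, 69, 77, 72]
neg_high = [68, 73, 71, 66, 70]
neg_low  = [55, 62, 58, 64, 60]

f_value, p_value = stats.f_oneway(pos_high, pos_low, neg_high, neg_low)
print(f"F = {f_value:.3f}, p = {p_value:.3f}")  # Table 5 reports F = 2.699, p = .049
```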
Graph 2. Overall scores categorized by affective type and by environment
Additionally, the match-mismatch factor is extremely important for user performance: participants with matched environments scored highly, while participants with mismatched environments performed poorly. Overall, affect is strongly related to performance. Of equal importance is the notion of regulation, which acts as a moderating factor for negative affect and as a reinforcement of positive affect. The personalization techniques proved beneficial for all users, and especially for those with negative affect. This group of users requires specific help with the interface as well as with the structure and appearance of information. In our design, their informational and processing needs were met by the personalization tools of aesthetic enhancement, navigation support, and content re-allocation.
5 Conclusions
By combining the affective state of the individual with his or her regulatory mechanism, we can reach a conclusion about how affect influences learning performance. We cannot assume in advance that strong emotional reactions have a negative effect on the individual, since, through regulation, emotionality can manifest itself as motivation and/or extra effort. Another key point in our rationale is that an affective instance cannot be described as a single discrete emotion; it is a more complex state in which various emotions can coexist. Affective information can be analysed as many consecutive emotional bursts that can easily be theoretically contradictory: various emotions and affective reactions of different (or the same) valence can exist at the same time or alternate at a speed that is difficult to grasp.
Due to the complexity of the individual's affective state, it is wise to form a typology and to speak of affective types and categories rather than to look for specific emotions. One possibly wrong assumption in emotion research is that discrete emotions occur in isolation; in fact, we believe that emotional reactions frequently involve more than one discrete emotion. Emotion regulation is of great importance because it can turn the outcome of the individual's behaviour from negative to positive. We can argue that affect is strongly related to performance. Of equal importance is the notion of regulation, which acts as a moderating factor for negative affect and as a reinforcement of positive affect. The personalization techniques proved beneficial for all users, and especially for those with negative affect. This group of users requires specific help with the interface as well as with the structure and appearance of information [15]. In our design, their informational and processing needs were met by the personalization tools of aesthetic enhancement, navigation support, and content reallocation. The examination of affective reactions can enrich our understanding of both personality and emotions. The issue of individual differences in affective reactions can thus be addressed through the simultaneous study of users' affective characteristics and learning behaviour.
References
1. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
2. Lewis, M., Haviland-Jones, J.M.: Handbook of emotions, 2nd edn. The Guilford Press, New York (2004)
3. Bechara, A., Damasio, H., Damasio, A.R.: Emotion, decision-making, and the orbitofrontal cortex. Cerebral Cortex 10, 295–307 (2000)
4. Levenson, R.W.: The intrapersonal functions of emotion. Cognition and Emotion 13, 481–504 (1999)
5. Germanakos, P., Tsianos, N., Lekkas, Z., Mourlas, C., Samaras, G.: Capturing Essential Intrinsic User Behaviour Values for the Design of Comprehensive Web-based Personalized Environments. Computers in Human Behavior Journal, Special Issue on Integration of Human Factors in Networked Computing (2007), doi:10.1016/j.chb.2007.07.010
6. Lekkas, Z., Tsianos, N., Germanakos, P., Mourlas, C.: Integrating Cognitive and Emotional Parameters into Designing Adaptive Hypermedia Environments. In: Proceedings of the Second European Cognitive Science Conference (EuroCogSci 2007), pp. 705–709 (2007)
7. Tsianos, N., Lekkas, Z., Germanakos, P., Mourlas, C., Samaras, G.: User-centered Profiling on the basis of Cognitive and Emotional Characteristics: An Empirical Study. In: Nejdl, W., Kay, J., Pu, P., Herder, E. (eds.) AH 2008. LNCS, vol. 5149, pp. 214–223. Springer, Heidelberg (2008)
8. Kim, J., Gorman, J.: The psychobiology of anxiety. Clinical Neuroscience Research 4, 335–347 (2005)
9. Barlow, D.H.: Anxiety and its disorders: The nature and treatment of anxiety and panic, 2nd edn. The Guilford Press, New York (2002)
10. Schunk, D.H.: Self-efficacy and cognitive skill learning. In: Ames, C., Ames, R. (eds.) Research on motivation in education. Goals and cognitions, vol. 3, pp. 13–44. Academic Press, San Diego (1989)
11. Kort, B., Reilly, R.: Analytical Models of Emotions, Learning and Relationships: Towards an Affect-Sensitive Cognitive Machine. In: Conference on Virtual Worlds and Simulation (VWSim 2002) (2002), http://affect.media.mit.edu/projectpages/lc/vworlds.pdf
12. Goleman, D.: Emotional Intelligence: why it can matter more than IQ. Bantam Books, New York (1995)
13. Salovey, P., Mayer, J.D.: Emotional intelligence. Imagination, Cognition and Personality 9, 185–211 (1990)
14. Barsade, S., Brief, A., Spataro, S.: The affective revolution in organizational behavior: The emergence of a paradigm. In: Greenberg, J. (ed.) Organizational Behavior: The State of the Science, p. 352. Lawrence Erlbaum Associates, London (2003)
15. Cassady, C.C.: The influence of cognitive test anxiety across the learning–testing cycle. Learning and Instruction 14, 569–592 (2004)
Modeling of User Interest Based on Its Interaction with a Collaborative Knowledge Management System Jaime Moreno-Llorena, Xavier Alamán Roldán, and Ruth Cobos Perez Dpto. de Ingeniería Informática, EPS Universidad Autónoma de Madrid 28049 Madrid, Spain {Jaime.Moreno,Xavier.Alaman,Ruth.Cobos}@uam.es
Abstract. SKC is a prototype system for unsupervised knowledge management on the Web by means of semantic information; it tries to select the knowledge contained in the system by paying attention to how it is used. This paper explains the analysis of user activity in order to find out users' interest in the knowledge elements of the system, and the application of this interest to classifying users and to identifying knowledge of interest to them, both inside and outside SKC. As a result, a model of user interest based on interaction is obtained. Keywords: user interest model, user interaction, user profiling, data mining, knowledge management, CSCW.
1 Introduction Information overload is one of the problems that comes with the spread of ICT use. The Web is the most general and significant example of this phenomenon. We believe that networked knowledge management systems share the most important characteristics of the systems affected by this problem, while being more scalable and offering more controllable parameters, so they can be used as an experimental model for research. The Web itself, for example, can be seen as a global knowledge management system. Our hypothesis is that systems affected by information overload contain several hidden aspects that can contribute positively to solving the problem: on the one hand, the excess energy of the active elements involved in these systems, such as users, services, applications and other related entities; on the other hand, the properties of the elements and activities related to the affected systems, e.g. the network, the active entities mentioned above, the information and knowledge involved, or the processes and interactions of those elements and activities. To investigate this hypothesis we have used a knowledge management system called KnowCat [1][4] as an experimental platform. KnowCat is a groupware system that facilitates the management of a knowledge repository through the interaction of a user community over the Web. This can be done without supervision by using information about user activities and users' opinions about the documents that form the knowledge base, e.g. votes or notes. The knowledge repository is constituted
fundamentally by two components: the Documents, which are the basic knowledge units, and the Topics, which are structured hierarchically as a Knowledge Tree. Each document describes the topic under which it is located in the tree. Each topic appears once in the tree and may include several documents that describe it, as well as some subtopics. Each system instance is a KnowCat node that deals with a specific subject and has its own user community and its own knowledge repository organized in a knowledge tree. The task carried out by KnowCat could be improved by reducing the need for explicit displays of users' opinions about the knowledge and by exploiting instead the implicit displays of opinion contained in users' activity and in the features of all the elements involved in the process, e.g. the users' interaction with the system. Such improvements could be generalized to other knowledge management systems, such as the Web. This paper deals with how to represent the implicit interest that users' activity reveals towards the knowledge elements of the KnowCat repository so that it can be used in the system. In order to corroborate the design hypothesis of the proposed approach, a prototype called Semantic KnowCat (SKC) [10] has been developed on top of KnowCat. SKC includes, among other things, a Client Monitor (CM) [11], which is in charge of obtaining information about the user's activity on the client side of the system, to be used for analyzing the user's interest in the knowledge, and an Analysis Module (AM) [12], which is in charge of analyzing the system's knowledge repository with data mining techniques in order to describe its elements by means of Words Weight Vectors (WWV) [2]. This paper starts from the mechanism for monitoring users' activity [11] built into the CM of the SKC prototype. That mechanism makes it possible to establish each user's interest in the elements of the knowledge repository: it registers the intensity of the users' interaction with the knowledge elements by using the user interface events of the system. This paper shows how the data obtained with that mechanism allow each user's interest to be represented inside the knowledge management system by a User's Interest Vector (UIV), which captures the distribution of the user's interest among the knowledge elements of the system. The paper also shows how the monitored data can describe each user's interest inside and outside of the system by means of a User's Interest Words Weight Vector (UIWWV). This new vector is obtained from the corresponding user's UIV and from the Words Weight Vectors (WWV) that describe the knowledge elements of the system. These UIWWVs make it possible to compare users' interests with any knowledge element represented by a generic WWV [2]. The proposed approach has been validated with a series of experiments carried out with KnowCat in educational activities at the Universidad Autónoma de Madrid (Spain). Several approaches have been proposed for modeling users' interest for personalization in the information retrieval field. These range from manual personalization [9][6], by means of a direct indication of preferences, to fully automatic modeling [7][5], through the monitoring and analysis of user behaviour, with modeling by demonstration [8], in which users point out the resources in the system that they consider interesting, in between.
Many of the above-mentioned approaches use vector models for representing and comparing text documents, and some of them also use WWVs to represent users' interest [13]. Others use conceptual clustering algorithms and rely on user activity with the processed documents to establish user interest in those documents [13][5]. Still others model how user interest evolves over time [8].
The proposed approach combines some of the strategies and techniques mentioned above to obtain an interest model that is based on user interaction with the system on the client side but can nevertheless be used on the server side. The intention of the model is to facilitate the comparison of users' interests and the selection of knowledge elements that are interesting to them, both inside and outside the system. The final objective of this proposal is to improve collaborative knowledge management through the system, taking advantage of users' normal activity without disrupting it.
2 User Interest Vector In a previous paper [11] we showed how an impression of users' interest in the items that constitute the knowledge handled in one instance (node) of the SKC prototype can be obtained by measuring the intensity of the users' interaction with those items. The corresponding process is based on the analysis of the activity register (LOG) of the Web server that supports the system. In this file, one line is recorded in chronological order for each resource requested from the server (HTML pages, images, etc.). The registered data can be configured in the server, but it usually includes the resource URL, the moment of the request and the requester's IP address, among other fields. Normally, Web servers, such as the one that supports the KnowCat system, only write to the LOG file when they handle a resource request, and therefore have no information about what happens to the resources after they have been served. The Client Monitor (CM) of the SKC prototype corrects this shortcoming by collecting data on the user's activity on the client and sending it to the server at a frequency appropriate to the observed activity. Since the data are sent by invoking a program on the Web server itself, the corresponding calls are registered in the LOG together with these data. A CM activity LOG line is a register line like the others, but it refers to a characteristic resource, "infoSituation.pl", and includes encoded data such as the identifier of the user who generated it and the values observed in each period for the activity indicators taken into account, such as mouse movement, scrolling or keystrokes. By analysing the LOG file lines, taking into account the CM lines included in this file and the design of the system's user interface, i.e. by combining Web usage mining and Web structure mining [3], we can get an idea of what happened to the resources served over time and of the users' interest in the knowledge base items to which those resources refer. The basic analysis process is described in detail in the above-mentioned paper [11]. In general terms, it consists of establishing regular monitoring cycles by days and user sessions, and of rating the interaction observed for the knowledge items involved in each period. Starting from this basic analysis, we can establish how intensively the user community as a whole interacts with each knowledge item, which gives an idea of the group's interest in the items and, to some extent, of the value that the group assigns to them [11]. In addition, since users are requested to identify themselves in order to use the system, the CM register lines include references to the users that caused them, as noted above for the encoded CM data. With this information it is possible to follow up each user's activity over time, even across several sessions. This is what has been done to establish user interest indications (UII) and user interest vectors (UIV).
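As a concrete illustration of the monitoring step just described, the following sketch accumulates an interaction intensity value per user and knowledge item from the CM entries of the LOG. It is not the SKC implementation: the Common Log Format layout, the query-parameter names (user, item, mouse, scroll, keys) and the indicator weights are assumptions made for this example.

```python
# Minimal sketch (not the SKC implementation) of accumulating interaction
# intensity indications (III) per user and knowledge item from the Client
# Monitor entries of a Web-server LOG. The Common Log Format layout, the
# query-parameter names "user", "item", "mouse", "scroll", "keys" and the
# indicator weights are assumptions made for this illustration.
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

INDICATOR_WEIGHTS = {"mouse": 1.0, "scroll": 1.0, "keys": 2.0}  # assumed weighting

def accumulate_iii(log_lines):
    """Return a {(user, item): accumulated interaction intensity} mapping."""
    iii = defaultdict(float)
    for line in log_lines:
        if "infoSituation.pl" not in line:
            continue                                  # keep only Client Monitor reports
        parts = line.split()
        if len(parts) < 7:
            continue                                  # skip malformed lines
        request_path = parts[6]                       # request field in Common Log Format
        params = parse_qs(urlparse(request_path).query)
        user = params.get("user", ["unknown"])[0]
        item = params.get("item", ["unknown"])[0]
        # weighted sum of the activity indicators reported for this period
        iii[(user, item)] += sum(weight * float(params.get(name, ["0"])[0])
                                 for name, weight in INDICATOR_WEIGHTS.items())
    return dict(iii)
```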
In particular, taking the results of the basic analysis as our starting point, the values assigned to each of the registered knowledge items in the monitoring cycles are accumulated per user. As a result, for each user we obtain a list of all the knowledge items that the user accessed, each labeled with an interaction intensity indication (III) for that user. Two interesting measures can be derived from this list. On the one hand, by adding up a user's IIIs we can assign the user a value representing the overall interaction intensity maintained with the system; this value, IIU, indicates the interest shown by the individual in his or her activity with the node and can be used to compare it with the level of interest shown by other members of the community. On the other hand, by dividing each of a user's IIIs by their sum, we obtain a vector representing the relative contribution of each item to the interest shown by that user. These are the user interest vectors, UIVs, with which it is possible to compare the interests shown by the members of the community who use the system.
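A minimal sketch of this normalization step, reusing the (user, item) intensities from the previous sketch, could look as follows (function and variable names are, again, assumptions of the example):

```python
# Sketch of the step described above: a per-user interaction total (IIU) and a
# normalized user interest vector (UIV) whose entries sum to one. `iii` is the
# {(user, item): intensity} mapping produced by the previous sketch.
from collections import defaultdict

def build_iiu_and_uiv(iii):
    per_user = defaultdict(dict)
    for (user, item), intensity in iii.items():
        per_user[user][item] = intensity
    iiu = {user: sum(items.values()) for user, items in per_user.items()}
    uiv = {user: {item: value / iiu[user] for item, value in items.items()}
           for user, items in per_user.items() if iiu[user] > 0}
    return iiu, uiv
```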
3 User Interest Words Weight Vector In a previous paper [12] we explained the process used in the SKC prototype to create descriptors for the knowledge items handled by the system, namely words weight vectors (WWV) [2] based on the texts related to these items, and we showed how these descriptors can be used to establish relationships among the corresponding knowledge items. The following summarizes this in general terms. A WWV is a list of terms, each of which has been given a weight. Terms are obtained from the significant words of the source text, grouping together the different forms in which a word may appear (singular and plural forms, verb tenses, etc.). The weight is established by taking into account both the frequency of a term in the original text and its frequency in the general use of the language, so that words that are frequent in the text but uncommon in the language receive greater weights than words that are common in everyday language, especially when the latter are not very numerous in the text. In order to compare items with each other, a similarity between their WWVs is computed, in this case the cosine of the angle formed by the vectors. This similarity lies between zero and one: the closer it is to one, the more similar the vectors are, and the closer to zero, the less similar they are. These are typical information retrieval techniques [2] and are currently very popular because of their use in well-known search engines such as Google. Until now, in the SKC prototype, the WWVs related to the knowledge items were based on texts that were considered to be completely linked to the respective items: text documents were their own descriptions, topics were assigned a descriptive text composed by combining the descriptions of all the documents and subtopics they included, users' descriptions as authors were obtained by linking the documents they created, etc. This strategy is unlikely to be appropriate for dealing with user interest, since each individual's interest is distributed unequally among several knowledge items. The user interest vector, UIV, of each individual shows how his or her interest is distributed among the various knowledge items, indicating for each of them a level of interest expressed as the fraction of the total that it represents. Starting from each UIV and from the WWVs of the knowledge items it refers to, it is possible to create one descriptor per user that is representative of the content the corresponding user has shown interest in within the system.
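A compact sketch of the two ingredients just summarized, a WWV descriptor and the cosine similarity, may make the notation easier to follow. The exact weighting formula is only described qualitatively above, so the tf-idf-like scheme below, the background-frequency table and the omission of stemming are assumptions of this illustration.

```python
# Sketch (not the SKC implementation) of a words weight vector and of the
# cosine similarity used to compare two such descriptors. The weighting shown
# here (term frequency in the text damped by the term's frequency in general
# language use) is only one possible reading of the qualitative description
# above; stemming/lemmatisation of word forms is omitted for brevity.
import math
import re
from collections import Counter

def wwv(text, background_freq, smoothing=1e-6):
    """background_freq maps a term to its relative frequency in general language use."""
    terms = re.findall(r"\w+", text.lower())
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: (count / total) * math.log(1.0 / (background_freq.get(term, 0.0) + smoothing))
            for term, count in counts.items()}

def cosine(v, w):
    """Cosine of the angle between two sparse vectors given as {term: weight} dicts."""
    dot = sum(v[t] * w[t] for t in set(v) & set(w))
    norm = math.sqrt(sum(x * x for x in v.values())) * math.sqrt(sum(x * x for x in w.values()))
    return dot / norm if norm else 0.0
```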
The new descriptor is the user's interest words weight vector, UIWWV. It is in fact a WWV in which the weight of each word is established from the weights of that word in the words weight descriptors of the knowledge items and from the proportion of those items in the user's interest. UIWWVs, like UIVs and WWVs, depend on the period taken into account for their calculation and evolve over time as the users' interest in the system's knowledge items changes or as the descriptions of the items involved vary, e.g. when documents are updated or new documents or subtopics are added to the topics. UIWWVs are just like any other WWVs and can be compared with each other, as well as being useful to establish relationships between users and any element that has a descriptor of this type associated with it. Specifically, within the scope of the SKC prototype, it makes sense to determine interest similarities among the users of one or more instances (nodes) of the system, and also to establish users' possible interest in the themes of a node or of any knowledge tree topic (list of topics). In addition, since it is easy to build WWVs for elements that are external to the system and accessible through the Internet, as long as they are textual or have some description of this type associated with them, UIWWVs can be used to establish links with such elements. In this sense, it could be interesting to discover resources available in other knowledge repositories, e.g. different SKC nodes or wikis, that fall within the users' area of interest in the knowledge node they belong to, or to identify those which lie outside this area.
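Because a UIWWV is simply an interest-weighted combination of item descriptors, a sketch of its construction is short. The inputs correspond to the earlier sketches (a UIV mapping items to interest fractions, and the item WWVs); the function name is an assumption.

```python
# Sketch of the UIWWV described above: the interest-weighted combination of the
# WWVs of the knowledge items a user interacted with. `user_uiv` maps each item
# to the user's interest fraction (summing to one); `item_wwvs` maps each item
# to its WWV ({term: weight}), both as produced by the previous sketches.
from collections import defaultdict

def uiwwv(user_uiv, item_wwvs):
    vector = defaultdict(float)
    for item, interest in user_uiv.items():
        for term, weight in item_wwvs.get(item, {}).items():
            vector[term] += interest * weight
    return dict(vector)

# Being an ordinary WWV, the result can be compared with cosine() against
# documents, topics, other users' UIWWVs, or descriptors of external sources.
```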
4 Experiments Carried Out In order to prove the viability of the proposals mentioned above, support has been incorporated into the Analysis Module (AM) of the SKC prototype for users’ interests. Experiments with KnowCat nodes aimed to test the new prototype SKC functionalities in teaching activities in the Universidad Autónoma of Madrid (Spain) have been carried out. To that end, two KnowCat nodes have been used, one on Computer Systems (CS) and another on Automata Theory and Formal Languages (ATFL). In both cases, activities with specific aspects for the AM tests have been designed with support for users’ interest, which were developed throughout 2006-2007. The CS node continues with the development of the subject matter initiated one year before. During the new academic year around 90 students have contributed about 160 documents distributed among 40 new topics, which are added to about that many existing ones and over 180 documents previously developed. The ATFL node gathers the documents made by the students based on others contributed by a teacher throughout an academic year in a knowledge tree on topics related to the corresponding subject. Around 90 students have worked on 6 topics and almost 450 documents in total. 4.1 Identification Experiments of User Interest The first experiment uses the CS node so that students can develop the subject outline based on the notes and references provided during the year. The idea is to develop papers that are useful for preparing the development of one of the topics during the subject test. The experiment has two stages: In the first stage each student is assigned a couple of topics so that they can prepare the corresponding papers and publish them in the system; in the second stage, each student is assigned three more topics so that they can check, make comments and mark the work carried out by their classmates in
the first stage. After this process, the system arranges the papers according to quality in the corresponding knowledge tree. As a result, students have at their disposal 40 new topics developed for the exam preparation and can get up to one more point in the subject's final mark, depending on their participation in the activity. In order to make follow-up possible, students must identify themselves at the beginning of each session on the system. At the end of the experiment the LOG file registered during the development has been processed. Based on it, user interest vectors (UIVs) have been established for two activity intervals: the first, from the beginning of a time interval until the end of the first stage, paper preparation stage on the topics assigned; and the second, from the beginning of the second stage until the end of this stage, revision interval of the papers included in the new topics assigned. To illustrate this experiment, the second interval UIVs are shown in Figure 1, in which the UIV of each user is represented by a column of colour blocks. Each block shows the users' relative level of interest for each topic, so that the lighter colours indicate a higher degree of interest than the darker colours. In the diagram, each column corresponds to one user and each row to a topic and the topics assigned for checking to students are marked with a cross. Some users have no topics assigned at all, because they are not students. In some cases there are more topic assignments than expected, these are errors in the assignment process that were corrected during the development of the experiment, but remained registered in the LOG. As we can see in Figure 1, the topics assigned to users in the second interval are found to be the most interesting to the vast majority of them, since the assignment marks are usually placed on some lighter colour block in each column. This phenomenon is also obvious in the first interval. Another phenomenon that turns the attention in the diagram is students’ little apparent interest for the topics they have not been assigned. This could be due to experiment conditions, activity periods observed and characteristics of the group involved. The experiment has been designed to cause an artificial interest in users for specific topics at certain moments and make the necessary activity easy to monitorize. Students’ motivation in the periods observed is to get an extra mark for participating properly in the activity, but not for using the material prepared for studying -students usually prepare their exams in the last minute-.
Fig. 1. UIVs of the second activity stage in the experiment with CS node
Fig. 2. UIVs of the third activity stage in the experiment with ATFL node
The second experiment carried out uses the ATFL node mentioned above. The activity consists of making five short summaries on some other topics of the subject by using the reference papers provided by the teacher through the system. There are two weeks for the preparation of each paper. At the beginning of each period one working topic per student must be chosen, so that the students do not repeat the same topic throughout the activity and there are no more than a maximum number of students per topic and interval. The summaries delivered in each period remain placed in the system with restricted access, only authorized to the teacher in charge of their assessment. According to the work carried out, each student can get up to one point more in their subject mark. As in the previous case, participants in the experiment must identify themselves at the beginning of each session to allow their activity follow-up. After the experiment the system LOG was processed to establish User’s Interest Vector (UIV) for the five document delivery periods. The UIVs corresponding to the third stage are shown in Figure 2. Each column corresponds to one user and each row to one topic. As in the previous case, the topics assigned to the students in the illustrated interval are marked with a cross and some users have no topics marked because they are not students. The UIVs representation is the same as in the previous experiment and the same chromatic code is used, in which the lighter shades of colour the higher degree of interest. In analogous way to what happened in the previous experiment, students appear to be more interested in topics which are clearly among the ones assigned to users in each period. On this occasion, the phenomenon is more obvious than in the previous case, given the characteristics of the experiment. Firstly, the knowledge tree has been prepared for the activity, presenting a number of topics adjusted to the latter. Secondly, for summary preparation the documents published in the corresponding topics of the knowledge tree itself must be revised. Thirdly, students are not allowed to access the documents delivered. In addition, the only incentive for participating is the extra mark for the fulfilment of the assigned tasks. Lastly, owing to all this, students are hardly interested in topics that have not been assigned to them in each interval. 4.2 Content Identification Experiments of Users’ Interest Inside and Outside SKC In the above experiments we have shown the use of User’s Interest Vector (UIV) to represent how members’ attention of a community is distributed among the knowledge node elements they belong to. UIVs have no sense outside the context where they have been defined, because they refer in an explicit way to specific elements of the latter and do not provide directly any information that can be used outside this context. Now we are going to test the benefits of User’s Interest Words Weight Vector (UIWWV) to make use of the information previously collected in the UIVs in a more general manner and with application inside and outside the node where they were produced.
Fig. 3. Grouping of users by interest through their UIWWVs (left) and classification of external knowledge sources by user interest through UIWWVs (right)
The first experiment carried out on this occasion consists of comparing with one another the UIWWVs generated from the UIVs obtained in the first of the previous experiments, which was carried out with the CS node and was introduced in section 4.1. That activity consisted of first assigning students two topics, for which they had to prepare papers, and afterwards another three topics, on which they had to evaluate the work provided by other users. As a result, during the activity students appeared to be more interested in the assigned topics than in the rest of the contents of the knowledge tree node used, which was indeed the intention. Topics were divided into eight packages and assigned to students in a systematic manner, so the topic group established for each student is known. The packages contained a varied selection of the forty CS topics taken into account for the activity. The corresponding UIWWVs have been calculated from the interest distribution represented by each user's UIV and from the WWVs of the papers involved, as explained in section 3. Users' UIWWVs have been compared with one another, using the typical information retrieval techniques [2] mentioned above, to check whether they reveal the users' common interests. The result is shown in a graph (see Figure 3, left) in which all the participating users are represented on both the x and y axes, arranged by the assigned topic groups. The similarity values are shown as small colour blocks, with lighter colours indicating greater coefficients. As we can see in the graph, the higher values appear grouped around the diagonal, forming blocks. Under the proposed conditions, this indicates that UIWWVs make it possible to identify users who are interested in the same topic groups, using generic descriptors that can be compared with any object that has an associated WWV. However, we can only clearly see five blocks on the diagonal instead of eight, as would have been expected from the number of topic groups established. This may be due to the number of users assigned to each package who effectively took part in the activity, and to the loss of significant characteristics in some of the UIWWVs of those packages as a consequence of specific combinations of diverse topics. The second experiment carried out consists of applying the UIWWVs generated in the previous experiment to identify knowledge sources outside the Computer Systems node used there. To this end, several virtual knowledge sources have been set up on different topics, based on documents accessible through the Web, such as papers on other KnowCat nodes or on Wikipedia and teachers' notes. Specifically, eight virtual repositories have been prepared on the following topics: (1) Philosophy, (2) Current History, (3) Biology, (4) Data and Information Structure, (5) Automata
and Formal Languages, (6) Software Engineering (7) Operating Systems and (8) Computer Systems. As we can see, the last five repositories deal with subjects related to the field of Computer Science, whereas the three first ones have nothing to do with this area. The last repository specifically refers to Computer Systems, which is the field of node Computer Science, which students have had to work on and for which they were compelled to show interest in an artificial manner. At no given time documents included in the reference node -where UIWWVs come from- have been used. Based on the documents in each repository a WWV representative of the latter has been obtained. These WWVs have been compared with users’ UIWWVs generating a similarity graph (see Figure 3 right). In the graph each row corresponds to one user and each column to a repository -identified by the number assigned to its topic in the list of topics mentioned in the above paragraph-. The same as in graphs of other previous experiments, the greater similarities have been represented by lighter shades and the minor ones by darker shades. As we can see, the WWVs of the topics that are less related to Computer Science have much minor similarities with UIWWVs than the ones of the so mentioned knowledge area, something which seems to be quite reasonable. Within the WWV’s of topics related to the reference node, the one in Operating Systems has greater similarity with some users’ interest than other topics, what makes sense if we look at the reference node contents. Lastly, it is obvious that the greatest similarity is concentrated in the virtual repository on Computer Systems, what was also to be expected. Under the conditions of the experiment, we may conclude that representation of user interest through UIWWVs allows to identify knowledge repositories that seem to be objectively linked to the interest shown by the users. As a matter of fact, in the conditions of these experiments the UIWWVs are WWVs representative to the themes of each user’s particular interest and are similar to WWVs of topics that form Knowcat node knowledge trees. Therefore, these vectors may be useful to compare themes that they represent to WWVs of diverse elements: documents, topics or users of the node itself as well as other KnowCat nodes, or even any external source represented by WWVs. However, UIWWVs seem to be very sensitive to user interest dispersal as it happens with UIVs and its use may decline if the context of its definition is not handled in some way, for example maintaining different UIWWVs for each user in different contexts.
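Both panels of Figure 3 are essentially matrices of pairwise cosine comparisons. As a brief illustration, and reusing the hypothetical uiwwv() and cosine() helpers from the earlier sketches, such a matrix could be assembled as follows:

```python
# Sketch of the pairwise comparison behind Figure 3: every user's UIWWV is
# compared against every other descriptor (other users' UIWWVs for the left
# panel, WWVs of external repositories for the right panel) with the cosine
# measure from the earlier sketch.
def similarity_matrix(row_vectors, column_vectors):
    """row_vectors, column_vectors: {name: WWV-like dict}. Returns a nested dict."""
    return {r: {c: cosine(row_vectors[r], column_vectors[c]) for c in column_vectors}
            for r in row_vectors}

# user_vs_user = similarity_matrix(user_uiwwvs, user_uiwwvs)        # Fig. 3, left
# user_vs_repo = similarity_matrix(user_uiwwvs, repository_wwvs)    # Fig. 3, right
```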
5 Conclusion In this paper, a model for representing user interest in a Web system for unsupervised collaborative knowledge management has been presented. The model uses User Interest Vectors (UIVs) to represent users' interest in the elements that constitute the system's knowledge repository; with these vectors it is possible to compare users' interests inside the system. In addition, the model uses User Interest Words Weight Vectors (UIWWVs) to represent user interest in such a way that it can be compared with the Words Weight Vectors that represent any data element inside or outside the system. The results of a series of experiments aimed at testing the practical application of the proposed approach have also been presented. From these experiments we may confirm that: (1) UIVs make it possible to adequately identify the focal points of user interest in the system's knowledge repository, (2) UIWWVs make it possible to identify users with similar interests, and (3) UIWWVs make it possible to identify data elements related to users' interests both inside and outside the system. In conclusion, the proposed model contributes to the research area that concerns us with a new process for determining the key words that may represent users' interest and their importance, starting from the intensity of user interaction with data elements through the system. Acknowledgements. This research has been partially financed by the Spanish Ministry of Science and Technology, through the TIN2007-64718 and TIN2008-02081/TIN projects, and by the Spanish Agency for International Cooperation (AECI) through the A/7954/07 project.
References 1. Alamán, X., Cobos, R.: KnowCat, A Web Application for Knowledge Organization. In: Chen, P.P., et al. (eds.) ER Workshops 1999. LNCS, vol. 1727, pp. 348–359. Springer, Heidelberg (1999) 2. Baeza, R., Ribeiro, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999) 3. Chang, G., Healey, M., McHugh, J., Wang, J.: Mining the World Wide Web: An Information Search Approach. Kluwer, Dordrecht (2001) 4. Cobos, R.: Mechanisms for the Crystallisation of Knowledge, a proposal using a collaborative system. Doctoral dissertation. Universidad Autónoma de Madrid (2003) 5. Godoy, D., Amandi, A.: Modeling User Interests by Conceptual Clustering. Information Systems 31(4-5), 247–265 (2006) 6. Gudivada, V.N., Raghavan, V.V., Grosky, W.I., Kasanagottu, R.: Information Retrieval on the World Wide Web. IEEE Internet Computing 1(5), 58–68 (1997) 7. Kim, S., Fox, E.A.: Interest-Based User Grouping Model for Collaborative Filtering in Digital Libraries. In: Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E., Lim, E.-p. (eds.) ICADL 2004. LNCS, vol. 3334, pp. 533–542. Springer, Heidelberg (2004) 8. Lam, W., Mostafa, J.: Modeling user interest shift using a Bayesian approach. Journal of the American Society for Information Science and Technology 52(5), 416–429 (2001) 9. Lieberman, H.: Letizia: an agent that assists web browsing. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, Canada, pp. 924–929 (1995) 10. Moreno, J., Alamán, X.: A Proposal of Design for a Collaborative Knowledge Management System by means of Semantic Information. In: Navarro-Prieto, R., et al. (eds.) HCI related papers of Interacción 2004, pp. 307–319. Springer, Dordrecht (2005) 11. Moreno, J., Alamán, X.: SKC: Measuring the user’s interaction intensity. In: Fernández-Manjón, B., et al. (eds.) Computers and Education: E-learning, From Theory to Practice, pp. 123–132. Springer, Heidelberg (2007) 12. Moreno, J., Alamán, X.: SKC: Digestión de Conocimiento. In: Proceedings of the VII Congreso Internacional INTERACCION 2007, Zaragoza, pp. 281–290 (2007) 13. Zhengwei, L., Shixiong, X., Qiang, N., Zhanguo, X.: Research on the User Interest Modeling of personalized Search Engine. Wuhan University Journal of Natural Sciences, 893–896 (2007)
Some Pitfalls for Developing Enculturated Conversational Agents Matthias Rehm1, Elisabeth André1, and Yukiko Nakano2 1
Faculty of Applied Informatics University of Augsburg {rehm,andre}@informatik.uni-augsburg.de 2 Faculty of Science and Technology Seikei University Tokyo
[email protected]
Abstract. A review of current agent-based systems exemplifies that a Western perspective is predominant in the field. But as conversational agents focus on rich multimodal interactive behaviors that underlie face-to-face encounters, it is indispensable to incorporate cultural heuristics of such behaviors into the system. In this paper we examine some of the pitfalls that arise in developing such systems. Keywords: Embodied Conversational Agents, Cultural Heuristics, Multimodal Interaction.
1 Introduction This paper argues that Embodied Conversational Agents (ECAs) [5] are prototypical devices for enculturating the human-computer interface. It examines the standard development process for ECA systems and discusses at each step the pitfalls that arise from integrating culture as a computational parameter into the process. The paper is not going to argue for or against specific cultural theories, but relies on Hofstede’s [11] dimensional theory of culture as a widely used example. Embodied conversational agents can be regarded as a special case of multimodal dynamic interactive systems (see Figure 1 for some examples). They promote the idea that humans, rather than interacting with tools, prefer to interact with an artifact that possesses some human-like qualities – at least in a large number of application domains. If it is true, as Reeves and Nass’ [24] media equation suggests, that people respond to computers as if they were humans, then there is a good chance that people are also willing to form social relationships with virtual personalities. As a consequence, it seems inevitable to take cultural aspects into account when creating such agents. Due to their embodiment, agents present complex multimodal systems with rich verbal and nonverbal repertoires. Additionally, the appearance of the agent might play an important role when taking cultural aspects into account.
Fig. 1. Examples of Embodied Conversational Agents. Top row: the IPOC earthquake instructor [21], an autonomous bot in Second Life [27], the Gamble multiuser dice game [25], interacting with virtual dancers [30]. Bottom row: collaborating agents in edutainment [28], a virtual tourist guide, the FearNot! anti-bullying system [10].
Embodied Conversational Agents as an interface metaphor have a great potential to realize cultural aspects of behavior in several fields of human computer interaction: 1. Information presentation: By adapting their communication style to the culturally dominant persuasion strategy, agents become more efficient in delivering information or selling a point or a product. 2. Entertainment: Endowing characters in games with their own cultural background has two advantages. It makes the game more entertaining i.) by providing coherent behavior modifications based on the cultural background and ii.) by letting the characters react in a believable way to (for them) weird behavior of other agents and the user. 3. Education: For educational purposes, experience-based role-plays become possible, e.g. for increasing cultural awareness of users or for augmenting the standard language textbook with behavioral learning. Two main issues for enculturating embodied conversational agents are discussed in this paper: 1. Enculturating agents opens up a challenging research field because culture penetrates most of the above mentioned features (verbal and nonverbal behavioral, appearance) of an agent. Thus, enculturating such a system has to rely on a solid theoretical framework that is able to describe or even predict these influences. 2. Moreover, the developers’ own cultural background provides them with implicit design heuristics for the system, which have to be challenged actively at every step of the process. These issues are addressed in relation to the methodological approach for realizing ECA systems.
2 Designing ECA Systems The methodological approach for modeling the behavior for embodied conversational agents is well exemplified by the following development steps:
─ Study: To build a formal model for generating realistic agent behaviors, data of human interactions is necessary for two reasons: (i) it serves as an empirical foundation for the formal models of human agent interaction, and (ii) it serves as a benchmark against which these models are evaluated. In most cases, formal models are not built from scratch. Rather, the data analysis serves to refine existing models found in the literature. Such models often lack explicit information necessary for the integration in an agent system like synchronization and timing of modalities. Over the last decade, numerous work has established the area of multimodal corpus analysis to shed light on the specifics of multimodal interaction. To give some examples, [16] suggest an annotation scheme for gestures that draws on the distinction between the temporal course of a gesture and its type and relies on a gesture typology introduced by [20]. [1] as well as [7] annotate instead the expressivity dimensions of gestural activity focusing on how a gesture is accomplished and not on what kind of gesture is used. [26] describe an annotation scheme that analyzes gestures on a more abstract functional level. Their corpus captures the relation between linguistic and nonverbal strategies of politeness. ─ Model: The data gathered in the previous step of the development process serves as the foundation of a formal model of human agent interaction. [4] give an account on how the data from such a corpus can be used to directly mirror the behavior of a human speaker with an agent. A similar approach is described by [16], who extract information of personal idiosyncrasies of the human speaker, which is then mimicked by the agent. [19] extract statistical rules from a corpus of natural dialogues that allow them to generate appropriate head and hand gestures for their agent that accompany the agent’s utterances. Instead of rules, [26] have shown how statistical information can be extracted from a multimodal corpus and used as control parameters for a virtual character. To this end they analyzed what kind of relation exists between certain types of gestures and verbal strategies of politeness. The resulting models of human-human conversational behavior then serve as a basis for the implementation of ECAs that replicate the behaviors addressed by the models. ─ Test: To evaluate the resulting system, experiments are set up in which humans are confronted with ECAs following the model. The data collected in the first step can serve as a baseline against which the resulting ECA implementation can be tested. [6] as well as [23] exemplify this use of multimodal corpora in developing agents that exhibit human turn taking behavior and human grounding behavior respectively. [25] uses instead a corpus of human agent interactions to exemplify how design guidelines can be derived on this basis for such interactive systems. The above mentioned work concentrated on the challenge of realizing natural interaction behaviors for agent systems but did not acknowledge culture as a relevant parameter that might influence such interactions. As long as the cultural background of the users of such systems is identical to the developer’s background, this does not pose a problem as both work with the same culturally determined heuristics for generating and interpreting behavior. In the next section, we present work that tries to explicate cultural influences in order to allow for adapting the behavior of agents to the user’s cultural background.
3 Related Work Compared to the systems described in the last section, culture adds another layer of complexity to the endeavor of modeling the behavior of conversational agents. Prominent approaches that embrace this challenge are so far primarily located in the area of intelligent tutoring systems. The tactical language initiative [14] aims at coaching US soldiers in culture-specific language skills. Obviously, the target domain is the Middle East. Users have to use the right phrases and select appropriate co-verbal gestures in order to achieve their goals, e.g. to persuade a doctor to move his hospital somewhere else. The same user group and cultural domain are addressed in [18], which presents a tutoring system for teaching social norms in negotiation scenarios. [15] examine cultural differences in persuasion strategies and present an approach to incorporating these insights into a persuasive game for a collectivist society. Whereas the systems presented so far aim at simulating real cultures, the Orient system [3] considers a virtual culture instead, targeting teenagers as a user group and trying to increase the awareness of cultural differences in this age group. These systems explicitly model the cultural behavior of the agents for a given domain. What is lacking so far is a principled approach to treating culture as a computational parameter. [13] present a first approach to modifying the behavior of characters through cultural variables relying on Hofstede’s [11] dimensions; the variables are set manually in their system to simulate the behavior of a group of characters. [29] aim at automatically adapting to the user’s cultural background by setting appropriate parameters for the nonverbal behavior of the agents. To this end they employ Bayesian networks that model the causal relations between cultural dimensions and nonverbal behavior. [12] investigate the relative importance of appearance and of verbal as well as nonverbal behavior for attributing a specific culture to an agent, and find evidence that consistent behavior can override the cultural background implied by the agent’s appearance. To sum up, the importance of cultural influences on the interactive behavior of agents has been acknowledged, but there are only few approaches that go beyond the explicit modeling of a specific culture for a clearly determined application scenario. One reason might be the difficulty of pinning down the influences of culture on the development process of agent-based systems. In the remainder of this paper, we exemplify the problems of developing truly enculturated agents.
Fig. 2. Cultural influences on the development of ECAs
4 Culture in the Development Process Introducing culture into the development process poses challenges on two levels. (i) The development should be grounded in a theoretical framework that is able to explain and ideally predict behavior based on the features of a specific culture. This would allow realizing a parameter-based model of cultural influences in order to simulate culture-specific behavior without having to develop a completely new agent for every culture. (ii) The developer(s) own culture has to be kept at bay as it provides implicit design heuristics that have to be actively challenged at every step of the development process. The rest of this section will address these two challenges for the first two steps of the development process. Figure 2 exemplifies the interrelation of cultural aspects that interfere with this process either from the phenomena under examination or from the developers own cultural background. 4.1 Study The most appropriate theoretical framework for our endeavor seems to be a theory that defines culture as norms and values, i.e. heuristics for behavior. A number of approaches exist for this line of thinking ([9], [17], [31]). One that is prominent and widely used is Hofstede’s [11] dimensional theory of culture. Based on a broad empirical survey Hofstede defines culture as a dimensional concept, where a given culture occupies a certain area on each dimension. Correlated with the locations on these five dimensions are heuristics on how to behave “properly” in the given culture. The five dimensions are hierarchy, identity, gender, uncertainty, and orientation. We will not go into detail here but give an example on possible correlations between dimension and behavioral heuristics. The identity dimension e.g. is tightly related to the expression of emotions and the acceptable emotional displays in a culture. Thus, it is more acceptable in individualistic cultures like the US to publicly display strong emotions than it is in collectivistic cultures like Japan [8]. Uncertainty avoidance like identity is directly related to the expression of emotions. In uncertainty accepting societies, the facial expressions of sadness and fear are easily readable by others whereas in uncertainty avoiding societies the nature of emotions is less accurately readable by others, which was shown by [2]. It has to be noted that Hofstede’s theory is not without controversy. His theory is based on a large-scale questionnaire study with IBM employees, which constitutes a strong selection bias on the results. Nevertheless, Hofstede’s theory has a great appeal because of its quantitative nature. Although the theory describes certain correlations between cultural dimensions and correlated behavioral heuristics, this attribution is not unambiguous as the correlated heuristics might contradict each other on different dimensions. Consider for instance the following example dealing with proxemics. High power distance (hierarchy) might result in standing further apart in face-to-face encounters whereas collectivism (identity) generally means standing closer together in the same situation. Both attributions hold true for the Japanese culture. Thus, what will be the result of these correlations if they are combined? Solutions of different complexity can be thought of. Interlocutors could position themselves simply in a mean distance. Or we could define a hierarchical relation between the dimensions resulting in some information being overridden or weighted differently. 
More sensible would be a contextual adaptation
that takes the semantics of the dimensional position into account. If a culture has a high power distance, there could be differences in proxemics behavior that are related to social status, for instance standing further away from high-status individuals but closer together with peers. One conclusion is apparent from these examples: to adapt the behavior of agents to cultural heuristics, it is indispensable to gain insights into how these differences manifest themselves in face-to-face encounters. Unfortunately, there is a lack of reliable cross-cultural data, as the information in the literature is often anecdotal in character or lacks the technical detail that is necessary to realize an interactive system. One way to deal with this problem is to gather data in a standardized way, tailored to the modeling endeavor. In [22] in this volume, we describe such an approach for the German and Japanese cultures. Whereas the developer’s intuition might sometimes work, because the developer can take his or her own actions as a model for building the interactive behavior of an ECA, this is quite problematic when designing for a different culture. The developer’s own cultural norms and heuristics hinder this process by making quite specific aspects of behavior appear relevant that might be irrelevant in a different culture. Consider the following example. When studying turn-taking behavior in Germany, one will tend to consider an ordered exchange between interlocutors with little overlap as the basic form of discussion. But in other cultures it is more common to have strong overlaps and simultaneous turns in discussions to emphasize one’s interest in the topic [32]. Investigating turn-taking behavior in Italy might therefore result in a completely different model of turn-taking. Thus, even being aware of cultural differences does not necessarily help in identifying the relevant behaviors. An obvious solution to this problem would be to always involve developers from the targeted cultures in the development process, but this might only be feasible for large-scale projects. A low-budget solution would be to discuss as many of the design choices as possible with someone from the target country. To do so, it is important to make one’s own design choices explicit. As the underlying heuristics are implicit and generally interpreted as the “natural” way to do things, this might not be too easy. One way of addressing this problem could be to develop best-practice advice on how to check for cultural issues in the design of the system. 4.2 Model If we roughly sketch the process of behavior selection and generation in an agent system, it becomes obvious that culture penetrates most stages of this process. Figure 3 gives a simplified impression of some of the main processing steps. In the planning stage, culture provides scripts and rituals for interactions. One of the most fundamental situations in this respect is a first-meeting encounter. According to [2], a first meeting is a ritual that follows pre-defined scripts. [32] follows this analysis by describing a first meeting as a ceremony with a specific chain of actions. Behavior selection is concerned with enriching the dialogue step with suitable verbal and nonverbal behavior. Consider the use of gestures as an example. Culture influences the selection process on different levels. On the one hand, it is necessary to choose the right gesture type and animation for the utterance.
Fig. 3. Culture influences during the behavior generation process
This repertoire of available gestures is at least partially culture-specific, as there are sets of language- and thus culture-specific emblematic gestures. On the other hand, whether and how frequently gestures are employed in an utterance differs widely between cultures. The Italian culture, for instance, has a rich repertoire of emblematic gestures, and gestures in general are used frequently in face-to-face encounters; quite the opposite is true for the German culture. In the realization stage, another influence of culture comes into play. Consider again gestural activity: whereas one culture gestures fast and frequently, taking up much space in doing so, other cultures make only infrequent use of gestures that do not intrude into the interlocutor’s space. The scheduling stage, finally, is necessary to ensure appropriate timing in the interlocutors’ turn-taking, which again is culture-specific. In the above-mentioned study on German and Japanese behavior [22], for instance, we found that German interlocutors are generally uncomfortable with longer pauses in conversation compared to the Japanese samples. One suggestion for dealing with this ubiquitous influence of culture is presented in [29]: by modeling the causal relations between a culture’s location on Hofstede’s dimensions and correlated behavior in a probabilistic network, it becomes possible to extract different types of cultural influences from this network, and the different layers of the network can serve as influences at different steps of the generation process. Another idea is presented in [3]: in close analogy to the Chomskian view of language use, a universal behavior selection process is realized, which is augmented with culture-specific transformation rules for perceptions and actions. Again, the cultural background of the developer supplies heuristics on what is interpreted as relevant or typical behavior. At this point, this check is necessary on different levels of abstraction. The developer’s background may bias how the data derived in the previous step are used to model the behavior of an ECA; the definition of objective criteria is a necessary prerequisite for a reliable analysis. Actually building the ECA based on the analysis and the model suffers from the same pitfalls as before: what is an unimportant variation in gestural expressivity in one culture might lead to severe misunderstandings in another. The same suggestions presented in the previous section apply here.
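To make the idea of culture as a computational parameter more concrete, the following sketch maps Hofstede dimension scores onto two behavior parameters discussed above (interpersonal distance and public emotional expressivity), with the interlocutor's status used to resolve the contradictory proxemics tendencies mentioned in Sect. 4.1. The linear form and all numeric weights are invented for illustration; this is neither the Bayesian network of [29] nor the character model of [13].

```python
# Purely illustrative sketch of a parameter-based mapping from Hofstede
# dimension scores (0-100) to behavior parameters. The linear form and all
# numeric constants are invented for this example; they do not reproduce the
# causal model of [29] or the manually set variables of [13].
def behavior_parameters(power_distance, individualism, interlocutor_is_superior):
    # Interpersonal distance in metres: collectivist cultures tend to stand
    # closer together; high power distance increases the distance kept towards
    # high-status interlocutors (contextual resolution of the contradiction
    # discussed in Sect. 4.1).
    distance = 0.9 + 0.2 * (individualism / 100.0 - 0.5)
    if interlocutor_is_superior:
        distance += 0.4 * (power_distance / 100.0)
    # Public emotional expressivity: more acceptable in individualistic cultures.
    expressivity = 0.3 + 0.7 * (individualism / 100.0)
    return {"interpersonal_distance_m": round(distance, 2),
            "emotional_expressivity": round(expressivity, 2)}
```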
5 Conclusion In this paper we examined a number of pitfalls one can stumble into while developing enculturated conversational agents and presented some ideas on strategies for avoiding them. These pitfalls originate from different sources. On the one hand, trying to integrate cultural aspects into interactive systems poses the challenge of modeling these aspects based on a sound theoretical framework that is able to explain or even predict the behavioral heuristics of different cultures. On the other hand, the developer’s
own cultural background has to be kept in check during the process, as it provides implicit design heuristics for the system that might easily go unnoticed. Consequently, there are different strategies for avoiding these pitfalls. Hofstede’s dimensional theory of culture is very popular, but the difficulties in applying it have been noted; thus, it remains to be shown whether there are more suitable models for this endeavor. For keeping the developer’s own cultural background in check, a solution was presented that involves establishing best-practice guidelines for the integration of cultural aspects into interactive systems.
Acknowledgements The work described in this paper was partially supported by the German Research Foundation (DFG) with research grant RE2619/2-1, the Japan Society for the Promotion of Science (JSPS) with a grant-in-aid for scientific research (C) (19500104), and by the European Community (EC) in the eCIRCUS project IST-4-027656-STP.
References 1. Abrilian, S., Martin, J.-C., Buisine, S., Devillers, L.: Perception of movement expressivity in emotional TV interviews. In: HUMAINE Summerschool (2006) 2. Argyle, M.: Bodily Communication. Methuen & Co. Ltd., London (1975) 3. Aylett, R., Paiva, A., Vannini, N., Enz, S., André, E., Hall, L.: But that was in another country: agents and intercultural empathy. In: Proceedings of AAMAS (2009) 4. Caridakis, G., Raouzaiou, A., Bevacqua, E., Mancini, M., Karpouzis, K., Malatesta, L., Pelachaud, C.: Virtual agent multimodal mimicry of humans. Language Resources and Evaluation 41, 367–388 (2007) 5. Cassell, J., Sullivan, J., Prevost, S., Churchill, E.: Embodied conversational agents. MIT Press, Cambridge (2000) 6. Cassell, J., Nakano, Y., Bickmore, T.W., Sidner, C.L., Rich, C.: Non-Verbal Cues for Discourse Structure. Meeting of the Association for Computational Linguistics, 106–115 (2001) 7. Chafai, N.E., Pelachaud, C., Pelè, D.: Analysis of gesture expressivity modulations from cartoon animations. In: Proceedings of the LREC Workshop on Multimodal Corpora (2006) 8. Ekman, P.: Telling Lies — Clues to Deceit in the Marketplace, Politics, and Marriage, 3rd edn. Norton and Co. Ltd., New York (1992) 9. Hall, E.T.: The Hidden Dimension. Doubleday (1966) 10. Hall, L., Woods, S., Aylett, R., Newall, L., Paiva, A.: Achieving Empathic Engagement Through Affective Interaction with Synthetic Characters. In: Tao, J., Tan, T., Picard, R.W. (eds.) ACII 2005. LNCS, vol. 3784, pp. 731–738. Springer, Heidelberg (2005) 11. Hofstede, G.: Cultures Consequences: Comparing Values, Behaviors, Institutions, and Organizations Across Nations. Sage Publications, Thousand Oaks (2001) 12. Iacobelli, F., Cassell, J.: Ethnic identity and engagement in embodied conversational agents. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 57–63. Springer, Heidelberg (2007) 13. Jan, D., Herrera, D., Martinovski, B., Novick, D., Traum, D.: A Computational Model of Culture-Specific Conversational Behavior. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 45–56. Springer, Heidelberg (2007)
14. Lewis Johnson, W.: Serious use of a serious game for language training. In: Proceedings of the International Conference on Artificial Intelligence in Education, pp. 67–74 (2007) 15. Khaled, R., Biddle, R., Noble, J., Barr, P., Fischer, R.: Persuasive interaction for collectivist cultures. In: Piekarski, W. (ed.) The Seventh Australasian User Interface Conference (AUIC 2006), pp. 73–80 (2006) 16. Kipp, M., Neff, M., Kipp, K.H., Albrecht, I.: Towards Natural Gesture Synthesis: Evaluating gesture units in a data-driven approach to gesture synthesis. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 15–28. Springer, Heidelberg (2007) 17. Kluckhohn, F., Strodtbeck, F.: Variations in value orientations. Row, Peterson (1961) 18. Chad Lane, H., Hays, M.J.: Getting down to business: Teaching cross-cultural social interaction skills in a serious game. In: Workshop on Culturally Aware Tutoring Systems (CATS), pp. 35–46 (2008) 19. Lee, J., Marsella, S.: Nonverbal Behavior Generator for Embodied Conversational Agents. In: Gratch, J., Young, M., Aylett, R.S., Ballin, D., Olivier, P. (eds.) IVA 2006. LNCS, vol. 4133, pp. 243–255. Springer, Heidelberg (2006) 20. McNeill, D.: Hand and Mind — What Gestures Reveal about Thought. The University of Chicago Press, Chicago (1992) 21. Nakano, Y., Nishida, T.: Awareness of Perceived World and Conversational Engagement by Conversational Agents. In: Proceedings of the AISB 2005 Symposium on Conversational Informatics for Supporting Social Intelligence & Interaction (2005) 22. Nakano, Y., Rehm, M.: Multimodal Corpus Analysis as a Method to Ensure Cultural Usability of Embodied Conversational Agents. In: Proceedings of HCI International (2009) 23. Nakano, Y., Reinstein, G., Stocky, T., Cassell, J.: Towards a Model of Face-to-face Grounding. In: Proceedings of the Association for Computational Linguistics (2003) 24. Reeves, B., Nass, C.: The Media Equation – How People Treat Computers, Television, and New Media Like Real People and Place. Cambridge University Press, Cambridge (1996) 25. Rehm, M.: She is just stupid – Analyzing user-agent interactions in emotional game situations. Interacting with Computers 20(3), 311–325 (2008) 26. Rehm, M., André, E.: More Than Just a Friendly Phrase: Multimodal Aspects of Polite Behavior in Agents. In: Nishida, T. (ed.) Conversational Informatics, pp. 69–84. Wiley, Chichester (2007) 27. Rehm, M., Rosina, P.: Second Life as an Evaluation Platform for Multiagent Systems Featuring Social Interactions. In: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) (2008) 28. Rehm, M., André, E., Conradi, B., Hammer, S., Iversen, M., Lösch, E., Pajonk, T., Stamm, K.: Location-based interaction with children for edutainment. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Weber, M. (eds.) PIT 2006. LNCS, vol. 4021, pp. 197–200. Springer, Heidelberg (2006) 29. Rehm, M., Bee, N., André, E.: Wave like an Egyptian – Accelerometer based gesture recognition for culture specific interactions. In: Proceedings of Britisch HCI (2008) 30. Rehm, M., Vogt, T., Bee, N., Wissner, M.: Dancing the Night Away – Controlling a Virtual Karaoke Dancer by Multimodal Expressive Cues. In: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 1249–1252 (2008) 31. Schwartz, S.H., Sagiv, L.: Identifying culture-specifics in the content and structure of values. 
Journal of Cross-Cultural Psychology 26(1), 92–116 (1995) 32. Ting-Toomey, S.: Communicating Across Cultures. The Guilford Press, NewYork (1999)
Comparison of Different Talking Heads in Non-Interactive Settings Benjamin Weiss1, Christine Kühnel1, Ina Wechsung1, Sebastian Möller1, and Sascha Fagel2 1
Quality & Usability Lab, Deutsche Telekom Laboratories, Technische Universität Berlin, Ernst-Reuter-Platz 7, D-10587 Berlin, Germany {bweiss, christine.kuehnel, ina.wechsung, sebastian.moeller}@telekom.de 2 Department for Language & Communication, Technische Universität Berlin, Straße des 17. Juni 135, D-10623 Berlin, Germany
[email protected]
Abstract. Six different talking heads have been evaluated in two consecutive experiments. Two text-to-speech components and three head components have been used. Results from semantic differentials show a clear preference for the most human-like and expressive head. The analysis of the semantic differentials reveals three factors each. These factors show different patterns for the head components. Overall quality is strongly related to one factor, which covers the quality aspect ‘appearance’. Another factor found in both experiments comprises ‘human likeliness’ and ‘naturalness’ and is much less correlated with overall quality. While subjects have been able to clearly separate all head components with different factors of the semantic differential, only some of these factors are relevant for explicit quality ratings. A good appearance seems to affect the perception of sympathy and the ascription of reliability. Keywords: talking heads, evaluation, quality aspects, smart home domain.
1 Introduction
Embodied Conversational Agents (ECAs) are assumed to improve Human-Computer Interaction (HCI) as they represent social conversational partners [11] and their presence can result in better user ratings, the so-called 'persona-effect' [4,8]. One promising application domain is the smart home environment, where ECAs, as the main interface to the multitude of different available services, can support users. However, it is still not well understood which features of an ECA are relevant for the quality perceived by users. For example, state-of-the-art talking heads can be used in real time to synthesize audio-visual articulation with high synchronicity of the two modalities (e.g. [2,4]). Apart from the persona-effect, ECAs are foremost evaluated with respect to style (natural human vs. comic style or even non-human visualization, formal vs. informal), gender, or dialogue behaviour (cf. [11,14] for an overview). However, there is no
study known to us directly comparing different synthesis modules concerning their impact on user’s perception. This question is addressed in two watching-and-listening-only experiments. For the first experiment, significant differences for overall quality as well as speech and visual quality have been obtained for three different talking head components and two speech synthesis systems [12]. In this paper, we aim at quantifying the impact on a more detailed level, including aesthetic attribution, to identify different quality aspects. The benefit is twofold: First, with such a detailed response the impact of the technical components on the ratings can be assessed. Second, the relevance of the different quality aspects on the overall quality within the smart home domain addressed by our study can be estimated. Whereas the first experiment was conducted in a strictly controlled environment of our lab, the second one basically mirrors the first apart from being a shorter web-based version.
2 Material
Three different talking head components and two freely available speech synthesis systems are combined, resulting in six different combinations. One head originates from the Thinking Head Project [3] and will be referred to as TH in the following. This head is based on a 3D model with the texture made from pictures of the Australian artist STELARC. In addition to articulating, it moves, smiles, and winks. The other two heads were developed at TU Berlin. One is the Modular Audiovisual Speech SYnthesizer (MA) [7], the other is a German Text-To-Audiovisual-Speech synthesis system based on speaker cloning (CL) using motion capture [6]. Both heads are immobile apart from lower face movements, and they show no facial expressions. In contrast to the others, CL has neither hair nor neck. The speech synthesis systems used are the Modular Architecture for Research on speech sYnthesis (Mary) [15], based on HMM synthesis, and the Mbrola system (Mbrola) [5], based on diphone synthesis. For both speech synthesis systems a male German voice ('hmm-bits3' for Mary and 'de2' for Mbrola) was used. 60 videos were pre-recorded presenting the talking heads uttering 10 sentences related to the smart home domain for all 2×3 voice-head combinations. Those sentences are of variable phrase length and contain both questions and statements. One example is: 'The following devices can be turned on or off: the TV, the lamps and the fan.'
3 Procedure
In both experiments the perceived quality of the talking heads was assessed with various measures. The participants first received a short introduction and were asked four questions concerning their experience with talking heads and spoken dialog systems in general. Both experiments are divided into two parts, one per-sentence part and one per-set part. The per-sentence part consists of single stimuli presented in randomised order. After every stimulus the participants were asked to answer four questions (per-sentence questionnaire). One question concerning the content of the sentence was included only to focus their attention not only on the appearance but on understanding as
well and was excluded from further analysis. With the remaining three questions the participants were asked to rate the speech quality ('How do you rate the speech quality?'), visual quality ('How do you rate the visual quality of the head?') and overall quality ('How do you rate the overall quality?') of each stimulus (Figure 1).
How do you rate the overall quality?   ○ very good   ○ good   ○ undecided   ○ bad   ○ very bad
Fig. 1. Example of one question to collect quality ratings
In the other part a set of six stimuli was presented for every voice-head combination, followed by a questionnaire (per-set questionnaire). This questionnaire assessed the overall quality of the voice-head combination ('How do you rate the overall quality of the animated head?') and the participants' detailed impression ('Please use the following antonym pairs to rate your impression of the animated head.') using 25 semantic-differential items. Every item was rated on a five-point scale with the poles described by antonyms. These items derive from a questionnaire currently being developed at Deutsche Telekom Labs, partly based on [1]. For the results of the first experiment's overall ratings refer to [12]. Here, only the analysis of the semantic differential used to assess different quality aspects will be presented.
3.1 Experiment 1
In the first experiment, seven female and seven male participants aged between 20 and 32 (M=27, SD=4.21) were paid to rate the six voice-head combinations in a two-hour experiment. The experiment comprised the per-sentence part followed by the per-set part, divided by a short break. The participants were seated in front of a screen on which the videos were displayed. The sound was played back over headphones. Before the start the participants were shown six anchor stimuli, consisting of each voice-head combination uttering one sentence not contained in the above mentioned 10 sentences. Thus, every participant had seen the whole range of talking heads analyzed in this study before being asked for his or her rating. Finally, six freeze images (one for each condition) were arranged on the screen in random order. The participants could replay the anchor stimuli by clicking on the images and were then asked to order them according to their preference, giving ranks from '1 – least liked' to '6 – most liked'. Each number could only be given once, thus resulting in six ranks. On paper they were then asked to answer two open questions: the first asked why they had ranked the six conditions the way they had, the second asked which condition they would prefer in a smart-home setting.
3.2 Experiment 2
In the second experiment the per-sentence questionnaire was slightly altered to assess additional information. Two further questions were asked ('How well does the voice fit to the head?' and 'How do you rate the synchrony of voice and lip movements?'). As this experiment was
implemented for web-access,1 the exact procedure and hardware for every participant is unknown, but headphones were recommended. In order to shorten the duration of every test to 10–15 minutes, the participants performed either the per-set or per-sentence part, selected randomly. Additionally, those 4 sentences of the first experiment most strongly deviating from the mean ratings in overall and speech quality were eliminated, resulting in six instead of ten sentences. There was a similar training part, but no final ranking in this experiment. The semantic differential consists of 24 items, 20 identical to those of the first experiment. Altogether, data from 12 participants (aged 24 to 54, M=29, SD=8.37) of the per-set part is analyzed here.
4 Results 4.1 Experiment 1 The factor analysis of the data from the differential (SemD) using Horn's parallel analysis [9] with Varimax rotation reveals 3 factors (cf. Figure 2, 67% explained variance, Kaiser-Meyer-Olkin measure of sampling adequacy [10], KMO=.91). They can be entitled as ‘realistic, humanlike’ (Cronbach’s α=.93), ‘friendly, honest’ (α=.92) and ‘attractive, pleasing’ (α =.94). All items with loadings greater than 0.4 have been included, apart from those with cross-loadings, resulting in 8 exclusions. The six conditions show significant differences in the average rating of factor 1 (Friedman Test: χ2(5)=31.4, p=7.7e-06), factor 2 (χ2(5)=28.5, p=2.9e-05), and factor 3 (χ2(5)=39.1, p=2.2e-07). For the SemD separated by factors, cf. Figure 3.
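The statistical pipeline described above (Horn's parallel analysis for factor retention, Varimax-rotated factor analysis, the KMO measure, Cronbach's α per factor, and Friedman tests across conditions) can be outlined in code roughly as follows. This is only an illustrative sketch, not the authors' actual analysis scripts: it assumes the per-set ratings are available as a matrix with one row per participant-condition pair and one column per semantic-differential item, and it relies on the third-party factor_analyzer package together with NumPy/SciPy; the file name and column groupings are hypothetical.

```python
import numpy as np
from scipy.stats import friedmanchisquare
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo


def parallel_analysis(data, n_iter=100, seed=0):
    """Horn's parallel analysis: retain the leading factors whose observed
    eigenvalues exceed the mean eigenvalues of same-sized random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand = np.zeros(p)
    for _ in range(n_iter):
        r = rng.standard_normal((n, p))
        rand += np.sort(np.linalg.eigvalsh(np.corrcoef(r, rowvar=False)))[::-1]
    rand /= n_iter
    n_factors = 0
    for o, e in zip(obs, rand):
        if o > e:
            n_factors += 1
        else:
            break
    return n_factors


def cronbach_alpha(items):
    """items: rows = observations, columns = items belonging to one factor."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical input file: one row per participant x condition, one column per item.
ratings = np.loadtxt("semd_ratings.csv", delimiter=",")

_, kmo_total = calculate_kmo(ratings)      # sampling adequacy (KMO)
k = parallel_analysis(ratings)             # number of factors to retain

fa = FactorAnalyzer(n_factors=k, rotation="varimax")
fa.fit(ratings)
loadings = fa.loadings_   # keep items with loadings > .4, drop cross-loadings

# Per-factor consistency, e.g. for the (hypothetical) columns of factor 1:
# alpha_f1 = cronbach_alpha(ratings[:, factor1_columns])

# Condition differences on a factor score, one array per voice-head combination:
# chi2, p = friedmanchisquare(*scores_by_condition)
```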
Fig. 2. Results of the Parallel Analysis of data from experiment 1. Three factors have higher unadjusted Eigenvalues than random Eigenvalues.
1 This experiment is accessible at http://talking-heads.qu.t-labs.tu-berlin.de
Fig. 3. SemD of factor 1 to 3 (left to right) for the three head components, averaged for participant and voice components. Positive numbers designate a more positive rating for the respective dimension.
Fig. 4. Average ratings of factor 1 to 3 (left to right) for the three heads (lines indicating significant differences)
The basic ranking of the SemD is similar to that from the direct quality ratings described in [12], namely TH is better than MA, which is better than CL. However, this holds only for factor 3 and the combined significant results of factors 1 and 2. Interestingly, not only are two factors needed to separate the three head components; all three factors show different statistical results (cf. Figure 4 for single comparisons of the 'head' component, α-level=.05). For the 'voice' component, there is only one significant difference. Although the Mary voice is rated systematically better concerning overall quality and speech quality [12], the observable difference in the SemD (cf. Figure 5) is only significant for factor 2 (Wilcoxon rank sum test: V=370.5, p=0.047). Relating all factors to explicit quality ratings, factor 1 shows the weakest correlations (cf. Table 1). In fact, factor 1 does not include relevant information to explain the ranking or absolute rating results of overall quality.
Fig. 5. SemD of factor 1 to 3 (left to right) for the two voice components, averaged for participant and head components
Table 1. Spearman coefficients between the three factors and explicit quality ratings from the per-sentence questionnaire

                   Factor 1   Factor 2   Factor 3
Overall quality      .35        .51        .57
Visual quality       .50        .55        .68
Speech quality      < .1       < .1       < .1
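The Spearman coefficients reported in Table 1 (and in Table 2 below) relate factor scores to the explicit quality ratings; computing such a coefficient is a one-liner with SciPy. The values in this snippet are invented for illustration only and are not the study's data.

```python
import numpy as np
from scipy.stats import spearmanr

# Invented example values: factor-3 scores and overall-quality ratings
# for six hypothetical stimuli.
factor3 = np.array([0.8, 0.5, 0.1, -0.2, -0.4, -0.6])
overall = np.array([4.1, 3.2, 3.8, 3.0, 2.6, 2.4])

rho, p = spearmanr(factor3, overall)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```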
4.2 Experiment 2
Results from the data analysis of the second experiment are quite similar to those of experiment 1. Three factors are again found (65% explained variance, KMO=.90): 'realistic, humanlike' (Cronbach's α=.89), 'interesting, entertaining' (α=.84) and 'attractive, pleasing' (α=.96). Factors 1 and 3 correspond to those of the first experiment. As one can see in Figure 6, both show a similar pattern for the different head components, whereas factor 2 differs both in its pattern and in the items it comprises. Please note that the numbering of the items is not comparable to experiment 1. For all three factors there are significant differences for the six head and voice combinations (χ2(5)=14.8, p=0.01; χ2(5)=33.5, p=3.0e-06; χ2(5)=35.7, p=1.1e-06). Compared to experiment 1, only factor 1 shows a similar statistical result. In factor 3 only CL differs from the two other head components (cf. Figure 7). The voice component Mary is rated better than Mbrola only for factor 3 (V=409, p=0.022). Relating all factors to the explicit overall quality ratings, factor 3 shows the strongest correlation (cf. Table 2). In contrast to experiment 1, participants completed either the per-set or the per-sentence block. Therefore, correlations could only be calculated between the factors and the overall quality rating asked before the SemD.
Fig. 6. SemD of factor 1 to 3 (left to right) for the three head components, averaged for participant and voice components
Fig. 7. Average ratings of factor 1 to 3 (left to right) for the three heads (lines indicating significant differences)
Table 2. Spearman coefficients between the three factors and explicit quality ratings from the per-set questionnaire

                   Factor 1   Factor 2   Factor 3
Overall quality      .58        .57        .84
5 Discussion and Conclusion
The significant differences between the two voices in terms of explicit quality ratings (overall quality, speech quality) [12] are largely not reflected in the semantic differential in either experiment. This may be due to the fact that the head components differ much more than the voice components and therefore dominate the resulting factors, or that the SemD's introductory question was misleading.
Apparently, the three resulting factors in both experiments differ in the ratings of the three head components. Massy in particular does not show consistent results: whereas it is rated positively and similarly to Clone concerning human likeliness and naturalness (exp 1 & 2), it is rated similarly to Thinking Head for the second factor (friendliness and honesty) (exp 1). Only for factor 3 (exp 1) is the actual ranking found in the explicit quality ranking in [12] replicated. Massy lacks natural texture and non-articulatory movements despite its natural animation, which can explain why it is rated, like Clone, as less entertaining and quite artificial. However, Massy is rated as friendly and honest as Thinking Head. To sum up, the semantic differentials successfully assessed different aspects of subjective user ratings concerning the talking heads. The inconsistency between the two experiments in the results concerning Thinking Head and Massy for factor 3 (attractiveness, pleasantness) may lie in the altered items, as factor 3 also incorporates 'good' and 'satisfying' in experiment 2, which were not included in experiment 1. More important, however, are the particular items classified in one factor: sympathy is related to attractiveness and reliability (exp 1, no 'satisfaction' item), or to attractiveness and satisfaction (exp 2, no 'reliability' items). In both experiments, factor 1 (naturalness, human likeliness) contributes less to explicit subjective quality ratings (overall and visual quality) than factor 3 (attractiveness, pleasantness). A possible explanation of the fact that Massy, despite its artificial appearance and lack of non-articulatory movements, is rated quite similarly to Thinking Head, is that sympathy, ascribed intelligence, and reliability of these talking heads depend mostly on attractiveness and not on realism concerning human resemblance (texture, head movements or facial expressions). In this respect, the results indicate that general appearance is much more relevant for the subjective quality of the talking heads than realism. As Clone has the most realistic 'texture' (video), its lack of a neck and hair seems to be relevant for being considered human-like, real and especially attractive. Winking, moving the head, etc. is not that relevant for overall quality. The resulting factors cannot directly be considered quality aspects of talking heads. However, the results indicate at least a separation between one quality aspect covering appearance and another covering naturalness. Of course, this finding cannot currently be generalized to other head components. The results presented here build the basis for evaluating the talking head components used here in interaction.
Acknowledgments. This work has been supported by the Deutsche Forschungsgemeinschaft (DFG), grant MO 1038/6-1. The experiments were carried out partly in the frame of the Thinking Head project (funded in Australia by the ARC/NH&MRC).
References
1. Adcock, A., Van Eck, R.: Reliability and factor structure of the attitude toward tutoring agent scale (ATTAS). Journal of Interactive Learning Research 16(2), 195–217 (2005)
2. Beskow, J.: Rule-Based Speech Synthesis. In: Proc. Eurospeech, pp. 299–302 (1995)
3. Burnham, D., Abrahamyan, A., Cavedon, L., Davis, C., Hodgins, A., Kim, J., Kroos, C., Kuratate, T., Lewis, T., Luerssen, M., Paine, G., Powers, D., Riley, M., Stelarc, S.K.: From talking to thinking heads: 2008. In: Proc. AVSP, Tangalooma (2008)
4. Cohen, M.M., Massaro, D.W.: Modeling coarticulation in synthetic visual speech. In: Thalmann, N.M., Thalmann, D. (eds.) Models and techniques in computer animation, pp. 139–156. Springer, Tokyo (1993)
5. Dutoit, T., Pagel, V., Pierret, N., Bataille, F., Van der Vreken, O.: The MBROLA Project: Towards a set of high-quality speech synthesizers free of use for non-commercial purposes. In: Proc. ICSLP, pp. 1393–1396 (1996)
6. Fagel, S., Bailly, G., Elisei, F.: Intelligibility of natural and 3D-cloned German speech. In: Proc. AVSP, Hilvarenbeek (2007)
7. Fagel, S., Clemens, C.: An articulation model for audiovisual speech synthesis – determination, adjustment, evaluation. Speech Communication 44, 141–154 (2004)
8. Foster, M.E.: Enhancing human-computer interaction with embodied conversational agents. In: Proc. HCI International, Beijing (2007)
9. Horn, J.: A rationale and a test for the number of factors in factor analysis. Psychometrika 30, 179–185 (1965)
10. Hutcheson, G., Sofroniou, N.: The multivariate social scientist. Sage Publications, Thousand Oaks (1999)
11. Krämer, N.C.: Soziale Wirkungen virtueller Helfer: Gestaltung und Evaluation von Mensch-Computer-Interaktion. Kohlhammer, Stuttgart (2008)
12. Kühnel, C., Weiss, B., Wechsung, I., Fagel, S., Möller, S.: Evaluating Talking Heads for Smart Home Systems. In: Proc. ICMI, Chania (2008)
13. Osborne, J., Costello, A.: Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most From Your Analysis. Practical Assessment Research and Evaluation 10(5), 1–9 (2005)
14. Ruttkay, Z., Pelachaud, C.: From Brows to Trust: Evaluating Embodied Conversational Agents. Springer, New York (2004)
15. Schroeder, M., Trouvain, J.: The German text-to-speech synthesis system MARY: A tool for research, development and teaching. International Journal of Speech Technology 6, 365–377 (2003)
16. Yee, N., Bailenson, J.N., Rickertsen, K.: A meta-analysis of the impact of the inclusion and realism of human-like faces on user experiences in interfaces. In: Proc. SIGCHI, San José (2007)
Video Content Production Support System with Speech-Driven Embodied Entrainment Character by Speech and Hand Motion Inputs Michiya Yamamoto1, Kouzi Osaki2, and Tomio Watanabe1,3 1
Faculty of Computer Science and System Engineering, Okayama Prefectural University, 111 Kuboki, Soja, Okayama 719-1197, Japan {yamamoto, watanabe}@cse.oka-pu.ac.jp 2 Graduate School of Systems Engineering, Okayama Prefectural University, 111 Kuboki, Soja, Okayama 719-1197, Japan
[email protected] 3 CREST of Japan Science and Technology Agency
Abstract. InterActor is a speech-input-driven CG-embodied interaction character that can generate communicative movements and actions for entrained interaction. InterPuppet, on the other hand, is an embodied interaction character that is driven by both speech input, like InterActor, and hand motion input, like a puppet. In this study, we apply InterPuppet to video content production and construct a system to evaluate the content production. Self-evaluation of long-term (5-day) video content production demonstrates the effectiveness of the developed system. Keywords: Human communication, human interaction, embodied interaction, embodied communication, video content.
1 Introduction
Today, video content that employs CG characters is becoming increasingly popular, and such CG characters are given as much importance and preference as actors or stuffed toys. The possibilities for using CG characters will increase further as televisions with multiple channels become more widely available. Moreover, there has been an increase in the use of streaming content on networks and on websites such as YouTube where users can broadcast content. Many studies have been conducted on the movement generation of CG characters and on video content that uses CG characters [1], [2]. However, the importance of CG characters will increase further if we can make the CG characters in TV programs move in real time. In human face-to-face communication, not only verbal messages but also nonverbal behaviors such as nodding and body movements are rhythmically related and mutually synchronized between the communicating humans. This synchrony of embodied rhythms, termed entrainment, generates the sharing of embodiment in human interaction, which plays an important role in human interaction and communication [3]. We have already developed a speech-driven CG-embodied
character called InterActor, which performs the functions of both the speaker and the listener by coherently generating expressive actions and movements in accordance with a speech input. We have demonstrated that this system can be effective in supporting human interaction and communication between remote individuals [4]. In addition, we have developed another embodied interactive character called InterPuppet, which permits input in the form of hand motion. The effectiveness of InterPuppet in providing communication support has been demonstrated [5]. In video content, the entrainment movements and actions of CG characters are very important because character motion often appears with voice. If the mechanism of InterPuppet, which involves both entrainment motion and intentional motion, is introduced to a video content production support system, InterPuppet can be widely used in applications such as news programs and live broadcasts. In this study, a video content production support system using InterPuppet is developed. Further, the effectiveness of InterPuppet is evaluated from the viewpoint of content creators by producing video content for a long period.
2 Video Content Production Support System 2.1 Outline of Content Production Figure 1 shows the overview of content production using InterPuppet. In our previous researches, we developed a conversation support system using InterPuppet [5]. This system facilitates communication via the CG character, and it is important that there are users behind the CG character. The CG character plays an important role in video content production because the production work is done by the content creator. The application of InterPuppet to video content production will result in the production of attractive content that involves entrainment. In addition, for any content, this system maintains the entrainment of the body’s rhythm and the vividness and natural movement of a CG character such as InterActor. In particular, the viewer can be entrained to the content that presents and explains information when the CG character talks to the viewer as a speaker. Intentional body movements necessary for the production of video content are also added by using hand movements in a manner similar to the manipulation of a puppet. Necessary motion such as pointing is expressed in real time, while automatically generating the communicator’s motion from the creator’s voice.
Fig. 1. Outline of content production
By using InterPuppet, we can use a CG character to smoothly produce a TV program, explain the information shown on screen, and interact with other characters. Rich communicative movements and actions based on entrainment body rhythm are generated more easily than those obtained by the method for generating the movement of the character from just the hand motion input, thereby providing support to video content production.
2.2 System Development
We have developed a prototype of the content production system for evaluation and demonstration. Figure 2 shows the system configuration. In order to operate the character like a puppet, the system consists of a data glove (Immersion CyberGlove), a headset for providing voice input/output, a display, and a PC. The PC comprises a video card (DirectX 9.0 support), voice input/output, and a serial port (for connecting the data glove). This system uses an AT-compatible PC with Windows XP. Video content is recorded on the hard disk of the PC and can be replayed freely. Here, we use an information program that introduces a flowering plant as an example of video content. Figure 3 shows the screen composition. The screen shows the CG character and the background image. An image and a video are set up in the background. If necessary, the position and the size of the video can be changed smoothly in accordance with the composition of the TV screen. This system can produce a wide range of video content by changing the background, movie, and the appearance of the screen.
Fig. 2. System configuration
Fig. 3. Screen composition
Fig. 4. Data glove operation of InterPuppet
2.3 Operation of a Character
In general, the voice input to InterPuppet is obtained by reading out the script of the program. We have already developed InterCaster; it can produce video content by using the speaker movements of InterActor. InterCaster is used in applications such as television programs developed by InterRobot Ltd. (a laboratory venture). In InterCaster, the effects of communication movements are enhanced when the movements and actions of the listener are added to the movements of the speaker, because the action of nodding is related to the speaker's own voice. Therefore, InterCaster includes both speaker motion and listener motion. In this study, the model that generates the communicative motion of speakers and listeners is the same as that used in our previous research [6]. While generating intentional motion from hand motion, the hand motion input from the data glove is mapped to the motion of the CG character. By using the 18 sensors of the data glove, we can measure hand motion and convert it to the CG character's motion in the manner of manipulating a puppet. Figure 4 shows the relationships between character motion and data glove manipulation.
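To make the division of labour described above more concrete, the sketch below shows one possible structure for combining entrained motion estimated from the creator's voice with intentional motion mapped from the data glove. It is a deliberately simplified illustration under assumed parameters (a binary voice-activity signal, a plain moving-average estimate, an arbitrary three-channel glove mapping); it is not the published InterActor/InterPuppet model, and the renderer call at the end is hypothetical.

```python
import numpy as np


class NodEstimator:
    """Toy nod generator: a moving average over the recent on-off pattern of
    the input speech triggers a nod when it exceeds a threshold. Window size,
    weights and threshold are placeholders, not the published model values."""

    def __init__(self, window=30, threshold=0.6):
        self.weights = np.ones(window) / window      # uniform MA weights (assumed)
        self.history = np.zeros(window)              # recent voice on/off samples
        self.threshold = threshold

    def update(self, voice_on: bool) -> bool:
        self.history = np.roll(self.history, 1)
        self.history[0] = 1.0 if voice_on else 0.0
        return float(self.weights @ self.history) > self.threshold


def glove_to_pose(sensors):
    """Map a few (hypothetical) normalized glove channels in [0, 1] to joint
    angles in radians; a real system would calibrate all 18 sensors."""
    return {
        "head_pitch": (sensors[0] - 0.5) * 0.8,
        "head_yaw": (sensors[1] - 0.5) * 1.0,
        "arm_raise": sensors[2] * 1.2,
    }


def frame_update(voice_on, sensors, nodder, character):
    """Called once per video frame: intentional pose plus superimposed nod."""
    pose = glove_to_pose(sensors)
    if nodder.update(voice_on):
        pose["head_pitch"] += 0.3                    # add a nod on top
    character.apply(pose)                            # hypothetical renderer call
```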
3 Content Production Experiment 3.1 Setup In order to evaluate the effectiveness of the proposed method from the viewpoint of content creators, we performed an evaluation experiment in which a live TV broadcast was assumed (Figure 5). In this experiment, the proposed method was evaluated from the viewpoint of producing video content; the experiment was performed by following a previously established procedure. Moreover, in order to create a realistic work scene, we made a subject rehearse his/her task before recording the video content program. Figure 6 shows the experimental setup. Dummy broadcasting equipment (Sony, FXE-120) was arranged to create the atmosphere of an actual production site. Two experimenters played the roles of production staff. The experimenters handed out realistic instructions to the subject, operated the dummy broadcasting equipment, and
Fig. 5. Experimental scenery
Fig. 6. Experimental Setup
monitored the DV camera. In addition, we assumed that the programs were broadcast to many unspecified viewers. In order to make the subjects perform their tasks with the highest efficiency, we told them that their produced work would be made available to the general public for viewing. We prepared a script and video for the production of a one-minute program that introduced a flowering plant. While reading out the script, the subjects generated movement in accordance with the video. The program was recorded 15 times. In order to examine regular content production, three recordings were made in one day; such recordings were made on five different days in a total span of 16 days. A different video and script for the introduction of the flowering plant were prepared for each program. On average, the scripts had 261.8 moras, and it was possible to read out all the information in approximately 1 min. Moreover, the timing of the video was changed, and the BGM was set to a value equal to this timing. In the experiment, the following three modes of operation were compared. The program was recorded in each of these modes.
A: Hand motion input only (movements of mouth and eyes are the same as those in mode B)
B: InterActor
C: InterPuppet (A + B)
Before beginning the experiment, we confirmed that the subjects were well acquainted with the system. These subjects were briefed on the character operation method to be followed while using this system. Further, the data glove was calibrated when necessary. Figure 7 shows the procedure of the production of the program. First, the subjects used the system. The three modes were used randomly in a day. Then, these subjects watched the recorded content and evaluated their own work at the end of each trial. In order to evaluate the mental workload, the subjects generated the character motions as work. We used NASA-TLX for the evaluation of their work. Using NASA-TLX, we asked the subjects to assign weights on the basis of a paired comparison. They were asked to rate six items (mental demand, physical demand, temporal demand, own performance, effort, and frustration level) between 0 and 100.
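These weights and ratings are combined into a single weighted workload (WWL) score, as described below. A minimal sketch of that arithmetic, with invented example numbers:

```python
# NASA-TLX: six subscales; weights come from 15 paired comparisons (so they
# sum to 15) and ratings are on a 0-100 scale. The numbers below are invented.
SCALES = ["mental", "physical", "temporal", "performance", "effort", "frustration"]


def wwl(weights, ratings):
    assert sum(weights.values()) == 15, "weights must come from 15 comparisons"
    return sum(weights[s] * ratings[s] for s in SCALES) / 15.0


weights = {"mental": 4, "physical": 1, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
ratings = {"mental": 60, "physical": 35, "temporal": 55,
           "performance": 40, "effort": 70, "frustration": 30}

print(f"WWL = {wwl(weights, ratings):.1f}")   # -> WWL = 55.3
```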
Fig. 7. Time table
Then, the mean weighted workload (WWL) score, an integrated score of the mental workload obtained as a weighted average, was calculated. Moreover, the subjects used a seven-point bipolar rating scale from –3 (not at all) to 3 (extremely) in order to rate the three modes on parameters such as "enjoyment," "ease of speaking" (a subject can speak easily and smoothly), "role play," and "operation" (a subject feels that he/she can manipulate the character) from the viewpoint of body interaction support during the generation of the character movement. Further, we recorded the extent of character operation by the data glove for modes A and C. The subjects rested for a few minutes at random points during the program production work. In addition, after the completion of the recordings on the fifth (final) day, the subjects were asked to complete questionnaires for an overall evaluation of the work. The six items given below were rated on the seven-point bipolar scale for operation in modes A–C. The questionnaire addressed parameters such as "vivid movement," "movement to report content information," "rich movement," "reading efficiency," "work satisfaction," and "He/She wants to show his/her work to others." Ten Japanese students (5 males and 5 females) participated in the experiments.
3.2 Results
Figure 8 shows the result of the evaluation carried out using NASA-TLX. The result of the analysis of variance is also shown in the figure. From the figure, we can
C
B
**
80 60 40 20 0
1
2
3
4
Fig. 8. Result of NASA-TLX
5
day
364
M. Yamamoto, K. Osaki, and T. Watanabe
Fig. 9. Sensory evaluation
observe that the workload of each mode decreased with the passage of days. However, the workload in the case of mode A (hand motion input only) was higher than that in the case of the other modes. Figure 9 shows the results of the seven-point bipolar rating for the five days on which the programs were recorded. The results of the Friedman test of each mode are also shown in this figure. For the parameter "enjoyment," modes A and C were rated higher than mode B. For "ease of speaking," mode B was rated the highest on all five days, and mode C was rated higher than mode A. For "role play" and "operation," mode A was rated higher than the other modes. However, mode C was also rated high for "role play."
Fig. 10. Amount of data glove operation (in rad, per day, for modes A and C)
Fig. 11. Some head movements
Fig. 12. Self evaluation of content (ratings of modes A, B and C for: vivid movement, movement to report content information, rich movement, reading efficiency, satisfactory work, and wanting to show the work to others)
Figure 10 shows the extent of data glove operation in accordance with the hand motion input in modes A and C. The extent of operation shown in the figure is the
total extent of hand and wrist joint operation used for character operation in the program recording of 1 min. The result of a t-test is also shown in the figure. Significant differences in the extents of operations are observed after the third day. Figure 11 shows some head movements and screenshots depicting these movements in mode C. The sections highlighted in gray are those where the nodding movements of InterActor were made. The character extensively turned to the right after approximately 8 s and 12 s in modes A and C. At this time, the character viewed the video. This figure shows a typical usage of the system (The content was recorded by a subject on the fifth day). Figure 12 shows the result of the overall evaluation after the end of the experiment. The result of the Friedman test is also shown in this figure. The ratings for most of the parameters among “vivid movement,” “rich movement,” “work satisfaction,” and “He/She wants to show his/her work to others” were the highest in the case of mode C; mode A had the lowest ratings. In particular, mode C was rated higher than the other modes on the parameters “vivid movement” and “rich movement.” For the parameter “reading efficiency,” mode B was rated higher than the other modes. The ratings for the parameter “movement to report content information” were higher in the case of mode C than in the case of mode B.
4 Discussion
The subjects' evaluation changed after they participated in the experiment. The results of the evaluation using NASA-TLX show that the workload of content production decreased. In the case of modes A and C, the ratings for the parameter "role play" increased gradually, suggesting that the subjects gained experience in character operation by hand motion input. On the first day, the ratings of the parameter "enjoyment" in mode C were higher than those in mode A; on the fifth day, these ratings were almost the same. However, after the third day, a significant difference was observed between the extents of character operation in modes A and C. On the fifth day, the ratings for this parameter were high in the case of both mode A and mode C. In the case of mode B, the parameter "operation" was rated low; however, the parameter "ease of speaking" was rated high during the five days. These results demonstrate the effectiveness of InterActor in providing communications support. The ratings for the parameters "ease of speaking" and "operation" in the case of mode C were intermediate between the ratings in the case of modes A and B. In addition, the results of the evaluation using NASA-TLX and the rating for the parameter "role play" in the case of mode C were intermediate between those in the case of the other two modes. Therefore, we can conclude that mode C is well balanced between mode A and mode B.
5 Conclusion An embodied interactive character termed InterPuppet was used for the production of video content, and its performance was evaluated from the viewpoint of content creators. First, we outlined the method for video content production for making the best possible use of InterPuppet; we also described the evaluation system. A sensory
evaluation and the behavioral analysis of the production in three modes (hand motion input only, InterActor, and InterPuppet) for five days demonstrated the effectiveness of the proposed InterPuppet system. It was also found that InterPuppet received high ratings for the parameters "enjoyment," "ease of speaking," and "role play." Moreover, character motion received high ratings for the parameters "vivid movement," "movement to report content information," and "rich movement." We have developed a learning support system using InterActor, in which InterActors are superimposed on video images such as educational programs [6]. By providing support for content production, we can produce attractive content by superimposing two or more InterActors.
Acknowledgements. This work under our project "Generation and Control Technology of Human-entrained Embodied Media" has been supported by CREST (Core Research for Evolution Science and Technology) of JST (Japan Science and Technology Agency).
References
1. SIGGRAPH (2008), http://www.siggraph.org/s2008/
2. Morishima, S.: Face and Gesture Cloning for Life-like Agent. In: Proceedings of the 11th International Conference on Human-Computer Interaction, p. 2044 (2005)
3. Kobayashi, N., Ishii, T., Watanabe, T.: Quantitative Evaluation of Infant Behavior and Mother Infant Interaction. Early Development and Parenting, 23–31 (1992)
4. Watanabe, T., Okubo, M., Nakashige, M., Danbara, R.: InterActor: Speech-driven embodied interactive actor. International Journal of Human-Computer Interaction 17(1), 43–60 (2004)
5. Yamamoto, M., Watanabe, T.: Development of an Embodied Interaction System with InterActor by Speech and Hand Motion Input. In: CD-ROM of the 2005 IEEE International Workshop on Robots and Human Interactive Communication, pp. 323–328 (2005)
6. Watanabe, T., Yamamoto, M.: An Embodied Entrainment System with InterActors Superimposed on Images. In: Proceedings of the 11th International Conference on Human-Computer Interaction, p. 2045 (2005)
Autonomous Turn-Taking Agent System Based on Behavior Model Masahide Yuasa, Hiroko Tokunaga, and Naoki Mukawa School of Information Environment, Tokyo Denki University, 2-1200 Muzai Gakuendai, Inzai, Chiba, 270-1382, Japan {yuasa, tokunaga_imlab, mukawa}@sie.dendai.ac.jp
Abstract. In this paper, we propose a turn-taking simulation system using animated agents. To develop our system, we analyzed eye-gaze and turn-taking behaviors of humans during actual conversations. The system, which can generate a wide variety of turn-taking patterns based on the analysis, will play an important role for modeling the behaviors at turn-takings, such as gazes, head orientations, facial expressions and gestures. The paper describes the system concept, its functions, and implementations. The findings obtained from investigations using the system will contribute to development of future conversational systems in which agents and robots communicate with users in a lively and emotional manner. Keywords: animated agents, turn-taking, nonverbal information, conversation.
1 Introduction
In recent years, animated agents have been used in a wide range of applications, including entertainment, virtual environments, and e-commerce. In order to enrich communications between agents and humans and make them smoother, many researchers have investigated effective nonverbal behaviors of agents, such as facial expressions, voice tones, and body language. Attention has also turned to investigating not only well-known nonverbal behaviors but also turn-taking behaviors during conversations. The psychological literature has analyzed the relationship between eye-gaze behaviors and turn-taking [5, 1]. Moreover, it was found that a listener being looked at by a speaker tended to start speaking [8, 12]. Based on these findings, several existing agents and robots have a turn-taking function that detects a speaker's gaze using eye-tracking equipment [6, 9, 4]. The existing agents and robots are equipped with rigid turn-taking rules and have difficulty communicating with users in a lively and emotional manner. A typical rule is, for example, while a speaker is speaking, the hearer (agent) looks at the speaker, and then at the end of the speaker's utterance, the agent must begin speaking. We must develop a variety of other rules, as in the following three examples: 1) even if the speaker does not look at the hearer, he/she starts to speak; 2) before the speaker finishes speaking, the hearer interrupts and starts to speak; and 3) even though the speaker gazes at the hearer, the latter does not start to speak because he/she does not want to say anything. These turn-taking rules are promising for enriching communications
between agents and humans. Humans may also use other turn-taking rules in conversation, so we need to observe actual conversations and improve the conversational behavior model. In our research, we developed a turn-taking agent simulation system using new rules based on an analysis of the relationship between eye direction and turn-taking in actual conversations. The system generates various kinds of turn-taking behaviors, such as those that involve excitement, activity, silence, and boredom. A user can observe the conversations and control those behaviors of the agents that influence the conversation. Moreover, we investigated the relationship among turn-taking patterns, time lags, and the emotional expressions of the agents. We can thus also generate conversational moods between agents and humans by controlling turn-taking patterns, which will be useful for building a more sophisticated conversational model in the future. Additionally, the findings derived from observations of conversations using our system will be useful in developing agents and robots that can communicate with humans in a lively and emotional manner.
2 Previous Studies Many researchers have developed conversational robots and agents by focusing on nonverbal behaviors such as eye-gazing. In the area of humanoid robots, researchers have developed a conversational robot to take turns based on participants’ gazing behaviors and the voice directions of the speaker [6, 9, 4]. Peters et al. have developed animated agents with a mathematical model of gaze behavior in a virtual world [7]. In their research, when one agent looks at another hearer, the gaze behavior of the former attracts the attention of the hearer, then the hearer takes the next turn. Prendinger et al. developed a web-camera-based agent that interacts with humans [4]. Garau et al. have shown that when an agent’s gaze is synchronized with speech, the user’s response to face-to-face impressions is improved compared with random gaze conditions [3]. Traum et al. have developed a conversational agent that takes turns in a multiparty conversation [10]. The agent gives a simple rule-based response to the user, that is, when a user looks at an animated agent, the agent responds to the user. Duncan et al. showed several turn-taking categories (e.g., turn-maintaining, turn-yielding) and described the complexity of the turn-taking [2]. We need to take into account more complicated rules of turn-taking than the simple rule. In our research, we developed a conversational agent system that has affective and efficient turn-taking that even causes interruptions and pauses during conversations. Through complicated rules, conversational agents can communicate with human in lively and emotional ways.
3 Turn-Taking Agent Simulation System
3.1 Concept of the System
In order to build a conversational behavior model that allows agents to communicate in a lively and emotional way, we developed the turn-taking simulation system using animated agents. We
can observe various kinds of virtual conversations by controlling the agents' behaviors. By using the simulation system, we can investigate the conversational behavior model for multi-party conversation and which particular cues are important for the model. Humans may use turn-taking rules that we do not yet know; therefore, we need to observe actual conversations in detail. As a first step, we observed actual three-party conversations and built the basic conversational behavior model.
3.2 Observation of Actual Conversations
In order to make an efficient behavioral model of turn-taking, we analyzed the conversational scenes of three female university students recorded on video. As a result, we found three patterns for the selection of the next speaker at turn-takings. That is, the next speaker was selected in the following three patterns: Rule 1: the person being looked at by the previous speaker takes the next turn; Rule 2: the person not being looked at by the previous speaker takes the next turn, even though the previous speaker looks at another listener; and Rule 3: one of the listeners takes the turn even when the previous speaker looks at no one. Our analysis showed that the rate of Rule 1 was 65%, which is much higher than Rule 2 (26%) and Rule 3 (9%). Through our analysis, we found that humans use not only the simple Rule 1 but also Rules 2 and 3, which do not adhere to mechanical and orderly rules. We used probabilities for modeling these rules to develop our turn-taking agent system.
3.3 Implementation of the System
Based on the analysis of turn-taking rules in the previous section, we have developed a conversational autonomous turn-taking agent system using animated agents. In this system, we can control the probabilities of turn-taking rules, the timing of turn-taking, and facial expressions using the slide bars on the control panel (Figure 1). We applied TVML to a test bed environment [11]. We selected three characters from the TVML set as the agents that can produce various facial expressions, gaze directions, and mouth movements. The agents say random words that have no meaning, drawn from a word set (e.g., arabahika or ukujarah), preventing the meaning of the words from affecting the turn-taking.
Fig. 1. Agents and part of the control panel of our system
Based on the previous analysis, we implemented functions to control the following three probabilities:
1. (P1) The probability that the "hearer" agent who is being looked at by the previous speaker starts to speak. (A probability of 0 means "silence.")
2. (P2) The probability that the "hearer" agent who is not being looked at by the previous speaker starts to speak. (A probability of 0 means "silence.")
3. (P3) The probability that the "speaker" agent looks at someone when the agent finishes speaking. (A probability of 0 means the speaker looks down.)
By regulating these probabilities, we can generate several patterns of conversations (a simple simulation of this selection logic is sketched below, after Fig. 2). For example:
• Case 1: the most typical turn-taking pattern (P1 = 100%, P2 = 0%, P3 = 100%)
• Case 2: the pattern of two hearers starting to speak simultaneously (P1 = 100%, P2 = 100%, P3 = 100%)
• Case 3: the pattern of occasional silence (P1 = 30%, P2 = 0%, P3 = 100%)
• Case 4: the pattern in which a speaker looks down and silence occurs (P1 = 100%, P2 = 0%, P3 = 30%)
Figure 2 shows cases of turn-taking when the left agent finishes speaking and gazes at one of the hearers. Figure 2A shows Case 1, the most typical turn-taking rule, in which the person being looked at by the previous speaker starts to speak. Case 2 is a collision of speaking by the two hearers (Figure 2B): not only the hearer who is looked at by the previous speaker but also the hearer who is not looked at starts to speak. Figure 2B shows both the center agent and the right agent starting to speak. This case can represent exciting conversations in which several people want to start to speak and collisions occur frequently. Case 3 is an example in which silence occurs: even though the speaker looks at a hearer, the hearer starts to speak with a probability of only 30%, so silence may occur. Case 4 is another type of silence (Figure 2C): the speaker looks at a hearer with a probability of only 30%; occasionally, the speaker looks down, no one starts to speak, and silence occurs.
Fig. 2. Turn-taking patterns and agents' behaviors: (A) Case 1 (100%) and Case 3 (30%); (B) Case 2; (C) Case 4
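As referenced above, the following sketch shows one way the probability-driven next-speaker decision could be simulated. It is an illustrative simplification (three fixed agents, uniform random draws), not the actual TVML-based implementation.

```python
import random

AGENTS = ["left", "center", "right"]


def next_turn(speaker, p1, p2, p3, rng=random):
    """One turn-taking step: with probability p3 the speaker gazes at a hearer
    (otherwise looks down); the gazed-at hearer speaks with probability p1,
    each non-gazed-at hearer with probability p2. Returns (gazed_at, speakers);
    an empty speaker list means silence, two speakers mean a collision."""
    hearers = [a for a in AGENTS if a != speaker]
    gazed_at = rng.choice(hearers) if rng.random() < p3 else None
    speakers = [h for h in hearers
                if rng.random() < (p1 if h == gazed_at else p2)]
    return gazed_at, speakers


# Example: Case 3 (P1 = 30%, P2 = 0%, P3 = 100%) -> occasional silence.
random.seed(0)
silent = sum(1 for _ in range(1000)
             if not next_turn("left", p1=0.3, p2=0.0, p3=1.0)[1])
print(f"silence in {silent / 10:.1f}% of 1000 simulated turns")   # around 70%
```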
The system can also generate other emotional turn-taking behaviors by controlling probabilities for each of the three agents. For example, the system can generate cases in which only two agents talk and the other agent seldom takes turns; only one agent
speaks for a while and the others rarely take turns; or two agents tend to hesitate to speak and the other agent has to start speaking more frequently. In addition, we have been improving the system to use facial expressions and body postures in a stepwise manner, in order to simulate a wide variety of conversations.
4 Discussions Using our system, we generated various kinds of turn-taking patterns, including collision, silences, and others. It was found that we can generate many types of emotional conversations. Our findings are useful and efficient when designing the turn-taking and the conversations of the agents and robots. In addition, we think that turn-taking is very important factor for the mood of the conversation and the turn-taking changes the conversational mood significantly. For example, when a hearer interrupts the speaker before the speaker has finished, the mood is more exciting than using simple turn-taking. On the other hand, a case in which only one person continues to speak at length is very boring. Even if there are three people in a conversation, in a case in which only two talks, there is no chance for the other one to take turns and this is no fun. When silence occurs, we feel that a mood is not activated. Therefore, we have conducted a preliminary experiment to evaluate the patterns of conversational moods by subjective tests using volunteers. We have confirmed that the system generates activated/not activated and pleasant/unpleasant moods. We will investigate the relationship between turn-taking and the conversational mood in more detail. We have been developing a turn-taking system that can interact with a user with gaze equipment. The system detects the area of the user’s gaze using gaze-tracking equipment (Figure 3). The agents’ behaviors are controlled on TVML, and the system uses NAC Tech’s gaze tracker and speech recognition engine. By using this system, a user can communicate with two agents according to turn-taking rules in a virtual conversation. For example, when a user finishes speaking, he/she gazes at an agent, and
Fig. 3. Turn-taking system using gaze-detection equipment
Autonomous Turn-Taking Agent System Based on Behavior Model
373
then the agent starts to speak. Not only the agent gazed at by the user but also the agent not gazed at by the user might start to speak, depending on the settings of the turn-taking probabilities. Based on these findings from the three-agent system and the two-agent system involving a user, we will be able to develop conversational agents that enrich communications with humans.
5 Conclusions Based on the concept of needing more turn-taking rules to make conversations livelier and more emotional, we developed an autonomous turn-taking agent system. The system generates various conversations, not only simple ones but also collisions, silence, boredom, and more. The findings of this investigation will contribute to the designing of agent behaviors that control the conversations between humans and computers.
References
1. Argyle, M., Cook, M.: Gaze and Mutual Gaze. Cambridge University Press, Cambridge (1976)
2. Duncan Jr., S.: Some Signals and Rules for Taking Speaking Turns in Conversations. Journal of Personality and Social Psychology 23, 283–292 (1972)
3. Garau, M., Slater, M., Bee, S., Sasse, M.A.: The Impact of Eye Gaze on Communication using Humanoid Avatars. In: Proceedings of CHI 2001, pp. 309–316. ACM Press, New York (2001)
4. Eichner, T., Prendinger, H., Andre, E., Ishizuka, M.: Attentive Presentation Agents. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 283–295. Springer, Heidelberg (2007)
5. Kendon, A.: Some Functions of Gaze Direction in Social Interaction. Acta Psychologica 32, 1–25 (1967)
6. Matsusaka, Y., Fujie, S., Kobayashi, T.: Modeling of Conversational Strategy for the Robot Participating in the Group Conversation. In: Proceedings of Eurospeech 2001, pp. 2173–2176 (2001)
7. Peters, C.: Foundations of an Agent Theory of Mind Model for Conversation Initiation in Virtual Environments. In: Proceedings of AISB 2005 (2005)
8. Sacks, H., Schegloff, E., Jefferson, G.A.: A Simplest Systematics for the Organization of Turn-taking for Conversation. Language 50(4), 696–735 (1974)
9. Scassellati, B.: Theory of Mind for a Humanoid Robot. Autonomous Robots 12, 13–24 (2002)
10. Vertegaal, R., Ding, Y.: Explaining Effects of Eye Gaze on Mediated Group Conversations: Amount or Synchronization? In: Proceedings of CSCW 2002, pp. 41–48. ACM Press, New York (2002)
11. Traum, D., Rickel, J.: Embodied agents for multiparty dialogue in immersive virtual worlds. In: Proceedings of AAMAS 2002, pp. 766–773 (2002)
12. TVML, http://www.nhk.or.jp/strl/tvml/
An Interoperable Concept for Controlling Smart Homes – The ASK-IT Paradigm Evangelos Bekiaris, Kostas Kalogirou, Alexandros Mourouzis, and Mary Panou Centre for Research and Technology Hellas (CERTH) Hellenic Institute of Transport Thermi, Thessaloniki, GR-57001, Greece {abek, kalogir, mourouzi, mpanou}@certh.gr
Abstract. This paper presents an interoperable home automation infrastructure that offers new levels of mobility, accessibility, independence, comfort, and overall quality of life. Building on previous experience with similar systems and existing gaps over the full potential of automated support, both at home and on the move, new concepts and objectives are defined for R&D on smart homes. The paper outlines the proposed integrated and holistic solution, discusses design and development issues, provides indicative evaluation results emerging from a case study conducted in the European ASK-IT project, and concludes by highlighting open issues and future steps. Keywords: Smart home, Ambient assisted living, Accessibility, Infomobility.
1 Introduction
Quality of life depends heavily on the efficiency, comfort and cosiness of the place an individual calls "home". Thus, a wide range of products and systems have been invented in the past to advance human control over the entire living space. Domotics1 is a field that specializes in automation techniques for private homes, often referred to as "home automation" or "smart home technology". In essence, home environment control and automation2 extend the various techniques typically used in building automation, such as light and climate control, control of doors and window shutters, surveillance systems, etc., through the networking of ICT in the home environment, including the integration of household appliances and devices. Such solutions not only offer comfort and security, but, when serving an elderly person, an injured person or a person with disability, they can also leverage safety and individual independence. Assistive domotics represents a relatively recent effort in this direction that further specialises in the needs of people with disability, older persons, and people with little or no technical affinity, and which seeks to offer such residents new levels of safety, security and comfort, and thereby the chance to prolong their safe stay at home. Yet, a high quality of life is not only about comfort at home. The ability of moving about at will,
1 The term “domotics” is a contraction of the words “domus” (lat. = home) and “informatics”.
2 A comprehensive state-of-the-art and market analysis is offered in Manchado et al. 2009.
The ability of moving about at will, daily or not, is crucial for social inclusion. Being able to continue in high-quality employment and contribute productively to the economy is also important for self-esteem. Active participation in society, through social contacts and activities, daily economic activities such as shopping, and democratic decision-making, is key to well-being. The work presented here, motivated by the ICT tendency towards ambient intelligence and universal access to continuums of general and specific computer-based services and applications [5], aims at bridging the technological gap between domestic and urban environments. Our focus has been placed both on extending access to domotic services beyond the home space and on enabling their proper fusion with infomobility services3 and other key services that help individuals maintain their independence, especially individuals at risk of exclusion, such as older citizens, people with chronic conditions or disability, persons living in remote areas, etc. The paper presents a novel concept for smart homes4, conceived and prototyped as part of an integrated approach endorsed by ASK-IT5, which offers users greater levels of freedom, mobility and inclusion.
2 Concept and Key Objectives
Researchers in ASK-IT have been working closely with older and disabled persons for many years in order to identify opportunities where ICT can help to overcome experienced barriers and support a good life with the least possible dependency on help from family, friends or social services. Our focus in ASK-IT has been on computer-mediated services for mobility-impaired people6 (MI people, in short), such as accessibility-informed pre-trip and on-trip support for helping individuals in travel preparation and execution (e.g., to create new itineraries or collect travel information and navigation guidance regarding a trip in mind). In the same direction, the pilot system presented here has been designed to improve the typical experience of interaction with domotic systems. Ensuring ubiquitous access to, and control of, the status of private homes, including while on the move, as a means of never leaving the house (comprising electrical and electronic appliances, other residents, etc.) really unattended, helps individuals feel more comfortable with going out or travelling. In other words, innovation within the ASK-IT domotic system is two-fold:
• In-house: use the domotic end to deliver infomobility services at home.
• Outside the home: use the ASK-IT infrastructure to offer home control and home monitoring services while on the move.
Overall, the proposed approach introduces several new perspectives to R&D in home automation, as it takes into consideration key concepts and principles such as:
3 As data sources and services to encourage and enhance everyday mobility and travel.
4 Other terms used, often interchangeably, for “smart home” include “intelligent home”, “connected home”, “e-home”, and “digital home”.
5 ASK-IT (Ambient Intelligence System of Agents for Knowledge based and Integrated Services for Mobility Impaired Users) is a European Integrated Project (IST-2003-511298) within the IST 6th Framework Program in the e-Inclusion area. See http://www.ask-it.org.
6 A term used (Simões, Gomes & Bekiaris, 2006) to refer to various citizens who experience different kinds of limitations in self-powered motion or in using common transport means.
• Mobility. A primary objective is to take into account the modern citizen's emerging need for enhanced mobility and to render the home automation services in question portable and accessible through PDAs, smart phones, or remote access points. Following Salomaa, Jaworek, and Maini, 2001, the term mobility may refer to: (i) personal mobility, which concerns the mobility of people who may not necessarily carry a device, and (ii) computer mobility, which concerns the mobility of devices. Both are inherent features of ambient intelligence systems and are often addressed in conjunction, since they are both concerned with enabling access to the computing space (i.e., content and computations, either public or private) from various locations [7].
• Interoperability. Our vision entails the objective of integrating multiple services and making them ubiquitously available, offering a continuous, yet unobtrusive, experience of ambient intelligence. To this end, domotic services are integrated into a single platform, along with complementary services, such as infomobility services, eHealth, eWorking, eLearning, etc.
• e-Inclusion. A clear objective underlying this effort is to mobilise new technologies to overcome social and economic disadvantages and exclusion.
• e-Accessibility. Ensuring that everyone is able to access and utilise the domotics techniques proposed here on an equal basis, especially people with disabilities and the elderly, is critical for the success and broad acceptance of the system. A combination of “Design for All” and “Adaptive design” approaches is employed to this end, in which all components are designed to be usable by everybody, yet adaptable to particular user and context needs as a means to improve subjective usefulness and ease of use.
• Energy-saving automation. Efficient energy management needs to be the baseline for all techniques in our home automation solutions, in order to ensure the delivery of environmentally friendly concepts.
3 Overview of the System
The domotic infrastructure in question has been developed as part of the Greek pilot in ASK-IT and has been installed at the temporary premises of CERTH/HIT7. At this stage, it consists of a single room virtually divided into four areas to simulate a full flat with the following basic rooms: living room, bedroom, kitchen and bathroom (see Fig. 1). Although many techniques typically used in house automation are also used in HIT's Domotics Lab, additional functions are considered, such as the control of a multimedia home entertainment system, and environment and user interface adaptation according to various preference settings (such as automatic scenes for dinners and parties) and to diverse user profiles (deaf, blind, wheelchair user, etc.). The ambient intelligent environment is further supported by several user-friendly and accessible interfaces (see Fig. 2) for controlling home automation. In summary, interaction with the domotics functions is equally accessible through: PCs or laptops (incl. through a wheelchair control or a joystick as an alternative to traditional mouse-based interaction, offered for people with upper limb impairments), a media center, PDAs (incl. through a wheelchair control or a joystick as an alternative to traditional
7 See authors' affiliation.
Fig. 1. Overview of the various rooms simulated at HIT’s Domotics Laboratory
Fig. 2. Fixed-location access: wall-mounted display (left) and PC/Laptop (right)
stylus-based interaction), in-house wall-mounted touch panels, and mobile phones. Devices currently integrated include video cameras, one actuator (door control), one doorbell, one HVAC unit, one dimmable light, two lamps, and white goods (microwave, grill, coffee machine, etc.), while the sensors integrated include a temperature sensor, a humidity sensor, a luminance sensor and a motion detector. SMS-based services for alert propagation have been implemented, and integration with similar domotics installations in Madrid and Nuremberg has been achieved. From a top-level point of view, the system makes use of both wired and wireless network communication media. Two different middleware frameworks were used in order to test feasibility and connectivity issues over wireless transmission. The first is OSGi, which is applied to PDA and PC devices. The second is the JADE framework, which is based on a software-agent architecture and is applied to more limited devices, such as Symbian mobile and smart phones (see Sect. 4). Finally, the domotic modules are integrated with the overall ASK-IT platform, providing a single user interface under the common ASK-IT client software and thus acting as a “portal” for accessing all ASK-IT services (route planning, searching for points of interest, e-Learning, e-Working, etc. – see [2]).
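As a rough illustration of the agent-based path, the following sketch shows how a JADE agent running on a client device might request a state change from a home-side agent. It is a minimal sketch only: the receiver name (“domotics-gateway”), the ontology label and the plain-text command format are hypothetical placeholders, and the actual ASK-IT message exchange handled by the PEDA agent is not reproduced here.

import jade.core.AID;
import jade.core.Agent;
import jade.core.behaviours.OneShotBehaviour;
import jade.lang.acl.ACLMessage;

public class LampControlAgent extends Agent {
    protected void setup() {
        addBehaviour(new OneShotBehaviour(this) {
            public void action() {
                // Ask a (hypothetical) home-side agent to switch a lamp on.
                ACLMessage request = new ACLMessage(ACLMessage.REQUEST);
                request.addReceiver(new AID("domotics-gateway", AID.ISLOCALNAME));
                request.setOntology("home-automation");        // assumed ontology label
                request.setContent("switch lamp1 on");         // assumed command format
                myAgent.send(request);
                ACLMessage reply = myAgent.blockingReceive();  // wait for the confirmation
                System.out.println("Gateway replied: " + reply.getContent());
            }
        });
    }
}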
Fig. 3. Remote access to and control of domotics (while away from home)
Fig. 4. Mobile access and control of the home environment
An innovative aspect of this work is the mobility and freedom offered to the user with regard to accessing and controlling home environment automation. In order to support the modern citizen in travelling around the city and the world, the project has produced equivalent interfaces for Personal Digital Assistants (PDAs) and smart phones (see Fig. 3 and Fig. 4). In this way, users can, from anywhere in the world, alter or be informed about the status of their home appliances and intelligent systems. For instance, the user may turn the oven on to warm up the food while in the car on the way home, and remain assured that the system will act smartly, for example if smoke is detected. Current emergency messaging solutions include direct messaging through pop-up messages on the closest display and/or SMS-based text messaging, as appropriate.
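The fragment below sketches the kind of safety rule this scenario implies: if smoke is detected while the oven is on, the oven is switched off and an alert is propagated over the available notification channels (SMS gateway, nearest display). The Appliance and Notifier interfaces and the alert text are illustrative assumptions, not the ASK-IT implementation.

import java.util.List;

interface Appliance {
    boolean isOn();
    void switchOff();
}

interface Notifier {
    void notifyUser(String message);   // e.g. an SMS gateway or the nearest display
}

class SmokeAlarmRule {
    private final Appliance oven;
    private final List<Notifier> channels;

    SmokeAlarmRule(Appliance oven, List<Notifier> channels) {
        this.oven = oven;
        this.channels = channels;
    }

    /** Called by the sensor layer whenever the smoke detector fires. */
    void onSmokeDetected() {
        if (oven.isOn()) {
            oven.switchOff();                 // act autonomously first
        }
        for (Notifier n : channels) {         // then propagate the alert
            n.notifyUser("Smoke detected in the kitchen; the oven has been switched off.");
        }
    }
}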
4 Development Considerations
As shown in Fig. 5, there are in general two different categories of networks. The Local Devices Network (LDN) connects all stationary and mobile devices inside the home into an overall network. The LDN has a heterogeneous structure, which means that more than one communication medium is used, e.g., Bluetooth, powerline, twisted pair or ISM (free radio-frequency band). Bluetooth is required to connect a mobile phone to the LDN, since the phone is available as a common user interface when the MI user is at home. However, because of its higher node cost and restricted communication range, Bluetooth is not appropriate for controlling all devices at home directly. Therefore, a gateway is used that translates the Bluetooth commands to other communication media, fulfilling the requirements for low cost and easy installation. Examples of appropriate communication media are Konnex PL132 (powerline), EIB (twisted pair) or RF433 (ISM radio frequency). Wide-area networks (WANs), such as the GSM network or the Internet, are used to supervise and control the domestic devices from anywhere outside the home. As access platforms, various devices are taken into account, including mobile phones, PDAs and notebooks.
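A schematic sketch of such a gateway is given below: commands arriving over the Bluetooth link are forwarded to whichever home bus the target device is attached to. The BusDriver interface, the device identifiers and the command strings are hypothetical, introduced only to illustrate the translation step between media.

import java.util.HashMap;
import java.util.Map;

interface BusDriver {
    void sendCommand(String deviceId, String command);   // e.g. a Konnex PL132, EIB or RF433 driver
}

class HomeGateway {
    private final Map<String, BusDriver> busByDevice = new HashMap<>();

    void registerDevice(String deviceId, BusDriver bus) {
        busByDevice.put(deviceId, bus);
    }

    /** Invoked whenever a command string is received over the Bluetooth link. */
    void onBluetoothCommand(String deviceId, String command) {
        BusDriver bus = busByDevice.get(deviceId);
        if (bus != null) {
            bus.sendCommand(deviceId, command);   // re-issue the command on the device's own medium
        }
    }
}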
Fig. 5. ICT networking at HIT’s domotic system
Fig. 6. HIT's domotic lab system architecture (sensors → control unit → web service)
HIT's domotic site has followed the same approach as all other ASK-IT services. The domotics web service is integrated into the Data Management Module on the server side (see [12] for a detailed presentation of the ASK-IT architecture and modules). Ontological objects have been defined in order to bind and map the ASK-IT server side to the domotic web service. The communication is handled by the PEDA agent on the device side, and the JADE middleware controls the message exchange between client and server. The user can thus control devices remotely via a GPRS or WLAN connection and change their status while away from home. HIT's domotic lab uses the oBIX protocol; oBIX stands for Open Building Information eXchange and is an industry-wide initiative to define XML and web service mechanisms for building control systems. The sensors are connected to the main electrical board via a multi-plug interface panel. Each sensor, whether wired or wireless, occupies a separate plug on the panel. The web service has been generated with the oBIX protocol features included as an external library; this is fully transparent to the developer. The “output” to the outside world is the generated web service, packaged as a .jar file. Fig. 6 depicts the general system architecture of HIT's domotic lab. Following the overall ASK-IT approach, the user interface of the desktop application (PCs and laptops) has been developed using the Java Swing library (version 1.3), while the user interface of the PDA application has been developed using the Java AWT library. Adaptation is realised by employing the Decision Making Specification Language (DMSL) engine and run-time environment [10], which offer a powerful rule definition mechanism and promote scalability by utilising external rule files, while relieving the actual UI implementation code of any adaptation-related conditionality. The mobile user interface was developed using the MID Profile 2.0 API. The javax.microedition.lcdui.game.GameCanvas class was used, which builds on the Canvas class. Each room is rendered as a separate canvas, and each device icon occupies a specific interaction area on it. The GUI includes a cursor to easily point at and select the preferred option icon (room, device, etc.). If the user has a colour-vision impairment, the GUI is transformed into black-and-white display layers.
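To make the mobile UI structure concrete, the sketch below shows a GameCanvas-based room view of the kind described: device icons occupy rectangular interaction areas, a cursor is moved with the arrow keys, FIRE selects the icon under the cursor, and a monochrome flag approximates the black-and-white mode. The device names, icon geometry and selection hook are illustrative assumptions rather than the actual ASK-IT code.

import javax.microedition.lcdui.Graphics;
import javax.microedition.lcdui.game.GameCanvas;

class RoomCanvas extends GameCanvas {
    private final String[] devices = { "Lamp", "HVAC", "Door" };
    private final int[][] iconAreas = { { 10, 20, 40, 40 }, { 60, 20, 40, 40 }, { 110, 20, 40, 40 } };
    private int cursorX = 5, cursorY = 5;
    private final boolean monochrome;        // activated for colour-impaired profiles

    RoomCanvas(boolean monochrome) {
        super(true);                         // suppress normal key events; poll key states instead
        this.monochrome = monochrome;
    }

    /** One step of the polling loop: move the cursor, handle selection and redraw. */
    void update() {
        int keys = getKeyStates();
        if ((keys & LEFT_PRESSED) != 0)  cursorX -= 2;
        if ((keys & RIGHT_PRESSED) != 0) cursorX += 2;
        if ((keys & UP_PRESSED) != 0)    cursorY -= 2;
        if ((keys & DOWN_PRESSED) != 0)  cursorY += 2;
        if ((keys & FIRE_PRESSED) != 0)  selectDeviceUnderCursor();

        Graphics g = getGraphics();
        g.setColor(monochrome ? 0xFFFFFF : 0xCCE5FF);           // background
        g.fillRect(0, 0, getWidth(), getHeight());
        g.setColor(monochrome ? 0x000000 : 0x003366);
        for (int i = 0; i < devices.length; i++) {
            int[] a = iconAreas[i];
            g.drawRect(a[0], a[1], a[2], a[3]);                 // the icon's interaction area
            g.drawString(devices[i], a[0], a[1] + a[3], Graphics.TOP | Graphics.LEFT);
        }
        g.fillRect(cursorX, cursorY, 4, 4);                     // simple square cursor
        flushGraphics();
    }

    private void selectDeviceUnderCursor() {
        for (int i = 0; i < devices.length; i++) {
            int[] a = iconAreas[i];
            if (cursorX >= a[0] && cursorX <= a[0] + a[2]
                    && cursorY >= a[1] && cursorY <= a[1] + a[3]) {
                // Hypothetical hook: toggle the device through the domotics web service.
                System.out.println("Selected " + devices[i]);
            }
        }
    }
}

In the actual application a periodic timer or game loop would call update(), and the selection hook would go through the agent/web service path described above.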
5 Design Considerations Focusing on Accessibility
CERTH/HIT's domotic lab has also been designed with particular attention, among others, to the needs of wheelchair users and visually impaired users. For the design of the services of the domotics prototype in question, the differences in the communication abilities of the elderly and of various disability types were considered by taking into account the design guidelines collected in MOSAIC-HS for user interfaces to home environment control [1]. In the interface design of the control (touch) panel, particular attention was paid to the needs of the elderly, people with low vision, and people with colour impairments (see Fig. 7). Furthermore, the home automation system supports adaptation through the activation of different disability profiles. For instance, when the deaf user profile is activated, the various acoustic messages are additionally rendered as equivalent visual messages on the user's nearest displays, including when the doorbell is ringing. In addition to the various pop-up messages on the various screens (media center, PDA, etc.), the lighting system is also used to communicate such a message: in the doorbell example, the lights of the current room flash for 3 seconds. In addition, physical accessibility has been considered; e.g., the wall-mounted displays are placed at a lower height so that they can be reached by individuals in wheelchairs, following key guidelines from the TELAID project [6] regarding the design of public access terminals for use by elderly and disabled people (e.g., locating kiosks, display/control design, information requirements). For instance, TELSCAN's collaborative testing of booking terminals with the SAMPLUS project found that a terminal height of 90 cm would be acceptable for more people, as long as the angle of the terminal display was adjustable (between about 30 and 45 degrees). As the ASK-IT project paid particular attention to the needs of wheelchair users, special software was developed to allow interacting with the ASK-IT devices and applications through a wheelchair steering control (joystick) by means of a wireless connection (Bluetooth and infrared). In this way, the system supports the scenario of switching between steering the wheelchair (inside or outside the house) and interacting with any ASK-IT service, including home automation options, through a PDA device, for example to unlock the front door for a friend ringing the doorbell. In terms of software adaptation, the UIs can adapt to the user's profile, and all input devices are interfaced as “plug and play” devices that the host machines (media
Fig. 7. Touch panel GUI views: Rooms view (homepage), Devices view (bedroom, 2 of 2) and Sensors view (1 of 2). Icon-based design, large fonts and high contrast are the main GUI characteristics of the touch panel units, ensuring accessibility and usability for elderly and vision-impaired users
center, PC, PDA, etc.) recognise as standard input devices. To this end, the Unified User Interfaces methodology and architecture [9] were followed, and the DMSL language mentioned above was employed to define the user interface of the domotics application on PDAs. This approach is further detailed in Leuteritz et al. 2009 [3]. In this context, a number of UI elements were designed in various forms (polymorphic task hierarchies) according to specific user- and context-parameter values. In this way, the user may change on the fly the text font family and size, text colours, background colours, etc. according to his/her preferences or interaction needs. Similarly, the mobile user interface may change font size, family and colours as a means of adapting to specific user preferences. In the ASK-IT case study, the correlation of the various alternative designs of UI elements to user- and context-related parameters (i.e., the adaptation according to generic, predefined profiles) has been made on a normative basis. Therefore, it cannot currently be claimed to be optimal, and needs to be further elaborated and verified in the future through feedback from user trials in real contexts. However, this work has made clear that the proposed approach allows embedding in PDA and mobile applications such decision-making logic and automatic adaptation facilities, for the benefit of accessibility and a better user experience.
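The sketch below illustrates, in plain Java, the kind of profile-driven selection of presentation attributes described above. It stands in for the externalised DMSL rule files rather than reproducing their syntax, and the profile attributes and chosen values are assumptions made for the example.

class PresentationSettings {
    String fontFamily = "SansSerif";
    int fontSize = 12;
    int foreground = 0x000000;
    int background = 0xFFFFFF;
    boolean visualAlertsInsteadOfSound = false;
}

class UserProfile {
    boolean lowVision;
    boolean colourImpaired;
    boolean deaf;
    boolean elderly;
}

class AdaptationEngine {
    /** Maps a predefined user profile to one of the alternative UI designs. */
    PresentationSettings adapt(UserProfile p) {
        PresentationSettings s = new PresentationSettings();
        if (p.lowVision || p.elderly) {
            s.fontSize = 22;                       // large fonts for readability
        }
        if (p.colourImpaired) {
            s.foreground = 0x000000;               // high-contrast black on white
            s.background = 0xFFFFFF;
        }
        if (p.deaf) {
            s.visualAlertsInsteadOfSound = true;   // e.g. flash the room lights when the doorbell rings
        }
        return s;
    }
}

Keeping such decisions in one place (or, as in ASK-IT, in external rule files) is what relieves the UI code itself of adaptation-related conditionality.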
6 Preliminary Evaluation Results
Stand-alone trials were conducted on five different occasions between June 6 and July 11, 2008. The ASK-IT services reviewed were: domotics (our focus here), planning of urban/interurban/national trips, POI and social event search, e-Learning, and in-vehicle navigation support. Five (5) male and four (4) female users (mean age: 42.75±16.2) participated in the tests. Elderly users (3) and wheelchair users (3), as well as people with hearing and upper/lower limb impairments, participated in the tests as the mobility-challenged user types within the ASK-IT framework.
Fig. 8. Usability findings (median values, 1 = very high to 5 = very low)
Fig. 9. User acceptance findings (median scores, 1 = very high to 5 = very low)
In general, all participants agreed that ASK-IT offered a complete solution that satisfies their needs (see Fig. 8 and Fig. 9). Yet, the elderly participants mentioned being uncomfortable working with such technologies and may need extensive training. All participants agreed that this solution could potentially: be used frequently; make them feel more confident; take away stress and offer more freedom of movement; and offer freedom of choice. The [remote] domotics were graphically appealing to all MI groups. The ASK-IT domotics were perceived as the least difficult to learn and use among the ASK-IT services, and participants were willing to own such a home automation system. In fact, installation and system usage costs were the most common post-debriefing questions.
7 Conclusions and Future Work
The evaluation data from the integrated tests are still being processed and shall be presented in further detail elsewhere. This process is anticipated to bring forward new dimensions with regard to the concept and design of mobility- and accessibility-informed domotics. For instance, early tests with indicative users showed that the provision of effective and efficient human control over the dynamic and distributed system is also critical. In particular, it is now clear that it will be necessary to establish an appropriate balance between automated learning on the part of the intelligent environment, human behaviour patterns, and human intervention aimed at directing and modifying the behaviour of the environment. This aspect of the emerging technologies needs to be carefully taken into account, particularly when elderly and cognitively disabled people are involved, as services that monitor the health status or the location of users may also interfere with their capacity for taking decisions. Future work will now focus on the upcoming results of the ongoing user tests and on fine-tuning the interaction concepts and designs on all the employed platforms. Then, in future research, efforts shall focus on the intelligence of the underlying system, on co-morbidity issues, and on
collaborative interfaces for houses shared by more than one person, with diverse abilities, skills, preferences and needs. Indoor user localisation and identification mechanisms are a key issue in this direction, while the ethical issues involved require that all future concepts evolve around consensus-building processes among real users and technology designers.
References 1. Bekiaris, E., Portouli, E.: Existing guidelines on user interface software for home environment control for elderly and disabled users. Deliverable 7.1.1 of the MOSAIC-HS project (TIDE No DE3007), Commission of the European Communities (1998) 2. Konstantinopoulou, L., Amditis, A., Vlachos, F., Emmanouilidis, V., Orthopoulos, Y., et al.: Reissued Services Specifications. Deliverable D5.7.1 of the project ASK-IT (Contract No. IST-2003-511298), Commission of the European Communities (2006) 3. Leuteritz, J.-P., Widlroither, H., Mourouzis, A., Panou, M., Antona, M., Leonidis, A.: Development of Open Platform Based Adaptive HCI Concepts for Elderly Users. In: Stephanidis, C. (ed.) Proceedings of 13th International Conference on Human-Computer Interaction (HCI International 2009), San Diego, California, USA, July 19-24. Springer, Berlin (2009) 4. Manchado, P., Bekiaris, E., Chalkia, E., Mourouzis, A., Panou, M., et al.: Market analysis. Deliverable 5.4.1 of the OASIS project (Grant Agreement no. 215754). Commission of the European Communities (2009) 5. Mourouzis, A., Stephanidis, C.: Universal Access to information and services for users on the move. In: Proceedings of the 1st ASK-IT International Conference Mobility for All – The Use of Ambient Intelligence in Addressing the Mobility Needs of People with Impairments: The Case of ASK-IT, Nice, the French Riviera, France, October 26-27 (2006) 6. Nicolle, C., Burnett, G. (eds.): TELSCAN Code of Good Practice and Handbook of Design Guidelines for Usability of Systems by Elderly and Disabled Travellers. Deliverable 5.2 of the project TELSCAN, Commission of the European Communities (1999) 7. Roman, G.-C., Picco, G., Murphy, A.: Software engineering for mobility: A roadmap. In: Proceedings of the 22nd International Conference on Software Engineering (2000) 8. Salomaa, J.D., Maini, W.: Accessibility and Mobile Phones. In: Proceedings of the CSUN’s Sixteenth Annual International Conference Technology and Persons with Disabilities, March 19-24 (2001), http:// www.csun.edu/cod/conf2001/proceedings/0156salomaa.html 9. Savidis, A., Stephanidis, C.: Unified User Interface Design: Designing Universally Accessible Interactions. International Journal of Interacting with Computers 16(2), 243–270 (2004) 10. Savidis, A., Antona, M., Stephanidis, C.: A Decision-Making Specification Language for Verifiable User-Interface Adaptation Logic. International Journal of Software Engineering and Knowledge Engineering 15(6), 1063–1094 (2005) 11. Simões, A., Gomes, A., Bekiaris, E.: Use Cases. Deliverable 1.1.2 of the ASK-IT project (IST-2003-511298), Commission of the European Communities (2006) 12. Vlachos, F., Konstantinopoulou, L., Bimpas, M., Amditis, A., Spanoudakis, N., et al.: System Architecture Concept methodologies and tools. Deliverable 5.7.3 of the ASK-IT project (IST-2003-511298), Commission of the European Communities (2006)
Towards Ambient Augmented Reality with Tangible Interfaces Mark Billinghurst, Raphaël Grasset, Hartmut Seichter, and Andreas Dünser The Human Interface Technology New Zealand (HIT Lab NZ), University of Canterbury, Private Bag 4800, Christchurch, New Zealand {mark.billinghurst, raphael.grasset, hartmut.seichter, andreas.duenser}@hitlabnz.org
Abstract. Ambient Interface research has the goal of embedding technology that disappears into the user's surroundings. In many ways Augmented Reality (AR) technology is complementary to this, in that AR interfaces seamlessly enhance the real environment with virtual information overlay. The two merge together in context-aware Ambient AR applications, which allow users to easily perceive and interact with Ambient Interfaces by using AR overlay of the real world. In this paper we describe how Tangible Interaction techniques can be used for Ambient AR applications. We present a conceptual framework for Ambient Tangible AR Interfaces, a new generation of software and hardware tools for their development, and methods for evaluating Ambient Tangible AR Interfaces. Keywords: Augmented Reality, Ambient Interfaces, Tangible Interfaces.
1 Introduction
One of the overarching goals of human–computer interaction is to make the computer vanish and to allow technology to invisibly assist people in their everyday real-world tasks. Over the last several decades there have been a number of compelling visions of how this may be achieved, such as Weiser's concept of Ubiquitous Computing [1], Norman's Invisible Computing [2] and Dourish's ‘Embodied Interaction’ [3]. Similar to this earlier work, Ambient Intelligence [4] has the goal of embedding technology that disappears into the user's surroundings. Ambient Intelligence, or AmI, typically refers to electronic environments that are sensitive and responsive to the presence of people. In developing invisible interfaces, providing information display back to the user is a key element. Many AmI applications use traditional screen-based or projected displays. However, one of the more interesting approaches to information display is Augmented Reality. Augmented Reality (AR) applications are those in which three-dimensional computer graphics are superimposed over real objects, typically viewed through head-mounted or handheld displays [5]. In many ways Augmented Reality technology is complementary to AmI in that AR interfaces seamlessly enhance the user's real environment with virtual information
Fig. 1. An Ambient AR interface showing real time temperature information superimposed over the real world
overlay. The two merge together in context-aware Ambient AR applications, which allow users to easily perceive and interact with Ambient Interfaces by using AR overlay of the real world. For the time being, display and tracking technologies are only stepping stones towards achieving truly non-intrusive interfaces. Ambient AR applications are those which use AR technology to represent context information from an Ambient Interface. For example, Rauhala et al. [6] have developed an AR interface which shows the temperature distribution of building walls. In this case they embedded temperature sensors in room walls and then wirelessly sent the temperature information to a mobile phone AR interface. When users pointed their phone at the wall, they could see on the phone screen a virtual image of the current temperature distribution superimposed over a live video of the wall. In this way AR technology provides a natural way to make visible the invisible context information captured by the Ambient Interface application. Although AR technology is very promising, there is still a lot of research that needs to be conducted on how to interact with Ambient AR applications. Although substantial research has been conducted in Augmented Reality, much of it has focused on the underlying technology (such as tracking and display devices) rather than on the user experience and interaction techniques. Interaction with AR environments has usually been limited to either passive viewing or simple browsing of virtual information registered to the real world. Few systems provide tools that let the user interact with, request or modify this information effectively and in real time. Furthermore, even basic interaction tasks, such as manipulation, copying, annotating, and dynamically adding and deleting virtual objects in the AR scene, have been poorly addressed. In our research we are exploring new interaction techniques for Ambient AR interfaces. In this paper we describe how Tangible Interaction concepts can be used to design Ambient AR applications. Users already know how to manipulate real-world objects, so by building interaction techniques around object manipulation very intuitive interfaces can be developed. We describe a Tangible Interface design framework and subsequently demonstrate how it can be used to support the development of Ambient AR Interfaces. Based on this, we introduce our in-house authoring tools that can be used to develop Ambient AR interfaces based on Tangible AR techniques, and finally look at the methodologies and evaluation concepts relevant to these techniques.
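As a toy illustration of the sensor-to-overlay mapping such an application needs (and not Rauhala et al.'s actual implementation), the helper below converts a wall temperature reading into a semi-transparent colour that an AR view could blend over the live camera image; the temperature range and alpha value are arbitrary assumptions.

class TemperatureOverlay {
    /** Linearly maps a temperature in [minC, maxC] to a blue-to-red ARGB colour. */
    static int colourFor(double tempC, double minC, double maxC) {
        double t = Math.max(0.0, Math.min(1.0, (tempC - minC) / (maxC - minC)));
        int red   = (int) (255 * t);
        int blue  = (int) (255 * (1.0 - t));
        int alpha = 128;                 // semi-transparent, so the real wall remains visible
        return (alpha << 24) | (red << 16) | blue;
    }
}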
2 Related Work
Our research work builds on earlier work in the areas of Augmented Reality, Ambient Interfaces and Tangible User Interfaces. Tangible user interfaces (TUIs) are physical objects which translate user actions into input events in the computer interface [7]. Thus, tangible user interfaces make virtual objects accessible to the user through physical proxies. TUIs can be seen as an implementation of the ‘direct manipulation’ concepts described by Shneiderman [8]. Though not conceived as TUIs as such, Shneiderman describes parameters like immediacy and haptic quality as important concepts for fostering physical engagement with an object, for the purpose of lowering the user's mental load. Shneiderman's idea of ‘direct manipulation’ is a response to the abstract nature of digital interfaces as they evolve over time. Over a relatively short period of time, more and more features have accumulated in digital interfaces. In addition, legacy functionality was retained, and consequently legacy modes of interaction remain. In turn, new, emerging interfaces are forced to circumvent ambiguities by reinventing modes of engagement, even with tangible objects. Modes of ‘direct manipulation’ therefore need new interfaces or a removal of legacy modes. Shneiderman identifies three core properties of ‘direct manipulation’ [8]:
• Continuous representation of the object of interest
• Physical actions or labeled button presses instead of complex syntax
• Rapid, incremental, reversible operations whose impact on the object of interest is immediately visible.
Hutchins et al. [9] elaborated, on the basis of user observation, that direct manipulation achieves two main aspects of information retrieval: first, the user is relieved from interpreting the representation and is consequently able to focus on the goal rather than the process; second, the mnemonics are tied to an external instance and as such do not change modes. Therefore a direct link between object and action is maintained – a crucial concept for TUIs in augmented reality. Tangibility and the need for direct manipulation can be identified as important concepts for interaction with ambient interfaces. Physical representation is only a small part of these concepts. More important is the representation of flow and logic, which provides an important clue to the understanding of information. The ‘Universal Constructor’ developed by Frazer et al. [10] used the metaphor of urban structures as a networked system. Physical cubes represent the spatial relationship of autonomous working units interconnected as nodes. Hence, it was essential for the users of the installation to understand the flow of information between the nodes; ambient meaning was conveyed through the link between the real and virtual objects. Current tangible interfaces provide very intuitive manipulation of digital data, but limited support for viewing 3D virtual objects. For example, in the Triangles work [11], physical triangles are assembled to tell stories, but the visual representations of the stories are shown on a separate monitor distinct from the physical interface. Presentation and manipulation of 3D virtual objects on projection surfaces is difficult, particularly when trying to support multiple users, each with an independent viewpoint. In contrast, Augmented Reality technology can be used to merge the display space and the interaction space. So we believe that a promising new AR interface metaphor can
arise from combining the enhanced display possibilities of Augmented Reality with the intuitive manipulation of Tangible User Interfaces. We call this combination Tangible Augmented Reality. In the next section we describe the Tangible AR metaphor in more detail and show how it can be used to provide seamless interaction in Ambient Intelligence interfaces.
3 Ambient AR Interface Conceptual Framework
The combination of TUI and AR techniques provides an interaction metaphor which we call Tangible Augmented Reality (Tangible AR) [12]. Tangible AR interfaces are those in which: 1) each virtual object is registered to a physical object, and 2) the user interacts with virtual objects by manipulating the corresponding tangible objects. So, in the Tangible AR approach the physical objects and interactions are as important as the virtual imagery, and they provide an intuitive way to interact with the AR interface. One of the most important outcomes of developing the Tangible AR metaphor is that it provides a set of design guidelines that can be used to develop effective interfaces. In designing Tangible AR interfaces there are three key elements that must be considered (Figure 2):
(1) The physical elements in the system
(2) The visual and audio display elements
(3) The interaction metaphor that maps interaction with the real world to virtual object manipulation.
A Tangible AR interface provides true spatial registration and presentation of 3D virtual objects anywhere in the physical environment, while at the same time allowing users to interact with this virtual content using the same techniques as they would with a real physical object. So an ideal Tangible AR interface facilitates seamless display and interaction. This is achieved by using the design principles learned from Tangible User Interfaces, including:
• The use of physical controllers for manipulating virtual content.
• Support for spatial 3D interaction techniques (such as object proximity).
• Support for both time- and space-multiplexed interaction.
• Support for multi-handed interaction.
• Matching the physical constraints of the object to the task requirements.
• The ability to support parallel activity with multiple objects.
• Collaboration between multiple participants.
We can extend the previous framework to the context of ambient applications, as illustrated in Figure 3. In this environment, the control of a tangible interface can act on sensors or actuators embedded in the environment, with Augmented Reality providing support for visual information. For example, different tangible cubes can be shifted and rotated on the surface of a table to change the lighting intensity and colour in a
Fig. 2. Tangible AR Interface Components
room. By complementing this with Augmented Reality, the user can have a virtual preview of the final result. Using another interaction metaphor, a user could move a tangible cube on a 2D plane to control the distribution of airflow in a room from real fans. Compared to a more traditional remote command interface, the user benefits from the more intuitive interaction of a TUI, which exploits spatial constraints and a better control/display mapping (in our scenario, moving the cube to a certain location concentrates airflow in that location). We can thus re-use the previously described framework in the context of an Ambient Application. However, some inherent characteristics of Ambient Applications can be readily illustrated here. Firstly, the output components will not only be virtual but also physical (actuators, sensors, etc.). Secondly, Ambient AR Applications intrinsically use sparse distributions of sensors/actuators within a room, building, or urban environment. Finally, the definition and design of an efficient interaction metaphor will certainly be more challenging, since the user manipulates physical elements in close proximity while visual feedback can be provided at a distance (e.g., interaction on the table vs. sensors mounted in the room). We can thus redraw the previously presented Tangible AR metaphor incorporating these new factors, as illustrated in Figure 4.
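A minimal sketch of the control/display mapping in this scenario is given below: the tracked pose of a tangible cube on the table is mapped both to the room lighting and to per-fan airflow power. The pose source, the actuator interface and the mapping constants are assumptions introduced for illustration, not part of the framework itself.

class CubePose {
    double x, y;          // position on the table, normalised to [0, 1]
    double rotationDeg;   // rotation about the vertical axis
}

interface RoomActuators {
    void setLight(double intensity, double hueDeg);   // 0..1 intensity, 0..360 hue
    void setFanPower(int fanIndex, double power);     // 0..1 per fan
}

class TangibleLightAndAirController {
    private final RoomActuators room;
    private final double[][] fanPositions = { { 0.0, 0.0 }, { 1.0, 0.0 }, { 0.5, 1.0 } };

    TangibleLightAndAirController(RoomActuators room) {
        this.room = room;
    }

    /** Called by the tracking layer whenever the cube is moved or rotated. */
    void onCubeMoved(CubePose pose) {
        // Rotation selects the colour; distance from the table centre sets the intensity.
        double intensity = 1.0 - Math.hypot(pose.x - 0.5, pose.y - 0.5);
        room.setLight(Math.max(0.0, intensity), pose.rotationDeg % 360.0);

        // Airflow is concentrated near the cube: closer fans receive more power.
        for (int i = 0; i < fanPositions.length; i++) {
            double d = Math.hypot(pose.x - fanPositions[i][0], pose.y - fanPositions[i][1]);
            room.setFanPower(i, Math.max(0.0, 1.0 - d));
        }
        // An AR layer would render a virtual preview of the resulting state over the table.
    }
}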
Fig. 3. On the left, a traditional Tangible AR application; on the right, a Tangible Interface used with an Ambient Augmented Reality application
Fig. 4. Ambient Tangible AR Interface
4 Ambient Augmented Reality – The Tools
In order to develop Ambient AR applications there is a need for tools for programmers and application developers. Higher-level AR authoring tools address the need for interactivity, and thus user input of any kind, in AR environments. They are essential in providing a pathway for interface designers to prototype or create ambient interfaces. These tools can be developed on different levels: high-level, GUI-driven, feature-rich tools and frameworks for programming environments, and low-level development libraries for computer vision or input fusion. Mostly, one level sits on top of the other, and the various authoring tools are geared towards a certain application domain.
4.1 Authoring Software
A number of software tools have been developed for high-level AR authoring. For example, DART [13], a plug-in for Adobe Director, inherently has access to a wealth of pre-existing infrastructure. ImageTclAR [14] introduced a more rigid framework which was only capable of compile-time definition of interactions. APRIL [15], in comparison, addresses the connection between real and virtual environments with a much higher-level framework. It provides an extensible AR authoring platform based on XML descriptions; however, interactions are implemented in non-interpretive languages addressed through the XML parser. At the HIT Lab NZ we have developed ComposAR [16], which is unique compared to other AR authoring tools in its ability to support different levels of interaction. We followed an approach similar to earlier work by Hampshire et al. [17], extending the notion of a fiducial marker into that of a sensor. The intermediate level of the system implements an action-reaction mechanism imitating a Newtonian physics paradigm. To distinguish the different levels at which input and output are connected, we describe the chain of events through Sensors, Triggers and Actions. Sensors provide a raw live data stream into the authoring environment. All physical devices, including keyboards, mice and other conventional input devices, are sensors. The data provided by sensors is elevated to the state of “information” only once it is interpreted by a Trigger, which evaluates the input and decides whether or not to invoke an Action. An example of this process is the monitoring of the visibility of a
marker. Currently ComposAR provides some basic interaction methods based on a standard repertoire common in AR applications, including interaction based on fiducial proximity, occlusion, tilting and shaking. Through this rather rough abstraction, ComposAR can provide a convenient way to create a broad variety of interfaces, including ambient displays and simulations. As ambient interfaces interact with low-level data, this methodology allows us to quickly create demonstrations which react to data from an RSS feed or the tilt sensor in a desktop computer. As this approach is network-transparent, displays and sensors can be meshed together.
4.2 Hardware Authoring
In addition to software authoring tools for Ambient AR interfaces, there is a need for hardware tools for rapid prototyping. The design and development of Tangible AR interfaces has demonstrated the need for tools that can easily integrate physical actuators and sensors into Ambient AR applications. By combining a Tangible Interface with intelligent sensors, users can benefit from a new range of design possibilities such as kinetic movement, skin sensitivity, and sustainable power sources. In the past it was difficult to explore such designs because of the high level of hardware skills required and the difficulty of software integration with this technology. However, more affordable and intuitive solutions have emerged. Simple programmable microcontroller boards (like the Arduino [18]) are available and can be remotely read or controlled from a standard PC. Similarly, USB and wireless components have also become simpler and easier to integrate into an electronic interface. In this context, the research community has pursued the goal of creating physical computing elements that can be easily integrated into the user interface. In this section we present a few examples from this category. The Phidgets toolkit [19] combines a set of different independent sensors that are interfaced through a USB port. A low-level software interface allows users to connect them at run-time and access them through a .NET interface. The CALDER toolkit [20] introduces a similar approach, but also adds wireless components. The iStuff framework [21] facilitates the support of distributed sensors and actuators through a transparent network interface and an API based on system events. Papier-Mache [22] supports RFID in an event-based toolkit integrated with other technologies like computer vision tracking. It also offers high-level tools for the developer, through a visual debugging interface, and monitoring tools. Most of these toolkits are designed for developers with good programming skills and a good understanding of hardware, and so are inaccessible to many potential users. Furthermore, few of these toolkits are integrated with more general libraries for developing Augmented Reality applications. One way to make these tools more accessible is by using a visual programming interface. For example, in the Equator project, ECT has a visual interface for building applications that support a large range of physical sensors [23]. Support for AR applications has recently been added to this library (see [17]). More recently, we have also added support in ComposAR [16] for physical input devices [24]. We have been developing a generic hardware toolkit supporting a large range of physical actuators and sensors. Our toolkit, Pandora's Box, is a multi-language library which uses a client-server approach to access different types of hardware toolkits
(e.g., Arduino, an in-house toolkit, T3G, etc.). This tool is being integrated into the ComposAR software toolkit, and by doing so we are creating an all-in-one solution for the easy development of physical, ambient and visually augmented interfaces. Nonetheless, a number of research areas remain. Providing a transparent interface for multiple hardware devices is still challenging and requires further development and testing. Standardization, and providing a more generic interface for electronic boards and sensors, will help with this issue. Research also needs to be conducted on the representation and interaction techniques for giving the user access to these sensors. Issues include: how can sensors be visually represented? How can they be easily configured? How can sensors be combined and high-level information provided to the end-user in a relevant way? Initial work has been conducted in this area, for example with flow control diagrams, but their generic nature makes them difficult to use for novice programmers or end-users.
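To make the Sensor/Trigger/Action chain of Sect. 4.1 concrete, the sketch below renders it as a small, hypothetical Java engine. It is an illustration of the idea only, not the ComposAR API: the interfaces, the marker-visibility stand-in and the bound action are all assumptions.

import java.util.ArrayList;
import java.util.List;

interface Sensor<T> { T read(); }                  // raw live data stream
interface Trigger<T> { boolean fires(T value); }   // interprets the sensor data
interface Action { void run(); }                   // the reaction to invoke

class Binding<T> {
    final Sensor<T> sensor; final Trigger<T> trigger; final Action action;
    Binding(Sensor<T> s, Trigger<T> t, Action a) { sensor = s; trigger = t; action = a; }
    void poll() { if (trigger.fires(sensor.read())) action.run(); }
}

class InteractionEngine {
    private final List<Binding<?>> bindings = new ArrayList<Binding<?>>();
    <T> void bind(Sensor<T> s, Trigger<T> t, Action a) { bindings.add(new Binding<T>(s, t, a)); }
    void tick() { for (Binding<?> b : bindings) b.poll(); }   // one polling pass over all bindings
}

class Demo {
    public static void main(String[] args) {
        InteractionEngine engine = new InteractionEngine();
        // Treat marker visibility as a sensor; occlusion fires the action.
        Sensor<Boolean> markerVisible = () -> Math.random() > 0.2;   // stand-in for real tracking
        engine.bind(markerVisible, visible -> !visible,
                () -> System.out.println("Marker occluded: toggle the ambient display"));
        engine.tick();
    }
}

In ComposAR itself such bindings are authored through the tool's interactive and scripting facilities rather than written as compiled Java.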
5 Design and Evaluation of Ambient AR Systems
There is still very little knowledge regarding the proper design and evaluation of AR systems [25, 26], and there are few general design guidelines for AR interfaces. Most guidelines seem to be suggestions tailored to the specific design challenges of individual systems. One reason for this is the huge variety of AR system implementations: these systems can be realized with different hardware, tracking technologies and software frameworks. Although we have definitions of what constitutes an AR system [5], these definitions are quite broad. Therefore, deriving common design principles is a challenging task. General HCI guidelines and user-centered design guidelines can serve as a starting point for the development of more general AR design guidelines [26]. However, these must be refined to meet the specific demands of AR systems. At this stage even less is known about suitable guidelines for the development of the systems discussed in this paper. We can use design guidelines derived from AR systems research and try to combine them with knowledge from TUI design; this might be a good first approach to developing such systems. However, the whole is more than the sum of its parts, so we argue that Ambient AR systems with Tangible Interfaces require a separate paradigm with respect to proper system design. There is also a need for more research on suitable evaluation techniques for Ambient AR interfaces. We found that only a few AR-related research publications include formal or informal user evaluations. In a recent survey we estimated that only 10% of AR articles published in ACM and IEEE venues between 1993 and 2007 included some user evaluation [25]. Excluding informal user evaluations (evaluations which did not follow a rigorous evaluation program), the percentage is around 8%, which is similar to findings reported by Swan et al. [27]. This relative lack of user evaluations in AR research could be due to a lack of education on how to evaluate such experiences. Again, this is even more the case for Ambient AR with Tangible Interfaces. Here, too, it is worthwhile to collect knowledge gathered in other disciplines and combine it with the specific demands of these systems. More general evaluation techniques and approaches used in HCI can be readily applied. In the early design stages of novel interfaces, exploratory evaluation techniques can be applied with the aim of uncovering issues that need further investigation. In later stages these issues can be studied using more rigorous approaches.
6 Conclusion
In this paper we have described the issues arising from the combination of Augmented Reality with Ambient and Tangible User Interfaces. The introduced framework is an initial step, to be evaluated further through the development of new Ambient AR interfaces. The paper also introduced novel methods of handling interaction within AR authoring tools, which are valuable for new Ambient AR interfaces. Finally, we described some issues related to the empirical evaluation of these new types of interfaces, and the challenges in this area.
References
1. Weiser, M.: The computer for the 21st century. Scientific American 265, 94–104 (1999)
2. Norman, D.A.: The invisible computer. MIT Press, Cambridge (1998)
3. Dourish, P.: Embodied Interaction: Exploring the Foundations of a New Approach (1999)
4. Aarts, E., Harwig, R., Schuurmans, M.: Ambient intelligence. In: The invisible future: the seamless integration of technology into everyday life. McGraw-Hill, Inc., New York (2001)
5. Azuma, R.T.: A Survey of Augmented Reality. Presence - Teleoperators and Virtual Environments 6, 355–385 (1997)
6. Rauhala, M., Gunnarsson, A.-S., Henrysson, A.: A Novel Interface to Sensor Networks using Handheld Augmented Reality. In: MobileHCI 2006, pp. 145–148. ACM, Helsinki (2006)
7. Ishii, H., Ullmer, B.: Tangible bits: towards seamless interfaces between people, bits and atoms. In: CHI 1997, pp. 234–241. ACM, New York (1997)
8. Shneiderman, B.: The future of interactive systems and the emergence of direct manipulation. Behaviour and Information Technology 1, 237–256 (1982)
9. Hutchins, E.L., Hollan, J.D., Norman, D.A.: Direct manipulation interfaces. Hum.-Comput. Interact. 1, 311–338 (1985)
10. Frazer, J.H., Frazer, J.M., Frazer, P.A.: Intelligent physical three-dimensional modeling systems. In: Computer Graphics Conference, vol. 80, pp. 359–370 (1980)
11. Gorbet, M.G., Orth, M., Ishii, H.: Triangles: tangible interface for manipulation and exploration of digital information topography. In: CHI 1998, pp. 49–56. ACM, New York (1998)
12. Kato, H., Billinghurst, M., Poupyrev, I., Tetsutani, N., Tachibana, K.: Tangible Augmented Reality for Human Computer Interaction. Nicograph, Nagoya, Japan (2001)
13. MacIntyre, B., Gandy, M., Dow, S., Bolter, J.D.: DART: a toolkit for rapid design exploration of augmented reality experiences. In: Proceedings of the 17th annual ACM symposium on User interface software and technology. ACM, Santa Fe (2004)
14. Owen, C., Tang, A., Xiao, F.: ImageTclAR: A Blended Script and Compiled Code Development Systems for Augmented Reality (2003)
15. Ledermann, F., Schmalstieg, D.: APRIL: A High-level Framework for Creating Augmented Reality Presentations. In: Proceedings of the IEEE Virtual Reality 2005 (VR 2005), pp. 187–194 (2005)
16. Seichter, H., Looser, J., Billinghurst, M.: ComposAR: An Intuitive Tool for Authoring AR Applications. In: Saito, M.A.L., Oliver, B. (eds.) International Symposium of Mixed and Augmented Reality (ISMAR 2008), pp. 177–178. IEEE, Cambridge (2008)
17. Hampshire, A., Seichter, H., Grasset, R., Billinghurst, M.: Augmented Reality Authoring: Generic Context from Programmer to Designer. In: Australasian Computer-Human Interaction Conference, OZCHI 2006 (2006) 18. Arduino, http://www.arduino.cc 19. Greenberg, S., Fitchett, C.: Phidgets: easy development of physical interfaces through physical widgets. In: UIST 2001: Proceedings of the 14th annual ACM symposium on User interface software and technology, pp. 209–218. ACM, New York (2001) 20. Lee, J.C., Avrahami, D., Hudson, S.E., Forlizzi, J., Dietz, P.H., Leigh, D.: The calder toolkit: wired and wireless components for rapidly prototyping interactive devices. In: DIS 2004: Proceedings of the 5th conference on Designing interactive systems, pp. 167–175. ACM, New York (2004) 21. Ballagas, R., Ringel, M., Stone, M., Borchers, J.: iStuff: a physical user interface toolkit for ubiquitous computing environments. In: CHI 2003: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 537–544. ACM, New York (2003) 22. Klemmer, S.R., Li, J., Lin, J., Landay, J.A.: Papier-Mache: toolkit support for tangible input. In: CHI 2004: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 399–406. ACM, New York (2004) 23. Greenhalgh, C.I.S., Taylor, I.: ECT: A Toolkit to Support Rapid Construction of Ubicomp Environments. In: Ubicomp 2004 (2004) 24. Hong, D., Looser, J., Seichter, H., Billinghurst, M., Woo, W.: A Sensor-based Interaction for Ubiquitous Virtual Reality Systems. In: ISUVR 2008, Korea, pp. 75–78 (2008) 25. Dünser, A., Grasset, R., Billinghurst, M.: A Survey of Evaluation Techniques Used in Augmented Reality Studies. Technical Report TR-2008-02. HIT Lab NZ (2008) 26. Dünser, A., Grasset, R., Seichter, H., Billinghurst, M.: Applying HCI principles to AR systems design. In: MRUI 2007: 2nd International Workshop at the IEEE Virtual Reality 2007 Conference, Charlotte, North Carolina, USA (2007) 27. Swan, J.E., Gabbard, J.L.: Survey of User-Based Experimentation in Augmented Reality. In: 1st International Conference on Virtual Reality, Las Vegas, Nevada (2005)
Rapid Prototyping of an AmI-Augmented Office Environment Demonstrator Dimitris Grammenos1, Yannis Georgalis1, Nikolaos Partarakis1, Xenophon Zabulis1, Thomas Sarmis1, Sokratis Kartakis1, Panagiotis Tourlakis1, Antonis Argyros1,2, and Constantine Stephanidis1,2 1
Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH), Greece 2 Computer Science Department, University of Crete, Greece
[email protected]
Abstract. This paper presents the process and tangible outcomes of a rapid prototyping activity towards the creation of a demonstrator, showcasing the potential use and effect of Ambient Intelligence technologies in a typical office environment. In this context, the hardware and software components used are described, as well as the interactive behavior of the demonstrator. Additionally, some conclusions stemming from the experience gained are presented, along with pointers for future research and development work.
1 Introduction
The field of Ambient Intelligence envisages a future where our surrounding environment is populated by several interoperating computing-embedded devices of different sizes and capabilities, which are interwoven into "the fabric of everyday life" and are indistinguishable from it [1]. In such "intelligent" environments, the way that people perform everyday tasks is expected to change radically. Multimodal, direct, "natural" interaction methods such as speech, touch, and gestures are expected to be widely used, in combination with knowledge about contextual factors such as the user's profile, preferences and location. Recently, the Institute of Computer Science of the Foundation for Research and Technology – Hellas (ICS-FORTH) has been running a horizontal, interdisciplinary Research and Development Programme in the field of Ambient Intelligence (AmI) that serves as a connecting thread for the activities of the Institute's individual laboratories. In this context, a small-scale prototype has been set up exhibiting the concept of Ambient Intelligence in a typical office environment, showcasing, on the one hand, the potential impact of related technologies on everyday activities and, on the other hand, the expected paradigm shift in the way that people will perceive and interact with information and communication technologies (ICT) in the future. This paper provides an overview of the demonstrator prototyping (design and development) process and its outcomes, elaborating on the constituent hardware and software components and on the interactive behavior of the demonstrator.
2 Related Work
During the design phase of the demonstrator, related published work was taken into consideration. For example, in 1999, Stanford University started the Interactive Workspaces project [2]. A prototype workspace, the iRoom, was created that contained several types of large displays, including a conference-room table. The room was equipped with cameras, microphones, wireless LAN support, and several interaction devices. A dedicated meta-operating system (iROS) was developed to tie together the individual devices, along with three subsystems addressing the tasks of moving data, moving control, and dynamic application coordination. A couple of years later, IBM, in the context of its BlueSpace project [3], constructed a prototype future office and implemented some related applications. The workspace incorporated several types of sensors and environmental effectors, and alternative displays including a steerable projection system. The applications developed were related to workspace and technology personalization and configurability for either collaborative or individual work. The Fraunhofer IPSI Institute in Darmstadt, based on the idea of cooperative buildings, introduced the notion of roomware components [4], i.e., room elements (e.g., furniture, walls) integrating information and communication technologies. In this context, several inter-operating components were developed, including a large touch-sensitive wall, a touch-table, and a chair with an integrated pen-based computer. Finally, Pingali et al. [5] investigated the concept of augmented collaborative spaces and presented some of the technology and architectural challenges that need to be addressed to realize such spaces. Furthermore, they introduced the concept of steerable interfaces that can be moved around to appear on ordinary objects and surfaces anywhere in a space.
3 Development of the Smart Office Demonstrator 3.1 Physical Space, Requirements and Constraints The physical space available for creating the demonstrator was part of an existing office at the premises of ICS-FORTH. Two sides were blocked by brick walls and one by a glass window, while the fourth side was not separated from the rest of the office by any physical barrier. The space was furnished with a meeting table, seven chairs and a 60-inch LCD TV. The key requirements given to the demonstrator development team were the following: 1. Technological “transparency”: Ideally, the space should remain “as is”. In other words, no noticeable alterations should be made to the pre-existing office environment. The technological components used should be hidden within the surrounding environment (e.g., walls, ceiling, and furniture). 2. Seamless experience: The demonstrator should provide the feeling of a versatile, multi-faceted tool, rather than a loose collection of various applications. 3. Robustness: The entire system should work consistently and without failures even in less favorable or less controlled conditions (e.g., light changes, large number of visitors, and accidental use of equipment).
4. Budget: Only low-cost, everyday technological components should be used with the total cost kept below 10.000 Euros. 5. Time limit: The demonstrator should be up and running in about one month. 3.2 Design and Development Process Due to the extremely tight time schedule, a fast-paced iterative design and development process was followed. A development team was set-up comprising a dozen people with complementary roles and skills, ranging from interaction design and software development to electronic engineering and carpentry. For the first couple of days an initial concept formation and brainstorming session was held, where alternative scenario ideas and candidate technologies were discussed. After that, a one-day feasibility analysis (reality-check) followed, combined with an informal qualitative evaluation of each suggestion, using five selection criteria: (a) relevance to the demonstrator’s goals; (b) feasibility in the available time-frame; (c) immediate availability of required technological components; (d) pre-existing knowhow; and (e) robustness and fail-safety. As an outcome of this phase, some of the suggested ideas were eliminated, some were combined and a few new ones came up. The “surviving” ideas were prioritized using an ad hoc weighted algorithm calculating their conformance to the aforementioned criteria and were then assembled in the form of a narrative walkthrough scenario. Additionally, several sketches were created illustrating: (a) the envisioned layout of the required technological components in the available space, and (b) the indicative interaction behavior and correspondent user interfaces. This material (scenario plus sketches) was used as a shared roadmap for the team, Subsequent steps in development process entailed interleaved activities for hardware installation, software programming and space construction, with daily partial integration and testing phases. In parallel to these activities, a monitoring and design / re-design process was running that (taking into account interim progress, problems, and new ideas, as well as the initial set of requirements) continuously updated the design documents and goals. The last few days of the available time were dedicated to exhaustively testing and “debugging” the demonstrator as a whole. 3.3 Technological Infrastructure The technological infrastructure integrated in the office space is illustrated in the diagram provided in Fig. 1. The top part of the diagram provides a side view of the entire space, while the bottom part is a top-down view of the meeting table. More specifically, the following components were employed: 1. Deskpad: An ordinary leather deskpad that can double its size when opened. Inside the top part of the deskpad an RFID tag has been concealed. 2. Projector: A small projector (1024x768), hidden inside the room’s ceiling. It is used to project information on the deskpad’s surface. 3. Color camera: Used to track: (i) the position of objects placed on the deskpad; and (ii) whether an arm has been extended to the left or right of the deskpad.
Fig. 1. Overview diagram of the installed technological infrastructure (top part: side view of the entire space; bottom part: top-down view of the meeting table)
4. Wiimote: A remote controller of Nintendo's Wii console, used to track the position of the IR pens on the table, following the approach suggested in [6]. 5. Computer-controlled lights: Neon lights that can be turned on / off and dimmed at any intensity using the DMX communications protocol. 6. e-Frame: A 19 inches touch screen embedded in a custom-made wooden frame, in order to resemble a typical painting. 7. TV: A 60 inches flat screen TV. 8. Tiltable camera: A table-mounted camera that can zoom, rotate and tilt. 9. Microphone: A cardioid condenser table top microphone. 10.PCs: Four average Core Duo PCs. One is used by the computer vision software, while the rest drive the three available displays (e-frame, TV, projector). 11.Switches: Seven touch buttons located underneath the table. Each button corresponds to a chair position (i.e., 3 on each side and 1 at the head of the table). 12.Distance sensor: An ultrasonic distance sensor concealed inside a pen holder. 13.Speakers: Two typical USB PC speakers located underneath the table surface. 14.RFID readers: Two RFID readers located underneath the table surface. One at the center of the deskpad and the other near its top (when it is open). 15.IR pens: Two ordinary whiteboard pens (black and red) that their tip and ink have been replaced by an infra-red LED and a battery respectively. Additionally they have been equipped with a small pressure (turn-on) switch and a round RFID tag glued for identification purposes.
16.Mobile phone: A Bluetooth-capable mobile phone with an embedded digital camera and an RFID tag attached. 17.RFID-augmented objects: An identity card, a leaflet, an envelope and three rectangular paper cards. 3.4 Software Modules The demonstrator’s software modules follow a simple service-oriented architecture in order to support its infrastructure. The infrastructure’s entry point is the Office manager, which communicates and interacts with a set of peripheral services and applications using a custom, simple, string-based protocol over TCP/IP connections. Arguably, the use of a more sophisticated communication platform, would allow for a more robust and flexible infrastructure. However, since the project’s needs for remote communications were modest, the implemented approach was considered as the most effective solution, given the project’s tight schedule. The main software modules that comprise the demonstrator will be presented in the following sections. 3.4.1 Office Manager The Office Manager (or in short OM) is responsible for realizing the main table user interface and “orchestrating” the overall interaction process. Depending on the current context of use, as well as user actions and system events, it decides whether to run or suspend specific applications. Additionally, it keeps track of all context parameters that the various applications may need in order to adapt their content or behavior. 3.4.2 Computer Vision Subsystem In order to be able to support natural interaction, a computer vision subsystem was integrated into the overall system. This subsystem utilizes a camera located to the office ceiling, overlooking the table surface. The main use of the vision system was to detect occurrence of the user’s hand over predefined areas on the desktop, as well as, to locate objects placed on this desktop. When the user’s hand is placed above a predefined area (e.g., left-hand side of the deskpad) a software event is generated. To achieve the above functionality, a model of the background (desktop) is a priori constructed. The occurrence of the user’s hand above this area is achieved by detection of the color change within the corresponding image region, based on a background subtraction process [7]. The method operates at a rate of approximately 20 frames per second, on a conventional PC for 480×640 images. Furthermore, to achieve robust behavior against global changes of illumination, the background model is frequently updated, based on the color values of the pixels that are unoccluded by the user’s hand. In the event of a dramatic change of illumination (such as switching all the room lights on or off) the system requires approximately three seconds for retraining, during which events are suppressed, to avoid reporting spurious events. Additionally, using the above functionality, objects that are placed on the deskpad are located, by extracting the blob of active pixels that is formed in the background image. In this way, when an object (such as the mobile phone or the leaflet) is placed on the deskpad, its location is estimated through the centroid of this blob. A transfer function (homography) is then used to translate the location of this blob from image coordinates to deskpad coordinates.
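As a concrete illustration of this pipeline, the sketch below shows how background subtraction, blob extraction and the image-to-deskpad homography could be combined with OpenCV, whose MOG2 subtractor implements the adaptive Gaussian mixture method of [7]. The calibration points, region definitions and thresholds are illustrative assumptions, not the demonstrator's actual values.

```python
import cv2
import numpy as np

# Background model of the empty desktop; MOG2 follows the adaptive Gaussian
# mixture approach of [7] and keeps updating unoccluded pixels, which gives
# some robustness against gradual illumination changes.
subtractor = cv2.createBackgroundSubtractorMOG2(history=300, varThreshold=25)

# Homography from image coordinates to deskpad coordinates, estimated once
# from the four deskpad corners (the corner values below are made up).
img_corners = np.float32([[102, 84], [538, 90], [530, 410], [96, 402]])
pad_corners = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])   # deskpad, mm
H = cv2.getPerspectiveTransform(img_corners, pad_corners)

def process_frame(frame, hand_regions):
    """Return the names of triggered hand regions and the deskpad coordinates
    of object blobs found in this frame."""
    mask = cv2.medianBlur(subtractor.apply(frame), 5)

    # Hand-over-region events: enough foreground pixels inside a predefined area.
    triggered = [name for name, (x, y, w, h) in hand_regions.items()
                 if cv2.countNonZero(mask[y:y + h, x:x + w]) > 0.3 * w * h]

    # Objects on the deskpad: blob centroids mapped through the homography (OpenCV 4.x).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    positions = []
    for c in contours:
        if cv2.contourArea(c) < 500:               # ignore small noise blobs
            continue
        m = cv2.moments(c)
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
        px, py = cv2.perspectiveTransform(np.float32([[[cx, cy]]]), H)[0][0]
        positions.append((float(px), float(py)))
    return triggered, positions

# Example regions to the left and right of the deskpad (illustrative values).
regions = {"left": (40, 150, 60, 180), "right": (540, 150, 60, 180)}
```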
3.4.3 Wireless Photo Manager One of the concepts for experimentation was the seamless merging of the digital and the real world [5], or in other words, combining the notions of Ambient Intelligence and Augmented Reality. In this context, a small application was built for easily transferring and managing digital photos. This application works as follows: 1. The user takes a photo using a mobile phone. The choice of the mobile phone instead of a camera is motivated by two facts, (a) it is more readily available as people carry it with them all the time and (b) typically, it has built-in wireless communication capabilities. 2. When the user places the phone on the deskpad, its RFID tag is read by the RFID reader and, using Bluetooth, the photo is transferred to the PC that the OM is running. At the same time, the OM requests the computer vision subsystem to pinpoint the exact position of the phone on the deskpad. Based on this information, the OM overlays a green tick mark (9) over the phone (using the ceiling projector) and, through speech synthesis, announces that the phone was recognized and is ready to be used. 3. The user can now use the IR pens to drag the photo out of the phone and onto the leather deskpad (see Fig. 2a). If the photo is dragged outside the top edge of the deskpad, then it appears on the TV screen (see Fig. 2b). If it is dragged outside the left side, then it appears on the e-Frame (see Fig. 2b). If the same action is performed during a video conference, then, the photo instead of appearing on the TV, it is transferred to the screen of the remote participant (see Fig. 2c). 3.4.4 Multi-display PowerPoint Presenter A typical task performed during office meetings, is having a PowerPoint presentation. Thus, a related application was integrated to the demonstrator that works as follows: 1. The user places on the deskpad an RFID-augmented artifact, related to a specific presentation. For our demo, an ICS-FORTH leaflet was used. 2. When the RFID reader identifies the tag, the OM dims the ceiling lights over the table and starts the presentation on the TV screen (Fig. 3a). A related notification message is also spoken. If the user removes the leaflet from the deskpad, the presentation is also projected on its surface, so that the user does not need to read it from the TV screen located a few meters away (Fig. 3b).
Fig. 2. (a) Dragging the photo out of the mobile phone; (b) sending it to the TV and the e-Frame; (c) sending it to a colleague during video-conference
Fig. 3. (a) Putting the leaflet on the deskpad; (b) Taking the leaflet off; (c) Opening the deskpad
3. The user can browse the presentation slides using bare hands. If one hand is placed to the right-hand side of the deskpad, the presentation advances to the next slide. If it is placed to the left side, the presentation moves to the previous one. 4. If one of the pens is placed on the deskpad, the user can annotate the current slide (e.g., write or draw on it) using the respective pen’s color. 5. If the deskpad is opened during the presentation, then, on the top part of it, the notes associated with the slide are shown (Fig. 3c). If the deskpad is closed, the notes are hidden again. 6. When the presentation ends, the lights are restored to their previous state. 3.4.5 Video Conference Another typical usage scenario for meeting rooms involves Internet-based video conferencing. In this context, a relevant application was developed using a third-party commercial framework, LEADTOOLS Multimedia (http://www.leadtools.com/SDK/MULTIMEDIA/Multimedia-LE.htm). A problem faced at this point was that there can be up to seven people around the table and just one camera. To overcome this problem, a tiltable camera was used, along with touch switches mounted in front of each available seat. By pressing a switch, the camera turns, zooms and focuses on the specific position. Camera control was achieved through the serial port of the camera using the VISCA protocol by SONY. 3.4.6 Informative Art Display Presenting dynamic information in a subtle and aesthetically pleasing way, without obstructing the users’ primary task, was another of the dimensions to be explored. In this context, an informative art display [9] was developed for presenting “live” e-mail related information. The key idea of an informative art display is rooted in the Tangible Bits approach [8], which employed “ambient” media in order to subtly display and communicate information. This concept was later applied to the domain of dynamic paintings following several different approaches (e.g. [9, 10]). In our case, we followed the approach suggested by [10], where specific information semantics are mapped to some parts of an existing painting. The painting selected was “The Birth of Venus” by Sandro Botticelli. The informative art display initially presents a view of the original painting from which the flowers have been removed. The display tracks
an e-mail account and, depending on the number and type of the incoming e-mails, makes some painting elements appear (or disappear). For example, whenever a new message arrives a flower is added, messages from a list of colleagues appear as oranges on the tree, virus-infected messages as sharks circling Venus, etc. 3.4.7 e-mail Manager The informative art display presented in the previous section is responsible for presenting information regarding the number and type of incoming e-mails. In order to allow users to also read and respond to the content of these e-mails, another application was developed that also employs the alternative interaction techniques supported by the installed infrastructure. Through the e-mail manager, the user can browse incoming e-mails using her hands, manage them (e.g., delete, organize) using the IR pens, and even write new e-mails using a Bluetooth laser keyboard. When the deskpad is closed, the e-mail manager presents just a list of the messages along with sender and reception time information. When the deskpad is open, the content of the selected e-mail is also visible. The application was implemented using KoolWired.Net (http://koolwired.com/solutions/), a publicly available open source library. 3.4.8 Game In addition to the applications elaborated above, an entertaining activity was developed with a two-fold purpose. In terms of process, it was an experiment in reusing the existing infrastructure for a different application, while, in the context of an office environment, it can be considered as a “tool” helping to release tension after a meeting. The theme of the game is a cowboy duel in the Far West. On each side of the screen there is a cowboy that can move up and down and shoot. Players around the table control the cowboys using the switch buttons in front of their seats. Each side of the table controls the corresponding cowboy. Whenever a cowboy manages to eliminate his opponent, he gets a point. If the switch at the head of the table is pressed, a stagecoach that can shoot either way passes through the middle of the field. 3.4.9 Various Small-Scale Specialized Software Modules In addition to the aforementioned modules, a number of smaller-scale specialized modules were developed, offering some required low-level services, such as: 1. User identification: When user information is received, this service notifies the OM of the user’s access rights. 2. RFID reading: When an RFID-tagged object is either placed on or removed from the table’s surface, this service resolves the RFID tag to a user-friendly, semantically sensible character sequence (using a text file as a database) and notifies the OM. 3. Video playing: Provides the capability to play video files using Microsoft’s DirectShow framework. The OM, utilizing the functionality offered by the computer vision subsystem and the distance sensor, enables the user to pause / resume the video and change its volume using hand gestures. 4. Image presentation: Can receive an image through a TCP/IP connection and display it on any screen.
5. Bluetooth communication: The OM uses the OBEX (http://www.irda.org) protocol through the “In the hand” (http://inthehand.com) open source library for .NET to retrieve image files from Bluetooth-enabled devices, such as the mobile phone. 6. IR position detector: The functionality of the IR position detector is offered by the open source program described in [6], which translates IR position information into mouse movement events. 7. Speech synthesis: Uses the Microsoft Speech API for generating synthetic speech. 8. Lights controller: Uses the DMX SDK by Velleman Inc. (http://www.vellemanusa.com) to enable the OM to control the room’s lights. 9. Distance sensor: Calculates the distance to the nearest object through a serial connection to an ultrasonic distance sensor and delivers it to the OM.
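As a rough illustration of the string-based protocol mentioned in Section 3.4, the sketch below shows how a peripheral service such as the RFID-reading module could report an event to the Office Manager over a TCP/IP connection. The host, port and message wording are invented for the example; the actual protocol strings of the demonstrator are not documented here.

```python
import socket

OM_HOST, OM_PORT = "192.168.0.10", 9000   # assumed address of the Office Manager

def notify_om(message: str) -> str:
    """Send one newline-terminated string event to the OM and return its reply."""
    with socket.create_connection((OM_HOST, OM_PORT), timeout=2.0) as sock:
        sock.sendall((message + "\n").encode("ascii"))
        reply = sock.makefile("r", encoding="ascii").readline()
    return reply.strip()

# The RFID-reading service resolves a tag to a readable name (Section 3.4.9, item 2)
# and reports that the leaflet has been placed on the deskpad.
print(notify_om("RFID PLACED leaflet centre-reader"))
```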
4 Conclusions and Future Work The demonstrator was fully functional within the specified period of one month and, following repeated “shows” to a variety of audiences, it consistently elicited very positive responses. On a purely practical basis, the demonstrator helped us get a better understanding of the challenges that one needs to face when trying to set up an AmIaugmented environment. Furthermore, it offered us valuable, hands-on experience, and a tangible means for constructing, visualizing and evaluating concepts and envisaged scenarios towards the creation of a fully-working future “smart” office environment. In this context, some of the key findings were the following: 1. The simple, string-based protocol used, was adequate for the demonstrator purposes, but for a large-scale real-life AmI environment, the use of a sophisticated middleware communication platform is absolutely required. 2. The most common problems faced were related to hardware failures. An automated testing and failure identification mechanism would greatly improve the overall system robustness and “service” time. 3. All computers used MS Windows. A frequent problem was that messages for automated updates (and related restarts) would pop-up almost daily. 4. Bluetooth was quite cumbersome to use. A basic reason was that currently several different stacks exist that can not always “co-exist” on the same computer, but often specific devices work only with a certain stack. 5. Sun beams include infrared light that can totally mess up an IR tracking system. Based on the outcomes of this rapid prototyping activity, we are currently in the process of developing “Smart Office 2” with many new features, including: multiuser support for collaborative tasks, integration of commercial software applications (e.g., MS Word, PowerPoint), use of biometrics for user identification, speech-based user recognition and localization, and sensor-augmented chairs. Acknowledgements. This work has been supported by the FORTH-ICS internal RTD programme “AmI: Ambient Intelligence Environments”. The authors would also like to thank George Paparoulis, Spiros Paparoulis and Thanassis Toutountzis for their invaluable help in the installation and integration of the required technical infrastructure and Anthony Katzourakis for his graphics work.
References 1. Weiser, M.: The computer for the 21st Century. Scientific American 265(3), 94–104 (1991) 2. Johanson, B., Fox, A., Winograd, T.: The Interactive Workspaces Project: Experiences with Ubiquitous Computing Rooms. IEEE Pervasive Computing 1(2), 67–74 (2002) 3. Chou, P., Gruteser, M., Lai, J., Levas, A., McFaddin, S., Pinhanez, C., Viveros, M., Wong, D., Yoshihama, S.: BlueSpace: Creating a Personalized and Context-Aware Workspace. IBM Research Report, RC22281 (W0112-044) December 11 (2001) 4. Prante, T., Streitz, N., Tandler, P.: Roomware: Computers Disappear and Interaction Evolves. Computer 37(12), 47–54 (2004) 5. Pingali, G., Sukaviriya, N.: Augmented collaborative spaces. In: Proceedings of the 2003 ACM SIGMM Workshop on Experiential Telepresence, ETP 2003, Berkeley, California, pp. 13–20. ACM, New York (2003) 6. Lee, J.C.: Hacking the Nintendo Wii Remote. IEEE Pervasive Computing 7(3), 39–45 (2008) 7. Zivkovic, Z.: Improved adaptive Gaussian mixture model for background subtraction. In: ICPR, pp. 28–31 (2004) 8. Ishii, H., Ullmer, B.: Tangible bits: towards seamless interfaces between people, bits and atoms. In: Pemberton, S. (ed.) Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 1997, Atlanta, Georgia, United States, March 22-27, 1997, pp. 234–241. ACM, New York (1997) 9. Redstrom, J., Skog, T., Hallnas, L.: Informative art: using amplified artworks as information displays. In: Proceedings of DARE 2000 on Designing Augmented Reality Environments, DARE 2000, Elsinore, Denmark, pp. 103–114. ACM, New York (2000) 10. Ferscha, A.: Informative Art Display Metaphors. In: Stephanidis, C. (ed.) UAHCI 2007 (Part II). LNCS, vol. 4555, pp. 82–92. Springer, Heidelberg (2007)
Challenges for User Centered Smart Environments Fabian Hermann, Roland Blach, Doris Janssen, Thorsten Klein, Andreas Schuller, and Dieter Spath Fraunhofer Institute for Industrial Engineering Nobelstr. 12, D-707569 Stuttgart, Germany {fabian.hermann, roland.blach, doris.janssen, thorsten.klein, andreas.schuller, dieter.spath}@iao.fraunhofer.de
Abstract. Future smart environments integrate information on persons, ambient resources and objects. Many rich visions of smart environments have been developed, and current technological and market developments promise to bring aspects of these visions of into everyday life. The paper delineates the role of mobile and decentralized communities, semantic technologies, and virtual reality. Key challenges for a user centered development of smart environments are discussed, in particular the controllability of personal identity data, reliable user interfaces for autonomous systems, and seamless interaction in integrated virtual and physical environments. Keywords: smart environments, adaptive systems, system autonomy, mixed reality, social software, semantic technology, digital identity, privacy, user controllability.
1 Introduction Future living environments are envisioned to be smart: environments that are seamlessly networked provide access to ambient resources, devices and services. They integrate information on persons, objects and services (IST Advisory Group, 2001; 2003). Examples of public and corporate research include the project Oxygen at the MIT (Massachusetts Institute of Technology, 2004) that provided a pervasive environment which integrated embedded and user devices on a freely available network platform. Phillips works on the vision of a natural and comfortable relation of humans to their adaptive technological environment and runs several projects like e.g. “CAMP“ (Context Aware Messaging Platform; Turner & Groten, 2003) that demonstrates a bluetooth-based environment with users and ambient objects publishing their services, offerings, or interests. IBM’s Pervasive Computing Lab (IBM, 2004) shows prototypes of pervasive and ubiquitous technologies, e.g. the “Service Gateway” allowing the physical environment to be controlled automatically or manually. A concept widely used for describing mobile interaction and information flow in smart environments is an aura or sphere surrounding persons or smart objects. The personal sphere publishes information about the user, retrieves information, but also J.A. Jacko (Ed.): Human-Computer Interaction, Part III, HCII 2009, LNCS 5612, pp. 407–415, 2009. © Springer-Verlag Berlin Heidelberg 2009
protects the user from undesirable requests or spam. This metaphorical description of information has been used by several projects, e.g. “Auranet” (Schneider et al., 2003) “Digital Aura” (Ferscha et al., 2004), or “Digital Aura” at the University of Tampere (Lugmayer et al., 2006), the “Aura Approach” (Sousa, 2001), “Digital Territory” or “Digital Bubbles” (Beslay & Hakala, 2005). These often far developed prototypical realizations in research labs demonstrate potentials of smart environments. Key aspects of these approaches are: • Highly integrated and seamlessly available data, services and resources in public and private environments • Exchange of information, access rights of objects, ambient resources and devices • Exchange of personal information between several users and environment • Location-based availability of nearby entities, location-based UIs for services, data and applications • System “intelligence”: adaptivity and to some degree autonomous system decisions, e.g. on the use of ambient systems or data exchange
2 Trends and Technologies Realizing Smart Environments While many rich visions of smart environments have been developed and implemented in research labs, their realization in everyday life is restricted to clearly defined business solutions showing limited aspects. Examples are applications for networking events that allow visitors to contact and locate nearby people on proprietary devices based on interest profiles (e.g. nTAG, 2009). Some solutions like SpotMe (SpotMe, 2009) are provided as a service and must be integrated into event planning and configuration processes. However, some current technological and market developments promise to bring the vision of smart environments to everyday life on a broad basis. 2.1 Mobile Personal Information in Communities Besides the integration and accessibility of physical and virtual objects and resources, the vision of smart environments also covers the ubiquitous availability of social information anytime and anywhere. This aspect is about to be realized by mobile internet communities: social networks have been continuously growing, and popular platforms are amongst the most visited web sites (Universal McCann, 2008). The most successful, like FacebookTM or MySpaceTM, are established worldwide, while others address national or regional markets, like e.g. StudiVZTM in Germany. In the last two years, there has been a steady increase in user engagement, in particular in adding personal profiles to social networks (Universal McCann, 2008). A further lasting increase is expected for the next years (Cuhls, Kimpeler, 2008). Web-based communities allow users to use their functionalities on current mobile devices, e.g. to post and work with location information. Typical mobile communities (see e.g. aka-aki networks, 2009, or, for an example from university research, Active Campus, Griswold et al., 2004) enable users to find nearby community members, see their profile information and contact them. The mobile features often work on the basis of bluetoothTM (like WhosHere, myRete, 2009) or use e.g. GPS, like typical geotagging applications that allow storing and retrieving content for locations.
A current trend is the exchange and integration of diverse data from communities. Initiatives like OpenData (OpenID Foundation (2009) try to establish standards for the exchange between platforms and devices. Several of the popular social networks offer open APIs, in particular FacebookTM but also FriendConnectTM and OpenSocialTM by GoogleTM. 2.2 Data Integration and Semantic Technologies An important aspect of the vision of smart environments is not only the availability of information on persons, things and services but also the smart integration and reasoning by systems often called intelligent (Harper, Rodden, Rogers, & Sellen, 2008). The use of metadata and semantic annotations allows applications to support advanced processing of data and automatically draw conclusions that are not directly obvious. Using suitable semantic annotations, data from different sources can be integrated and reused in different contexts. E.g. data formats like FOAF (FOAF, 2009) promise to improve the ability to share and exchange social contact information amongst different applications and platforms effectively. Initiatives like SIOC (Brickley et al. 2008) intend to make user generated content from arbitrary resource automatically applicable in other systems and applications. Linking Open Data (Bizer, Heath, & Berners-Lee, 2008) is an initiative by the W3C that already integrates semantic information from a variety of resources like a semantically enriched version of Wikipedia (DBPedia) to semantic databases of books, countries or social community applications. It is expected that semantic applications will develop on a broad base in the next years (The New Media Consortium, 2009). The growing adoption of metadata and semantic enrichment of data is seen as one of the major directions for upcoming internet development towards an ubiquitous web that connects intelligence (Mills, 2008). Information visualization is supported by semantic augmentation of data: applications can visualize and arrange data structures more effectively. Today’s data visualization techniques for networks and three-dimensional displays of semantic networks enable the user to navigate more easily and effectively through semantically interlinked network and foster interpretation of them (Erétéo, Buffa, Gandon, Leitzelman, & Limpens, 2009). Rule based reasoning on semantic data can lead to more complex conclusions on existing data as well as draw conclusions beneficial for intelligent system behavior (Gruber, 2008). Matching algorithms operating on semantic metadata can furthermore facilitate communication and fast evaluation between agents and databases, peers in a p2p network and systems utilizing webservices (Shvaiko & Euzenat, 2005). 2.3 Virtual and Mixed Reality The field of ambient intelligence, addressing e.g. the technological integration of smart physical environments, recently is connected to mixed reality (first called augmented reality). This research field integrates recently also aspects of tangible interfaces and ambient intelligence. The integration of physical and virtual environments is an ongoing process which has started with publicly available computer systems in our environments as e.g. large display walls or information kiosks as early examples of access points to the digital environments. The major challenge was formulated
already in the early days of virtual reality research⎯the question is, how real objects, persons or locations and digitally created data or environments can be superimposed in space and time such that users have a coherent experience of the composed environment. Two examples may show the state of the art for integration virtual and physical environments: The research project “Office of the Future” worked on the integration of users in office telepresence environments. Here, users are captured and digital content is superimposed (Raskar, 1998; Tyler, 2007). The MATRIS project (Chandaria, 2007) shows mobile augmented reality systems which are able to register the location of the environment on a mobile device via model based video tracking. Here the van is recognized and the outline can be overlaid spatially correct which can be seen on the user’s display. The recognition uses the geometrical model data of the van to extract the location from the video. 2.4 Multi-modal UI Technologies and Multi-purpose Devices While data and resource integration is the precondition for user-centered content and functionality in smart environments, interaction devices are visible and tangible parts of user interfaces. Often, the development here goes together with new interaction paradigms replacing existing ones. A well known historic example is the windows and desktop interface metaphor which has afforded pointing devices which lead to the development of the computer mouse. Today, the introduction of multitouch displays seems to replace hardware-keyboards on mobile consumer devices offering a completely new interaction style. Typically, great advantages are interweaved with disadvantages (e.g. efficiency of text input on multitouch can be doubted; User Centric, 2007), which makes old and new styles hard to compare. In advance, it is often not obvious whether new paradigms and the associated devices will be adopted or whether they disappear after a phase of experimentation hype. Often, the mass market⎯like in the last decade the entertainment market⎯has proven to be the most valid evaluator of the acceptance of innovations in interaction. For interaction paradigms like multimodal interfaces, 3D and VR systems, the consumer markets seem to push the development of interaction and interfaces. In the last decades, consumer products for 3D-systems often failed gain a broad attention in the consumer markets as they could not reach sufficient sales. Therefore, it is one of the most interesting HCI questions for the 3D and VR systems whether some UI standard will establish in the next years, similar to mouse and keyboard for GUIs, or if many different devices will be used side by side each specialized on specific tasks as we see in the physical world. An example is the Nintendo WIITM game console which introduced spatial interfaces metaphors in the mass market and had enormous impact on the availability in terms of economics, familiarity and expectation standards. For mobile interaction, integrated “3rd generation” devices offer adapted user interfaces allowing the user to choose interaction modes e.g. keyboard, speech or gesture input. Also on the level of operation system and application software new interaction modes are offered like e.g. multi-touch systems which enables collaborative systems on large displays or new paradigms as two-finger zooming on individual systems as e.g. in the Apple iPhoneTM.
Taken together, three categories of novel interaction devices for smart virtual and physical environments do emerge: • Video and camera systems to capture light and images of the environment • Gesture and motion tracking systems to capture the dynamic activity of the environment and the user • Mobile multi purpose devices (smartphones) with buttons, joysticks and trackpads integrated multimodality
3 Challenges for a User Centred Design of Smart Environments Smart environments will offer integrated information on persons as well as objects and services, “intelligent” means to analyze and manage these data, superimposition of physical and virtual spaces, resources and devices. Multi-modal UI technologies will be available to realize user interfaces. Besides technological infrastructure issues (like coverage of network access and positioning, as well as availability of ambient resources and security infrastructure) there are key challenges to realize user-centered interaction and functionality in smart environments: 3.1 User Controllability of Personal Data Not only because many envisioned functions of smart environments realize personalized services and UIs, publishing the user’s identity and the exchange of personrelated data will be necessary to bring forward the value of smart environments. Current examples typically rely on centralized infrastructures which⎯brought into the market⎯makes it likely to have hosting companies following data-centered business models. Current social web platforms are strongly criticized (see e.g. 30th International Conference of Data Protection and Privacy Commissioners, 2008) because of insufficient security-standards and business terms and conditions (Fraunhofer SIT, 2008). Users risk a loss of privacy because of permanent storage of personal data, profiling and address trading by hosts etc. (Hildebrandt, 2008). On this background, a clearly user-controlled approach to identity and profile management is demanded (Hansen, 2008), together with decentralized structure of community data (Yeung, Liccardi, Lu, Seneviratne, Berners-Lee, 2009; The DataPortability Project, 2009). The challenge will be to establish standards and tools allowing the user to control, monitor and manage personal data on the one hand-side but also enabling successful business-models. 3.2 Reliable User Interfaces for Complex, Mobile and Adaptive Systems Smart environments will provide most complex functionality and data structures including entities like physical and virtual objects, personal profiles and data sets, time and location, persons and groups, communities and areas of life and within them various sub-models like e.g. projects or organization structures for business contexts. Vast data from various sources are necessary to achieve a rich and valuable functionality. The lever to solve UI issues for this complexity will be a combination of tailored models, automatic data matching and UI mechanisms relying on sound user dialogues from unobtrusive system recommendations to partly automated system decisions
controllable and adaptable by the user. Information visualization based on augmented data will help the user to navigate and specify data requests and broadcasting. While many visions of ambient intelligence assume interfaces to vanish and become invisible, they should be considered as empowering the user to use system intelligence and to control complexity. One of the great HCI challenge will be, to make the user learn to use and trust on system intelligence. And this will remain a userinterface issue: “Good” user interfaces will offer easy-to-use defaults and templates for system behaviors that perhaps will be nearly effortless to use but also simple. User interfaces will be in charge to guide the user to achieve more sophisticated system behavior, e.g. learning and training from user’s situational reactions, careful distinction between real autonomous system behavior vs. helpful recommendations. User interfaces have to provide overview, feedback and history on system decisions, data footprints left behind, combined with control and undo function as far as possible. Finally, user interface will still offer explicit configurations for those who are keen and to control and tailor the system functionality in depth. 3.3 Integration of Virtual and Physical Environments The integration of virtual and physical environments has improved a lot in the last decade, however in the near future there will be still major issues to handle the complexity of the real world. There are mere technological challenges like real-time registration and tracking of motions and gestures and the environment on an arbitrary accuracy everywhere. The fusion of different available information based on different technologies as e.g. GPS, mobile phone location and video based tracking. The synchronized and real-time rendering of complex data and information (visual, auditory, haptic, etc.) to create a coherent experience for the user is still a major challenge as well as the real-time content delivery. Core issue for seamless and consistent interaction is the realization of aware environments which can connect the physical users and the environment with virtual worlds. The ambient artifacts as e.g. displays have to be aware of its surroundings, the users in proximity, etc. Ranging from fixed to mobile systems these challenges have obviously different influence on the complexity and feasibility of the solution. Furthermore we need adequate interaction and interface concepts which guide the users throughout the mixed reality world. Important components will be multimodal interaction integrating different interaction modes and senses which to a usable whole for input and output as well as mobile interaction including these issues en-route with reduced computing resources, bandwidth and energy capacity.
4 Next Steps In a current research project (funded by the Fraunhofer Gesellschaft), we address these key challenges with an integrated concept and a prototypical realization of a user-centered approach to integrated interaction in smart and social environments. Central to this project approach is a user-controlled identity kernel and user utility, integrating personal data and the interaction with, or control of, virtual and physical environments. It will allow the user to manage and edit
his identity and rich profile information, broadcast and retrieve personal information to other users, “ambient agents” offering their services in particular environments, as well as location-free internet services. External accounts, e.g. in social web systems, will be accessible and synchronized, digital footprints can be monitored and managed. The user component will be able to connect to a middleware enabling the user to access and use ambient resources ad-hoc and consistently in virtual and physical environments. From a system architecture point of view, the interface between user-controlled sphere and external or environmental systems is planned to be a clear and explicit border: Exchanges between them have to be controlled by an informed user imposing the responsibility to describe their request for personal data, the resulting value for the user and further usage of data. The platform and user-component is planned to rely strongly on open standards in order to allow integration and linkage to existing applications.
References 1. 30th International Conference of Data Protection and Privacy Commissioners, Resolution on Privacy Protection in Social Network Services, Strasbourg (October 17, 2008), http://www.privacyconference2008.org/index.php?page_id=199 (18.12.2008) 2. aka-aki network. Website (2009), http://www.aka-aki.com/ 3. Beslay, L., Hakala, H.: Digital Territory: Bubbles. Institute for Proscpective Technological Studies, Technical Report (2005), http://cybersecurity.jrc.ec.europa.eu/ docs/DigitalTerritoryBubbles.pdf 4. Bizer, C., Heath, T., Berners-Lee, T.: LINKED DATA: Principles and State of the Art. 17th International World Wide Web Conference, W3C Track @ (WWW 2008) (2008), http://www.w3.org/2008/Talks/WWW2008-W3CTrack-LOD.pdf (12.1.2009) 5. Brickley, D., et al.: Semantically-Interlinked Online Communities (SIOC) Ontology Submission Request to W3C. W3C (2008), http://www.w3.org/Submission/2007/02/ (1.12.2008) 6. Chandaria, J., Thomas, G.A., Stricker, D.: The MATRIS project: Real-time markerless camera tracking for augmented reality and broadcast applications. Journal of Real-time Image Processing 2, 69–79 (2007) 7. Cuhls, K., Kimpeler, S.: Zukünftige Informations- und Kommunikationstechniken. MFG Stiftung Baden-Württemberg (2008), http://www.fazit-forschung.de 8. Davis, M.: Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 & Multibilion Dollar Market Opportunities. Project 10X (2008), http://www.project10x.com (28.01.2009) 9. Erétéo, G., Buffa, M., Gandon, F., Leitzelman, M., Limpens, F.: Leveraging Social data with Semantics. W3C Workshop on the Future of Social Networks (2009), http://www.w3.org/2008/09/msnws/papers/ ereteo_et_al_2008_leveraging.html (27.02.2009) 10. Ferscha, A., Hechinger, M.R., Mayrhofer, R., dos Santos Rocha, M., Franz, M., Oberhauser, R.: Digital Aura. University of Linz, Institute for Pervasive Computing, Technical Report (2004)
11. Fraunhofer Institut für Sichere Informationstechnologie SIT. Privatsphärenschutz in Soziale-Netzwerke-Plattformen. Fraunhofer SIT, Darmstadt (2008), http://www.sit.fraunhofer.de/fhg/Images/ SocNetStudie_Deu_Final_tcm105-132111.pdf (29.12.2008) 12. Griswold, G.W., Shanahan, P., Brown, S.W., Boyer, R., Ratto, M., Shapiro, R.B., Truong, T.M.: Active Campus: Experiments in Community-oriented Ubiquitous Computing. Computer 37(10), 73–81 (2004) 13. Gruber, T.: Intelligence at the Interface: Semantic Technology and the Consumer Internet Experience. Semantic Technologies Conference (SemTech 2008) (2008), http://tomgruber.org/writing/semtech08.pdf (28.01.2009) 14. Harper, R., Rodden, T., Rogers, Y., Sellen, A.: Being Human: Human-Computer Interaction in the Year 2020. Microsoft Research Ltd. (2008) 15. Hansen, M.: Marrying Transparency Tools with User-Controlled Identity Management. In: The Future of Identity in the Information Society. Springer, Boston (2008) 16. IBM Pervasive Computing Lab. IBM’s Advanced PvC Technology Laboratory (2004), http://www.ibm.com/developerworks/wireless/library/wi-pvc/ (12.12.2008) 17. IST Advisory Group. Scenarios for ambient intelligence in 2010. Final Report. European Commission (2001), ftp://ftp.cordis.lu/pub/ist/docs/istagscenarios2010.pdf 18. IST Advisory Group. Ambient Intelligence: from vision to reality. Draft Report (2003), ftp://ftp.cordis.lu/pub/ist/docs/ istag-ist2003_draft_consolidated_report.pdf (1.2.2009) 19. Johnson, T., Gyarfas, F., Skarbez, R., Towles, H., Fuchs, H.: A Personal Surround Environment: Projective Display with Correction for Display Surface Geometry and Extreme Lens Distortion. In: IEEE Virtual Reality 2007, Charlotte, NC (2007) 20. Lugmayr, A., Saarinen, T., Tournut, J.-P.: The Digital Aura - ambient mobile computer systems. In: 14th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2006 (2006) 21. Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory. MIT Project Oxygen (2004), http://www.oxygen.lcs.mit.edu/Overview.html (2.2.2009) 22. OpenID Foundation. OpenID (2009), http://openid.net/ (1.2.2009) 23. Raskar, R., Welch, G., Cutts, M., Lake, A., Stesin, L., Fuchs, H.: The Office of the Future: A Unified Approach to Image-Based Modeling and Spatially Immersive Displays (1998) 24. Schneider, J., Kortuem, G., Preuitt, D., Fickas, S., Segall, Z.: Auranet: Trust and Face-toFace Interactions in a Wearable Community. University of Oregon, Technical Report (2003) 25. Shvaiko, P., Euzenat, J.: A Survey of Schema-based Matching Approaches. University of Trento (2005), http://www.dit.unitn.it/~p2p/RelatedWork/ Matching/JoDS-IV-2005_SurveyMatching-SE.pdf (26.02.2009) 26. Sousa, J.P., David, G.: From Computers Everywhere to Tasks Anywhere: The Aura Approach. School of Computer Science, Carnegie Mellon University, Technical Report (2001), http://www.cs.cmu.edu/~aura/docdir/sg01.pdf 27. The DataPortability Project. DataPortability (2009), http://dataportability.org (15.1.2009) 28. The New Media Consortium. Horizon Report (2009), http://wp.nmc.org/horizon2009/Abrufdatum (26.02.2009)
29. Turner, S., Groten, M.: Implementing the Vision - Ambient Intelligence takes Strides in the Right Direction. Philips Research Password, 17 (2003), http://www.research.philips.com/password/download/ password_17.pdf (27.01.2009) 30. Universal Mc Cann. Power to the People. Social Media Tracker Wave 3 (2008), http://www.universalmccann.com/Assets/ wave_3_20080403093750.pdf (20.12.2008) 31. User Centric. Direct comparison of iPhone and hard-key QWERTY phone owners (2007), http://www.usercentric.com/about/news (20.2.2009) 32. Yeung, C., Liccardi, I., Lu, K., Seneviratne, O., Berners-Lee, T.: Decentralization: The Future of Online Social Networking. W3C Workshop on the Future of Social Networking (2009), http://www.w3.org/2008/09/msnws/papers/ (12.1.2009)
Point and Control: The Intuitive Method to Control Multi-device with Single Remote Control Sung Soo Hong1 and Ju Il Eom2 1,2 User Interface Lab. Digital Media and Communication R&D Center, Digital Media and Communication Business, Samsung Electronics CO., LTD. 416, Maetan-3 Dong, Yeongtong Gu, Suwon-City, Gyeonggi-Do, Republic of Korea 443-742 {ssuper.hong,flip.eom}@samsung.com
Abstract. Remote controls are the main means of controlling most CE devices in today’s home environment. As the number of electronic devices in the home increases, each device’s corresponding remote may also be added, and users frequently control several devices at one time. This situation makes it difficult for a user to find the desired remote control among many other controllers. To alleviate this inconvenience, a technique for controlling multiple electronic devices with a single remote, well known as the universal remote control technique, has attracted attention. Generally, when a user uses a universal remote control, she must input the key code of the desired device. If a user controls several devices interchangeably, she may end up entering the key code for one device after the other. This kind of maneuver can be very tiresome, and it may degrade usability drastically. This paper proposes the hardware and software structure of Point and Control (PAC), which uses the metaphor of pointing at a target to select the device the user intends to control. By using PAC, users can easily select and control the target device among many candidates in real time with a simple gesture. Keywords: Remote control, Universal Remote control, IR LED, IR Image Sensor, Point and Control, PAC, Multi device Control, Concurrent Control.
1 Introduction As many consumer electronics users use several devices concurrently in the home environment, they have difficulty finding the right remote [1]. For example, if a user wants to play a DVD title with an HTS system, she normally uses at least three remotes; if she wants to turn off the DVD player and watch IPTV, an STB controller may be needed. This “finding-exact-remote-control” situation can be very annoying to users. To deal with this problem, techniques for controlling multiple devices with a single remote, so-called universal remote controls, are being actively researched these days [2]. In general, when a user uses a universal remote control, she inputs the key code of the target CE device. Once the key code of the desired device has been input, the universal remote recognizes which electronic device to control and the user can control it. But when a user wants to change the target device, she must input another proper key code. J.A. Jacko (Ed.): Human-Computer Interaction, Part III, HCII 2009, LNCS 5612, pp. 416–422, 2009. © Springer-Verlag Berlin Heidelberg 2009
The key-code inputting step is the fundamental reason for the controller’s drop in usability. We evaluated several methods that could replace the key code inputting step, and concluded that the most intuitive way is “pointing” the remote at the device that the user intends to operate. We named this pointing-based multi-device control method Point and Control (PAC). PAC uses IR LEDs and an IR image sensor to determine the target device. Each target device carries unique IR LED information, and the universal remote control is equipped with an IR image sensor to read the target device’s IR information. When a user points the remote at the target device, the remote retrieves the image of the IR LED data, decides which device to control, and finally transmits the proper key code. Since key code inputting is replaced by a pointing gesture, a user can control several devices with ease and feels much more comfortable.
2 Related Work In its early stages, the universal remote control was developed to control various types of electronic devices of a particular manufacturer’s brand, merging the buttons of each remote into one single remote. As remotes gradually advanced and came to be recognized by many users as independent devices, the concept of the universal remote in its current sense was established: a remote that can control various types of devices without being limited to a single brand. But these kinds of remotes have a fundamental limitation: whenever a user wants to control a device, she has to pass through too many steps, inputting the proper key code of the target device. Recently, the usage of universal remote controls has moved from device-oriented operation (e.g., turn on the DVD player, turn up the TV volume) to task-oriented operation (e.g., play a DVD, show a TV program). Whenever a task-based operation is executed by the user, the universal remote control transfers sequential control information to the proper target devices in order to achieve the goal. This method can be very comfortable and a user can achieve her goal in one or two steps, but setting up task-based macro operations can be a burden for general users. In parallel, coordinate recognition techniques using IR LEDs and IR image sensors have been researched steadily [3]. Older coordinate recognition systems used markers with pre-defined colors in the visible range. By sticking a marker to a moving object or body, the coordinates could be recognized from the retrieved image data. But since such systems use a discrete color space, background colors similar to the pre-defined one caused frequent errors; for this reason, they could be applied only in very restricted situations. To overcome this problem, methods using IR LEDs as alternative markers and retrieving coordinate data with an IR image sensor have been researched. With the IR method, a remarkably low coordinate error ratio can be obtained. In the case of Nintendo’s Wii, the Wiimote and Sensor Bar are used [4]. The Wiimote is equipped with an IR image sensor, and the Sensor Bar with an IR LED array. The IR image sensor gathers IR LED images continuously, and by analyzing the acquired images, the Wiimote senses two-dimensional coordinates and moves the pointer on the TV display. This procedure offers the user experience of direct pointing. Also, by using the perpendicular axis and the diameter of each point in the LED image, the Wiimote can sense its rotation and its distance from the Sensor Bar. Sony has filed a corresponding patent, in which IR LEDs are attached to the remote and the data are retrieved with an IR image sensor near the TV [5]. It is a similar concept to the previous method, but by attaching a reflective tape segment to the user’s finger, it can also support gesture recognition when the IR image sensor is placed near the TV.
3 Point and Control (PAC) As mentioned, the key code inputting step is the main reason that the usability of universal remote controls plummets. We propose a method that improves controllability by removing the key code inputting step and replacing it with a more intuitive approach. Key code inputting in a universal remote control is a means of transmitting the user’s intention to control the corresponding device; without such a step, it is impossible to control many devices concurrently with only one remote. The step is very tiresome, but essential. Conventional universal remote controls understand a user’s intention by receiving a key code. The PAC remote, on the other hand, can capture it merely by being pointed at the target device. To achieve this, the PAC remote and the target devices must satisfy the three conditions below: 1. Every target device must have its own ID information. 2. When a user points the PAC remote at a target device, the PAC remote must identify the exact one among the whole pool of controllable candidate devices in real time. 3. The PAC remote must keep the ID and control code information of every device it can control. 3.1 Identification of Each Device with IR LED To handle the first condition, we identify every target device by using multi-dimensional IR LEDs. By attaching IR LEDs to the target device, we can retrieve two types of information from it: one is position information and the other is frequency information. Positional information can be represented like a matrix, containing row and column data. Frequency information encodes each IR LED’s flickering pattern. 3.1.1 Position Information A multi-dimensional IR LED is composed of several IR LEDs, and each IR LED has its own position data. In general, positional information can be represented as a combination of row and column, just like the indices of a matrix component. Figure 1 shows this: the left part shows the one-dimensional form, and the right part the normal matrix form.
Fig. 1. Positional information of IR LED
Fig. 2. Sequential information of IR LED
We acquired the position data from bitmap images captured by the IR image sensor, which has a resolution of 640 x 480. The IR image sensor's viewing angle may affect the result; our image sensor had a 160-degree viewing angle. We set up a home theater system (HTS) and other devices within a 2 m x 2 m area. PAC did not work properly when the distance between the IR LEDs and the IR image sensor was less than 1 meter; when the distance was too far, we used upscale processing to correct the data. PAC worked without any problem between 1 meter and 7 meters.

3.1.2 Frequency Information

In addition to its positional information, each IR LED can convey frequency information by flickering; Figure 2 shows this. We predefine the usable frequencies, as illustrated on the left side of the figure; in this example the number of predefined frequencies is six, and each IR LED emits one of them. To handle the frequency data, we predefined nine frequency steps. To represent a frequency step we adopted a reformed Morse code: a short Morse element corresponds to the IR LED's high state (represented by H), and a long element corresponds to its low state (represented by L). This code could be used as is, but if an IR LED blinks repeatedly we may not be able to distinguish, for example, step 3 from step 7. We addressed this problem by adding HLHLH in front of every frequency code. Figure 3 shows the final frequency codes we used.

Table 1. Predefined frequency steps using the reformed Morse code

Frequency   Morse code   H/L code
1           .----        HLLLL
2           ..---        HHLLL
3           ...--        HHHLL
4           ....-        HHHHL
5           .....        HHHHH
6           -....        LHHHH
7           --...        LLHHH
8           ---..        LLLHH
9           ----.        LLLLH
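To make the coding concrete, the following C++ fragment is a hypothetical illustration (not part of the PAC implementation described here) that produces the H/L pattern of Table 1 for a given frequency step, with the HLHLH preamble of Fig. 3 prepended.

```cpp
#include <array>
#include <string>

// Hypothetical encoder for the reformed Morse scheme of Table 1:
// a Morse dot maps to the LED's high state 'H', a dash to the low
// state 'L', and the fixed HLHLH preamble of Fig. 3 is prepended so
// that repeated blinking remains unambiguous.
std::string frequencyCode(int step) {            // step in 1..9
    static const std::array<const char*, 9> kMorseDigits = {
        ".----", "..---", "...--", "....-", ".....",
        "-....", "--...", "---..", "----."
    };
    std::string code = "HLHLH";                  // synchronization preamble
    for (const char* p = kMorseDigits[step - 1]; *p != '\0'; ++p)
        code += (*p == '.') ? 'H' : 'L';
    return code;
}
// Example: frequencyCode(3) yields "HLHLH" followed by "HHHLL".
```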
Fig. 3. Predefined frequency adding HLHLH code
Fig. 4. ID information of target device
By convolving each IR LED's position and frequency information, each device can be given a unique ID. Figure 4 shows the final form of a target device.

3.2 Notification of Each Device with the IR LED Image Sensor

The second condition can be satisfied by adding an IR LED image sensor to the PAC remote. When the PAC remote is pointed at a target device, it retrieves the ID information of every device that falls within the image sensor's viewing angle. Figure 5 shows this: when the PAC remote is pointed, the controllable target devices within the angle are captured. The controllable candidates are devices 220, 210 and 230; since device 210 is the closest to the center of the captured image, the PAC remote sends the control code for device 210. Because the user rotates and moves the remote while using it, the captured data must be processed in a scale- and rotation-invariant way (a minimal sketch of this center-based selection is given after Sect. 3.3 below).

3.3 Transmission of ID and Control Code Information of the Target Device

To satisfy the last condition, the necessary information is transmitted to the PAC remote sequentially. Fig. 6 shows the transmission sequence of the target device's information to the PAC remote. The PAC remote initially holds the ID and control-code information of the device it is paired with (called the main device). Through a wired or wireless connection with the main device, various CE devices (called sub devices) can transmit their ID and control-code information; after the main device receives a sub device's information, it forwards it to the PAC remote. The PAC remote has a storage structure for ID and control-code information and updates its table whenever a new device is added.
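The following C++ sketch illustrates the center-of-image selection described in Sect. 3.2; the Candidate structure, the function name and the 640 x 480 image size are assumptions for illustration, not the authors' actual firmware.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical data structure: each candidate decoded from the IR image
// carries its device ID and the centroid of its LED pattern in image
// coordinates (a 640 x 480 sensor is assumed, as in Sect. 3.1.1).
struct Candidate {
    uint32_t deviceId;
    double cx, cy;      // centroid of the detected LED pattern
};

// Pick the device whose LED pattern lies closest to the image center,
// mirroring the selection of device 210 among 210, 220 and 230 in Fig. 5.
uint32_t selectTarget(const std::vector<Candidate>& candidates,
                      double imgW = 640.0, double imgH = 480.0) {
    double bestDist = std::numeric_limits<double>::max();
    uint32_t bestId = 0;                         // 0 = no candidate in view
    for (const Candidate& c : candidates) {
        double d = std::hypot(c.cx - imgW / 2.0, c.cy - imgH / 2.0);
        if (d < bestDist) { bestDist = d; bestId = c.deviceId; }
    }
    return bestId;
}
```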
Fig. 5. Notification of the target device among the candidates
Fig. 6. Transmission of target device’s ID information and control code information to PAC remote
4 Conclusion

Many CE users spend a considerable amount of time controlling several devices at once. Some of them try to find the exact remote for the target device, while others use a universal remote control and go through complicated steps to get what they want. There is little difference between these two approaches: both degrade usability and may give the user a bad experience. PAC can resolve the problems of both. It can be applied to various types of CE devices, such as DTVs, BD or DVD players, set-top boxes, game consoles, audio players, computers, home theater systems and mobile devices.

Usability testing revealed some practical issues for enhancing PAC. A user needs to point at the object when she wants to select the target, but once the selection is made she no longer wants to keep pointing; she simply wants to operate the device without any constraint. We therefore divided the remote's operation into two modes: a pointing mode and an operating mode. Selecting a target among many vertically stacked devices is also a problem. CE devices in a home environment are generally stacked, and in this situation the user may have trouble pointing at the exact device because the inter-device distance is quite small. Visual or haptic feedback from the target device when it is pointed at can be helpful. We are planning to implement scenarios that run device-to-device, such as inter-device content sharing and easy task-based operation.
References

1. Nichols, J., et al.: Huddle: Automatically generating interfaces for systems of multiple connected appliances. In: UIST 2006, pp. 279–288 (2006)
2. Yang, Y.-C., Cheng, F.-T.: Autonomous and Universal Remote Control Scheme. IECON IEEE 3, 2266–2271 (2002)
3. Raskar, R., et al.: Lighting aware motion capture using photosensing markers and multiplexed illuminators. ACM Transactions on Graphics 26(3) (2003)
4. Nintendo Co., Ltd.: Video Game System with Wireless Modular Handheld Controller, US Patent US20070066394A1. Kyoto, JP (2007)
5. Sony Computer Entertainment Inc.: Detectable and Trackable Hand-held Controller, US Patent US20060264260A1. Minato-ku, JP (2006)
New Integrated Framework for Video Based Moving Object Tracking

Md. Zahidul Islam, Chi-Min Oh, and Chil-Woo Lee

Chonnam National University, Gwang-ju, Korea
Abstract. In this paper, we present a novel approach that improves a particle-filter-based moving object tracking system by combining shape similarity and color histogram matching in a new integrated framework. The shape similarity between a template and an estimated region in the video sequence is measured by the normalized cross-correlation of their distance-transform image maps. The observation model of the particle filter is based on shape, derived from distance-transformed edge features, with the concurrent effect of color information. The target object to be tracked forms the reference color window; its histogram is calculated and used to compute histogram distances while performing a deterministic search for a matching window. For both shape and color matching, the reference template window is created instantly by selecting any object in the video scene and is updated in every frame. Experimental results are presented to show the effectiveness of the proposed method.
1 Introduction

Robust and reliable moving object tracking is one of the most demanding and challenging tasks in computer vision, with applications such as visual surveillance and human-computer interaction. Particle filters [1] provide a robust tracking framework because they are neither limited to linear systems nor require the noise to be Gaussian. The idea of a particle filter, namely to apply a recursive Bayesian filter based on sample sets, was proposed independently by several research groups [5], [11]. Our present work evolves from the particle filter by improving the observation model, which is based on the distance-transform (DT) image map and the color distribution. In our previous work [12], we suggested a method for parametric, video-based object tracking using the particle filter. In that technique we addressed the two most important problems faced in video-based object tracking: the first is the difficulty of segmentation, which stems from illumination variation and cluttered backgrounds, and the second is the motion complexity of the object itself. The first problem was solved by using the distinct corner and edge features of the target object, and for the second we modeled the degrees of freedom of the object motion in 2D and 3D with a mathematical description. To measure the similarity between a hypothesized model in the image and the feature model transformed by a 3D transformation matrix, we adopted the chamfer matching method.

Tracking objects is performed over a sequence of video frames and consists of two main stages: isolation of objects from the background in each frame and association of
objects in successive frames in order to trace them. Object tracking in image processing is usually based on a reference image of the object or on properties of the object. To start tracking, the tracker generally needs to be initialized by an outside component [15]; for example, a human operator can select an object of interest and let the tracking begin. Once initiated, the tracking algorithm proceeds on the assumption of high correlation of the object's motion, shape or appearance between consecutive video frames. Unfortunately, robust and efficient object tracking is still an open research issue. There are two central challenges. First, the visual measurements used for tracking objects are not always dependable; to differentiate objects from background clutter, various image cues have been proposed, such as object contours [9], edges [7] and color distributions [8], [11]. Second, tracking objects in nonlinear dynamic systems is not easy. The hidden Markov model provides a potential tool for the first difficulty; to address the second, the particle filter (sequential Monte Carlo method) has been used extensively in various works [5], [6], [11].

Shape and color are both very strong cues; for the shape feature we use the DT image map. In the present work we intend to use multiple features for tracking, since a single cue is sometimes not enough for robust tracking and each individual cue has its own limitations. We propose a particle filter with DT image map and color-based image features as an integrated framework for moving object tracking. A number of researchers have used skin color, but this applies only to hand and face tracking and has limitations such as the lack of a unique definition of skin color, shadows, occlusions and changing illumination [5]. The assumption in [10] is that the only moving objects in the video scene are people; this does not hold for many applications. In this paper we instead try to build a general system for tracking any indoor or outdoor object, such as a person, a car or a hand. First we select an object from the video scene manually; it is then treated instantly as the reference image or template. Our system works adaptively by updating the template in every frame. The key mechanism of our observation model is a similarity measure based on normalized cross-correlation between the DT-image-map-based template and the tracked object in the video scene, combined with color histogram matching between the reference image and the target object. More details can be found in the section on the overall proposed system model.

The rest of the paper is organized as follows: Section 2 introduces the basic particle filter for object tracking. Section 3 reviews our previous parametric 2D model. Section 4 discusses the proposed system model, which unifies the distance transform and normalized cross-correlation with the simultaneous effect of color information in a particle filter. Section 5 verifies the proposed system with experimental results on various real video data in different environments. Concluding remarks are given in Section 6.
2 Particle Filter

Tracking objects in video involves modeling non-linear and non-Gaussian systems. In order to model the underlying dynamics of a physical system accurately, it is important to include elements of non-linearity and non-Gaussianity in many application areas. Particle filters can be used to achieve this.
The particle filter is a sequential Monte Carlo methodology whose basic idea is the recursive computation of the relevant probability distributions using the concepts of importance sampling and the approximation of probability distributions with discrete random measures. The fundamental idea of the particle filter is to approximate the filtered posterior distribution (density) by a set of random particles (samples) with associated weights: particles are weighted according to a likelihood score and then propagated according to a motion model. Particle filtering assumes a Markov model for system state estimation, which states that past and future states are conditionally independent given the current state; thus, observations depend only on the current state.

2.1 Mathematical Description

The particle filter consists essentially of two steps: prediction and update. Given all available observations $y_{1:t-1} = \{y_1, \ldots, y_{t-1}\}$ up to time $t-1$, the prediction stage uses the probabilistic system transition model $p(x_t \mid x_{t-1})$ to predict the posterior at time $t$ as

\[
p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1} \tag{1}
\]

At time $t$, when the observation $y_t$ becomes available, the state can be updated using Bayes' rule

\[
p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})} \tag{2}
\]

where $p(y_t \mid x_t)$ is described by the observation equation. In the particle filter, the posterior $p(x_t \mid y_{1:t})$ is approximated by a finite set of $N$ samples $\{x_t^i\}_{i=1,\ldots,N}$ with importance weights $w_t^i$. The candidate samples $\tilde{x}_t^i$ are drawn from an importance distribution $q(x_t \mid x_{1:t-1}, y_{1:t})$, and the weights of the samples are

\[
w_t^i = w_{t-1}^i\, \frac{p(y_t \mid \tilde{x}_t^i)\, p(\tilde{x}_t^i \mid x_{t-1}^i)}{q(\tilde{x}_t \mid x_{1:t-1}, y_{1:t})} \tag{3}
\]

The samples are resampled according to their importance weights to generate an unweighted particle set and avoid degeneracy. In the case of the bootstrap filter [13], $q(x_t \mid x_{1:t-1}, y_{1:t}) = p(x_t \mid x_{t-1})$ and the weights become the observation likelihood $p(y_t \mid x_t)$.
2.2 Simple Mathematical Model of Our Proposed System
For the implementation of the particle filter we need the following mathematical models:

1. Transition model / state motion model $p(x_t \mid x_{t-1})$: specifies how objects move between frames.
2. Observation model $p(y_t \mid x_t)$: specifies the likelihood of an object being in a specific state (i.e., at a specific location).
3. Initial state Est(1) / prior distribution model $p(x_0)$: describes the initial distribution of object states.
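As an illustration of how these three models fit together, the following is a minimal, generic C++ sketch of a bootstrap particle filter over a 2D image location; the Gaussian random-walk transition, the noise parameters and the likelihood callback are placeholders standing in for the models developed in the following sections, not the authors' implementation.

```cpp
#include <random>
#include <utility>
#include <vector>

// Minimal bootstrap particle filter over a 2D image location (x, y).
// The likelihood callback stands in for the observation model p(y_t | x_t).
struct Particle { double x, y, w; };

class ParticleFilter {
public:
    ParticleFilter(int n, double x0, double y0) : rng_(std::random_device{}()) {
        particles_.assign(n, Particle{x0, y0, 1.0 / n});   // prior p(x_0): mass at the initial ROI
    }

    template <typename Likelihood>
    void step(Likelihood likelihood) {
        std::normal_distribution<double> noise(0.0, 5.0);  // transition model p(x_t | x_{t-1})
        double sum = 0.0;
        for (Particle& p : particles_) {
            p.x += noise(rng_);
            p.y += noise(rng_);
            p.w = likelihood(p.x, p.y);                    // bootstrap weight = p(y_t | x_t)
            sum += p.w;
        }
        for (Particle& p : particles_) p.w /= sum;         // normalize weights
        resample();
    }

    // Weighted mean of the particles as the state estimate.
    std::pair<double, double> estimate() const {
        double ex = 0.0, ey = 0.0;
        for (const Particle& p : particles_) { ex += p.w * p.x; ey += p.w * p.y; }
        return {ex, ey};
    }

private:
    void resample() {                                      // multinomial resampling
        std::vector<double> w;
        for (const Particle& p : particles_) w.push_back(p.w);
        std::discrete_distribution<int> pick(w.begin(), w.end());
        std::vector<Particle> next;
        next.reserve(particles_.size());
        for (std::size_t i = 0; i < particles_.size(); ++i) {
            Particle p = particles_[pick(rng_)];
            p.w = 1.0 / particles_.size();
            next.push_back(p);
        }
        particles_ = std::move(next);
    }

    std::vector<Particle> particles_;
    std::mt19937 rng_;
};
```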
3 Parametric 2D Model

In our preceding work [12], we followed basically the same principle for video-based object tracking as in the present work. In [12], we suggested model-based object tracking using a particle filter with chamfer matching. In this paper, we model the selected object's features as a color distribution and a distance-transform template with non-Gaussian object movement; that algorithm is based on a non-geometric model, which is somewhat heavy and limited to simple transformations of the object model, such as scaling and rotation.
Fig. 1. Instance model (1) Four angular points (2) Corner features (3) Edge and corner features (4) Model information
Fig. 2. Feature and Distance Transform images (1) Canny Edge (2) Distance transform of Edge (3) Fast corner (4) Distance Transform of Corner
Our presented system is especially suited for a non-geometric object that is far from the camera position. As an ideal object model, we select an A4-size book, which has four angular points. Using these points, we select the model instance and calculate the image transformation factors, as shown in Fig. 1. Chamfer matching is one of the observation methods for verifying the similarity of a geometric edge model in a distance-transform image: it observes how well the feature points of a hypothesized model match the distance transform of the edge image, using feature points such as edge and corner points from the object model. Figure 2 shows the feature images and distance-transform images.
4 Proposed Integrated Framework

The task of robust tracking demands a robust observation model. Shape and color are both very important features for distinguishing the target object from the background image. We integrate these two cues to develop our observation model.
4.1 Observation Model
Basically, the observation model is used to measure the observation likelihood of the samples, and it is an important concern for object tracking. In the last few years, many observation models have been developed for particle filter tracking. In [5], a contour-based appearance template is chosen to model the target. A tracker based on a contour template gives an accurate description of the target but performs poorly in clutter and with non-rigid objects, and it is generally time consuming; its initialization is also tricky and not easy. Color-based trackers, on the other hand, are faster and more robust than contour-based trackers. In this case the color histogram is typically used to model the target in order to cope with partial occlusion and non-rigidity. The drawback of the color histogram is that the spatial layout is ignored, so trackers based on it are easily confused by a background with similar colors. The combination of the two features therefore provides better performance, minimizing all these difficulties for a general tracking system.

Cross-correlation-based template matching is motivated by the distance measure (squared Euclidean distance). There are several disadvantages to this approach: for example, if the image energy varies with position, matching can fail, and it is not invariant to changes in image amplitude such as those caused by changing lighting conditions across the image sequence. The correlation coefficient overcomes these difficulties by normalizing the image and feature vectors to unit length, yielding a cosine-like correlation coefficient. In the present work we therefore use DT-image-map-based matching with normalized cross-correlation to develop the observation model and make the particle-filter-based tracker more robust, taking advantage of both the DT image map and the normalized cross-correlation together with color information.

4.2 Initialization
In our present case, for shape information we propose normalized cross-correlation (NCC) based matching of DT images, and for color information we use a simple HSV-histogram-based model. The proposed tracking method requires the system to be initialized; the working block diagram of this initialization is shown in Fig. 3.
(Block diagram: select the reference image → distance transform of the selected object in the video scene → NCC matching with the best matching score; calculate the histogram of the selected image → correlation-based histogram matching with the best score → observation model)
Fig. 3. Initialization steps. Whenever we select our reference image with a rectangle, these primary actions start to build our observation model.
Matching DT images by NCC involves correlating the reference image with the distance-transformed scene and determining the locations where the mismatch is below a certain user-defined threshold. The cross-correlation of a template $t(x, y)$ with a subimage $f(x, y)$ is given by the following equation:

\[
N_{f,t} = \frac{\sum_{x,y}\bigl(f(x,y) - \bar{f}\bigr)\bigl(t(x,y) - \bar{t}\bigr)}{\sigma_f\,\sigma_t} \tag{4}
\]
4.3 Distance Transform (DT)
Typical matching with the DT [3] involves two binary images: a segmented template and a segmented image, which we call the feature template and the feature image. The distance transform calculates, for all non-zero pixels of the source image, the distance to the closest zero pixel; to compute it we use a function based on the algorithm described in [2]. To formalize the idea of DT matching, which is similar to chamfer matching [4], the shape of an object is represented by a set of points and the image map is represented as a set of feature points. The chamfer system basically depends on the distance transform: chamfer matching is a technique for finding the best fit of edge points from two different images by minimizing a generalized distance between them. A distance transform (DT) converts a binary image consisting of feature and non-feature pixels into an image in which each pixel value denotes the distance to the nearest feature pixel. The distance image gives the distance to the nearest edge at every pixel and is calculated only once for each frame. In the present work we always update the reference template in every frame, which makes the matching more robust to changes in the tracked object. DT-based matching has several advantages: in order to be tolerant of small shape variations, any similarity function between two shapes should vary smoothly when the feature point locations change by small amounts, and with a DT-based system we can match smoothly and robustly by means of the normalized cross-correlation introduced in Sect. 4.2.
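The following sketch shows how the DT-based shape matching of Sects. 4.2 and 4.3 could be realized with the OpenCV C++ API; it is our illustration rather than the authors' original code, and the Canny thresholds and distance-transform mask size are assumed values.

```cpp
#include <opencv2/opencv.hpp>

// Distance-transform map of an edge image: every pixel holds the distance
// to the nearest Canny edge. Edges are inverted to zero first, because
// cv::distanceTransform measures the distance to the nearest zero pixel.
cv::Mat edgeDistanceMap(const cv::Mat& gray) {
    cv::Mat edges, dist;
    cv::Canny(gray, edges, 50, 150);                       // illustrative thresholds
    cv::distanceTransform(255 - edges, dist, cv::DIST_L2, 3);
    return dist;                                           // CV_32F distance map
}

// NCC matching (Eq. 4) of a DT template against a DT scene map; the
// location with the highest normalized correlation is returned.
cv::Point matchByNCC(const cv::Mat& sceneDT, const cv::Mat& templateDT,
                     double* bestScore = nullptr) {
    cv::Mat response;
    cv::matchTemplate(sceneDT, templateDT, response, cv::TM_CCOEFF_NORMED);
    double maxVal = 0.0;
    cv::Point maxLoc;
    cv::minMaxLoc(response, nullptr, &maxVal, nullptr, &maxLoc);
    if (bestScore) *bestScore = maxVal;
    return maxLoc;                                         // top-left corner of best match
}
```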
4.4 Color Distribution Model

We want to apply such a particle filter in a color-based context, integrated with the DT image map information. To achieve robustness against non-rigidity, rotation and partial occlusion, we focus on color distributions as target models; these are represented by an HSV-based histogram of the image. Color-based probabilistic tracking relies on the deterministic search of a window whose color content matches a reference histogram color model, using the principle of color-histogram distance. We employ a function that compares two dense histograms using the correlation method. If $H_1$ denotes the reference histogram and $H_2$ the target histogram, the correlation function is given by the following equation:
\[
f(H_1, H_2) = \frac{\sum_I H'_1(I)\, H'_2(I)}{\sqrt{\sum_I \bigl[H'_1(I)\bigr]^2 \sum_I \bigl[H'_2(I)\bigr]^2}} \tag{5}
\]

where $H'_k(I) = H_k(I) - \frac{1}{N}\sum_J H_k(J)$ and $N$ is the number of histogram bins.

4.5 Particle Filter Based Implementation
In our proposed system, we integrate DT-based image matching by NCC and the image histogram with a particle filter for robust object tracking. The particle-filter-based implementation follows the steps below.

State space: we model the state as the object's location in each frame of the video. The state space is represented in the spatial domain as $X = (x, y)$. We initialize the state space for the first frame manually by selecting the object of interest in the video scene with a rectangle.

System dynamics: a second-order auto-regressive dynamics is chosen for the parameters used to represent our state space, i.e., $(x, y)$. The dynamics is given as $X_{t+1} = A X_t + B X_{t-1}$. The matrices $A$ and $B$ could be learned from a set of sequences for which correct tracks have been obtained.

Observation $y_t$: the observation $y_t$ is proportional to the NCC matching score between the reference image and the target image at the predicted location in the current frame, combined with the best histogram matching score. To take the best effect from the combination of these two feature-matching scores, we follow these technical steps:

1. Compute the score from DT-image-based shape matching by NCC.
2. Compute the score from color-histogram-correlation-based matching.
3. Normalize both matching scores.
4. Final score = α × shape score + β × color score, where α + β = 1.
5. Search for the particle with the best score.
According to this algorithm, if color is dominant we increase β, and if shape is dominant we increase α; α and β can also be decided automatically. The observation model of our present system is thus based on the following equation:

\[
y_t \propto \mathrm{NCC}(q, q_x) + \mathrm{Dist}(q, q_x) \tag{6}
\]

where NCC is the normalized cross-correlation based matching score, Dist is the histogram distance score, and $q$ and $q_x$ are the reference image and the target image, respectively.
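A corresponding sketch of the color term and the fused score of steps 1–5, again written against the OpenCV C++ API as an illustration rather than the authors' original code; the histogram bin counts and the default value of α are assumptions.

```cpp
#include <opencv2/opencv.hpp>

// Hue-Saturation histogram of a BGR region of interest (bin counts assumed).
cv::Mat hsvHistogram(const cv::Mat& bgrRoi) {
    cv::Mat hsv, hist;
    cv::cvtColor(bgrRoi, hsv, cv::COLOR_BGR2HSV);
    int channels[] = {0, 1};                       // H and S channels
    int histSize[] = {30, 32};
    float hRange[] = {0, 180}, sRange[] = {0, 256};
    const float* ranges[] = {hRange, sRange};
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
    cv::normalize(hist, hist, 0, 1, cv::NORM_MINMAX);
    return hist;
}

// Fused particle score: alpha * shape score + (1 - alpha) * color score,
// where the color score is the histogram correlation of Eq. (5).
double fusedScore(double nccShapeScore, const cv::Mat& referenceHist,
                  const cv::Mat& candidateHist, double alpha = 0.5) {
    double colorScore = cv::compareHist(referenceHist, candidateHist,
                                        cv::HISTCMP_CORREL);
    return alpha * nccShapeScore + (1.0 - alpha) * colorScore;
}
```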
5 Experiments and Results

Our proposed system is suited to several environments for tracking any single moving object. The combined effect of multiple features, namely the DT image map and color, makes
Fig. 4. First resulting sequence, showing the tracking of a single moving person
Fig. 5. Second resulting sequence, showing the fast, dynamic movement of a person with a bicycle
Fig. 6. Third resulting sequence, showing a dynamically moving person
our system more robust and reliable against some well-known tracking problems. In this section we present experimental results on several real-world video sequences captured by a pan/tilt/zoom video camera in an outdoor environment from the top of our department building. The captured sequences simulate various tracking conditions, including quick movement, shape deformation, background clutter, appearance changes, camera pan/tilt/zoom, and partial occlusion. For all test sequences we use the same algorithm configuration. Our first experimental sequence shows the tracking of a single moving person who, after some time, is passed by a group of moving people; this is shown in Fig. 4. For each experiment, initialization is done manually, creating the reference image instantly by selecting the region of interest (ROI), and we use 100 particles. All resulting image sequences are presented from left to right, with the number in the top-left corner giving the frame number. The second sequence, shown in Fig. 5, contains quick movements of a person with his bicycle, who randomly changes his body appearance together with the bicycle. The third sequence, shown in Fig. 6, demonstrates the effectiveness of the proposed system as the moving person randomly changes her movement in a very dynamic way. The tracking algorithm is implemented in C++ with the OpenCV library on the Windows XP platform on a standard Pentium 4 machine with 1.5 GB of RAM. From all the experimental results it can be concluded that our proposed system is an efficient general system for instant DT-image- and color-histogram-based object tracking. These results show the algorithm's performance under diverse scenarios. To illustrate the distribution of the samples, Fig. 7 shows the sample distributions considering
(Fig. 7 panels: Frame # 134, Frame # 203, Frame # 262, Frame # 374)
Fig. 7. Probability density functions of the corresponding moving person as shown in Fig. 5
the best matching score. This probability density function (pdf) represents the best match to the target object. These distribution results are taken from the same frames as represented in Fig. 6, respectively.
6 Conclusion

Our proposed tracking method successfully ensures tracking properties such as robustness and invariance to shape change and non-rigidity, thanks to the observation model of the particle filter, which is based on the color distribution and the DT-image-map similarity. We take both color and shape features into account concurrently. Color cues alone are sometimes unreliable under illumination changes, shadows and intensity changes, so we add a further cue, the DT-image-map-based shape, which is invariant to some object deformations such as rotation and scaling. The observation model is updated with the best matching score in every frame. Initialization is necessary to start the tracking process, and it is important that the tracking modules can be initialized effortlessly. The system has been tested on a variety of video data and very satisfactory results have been obtained. This work motivates us to pursue 3D-information-based object tracking as future work.

Acknowledgements. This research work was supported by the MIC & IT leading R&D support project [2006-S-028-01].
References

1. Djuric, P.M., Kotecha, J.H., Zhang, J., Huang, Y., Ghirmai, T., Bugallo, M.F., Miguez, J.: Particle Filtering. IEEE Signal Processing Magazine, 19–38 (2003)
2. Felzenszwalb, P.F., Huttenlocher, D.P.: Distance transforms of sampled functions. Cornell Computing and Information Science, TR2004-1963 (2004)
3. Gavrila, D.M.: Multi-feature Hierarchical Template Matching Using Distance Transforms. In: IEEE ICPR, Brisbane, Australia (1998)
4. Gunilla, B.: Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 10(6) (1988)
5. Isard, M., Blake, A.: CONDENSATION – conditional density propagation for visual tracking. International Journal of Computer Vision 29(1), 893–908 (1998)
6. Isard, M., Blake, A.: A mixed-state condensation tracker with automatic model-switching. In: International Conference on Computer Vision, pp. 107–112 (1998)
7. Krishna, V.T., Kamesh, R.N.: Object tracking in video using particle filtering. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 657–660 (2005)
8. Lehuger, A., Lechat, P., Perez, P.: An adaptive mixture color model for robust visual tracking. In: IEEE International Conference on Image Processing, pp. 573–576 (2006)
9. Li, P., Zhang, T., Pece, E.C.: Visual contour tracking based on particle filters. Image Vision Computing 21(1), 111–123 (2003)
10. Lu, W., Tan, Y.P.: A color histogram based people tracking system. In: ISCAS, vol. 2, pp. 137–140 (2001)
11. Nummiaro, K., Koller-Meier, E., Gool, L.V.: A color-based particle filter. In: First International Workshop on Generative-Model-Based Vision, pp. 53–60 (2002)
12. Chimin, O., Islam, M.Z., Lee, C.W.: Two dimensional edge and corner model based object tracking using particle filter. In: 15th Japan-Korea Joint Workshop on Frontiers of Computer Vision, pp. 223–228 (2008)
13. Sanjeev, A.M., Simon, M., Neil, G., Tim, C.: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50(2), 174–188 (2002)
14. Stenger, B.D.R.: Model-Based Hand Tracking Using a Hierarchical Bayesian Filter. PhD Thesis, University of Cambridge (2004)
15. Yunqiang, C., Yong, R.: Real time object tracking in video sequences. Signals and Communications Technologies, Interactive Video, Part II, 67–88 (2006)
16. Zhao, F., Huang, Q., Gao, W.: Image Matching by Normalized Cross-Correlation. In: IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. II-14–II-19 (2006)
Object Scanning Using a Sensor Frame

Soonmook Jeong1, Taehoun Song1, Gihoon Go1, Keyho Kwon2, and Jaewook Jeon2

1 School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea
{kuni80, thsong, tokogi}@ece.skku.ac.kr
2 School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea
{khkwon, jwjeon}@yurim.skku.ac.kr
Abstract. This paper focuses on object scanning using sensors. The objects are articles in daily use; everyday objects such as cups, bottles and vessels are good models to scan. The sensor scan represents the objects as 3D images on the computer monitor. Our research proposes a new device for scanning real-world objects. The device is a square frame, similar to a picture frame, whose interior is empty except for the frame itself. Infrared sensors are arranged on the device frame; these sensors detect the object and extract its coordinates. The coordinates are transmitted to the computer, and the 3D creation algorithm renders them as a 3D image. The operating principle is simple, similar to scanning a person at a checkpoint: the user passes the object through the sensor frame, creating a 3D image corresponding to the real object. Thus, the user can easily obtain the 3D object image. This approach uses low-cost infrared sensors rather than a high-cost sensor such as a laser. Keywords: Sensor frame, 3D image, Scanning, Infrared sensor.
1 Introduction

The ability to form a 3D image from a real object is valuable: it allows a 3D image of an object to be created in a short time, without drawing the object directly. This is especially important in 3D animation and 3D games. The general methods for acquiring a 3D image use computer vision or laser sensors. Computer vision using a stereo camera is sensitive to light and color, so it is difficult to obtain an accurate 3D image with a stereo camera. A laser sensor may be used to obtain the 3D image of the object, but this approach is uncompetitive due to its high cost. Our research focuses on obtaining a 3D image from a real object using a low-cost device. We developed a sensor frame to do so. The sensor frame body is acrylic, and infrared sensors are installed around the frame's circumference. These elements are cheap and readily available. Our aim is to obtain a 3D image of an object using this device.
2 Related Work

A number of researchers have attempted to improve the performance of 3D object scanning. Bor-Tow Chen used a 3D laser scanning system. Generally, a 3D scanner needs two or more scanning processes to capture multiple layers, which are then registered and merged into a complete 3D object, since it is based on a two-dimensional axis. He proposed an algorithm that decreases the number of scanned layers and acquires the best-view data in a double-axis 3D scanner [1]. Masakazu proposed a portable 3D scanner system that combines a narrow-baseline stereo camera and a consumer video projector [2]. Georg introduced a 3D object detection approach with a geometric model description; fast and robust object detection was achieved by combining the RANSAC algorithm with a hierarchical subscale search [3]. Joseph P. discussed the design and development of a 3D scanning system that includes a computing processor capable of handling large volumes of high-speed, high-resolution digital camera output data [4]. Kazuki proposed a method to recover valid 3D range data from a moving range sensor using multiple view geometry [5]. Beverly D. focused on the laser light-sectioning method, an accepted technique for measuring the 3D shape of a target object [6].
3 The Creation Process of the 3D Image from the Object

This section describes the process of creating a 3D image from an object. The sensor frame is the device that scans the object and extracts its coordinates in three dimensions. The sensor frame is square, like a picture frame, and its interior is empty except for the frame itself. Infrared sensors are installed around the frame's circumference. The user holds the frame and passes it over the object, scanning the object from top to bottom. When the sensor frame passes over the object, each infrared sensor on the frame measures the distance between the object and that sensor. This alone is insufficient to acquire the three-dimensional coordinates: two further values must be acquired, the sensor position and the distance between the ground and the sensor frame. One sensor is therefore installed at the bottom of the frame to measure the distance between the ground and the frame. Together, these values turn the readings into three-dimensional coordinates. After the object has been scanned with the sensor frame, the coordinates are transmitted to the computer, where the three-dimensional coordinates are linked by our proposed algorithm. Finally, the 3D image is obtained. In brief, we have developed a novel device for creating a 3D image from an object.

3.1 The Structure of the Sensor Frame

Each infrared sensor consists of a transmitter and a receiver that measure the distance between the sensor and the object. The device has twenty-four infrared sensors, with six sensors arranged on each of the four sides of the frame (Fig. 1). The sensor board transmits the sensor data to the host PC as soon as it is switched on, and the host PC stores the twenty-four sensors' data in a twenty-four-element array.
3.2 Object Recognition

The sensor zone is the space surrounded by the sensors. Before an object enters the sensor detection zone, the initial values of all the infrared sensors are 15 cm, since the detection zone is a 15 cm square. Recognition begins at the top of the object: our algorithm recognizes an object when more than three sensors report a reading of less than 8 cm. Once the object is recognized, the sensor data are stored in the array variable. These sensor data come from the twenty-four sensors around the detection zone and from one sensor on the bottom of the sensor frame, which measures the distance between the device and the ground. The three coordinates are obtained using this method; Fig. 2 shows how the three coordinates are obtained from the sensor frame.

3.3 The Measurement, Compensation and Storing of the Sensor Data

Once the sensor frame recognizes the object, the sensor data are stored in the array variable. The sensor detection zone is a two-dimensional space, so the sensor frame cannot capture the three-dimensional coordinates simultaneously; the object is therefore divided into layers. Fig. 3 shows the object layers. A cross-section of the object is formed on each layer, and after the object has been sensed, the sections of the object layers are connected to form the three-dimensional object. However, the sensor readings are unstable and often contain incorrect values. To resolve this, our algorithm measures the sensor data ten times for each layer and uses the mean values. dat_p[24][10][10] is an array variable that stores the readings indexed by sensor position number, measurement count and object-layer height. The mean of the ten sensor values is calculated and stored in Data[24][10], a two-dimensional array in which the first index is the sensor position number (with the sensor value as content) and the second index is the height of the object layer.

dat_p[24][10][10] = Sensor Value[Sensor Num][Measurement Count][Height]
Data[24][10] = Sensor Mean Value[Sensor Num][Height]
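As a minimal sketch (our illustration, not the authors' code), the reduction from the raw readings dat_p to the per-layer means in Data could be written as follows; the array dimensions match the description above.

```cpp
// Reduce the raw readings dat_p[sensor][measurement][layer] to the
// per-layer mean values Data[sensor][layer] described above.
const int kSensors = 24;
const int kSamples = 10;   // ten measurements per layer
const int kLayers  = 10;   // number of object layers

void averageReadings(const float dat_p[kSensors][kSamples][kLayers],
                     float Data[kSensors][kLayers]) {
    for (int s = 0; s < kSensors; ++s) {
        for (int h = 0; h < kLayers; ++h) {
            float sum = 0.0f;
            for (int m = 0; m < kSamples; ++m)
                sum += dat_p[s][m][h];           // noisy individual readings
            Data[s][h] = sum / kSamples;         // mean value for sensor s, layer h
        }
    }
}
```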
Fig. 1. The structure of the sensor frame
Fig. 2. The process to obtain three coordinates from an object
Fig. 3. The layers of the object
3.4 The Coordinate Generation of Three-Dimensional Space

This section describes how the coordinates in three-dimensional space are generated. Three parameters are required to generate the three-dimensional object: x, y and z, the coordinates in three-dimensional space. These parameters are obtained from the array Data[24][10], which stores the mean values of the twenty-four sensors and the heights of the layers. The first index, the sensor position, is used for the x coordinate; each sensor value is used as the y coordinate; and the distance between the device and the ground gives the z coordinate. (A minimal sketch of this mapping is given at the end of Sect. 3.6 below.)

3.5 The Formation of Object Section

This section describes how a cross-section of the object is created; Fig. 4 shows the process. The row axis represents the x axis and the column axis represents the y axis. The red points are the coordinates created from the sensor positions and sensor values, and these points have to be connected without crossing. The algorithm uses two pointers to track the detected points while connecting them. Points are detected in increasing order from the bottom-left to the top-right. When the first point is detected, pointer 1 indicates it; when the second point is detected, pointer 2 indicates it, and pointers 1 and 2 are connected. When a third point is detected, the pointer that is nearer to it along the x axis is selected to indicate the third point, so pointer 1 moves from its previous point to the third point. If no point is
Fig. 4. The formation process of an object section
detected, pointer 1 and pointer 2 are connected and a section of the object has been formed in two-dimensional space. For ease of computation, our algorithm uses an extended scale of 300 x 300 x 10; the original scale is 24 x 15 x 10, where the first factor is the number of sensors, the second factor is the maximum sensor value, and the last factor is the number of object layers. Fig. 5 shows the change from the original scale to the extended scale.

3.6 The Connection of Object Sections

Since each section of the object has already been formed, the next step is to connect those sections, with the upper sections connected to the lower sections. Fig. 6 shows the connection process of the object sections.
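The following C++ sketch (our illustration, not the authors' code) combines the coordinate generation of Sect. 3.4 with the scale extension just described: the sensor index gives x, the mean distance reading gives y, the frame-to-ground distance gives z, and the 24 x 15 grid is stretched to the 300 x 300 working scale. Reusing the 8 cm detection threshold for each individual reading is an assumption.

```cpp
#include <vector>

// Map the averaged readings Data[24][10] to scaled 3D points:
// sensor index -> x, measured distance -> y, layer height -> z,
// with the 24 x 15 grid stretched to the 300 x 300 working scale.
struct Point3 { float x, y, z; };

std::vector<Point3> buildPointCloud(const float Data[24][10],
                                    const float layerHeight[10]) {
    std::vector<Point3> cloud;
    for (int s = 0; s < 24; ++s) {
        for (int h = 0; h < 10; ++h) {
            float v = Data[s][h];
            if (v >= 8.0f) continue;             // assumed 8 cm threshold: nothing detected
            cloud.push_back({ s * (300.0f / 24.0f),   // sensor position -> x
                              v * (300.0f / 15.0f),   // distance reading -> y
                              layerHeight[h] });      // frame-to-ground -> z
        }
    }
    return cloud;
}
```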
Fig. 5. The change from the original scale to the extended scale
Fig. 6. The connection of object sections
4 Experiment Results

This section presents the experimental results. We tested our object-scanning algorithm on three objects of different types. The first object tested was a mug, which looks like a cylinder with a handle on the side. We scanned this object using our object-scanning algorithm and obtained the result shown at the top of Fig. 7; the figure roughly depicts the shape of the mug. The second object tested was a triangular pyramid. The result of the scan was a similarly shaped triangular pyramid; as the height within the 3D object decreased, its cross-sectional area became larger. The last object tested was a mini-fan, a shape that is difficult to recognize. The mini-fan is divided into three parts: the fan, the cable and the prop. The test outcome was a figure in which these three parts are distinguishable.
Fig. 7. The demonstration results of the experiment
5 Conclusion and Future Work

The object-scanning sensor frame uses infrared sensors to recognize an object. We used twenty-four sensors to detect an object, which is too few to scan some objects. As shown in the experiments, the scanned 3D objects form shapes similar to the real objects, but not exact ones. This can be addressed by adjusting the number of sensors and the interval between them: the 3D object becomes recognizably closer to the real object when the scanner has more sensors and the interval between sensors is reduced. This refinement will be pursued in future research.
Acknowledgments This research was supported by the MKE(Ministry of Knowledge Economy), Korea, under the ITRC(Information Technology Research Center) support program supervised by the IITA(Institute of Information Technology Advancement) (IITA-2009(C1090-0902-0046)).
References

1. BorTow, C., WenShiou, L., Chen, C., Hsien Chang, L.: A 3D Scanning System based on Low-Occlusion Approach. In: Second International Conference on 3-D Digital Imaging and Modeling, Proceedings, October 4-8 (1999)
2. Morimoto, M., Fujii, K.: A Portable 3D Scanner based on Structured Light and Stereo Camera. In: Communications and Information Technology, ISCIT 2005 (2005)
3. Biegelbauer, G., Vincze, M.: Efficient 3D Object Detection by Fitting Superquadrics to Range Image Data for Robot's Object Manipulation. In: IEEE International Conference on Robotics and Automation, April 10-14 (2007)
4. Lavelle, J.P., Schuet, S.R., Schuet, D.J.: High Speed 3D Scanner with Real-time 3D Processing. In: IEEE International Workshop on Imaging Systems and Techniques (IST), No. 14 (2004)
5. Kozuka, K., Jun, S.: Rectification of 3D Data Obtained from Moving Range Sensors by using Multiple View Geometry. In: International Conference on Image Analysis and Processing (ICIAP), pp. 10–14 (2007)
6. Beverly, D.B., Adrian, D.C., Chan, M., John, D.H.: A Simple, Low Cost 3D Scanning System using the Laser Light-Sectioning Method. In: IEEE International Conference on Instrumentation and Measurement Technology, May 12-15 (2008)
Mixed Realities – Virtual Object Lessons

Andreas Kratky

USC School of Cinematic Arts, Interactive Media Division, 900 West 34th Street, SCA 201, Los Angeles, CA 90089-2211
[email protected]
Abstract. The question of how to design and implement efficient virtual classroom environments gains a new quality in the light of extensive digital education projects such as the One Laptop Per Child (OLPC) initiative. At the core of this consideration is not only the task of developing content for very different cultural settings but also the necessity to reflect the effects of learning processes that operate exclusively with digitally mediated content. This paper traces the design of the project Venture to the Interior, an interactive experience that presents selected objects from the collections of the Museum of Natural History in Berlin and reflects them as building blocks for the Enlightenment-idea of a building of knowledge. The project investigates the role of objects as a knowledge device and the possibilities for a translation of the didactic effects of experiential learning into virtual environments. Keywords: Virtual Museums, Virtual Reality, Mixed Reality, Virtual Classroom, Distance Learning, Photorealism.
1 Introduction

The recent announcement of a 10 Dollar computer by the Secretary for Higher Education in India, as well as the announcement of a new computer series by the One Laptop Per Child initiative of MIT for 2010, gives the discussion about virtual classrooms a new and strong impulse. These initiatives are designed to make educational resources available to children who do not have regular access to them. Targeted for mass distribution in developing countries, these networked computers will be used in areas with sparse infrastructure, where the computer and the content available through it will often be the only contact with a wider range of educational resources. While many of the studies about the pedagogy and efficiency of virtual classroom settings have been conducted in areas where the technological platforms are generally available and where other access channels to knowledge also exist, the question of how to design and distribute educational resources for a situation in which the codes of and a basic familiarity with digital media are not developed poses a new challenge. At the same time, this increasing demand for digital learning resources and remote learning is not limited to developing countries. In industrialized countries, too, the need for targeted and customized educational tools grows, and a growing number of institutions see the need to provide information and educational content through digital channels such as the Internet and electronic publications.
The motivations behind the efforts to develop affordable computer technology to extend the availability of knowledge and education to areas where large parts of the society are excluded from the access to appropriate learning facilities bears parallels with earlier historic projects of this kind. The English social reformer James Silk Buckingham published in 1849 his ideas for a reformation of the society towards a more healthy and stable life. As a complement to the transformation of the inner attitudes of people he suggested a number of exterior improvements, among them “ready access to Libraries, Lectures, Galleries of Art, Public Worship, with many objects of architectural beauty, fountains, statues” [1]. Buckingham was instrumental in introducing awareness for the role of culture into the agenda of British reform politics and promoted the establishment of municipal museums and libraries. The attempt for a general cultivation of people through “rational recreation” had the goal to make the society more disciplined, controllable, and efficient and to give people better access to education and future development. These motivations are not unlike those that are the driving force behind the ten Dollar computer in India, which is supposed to improve the skills of millions of students across the country and to build a more efficient and innovative layer of workers and future scientists. The same aim is behind the OLPC initiative, which has the goal to promote children to become an “educated and empowered resource” for countries whose “governments struggle to compete in a rapidly evolving, global information economy” [2]. While the example of English reform politics is situated around the time when the museum acquired its modern form as a public institution the idea did not originate in this time. The German philosopher and mathematician Gottfried Wilhelm Leibniz formulated in 1669/70 the plan for an academy of the sciences and art, the theatrum naturae et artis, a plan which he promoted several times to different political leaders in Russia, Austria, France, and Germany. The idea was a combination of archive, museum, theatre and forum, open to all people to come together and admire new inventions and participate in discussions and various kinds of entertainment. A particular role was attributed to the collection of tangible objects that conveyed the matters and results of sciences to the visual and tactile senses and thus provided a basis for the “reform of economy, education, and the arts and crafts” [3].
2 Virtual Objects of Knowledge All these examples do not only share a very similar motivation, they also have in common that they favor the practical and manifest interaction with objects as a suitable learning approach for a wide range of people who do not share the same educational background. In the historic examples it may be rather obvious that collections of objects of scientific enquiry are chosen as the vehicle to bring these sciences, their procedures and results to a mass of largely uneducated people because this was the state of the art of the sciences at that time. The value of these tangible objects that speak to all senses is also confirmed by recent studies about the value of sensory stimulation for the development of the brain. “The brain uses the outside world to shape itself and to hone such crucial powers as vision, reasoning, and language. Not hard wiring but continual interaction with the external environments is now thought to produce even the most abstract kinds of cognition.” [4]
This turn towards the object and its sensual stimulation was not only a form of Enlightenment entertainment in which "bewitching arrangements of colorful rough stuff […] piqued the curiosity of the public" [4]; it was part of a general turn towards objectivity as a way to decipher and understand nature and the structure of the world. In his Critique of Pure Reason (1781/1787) Immanuel Kant places the human capacity to be affected by objects as a necessary precondition for any valid statements about the world. With the distinction between subjective opinion and objectively valid conviction he offers a paradigm that has influenced most modern philosophical discussion of the objectivity of mind. Operating with the term communicability, Kant justifies objectivity "on the grounds that if a judgment can be communicated to other rational beings, there is a solid (though not infallible) presumption that they are talking, and talking accurately, about the same object." [5]

How can this immediacy of the encounter with tangible objects, which is characteristic of collections and museums, be translated into a digitally mediated context? It seems particularly valuable for the context outlined above, where education has to deal with significant cultural differences, to turn to this tangible immediacy to convey the desired information. Thus it is crucial to find an efficient translation of the real-world object encounter. Several studies have been conducted on the use of Virtual Reality environments for educational purposes, favoring their potential for high interactivity and a high degree of realism. Virtual Reality environments give the learner the possibility to explore and manipulate three-dimensional spaces displayed on a computer. As Michitaka Hirose points out, the most important contribution of this technology is "to visualize various objects that are difficult to understand intuitively" [6]. This evaluation goes along with several other studies finding VR environments capable of making "what is abstract and intangible to become concrete and manipulable" [7]. In combination with smaller, portable and more affordable computer hardware, the use of digital VR technology seems to be on track to move away from the costly and cumbersome hardware that was formerly necessary to implement a convincing experience. Classically, VR environments operated with expensive immersive display technologies that were only suitable for a lab environment and that, due to their focus on an individual-centered perspective, inhibited communication and interaction with other people and the immediate environment. Less intrusive devices and greater mobility make it possible to bring back into the experience some of the social aspects that make learning and knowledge exchange effective and pleasurable [8]. These recent developments and applications suggest that VR technology provides possibilities to implement aspects of the immediacy and communicative value that were attributed to the real-world object in the earlier examples. At the same time it becomes conceivable to use VR approaches also for distance learning projects and virtual classrooms in technologically less developed areas, thanks to affordable hardware and solid technology.
3 Translation Artifacts For our project about the Museum of Natural History in Berlin we decided to use a virtual space in which selected objects of the museum are represented. The viewer can explore this space according to the museum geography as well as according to an
alternative geography based on contextual connections. The implementation was done using a 3D game engine suitable for fast and robust development. We found, though, that the particular quality of the rich and textured objects and of the museum space itself was not conveyed in the virtual environment. The tangible reality effect of the encounter with real objects was impossible to achieve with a pure CG approach using computer-generated models of the objects and real-time rendering. The computer graphics are not in a position to transmit the pungent feeling that the object on display actually is a real animal, perhaps a sample of a species that used to live on earth and is now extinct. The particular power of the realizations arising from this reality encounter, and its pedagogic value, was not communicable in the VR environment. Geoffrey C. Bowker points in his book Memory Practices in the Sciences to the inherent difference between the two devices, the museum collection and the computer-based collection, as two different memory regimes. We can either be "acting as archives commissioners or conjuring the world into a form that can be represented in a universal Turing machine whose past has been evacuated in order to render its future completely controllable. Integrally associated with each are two symbolic realms: memorializing difference and secular time through classification and hermeneutics, or memorializing sameness and circular time through abstraction and analysis" [9]. Bowker sharpens our understanding of how the encoding of information into a particular memory practice shapes the information that is being encoded and produces distortions and translation artifacts. We perceive the computer-generated images as the idealized result of an abstraction, as the result of a complicated but nevertheless formulaic description, rather than as individual real objects of which only this one singular entity exists. Despite the qualities of the VR environment stated above, this particular aspect of individuality and historicity of the presented objects was missing.
4 Mixed Reality Environment

In order to preserve the quality of the object representations, which we considered very important for our application, the decision was to create a mixed reality environment using a combination of computer-generated space and photography. With a motion control camera we took a series of photographs of the objects from all perspectives in 10-degree steps and texture-mapped these images onto planes in the virtual space, using a technique similar to QuickTime VR objects. The effect of realness is enhanced by distributing photographically derived images throughout the virtual space in such a way that each image corresponds to a particular point of view and lines up with the architectural geometry of the virtual museum building. The posing data were gathered with the help of laser distance measurement and inclination measurement using accelerometers attached to the camera. The use of photographic images allowed us to re-establish at least part of the rich and individual quality of the space and the objects. In keeping with Roland Barthes' considerations of photography, we use the fact that a photograph makes "it possible to recover and print directly the luminous rays emitted by a variously lighted object. The photograph is literally an emanation of the referent" [10]. According to Barthes, a photograph has the ability to conjure the presence of an object or person, or in our case of an animal, even when it is the image of a corpse: it is the living image of a dead thing.
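As a small illustration of this QuickTime-VR-style display (a sketch of ours, not the project's actual code), choosing which of the 36 photographs taken in 10-degree steps to map onto the image plane for a given viewing angle can be reduced to:

```cpp
#include <cmath>

// Choose one of the 36 photographs (10-degree steps) for the current
// viewing angle around the object; index 0 corresponds to 0 degrees.
int viewIndexForYaw(double yawDegrees) {
    double a = std::fmod(yawDegrees, 360.0);
    if (a < 0.0) a += 360.0;                               // wrap into [0, 360)
    return static_cast<int>(std::lround(a / 10.0)) % 36;   // 0..35
}
```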
Fig. 1. Screenshot of the space and one object of the mixed reality environment of the Venture to the Interior project
The design approach we followed in our project does not aim for photorealism; instead, we underline the fact that each photograph is just one perspective from one particular point of view inside an abstract constructed space. By navigating through the museum space, the viewer moves in and out of these vantage points and experiences an impression of a space reminiscent of cubist paintings, which combine multiple perspectives into one picture. The same principle applies to the objects, which can be seen from all sides as if they were three-dimensional, while it remains clear that each individual perspective is given by one flat image. Our motivation for this design is to heighten awareness of perspective dependency rather than to create a coherent illusory space, and at the same time to communicate the historicity and the immediacy of the objects.
5 Conclusion

The project Venture to the Interior allowed us to reflect on the issues of representing real historic objects in a computer-based simulation environment. The particular setting of the project led us to consider these issues between the two poles of the classic natural history museum as a collection of tangible objects and digital data collections communicated through electronic networks. Of special interest was the possibility of translating the didactic value that the encounter with real-world objects provides for a learning experience into a digitally mediated environment. We found that a mixed reality approach offers particular advantages for integrating the high degree of interactivity and flexibility of a virtual environment with the reality reference of photographic media. This combination makes it possible to create a learning experience that is engaging, has the advantages of the digital format and of easy and widespread distribution through electronic networks, and still communicates a feeling of
groundedness in reality. The aspect of immediacy and experiential directness offers great potential for use in communicative situations spanning vastly different educational levels and cultural backgrounds. Further experiments based on this model have to be developed to investigate this potential further.
References

1. Bennett, T.: The Birth of the Museum, p. 17. Routledge, London (1995)
2. One Laptop Per Child: Mission Statement, http://laptop.org/en/vision/mission/index2.shtml (retrieved February 28, 2009)
3. Bredekamp, H.: Leibniz' Theater der Natur und Kunst. In: Bredekamp, H., Brüning, W. (eds.) Theater der Natur und Kunst, p. 14. Henschel Verlag, Berlin (2000)
4. Stafford, B.M.: Artful Science, p. xxi. MIT Press, Cambridge (1999)
5. Daston, L., Galison, P.: Objectivity, p. 262. Zone Books, New York (2007)
6. Hirose, M.: Virtual Reality Technology and Museum Exhibit. The International Journal of Virtual Reality, 1 (2006)
7. Lee, E.A.-L., Wong, K.W.: A Review of Using Virtual Reality for Learning. In: Pan, Z., et al. (eds.) Transactions on Edutainment I. LNCS, vol. 5080, pp. 231–241. Springer, Heidelberg (2008)
8. Cheok, A.D., Yang, X., Ying, Z.Z., Billinghurst, M., Kato, H.: Touch Space: Mixed Reality Game Space Based on Ubiquitous, Tangible, and Social Computing. In: Personal and Ubiquitous Computing, vol. 6, p. 430. Springer, London (2002)
9. Bowker, G.C.: Memory Practices in the Sciences, p. 109. MIT Press, Cambridge (2005)
10. Barthes, R.: Camera Lucida, p. 80. Hill and Wang, New York (1981)
New Human-Computer Interactions Using Tangible Objects: Application on a Digital Tabletop with RFID Technology

Sébastien Kubicki1, Sophie Lepreux1, Yoann Lebrun1, Philippe Dos Santos1, Christophe Kolski1, and Jean Caelen2

1 LAMIH - UMR8530, University of Valenciennes and Hainaut-Cambrésis, Le Mont-Houy, F-59313 Valenciennes Cedex 9, France
{firstname.name}@univ-valenciennes.fr
2 Multicom, Laboratoire d'Informatique de Grenoble (LIG), BP 53, 38041 Grenoble Cedex 9, France
[email protected]
Abstract. This paper presents a new kind of interaction between users and a tabletop. The table described is interactive and associated with tangible, traceable objects using RFID technology. As a consequence, new kinds of Human-Computer Interaction involving these tangible objects become possible. The multi-agent architecture of the table is also explained, as well as a case study based on a scenario.

Keywords: Human-Computer Interaction, RFID, tabletop, tangible objects, Multi-Agent System.
1 Introduction

Tabletops are a far cry from the personal computers currently in use. With the concept of the interactive table, we can imagine a collaborative and co-located workspace allowing several users to work at the same time. Dietz and Leigh [6] propose an interactive table called "DiamondTouch". They suggest an example application in which a plumber and an electrician work together on the same table; each participant can modify only the plans associated with his or her field. Nowadays, applications and platforms which allow simultaneous collaboration between users, such as multi-finger input or the sharing of documents in real time [11], are unusual. Therefore, current research aims at exploring the possibilities of such new technologies [4]. Shen et al. [9] propose a toolkit named DiamondSpin to simplify the development of interactive applications using the DiamondTouch tactile interactive table. DiamondSpin supports the efficient prototyping of, and experimentation with, multi-user, concurrent interfaces for interactive shared displays. It allows document positioning and orientation on a tabletop surface and also supports multiple work areas within the same digital tabletop. Besacier et al. [3] propose a set of metaphors [2] in
direct relationship with the traditional use of a paper sheet on an interactive table: it becomes possible to handle a document in a virtual way (for example, to turn it over or to group a set of documents). These new interactions make it possible for several co-located users around an interactive table to work collaboratively. However, even though several applications have been proposed for tabletops, very few of them use tangible objects to interact with users. In this paper, we describe a new type of tabletop based on RFID (Radio Frequency IDentification) technology which enables users to manipulate tangible objects equipped with RFID tags (offering the possibility to store data of different types); thus, several participants around the table can interact and work collaboratively on applications using physical objects (as in design or production tasks, games, etc.). This paper details the TTT project, the multi-agent architecture of the tabletop, and a case study.
2 TTT Project

The TTT (interactive Table with Tangible and Traceable objects) project proposes an alternative vision of the way tangible objects can be used in conjunction with an interactive tabletop. For that, a new technology was implemented using RFID tags stuck on objects by the RFIdées1 company. Four partners are involved in the TTT project: two laboratories (LAMIH2, LIG3) and two companies (CEA4, RFIdées). The magnetic table, which includes an array of RFID antennas in the form of "tiles", is connected to a computer. Figure 1(a) shows prototype v1 of the digital table; the different RFID antennas which compose the table, delimited by the black line, can be distinguished. The prototype is composed of so-called tiles (Fig. 1(b)), each containing 64 antennas (8 x 8) per 2.5 cm². Each tile contains a DSP processor which reads the RFID antennas, an antenna multiplexer, and a communication processor. The reading strategies are prioritized, and the code is distributed between the antenna-reading processor, the processor in charge of multiplexing, and the host computer. The table measures about one meter square and contains 25 tiles (5 x 5), or 1600 antennas in total. The tiles are connected to each other via a control interface linked to the host computer by an Ethernet bus. At this time, prototype v3 can communicate with all the layers of the structure; we explain this layered structure in the next section. The delay between two displacements is acceptable (we are able to play a marbles-like game with some RFID tags), but it could be further improved in prototype v4.

2.1 Structure and Communication

An architecture including three layers has been adopted for the table (Fig. 2):

1. The Capture and Interface layer handles tangible objects provided with one or more tags per object and creates a Java object by associating it with a form.
1 www.rfidees.fr
2 www.univ-valenciennes.fr/LAMIH
3 www.liglab.fr
4 www.cea.fr
Fig. 1. (a) Prototype v1 of the table and (b) a tile containing 8x8 antennas (prototype v3)
2. The Traceability layer handles events associated with the objects and communicates the modifications of object positions to the applicative layer.
3. The Application layer manages the specificities of the application associated with the table.

Figure 2 also shows the data flows between the layers. Data can only move from one layer into the adjacent one and must pass through an applicative interface. This interface serves as the connection between the layers and defines the exit and entrance points; it is via this interface that two layers are linked and are able to communicate. The applicative layer is broken up into two parts:

• The part integrating the Multi-Agent System (MAS), whose supervisor agent, called Genius, has a total vision of the virtual and physical (tangible) objects and knows all the characteristics of the application (the role of each object, the rules of the game, and so on).
• The Human-Computer Interaction (HCI) part, which is responsible for communicating with the users and makes it possible to transmit virtual information (for example, the displacement of a virtual object by the user).

2.2 The Multi-Agent System

The structure of a Multi-Agent System can be described in multiple ways according to the context of the application concerned. The main concepts used are the following:

• Tangible agents (represented by physical objects such as a book, a counter, or a mobile phone),
• Virtual agents (displayed if they represent a digital object, such as a colored zone),
• The users (people interacting with the application).

It is thus possible to identify the various possible strong interactions (Fig. 4).
Fig. 2. The three layers composing the TTT Project
Fig. 3. An example of tactile display (a) and inclusion of RFID tags on a glove (b)
A strong interaction is an interaction which results from the dependence of one element on another. In this case we find the different interactions between the tangible or virtual agents, and between the users and the tangible or virtual agents. As a consequence, we can deduce, for example, that a tangible agent can act on a virtual agent's location but not the opposite: a tangible object can modify a virtual object's location, whereas the reverse goes against physical laws and is not, in our case, possible. With this interactive table it will be possible to interact with both tangible and virtual objects, but for virtual objects a tactile technology must be available. For that, two solutions are possible:

1. The next prototype can include a tactile display (Fig. 3) and permit direct interaction with the fingers.
2. We can adopt the solution of an interactive glove with RFID tags to simulate a tactile technology.

That is why we distinguish the possible interactions between users and agents (Fig. 4). When the tactile technology is not implemented, the user can only interact with tangible objects, so only the tangible agents associated with each object are involved. If the tactile technology is available, the user can interact with both the tangible and the virtual objects. The Genius agent is a software entity able to answer any question from the users about the objects' locations and roles. This agent is the central point of the Multi-Agent System; the agents of the application interact with it to announce or modify their location. The proposed multi-agent organization follows a hierarchical structure with flexible levels. At the top of this hierarchy, the Genius agent can be considered an observer agent (or even a coordinator, depending on the application) which establishes the link between the users and the agents. On the last level, the located agents (tangible agents dependent on a physical object) provide a reflection of the tangible objects present on the interactive table. Each object is associated with an agent which holds the characteristics of this object, such as its role, location, and environment. These agents report their internal modifications to the Genius, which knows the general map of the table.
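As a rough illustration of this organization, the sketch below shows how located agents could report position changes to the Genius. The class and method names are hypothetical and do not come from the TTT implementation; it is a minimal sketch of the idea only.

// Minimal sketch (hypothetical names, not the TTT source code): located agents
// mirror tangible objects and report their internal modifications to the
// Genius, which keeps the general map of the table.
import java.util.HashMap;
import java.util.Map;

class Location {
    final int x, y;                       // antenna-grid coordinates
    Location(int x, int y) { this.x = x; this.y = y; }
}

class LocatedAgent {
    final String objectId;                // RFID tag identifier
    String role;                          // e.g. "switch", "counter"
    Location location;
    private final Genius genius;

    LocatedAgent(String objectId, String role, Genius genius) {
        this.objectId = objectId;
        this.role = role;
        this.genius = genius;
    }

    /** Called by the Traceability layer when the tagged object is moved. */
    void onMoved(Location newLocation) {
        this.location = newLocation;
        genius.notifyLocationChanged(this); // report the modification upward
    }
}

class Genius {
    // General map of the table: which agent currently sits where.
    private final Map<String, LocatedAgent> agents = new HashMap<>();

    void register(LocatedAgent agent) { agents.put(agent.objectId, agent); }

    void notifyLocationChanged(LocatedAgent agent) {
        // Application-specific rules (roles, game rules, ...) would be evaluated here.
        System.out.println(agent.role + " " + agent.objectId
                + " moved to (" + agent.location.x + "," + agent.location.y + ")");
    }
}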
Fig. 4. The possible interactions between agents and users
2.3 Interactions Between the Table and Users

Initially, we have to agree that all the virtual objects must be defined when the application is designed. All these objects are initialized when the application starts. After that, the user can interact with these virtual objects to move them, for example with a finger or with an object equipped with an RFID tag. The first stage to develop consists in initializing the virtual and tangible objects which will be used in the application; that is why we distinguish two types of users:

• The end user of the application does not need particular knowledge to use it. He or she can be of any age and knowledgeable in computing or not; he or she just has to know the rules of the game or the principle of the application.
• The user known as the administrator has complete knowledge of the application and can intervene in it (internal modification). However, one can suppose that an end user could take the role of administrator for some operations. The Human-Computer Interface will therefore have to be easy to use.

We have presented the layered structure defined during the TTT project. The Capture and Interface layer manages the interactions on the table; it transmits information to the Traceability layer, which creates the history of the different objects. The Traceability layer then transmits information to the applicative layer, more exactly to the Multi-Agent System layer, which gives a role to each object and informs the Human-Computer Interaction layer so that it can display the result to the users; a sketch of this information flow is given below. To illustrate an application using the table, we propose an example which uses the entire layered structure and shows the possible interactions with users.
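The following minimal sketch shows how this chain of layers could be wired together. The interfaces and method names are assumptions made for illustration, not the project's actual API.

// Minimal sketch (assumed interfaces): the Capture and Interface layer detects
// a tag, the Traceability layer records the event history, and the applicative
// layer (the MAS, then the HCI) assigns a role and updates the display.
import java.util.ArrayList;
import java.util.List;

interface ApplicationLayer {
    void onObjectMoved(String tagId, int x, int y);   // entry point of the MAS part
}

class TraceabilityLayer {
    private final List<String> history = new ArrayList<>();
    private final ApplicationLayer application;

    TraceabilityLayer(ApplicationLayer application) { this.application = application; }

    void objectDetected(String tagId, int x, int y) {
        history.add(tagId + "@" + x + "," + y);       // keep the object's trace
        application.onObjectMoved(tagId, x, y);       // forward to the applicative layer
    }
}

class CaptureAndInterfaceLayer {
    private final TraceabilityLayer traceability;

    CaptureAndInterfaceLayer(TraceabilityLayer traceability) { this.traceability = traceability; }

    /** Called whenever an antenna tile reads an RFID tag at grid cell (x, y). */
    void antennaRead(String tagId, int x, int y) {
        traceability.objectDetected(tagId, x, y);
    }
}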
3 Case Study and Modeling

We propose an example application which illustrates the structure presented above. After a presentation of the application, called Luminous Zone, we present a scenario using it. The scenario emphasizes the use of Luminous Zone, but we explain in particular the first stage, which consists in initializing the different objects: before using the application with some objects, the user must initialize all the objects which are to be used. Two sequence diagrams (Fig. 5 and 6), modeled with UML2, show the communication between the layers.

3.1 Example

In order to illustrate the architecture used (Fig. 2), we propose an example in which the table has to illuminate a zone specified beforehand according to the location of a switch object (a tangible object initialized previously). In this example there are several objects, each with a role. The tangible objects could be anything (a pen, a counter, or a book, for example). If the user places a tangible object in one of the (projected) virtual colored zones, the lighting zone (LEDs included in the table) lights up in the color of the zone in which the tangible object (the switch) is located. It is possible to use several switches; for example, the colored zone could be divided according to the number of users (say one side per user, so four users at most).
Fig. 5. A scenario using the table with Luminous Zone application
3.2 Scenario

We propose a sequence diagram modeled with UML2 (Fig. 5) to illustrate a scenario using the table with the Luminous Zone application. The user moves an object having the role "switch". The displacement is detected by the Capture and Interface layer, which transmits the information to the Traceability layer. The latter sends the new location of the object to the Genius. In the MAS layer, each object (tangible or virtual) is represented by an agent. Each agent questions its local environment in order to know its location relative to the other objects. Here, the agent associated with the "switch" object checks whether it is placed in a colored zone; if so, this agent transmits to the Genius the need to light one of the luminous zones. After reception of the data from the MAS layer, the HCI layer assigns a color to the luminous zone and lights it.
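The core of this scenario is the zone test performed by the switch agent. A minimal sketch of that check is given below; the classes are hypothetical and zone footprints are assumed to be rectangles in the antenna grid.

// Minimal sketch (hypothetical classes): the agent associated with the
// "switch" object checks whether the object lies inside one of the projected
// colored zones.
import java.awt.Rectangle;
import java.util.List;

class ColoredZone {
    final String color;
    final Rectangle area;           // zone footprint in antenna-grid coordinates
    ColoredZone(String color, Rectangle area) { this.color = color; this.area = area; }
}

class SwitchAgent {
    /** Returns the zone containing the switch, or null if it is outside all zones. */
    static ColoredZone zoneUnder(int x, int y, List<ColoredZone> zones) {
        for (ColoredZone zone : zones) {
            if (zone.area.contains(x, y)) {
                return zone;
            }
        }
        return null;
    }
}

When the Traceability layer reports a new position, the agent would call zoneUnder and, if the result is not null, ask the Genius to have the HCI layer light that zone in the corresponding color.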
Fig. 6. Initialization of a new object and association of it with a role and behavior
To use the application, all the objects must first be initialized. For that, a user interface is necessary; it has to offer the possibility to name an object and to define its roles and behaviors. The sequence diagram in Fig. 6 shows these aspects.
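A minimal sketch of such an initialization step is shown below. The registry class, its fields, and the example values are assumptions for illustration, not the interface used in the project.

// Minimal sketch (hypothetical API) of the initialization step shown in Fig. 6:
// the administrator names a detected tag and assigns it a role and a behavior
// before the application starts.
import java.util.HashMap;
import java.util.Map;

class ObjectRegistry {
    static final class Descriptor {
        final String name, role, behavior;
        Descriptor(String name, String role, String behavior) {
            this.name = name; this.role = role; this.behavior = behavior;
        }
    }

    private final Map<String, Descriptor> byTag = new HashMap<>();

    /** Invoked from the administrator's user interface. */
    void initialize(String tagId, String name, String role, String behavior) {
        byTag.put(tagId, new Descriptor(name, role, behavior));
    }

    Descriptor lookup(String tagId) { return byTag.get(tagId); }
}

An administrator could then call, for instance, registry.initialize("tag-42", "red counter", "switch", "toggle-zone") before launching the Luminous Zone application (the tag identifier and values here are purely illustrative).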
4 Conclusion

The table has the original characteristic of interacting directly with users and tangible objects. This way of working is different from the interactive tables currently available, and it opens a new line of research in HCI as well as in MAS. The association between an interactive table and a Multi-Agent System is original and brings promising possibilities [1]. HCI will be used for direct interaction with the users, allowing simple and intuitive use of the applications of the table. It will lead to innovations in terms of HCI in the use of an interactive table which manages tangible and traceable objects; examples include the modification of context [5], [7] during the initialization of a tangible object (detection of the user, loading of personal parameters, and so on) and the adaptation to the context [10], [8] (mono- or multi-user use, modification of the environment, and so on). Our objective is now to apply this research and use the different UML diagrams to develop a set of applications using the table and its specificities. At this time, the first application, Luminous Zone, is under test. A new version of the table is under development by the RFIdées company.
Acknowledgements

The present research work is supported by the "Agence Nationale de la Recherche" (ANR). We would also like to thank our two partner companies in the TTT project: RFIdées and the CEA.
References

1. Adam, E., Mandiau, R.: Flexible roles in a holonic multi-agent system. In: Mařík, V., Vyatkin, V., Colombo, A.W. (eds.) HoloMAS 2007. LNCS, vol. 4659, pp. 59–70. Springer, Heidelberg (2007)
2. Agarawala, A., Balakrishnan, R.: Keepin' it real: pushing the desktop metaphor with physics, piles and the pen. In: Proc. CHI, pp. 1283–1292 (2006)
3. Besacier, G., Rey, G., Najm, M., Buisine, S., Vernier, F.: Paper metaphor for tabletop interaction design. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4551, pp. 758–767. Springer, Heidelberg (2007)
4. Couture, N., Rivière, G., Reuter, P.: GeoTUI: A Tangible User Interface for Geoscience. In: Proceedings of the Second ACM International Conference on Tangible and Embedded Interaction, Bonn, Germany, pp. 89–96 (2008)
5. Dey, A.K., Salber, D., Futakawa, M., Abowd, G.D.: An architecture to support context-aware applications. GVU Technical Reports
6. Dietz, P., Leigh, D.: DiamondTouch: A multi-user touch technology. In: UIST 2001: Proceedings of the 14th Annual ACM Symposium on User Interface Software and Technology, pp. 219–226. ACM Press, Orlando (2001)
7. Hariri, M.A., Tabary, D., Lepreux, S., Kolski, C.: Context aware business adaptation toward user interface adaptation. Communications of SIWN 3, 46–52 (2008)
8. Lepreux, S., Hariri, M.A., Rouillard, J., Tabary, D., Tarby, J.-C., Kolski, C.: Towards Multimodal User Interfaces Composition based on UsiXML and MBD principles. In: Jacko, J.A. (ed.) HCI International 2007, 12th International Conference, pp. 134–143 (2007)
9. Shen, C., Vernier, F., Forlines, C., Ringel, M.: DiamondSpin: An extensible toolkit for around-the-table interaction. In: CHI 2004 International Conference on Human Factors in Computing Systems, pp. 167–174. ACM Press, New York (2004)
10. Sottet, J.-S., Calvary, G., Coutaz, J., Favre, J.-M., Vanderdonckt, J., Stanciulescu, A., Lepreux, S.: A Language Perspective on the Development of Plastic Multimodal User Interfaces. Journal of Multimodal User Interfaces 1, 1–12 (2007)
11. Wu, M., Balakrishnan, R.: Multi-finger and whole hand gestural interaction techniques for multi-user tabletop displays. In: UIST 2003: Proceedings of the 16th Annual ACM Symposium on User Interface Software and Technology, pp. 193–202. ACM, Vancouver (2003)
Context-Aware Cognitive Agent Architecture for Ambient User Interfaces∗

Youngho Lee, Choonsung Shin, and Woontack Woo

GIST U-VR Lab, Gwangju 500-712, Korea
{ylee, cshin, wwoo}@gist.ac.kr
Abstract. An ambient user interface is a set of hidden intelligent interfaces that recognize the user's presence and provide services for his or her immediate needs. There are several research activities on user interfaces and interaction which combine VR/AR, ubiquitous computing/ambient interfaces, and artificial intelligence. However, real-time and intelligent responses of user interfaces remain challenging problems. In this paper, we introduce the design of the Context-aware Cognitive Agent Architecture (CCAA) for real-time and intelligent responses of ambient user interfaces in ubiquitous virtual reality, and discuss possible scenarios for realizing ambient interfaces. CCAA applies a vertically layered two-pass agent architecture with three layers: the AR (augmented reality) layer, the CA (context-aware) layer, and the AI layer. The two passes interconnect the layers as input and output. One pass of each layer is an input path from a lower layer or from environmental sensors describing a situation; the other is an output path which delivers a set of appropriate actions based on the understanding of the situation. This architecture enables users to interact with ambient smart objects through an ambient user interface in various ways of intelligence by exploiting context and AI techniques. Based on the architecture, several possible scenarios concerning recognition problems and higher-level intelligent services for ambient interaction are suggested.

Keywords: Ambient user interface, ubiquitous virtual reality, context-awareness, augmented reality.
1 Introduction

As computing environments change, user interfaces and ways of interaction have changed radically. Computing paradigms such as ubiquitous virtual reality, ambient intelligence, and ubiquitous/pervasive computing have been proposed in order to fulfill the requirements of these new computing environments [1,2]. An ambient user interface is a set of hidden intelligent interfaces that recognize the user's presence and provide services for his or her immediate needs [3]. With an ambient user interface, users are allowed to interact with a whole environment while expecting highly intelligent responses. Many researchers expect that novel interfaces and interaction methods will appear by combining visualization/user interaction techniques (AR/VR),
∗ This research was supported by the CTI development project of KOCCA, MCST in S. Korea.
context-awareness, and artificial intelligence (AI). These efforts show that AR/VR can be combined with AI as ambient interfaces in ubiquitous computing environments. AIBAS (Adaptive Intent-Based Augmentation System) is a framework for improving tracking and display technology in order to minimize registration errors [4]; the idea is that if a system could understand the purpose of an augmentation, it could reduce the registration error. DWARF is a framework for developing augmented reality systems with a set of intelligence modules [5]; it was designed under the assumption that AR systems work in ubiquitous computing environments. AR Façade is an AR storytelling system originating from Façade, the first working storytelling system [6]. ubiAgent demonstrates several AR agent applications and suggests future research directions in AR, AI, and ubiComp [7]. ARGarden and CAMAR (Context-Aware Mobile Augmented Reality) are representative examples of research on user interfaces in ubiquitous virtual reality [8,9]. However, real-time and intelligent responses of user interfaces remain challenging problems. In this paper, we introduce the design of the Context-aware Cognitive Agent Architecture (CCAA) for real-time and intelligent responses of user interfaces, and discuss possible scenarios for realizing ambient interfaces. It applies a vertically layered two-pass agent architecture with three layers [10]: the AR (augmented reality) layer, the CA (context-aware) layer, and the AI layer. The layers are classified according to their processing complexity. While the higher layers are doing their processing, the AR layer produces output in real time, and a higher layer makes up for the weak points of a lower layer; for example, the CA layer can provide a processing result that improves recognition performance in the AR layer. This architecture enables users to interact with ambient smart objects through an ambient user interface in various ways of intelligence by exploiting context and AI techniques; it is designed to guarantee real-time and intelligent responses of the ambient interfaces. Based on the architecture, several possible scenarios concerning recognition problems and higher-level intelligent services for ambient interaction are suggested. This paper is organized as follows. In Section 2, we briefly introduce the background of ambient user interfaces in ubiquitous virtual reality. The Context-aware Cognitive Agent Architecture is presented in Section 3, and problems in ambient interfaces and possible scenarios are presented in Section 4. Conclusions and future work are discussed in Section 5.
2 Ambient User Interfaces in Ubiquitous Virtual Reality

Ubiquitous Virtual Reality (Ubiquitous VR) has been researched in order to apply the concept of virtual reality and its technology to ubiquitous computing environments [1,2]. The idea is that the limitations of virtual reality could be overcome through the new computing paradigm and, conversely, that the problems arising when realizing ubiquitous computing (ubiComp) or ambient intelligence (AmI) could be solved by conventional virtual reality technology. Kim et al. discussed how VR and ubiComp can help each other to overcome their limitations [11]. VR is still far from users in the real world and has no killer applications in our daily lives. UbiComp is a novel paradigm, and many technical problems are currently being raised, such as user interfaces, context-awareness with artificial intelligence, collaborative networking, resource sharing, and others. Those problems have been researched and discussed in the VR research field.
Lee et al. presented three key characteristics of Ubiquitous VR based on reality, context, and human activity [2]. The reality-virtuality continuum was introduced by Milgram; according to his idea, the real world is 'any environment consisting solely of real objects, and includes whatever might be observed when viewing a real-world scene either directly in person' [12]. Context is defined as 'any information that can be used to characterize the situation of an entity, where an entity can be a person, place, or physical or computational object' [13]. Context can be represented on a static-dynamic continuum: we call context static if it describes information such as a user profile, whereas context describing wisdom obtained by intelligent analysis is called dynamic. Human activity can be classified into personal, group, community, and social activity, and can be represented on a personal-social continuum. Ubiquitous VR supports human social connections with the highest-level user context (wisdom) in mixed reality; hence, Ubiquitous VR has been described as socially wise mediated reality. An ambient user interface is a set of hidden intelligent interfaces that recognize the user's presence and provide services for his or her immediate needs [3]. It assumes that the things necessary for daily life embed microprocessors and are connected over wired/wireless networks. It also assumes that user interfaces control environmental conditions and support user interaction in a natural and personal way. Hence, an ambient user interface is a user interface technology which supports natural and personalized interaction with a set of hidden intelligent interfaces. An ambient user interface in Ubiquitous VR has features that differ from other user interfaces. It has to cover real and virtual space (mixed reality). It is not easy for humans to be aware of the existence of hidden intelligent interfaces; augmented reality technology can help solve this problem [9]. It should have not only simple input/output functions but should also support more abstract levels of computation and representation. Thus, on the highest level of abstraction, the ambient user interface should understand user-centric contexts so as to produce intelligent responses [14]. Ambient user interfaces in Ubiquitous VR should help users adapt the application to their personal needs in an intuitive and personalized way without interrupting their tasks. Without considering a user's situation, applications always provide the same interfaces to all users.
3 Context-Aware Cognitive Agent Architecture for Ambient User Interfaces in Ubiquitous Virtual Reality

3.1 Requirements

In order to develop an ambient user interface in Ubiquitous VR, we need to combine augmented reality user interfaces, context-aware middleware, and artificial intelligence technology. The requirements are as follows.

1. Processing time
─ Augmented reality user interfaces should display their contents and allow user interaction in real time; this requires at least 30 f/s.
─ Acquiring, integrating, and processing context requires at least one second [18].
─ AI algorithms require enough time for reasoning and planning.
2. Complementary cooperation
─ Context-awareness is required to enhance the augmented reality user interface and its contents.
─ Artificial intelligence technology is required for effective and accurate context-awareness.
─ A user profile is required for providing personalized services.

3.2 Context-Aware Cognitive Agent Architecture for Ambient User Interfaces

A Cognitive Agent Architecture for virtual and smart environments was proposed for realizing seamless interaction in ubiquitous virtual reality [10]. It is a cognitively motivated, vertically layered two-pass agent architecture for realizing the responsiveness, reactivity, and pro-activeness of smart objects, smart environments, virtual characters, and virtual place controllers. Direct responsiveness is bounded by the time frame of visual continuity (about 40 msec). Immediate reaction is requested by a user's command and may take more than 40 msec, within a second. Pro-activity concerns scheduled events and can take any amount of time: five seconds, a minute, or a day. Two main types of tasks have to be handled in the cognitive agent architecture. Context integration is the task of deriving aggregated and abstracted information about a situation from sets of singular data, particularly sensory data and facts. Context management is the task of invoking services/actions appropriate in a context with the appropriate contextual information. The Context-aware Cognitive Agent Architecture (CCAA) is designed for real-time and intelligent responses of ambient user interfaces, based on this context-aware agent architecture in Ubiquitous VR. The three layers are the AR (augmented reality) layer, the CA (context-aware) layer, and the AI layer. This architecture enables ambient smart objects to interact with users in various ways of intelligence by exploiting context and AI techniques.
Fig. 1. Context-aware cognitive agent architecture for ambient user interfaces in ubiquitous virtual reality
The AR (augmented reality) layer is composed of several embedded sensors, such as cameras, light emission sensors, and temperature sensors, and of libraries for augmentation, such as graphic rendering, tracking, and physical simulation modules, as shown in Figure 1. It is similar to conventional augmented reality applications. The CA (context-aware) layer applies the unified context-aware application model (UCAM) [15], a representative context-aware middleware in Ubiquitous VR. The context integrator gathers data and information from embedded sensors and from the environment through the network; the context manager processes this information and provides the AR layer with the final context. The AI layer is in charge of planning and reasoning. The CCAA is designed to guarantee real-time and intelligent responses of the ambient interfaces. The three layers are classified according to their processing complexity. The two passes interconnect the layers as input and output: one pass of each layer is an input path from a lower layer or from environmental sensors describing a situation; the other is an output path which delivers a set of appropriate actions based on the understanding of the situation. While the higher layers are doing their processing, the AR layer produces output every 40 msec. A higher layer makes up for the weak points of a lower layer; for example, the CA layer can provide a processing result that improves recognition performance in the AR layer.
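A minimal sketch of this layered two-pass organization is given below. It is an illustration under the assumptions stated in the comments, not the CCAA implementation; the class names and message format are hypothetical.

// Minimal sketch (hypothetical, not the CCAA implementation) of the vertically
// layered two-pass organization: sensory input flows upward (AR -> CA -> AI)
// while each higher layer sends results back down to refine the layer below.
// The AR layer keeps rendering roughly every 40 msec regardless of how long
// the CA and AI layers take.
interface Layer {
    void passUp(String observation);      // input pass from the layer below
    void passDown(String advice);         // output pass toward the layer below
}

class ArLayer implements Layer {
    private volatile String latestAdvice = "";

    public void passUp(String observation) { /* sensor data enters here */ }
    public void passDown(String advice)    { latestAdvice = advice; }

    /** Real-time loop: one frame every ~40 msec. */
    void renderFrame() {
        // track, render, and apply whatever advice the CA layer has produced so far
        System.out.println("frame rendered, advice = " + latestAdvice);
    }
}

class CaLayer implements Layer {
    private final ArLayer below;
    CaLayer(ArLayer below) { this.below = below; }

    public void passUp(String observation) {
        // integrate context (this may take up to ~1 s), then refine the AR layer;
        // a further pass up to an AI layer for planning is omitted in this sketch
        below.passDown("restrict search using context: " + observation);
    }
    public void passDown(String advice) { /* advice from the AI layer */ }
}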
4 Possible Application Scenarios

Based on the CCAA, several possible scenarios concerning recognition problems and higher-level intelligent services for ambient user interfaces and user interaction are described here.

4.1 Recognition of a Large Number of Patterns

As a first scenario, CCAA can be applied to recognizing a large number of patterns with a natural feature tracking algorithm. Let us assume that we need to recognize the thousands of patterns of a Digilog Book [8,16] with a natural feature tracking algorithm, and that the patterns follow the order of the book's pages. We then face the problem that it is hard for a natural feature tracking algorithm to match the currently observed pattern against thousands of patterns in a database. One possible solution is that, if we know the exact pattern number or a possible set of patterns, matching becomes easier and faster. Before the first layer detects the patterns on a page, the CA layer could determine the exact page number of the pattern, or a possible set of pages, from the user's reading history. The page layout of the Digilog Book can be a clue for finding the exact page number: if we can read the page number at the bottom or top of a page directly, the computational complexity is reduced dramatically. Alternatively, if we know the current page, we can expect the next pattern to be on the previous or the next page; it is very unlikely that the next pattern lies several pages away. A sketch of this candidate-narrowing idea is given below.
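The following minimal sketch illustrates the narrowing step only; the class and method names are hypothetical, and one pattern per page is assumed for simplicity.

// Minimal sketch (hypothetical): instead of matching the camera image against
// every pattern in the book, the CA layer's estimate of the current page
// restricts matching to the neighboring pages.
import java.util.ArrayList;
import java.util.List;

class PatternIndex {
    /** Returns the pattern ids worth matching, given the page the reader is on. */
    static List<Integer> candidatePatterns(int currentPage, int totalPages) {
        List<Integer> candidates = new ArrayList<>();
        for (int page = Math.max(1, currentPage - 1);
             page <= Math.min(totalPages, currentPage + 1); page++) {
            candidates.add(page);          // one pattern per page is assumed here
        }
        return candidates;
    }
}

With this restriction, the AR layer matches against at most three patterns instead of thousands, which is what makes real-time recognition feasible in this scenario.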
4.2 Recognition of the Same Pattern with Different Meaning

CCAA can be applied to reducing the search space for recognizing objects of ambient interfaces. Let us assume that a mobile device with a camera needs to recognize a specific appliance by its company logo in order to find out its name in a smart home [17]. In a smart home there are many appliances, and some of them may be products from the same company carrying the same logo. In this case it is difficult to distinguish different appliances with the same logo. CCAA can provide a solution to this logo recognition problem. When the first layer detects a company logo, it finds several possible candidates among the appliances in the room. The second layer of CCAA acquires preliminary context from smart sensors and detects or predicts the user's position and orientation from the user's behavior pattern. The result goes back to the first layer and helps it identify the correct appliance.

4.3 Animated Characters in Ubiquitous VR

CCAA can be applied to animated AR characters which are able to perceive environmental conditions and to select their behavior autonomously. Let us assume that we need a dancing AR robot in our ambient user interface. We expect it to listen to the music currently playing in a room and to find a path that avoids obstacles while dancing in real time. The AR layer superimposes a robot character on the real world: it places the robot at a proper position by analyzing the coordinate system extracted from video images, and it plays animation sequences using animation libraries. The second layer, the CA layer, receives the music signal from a microphone and analyzes it to obtain the tempo of the music; it then selects a proper animation sequence according to the music. The third layer performs a planning process based on the robot's personality and goal. The results can be translated into values about personality or emotional state, and these can influence the behavior selection mechanism of the dancing robot. Hence, the AR robot in an ambient user interface dances according to the music tempo and its personality. Figure 2 illustrates how the dancing robot shows intelligent responses while being aware of context in real time.
Fig. 2. Dancing robot scenario with CCAA
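As a small illustration of the CA layer's role in this last scenario, the sketch below maps an estimated music tempo to an animation sequence. The thresholds and sequence names are invented for illustration and do not come from the paper.

// Minimal sketch (hypothetical): the estimated tempo selects an animation
// sequence, which the AR layer then plays while keeping the character
// registered to the scene.
class DanceController {
    /** Maps a tempo in beats per minute to the name of an animation sequence. */
    static String selectAnimation(double bpm) {
        if (bpm < 90)  return "slow_sway";
        if (bpm < 130) return "step_dance";
        return "fast_spin";
    }
}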
5 Conclusion and Future Work

In this paper, we introduced our design of the Context-aware Cognitive Agent Architecture and discussed possible scenarios for realizing ambient user interfaces in ubiquitous virtual reality. An ambient user interface in ubiquitous virtual reality has to visualize hidden interfaces in the environment and to be aware of the user's and the environment's context; it also has to produce high-level intelligent responses. With the possible scenarios presented above, we expect that the Context-aware Cognitive Agent Architecture will be applied to ambient user interfaces in future computing environments. For future work, we plan to realize this architecture with concrete techniques and application scenarios in ubiquitous virtual reality.
References

1. Lee, Y., Oh, S., Shin, C., Woo, W.: Recent Trends in Ubiquitous Virtual Reality. In: International Symposium on Ubiquitous Virtual Reality, pp. 33–36 (2008)
2. Lee, Y., Oh, S., Shin, C., Woo, W.: Ubiquitous Virtual Reality and Its Key Dimension. In: International Workshop on Ubiquitous Virtual Reality, pp. 5–8 (2009)
3. Horvath, J.: Telepolis, Making Friends with Big Brother, http://www.heise.de/tp/r4/artikel/12/12112/1.html
4. MacIntyre, B., Coelho, E., Julier, S.: Estimating and Adapting to Registration Errors in Augmented Reality Systems. In: IEEE Virtual Reality 2002, Orlando, Florida, March 24-28, pp. 73–80 (2002)
5. MacWilliams, A., Sandor, C., Wagner, M., Bauer, M., Klinker, G., Bruegge, B.: Herding Sheep: Live System Development for Distributed Augmented Reality. In: 2nd IEEE/ACM International Symposium on Mixed and Augmented Reality, October 07-10, 2003, p. 123. IEEE Computer Society, Washington (2003)
6. Dow, S., Mehta, M., Lausier, A., MacIntyre, B., Mateas, M.: Initial lessons from AR Façade, an interactive augmented reality drama. In: ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, June 14-16 (2006)
7. Barakonyi, I., Psik, T., Schmalstieg, D.: Agents That Talk And Hit Back: Animated Agents in Augmented Reality. In: ISMAR, pp. 141–150 (2004)
8. Oh, S., Woo, W.: ARGarden: Augmented Edutainment System with a Learning Companion. In: Pan, Z., Cheok, D.A.D., Müller, W., El Rhalibi, A. (eds.) Transactions on Edutainment I. LNCS, vol. 5080, pp. 40–50. Springer, Heidelberg (2008)
9. Oh, S., Woo, W.: CAMAR: Context-aware Mobile Augmented Reality in Smart Space. In: International Workshop on Ubiquitous Virtual Reality, pp. 48–51 (2009)
10. Lee, Y., Schmidtke, H., Woo, W.: Realizing Seamless Interaction: a Cognitive Agent Architecture for Virtual and Smart Environments. In: International Symposium on Ubiquitous Virtual Reality, pp. 5–6 (2007)
11. Kim, S., Lee, Y., Woo, W.: How to Realize Ubiquitous VR? In: Pervasive: TSI Workshop, pp. 493–504 (2006)
12. Milgram, P., Kishino, F.: A Taxonomy of Mixed Reality Visual Displays. IEICE Transactions on Information Systems E77-D (1994)
13. Abowd, G.D., Dey, A.K., Brown, P.J., Davies, N., Smith, M., Steggles, P.: Towards a Better Understanding of Context and Context-Awareness. In: Gellersen, H.-W. (ed.) HUC 1999. LNCS, vol. 1707, pp. 304–307. Springer, Heidelberg (1999)
14. Jang, S., Woo, W.: Unified context representing user-centric context: Who, where, when, what, how and why. In: International Workshop on ubiPCMM 2005, CEUR Workshop Proceedings, pp. 26–34 (2005)
15. Oh, Y., Woo, W.: A Unified Application Service Model for ubiHome by Exploiting Intelligent Context-Awareness. In: Murakami, H., Nakashima, H., Tokuda, H., Yasumura, M. (eds.) UCS 2004. LNCS, vol. 3598, pp. 192–202. Springer, Heidelberg (2005)
16. Lee, Y., Ha, T., Lee, H., Kim, K., Woo, W.: Digilog Book: Convergence of Analog Book and Digital Content. KIPS 14, 186–189 (2007)
17. Yoon, H., Woo, W.: Design and implementation of a universal appliance controller based on selective interaction modes. IEEE Transactions on Consumer Electronics 54, 1722–1729 (2008)
18. Oh, Y., Schmidt, A., Woo, W.: Designing, Developing, and Evaluating Context-Aware Systems. In: MUE 2007, pp. 1158–1163. IEEE Computer Society, Los Alamitos (2007)
An Embodied Approach for Engaged Interaction in Ubiquitous Computing

Mark O. Millard1 and Firat Soylu2

1 School of Library & Information Science, Indiana University
2 Department of Cognitive Science, Indiana University
220 North Rose, Bloomington, IN
{mmillard,fsoylu}@indiana.edu
Abstract. A particular vision of ubiquitous computing is offered to contribute to the burgeoning, dominant interaction paradigm in human-computer interaction (HCI). An engaged vision of ubiquitous computing (UbiComp) can take advantage of natural human abilities and tendencies for interaction. The HCI literature is reviewed to provide a brief overview of promising interaction styles and paradigms in order to situate them within ubiquitous computing. Embodied interaction is introduced as a key theoretical framework for moving UbiComp forward as an engaged interaction paradigm. Keywords: ubiquitous computing, HCI, embodied interaction, tangible interaction.
1 Introduction

Human-computer interaction (HCI) is a diverse, multidisciplinary research and practice domain. The multidisciplinary history of HCI is, in part, demonstrated by the various monikers (e.g., human factors, ergonomics) it has shared throughout its short history, and also by the sheer number of academic disciplines that contribute to and concern themselves with HCI problems. Preece et al. [1] note the multidisciplinary nature of HCI in a diagram that represents over 20 different components, including academic disciplines, other interdisciplinary fields, and design practices that have a significant relationship with the HCI domain. This group of disciplinary contributors continues to grow as the issues of UbiComp continue to be uncovered. [2] discusses the "three faces of HCI," acknowledging several broad fields of historical influence, from industrial engineering to cognitive psychology. The extensive multidisciplinary nature of HCI highlights its historical character and resiliency, but also the paradigm challenges it faces. An analysis of the literature indicates that HCI is in the midst of a significant paradigm shift that is causing a fundamental change in the problems, theories, methods and models needed within HCI, which leads us to the goal of this paper. This paper is not intended to propose a solution to any single problem, but rather to illustrate a particular view of interaction within ubiquitous computing in order to raise questions that clarify Weiser's [3] vision of calm computing. In order to understand the potential trajectory of HCI, this paper intends to clarify the scholarly discourse within HCI by providing an analysis and review of the literature
surrounding direct manipulation, multi-touch interaction (MTI), tangible user interfaces (TUI), and ubiquitous computing (UbiComp). In addition, the paper will attempt to situate these HCI concepts within the theoretical framework of embodiment / embodied interaction. Embodied interaction is the theoretical foundation that connects each of these seemingly disparate HCI concepts and provides a foundation for an ‘engaged’ vision of UbiComp.
2 Theoretical Foundations: Embodied Interaction

For most of HCI's history it has been approached theoretically from a cognitivist / information-processing perspective. Practically, this illustrates a disconnected approach, framing HCI as the "human using an application or tool, or as a communicative process between human and machine" [4]. The cognitivist perspective overlooks, as Heidegger posits, the sense of 'being' and the ways that we encounter the world and act through it to define meaning and action, a distinction further made with ready-to-hand and present-at-hand [5]. These are necessary theoretical elements that facilitate a more complete understanding of human interaction in the world. This paper argues that this perspective is necessary when conceptualizing interaction in ubiquitous computing. The notion of embodiment is not a new one, but has been a common theme throughout philosophy and cognitive science, and it is at the center of a branch of philosophy called phenomenology, which is principally concerned with the elements of human experience, including questions of ontology and epistemology [5]. From this perspective, humans are seen as embodied beings with sensory, perceptual, and cognitive abilities that are "inextricably tied to our physical being, and the need for that being to function as it is situated in a dynamic, time-pressured world" [4]. Dourish [5] explicitly extends embodiment to HCI through his concept of 'embodied interaction', noting that "we encounter, interpret, and sustain meaning through our embodied interactions with the world and with each other." He continues by explicitly linking tangible computing and social computing with embodied interaction, highlighting that tangible computing allows us to create and communicate our meaning and actions and encourages users to "explore, adopt, adapt and incorporate interactive technology" into our everyday worlds [5]. But why choose embodiment as a key theoretical foundation for conceptualizing UbiComp interaction? One particularly good reason is the existence of research from other disciplines that supports the underlying principles of embodiment for communicating, learning, and socializing. For HCI it makes sense to design technologies, interfaces, and interaction that mimic what humans are already naturally good at doing, and the ways that they naturally interact. For example, research in educational psychology supports the idea of embodiment as a foundation for natural and effective human interaction, learning, and development. Klemmer et al. [6] provide an analysis of educational theorists such as Piaget, Montessori, and others who discuss the importance of thinking by doing and learning by doing, as well as the role of gesture in communication. Much of this research highlights the importance that one's "direct physical interaction with the world is a key constituting factor of cognitive development during childhood" [6]. In relation to the current dominant GUI paradigm, Klemmer et al. [6] even go so far as to state that primarily interacting through a keyboard and mouse can have a potentially hindering effect on a user's ability to think, to be creative, and to communicate, because the use of bodily
movement and gesture is so important in these processes. This signals a potential strength of embodied interaction, namely its recognition that space is not simply a "container for our actions" but a "setting within which we act" [12], and it signifies how we encounter the world and act through it to define meaning and action. Another unique aspect of embodiment is that it tightly links the physical and social worlds in which we act. In fact, Dourish [5] contends that "embodiment is the common way in which we encounter physical and social reality in the every day world", noting that humans are "enmeshed in a world of physical facts" and that our daily experiences are also extensively social, as "we interact daily with other people and we live in a world that is socially constructed" [5]. Although extremely important, a detailed discussion of the social aspects of UbiComp is beyond the scope of this paper. However, there has been a significant amount of research and discussion regarding the importance of the social aspects of computing and interaction within UbiComp [3, 5, 7, 8, 9, 11], and others have studied the importance of gesture in social and communicative processes [6, 12]. This section has discussed the theoretical foundation from which ubiquitous computing should move forward. Although the application of embodiment and embodied interaction is quite new, it holds great promise for HCI as it shifts toward ubiquitous computing and begins to reconceptualize interaction. The paper will now focus on ubiquitous computing and the developing HCI interaction styles and techniques that will most effectively align with UbiComp and the theory of embodied interaction.
3 Ubiquitous Computing: Envisioning the Next Paradigm

Ubiquitous computing (UbiComp) is a fundamental shift in interaction and computing. Although this nascent term is sometimes used in too broad a context, much of the early work on UbiComp assumes Weiser's vision, which conceives "a new way of thinking about computers, one that takes into account the human world and allows the computers themselves to vanish into the background" [13]. Recently, there has been a growing debate about the direction in which to proceed with Weiser's vision of 'calm computing'. In response, this paper proceeds with a view based on Rogers' [10] conceptualization that engages users. Rogers proposes an alternative agenda "which focuses on designing UbiComp technologies for engaging user experiences. It argues for a significant shift from proactive computing to proactive people; where UbiComp technologies are designed not to do things for people but to engage them more actively in what they currently do. Rather than calm living it promotes engaged living… Furthermore, it argues that people rather than computers should take the initiative to be constructive, creative and, ultimately, in control of their interactions with the world – in novel and extensive ways" [9]. Since UbiComp is still a developing paradigm, it is referred to in the literature in different ways (e.g., pervasive computing, context-aware, wearable computing, calm computing, third wave, post-WIMP, post-cognitivist), and the categorization of the various UbiComp models, theories and interaction styles is still being debated within HCI. Nevertheless, HCI researchers are beginning to understand the shift in interaction, from a dominant cognitive/information-processing view to a more embodied and socially oriented view [5, 8, 9, 12]. This paradigm shift demonstrates a fundamental
conceptual change in the role of interaction in HCI. This paradigm shift is well supported in the literature, yet there is extensive scholarly discourse about what form this new wave of interaction should take. Since the debates are still in progress, it is difficult to predict exactly where they will lead UbiComp, but this paper claims that embodied interaction has the ability to situate many of the most promising UbiComp interaction styles and techniques, such as tangible interaction and multi-touch interaction, among others. There are so many potentially interesting new interaction and interface styles, methods and techniques currently being researched, including haptics, ambient sound, natural language, gesture-based, multi-touch, and tangible interaction, that it is difficult to limit our discussion. However, we have focused our discussion on three particularly interesting interaction styles that naturally make use of the principles of embodied interaction and can drive the quality of UbiComp interaction: direct manipulation, tangible user interfaces (TUI), and multi-touch interaction. It is important to remember our discussion, thus far, regarding UbiComp and the theoretical framework of embodied interaction, as these are the important foundations from which interaction will be understood.

3.1 Direct Manipulation

The direct manipulation interaction style began in the late 1960s and 1970s in research labs at Stanford, MIT, Maryland and Xerox PARC [16]. It was the pairing of the mouse, the graphical user interface (GUI), software, multiple tiled windows, and eventually hypertext that contributed to and sustained the dominance of the direct manipulation interaction style for nearly three decades [15, 16]. Ben Shneiderman brought organization to the concept by describing direct manipulation interfaces as having the following underlying characteristics: 1) continuous representation/visibility of the objects and actions of interest, 2) physical actions or presses of labeled buttons instead of complex syntax, and 3) rapid, incremental, reversible operations whose effect on the object of interest is immediately visible [18]. The popularity and success of this interaction style is attributed to its learnability, ease of use, and the availability of immediate feedback [1, 18]. Direct manipulation systems also "tend to have a similar look and feel, and empower users' sense of control in executing and evaluating tasks" [1]. Regardless of the strengths and popularity of direct manipulation, the models, metaphors and interfaces associated with this style do have drawbacks. For example, as direct manipulation interfaces become more complex, the metaphors can begin to confuse the user [1]. Other HCI researchers also echo the complexity issues by suggesting that the increasing complexity of individual direct manipulation interfaces aggregates to create more complexity for the user [15]. The complexity issue appears to be a key shortcoming in the literature on direct manipulation. This is where new interaction styles such as tangible interaction and multi-touch interaction may be able to alleviate the "complexity problems", by releasing users from the confines of their desktop computer, mouse and keyboard, and "coupling digital information to everyday physical objects and environments" [36]. This paper argues that we should not immediately abandon direct manipulation.
Perhaps these "complexity problems" are the fault of the GUI and WIMP interaction styles and the limited desktop computing metaphors that are based in the cognitive/information-processing perspective. Nearly twenty-five years ago, Shneiderman [18] noted that direct manipulation systems "offer
the satisfying experience of operation on visible objects. The computer becomes transparent, and users can concentrate on their task". This helps to illustrate that the GUI and WIMP interaction style may be the limiting factor, not the underlying characteristics of direct manipulation. It is possible that we may look to direct manipulation to provide an already proven foundation for new interaction styles in ubiquitous computing. As we will see in the next section, although not explicitly, tangible interaction holds many similarities by coupling physical objects with digital artifacts for directly acting on (i.e., manipulating) the world; this is direct manipulation in its purest form. It is apparent that the current dominant interaction style (i.e., direct manipulation) faces many problems as we move into the complexity of third wave computing, yet its strengths, familiarity, and popularity provide it an opportunity to be re-examined before being cast aside. This paper argues that the foundational characteristics of direct manipulation can provide a basis for new interaction styles such as tangible interaction and multi-touch interaction. In this case we look to Weiser himself for justification of this re-examination: in discussing older technologies, he notes that they are "bountiful sources of innovation, and have required reopening old assumptions, and re-appropriating old technology into new contexts" [3].

3.2 Tangible Interaction

There has been a significant amount of research linking tangible interaction with both UbiComp and embodied interaction [5, 12, 32]. In fact, the Tangible User Interface (TUI) is seen as a different approach to Mark Weiser's vision of ubiquitous computing: "rather than make pixels melt into an interface, TUIs use physical forms that fit seamlessly into a user's physical environment" [14]. This establishes the impetus for moving forward with an exploration of tangible interaction within ubiquitous computing. The notion of TUI involves interactive surfaces, the coupling of physical objects and digital bits, and ambient media for increased feedback and background awareness [36]. To make this connection more explicit, a discussion of TUI incorporates these elements, noting that "TUI also utilizes malleable representations, such as video projections and sounds, to accompany the tangible representations in the same space to give dynamic expression of the underlying digital information and computation" [14]. Hornecker and Buur [20] have recently reconceptualized TUI as 'tangible interaction' in order to broaden Ishii and Ullmer's earlier conception, which focused mostly on the interface aspect of TUI and resulted in a data-centered view of tangible interaction that does not encompass other complementary views that have been explored (i.e., expressive-movement-centered and space-centered). This broadening helps to situate tangible interaction squarely within the realm of embodied interaction and contributes to its larger research agenda by incorporating space and movement. Advantages of tangible interaction include: it encourages interaction with, and through, the world; it encourages two-handed interaction; it allows for more parallel input by users; it leverages natural human abilities for physical object manipulation; it takes advantage of our developed spatial reasoning; it provides more direct and manipulable interface elements; and it affords multi-modal and collaborative use [21].
Tangible interaction is “not about making ‘computers’ ubiquitous per se, but rather about awakening richly-afforded physical objects, instruments, surfaces, and
spaces to computational mediation, borrowing perhaps more from physical forms of the pre-computer age than the present” [36]. An interesting example of tangible interaction is a TUI prototype of a telephone answering machine. The Marble Answering Machine design utilized tangible and visible marbles (i.e., the physical ‘atoms’) to display and interact with the answering machine (i.e., the digital ‘bits’). “This physical embodiment of incoming phone messages as marbles demonstrated the great potential of making digital information graspable by coupling bits and atoms” [36]. This is just one example of how tangible interaction capitalizes upon our familiarity with physical interactions in the everyday world [5]. It is clear from these advantages and examples that tangible interaction aligns well with each of the earlier concepts, as well as with multi-touch interaction, and provides a much richer set of options for UbiComp interaction.
3.3 Multi-touch Interaction
Multi-touch interaction has recently excited many consumers and the popular press. With the commercial release of the Apple iPhone and Microsoft's Surface computing initiative, one can quickly begin to realize the power of this form of interaction. Although some analysts argue over the impact and success of the multi-touch interface [24], if we situate multi-touch within a UbiComp perspective, combine it with TUI, and adopt an embodied interaction perspective, we begin to see possibilities and connections that were not visible before. There are distinct advantages of multi-touch interaction that have not yet been clearly articulated in the UbiComp literature. These include: providing an extra layer of interaction within TUI, such as touchable surfaces on real physical objects; extending the benefits of soft machines and malleable (softwired) interfaces to provide improved interaction and increased flexibility and configurability [22, 23]; providing direct physical interaction through multi-touch surfaces which, with the addition of haptic technology, can add “tactile sensations and controls to interactions” [24]; providing two-handed direct interaction; supporting gesture-based interaction; supporting multi-user collaborative environments for better collaboration, coordination and parallel problem solving [25]; and assisting TUI in linking physical tangible objects with the digital realm. This list of advantages is by no means comprehensive, but it begins a conversation about the interaction advantages of multi-touch within an embodied framework. Although a considerable amount of research has been conducted on multi-touch over the past twenty years, much of the literature has focused on the technological aspects of getting these systems to work [26, 27, 28]. There is also a valuable body of literature on general input studies and on specific interaction methods and techniques with prototypes [25, 29, 30, 31]. However, now that numerous commercial multi-touch systems are being realized, there is a great opportunity to study not only the properties and issues involved in multi-touch interaction itself, but also the issues of interaction with UbiComp, tangible interaction, and other modalities. What is important to recognize with regard to interaction within UbiComp environments is the usefulness of combining interaction styles for multi-modal interaction [4, 5, 12, 33].
This includes interaction modalities such as multi-touch interaction, haptics, ambient sound, tangible user interfaces, and natural language, among others. The
embodied view of interaction focuses on humans in the world, and thus can prescribe a broader view of interaction that is more analogous to the ways we naturally interact in the world [32]. By employing the embodied interaction perspective, many of these new combined forms of interaction can flourish in UbiComp environments bringing us ever closer to the realization of true forms of interaction.
4 Conclusion
Although this paper has focused on embodied interaction as the primary theoretical framework for incorporating the many complex components of UbiComp, it is important to note that there are other perspectives that must be addressed alongside it: new theories and approaches that consider the real-life practices of people situated in social and cultural contexts, including activity theory, distributed cognition, ethnomethodology, external cognition, and situated action [8, 9, 34, 35, 36]. Although it has not been explicitly discussed in this paper, one particularly important goal for the HCI UbiComp community is to gain an understanding of the social issues that are embedded within HCI (i.e., aspects of anxiety, control, privacy, trust, and collaboration). These questions were raised even as Weiser first conceptualized the idea of UbiComp. He contemplated: “If computers are everywhere they better stay out of the way, and that means designing them so that the people being shared by the computers remain serene and in control” [3]. This notion of embedding computers in the environment underscores the importance of considering the social and cultural aspects of UbiComp. Unfortunately, “thinking and talking about computerization as the development of socio-technical configurations, rather than as simply installing and using a new technology, is not commonplace” [37]. The underlying importance of these social issues guided the deliberate selection of interaction styles and theories for this paper, which leverages an ‘engaged’ multi-modal approach to interaction and is explicitly aligned with Rogers' [10] “engaged” vision of UbiComp. The vision of Rogers, Dourish, and others is more in sync with the ways humans naturally interact in the world. Humans are more satisfied, and learn more about the world and themselves, when they are engaged in it, not when things are being done on their behalf. The claim here is that an engaged approach to UbiComp utilizing an embodied perspective can help researchers and practitioners think through these social aspects, since “embodiment is a unifying principle for tangible and social computing” [5]. Understanding that the physical and the social are inextricably intertwined affords a holistic perspective that we did not have with the Cartesian dualist/cognitivist perspective, and it is a perspective that HCI will certainly need as it moves forward.
References 1. Preece, J., Rogers, Y., Sharp, H.: Interaction Design. Wiley, Chichester (2002) 2. Grudin, J.: Is HCI homeless? In search of inter-disciplinary status. Interactions 13(1), 54– 59 (2006) 3. Weiser, M., Brown, J.S.: The coming age of calm technology (1996), http://www.ubiq.com/hypertext/weiser/acmfuture2endnote.htm (retrieved 09/15/2007)
4. Quek, F.: Embodiment and multimodality. In: Proceedings of ICMI 2006 International Conference on Multimodal Interfaces, pp. 388–390. ACM Press, New York (2006) 5. Dourish, P.: Where the Action Is. MIT Press, Cambridge (2001) 6. Klemmer, S.R., Hartmann, B.: How bodies matter: Five themes for interaction design. In: Proceedings of Design of Interactive Systems, vol. 74, pp. 140–149 (2006) 7. Abowd, G., Mynatt, E., Rodden, T.: The human experience of ubiquitous computing. Pervasive Computing 1(1), 48–57 (2002) 8. Kaptelinin, V., Nardi, B., Bødker, S., Carroll, J., Hollan, J., Hutchins, E., Winograd, T.: Post-cognitivist hci: second-wave theories. In: Proceedings of CHI 2003 Extended Abstracts on Human factors in Computing Systems, pp. 692–693. ACM Press, New York (2003) 9. Rogers, Y.: New theoretical approaches for human-computer interaction. Annual Review of Information Science and Technology 38, 87–143 (2004) 10. Rogers, Y.: Moving on from Weiser’s vision of calm computing: Engaging UbiComp experiences. In: Dourish, P., Friday, A. (eds.) UbiComp 2006. LNCS, vol. 4206, pp. 404– 421. Springer, Heidelberg (2006) 11. Vogiazou, Y., Reid, J., Raijmakers, B., Eisenstadt, M.: A research process for designing ubiquitous social experiences. In: Proceedings of NordiCHI 2006 the 4th Nordic Conference on Human-computer Interaction, pp. 86–95. ACM Press, New York (2006) 12. Williams, A., Kabisch, E., Dourish, P.: From interaction to participation: Configuring space through embodied interaction. In: Beigl, M., Intille, S.S., Rekimoto, J., Tokuda, H. (eds.) UbiComp 2005. LNCS, vol. 3660, pp. 287–304. Springer, Heidelberg (2005) 13. Weiser, M.: The Computer for the 21st Century. Scientific American (1991) 14. Ishii, H.: Tangible User Interfaces. In: Sears, A., Jacko, J. (eds.) The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, pp. 469–487. Lawrence Earlbaum Associates, New York (2008) 15. van Dam, A.: Post-WIMP user interfaces. Communications of the ACM 40(2), 63–67 (1997) 16. Myers, B.A.: A brief history of human computer interaction technology. ACM Interactions 5(2), 44–54 (1998) 17. Shneiderman, B.: Direct Manipulation: A Step Beyond Programming Languages. Computer 16(8), 57–69 (1983) 18. Hornecker, E., Buur, J.: Getting a grip on tangible interaction: a framework on physical space and social interaction. In: CHI 2006: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 437–446. ACM Press, New York (2006) 19. Fitzmaurice, G.W., Ishii, H., Buxton, W.A.S.: Bricks: laying the foundations for graspable user interfaces. In: CHI 1995: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 442–449. ACM Press, New York (1995) 20. Nakatani, L.H., Rohrlich, J.A.: Soft machines: A philosophy of user-computer interface design. In: Proceedings of CHI 1983 SIGCHI conference on Human Factors in Computing Systems, pp. 19–23. ACM Press, Boston (1983) 21. Villar, N., Gellersen, H.: A malleable control structure for softwired user interfaces. In: Proceedings of the 1st international conference on Tangible and embedded interaction, pp. 49–56. ACM Press, Baton Rouge (2007) 22. Nichols, S.: New interfaces at the touch of a fingertip. Computer 40(8), 12–15 (2007) 23. Shen, C., Ryall, K., Forlines, C., Esenther, A., Everitt, K., Hancock, M., Morris, M.R., Vernier, F., Wigdor, D., Wu, M.: Interfaces, Interaction Techniques and User Experience on Direct-Touch Horizontal Surfaces. 
IEEE Computer Graphics and Applications, 36–46 (September/October 2006)
24. Han, J.Y.: Low-cost multi-touch sensing through frustrated total internal reflection. In: UIST 2005 Proceedings of the 18th annual ACM symposium on User interface software and technology, pp. 115–118. ACM Press, New York (2005) 25. Matsushita, N., Rekimoto, J.: HoloWall: designing a finger, hand, body, and object sensitive wall. In: Proceedings of the 10th annual ACM symposium on User interface software and technology, pp. 209–210. ACM Press, Banff (1997) 26. Wilson, A.: TouchLight: An Imaging Touch Screen and Display for Gesture-Based Interaction. In: Proceedings of ICMI 2004 International Conference on Multimodal Interfaces, pp. 69–76. ACM Press, New York (2004) 27. Buxton, W., Hill, R., Rowley, P.: Issues and techniques in touch-sensitive tablet input, Computer Graphics. In: Proceedings of SIGGRAPH 1985, pp. 215–223. ACM Press, New York (1985) 28. Forlines, C., Wigdor, D., Shen, C., Balakrishnan, R.: Direct-touch vs. mouse input for tabletop displays. In: CHI 2007: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 647–656. ACM Press, New York (2007) 29. Geibler, J.: Shuffle, throw or take it! Working efficiently with an interactive wall. In: CHI 1998: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 265–266. ACM Press, New York (1998) 30. Dourish, P.: Seeking a Foundation for Context-Aware Computing. Human-Computer Interaction 16(2), 229–241 (2001) 31. Tripathi, A.K.: Computers and the embodied nature of communication: Merleau-Ponty’s new ontology of embodiment. Ubiquity 6(44), 1–9 (2005) 32. Bodker, S.: When second wave HCI meets third wave challenges. In: Proceedings of NordiCHI 2006, the 4th Nordic Conference on Human-computer Interaction, pp. 1–8. ACM Press, New York (2006) 33. Nardi, B.: Context and Consciousness: Activity Theory and Human-Computer Interaction. MIT Press, Cambridge (1995) 34. Suchman, L.: Human-Machine Reconfigurations: Plans and Situated Actions. Cambridge University Press, Cambridge (2006) 35. Kling, R.: What is social informatics? The Information Society 23(4), 205–220 (1999) 36. Ishii, H., Ullmer, B.: Tangible bits: Towards seamless interfaces between people, bits and atoms. In: Proceedings of CHI 1997, pp. 234–241. ACM Press, New York (1997)
Generic Framework for Transforming Everyday Objects into Interactive Surfaces Elena Mugellini, Omar Abou Khaled, Stéphane Pierroz, Stefano Carrino, and Houda Chabbi Drissi University of Applied Sciences of Western Switzerland, Fribourg, Bd de Pérolles 80, 1705, Fribourg, Switzerland {elena.mugellini, omar.aboukhaled, stephane.pierroz, stefano.carrino, houda.chabbi}@hefr.ch
Abstract. According to Mark Weiser, smart environments are physical worlds that are richly and invisibly interwoven with sensors, actuators, displays, and computational elements, embedded seamlessly in the everyday objects of our lives. At present, however, turning everyday objects into interactive ones is very challenging, and this limits their widespread diffusion. To address this issue we propose a framework for turning everyday objects, such as a table or a mirror, into interactive surfaces that allow users to access and manipulate digital information. The framework integrates several interaction technologies, such as electromagnetic, acoustic and optical ones, and supports rapid prototype development. Two prototypes, an interactive table and an interactive tray, have been developed using the toolkit to validate the proposed approach. Keywords: human-computer interaction, interactive surfaces, RFID, electromagnetic, acoustic.
1 Introduction
A smart environment is a technological concept that, according to Mark Weiser, refers to “a physical world that is richly and invisibly interwoven with sensors, actuators, displays, and computational elements, embedded seamlessly in the everyday objects of our lives” [1]. To reach this vision of ambient intelligence and pervasive computing, everyday objects need to be turned into intelligent ones, and we need to be able to interact with them in the most natural and intuitive manner. So far, several research efforts have been devoted to developing dedicated interactive surfaces, such as interactive tables, interactive whiteboards, etc. Our aim is to design a generic framework that allows any object to be turned into an interactive surface. The framework has to guarantee compatibility between multiple surfaces by providing a runtime environment for the dynamic adaptation of an application to a surface. In other words, the framework has to make interactive surfaces compatible with each other, so that an application developed for a table can be played on a wall or on a tray without requiring any adaptation work.
Our main objective is to make it possible to transform such physical objects and surfaces into virtual control interfaces, using various technologies to track the interactions made by users, either with the hand or with other objects. Consequently, our research work focuses on the design and development of a framework through which digital applications can be played on the physical surfaces of common objects. For instance, a coffee table could be transformed into a game table, or a poster or a painting could become a user-interaction-sensitive surface. The paper is organized as follows. Section 2 briefly reviews relevant state-of-the-art research contributions and development efforts. Section 3 presents the design of the Inter-face framework, and Section 4 its implementation. Section 5 presents two prototypes (an interactive table and an interactive tray) that have been developed to validate the proposed framework. Section 6 presents the results of a preliminary evaluation of the prototypes, and Section 7 concludes the paper and provides some insights into our future work.
2 State of the Art
Several technologies (touch screens, RFID – Radio Frequency IDentification, optical sensing, etc.) have been investigated for transforming everyday objects into interactive surfaces [3][4][7][10]. Those studies showed that each technology has its own advantages and drawbacks and is usually suited to specific tasks. For instance, RFID is very useful for identifying objects but not for localizing them, while acoustic technology is very good at obtaining position information but not at identification. Moreover, acoustic sensing has the advantage of integrating smoothly with an object with a minimum of intervention (only a few sensors to glue onto the surface), but it is more subject to perturbation caused by manipulation of the object or by ambient noise. On the other hand, optical and computer-vision techniques make it easy to track continuous movements but difficult to detect whether an object is actually being touched. Several works have created interactive surfaces using touch-sensing methods that require a sensitive layer to be applied on top of the surface one wants to make tactile (e.g., a touch screen or touch pad), such as MERL DiamondTouch [7], Surface [8], and Entertaible [5]. More recently, other works have aimed at transforming everyday objects into tactile interfaces with a minimum of modification [2]. However, such efforts were mainly focused on creating dedicated interactive objects, and the prototypes that have been developed are mainly ad-hoc implementations. Based on this observation, our work aims at developing a framework, called Inter-face, that integrates electromagnetic, acoustic and optical technologies in order to augment physical objects into interactive surfaces. This makes it possible to transform any physical surface (a table, a mirror, a poster, etc.) into an interactive one. The Inter-face framework supports the use of all or a combination of the aforementioned technologies on different surfaces, according to the user interaction [9].
3 Inter-Face Framework Design
The Inter-Face project [6] deals with the issue of transforming everyday objects, such as a table or a mirror, into interactive surfaces that can be used to access and manipulate
digital information. This is done by seamlessly combining several interaction technologies (for object identification, finger localization, tracking, etc.) into a framework, called Inter-face, that provides an abstraction layer for easily developing applications based on these technologies without having to take into account the specific features of the surface.
Fig. 1. Inter-Face framework
As shown in Fig. 1, sensing information coming from the different technologies, referred to as interaction parameters (finger position, object tracking, etc.), is transformed into modalities (touch modalities, object modalities, etc.), and application functionalities are mapped to modalities, thereby becoming independent of the underlying technology. Newly developed applications are integrated into the framework using a plug-in approach. Each application specifies the modalities it needs to function properly, and the framework is responsible for mapping them to the interaction parameters provided by the underlying surface (in case not all the required modalities are supported by the surface, the application can either work in a reduced manner or not work at all). Moreover, depending on the application, different technologies can either provide redundant information or support different interactions. For instance, we could use both acoustic and optical technologies for selection, or we could use acoustic technology only for selection and optical technology only for tracking. As shown in Fig. 1, the Inter-face framework is composed of three main layers:
• The “applications layer” manages application (i.e., plugin) deployment and integration within the framework. This layer is principally dedicated to application developers; it provides an API for describing application functionalities in terms of required modalities such as “TouchDown” or “ObjectAdd” (a detailed description of the existing modalities is provided in Section 4.1). Modality abstraction is the concept used for mapping those parts of a user interface that describe user interaction in abstraction from the sensing technologies. The applications layer contains two main parts: the “surface manager” and the “plugin manager”. The “surface manager” maps the logical surface defined by each plugin onto the physical surface of
the interactive object. The “plugin manager” handles multi-plugin issues related to the simultaneous use of different plugins on the same physical surface.
• The “technology layer” provides an API allowing the different sensing technologies a given surface is equipped with to be integrated into the framework. Information provided by such sensing technologies is mapped to existing framework modalities that are later used for application integration. The API provides a common specification that makes it easy to integrate new sensing technologies into the framework.
• The “modalities layer” is the middle layer. It acts as a bridge between the modalities used by applications and the data gathered by the different technologies that equip the physical surface. This layer is transparent to end-users.
The most important work after separating the layers was to clearly specify the modalities offered by the applications layer API and to explain how they can be achieved with the various technologies used on a given surface. For this purpose, a set of interaction parameters (low-level information provided by sensing technology) has been identified. This low-level information has then been mapped onto a set of modalities (higher-level information that provides an abstraction from technology). With the introduction of this modality abstraction, fully supported by the “modalities layer” of our framework, it becomes easy to use an Inter-face application on different equipped surfaces, as long as these surfaces offer the modalities the application needs. In addition, we ensure the separation between applications and the physical medium. Indeed, the catalog of proposed modalities is extensible, as is the catalog of elementary actions. This extensibility allows new technologies to be introduced and surfaces to be adapted with new sensors without great changes in the applications layer.
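To make this layering more concrete, the following Java sketch shows how interaction parameters declared by technology adapters might be combined into the set of modalities a surface can offer. All names and the exact mapping logic are illustrative assumptions based on the description above and on Section 4.1; this is not the actual Inter-face API.

```java
// Illustrative sketch of the layer separation: technology adapters declare the
// interaction parameters they provide, and the modality layer derives the
// modalities a surface can offer (all names are hypothetical).
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class ModalityLayerSketch {

    // Low-level information produced by a sensing technology (technology layer).
    enum InteractionParameter { POSITION, IDENTIFICATION, TRACKING, DISPLAY }

    // Technology-independent events offered to applications (modalities layer).
    enum Modality { CLICK, TOUCH, OBJECT, DISPLAY }

    // Each sensing technology declares which interaction parameters it can provide.
    interface TechnologyAdapter {
        Set<InteractionParameter> providedParameters();
    }

    private final List<TechnologyAdapter> technologies = new ArrayList<>();

    void register(TechnologyAdapter technology) { technologies.add(technology); }

    // Derive the modalities a surface offers from the parameters of its technologies,
    // following the parameter-to-modality mappings described in Section 4.1.
    Set<Modality> availableModalities() {
        Set<InteractionParameter> params = EnumSet.noneOf(InteractionParameter.class);
        for (TechnologyAdapter t : technologies) {
            params.addAll(t.providedParameters());
        }
        Set<Modality> modalities = EnumSet.noneOf(Modality.class);
        if (params.contains(InteractionParameter.POSITION)) modalities.add(Modality.CLICK);
        if (params.contains(InteractionParameter.IDENTIFICATION)
                && params.contains(InteractionParameter.TRACKING)) modalities.add(Modality.TOUCH);
        if (params.contains(InteractionParameter.IDENTIFICATION)) modalities.add(Modality.OBJECT);
        if (params.contains(InteractionParameter.DISPLAY)) modalities.add(Modality.DISPLAY);
        return modalities;
    }

    public static void main(String[] args) {
        ModalityLayerSketch layer = new ModalityLayerSketch();
        layer.register(() -> EnumSet.of(InteractionParameter.IDENTIFICATION)); // e.g., RFID
        layer.register(() -> EnumSet.of(InteractionParameter.POSITION));       // e.g., acoustic
        System.out.println(layer.availableModalities()); // [CLICK, OBJECT]
    }
}
```

In this reading, applications only ever see the derived modality set, so a surface can swap or add sensing technologies without the applications noticing.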
4 Inter-Face Framework Implementation
This section presents two of the three layers of the Inter-face framework in more detail. First, we present the “modalities layer”; second, the functioning of the “application layer”. Since an important part of our work has been to define the set of modalities that can be obtained from the interaction parameters, we also present those parameters and the related modalities integrated into the framework so far.
4.1 Modality Layer
The modality layer receives a set of interaction parameters from the underlying technology layer and provides a set of modalities to the application layer above it. Interaction parameters can be either “input-actions” or “output-actions”. The “input-actions” provide the framework with information related to user interaction, while the “output-actions” are used to give the user feedback on his or her interaction (displaying something or making a sound). Interaction parameters are determined by combining and analysing the input signals provided by the sensing technologies (acoustic, RFID, etc.) the physical surface is equipped with. They are then mapped onto modalities that are thus independent of the underlying technology. This makes the framework more
flexible, since the applications that run on top of it are not bound to a particular technology. Sensors and technology at the bottom level can evolve over time, with no need to change or adapt existing applications.
Interaction Parameters. The following interaction parameters have been integrated within the framework so far:
• Position: provides the pair of coordinates of a point on the surface in response to an event that took place at that location; e.g., through acoustic technology, we can accurately find out where the surface has been hit.
• Identification: provides an ID. This ID can be associated with a person or an object that approaches or leaves the surface. For example, RFID or speech recognition can be used as an identification technology.
• Tracking: provides a path, i.e., a list of successive points on the surface, related to an event that took place and persisted with a displacement. This can be achieved using infrared or computer-vision technology.
• Display: the display interaction is an output parameter; it does not provide any information to the framework, it is the framework that provides the information to be displayed. This can be achieved using optical technology.
Modalities. As stated before, the interaction parameters presented above have been mapped to a set of modalities that the framework provides to applications in order to make those applications independent of the sensing technologies used to gather the data.
The Click modality returns a precise location of finger(s) or elongated objects, such as pens or sticks. The framework is designed to support multiple contact points (multi-touch). This modality uses the position interaction parameter. It is used, for example, to select elements within an application and includes:
• SimpleClick: a simple action.
• DoubleClick: a double action detected at the same location.
• TripleClick: a triple action detected at the same location.
• LongClick: an action that takes more than 2 seconds in one place.
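As a rough illustration of how these click variants could be derived from the position interaction parameter, the sketch below classifies successive press/release events by timing and location. It is a hypothetical reconstruction with invented thresholds, not the Inter-face implementation.

```java
// Hypothetical classification of the click variants from position events
// (thresholds and names are invented for illustration).
class ClickClassifier {

    enum ClickType { SIMPLE, DOUBLE, TRIPLE, LONG }

    private static final long MULTI_CLICK_WINDOW_MS = 400;   // assumed threshold
    private static final long LONG_CLICK_MS = 2000;          // "more than 2 seconds"
    private static final double SAME_SPOT_RADIUS = 0.02;     // assumed, normalized coordinates

    private long lastReleaseTime = Long.MIN_VALUE;
    private double lastX, lastY;
    private int repeatCount = 0;

    // Called for each detected press/release pair at position (x, y).
    ClickType classify(double x, double y, long pressTime, long releaseTime) {
        if (releaseTime - pressTime >= LONG_CLICK_MS) {
            repeatCount = 0;
            return ClickType.LONG;
        }
        boolean samePlace = Math.hypot(x - lastX, y - lastY) <= SAME_SPOT_RADIUS;
        boolean inWindow = lastReleaseTime != Long.MIN_VALUE
                && pressTime - lastReleaseTime <= MULTI_CLICK_WINDOW_MS;
        repeatCount = (samePlace && inWindow) ? repeatCount + 1 : 1;
        lastReleaseTime = releaseTime;
        lastX = x;
        lastY = y;
        switch (repeatCount) {
            case 2:  return ClickType.DOUBLE;
            case 3:  return ClickType.TRIPLE;
            default: return ClickType.SIMPLE;
        }
    }

    public static void main(String[] args) {
        ClickClassifier classifier = new ClickClassifier();
        System.out.println(classifier.classify(0.5, 0.5, 0, 100));     // SIMPLE
        System.out.println(classifier.classify(0.5, 0.5, 300, 400));   // DOUBLE
        System.out.println(classifier.classify(0.5, 0.5, 600, 2700));  // LONG (held > 2 s)
    }
}
```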
The Touch modality returns a location and the ID of an object. It is used to interact with elements within an application. This modality combines the identification and tracking interaction parameters. It includes:
• TouchDown, TouchDrag, TouchUp: deal with selecting, moving and releasing the object, respectively.
• TouchMove: returns a location; it is dedicated to cursor movements.
Fig. 2 shows the difference between the “TouchDrag” modality, which relates to a virtual object (a window in the example), and the “TouchMove” modality, which relates only to the cursor.
Fig. 2. Example of TouchDrag and TouchMove interactions
The Object modality returns an ID for physical object identification (e.g., tangible interfaces). Physical objects can be used to interact with the surface in order to populate the interface with content, activate functions and personalize interactions. This modality uses the identification interaction parameter and includes:
• ObjectAdd: an object or a person approaches the surface.
• ObjectLost: an object or a person moves away from the surface.
The Display modality takes an interface to be displayed on the surface.
4.2 Application Layer
The application layer handles the deployment of applications (i.e., plugins) using the Inter-face framework on any physical interactive surface. To achieve this task, the application layer is composed of two main parts: the “surface manager”, which builds a logical representation of the physical surface, and the “plugin manager”, which manages the plugins. These two components are presented in the following sections.
Surface Manager. As stated before, an Inter-face plugin is intended to be used on different surfaces in our environment (tables, mirrors, walls, etc.) equipped with different sensing technologies and thus potentially providing different interaction parameters (which in turn means different modalities). At the technology layer we describe, for each technology used, its set of provided interaction parameters and the region of the physical surface where the technology is active (see Fig. 3a). Based on the information gathered from the technology layer, each physical surface is then logically divided into a set of “interaction regions”. Each of these regions is associated with a set of interaction parameters and thus supports a specific set of modalities via the modality layer.
Plugin Manager. The plugin manager has to handle each plugin according to its position on the surface and to manage the simultaneous use of several plugins on the same surface.
Fig. 3. a) Mapping between the physical surface and sensing technologies, b) concept of plugin and plugin container
Each plugin describes, via an XML configuration file, its features (size, orientation, etc.) and the modalities it requires to function properly. Each plugin is placed in a plugin container that provides a common API in order to guarantee framework flexibility and the easy integration of new applications. The framework only manages the plugin containers and handles the interaction among different plugin containers if necessary. This separation makes plugins independent of each other and makes it easy to add or delete any plugin the user needs.
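A plugin's declaration of its required modalities and the container's compatibility check could look roughly like the following Java sketch. All names are hypothetical, and the XML descriptor mentioned above is reduced here to a plain method for brevity; this is not the actual framework code.

```java
// Illustrative sketch of a plugin declaring its required modalities and of a
// container checking them against what a surface offers (all names hypothetical).
import java.util.EnumSet;
import java.util.Set;

class PluginContainerSketch {

    enum Modality { CLICK, TOUCH, OBJECT, DISPLAY }

    interface InterfacePlugin {
        String name();
        // In the framework this information would come from the plugin's XML descriptor.
        Set<Modality> requiredModalities();
        void start();
    }

    static class PluginContainer {
        private final InterfacePlugin plugin;

        PluginContainer(InterfacePlugin plugin) { this.plugin = plugin; }

        // Deploy the plugin only if the surface provides every modality it needs;
        // otherwise it could run in a reduced manner or not at all.
        boolean deployOn(Set<Modality> surfaceModalities) {
            if (surfaceModalities.containsAll(plugin.requiredModalities())) {
                plugin.start();
                return true;
            }
            System.out.println(plugin.name() + ": not supported on this surface");
            return false;
        }
    }

    public static void main(String[] args) {
        InterfacePlugin pong = new InterfacePlugin() {
            public String name() { return "Pong game"; }
            public Set<Modality> requiredModalities() { return EnumSet.of(Modality.DISPLAY, Modality.TOUCH); }
            public void start() { System.out.println("Pong game started"); }
        };
        // A surface equipped only with RFID and acoustic sensing offers OBJECT and CLICK.
        new PluginContainer(pong).deployOn(EnumSet.of(Modality.OBJECT, Modality.CLICK));
    }
}
```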
5 Prototypes
Using the Inter-face framework, two interactive surfaces have been developed: an interactive table and an interactive tray. The table is a mix of wood for the infrastructure and Plexiglas for the upper surface. Four microphones are fixed on the upper side of the table surface, while the RFID antenna (multi-tag reader) is fixed just under the upper surface, along the table border. Inside the table there are a projector, the acoustic kit, four infrared lights and an infrared camera (see Fig. 4). The tray is made of Plexiglas and is equipped with RFID and acoustic technologies. Four microphones and the RFID antenna (multi-tag reader) are fixed under the surface (see Fig. 5). Data provided by the sensing technologies used in the two prototypes are mapped to high-level modalities (as described in Section 4.1) as follows:
• Optical technology provides back projection and is mapped to the display modality.
• RFID technology provides identification and is mapped to the identification modality.
• Acoustic technology provides localization and is mapped to the click modality.
• Infrared technology provides both location and tracking information and can be used to implement both the touch and click modalities.
Fig. 4. Interactive table prototype
Fig. 5. Interactive tray prototype

Table 1. Developed applications and used modalities on the two interactive surfaces

Plugin (application) | Short description | Used modalities | Technologies used on the table | Technologies used on the tray
Photomanager | Allows users to manage photo collections (choose a photo album; select, rotate and zoom pictures; etc.) | Display, Click, Touch, Object | Optical, Acoustic, Infrared, RFID | Not supported
Instrument player | One user; possibility to choose an instrument and play it via different tagged cards | Click, Object | Infrared, RFID | Acoustic, RFID
Tic tac toe game | Tic tac toe game | Display, Click, Object | Optical, Acoustic | Acoustic, external screen for display
Pong game | Pong game | Display, Touch | Optical, Infrared, RFID | Not supported
Different plugins have been developed to validate the proposed framework. Table 1 briefly presents some of the plugins that have been developed, the modalities they need to work properly, and the sensing technologies those modalities are mapped to when implemented on the two prototypes. It is worth noting that the interactive tray, due to the sensing technologies it is equipped with, cannot provide the touch modality. Hence, plugins using this modality cannot be played on this surface.
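Table 1 can be read as a simple capability check. The snippet below encodes which modalities each prototype surface provides and which modalities each plugin requires, and derives where each plugin can run; the encoding and names are illustrative only, and the tray's display modality is assumed to be fulfilled by the external screen mentioned in the table.

```java
// Reading of Table 1 as a capability check: which plugins can run on which surface
// (encoding and names are illustrative, not framework code).
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class SurfaceCompatibility {

    enum Modality { DISPLAY, CLICK, TOUCH, OBJECT }

    public static void main(String[] args) {
        // Modalities each prototype offers, derived from its sensing technologies.
        EnumSet<Modality> table = EnumSet.allOf(Modality.class);
        // The tray lacks back projection and infrared tracking; display is assumed to be
        // provided by an external screen, and the touch modality is missing.
        EnumSet<Modality> tray = EnumSet.of(Modality.DISPLAY, Modality.CLICK, Modality.OBJECT);

        // Modalities each plugin requires, as listed in Table 1.
        Map<String, EnumSet<Modality>> plugins = new LinkedHashMap<>();
        plugins.put("Photomanager", EnumSet.of(Modality.DISPLAY, Modality.CLICK, Modality.TOUCH, Modality.OBJECT));
        plugins.put("Instrument player", EnumSet.of(Modality.CLICK, Modality.OBJECT));
        plugins.put("Tic tac toe game", EnumSet.of(Modality.DISPLAY, Modality.CLICK, Modality.OBJECT));
        plugins.put("Pong game", EnumSet.of(Modality.DISPLAY, Modality.TOUCH));

        plugins.forEach((name, required) -> System.out.println(name
                + " -> table: " + table.containsAll(required)
                + ", tray: " + tray.containsAll(required)));
    }
}
```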
6 Preliminary Prototypes Evaluation
Some preliminary tests of the two prototypes have been performed. The main goal of such tests was to assess the usability of the two interactive surfaces and the
naturalness of the interaction. No quantitative tests have been done to evaluate user performance (e.g., response time, erroneous interpretations, etc.) compared to traditional PC interaction based on screen, keyboard and mouse. Different categories of users were selected for the tests (10 persons aged 14 to 16, 6 persons aged 25 to 30, and 6 persons aged 40 to 60). After a short explanation of the functioning of the two surfaces and the existing applications, people were asked to play with those applications. People took a few minutes to “tune” their interaction with the surface (e.g., to adapt the force necessary to select an element by tapping a finger on the surface, or to identify the minimum distance from the surface at which an RFID-tagged object is detected). Apart from this initial adaptation phase, the tests provided positive feedback about the two prototypes: in particular, users appreciated the “physical” interaction with the surface (based on the direct use of fingers and hands) and were positively surprised by the possibility of playing the same application on different surfaces.
7 Conclusions
This paper has presented the results of the work carried out within the Inter-face project, which deals with the issue of transforming everyday objects into interactive surfaces that can be used to access and manipulate digital information. This is done by seamlessly combining several sensing technologies (for object identification as well as finger localization and tracking) into a framework that provides an abstraction layer for easily developing applications based on these technologies without having to take into account the specific features of the surface. In order to validate the proposed framework, two prototypes have been developed (an interactive table and an interactive tray). A preliminary evaluation of the two interactive surfaces has been carried out, and its results were encouraging. As a next step, a more thorough evaluation of the prototypes will be performed in order to assess how acceptable they are for users, both end-users and developers. The evaluation will be centered on two main aspects: the ease of use of the interactive surfaces from the end-user's point of view, and the willingness to use the framework to develop new applications from the developer's point of view.
Acknowledgments. This research work has been supported by RCSO-TIC within the framework of the Inter-Face project. Particular thanks to all the people involved in the project for their valuable contributions.
References 1. Cook, D., Das, S.: Smart Environments: Technology, Protocols and Applications. WileyInterscience, Hoboken (2004) 2. Crevoisier, A., Bornand, C.: Transforming Daily Life Objects into Tactile Interfaces, Keynote paper. In: Proceedings of the Smart Sensors and Contexts Conference (EuroSSC 2008), Zurich, Switzerland (2008)
3. Crevoisier, A., Polotti, P.: Tangible Acoustic Interfaces and their Applications for the Design of New Musical Instruments. In: International Conference on New Interfaces for Musical Expression (NIME 2005), Vancouver, BC, Canada, May 26-28, 2005, pp. 97–100 (2005) 4. Del Conte Natali, T.: Philips Debuts Cool New Prototype Touchpad Game Board. ExtremeTech. online magazine, August 30 (2006), http://www.extremetech.com 5. Hollemans, G., Bergman, T., Buil, V., van Gelder, K., Groten, M., Hoonhout, J., Lashina, T., van Loenen, E., van de Wijdeven, S.: Entertaible: Multi-user multi-object concurrent input. In: UIST 2006 - Adjunct Proceedings of the 19th annual ACM Symposium on User Interface Software and Technology, Montreux, Switzerland, October 15-18 (2006) 6. Inter-face project website, http://www.interactive-surface.ch 7. MERL DiamondTouch website, http://www.merl.com/projects/DiamondTouch/ 8. Microsoft Surface webiste, http://www.microsoft.com/SURFACE/index.html 9. Mugellini, E., Pierroz, S., Chabbi, H., Abou Khaled, O.: Interface toolkit for creating interactive surfaces. In: 3rd IEEE European Conference on Smart Sensing and Context, Zürich, Switzerland, October 29-31 (2008) 10. Mugellini, E., Rubegni, E., Gerardi, S., Abou Khaled, O.: Using Personal Objects as Tangible Interfaces for Memory Recollection and Sharing. In: 1st International Conference on Tangible and Embedded Interaction, TEI 2007, Baton Rouge, USA, February 15-17 (2007)
mæve – An Interactive Tabletop Installation for Exploring Background Information in Exhibitions
Till Nagel1, Larissa Pschetz1, Moritz Stefaner1, Matina Halkia2, and Boris Müller1
1 Interaction Design Lab, University of Applied Sciences Potsdam {nagel, larissa.pschetz, moritz.stefaner, boris.mueller}@fh-potsdam.de
2 Joint Research Centre, European Commission, Ispra
Abstract. This paper introduces the installation mæve: a novel approach to presenting background information in exhibitions in a highly interactive, tangible and sociable manner. Visitors can collect paper cards representing the exhibits and put them on an interactive surface to display associated concepts and relations to other works. As a result, users can explore both the unifying themes of the exhibition and the individual characteristics of exhibits. On the basis of metadata schemata developed in the MACE (Metadata for Architectural Contents in Europe) project, the system was put to use at the Architecture Biennale to display the entries to the Everyville student competition. Keywords: Metadata, visualization, concept networks, tangible interface, exhibition, user experience.
1 Introduction
In exhibitions, background information is usually provided in the form of audio guides with manual chapter navigation or printed materials. Current research investigates the potential of interactive systems to enhance the discovery and exploration of information in this context, for example by looking into the possibility of augmenting objects through mobile technology and by developing new software for traditional screen-based systems. This paper presents a case study of the design of the mæve installation (http://portal.mace-project.eu/maeve/). It introduces mæve, an alternative approach to making the information networks behind exhibits accessible to visitors in a highly interactive, tangible and sociable manner. The mæve system has three main components. Each exhibit is represented by paper cards, which display, for example, the title, a picture and some background information about the work. These cards are produced in large quantities and can function as take-away souvenirs or reminders of the exhibit. The paper cards can be placed on an interactive tabletop, whose surface displays media, texts, and metadata related to the exhibits. If several cards are placed on the table, networks of other exhibits and related projects, as well as mutually shared concepts, emerge.
Each card combination results in a different network configuration. The position and orientation of the cards constitute additional input parameters that allow the manipulation of the network visualization. A supplemental wall projection displays the same information configuration as the one on the table, but with enhanced media and text displays and reduced network complexity. It works both as an attractor for bystanders and as a content lens for individual inspection of the media presented on the table.
The information display is governed by an underlying metadata structure in the form of a concept network, which is based on characteristics of the exhibits. For architectural exhibits, for instance, the metadata structure of mæve contains categories such as author, country of origin, material, functional typology, inspirational project, etc. The co-occurrences in such metadata define the visual connections among exhibits (a small sketch of this idea follows below). Each card combination on the table reveals and highlights the local neighborhood defined by that configuration in the conceptual network. As a result, users can explore both the unifying themes of the exhibition and the individual characteristics of exhibits with an easy-to-use tangible system. This fosters self-paced, informal learning in a playful setting. Moreover, as the installation is located directly in the exhibition and can be used by several people at a time, it affords a wide range of information and social interactions.
In the following, we discuss related work and introduce the design and technical features of mæve. We present a case study of how the system was used at the Venice Biennale of Architecture for the exhibition of the Everyville student competition. The paper closes with a reflection on user experience, feedback and lessons learned. mæve has been designed and developed within the MACE project (Metadata for Architectural Contents in Europe, http://www.mace-project.eu). MACE connects digital information about architecture and provides an infrastructure for enriching and annotating educational contents.
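The following Java sketch makes the co-occurrence idea concrete: the strength of a connection between two exhibits is scored from the metadata terms they share. The weighting scheme (summing the smaller of the two per-exhibit term weights) and all names are assumptions for illustration, not the actual mæve scoring.

```java
// Illustrative relation strength between two exhibits from shared, weighted metadata terms.
import java.util.HashMap;
import java.util.Map;

class Exhibit {
    final String title;
    // term -> weight expressing how characteristic the term is for this exhibit
    final Map<String, Double> termWeights = new HashMap<>();

    Exhibit(String title) { this.title = title; }

    Exhibit tag(String term, double weight) { termWeights.put(term, weight); return this; }
}

class ConceptNetworkSketch {
    // Strength of the visual connection between two exhibits: sum, over shared terms,
    // of the smaller of the two weights (assumed scheme; the real scoring may differ).
    static double relationStrength(Exhibit a, Exhibit b) {
        double strength = 0.0;
        for (Map.Entry<String, Double> entry : a.termWeights.entrySet()) {
            Double other = b.termWeights.get(entry.getKey());
            if (other != null) strength += Math.min(entry.getValue(), other);
        }
        return strength;
    }

    public static void main(String[] args) {
        Exhibit entryA = new Exhibit("Entry A").tag("public space", 0.9).tag("mobility", 0.4);
        Exhibit entryB = new Exhibit("Entry B").tag("public space", 0.7).tag("density", 0.8);
        System.out.println(relationStrength(entryA, entryB)); // 0.7: only "public space" is shared
    }
}
```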
2 Related Work
Presenting background information on exhibits has been a traditional practice among exhibitors. Over the last years, efforts have been made to offer alternative ways of carrying out this task. Examples are interactive systems that are accessed via the Web and/or presented on terminals at the exhibition location. These systems sometimes attempt to recreate the space of the exhibition through 3D models [11], or focus on grouping and presenting exhibits based on metadata [18]. Such projects, however, do not intend to integrate information into the exhibition itself. Thus, when visitors access the information terminal, they must remember which exhibits drew their attention during their visit, which can be a difficult task in large events. Another approach explores the use of mobile phones [10, 15], PDAs and RFID [6] technology to augment objects. The PhoneGuide [3], for instance, allows visitors to use their mobile phones to retrieve information. The software is installed on their phones and recognizes when exhibits are photographed, using image classification techniques. After identifying an object, it provides corresponding multimedia information. Projects taking this approach sometimes have the advantage of creating collections of
works tagged on-site [3], which can later be transferred to the Web and checked at home. On-site, however, they rarely establish connections among exhibits, which would be useful to reinforce remembrance and to provide an overview of the exhibition. Other projects have used interactive tables to display information on exhibits [5]. Examples are the floating.numbers [2] installation and the 2007 Graduate Exhibition of the London College of Fashion [1]. The first presents a simple click-and-view explorative interaction: hints of exhibits float over the table and open further information when touched. The second presented information as an isolated terminal would: supporting no more than one user at a time, and displaying no more than one exhibit or bundle of pre-defined information at a time.
3 Designing mæve
The aim of mæve is 1) to support visitors in exploring background information on exhibitions in a self-paced, playful, constructive and sociable manner, and 2) to allow users to access and discuss their own niche in the available information. We believe that direct interaction with representations of exhibits, enhanced with information about their conceptual relations, can deepen the exhibition experience and foster understanding of underlying concepts. Designing in this area includes activities ranging from information architecture (choosing the right classification and metadata structure), interface and interaction design to algorithmic decisions (visualization model and look-and-feel), and even room setting and lighting. To investigate the complex interrelations among design decisions, technical constraints and the resulting user experience in this context, we adopted an iterative prototyping approach. First, interface drafts were tested and discussed as paper prototypes. The resulting coarse concept was then quickly implemented as a just-enough interactive prototype on a first table prototype in order to understand technical and interaction details. Many design decisions were only possible with this first implementation. Visual and interface design were guided by principles of information aesthetics [9], aiming to unify high interactivity with accurate data representation while adhering to aesthetic principles in all design areas. It has been shown that data representations that are perceived as aesthetic lead to higher acceptance and lower abandonment rates [4]. Moreover, visiting an exhibition is a sensual, situated experience; in this context, the interactive installation should blend in seamlessly and ideally establish a sensual and social experience of its own, in addition to communicating factual knowledge.
3.1 User Interaction
In this section, we present the fundamental interaction mechanisms and describe the interface and visual characteristics of mæve. The phase of collecting cards within the exhibition allows users to later explore exactly the collection of exhibits they were interested in. Each card acts as information storage and display. The cards do not necessarily have to be used in conjunction with the interactive tabletop: with the collection of cards, visitors acquire exhibition memorabilia and create personal collections of references to exhibits.
Fig. 1. Front and back sides of a card
The front side of each card shows a picture of the respective project and some information such as its title, author names, and its URL (Fig. 1). A fiducial marker is printed on the back side of each card. This marker reflects infrared light that is recognized by a high-definition camera inside the table, which allows the system to identify the card as well as its position and orientation. This tracking procedure was developed with the reacTIVision framework [8]. When one or several cards are put on the table, their content and metadata as well as related exhibits emerge and connect to the already displayed content, forming a network. Conceptual connections are displayed as labeled lines connecting related exhibits. Determined by factors such as which cards are placed on the table, their position and their rotation, the display is continuously updated in order to optimize the information density for the given configuration.
Placing One Card. After placing a card, background information on its exhibit emerges on the interactive surface (see Fig. 2). Media related to the exhibit are presented as thumbnails, which are organized in a fish-eye-distorted semi-circle around the physical card. The top of the card acts as a pointer “selecting” one thumbnail. Once selected, this thumbnail is enlarged on the table display and simultaneously magnified on the wall projection. In this way, users can explore the media by rotating the card towards the thumbnail they are interested in.
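Card identification, position and orientation arrive in the application as fiducial events. A minimal listener in the installation's Java/Processing environment might look like the sketch below, which assumes the TUIO Java client distributed with reacTIVision; depending on the client version, additional callbacks may have to be implemented, and the printed output stands in for the actual application logic.

```java
// Minimal sketch of receiving fiducial (card) events via the TUIO Java client
// that ships with reacTIVision. Application-side handling is only hinted at.
import TUIO.TuioClient;
import TUIO.TuioCursor;
import TUIO.TuioListener;
import TUIO.TuioObject;
import TUIO.TuioTime;

public class CardTracker implements TuioListener {

    public void addTuioObject(TuioObject card) {
        // A card with a known fiducial ID was placed on the table.
        System.out.println("card " + card.getSymbolID() + " added at "
                + card.getX() + "," + card.getY());
    }

    public void updateTuioObject(TuioObject card) {
        // Position and rotation would drive thumbnail selection and network layout.
        System.out.println("card " + card.getSymbolID() + " angle " + card.getAngle());
    }

    public void removeTuioObject(TuioObject card) {
        System.out.println("card " + card.getSymbolID() + " removed");
    }

    // Finger/cursor events are unused in this sketch.
    public void addTuioCursor(TuioCursor cursor) { }
    public void updateTuioCursor(TuioCursor cursor) { }
    public void removeTuioCursor(TuioCursor cursor) { }
    public void refresh(TuioTime frameTime) { }

    public static void main(String[] args) {
        TuioClient client = new TuioClient();       // listens on the default TUIO port
        client.addTuioListener(new CardTracker());
        client.connect();
    }
}
```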
Fig. 2. Placing one card
Fig. 3. Enlarging a related exhibit
Fig. 4. Placing two cards
Fig. 5. Placing multiple cards
Related projects can be explored even if they are not revealed on the table by a physical card. Such exhibits exist in the network but are initially only hinted at. They can be enlarged by moving the card towards the related project (see Fig. 3). The virtual project goes through three states, from hinted, to medium-sized with a title, up to full-sized with a slide show of its media files, thus revealing more information at each step.
Placing Multiple Cards. When multiple cards are placed on the table, relations among the exhibits, as well as further related projects, appear (see Fig. 4 and Fig. 5). Depending on the number of cards placed on the table, the set of relations shown might be reduced to a subset. In order to reduce complexity, only the most important relations are displayed. The importance of a relation is computed from the shared metadata, which are weighted according to their relevance to each individual exhibit. Furthermore, the emerging structure depends not only on which cards are placed simultaneously, but also on the order in which they are placed: the system prioritizes establishing relations to already visible projects. This makes the comparison of new and previous configurations easier, a comparison that would demand more effort from the user if larger parts of the network changed. As more cards are placed on the table, the number of relations increases and display items are more likely to interfere visually. To minimize visually distracting line crossings, we implemented a path-finding algorithm: the lines follow smooth curves to the related object while avoiding obstacles such as other physical cards or full-sized virtual projects. The control points of the continuous Bézier spline are prevented from colliding with an obstacle by its repulsion force, resulting in smooth obstacle avoidance (a rough sketch of this repulsion idea is given at the end of this section). For the remaining, unavoidable crossings, we introduced a spatial metaphor in order to reduce visual clutter: through changes in transparency, one of the lines appears to go underneath the other.
3.2 Social Interaction
The large tabletop as well as the overall spatial setup supports interaction by multiple visitors at the same time. Accordingly, the installation induces not only human-computer but also human-human interaction in manifold ways. As Stanton et al. suggested [16], a large interaction area stimulates or even enforces collaboration, since a single user is not able to manipulate all objects. Users are not isolated at a terminal accessing information of individual interest; in mæve, their interests are shared and may overlap. We observed a variety of ways in which visitors got in touch with each other via the table, e.g., discussing the displayed information, asking for more space or for a distant card, or simply co-creating interesting networks. This collective activity further promotes remembrance of exhibits [13].
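The obstacle-avoidance idea mentioned above, repulsion forces that push spline control points away from cards and enlarged projects, can be illustrated roughly as follows. The force model, constants and names are assumptions for illustration, not the installation's actual code.

```java
// Rough illustration of pushing Bézier control points away from circular obstacles
// with a simple repulsion force (constants and shapes are assumed).
class ControlPointRepulsion {

    static class Vec {
        double x, y;
        Vec(double x, double y) { this.x = x; this.y = y; }
    }

    static class Obstacle {
        final Vec center; final double radius;
        Obstacle(Vec center, double radius) { this.center = center; this.radius = radius; }
    }

    static final double REPULSION = 0.05;   // assumed strength
    static final double MIN_DIST  = 1e-3;   // avoid division by zero

    // Move a control point away from every obstacle it comes too close to.
    static Vec repel(Vec p, Obstacle[] obstacles) {
        double x = p.x, y = p.y;
        for (Obstacle o : obstacles) {
            double dx = p.x - o.center.x, dy = p.y - o.center.y;
            double dist = Math.max(Math.hypot(dx, dy), MIN_DIST);
            if (dist < o.radius) {
                // Push outward, more strongly the deeper the point sits inside the obstacle.
                double push = REPULSION * (o.radius - dist) / dist;
                x += dx * push;
                y += dy * push;
            }
        }
        return new Vec(x, y);
    }

    public static void main(String[] args) {
        Obstacle card = new Obstacle(new Vec(0.5, 0.5), 0.2);
        Vec adjusted = repel(new Vec(0.55, 0.5), new Obstacle[] { card });
        System.out.println(adjusted.x + ", " + adjusted.y); // pushed away, to the right of the card
    }
}
```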
4 Case Study: Mæve at the Architecture Biennale
The mæve system was put to use on the occasion of the Everyville student competition of the 11th Biennale of Architecture in Venice. A fully working system has been implemented, using the reacTIVision framework for card tracking and the Processing [12] language for the visualization front-end. The installation provided background information for the selected entries of the competition. The entries were presented with general data such as title, authors, location and descriptive texts, as well as with the submitted images and additional media. Furthermore, we asked the winning groups to name inspirational projects, which we incorporated to deepen the understanding of the idea and concept of the original entry. To form the conceptual network and to be able to create meaningful connections, the projects were classified with terms from the architectural and engineering taxonomy of the MACE Application Profile [17]. This taxonomy was designed to meet the specific information needs of architecture and to support the semantic description of architectural and design projects. In addition, the authors of exhibits chose freeform terms they judged appropriate to describe their works; these terms were unified into the taxonomy before being integrated into the database. Experts of the architectural domain set and adjusted the terms and their importance (weight) according to how characteristic and specific each term was for each project.
The spatial arrangement and the light setup were defined with the objective of guiding visitors through the installation while inviting them to watch and participate. The table was located in the middle of the exhibition room (Fig. 6), backed by the main wall projection, which immediately captured the visitors' attention. The implemented interactive surface had an area of 1.87 m2. In order to guarantee visual access to the wall projection, the side of the table that directly faced the projection was subtly obstructed: on this side the table was lengthened with a non-interactive surface, which served as a display area for the cards. The total dimensions of the table were 2.2 m x 1.7 m x 1.0 m. The height of 1.0 m was necessary to allow the inner projection to reach the surface after being reflected by two mirrors. It also turned out to be optimal for standing users (Fig. 7).
Fig. 6. Setup at Arsenale, Venice
Fig. 7. Visitors using the installation
The resolution of the interactive screen was 29.4 pixels per inch (full HD 1080p). As resolution strongly influences visualizations [7], this relatively low resolution led us to iteratively re-design the interface by testing different typefaces and sizes to improve readability and by adapting the animations and the overall design. The spatial setup in combination with the chosen interface design allowed an “upwards” interaction metaphor: when cards were placed on the table, background information arose from the “bottom” of the table, floated across the surface, and was finally projected on the wall.
5 User Experience
During the weeks mæve was presented at the Biennale, we had the opportunity to receive feedback from users and to observe them engaging with the exhibition. Overall, the interaction turned out to be very intuitive. Users affirmed that it took a “fraction of a second” to understand the connection between cards and screens; however, the use of the table and the function of the graphics were reported to demand more time, mostly being understood after the first try. Some visitors attempted to directly manipulate the displayed objects, as if they were using a multi-touch table [14]. Others initially placed a card with the front side facing the table, obtaining no feedback. Many people were attracted by curiosity, but even those who failed to grasp the multiple levels of the installation used the opportunity to play a game of cards. We noticed that users generally spent more time engaged with the installation when more than one person was present. In fact, we witnessed situations in which groups of university students of up to 60 young persons were able to place at least one card each, spending long periods of time exploring and discussing the information provided by the system. During students' visits it was also possible to see the potential of mæve as a teaching tool. The intricate web of concepts, elements and dependencies presented in the condensed visualizations served as an object of both play and extensive analysis, bringing up a diverse combination of concepts involved in architectural practice. This analysis benefited from the fact that the network of entities was entirely under the user's control. While perusing theoretical concepts like distance and memory, or form and identity, the act of card-playing underlined the ludic aspects of architecture and exhibitions.
Lessons Learned. During the conception and development of mæve we experienced that the design of such an interactive system presents special challenges: card turning as an input action, or the absence of the traditional top-bottom screen orientation (as the table can be used from different sides), opens space for innovation in user interface design. Another critical aspect is the interplay of the visualization with the data and information basis on the one hand, and with the spatial setup and situation on the other. The physical component of the installation imposes its own limits: we were not able, for example, to find a satisfactory treatment for the case in which multiple instances of the same card were placed on the table. Thus, links to cards already placed were lost when another instance of the same card was put on the table. However, we observed users turning this “fault” into a playful feature by purposefully placing many instances of the same card in order to claim the link to the network of concepts. This
example demonstrates the social function of mæve as a facilitator of human–human interaction between strangers, and points to the importance of the social aspect of shared interactive displays.
6 Conclusion

Mæve shows a way of providing a new browsing experience by combining tangible interaction, complex visualization and collaborative data examination. By inspecting conceptual relations on the table, visitors can find other significant works, or understand how their interests connect to information shared between exhibits. The simplicity of placing a card on the table imposes a low barrier to start interacting with the system. The visualization aesthetics and feedback invite users to explore both the interaction mechanisms and the displayed information without demanding a great amount of concentration. In addition, the installation makes spectators aware of how metadata can be utilized to interconnect knowledge resources, and how the overall understanding benefits from these connections.
Acknowledgments

The mæve installation was created by Tina Deiml-Seibt, Steffen Fiedler, Jonas Loh, Thomas Ness, Stephan Thiel, and the authors, members of the Interaction Design Team of the University of Applied Sciences Potsdam. This project was co-funded under contract number ECP 2005 EDU 038098 in the eContentplus programme in the context of the MACE project. The interactive table was kindly supported by Werk5 GmbH. We would like to thank Furio Barzon and Matteo Zambelli for their professional and organizational support, and Massimiliano Condotta, Elisa Dalla Vecchia, Elena Orzali, and Vittorio Spigai for contributing their expertise to creating a meaningful and understandable network of architectural contents, and for their efforts in classifying the material. Furthermore, we would like to thank all MACE partners for their support and ideas. Last but not least, we also thank La Biennale di Venezia, Telecom Italia, and all students who participated in the Everyville competition.
References

1. 2007 Graduate Exhibition of the London College of Fashion, http://www.fashion.arts.ac.uk/37189.htm
2. Art+Com: floating.numbers installation, http://www.artcom.de
3. Bruns, E., Brombach, B., Zeidler, T., Bimber, O.: Enabling Mobile Phones To Support Large-Scale Museum Guidance. IEEE Multimedia 14(2), 16–25 (2007)
4. Cawthon, N., Vande Moere, A.: The Effect of Aesthetic on the Usability of Data Visualization. In: IEEE International Conference on Information Visualisation (IV 2007), pp. 637–648. IEEE, Zurich (2007)
5. Geller, T.: Interactive Tabletop Exhibits in Museums and Galleries. IEEE Comput. Graph. Appl. 26(5), 991–994 (2006)
6. Hsi, S., Fait, H.: RFID enhances visitors museum experience at the Exploratorium. Communications of the ACM 48(9), 60–65 (2005)
7. Isenberg, P., Carpendale, S.: Interactive Tree Comparison for Co-located Collaborative Information Visualization. IEEE Transactions on Visualization and Computer Graphics 13(6), 1232–1239 (2007)
8. Kaltenbrunner, M., Bencina, R.: reacTIVision: a computer-vision framework for table-based tangible interaction. In: Proceedings of the 1st International Conference on Tangible and Embedded Interaction, TEI 2007, Baton Rouge, Louisiana, February 15-17, 2007, pp. 69–74. ACM, New York (2007)
9. Lau, A., Vande Moere, A.: Towards a Model of Information Aesthetic Visualization. In: IEEE International Conference on Information Visualisation (IV 2007), pp. 87–92. IEEE, Zurich (2007)
10. Mäkelä, K., Belt, S., Greenblatt, D., Häkkilä, J.: Mobile interaction with visual and RFID tags: a field study on user perceptions. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2007, San Jose, California, USA, April 28 - May 03, 2007, pp. 991–994. ACM, New York (2007)
11. Mourkoussis, N., White, M., Patel, M., Chmielewski, J., Walczak, K.: AMS: metadata for cultural exhibitions using virtual reality. In: Proceedings of the 2003 International Conference on Dublin Core and Metadata Applications: Supporting Communities of Discourse and Practice - Metadata Research & Applications, Seattle, Washington, September 28 - October 02, 2003, pp. 1–10. Dublin Core Metadata Initiative (2003)
12. Processing website, http://www.processing.org/
13. Rajaram, S., Pereira-Pasarin, L.: Collaboration can improve individual recognition memory: Evidence from immediate and delayed tests. Psychonomic Bulletin & Review 14(1), 95–100 (2007)
14. Rekimoto, J.: SmartSkin: an infrastructure for freehand manipulation on interactive surfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Changing Our World, Changing Ourselves, CHI 2002, Minneapolis, Minnesota, USA, April 20-25, 2002, pp. 113–120. ACM, New York (2002)
15. Roussos, G., Marsh, A.J., Maglavera, S.: Enabling Pervasive Computing with Smart Phones. IEEE Pervasive Computing 4(2), 20–27 (2005)
16. Stanton, D., Bayon, V., Neale, H., Ghali, A., Benford, S., Cobb, S., Ingram, R., O'Malley, C., Wilson, J., Pridmore, T.: Classroom collaboration in the design of tangible interfaces for storytelling. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2001, Seattle, Washington, United States, pp. 482–489. ACM, New York (2001)
17. Stefaner, M., Spigai, V., Vecchia, E.D., Condotta, M., Ternier, S., Wolpers, M., Apelt, S., Specht, M., Nagel, T., Duval, E.: MACE: Connecting and Enriching Repositories for Architectural Learning. In: Browsing Architecture: Metadata and Beyond: International Conference on Online Repositories in Architecture, Venice, Italy, September 20-21, pp. 22–49. Fraunhofer IRB Verlag, Stuttgart (2008)
18. Yee, K., Swearingen, K., Li, K., Hearst, M.: Faceted metadata for image search and browsing. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2003, Ft. Lauderdale, Florida, USA, April 5-10, 2003, pp. 401–408. ACM, New York (2003)
Relationality Design toward Enriched Communications Yukiko Nakano, Masao Morizane, Ivan Tanev, and Katsunori Shimohara 1-3 Tatara-Miyakodani, Kyo-Tanabe, 610-0321 Kyoto, Japan {dmi11,dti0738}@mail4.doshishsa.ac.jp, {itanev,kshimoha}@mail.doshisha.ac.jp
Abstract. We have been conducting research on how to design relationality in complex systems composed of intelligent tangible or intangible artificial artifacts, using evolutionary computation and network science as methodologies. This paper describes the research concept, methodologies, and issues of relationality design. As one line of research on relationality, we investigate here the significance of the linkage between a real world and a virtual world in a learning system.
1 Introduction

In research on "Designing Relationality" [1], we intend to investigate the significance and meaning of creating relationality by grasping, expressing and operating relationality as networks in the field of system science. We focus especially on complex systems with emergent mechanisms. In other words, we aim at understanding the significance and functions of such complex systems from a viewpoint of relationality.

1.1 Viewing a System as Dynamics

Systems here denote collective systems that are composed of elements and their interactions, and that have some mechanisms to autonomously maintain and regulate themselves. Regarding a football team as a system, the players are the elements and the teamwork or collaboration between players is the relationality. No matter how skillful and/or talented a player is, he cannot make a game without collaborating with the other players. A team has no meaning if it cannot work as a system, and the performance of the team depends on the collaboration between players. The opposing team is a system as well, and the two systems interact with each other. The collaboration and teamwork of both teams form interactions between those systems, and in turn such interactions influence the collaboration between players. Collaboration between players, therefore, is closely related to interactions between systems.

System behavior and its properties are dynamically generated not by a simple summation of individual elements but by the relationality between elements. In addition, the change of relationality over time can be grasped as an evolutionary process of the system. It is very important to view the dynamics of systems from the viewpoint of relationality.
1.2 Communications as "Relationality with Others"

Communications, typically between people, are activities related to information. Two kinds of information are involved in communications: one is information as an object of processing and/or operation, such as transfer, accumulation, conversion, processing, editing and expression; the other is information as a medium that adds or creates value, exerts influence, and controls some flow and/or mechanism. Underlying human beings' information-related activities, there should be a fundamental human desire to seek relationality with others. Postulating communications as "relationality with others", we intend to create mechanisms through which people can find diverse relationality. This paper describes the research concept of relationality, and introduces the way of thinking and the methodologies for relationality design, as well as research directions and possibilities of relationality design.
2 Research on Relationality Design

The concept of relationality here denotes interactions through which two entities mutually influence each other, linkage over time and space, and context as a result of accumulated interactions and linkage. Interactions and linkage form context with the passing of time, and in turn the context and linkage affect upcoming interactions. Thus interactions, linkage and context are mutually related. Relationality is not limited to physical and spatial relations; it is basically invisible and information-driven, and sometimes ecological and environmental. Social and economic systems, culture, region, and senses of value are therefore included in relationality. We human beings are entities that wish for relationality or relationships with others and hope to find meaning in these relationships. Human beings, in other words, live in relationality and are alive with relationality.

2.1 "Dependence and Governance" in Relationality

Relationality works to form a structure of "dependence and governance", as shown in Figure 1.
Fig. 1. Dependence and governance in relationality
Nodes A, B, and C in Figure 1 are elements of a system, and directed links represent relationality between such elements. Nodes typically represent human beings, but they may also represent objects and/or entities with which human beings have some relationality, such as social rules, economic systems, culture, religion, thoughts, senses of value, and so forth.
For example, if A represents a baby, B its mother, and C its father, then the baby depends on the mother and the mother depends on the father, as shown as a in Figure 1. Or, if A represents a man, B his company, and C some social system, then the man depends on the company and the company depends on the social system. Likewise, cars depend on road traffic signals, and road traffic signals depend on the traffic signal system or some traffic regulations. As can be seen in the figure, dependence is accompanied by governance as its inverse, as shown as b in Figure 1. That is, the mother governs the baby, and the father governs the mother. The mother, however, is responsible for the baby, and the father is responsible for the mother, as shown as c in Figure 1. Moreover, the mother may often gain something to live for from the baby's dependence on her, and the same may hold for the father, as shown as d in Figure 1. Such layered relationality eventually fosters some sort of trust between entities, and in the end human beings sometimes become unconscious of the relationality itself.

We human beings behave and communicate with others sometimes based on past memories and sometimes on anticipation of the future, as shown in Figure 2. For example, the young work hard in school for their future. Religious people pray for their ancestors, and sometimes devote their lives to the pursuit of their faith, believing in a life after death. That is, humans' current information processing is influenced by both the past and the future. Thus, it is crucial to consider linkage over time and space as well as direct interactions, especially in systems in which humans are involved as entities or elements. In that sense, context as a result of accumulated interactions and linkage is also very important.
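To make the structure in Figure 1 concrete, the short sketch below (an illustration added for this discussion, not part of the original study) encodes the dependence links of the baby–mother–father example as a directed relation and derives the governance links simply by reversing the direction of the edges.

# Illustrative only: dependence edges from Figure 1's example; governance is
# obtained by inverting the relation (if A depends on B, then B governs A).
dependence = {
    "baby": ["mother"],    # the baby depends on the mother
    "mother": ["father"],  # the mother depends on the father
    "father": [],
}

def governance(dep):
    """Invert the dependence relation."""
    gov = {node: [] for node in dep}
    for a, partners in dep.items():
        for b in partners:
            gov[b].append(a)
    return gov

print(governance(dependence))
# -> {'baby': [], 'mother': ['baby'], 'father': ['mother']}

Responsibility (c) and "something to live for" (d) could be modelled analogously as further labelled edges on the same graph.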
Fig. 2. Relationality over time
2.2 Relationality Design

Relationality design aims to become aware of the invisible relationality behind phenomena and affairs, to find new value and/or implications of such relationality, and to utilize it, for example, in order to enrich communications and/or human societies. From the viewpoint of informatics, relationality inherits two aspects of information as media:

• The aspect of promoting the efficiency of information processing and operations by unifying the meaning of information and people's senses of value;
• The aspect of extending and allowing freedom in the interpretation of information by esteeming the diversity and plurality of the meaning of information.
The second aspect of information as media is essential in communications. The meaning and value of information and/or behavior in communications depend on the receiver's interpretation. That is, information is given its meaning in a dynamic process in which a communicating entity interprets and operates on it. In other words, human beings coordinate their own behaviors while seeking the specific information that identifies the meaning and value of their behaviors and of the information they generate. In the sense that the meaning of information originates from context, situations and/or environment in communications, information has a pragmatic aspect. Thus, relationality, which is invisible in nature, has structural properties that enable it to be visualized as a network, semantic properties that enable it to be handled semantically, and the pragmatic properties mentioned above. We therefore need a methodology to understand such semantic and pragmatic features of relationality as dynamic behaviors of a system in terms of the structural features of relationality ― relationality networks.
3 Methodologies for Relationality Design

The idea of envisaging systems as relationality networks can be applied to complex systems. The topics of research include the analysis of relationality between entities and the resulting emergent properties of complex systems and societies at various levels of hierarchy, from the lowest, molecular level (interactions between molecules in the cells) [2], through DNA (genetic regulatory networks) [3][4] and cells (interactions between cells during the growth, differentiation and specialization of tissues and organs in multi-cellular organisms), to the highest level ― artificial societies (interactions and collaboration between agents in multi-agent systems) and human societies (human communications and interactions). We intend to grasp the information processing mechanisms of such complex systems as a process in which relationality networks emerge, grow, develop, split and/or collapse, to understand the functions of relationality through the systems' performance, and eventually to clarify the significance and meaning of relationality.

The methodologies we have employed for relationality design are evolutionary computation [5]-[7], especially genetic programming, to imitate the mechanism of biological evolution on computers, and network science to visualize and analyze relationality as networks. In genetic programming, the candidate solutions to the design problem (represented as genetic programs) undergo alterations through genetic operations (such as selection and reproduction), and their survival depends on their fitness (i.e., the quality of the achieved solution to the problem) tested in the environment, which allows the population of solutions to evolve automatically in a way much like the evolution of species in nature. As a holistic algorithmic paradigm, evolutionary algorithms are consonant with the holistic approach to the design and analysis of complex systems and societies, based on the belief that any complex system or society is more than the sum of its individual entities, more than the sum of the parts that compose it. Due to their heuristic nature, evolutionary algorithms offer the opportunity to explore various problems in the considered problem domains, where the lack of exact analytical solutions or the
extreme computational expensiveness of such solutions hinders the efficiency of traditionally applied analytical approaches.

In network science, a system is modeled as a network in which the elements of the system are represented by nodes and the interactions between elements by edges. The idea of envisaging a system as a network can be applied to complex systems at various levels of hierarchy, from molecules, genes and cells, to human organizations and society, and to economic and social systems. As a matter of fact, recent studies on the network analysis of complex systems have revealed characteristics common to them. For example, the properties represented by small-world and/or scale-free networks have given us a new view for grasping and understanding such complex systems as network dynamics. In other words, those systems are supposed to share some common mechanisms to gather, edit and represent information, and to achieve certain dynamical functions.

In research on "relationality design", we are combining and integrating these methodologies and are conducting hypothesis-finding-based simulations in order to investigate the new value and significance of relationality and to utilize relationality [8][9].
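As a minimal, self-contained illustration of the network-science side of this methodology (added here as an example; it is not code from the study), the sketch below builds a small-world graph of the Watts–Strogatz type with the networkx library and computes two indicators commonly used to characterize such networks: the average clustering coefficient and the average shortest path length.

# Illustrative sketch: a small-world "relationality network" and two standard
# indicators. The graph and its parameters are arbitrary examples.
import networkx as nx

# 100 nodes on a ring, each connected to its 4 nearest neighbours, with 10%
# of the edges rewired at random (retried until the graph is connected).
g = nx.connected_watts_strogatz_graph(n=100, k=4, p=0.1, seed=42)

print("average clustering coefficient:", nx.average_clustering(g))
print("average shortest path length:", nx.average_shortest_path_length(g))

# For a scale-free network one would instead inspect the degree distribution,
# e.g. the heaviest-degree nodes acting as hubs.
degrees = sorted((d for _, d in g.degree()), reverse=True)
print("five largest degrees:", degrees[:5])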
4 Linkage between Real World and Virtual World

As one of the research topics on relationality, research on the linkage between a user world and a virtual world in a learning system is introduced here. Figure 3 shows the schematic configuration of this research. A PC provides a user with a human–PC interactive learning game, i.e., a concentration card game in which children study the multiplication table by interacting with the PC. While the user and the learning game compose the real world, a virtual world consisting of a few kinds of insects is displayed on the same PC screen. In this system, we introduce some linkage between the real world and the virtual world.

Fig. 3. Schematic Configuration of this Research

The purpose of this research is to investigate the meaning and significance of the linkage of the virtual world to the real world. The user's behavior, mental and/or physiological state, and performance in the learning game in the real world affect the virtual world and cause some change in it. The change in the virtual world, in turn, has some effect on the user's behavior, mental and/or physiological state, and learning game performance. That is, we are interested in how the existence of a virtual world with some linkage to the real world influences
users, what and how we should design the linkage to make it work, how we could utilize such influence, and how we could apply such a mechanism to human–human and human–computer interactions.

4.1 Experimental System

We implemented a concentration card game on the multiplication table as the human–PC interactive learning game on the PC. The left side of the PC screen shown in Figure 4 represents the card game environment. The user and the PC take turns selecting two cards, one for a multiplication question and another for an answer. If the two cards match in the multiplication, the user or the PC gets a point. The right side of the PC screen in Figure 4 represents the virtual world with a few insects. These insects move around the world, and the number of insects is increased or decreased according to the user's performance in the learning game. In other words, the user's performance in the real world is evaluated, and the evaluation is presented explicitly as a change of the virtual world, i.e., an increased or decreased number of insects.

A subject is directed to play the learning game and is not made aware of the linkage to the virtual world. Whether a subject notices the linkage, and how this awareness influences his or her performance, are interesting questions. Whether or not the subject becomes conscious of the linkage, his or her physiological response might be caused unconsciously. In this experiment, we employed heartbeat measurement to investigate the mental and physiological effect of the linkage.
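A minimal sketch of how such a linkage rule could be expressed in code is given below; the thresholds and the one-insect-per-match rule are illustrative assumptions, not the parameters actually used in the experiment.

# Illustrative linkage between game performance (real world) and the insect
# population (virtual world). All numbers are assumed for demonstration.
def update_insect_count(count, matched, min_count=1, max_count=20):
    """Grow the population after a correct match, shrink it after a miss,
    keeping the count within fixed bounds."""
    count = count + 1 if matched else count - 1
    return max(min_count, min(count, max_count))

insects = 5
for matched in [True, True, False, True, False, False]:
    insects = update_insect_count(insects, matched)
print("insects after this short session:", insects)  # back to 5 here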
Fig. 4. A Snapshot of the PC screen
4.2 Results and Discussion

Figures 5 and 6 show the experimental data from two measurements of the heartbeat. As can be seen from these figures, there is no significant change in the number of heartbeats. We could, however, find some change in the RR interval, which is the measured interval between adjacent R waves in the electrocardiogram, as indicated by the arrows in Figures 5 and 6. There is always some fluctuation in the heartbeat even when a human is at rest, and in general it is difficult to say whether the heartbeat of the subject is actually affected by the virtual world [10]. Compared with the case where there is no change in the virtual world, however, we could find a significant effect of the change of the virtual world on the RR interval. As a result, we conclude that the change of the virtual world has some effect on the human's mental and physiological state.
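For readers unfamiliar with the measure, RR intervals are simply the time differences between successive R peaks. The sketch below shows how the mean RR interval and its variability could be compared between a baseline segment and a segment recorded around a change in the virtual world; the peak times are invented for illustration and the snippet is not the analysis pipeline used in the experiment.

# Illustrative RR-interval comparison from R-peak timestamps (seconds).
import statistics

def rr_intervals(r_peak_times):
    """Differences between successive R-peak times, i.e. the RR intervals."""
    return [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]

baseline_peaks = [0.00, 0.82, 1.63, 2.45, 3.28, 4.10]
event_peaks = [0.00, 0.80, 1.57, 2.39, 3.24, 4.12]  # around a virtual-world change

for label, peaks in (("baseline", baseline_peaks), ("event", event_peaks)):
    rr = rr_intervals(peaks)
    print("%s: mean RR = %.3f s, stdev = %.3f s"
          % (label, statistics.mean(rr), statistics.stdev(rr)))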
Fig. 5. Experimental Data (1)
Fig. 6. Experimental Data (2)
5 Research Direction and Possibilities of Relationality Design

Relationality mediated by information and by tangible and intangible artificial artifacts influences human behaviors, thoughts and consciousness. Sooner or later, intangible artifacts are formalized as social systems and tangible artifacts are embodied as objects, and such social systems and objects then work to generate new information. Such social systems, objects and new information, in turn, mediate new relationality that influences human behaviors, thoughts and consciousness. We believe it is essential to repeat such circulating generation and reciprocal interaction of human consciousness and senses of value with relationality. Unlike direct interactions, whose effects we experience, we sometimes become unaware of such intangible relationality as social systems and linkage, even though the effects of such relationality are preserved.

The ultimate goal of relationality design is to create mechanisms that can reshape people's consciousness, and it is very important to be aware of this possibility of relationality design. In other words, we aim to create new senses of value through relationality design, or to propose new concepts as possibilities or choices which human beings and human societies could select in the future. For that purpose, it is useful not to simply extrapolate from the present, but to think from an extreme and/or to look back at the present from the future. Based on this motivation, some ideas toward possible research directions of relationality design are described below.

5.1 From "Give and Take" to "Gift and Free-Ride/Use"

In modern societies, driven mainly by the market economy, equivalent exchange, i.e., the give & take policy, is one of the most powerful senses of value. It is also obvious that give & take is one form of relationality.
People ask for some return in exchange for giving information. This policy seems to be the basis for almost all social and economic activities, and most people consider it natural. The enclosure of information has been approved and promoted by laws protecting property rights and individuals' information, typically as privacy. Thus, the give & take policy as a sense of value has been reinforced by such social conditions. Short-term reckoning of borrowing and lending has spread as common sense, and people secure it by contract. Social systems that support making contracts and impose penalties for breaking them have been established. In a sense, however, the social costs people have to pay have increased.

Let us assume Gift & Free-Ride/Use instead of Give & Take, and take the position of disclosing personal information as much as possible instead of enclosing it. Imagine the following situation: no one asks for any return and/or right in exchange for giving information, and people can utilize, change and edit it freely and without notice. In the short term there is no merit for the people who disclose information. However, assuming that the policy of Gift & Free-Ride/Use takes root in society, people might be able to enjoy the benefits generated by its circulation, and a considerable decrease in social costs might also be expected. An example of relationality design based on the Gift & Free-Ride/Use policy is discussed below.

5.2 Possibility of Effective Operation of City Infrastructure Based on Information Disclosure

Let us assume that all people living in a city and its surroundings disclose their personal information, for example when they depart from home and where they go by car, to a public center. Technologically it should be possible in the near future to collect such information by M2M (machine-to-machine) communications through the network, without any effort by people. As an application of the information collected at the center, the road traffic at all cross sections in the city could be estimated, and all road traffic signals could be controlled according to the estimations. As a result, we could expect significant benefits for the environment, for example a decrease in the fuel consumption and exhaust gas of cars as well as time savings compared to the current road traffic system. Thus, people would be willing to disclose their information, and their consciousness would gradually turn in that direction.

If we could clarify such effects as possibilities or options for the future, even through simulations, we should then focus on the technologies and systems design needed to achieve the goal. When and how to collect the information disclosed by people, how to edit such information into scenarios to control road traffic, and how to execute them adaptively are interesting research issues to be solved technologically. Road traffic control is just one example: if the dynamic demands on city infrastructure could be estimated from the information disclosed by people, we believe that the effective operation of city infrastructure would be possible.
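As a purely hypothetical sketch of the kind of aggregation such a center might perform (none of the data structures or identifiers below come from the paper), disclosed trips could be binned per cross section and time slot to produce a demand estimate that a signal controller could then act on.

# Toy aggregation of disclosed car trips into per-cross-section demand
# estimates. Trip records and cross-section identifiers are invented.
from collections import Counter

# Each disclosed record: (departure_hour, list of cross sections on the route).
trips = [
    (8, ["X1", "X3", "X7"]),
    (8, ["X1", "X2"]),
    (9, ["X3", "X7"]),
    (8, ["X2", "X3"]),
]

demand = Counter()
for hour, route in trips:
    for cross_section in route:
        demand[(hour, cross_section)] += 1

# A controller could lengthen green phases where the estimated load is highest.
for (hour, cross_section), cars in sorted(demand.items()):
    print("hour %d, %s: %d expected cars" % (hour, cross_section, cars))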
6 Conclusion

In this paper, we have proposed the concept of relationality, which denotes interactions, linkage over time and space, and context as a result of accumulated interactions and linkage. We also discussed the research direction and issues of how to design relationality in complex systems composed of intelligent tangible or intangible artificial artifacts, using evolutionary algorithms and network science as methodologies. As one line of research on relationality, we introduced research on the linkage between a user world and a virtual world in an interactive learning system. We investigated the effect of the existence of the virtual world on the user world through a subject experiment with heartbeat measurement, and confirmed that the linkage with the virtual world, especially a change in the virtual world, affects the user's mental and physiological state. Research issues still remain, for example, which aspects of the real world should be related to which aspects of a virtual world, and how, in time and space; whether a user's physiological state should be related to the virtual world directly and/or through the linkage between the real world and the virtual world; and so forth.
References

1. Shimohara, K.: Relationality Design. In: Proceedings of 2008 Int. Cong. on Humanized Systems, pp. 365–369 (2008)
2. Liu, J.Q., Shimohara, K.: Molecular Computation and Evolutionary Wetware: A Cutting-edge Technology for Artificial Life and Nanobiotechnologies. IEEE Transactions on Systems, Man and Cybernetics, Part C 37(3), 325–336 (2007)
3. Maeshiro, T., Hemmi, H., Shimohara, K.: Ultra-Fast Genome Wide Simulation of Biological Signal Transduction Networks: Starpack. In: Frontiers of Computational Science, pp. 243–246. Springer, Heidelberg (2007)
4. Maeshiro, T., Nakayama, S., Hemmi, H., Shimohara, K.: An evolutionary system for the prediction of gene regulatory networks in biological cells. In: SICE Annual Conf. 2007, pp. 1577–1581 (2007)
5. Tanev, I., Brozozowski, M., Shimohara, K.: Evolution, Generality and Robustness of Emerged Surrounding Behavior in Continuous Predators-Prey Pursuit Problem. Genetic Programming and Evolvable Machines 6(3), 301–318 (2005)
6. Tanev, I., Shimohara, K.: Evolution of Human Competitive Driving Agent Operating a Scale Model for a Car. In: SICE Annual Conf. 2007, pp. 1582–1587 (2007)
7. Tanev, I.: DOM/XML-based portable genetic representation of the morphology, behavior and communication abilities of evolvable agents. Artificial Life and Robotics 8(1), 52–56 (2004)
8. Hemmi, H., Maeshiro, T., Shimohara, K.: New Computing System Architecture for Simulation of Biological Signal Transduction Networks. In: Frontiers of Computational Science, pp. 177–180. Springer, Heidelberg (2007)
9. Shimohara, K.: Network Simulations for Relationality Design — An Approach Toward Complex Systems. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds.) KES 2008, Part II. LNCS (LNAI), vol. 5178, pp. 434–441. Springer, Heidelberg (2008)
10. Suzuki, M.: Research on physiology change in Qigong breath method, Master Thesis, Tokyo Denki University, p. 79 (2005) (in Japanese)
Ultra Compact Laser Based Projectors and Imagers Harald Schenk, Thilo Sandner, Christian Drabe, Michael Scholles, Klaus Frommhagen, Christian Gerwig, and Hubert Lakner Fraunhofer Institute for Photonic Microsystems (FhG-IPMS), Maria-Reiche-Str. 2, 01109 Dresden, Germany
Abstract. 2D micro scanning mirrors are presented which make use of degressive springs, allowing an optical scan range of up to 112° x 84° to be achieved. The scanning mirrors are deployed for highly miniaturized monochrome and full color projectors as well as for laser imagers. The projectors allow projection with VGA resolution at a 50 Hz frame rate. The laser imager supports full color SVGA resolution at a 30 Hz frame rate. Both the projector and the imager are based on a single 2D scanner chip and could thus be combined in a single ultra compact system for simultaneous imaging and projection with high depth of focus.

Keywords: scanner, projection, imager, MEMS, micro scanning mirror.
1 Introduction

Visualization is, and probably will remain, the most important method of transferring information from a computer to the human user/operator. For controlling the computer, the keyboard is beginning to be replaced, e.g., by touch screens and systems for gesture recognition. The latter is possible without any sensors attached to the user/operator if the gestures are recorded by an imager and classified and interpreted by the computer.

Laser based projection and imaging offer the advantage of a large depth of focus. This allows us to omit any adjustable focusing optics and thus to miniaturize the system. At the same time, projected images are sharp even on surfaces with large topologies. Correspondingly, sharp images can be taken of objects with large topologies or with strongly varying distance to the imager.

The paper focuses on projectors and imagers based on 2D micro scanning mirrors. The principal set-ups of both systems are very similar, as shown in Fig. 1. A combination of projection and imaging in one system using a single 2D scanning mirror is possible. The paper starts with an introduction to the working principle and design of the 2D micro scanning mirror, including the integrated sensor for position read-out. After that, a monochrome and a full color projector are presented. In the next section a laser imager is presented which is deployed in an endoscope tip. Finally, a short outlook is given on which further developments of the scanning mirror are targeted to meet future resolution requirements.
Fig. 1. Left: Schematic set-up of a laser projector. Right: Schematic set-up of a laser imager.
2 2D Micro Scanning Mirrors

The electrostatically driven 2D micro scanning mirrors are fabricated in a CMOS compatible micromachining process. The mirror plate as well as all mechanical elements are made from a single crystalline silicon layer, typically with a thickness of 30 µm. Figure 2 compares an actual fabricated chip with a schematic chip drawing. The highly doped silicon layer serves as the electrical conductor. By means of filled insulation trenches the respective electrical paths are defined, allowing us to excite and control the oscillations of the mirror plate and of the movable frame independently. On the actual device the path of the filled insulation trenches is designed as symmetrically as possible with respect to both oscillation axes. For electrical connection the insulation path is only briefly interrupted at the respective location.
Fig. 2. Schematic drawing and micrograph of an actual chip
When a train of voltage pulses of suitable frequency is applied between the respective comb electrode pairs, an oscillation of the mirror plate or of the movable frame, respectively, is observed. Since the driving torque depends on the capacitance change and thus on the deflection angle, we are dealing with a parametric oscillator. The most important part of a typical response curve of the mirror plate is shown in Figure 3, left; the shape and properties of the movable frame's response curve correspond to it. The curve denoted by "p" is typical for a torsion spring whose constant is independent of the torsion angle or shows a slight increase with increasing angle (compare the inset of Fig. 3).
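For orientation, this statement can be connected to the usual lumped model of such a torsional resonator (a standard textbook formulation, added here for clarity rather than taken from the paper). With J the moment of inertia of the mirror plate, c the damping constant, k(θ) the (possibly angle-dependent) torsional stiffness, C(θ) the comb capacitance and V(t) the pulsed driving voltage, the equation of motion reads

J θ̈ + c θ̇ + k(θ) θ = ½ (dC/dθ) V(t)²

Because the electrostatic torque on the right-hand side depends on the deflection θ through dC/dθ, the voltage pulses effectively modulate the stiffness of the system rather than applying a deflection-independent torque, which is what makes the device a parametric oscillator; such oscillators are commonly excited near twice their mechanical eigenfrequency.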
Fig. 3. Left: Typical response curve of a mirror plate suspended by progressive "p", respectively degressive "d" springs. The actual mirror plates have a diameter of 1.5 mm and 1.2 mm, respectively. Right: The two micrographs detail the design of the respective springs. A small part of the circular mirror plate can be seen at the bottom of both graphs.
Typically, such behavior is observed for a straight torsion bar, as shown in the micrograph denoted by "p" in Fig. 3. At a given maximum driving voltage there is a bandwidth Δf within which a given oscillation amplitude can be met. As an example, in Fig. 3 an oscillation amplitude of 9° was chosen. Obviously, the bandwidth Δfp of the progressive spring is very small. For the operation of a 2D scanner for projection or imaging, the frequency ratio of the mirror plate and the movable frame influences the resolution and cannot be chosen arbitrarily. With frequency tolerances induced by the fabrication process and frequency variations under changing environmental conditions, there is a need for a bandwidth Δf as large as possible. A possible solution is the use of springs showing a decreasing spring constant with increasing torsion angle. The characteristic curve of such a degressive "d" spring is shown in the inset of Fig. 3, left. This behavior is achieved by combining several suspensions near the torsion axis (compare the micrograph denoted "d" in Fig. 3). The combined effect of torque and bending moments at deflection results in the degressive character up to a certain torsion angle. The net effect on the response curve is clearly visible in Fig. 3, left: the bandwidth Δfd of the degressive spring is significantly larger than that of the progressive spring. Therefore, with the maximum driving voltage a much broader range of frequency mismatch can be accommodated.

As the torsion springs are made from single crystalline silicon, the mechanical properties do not degrade over time. Thus, open loop operation is possible; in this case the oscillation amplitude is determined by the excitation frequency and the driving voltage. However, as already indicated above, a change of environmental conditions affects the response curve and the phase. For this reason, closed-loop control of the oscillation amplitude and phase determination are required. In the case of projection this allows us to correctly synchronize the video data; similarly, for imaging it allows the image to be correctly reconstructed. Especially because of the sinusoidal time-dependence of the deflection angle, this has to be done with high accuracy. Read-out of phase and amplitude can be done optically by directing the beam of a laser or an LED onto the mirror, which scans the beam over an array of photo detectors or a position sensitive device (PSD). A higher level of miniaturization is achieved with an integrated position sensor. In the following, an integrated piezo-resistive sensor is detailed which has been implemented as a volume transducer. This approach
enables the sensor to be fabricated without any additional layers, such as polycrystalline silicon, on the surface. Instead, the p+ doped single crystalline 30 µm thick layer itself is used. Basically, the mechanical and the mechano-electrical transducers are combined. The principle is illustrated in Figure 4.
Fig. 4. Left: Illustration of the volume transducers’ principle
Fig. 5. Micrograph and SEM of a spring suspension with transducer elements forming the resistors R1 and R2. The spring is highly degressive.
The transducer is contacted by surface contacts placed close to each other. This results in a significantly asymmetric distribution of the electrical field with respect to the neutral fibre. When the structure is bent, the resistance change is dominated by the deformation in the upper half, where the electrical field lines are much denser than in the lower part. Figure 5 shows a detail of the spring suspension of a fabricated scanning mirror with two transducer elements forming the resistors R1 and R2. The transducers are mechanically connected to the spring. The spring consists of a central torsion beam and 2 x 4 parallel beams near the torsion axis. This design results in a highly degressive spring. When the spring experiences torsion, the two transducer elements are bent; consequently, one resistance increases while the other decreases. With the help of a half-Wheatstone bridge, the resistance change is read out. Figure 6 shows the experimentally determined voltage output of the sensor from Fig. 5. The output voltage depends linearly on the scan amplitude θ0 to a very good approximation. The sensitivity of the sensor is 0.414 µV/(V°).
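As background on this read-out scheme (standard bridge theory, stated here for clarity and not taken from the paper): if the two transducer resistances change by equal and opposite amounts, R1 = R − ΔR and R2 = R + ΔR, and the series pair is supplied with U0, the voltage at their midpoint relative to U0/2 is

Uout / U0 = (R + ΔR) / (2R) − 1/2 = ΔR / (2R)

so the normalized output is directly proportional to the relative resistance change, which is consistent with the linear dependence on the torsion amplitude reported in Fig. 6.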
Fig. 6. Normalized output voltage of the sensor Uout/U0 [mV/V] versus torsion amplitude θ0 [degree] (experimental data with linear fit)
3 Laser Projector

In comparison to array based projectors such as LCDs or micro mirror arrays, scanning mirror based laser projection has two distinctive advantages. First, the image is always in focus. No optics are needed to create a sharp image; consequently, the image can even be projected sharply onto non-planar surfaces. Second, one single 2D mirror chip, together with the bandwidth of the laser source and the electronic circuit, determines the resolution. Increased resolution is obtained, e.g., by an increase of the scan angle, while in array based approaches the number of pixels has to be raised. In other words, the chip size and thus the overall size of the projector can be made significantly smaller than in any existing array based solution, especially for medium and high resolutions.

MEMS-scanner based projection works similarly to a cathode ray tube: instead of the electron beam, a laser beam is deflected two-dimensionally. However, while electrons can easily be deflected and thus fast linear scans can be performed, laser deflection requires a mechanically movable mirror. Due to the moment of inertia of the mirror, extremely large forces are required to perform a linear scan on the fast axis. Therefore, in general at least the fast axis is excited resonantly to make use of the power savings expressed by the quality factor (e.g., Microvision's BiMag scanner [1]). The projector described in the following uses resonant driving for both axes; consequently, the trajectory of the laser beam forms a Lissajous pattern.

The system architecture of the laser projector is illustrated in Fig. 7. A USB interface is used to transmit the data from the PC to the electronic circuit realized by an FPGA. The PC software reads and decodes still image formats as well as audio video interleave (AVI) type video files. The data are sent continuously to the projection system. The image/video data are fed forward to two dual-ported RAMs (DPRAMs) acting as image buffers. Double buffering is used, so the DPRAMs are large enough to store two full video frames. The two sides of the DPRAMs are connected separately to the FPGA, so that simultaneous image-write from the USB interface and image-read of the previous image for video processing is enabled.
Fig. 7. System architecture of the laser projector
The FPGA includes the driver for the 2D MEMS scanner. For that, TTL voltage pulses are generated, which are then transformed to the required voltage level by MOSFETs. Based on the driving pulses and the known phase between pulse and mirror oscillation, the current coordinates of the laser beam can be computed. From the image RAM the corresponding grey value is read out. The grey value is corrected with respect to the non-linear motion of the laser beam: the laser intensity for a certain grey value at the image border (low speed of the laser spot) has to be smaller than in the middle of the image (high speed). For laser intensity modulation, a digital signal with a depth of 8 bit per color is forwarded to the laser driver/modulator.
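The coordinate computation and the speed-dependent intensity correction can be sketched in software as follows. This is a simplified model with assumed frequencies, resolution and phases; it is not the FPGA implementation described above.

# Simplified model of a bi-resonant (Lissajous) laser projector: map the mirror
# phases to pixel coordinates and scale the commanded intensity with the local
# spot speed of the fast axis. All parameters are illustrative assumptions.
import math

F_FAST, F_SLOW = 16000.0, 1000.0   # assumed resonance frequencies in Hz
WIDTH, HEIGHT = 640, 480           # VGA target resolution

def beam_position(t, phase_fast=0.0, phase_slow=0.0):
    """Normalized deflection in [-1, 1] for both axes at time t (seconds)."""
    x = math.sin(2.0 * math.pi * F_FAST * t + phase_fast)
    y = math.sin(2.0 * math.pi * F_SLOW * t + phase_slow)
    return x, y

def pixel_coords(x, y):
    """Map normalized deflections to integer pixel coordinates."""
    col = min(WIDTH - 1, int((x + 1.0) * 0.5 * WIDTH))
    row = min(HEIGHT - 1, int((y + 1.0) * 0.5 * HEIGHT))
    return col, row

def corrected_intensity(grey, x):
    """Dim the border relative to the centre: the fast-axis spot speed is
    proportional to |cos| of its phase, i.e. sqrt(1 - x*x) for x = sin(phase)."""
    return grey * math.sqrt(max(0.0, 1.0 - x * x))

t = 1.234e-3                       # an arbitrary instant
x, y = beam_position(t)
print(pixel_coords(x, y), round(corrected_intensity(128, x), 1))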
Fig. 8. Left: monochrome VGA projector without housing. Right: projector with housing.
The FPGA has been used to drive both a monochrome projector, which allows us to demonstrate the miniaturization potential, and a full color projector, whose total dimensions still suffer from the fact that small RGB laser modules are not yet commercially available. In the case of the monochrome projector, a laser diode emitting at 660 nm with an optical power of 50 mW has been implemented. The frame rate is 50 Hz. Figure 8 shows the highly miniaturized VGA projector with dimensions of 17 x 7 x 5 mm³.
Fig. 9. Test images of the monochrome VGA-projector
Figure 9 shows three test images demonstrating that arbitrary shapes can be projected despite the Lissajous trajectory. For full color projection, three lasers are implemented with an optical power of 30 mW each. The three laser beams are combined by dichroic filters and directed to the 2D scanning mirror. Resolution and frame rate are identical to those of the monochrome projector. Figure 10 shows the RGB projector as well as a test image.
Fig. 10. RGB laser projector. a) housing with projection head. b) without housing: The RGB lasers are visible, c) full color image with VGA resolution and 8 bit color depth for each color.
4 Laser Imager

For imaging, the laser beam is in principle scanned along a Lissajous pattern as in the case of the projector (compare Fig. 1). The laser, however, is not modulated but emits continuously, i.e., in cw mode. The light reflected back from the object is collected and directed to a detector. The time-dependent signal of the detector is
correlated with the time-dependent deflection angle, and from this the image is reconstructed. The system architecture of the scanned beam imaging system [2, 3] is illustrated in Figure 11. For a full color image, three lasers (RGB) are combined with dichroic filters and focused onto a single mode glass fiber which guides the light to the 2D scanning mirror. Due to the oscillation of the micro mirror, the RGB beam is scanned across the object.
Fig. 11. System architecture of the laser imager (after [2])
The diffusely back-reflected light is collected by several multimode fibers near the MEMS scanner. The light is guided to the detection module, where three dichroic mirrors split the light according to the respective emission wavelength and direct it to the respective detectors. Due to the bandwidth requirements, avalanche photodiodes (APDs) are deployed, which are thermoelectrically cooled to reduce Johnson noise. The laser imager further offers the possibility of detecting signals at wavelengths between the excitation wavelengths or even below; in our case the system includes an additional detector for fluorescence signals. The signal at the respective detector is amplified and digitized at a rate of 50 MHz with 12 bit resolution. From the time-dependent signal and the corresponding deflection angle of the 2D scanning mirror, the image is reconstructed in real time and displayed for the user.

The system performance was demonstrated by Microvision by means of an endoscope [3]. The objective was to achieve an optical scan range of 112° x 84° and SVGA (800 x 600) resolution at a frame rate of 30 Hz. For that, a 2D scanning mirror with a frequency of 1 kHz for the slow axis and 16 kHz for the fast axis was developed and fabricated at FhG-IPMS. The high deflection angle required degressive springs. Simulation showed that a Y-shaped form provides a high operation bandwidth and allows us to keep the mechanical stress within the target range. Figure 12a shows a photograph of the 8 mm diameter endoscope tip including the glass fibers and the 2D scanner. A picture taken through the glass dome shows the assembled scanning mirror (Figure 12b). A full color sample image with SVGA resolution at 30 Hz is shown in Figure 12c. The property inherent to laser imaging of a very large depth of focus allows samples with large topographies, or at varying distance, to be imaged.
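In software terms, this reconstruction amounts to binning every detector sample into the pixel addressed by the instantaneous mirror deflection and normalizing each pixel by its hit count. The sketch below is a simplified offline model with assumed mirror frequencies; it is not the real-time pipeline of the actual system.

# Simplified scanned-beam image reconstruction: accumulate each detector
# sample into the pixel under the beam at that instant. Parameters assumed.
import math

F_FAST, F_SLOW = 16000.0, 1000.0   # assumed mirror frequencies in Hz
SAMPLE_RATE = 50e6                  # 50 MHz digitization, as stated in the text
WIDTH, HEIGHT = 800, 600            # SVGA

def reconstruct(samples):
    acc = [[0.0] * WIDTH for _ in range(HEIGHT)]
    hits = [[0] * WIDTH for _ in range(HEIGHT)]
    for i, value in enumerate(samples):
        t = i / SAMPLE_RATE
        x = math.sin(2.0 * math.pi * F_FAST * t)
        y = math.sin(2.0 * math.pi * F_SLOW * t)
        col = min(WIDTH - 1, int((x + 1.0) * 0.5 * WIDTH))
        row = min(HEIGHT - 1, int((y + 1.0) * 0.5 * HEIGHT))
        acc[row][col] += value
        hits[row][col] += 1
    return [[acc[r][c] / hits[r][c] if hits[r][c] else 0.0
             for c in range(WIDTH)] for r in range(HEIGHT)]

# Example: reconstruct from 2 ms of constant detector readings.
image = reconstruct([0.5] * 100000)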
Fig. 12. a) Endoscope tip with a diameter of 8 mm. b) Photo through the glass dome. The 2D scanner is visible. The mirror contains a 50 μm hole for illumination from the backside. c) sample image with SVGA resolution. (courtesy of Microvision).
Note that deploying modulated laser sources instead of cw lasers would allow us to switch between imaging and projection with a single ultra-compact device. Alternatively, imaging and projection could be done at the same time, either at different wavelengths or even at the same wavelength when using two sources.
5 Outlook for System Optimization

With increasing resolution, the required electronic bandwidth of both the laser projector and the laser imager increases. In the case of the laser projector, the bandwidth of the laser modulation has to be enlarged accordingly. To keep the bandwidth requirements and thus the system complexity low, a linear scan is advantageous. For the fast axis it seems impossible to generate the required forces while keeping power consumption low; for the slow scanning axis, however, the required forces are significantly lower. In comparison to a bi-sinusoidal scan, the bandwidth requirements for laser and electronics are lower by a factor of π/2 when the slow axis is linearly deflected and, for the sake of power saving, the fast axis is resonantly excited. A linear scan requires the generation of a force which allows the mirror to be deflected out of plane in a quasistatic manner. Our approach is to permanently deflect the driving electrode comb out of the chip plane. The permanent deflection is achieved by bonding a cover wafer comprising an activation stamp to the mirror wafer, as illustrated in Fig. 13.
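The factor of π/2 can be made plausible by comparing peak scan speeds (a standard argument sketched here for clarity; it is not reproduced from the paper). A sinusoidal deflection θ(t) = θ0 sin(2πft) reaches a peak angular velocity of 2πf θ0, whereas a triangular (linear) scan covering the same range of 2θ0 in each half period moves at the constant rate 2θ0 / (1/(2f)) = 4f θ0. The peak velocity, and with it the pixel clock and modulation bandwidth required for a given resolution, therefore differ by

(2πf θ0) / (4f θ0) = π/2.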
Fig. 13. Left: Functionalizing packaging. The stamp of the cover wafer deflects the comb permanently. Right: SEM graph of a test structure. The left comb is torsionally deflected.
Preliminary simulation results show that a mechanical scan angle of up to ±10° can be achieved in the static mode (depending on the scan/switching frequency). Experimentally, the principle was proven with test structures; an example is shown in Fig. 13, right. A more detailed description of the approach, including simulation results, is found in [4].
6 Summary and Conclusions

2D micro scanning mirrors with resonant driving for both axes were presented. Large optical scan angles of up to 112° x 84° are supported by deploying degressive springs for the suspension of the mirror plate and the movable frame. An ultra compact laser projector and an ultra compact laser imager were presented. With the availability of small RGB laser modules, both systems can be further miniaturized. In principle, both systems can be combined and realized with one single 2D scanning mirror only. To support higher resolutions, a novel approach to realizing a quasistatic scanner was presented, in which the permanent out-of-plane deflection of the driving comb for the slow axis is achieved by a functionalizing packaging.
Acknowledgment

The authors would like to thank Dr. Andreas Bräuer from FhG-IOF for the cooperation with respect to the laser projector and Microvision for the cooperation regarding the endoscope.
References

1. Yalcinkaya, A.D., Urey, H., Brown, D., Montague, T., Sprague, R.: Two-axis electromagnetic microscanner for high resolution displays. J. of Microelectromechanical Systems 15, 786–794 (2006)
2. Drabe, C., James, R., Klose, T., Wolter, A., Schenk, H., Lakner, H.: A new micro laser camera. In: Proc. of SPIE, San Jose, USA, vol. 6466, pp. 64660I-1–8 (2007)
3. James, R., Gibson, G., Metting, F., Davis, W., Drabe, C.: Update on MEMS-based Scanned Beam Imager. In: Proc. of SPIE, San Jose, USA, vol. 6466, pp. 64660J-1–11 (2007)
4. Jung, D., Kallweit, D., Sandner, T., Conrad, H., Schenk, H., Lakner, H.: Fabrication of 3D Comb Drive Microscanners by mechanically induced permanent Displacement. In: Proc. of SPIE, San Jose, USA, vol. 7208, pp. 72080A-1–11 (2009)
Understanding the Older User of Ambient Technologies

Andrew Sixsmith

Gerontology Research Centre, Simon Fraser University, 2800-515 West Hastings Street, Vancouver, BC V6B 5K3, Canada
www.sfu.ca/grc

Abstract. This paper reports on the user-driven research and development (R&D) approach adopted within the EU-funded SOPRANO project (http://www.soprano-ip.org/) to develop an "ambient assisted living" (AAL) system to enhance the lives of frail and disabled older people. The paper describes the conceptual framework and methods used within SOPRANO and briefly presents some of the results from requirements capture, use case development and initial prototype development. The focus of the research is on understanding the potential user of the SOPRANO AAL system, using a holistic ecological model of person and context and using methods that aimed to explore different experiential "realities". The results demonstrate the usefulness of the approach for involving users in all stages of R&D and for generating and evaluating ideas for prototype development.

Keywords: Ambient assisted living, older people, user-driven research.
1 Introduction

AAL systems use technologies such as sensors, actuators, smart interfaces and artificial intelligence to create an intelligent, interactive and supportive home environment that enhances safety and supports independence and social participation. SOPRANO (Service-oriented Programmable Smart Environments for Older Europeans) is a consortium of commercial companies, service providers and research institutes with partners in Greece, Germany, the UK, the Netherlands, Spain, Slovenia, Ireland and Canada. Emerging information and communication technologies (ICTs), such as "pervasive computing", "ubiquitous computing" and "ambient assisted living", have the potential to enhance the lives of frail and disabled older people. However, this paper argues that there are considerable problems in mapping out this new area of technology in a way that is sensitive to the everyday needs, preferences and objectives of a very heterogeneous group. The paper describes the conceptual framework and user-driven approach used within the SOPRANO project and briefly presents some of the results from requirements capture, use case development and initial prototype development.
2 Background - Conceptualizing the Socio-technical Domain

As well as technical development, R&D within AAL has to deal with a number of challenges, not least how we should explore, visualize and map out this
uncharted area in order to exploit its potential. Within the general field of technology and ageing, there has been a focus on the health-related problems associated with old age. Moreover, R&D is often driven by a technological agenda that defines R&D in terms of "engineering problems and solutions", while the evaluation of needs and outcomes focuses on the "usability" of systems and devices. However, this approach often misses the point of how technology is embedded in the everyday lives of older people and how it could positively enhance their quality of life. If these issues are not taken seriously, there is a danger that ill-conceived technologies will at best be irrelevant or inappropriately designed and at worst will reinforce some of the negative ageist assumptions that frame much of society's response to ageing [1]. In a very real way, the experience of ageing is a socially constructed phenomenon, and the SOPRANO system and its technical components should be explicitly conceived and designed as part of a "socio-technical" system that incorporates both the human and machine domains within a coherent conceptual framework. Within SOPRANO this framework for understanding the user comprised two perspectives.

A first perspective involved building a model of the personal and contextual domains relevant to AAL. Much of the literature around assistive technologies focuses on the functional impairments of the person and suggests solutions that directly or indirectly compensate for that impairment - what could be described as a "problem-need-solution" approach. Examples of this are prosthetic devices or powered wheelchairs for physically impaired people. In addition, much of the research and development in relation to older people has focused on the problems associated with advanced old age, particularly on issues of safety and security (such as personal response systems, wandering alerts, fall detectors and hazard detection). While these kinds of help and support are important, they ignore the interplay of personal and contextual factors, in the sense that "disablement" is an expression of the relationship of an impairment to its environmental context. Moreover, the problem-need-solution approach ignores the creative and adaptive responses and resilience of the person in dealing with their everyday life situations, and the need to develop an approach that is about "enablement" as well as "care".

In light of this, the SOPRANO project adopted a simple "ecological" model of the person [3] in order to map out the main domains of the socio-technical AAL system (Fig. 1). The model makes a distinction between objective and subjective components. The objective components of the model refer to: characteristics of the person (physical and cognitive status, psychological variables, etc.); characteristics of the context in which they live (physical environment, care provision, social network, etc.); and the activities that are carried out in everyday life (social interaction, activities of daily living, leisure, etc.). The ecological conceptualization suggests that these personal and contextual components are somehow in balance, for example, that the person's functional capacities are sufficient to meet the demands of the context within which they live. The subjective components refer to the way the person experiences their everyday life in terms of personal meanings and well-being. For example, two people may be struggling to perform some activity (e.g., dressing or climbing stairs) because of some physical problem.
For one person this may be a source of frustration, but for the other
Fig. 1. Ecological model of socio-technical system [2]
The holistic nature of this model allows the researcher to explore the interplay of personal and contextual factors, their relation to everyday life (in terms of activities) and how these impact on the person's subjective experience and well-being.

A second perspective for understanding the user required research methods that addressed different levels of "reality". Drawing on Dovey [3], it is possible to identify three modes of engagement that have both theoretical and methodological relevance to understanding place experience:

• Contemplative engagement: This refers to the "conscious", articulated perspectives that a person has about their life. These are the ideas and meanings that a person has developed themselves or articulated in response to questioning, e.g. through interviews and surveys.
• Everyday engagement: Much of everyday experience happens at a level of practical activity where individuals are only aware of a few key aspects of their experience at any given time, with phenomena only emerging when the familiar and predictable order of things is "threatened" or disrupted. Much of this experience is unarticulated and forgotten by the person, and methods to explore this level of reality need to be situated within the "flow of experience", e.g. observation or time-use diaries.
• Instrumental engagement: This can be seen as the immediate relationship between a person and the things around them. In this sense, the life situation comprises features that facilitate or constrain the activities of the person. As with "everyday engagement", this requires methods that are able to directly examine these relationships, as a person's accounts may be inaccurate.

A key issue here is that researching these different modes of engagement with the everyday world generally, and technology specifically, requires different approaches and methods. For example, the use of methods that depend on a person's accounts, such as questionnaires and qualitative interviews, may bring out certain information but miss out other information. Table 1 provides a framework of possible methods for researching these different modes of engagement. The table also categorises these methods according to particular stages within the research and development cycle: user requirements; technology development; evaluation.
3 User-Driven Approach and Methods

SOPRANO has aimed to be innovative in its approach to the research and development process by using an ethnographic approach to develop a holistic understanding of potential users and to involve them in all stages of the work to explore, visualise and map out an AAL system that will have practical benefits for users in their everyday lives. Potential users were involved in order to gather their feedback on the key challenges to independence and quality of life [1]. This was done without reference to technological solutions in the first instance, as the aim was to identify "opportunities" for technological support without being driven by a predefined technical agenda. To this end, 14 focus groups (with more than 90 users) were conducted with older people, informal carers and care professionals in the UK, the Netherlands, Spain and Germany. Interviews with older individuals were also carried out in Germany, Spain and the Netherlands.

A next stage of research and development involved the mapping out of use cases (descriptive models) for potential technological solutions and exploring how these might work in a real-life context. A challenge was to help potential users to contribute both in terms of design idea generation and design idea evaluation. Visualisation techniques, such as theatre groups and specially designed focus groups applying multimedia demonstrators [1], were used to transform each use case into a drama session or animate it within a multimedia demonstrator; these were viewed interactively and discussed within small user groups. A total of 72 potential users participated in 27 sessions conducted in the four different countries. All the research conformed to the ethical guidelines for each country. Sessions were audio or video recorded for subsequent analysis.

3.1 Results

The paper outlines some of the results of the user research and development within SOPRANO in areas such as encouraging and reminding people to take medication, helping them to follow exercise programs, enhancing social interaction and living safely. The paper presents results from various stages of the user research, from initial requirements analysis through to prototype development and refinement.
Table 1. Multi-method approach to research different realities

Contemplative engagement
• Perspective: subjective; accounts-based research (quantitative and qualitative), e.g. scaling, grounded theory
• Requirements methods: interviews (qualitative and quantitative); focus groups; surveys
• Development methods: intervention visualisation; interviews; focus groups
• Evaluation methods: interviews (qualitative and quantitative); focus groups; surveys

Everyday engagement
• Perspective: subjective/objective; phenomenology; time/space usage analysis; case studies
• Requirements methods: diaries; prolonged researcher engagement; observation; activity monitoring using sensors; experience sampling
• Development methods: working prototypes; observation
• Evaluation methods: diaries; prolonged researcher engagement; observation; activity monitoring using sensors; experience sampling; logging of interactions by "smart" technologies

Instrumental engagement
• Perspective: mainly objective, subjective to some extent; ergonomics; occupational therapy; person-environment fit; human factors
• Requirements methods: observation; checklists; user assessments; environmental assessments
• Development methods: lab prototypes; mock-ups and Wizard of Oz techniques; observation
• Evaluation methods: observation; checklists; user assessments; environmental assessments
4 Opportunities for Socio-technical Support

The user research suggested that ambient assisted living has a potential role in a number of key areas. Social isolation has profound negative outcomes such as
loneliness, depression, boredom, social exclusion and disruption of patterns of daily living. Safety and security issues that were highlighted include falls, control of household equipment and access into the home. Forgetfulness appears to be a challenge to independence for many and concerns, for example, taking medication or remembering to switch off devices. Mobility inside and outside the home includes challenges to personal mobility, walking in the neighbourhood and use of public transport. Some of the other themes to emerge from the user research focused on how to enable people to lead more active and participative lifestyles. Keeping healthy and active included physical and mental activity, exercise, good nutrition and daily routines. Community participation and contribution to the local community was a priority for some people. Accessing information and keeping up to date was important in maintaining independence, as was finding help and tradesmen to do little jobs around the home. Getting access to shops and services was problematic for people who have difficulty getting out of the house. The research also highlighted the potential of indirect help and support. Quality management of care provision is an important issue to ensure that the right amount and the right quality of care is delivered in people's homes.
5 SOPRANO Use Cases

The user research provided the starting point for developing a set of use cases (descriptive models) that describe in a straightforward way the interactions between users and the AAL system.

Safety and security use cases focused on some of the more hazardous aspects of everyday life: Open Door addressed safety and security aspects of access into and out of the home; Safe involved environmental and activity monitoring for signs of problems; Fall focused on detection of and response to falls at home.

Forgetfulness use cases aimed at helping people with mild cognitive impairments: Medication Reminding addressed how SOPRANO could help a person who keeps forgetting to take medicine; Easy-to-use Home Automation demonstrated smart home components supporting independent living; Remembering provided reminders and encouragement for people with everyday tasks of living.

Active lifestyles use cases aimed to promote healthy and active living: Exercise focused on helping older people to follow rehabilitation programs after discharge from hospital; Active aimed at monitoring and supporting healthy and active routines.

Social activities use cases aimed to enhance social participation: In Touch aimed at improving social interaction; Entertained focused on supporting leisure activities.
6 Use Case Validation and Refinement - An Example

It is not possible to describe all the use cases listed above. However, it is useful to present a single use case to illustrate the value of the approach adopted within SOPRANO. The Exercise use case addressed the need to help a person follow a
programme of rehabilitation exercises that are typical for someone who has recently had an operation such as a hip replacement. The key elements of the use case are:

• Roger (user) has had hip replacement surgery and completes the in-hospital course of exercises.
• The hospital discharge planning service contacts Wendy (therapist) and tells her that Roger is leaving hospital and that he has been prescribed exercises to do at home.
• Wendy, Roger and Pamela decide whether Roger will: receive assistance [from a person] at home to do the exercises; travel to the hospital physiotherapy department to do the exercises; or do the exercises at home by himself, with or without assistance from SOPRANO.
• Wendy feels that if Roger does not want to travel to the hospital, he will still need support to do the exercises correctly at home, and that Roger might (1) give up too easily, (2) forget some of the exercise steps, or (3) need reminding when to start or encouragement to keep going. Roger feels that he would manage at home, but he does not refuse assistance. It is agreed that Roger will do the exercises at home with support from the SOPRANO system.
• The SOPRANO system uses a speech interface and an on-TV avatar to prompt and encourage Roger to carry out his exercises as prescribed. An example dialogue is: "Good afternoon Roger, it's time for your exercises. Do you want me to remind you what the first exercise is? Please say yes if you do or no if you do not". The system provides a series of simple prompts and reminders to encourage Roger to carry out the exercises, and logs whether Roger responds "yes" at the completion of each stage.

The Exercise use case was presented to a group of older people in a theatre session held in Newham, London. This theatre session involved short plays enacted based on the use case to help the audience visualise the system working in a real-life situation. The plays worked through all the different elements of the use case, with a mocked-up AAL system interacting with actors. The short plays were followed up by audience feedback sessions moderated by a trained researcher. The main results of the feedback were:

• Perceived importance of the Exercise reminder: It was clear that older people receive very little direct support for rehabilitation from formal and informal sources. Everyone recognises the need to do exercise, but the spirit and motivation is often weak. People recognise the need for external support, providing evidence that further validates the use case.
• Look and feel of the system: The tone of the support is very important; any help has to be given with sympathy and empathy for the situation of the person, but at the same time be firm in terms of clearly defining the need for doing the exercises.
• Possible drawbacks: Participants highlighted the possible isolation associated with doing exercises alone in the home. This may be very important to the success of the rehab programme.
• Context: It is important that the person's living environment is assessed to ensure exercises can be safely and appropriately carried out in the house.
• Demonstration: The user needs some kind of demonstration of how to do the exercise, e.g. the person has to be shown how to do it safely.
• Intrusiveness: The system should not be intrusive on the person's everyday life, and exercise reminders should be geared to the routines, preferences and activities of the person.
• Monitoring: It is important that a person receives feedback and also that they are monitored to ensure that the exercises are actually carried out.
The results of the feedback from the theatre group were extremely important in refining the use case and developing the system specification for the SOPRANO system. The example presented here illustrates the value of utilising innovative methods for involving users directly in the design process and helping them visualise the system as part of everyday "reality".
7 Conclusions and Future Directions

This paper has discussed the methods and results of a user-driven approach to developing an AAL system. The focus of the research has been on understanding the potential user of the SOPRANO AAL system, using a holistic ecological model of person and context and using methods that aimed to explore different experiential "realities". The paper presented some of the high-level results of the SOPRANO user research and illustrated the usefulness of the approach through a more detailed examination of a particular use case supported by the SOPRANO system. The paper demonstrates the usefulness of the approach for involving users in all stages of R&D and in generating and evaluating ideas for prototype development. User involvement has been facilitated through continuous dialogue using methods that have been tailored to meet the specific information needs at each stage of the research process.

Later phases of the SOPRANO project involve the development of prototype systems and components and the deployment of the system in demonstration facilities and field trials. Large-scale field trials of a limited version of the SOPRANO system are planned to evaluate its impact in real-life situations with 300 users and carers across Europe. The large-scale trials will fully support the "medication", "remembering" and "fall" use cases and partially support the "active" and "entertained" use cases. As with earlier phases of SOPRANO, the user research will be specified by the approach and methodological framework outlined earlier.
Acknowledgements

SOPRANO (http://www.soprano-ip.org/) is an Integrated Project funded under the EU's FP6 IST programme, Thematic Priority 6.2.2: Ambient Assisted Living for the Ageing Society (IST-2006-045212). The authors acknowledge the input and role of the SOPRANO consortium and would also like to thank the many people who volunteered in the various stages of user research described in this paper.
References

1. Sixsmith, A.: SOPRANO: An Ambient Assisted Living System for Supporting Older People at Home. In: ICOST 2009, 7th International Conference on Smart Homes and Health Telematics: Ambient Assistive Health and Wellness Management in the Heart of the City, Tours, France, July 1-3 (2009)
2. Sixsmith, A., Gibson, G., Orpwood, R., Torrington, J.: Developing a technology 'wish-list' to enhance the quality of life of people with dementia. Gerontechnology 6(1), 2–19 (2007)
3. Dovey, K.: Home as Paradox. In: Rowles, G., Chaudhury, H. (eds.) Home and Identity in Late Life. Springer Publishing Company, New York (2005)
Multi-pointing Method Using a Desk Lamp and Single Camera for Effective Human-Computer Interaction

Taehoun Song1, Thien Cong Pham1, Soonmook Jung1, Jihwan Park1, Keyho Kwon2, and Jaewook Jeon2

1 School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea
{thsong,pham,kuni80,fellens}@ece.skku.ac.kr
2 School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea
{khkwon,jwjeon}@yurim.skku.ac.kr
Abstract. Multi-pointing has become an important research interest, and is used in many computer applications to allow users to interact effectively with a program. Multi-pointing is used as an input method, and can also be fun and very user-friendly. However, in order to use the method, a complex and expensive hardware configuration is required. This paper presents a new, low-cost method of multi-pointing based on a simple hardware configuration. Our method uses dual hand recognition, a table lamp, and a single CMOS camera. The table lamp provides a steady illumination environment for image processing, and the CMOS camera is mounted to maintain good stability. A single camera is used for dual hand recognition to achieve multi-pointing. Therefore, image processing does not require intensive computing, which allows us to use a stand-alone system (including a 32-bit RISC processor). The results of the proposed method show that effective control and navigation of applications such as Google Earth or Google Maps can be achieved.

Keywords: Multi-Pointing, Hand Recognition, Human-Computer Interaction.
1 Introduction

Multi-pointing on a computer screen is an effective input method that is generating considerable research interest. Multi-pointing studies have increased recently due to the development of touch-pads and sophisticated camera applications. The latest display screens used in personal computers are typically liquid crystal displays (LCDs), which are also used as the displays in embedded systems. Embedded systems usually use seven-inch LCD screens with touch-pads. Unfortunately, the normal display systems of personal computers cannot use such touch-pads because, in
general, PC’ display resolution is much greater then the resolution in embedded systems. Research on camera-based multi-pointing is limited since camera-based methods can increase the frame rate of the system, thereby increasing hardware configuration requirements for real-time performance. The input devices used in PC systems normally include keyboards and mice, and possibly joysticks and tracker-ball devices [1]-[4]. These input methods do not support multi-pointing and commercial touchpads have limited multi-pointing ability with which to support desktop PC display systems.
2 Motivation

The proposed multi-pointing method is performed using dual hand recognition, using a table lamp and a single CMOS camera. The aim is to support multi-pointing on a normal LCD monitor through a simple hardware configuration without intensive image processing. Section 3 describes the multi-pointing environments and the multi-pointing strategy, including skin color information extraction, dual hand recognition, and hand position estimation. Section 4 discusses the implementation scenario and the required interface commands. Section 5 presents results from our proposed multi-pointing system, and some conclusions from our research.
3 Simple Strategy for Multi-pointing

3.1 Overview of Multi-pointing Environments

The experimental environments considered in this study are shown in Fig. 1. A CMOS camera mounted with a table lamp views the LCD monitor of the PC. The images captured by the camera are transmitted to a stand-alone system consisting of a Marvell PXA320 processor through DMA, via the PXA320's Quick Capture Interface port. Dual hand recognition can be performed on the received CMOS camera image using the multi-pointing strategy described in this section. We show that the Google Earth application program can be controlled by the commands generated from the dual hand recognition system. This allows screen control commands to move forward and backward, turn left and right, and zoom in and out on the axis of the PC screen. These screen-based control commands can control the Google Earth program, and can also control the navigational aspects of the Google Maps application.

Figure 2 shows a block diagram of the experimental environments. The blocks in Fig. 2 represent all of the development and application components. The main components of the proposed multi-pointing method are shown as five bold rectangular blocks. Our proposed multi-pointing system does not require heavy computing performance, and can be embedded in a small form factor.
Fig. 1. Multi-pointing experimental environments
Fig. 2. Multi-pointing system block diagram
3.2 Multi-pointing Strategy

A. Skin color extraction

Our skin color extraction approach uses a red-green-blue (RGB) color-based threshold method, whereas previous studies used the HSV color space for better
illumination tolerance [5], [6]. The working conditions in [5], [6] changed very quickly; the authors detected skin in real videos, but the algorithm needed an extensive skin color database requiring many skin sample images. To avoid this dependence on an extensive database, our approach uses a much narrower working environment and requires much higher accuracy. The skin color extraction can be applied to an input image by manually extracting a trial color so that the RGB characteristics of the hand's color can be obtained and the threshold values can be used (Fig. 3). The proposed method is performed in two steps:

1. Hand skin color extraction.
2. Skin color threshold values created by the proposed method: sample images are captured in the same working environment as the target hand color extraction program.

After the skin color extraction step, background and noise effects remain in the resulting image, and future work will address these problems by adopting appropriate filtering of the affected pixels.

B. Background and small noise removal

This method is referred to in [7] and follows these steps to remove any small noise effects:

1. Every connected component is extracted.
2. Small connected components containing fewer than a pre-defined number of pixels are removed.

At the end of this step, there are still some large components remaining; these will be filtered next.

C. Hand recognition

The relative information between any two components is considered in the recognition of the hands (Fig. 5). The hand extraction process described in [7] states that many conditions exist for effective hand recognition, and thus many evaluation functions must be applied to assess the result. However, our proposed method removes many of these hand recognition conditions by using a virtual grid, as shown in Fig. 4. In this way, an evaluation function to recognize the hands can be used to detect some key features that form the important connected components.
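To make the two steps above and the small-noise removal concrete, the following sketch (not the authors' code) applies per-channel RGB thresholds and then discards connected components smaller than a pixel-count limit. The threshold values, the minimum component size, and the use of NumPy/SciPy on a desktop are assumptions for illustration; the paper's implementation runs on the embedded PXA320 system.

```python
# Minimal sketch: RGB-threshold skin extraction followed by removal of
# small connected components (thresholds are hypothetical).
import numpy as np
from scipy import ndimage

# Hypothetical per-channel bounds, assumed to come from sample images
# captured under the table-lamp illumination (step 2 of the method).
R_MIN, R_MAX = 120, 255
G_MIN, G_MAX = 60, 200
B_MIN, B_MAX = 40, 180
MIN_PIXELS = 500  # assumed lower bound on a hand-sized component

def skin_mask(rgb_image: np.ndarray) -> np.ndarray:
    """Return a boolean mask of pixels whose RGB values fall inside the thresholds."""
    r, g, b = rgb_image[..., 0], rgb_image[..., 1], rgb_image[..., 2]
    return ((r >= R_MIN) & (r <= R_MAX) &
            (g >= G_MIN) & (g <= G_MAX) &
            (b >= B_MIN) & (b <= B_MAX))

def remove_small_components(mask: np.ndarray, min_pixels: int = MIN_PIXELS) -> np.ndarray:
    """Keep only connected components with at least min_pixels pixels."""
    labels, n = ndimage.label(mask)                 # extract every connected component
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    keep = np.zeros(n + 1, dtype=bool)
    keep[1:] = sizes >= min_pixels                  # drop components below the size limit
    return keep[labels]
```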
Fig. 3. Color Extraction
Fig. 4. Virtual Grid
Fig. 5. Multi-Pointing
The proposed hand recognition is performed in three steps:

1. Size of a component A: the number of pixels in component A.
2. Size index of a component A: the index of A, using its size information, compared to the other remaining components.
3. Estimate the start and end positions of a component A: estimate the component's start and end positions using the virtual grid on the screen.
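The three steps can be illustrated with the sketch below, under simple assumptions: components are ranked by pixel count, the two largest are taken to be the hands, and each component's extent is snapped to cells of a virtual grid laid over the image. The grid resolution and the use of bounding-box corners as the "start" and "end" positions are assumptions, since the paper does not give their exact definitions.

```python
# Illustrative sketch: rank components by size and express each hand's
# extent in virtual-grid coordinates (grid size is assumed).
import numpy as np
from scipy import ndimage

GRID_COLS, GRID_ROWS = 16, 12   # assumed virtual grid resolution

def hand_positions(mask: np.ndarray):
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))   # step 1: component sizes
    order = np.argsort(sizes)[::-1]                      # step 2: size index (largest first)
    hands = []
    h, w = mask.shape
    for idx in order[:2]:                                # take the two largest components
        ys, xs = np.nonzero(labels == idx + 1)
        # step 3: start/end positions snapped to the virtual grid
        start = (int(xs.min() * GRID_COLS / w), int(ys.min() * GRID_ROWS / h))
        end = (int(xs.max() * GRID_COLS / w), int(ys.max() * GRID_ROWS / h))
        hands.append({"size": int(sizes[idx]), "start": start, "end": end})
    return hands
```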
4 Implementation

The intended implementation of the proposed multi-pointing method is to control the Google Earth application. Google Earth is a global map searching program based on the World Wide Web. Our implementation can control Google Earth in the following ways: move the image up (by the forward command); move the image down (by the backward command); turn the image left (by the left command); and turn the image right (by the right command). It can also control the image zoom in and out by using the zoom-in/out commands. Each command generated by the results of the hand recognition procedure is shown in Fig. 6.
Fig. 6. Implementation of Google Earth control
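The paper lists the command set (forward, backward, left, right, zoom in/out) but does not spell out exactly how the two recognized hand positions are translated into these commands. The sketch below therefore assumes one plausible mapping, offered only as an illustration: vertical motion of the midpoint between the hands produces forward/backward, horizontal motion produces left/right, and a change in the separation between the hands produces zoom. The thresholds are placeholders.

```python
# Hypothetical mapping from two tracked hand positions to navigation commands.
# The actual mapping used in the paper is not specified; this is one plausible choice.
import math

MOVE_THRESH = 1.0   # assumed minimum grid-cell displacement to trigger a pan/turn
ZOOM_THRESH = 1.0   # assumed minimum change in hand separation to trigger a zoom

def command(prev_hands, curr_hands):
    """prev_hands/curr_hands: [(x1, y1), (x2, y2)] hand centers in grid coordinates."""
    def midpoint(h):
        return ((h[0][0] + h[1][0]) / 2.0, (h[0][1] + h[1][1]) / 2.0)
    def spread(h):
        return math.dist(h[0], h[1])

    dx = midpoint(curr_hands)[0] - midpoint(prev_hands)[0]
    dy = midpoint(curr_hands)[1] - midpoint(prev_hands)[1]
    ds = spread(curr_hands) - spread(prev_hands)

    if abs(ds) > ZOOM_THRESH:                       # hands move apart/together -> zoom
        return "zoom-in" if ds > 0 else "zoom-out"
    if abs(dy) >= abs(dx) and abs(dy) > MOVE_THRESH:
        return "forward" if dy < 0 else "backward"  # image y grows downward
    if abs(dx) > MOVE_THRESH:
        return "right" if dx > 0 else "left"
    return None  # no command this frame
```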
5 Conclusion

The proposed multi-pointing method, using a desk lamp and a single camera, can generate four directional commands: forward, backward, left, and right. In addition, it can generate zoom-in and zoom-out commands, based on human hand position and figure extraction. The Google Earth application was controlled using the new approach, working in real time. This study focused on developing an effective input method using a simple multi-pointing approach which can improve the user experience and be fun to use when communicating with computer applications. It is hoped that the
proposed multi-pointing method can be a useful and effective way to replace traditional input devices such as keyboards, mice, and joysticks.

Acknowledgments. This research was supported by MKE, Korea under ITRC IITA-2009-(C1090-0902-0046).
References

1. Zhai, S.: User Performance in Relation to 3D Input Device Design. Computer Graphics 32(4), 50–54 (1998)
2. Zhai, S., Kandogan, E., Barton, A.S., Selker, T.: Design and Experimentation of a Bimanual 3D Navigation Interface. Journal of Visual Languages and Computing, 3–17 (October 1999)
3. Ku, J.Y., Hong, J.P.: A Study on the Controlling Method for Remote Rehabilitation Assisting Mobile Robot Using Force Reflection Joystick. Journal of the Institute of Electronics Engineers of Korea 40, 26–34 (2003)
4. Lapointe, J.F., Vinson, N.G.: Effects of Joystick Mapping and Field-of-View on Human Performance in Virtual Walkthroughs. In: Proceedings of the 1st International Symposium on 3D Data Processing Visualization and Transmission, Padova, Italy, June 18-21 (2002)
5. Sigal, L., Sclaroff, S., Athitsos, V.: Skin Color-Based Video Segmentation under Time-Varying Illumination. IEEE Trans. Pattern Analysis and Machine Intelligence, 862–877 (2004)
6. Dadgostar, F., Barczak, A.L.C., Sarrafzadeh, A.: A Color Hand Gesture Database for Evaluating and Improving Algorithms on Hand Gesture and Posture Recognition. Research Letters in the Information and Mathematical Sciences 7, 127–134 (2005)
7. Pham, T.C., Pham, X.D., Nguyen, D.D., Jin, S.H., Jeon, J.W.: Dual Hand Extraction Using Skin Color and Stereo Information. In: IEEE International Conference on Robotics and Biomimetics, Bangkok, Thailand, February 21-26 (accepted, 2009)
Communication Grill/Salon: Hybrid Physical/Digital Artifacts for Stimulating Spontaneous Real World Communication

Koh Sueda1, Koji Ishii2, Takashi Miyaki1, and Jun Rekimoto1

1 The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, Japan
[email protected], [email protected], {miyaki,rekimoto}@acm.org
Abstract. One of the problems encountered in face-to-face communication involves conversational imbalances among the participants caused by differences in conversational interests and social positions. It is common for us not to be able to communicate well with an unfamiliar person. On the other hand, old customs in the real world, such as the Japanese tea ceremony, effectively use physical artifacts to enable smoother conversation. In this project, we designed two communication systems that facilitate casual communication using physical/digital artifacts, such as a meal and text-chat, in order to show that real world communication can be supported by digital technology. The first system, called the "Communication Grill," connects a grill for cooking meat to a chat system. The grill is heated by the chatting activity; thus, people must continue conversing to roast the meat. The second system is called the "Communication Salon." It is a computer-enhanced tea ceremony with a chat screen displayed in a tearoom. Using these systems, we conducted user evaluations at SIGGRAPH and other open events. Based on the chat logs at these events, we found that conversational topics gradually shifted from topics about the systems to more general topics. An analysis of these chat logs revealed that the participants began to communicate spontaneously using this system.

Keywords: Augmented reality, Chat, Chat-augmented meal, Merging virtual and real, Communication Grill/Salon.
1 Introduction

Face-to-face communication is very important in our daily life, even though online communication technologies now support communication. One of the problems encountered in face-to-face communication involves conversational imbalances among the participants caused by differences in conversational interests, social positions and so on. Some situations require casual conversation, such as brainstorming and blind dates. At the same time, some researchers have reported that virtuality activates communities in online spaces, such as the Internet [1][2]. The virtuality allows an individual to be separated from the real world. Therefore, in
online spaces, it is easy to create a unified community that transcends the conditions of the real world, such as appearance, culture, and social position [3]. Other problems that often cause us to hesitate to communicate with others involve differences in conversational interests or meeting someone for the first time, regardless of whether it is online or face-to-face. It has been found that some games or ceremonies transcend these hesitations by imposing conversational restrictions in the form of rules or goals. In this project, we proposed a system that stimulates spontaneous real world communication by inserting the advantages of an online communication system into face-to-face conversation and placing restrictions on the communication.
Fig. 1. The places used for the demonstrations: the systems require conversation to eat or enjoy a cup of tea. The imbalance in statements was found to be less significant than in traditional chat systems, and the conversational topics gradually shifted from topics about the systems to more general topics. Communication Grill Chang-tei (Left); Communication Salon Chang-tei (Right).
2 Approach

In Japan, the tea ceremony has many rules and artifacts that provide topics for communication between the participants. These methods are seen in the tea ceremony because it is based on strict aesthetics and ideology. The tea ceremony was developed to optimize face-to-face communication in a feudal society [4]. It has been found that these kinds of methods stimulate casual face-to-face communication in the real world. These methods aim to optimize communication using ceremonies and the design of a physical space to reduce the gaps in the positions and social conditions of the participants.
Fig. 2. Restrictive rules are used to optimize a tearoom for face-to-face communication while every guest and a master enjoy a cup of tea (Left). Confessional: the penitent can confess anonymously by stepping into a small, enclosed booth for a face-to-face conversation (Right).
These methods are seen in the features of a Zen temple, including its tearoom, and in the confessional of a cathedral, which is a venue for courtesy ceremonies (Figure 2). These methods have the following points in common:

1. They provide spatial or mental immersion by means of a restrictive procedure.
2. Daily activities, including conversation and a meal, are conducted under non-daily circumstances.

In addition, real-time online communication, such as text-chat, has these common points [5][6]. Text-chat provides only verbal information under real-time communication. This kind of restriction forces participants to imagine non-verbal information more profoundly than during normal face-to-face communication. In this study, we focused on and considered the potential of these two points. Our aim is to propose a system that allows spontaneous real world communication through the fusion of text-chat and daily communication, such as during a meal. We designed two communication systems, the "Communication Grill" and "Communication Salon," to realize this idea. The "Communication Grill/Salon" stimulates a conversation by using systems that require the participants to chat in order to eat. Demonstrations were conducted and evaluated at exhibitions that included ACM SIGGRAPH and Ars Electronica. As a result, by stimulating casual conversations between participants using the "restriction" that the system required conversation in order to eat, we found the following two points:

• The bonds between participants were deepened as a result of the meal.
• A common goal, such as enjoying a meal or tea, encouraged positive utterances from the participants, whether or not they were meeting for the first time or joined the conversation partway through.

2.1 Conceptual Background

The "Communication Grill/Salon" makes participants recognize a gap between verbal and nonverbal communication by requiring a seemingly irrational text-chat during face-to-face communication in real space. When the participants recognize this inconsistent situation, they try to concentrate on the nonverbal information from each other. According to Mehrabian's report, the spoken word is only 7% effective during face-to-face communication when we receive ambiguous messages, such as when the words spoken are inconsistent with the tone of voice or body language of the speaker [7]. Normally, face-to-face conversations contain visual, tone of voice, and verbal information. However, it is difficult to recognize the importance of nonverbal information, because we usually receive both verbal and nonverbal information at the same time, and subconsciously. On the other hand, the "Communication Grill/Salon" provides a way for the participants to receive the two forms of information separately. Requiring conversation to eat is an easy way to overcome an awkward situation, because the participants have to talk with others quickly to heat up the grill. Thus the participants try to receive more nonverbal information from each other, and thus deepen their
mutual understanding by more profound observations. This is why the system imposes restrictions on the participants, such as requiring them to converse in order to eat or drink. The participants become sensitive to nonverbal information because the system requires verbal, text-based conversation with the others. The conversations about the absurdity of a system that requires communication in order to eat allowed the participants to gain more insights about each other.
3 System

3.1 Chat System

The Communication Grill/Salon Chang-tei system comprises chatting devices, such as a browser phone, PDA, or laptop PC; chat applications that control a heater and indicate the remaining power time; and an electric heater powered by the conversation. When these various parts work together via a network, the act of chatting results in the heating of the heater (Figure 3).
Fig. 3. System diagram. The countdown indicators (timer and meter) indicate the time remaining before shutdown and urge the participants to converse.
3.2 Interaction Design

This system is equipped with a timer that controls the heating element of the heater. This timer turns off the power to the heater if the conversation stops. The chat application interface provides a countdown timer that indicates the remaining time before the power shutdown, urging the participants to converse (see Figure 4, right, and Figure 5, center). In addition, the heater is equipped with a red pilot gauge that visually indicates the operating status (Figure 5).
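A minimal sketch of this interaction design is given below: every chat remark turns the heater on for a fixed interval (5 s per remark for the Grill, 20 s for the Salon, as reported in Section 4.2), and a polling loop switches the heater off and updates the countdown display when no remark has arrived in time. The heater-control and display callbacks are placeholders; the real system drove an electric heater through a dedicated power controller.

```python
# Sketch of the chat-driven heater timer (placeholder I/O, not the authors' code).
import time
import threading

SECONDS_PER_REMARK = 5.0   # Grill; the Salon used 20 s per remark

class ConversationHeater:
    def __init__(self):
        self.deadline = time.monotonic()   # heater starts off
        self.lock = threading.Lock()

    def on_remark(self, text: str) -> None:
        """Each remark turns the heater on for SECONDS_PER_REMARK from now."""
        with self.lock:
            self.deadline = time.monotonic() + SECONDS_PER_REMARK

    def remaining(self) -> float:
        """Seconds left before shutdown, shown on the countdown meter."""
        return max(0.0, self.deadline - time.monotonic())

    def run(self, set_heater_power, update_meter, period: float = 0.2) -> None:
        """Poll loop: keep the heater on while remarks keep the deadline alive."""
        while True:
            left = self.remaining()
            set_heater_power(left > 0.0)   # placeholder for the power controller
            update_meter(left)             # placeholder for the bar meter / hanging scroll
            time.sleep(period)
```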
3.3 Communication Grill

The Communication Grill consists of an electric grill controlled by the participants' text-chat, a chat application to communicate with the others sitting with them, a network (enabling the use of non-Internet connections), and the tableware for eating (Figure 4).
Fig. 4. The communication grill (all in one edition): an LCD (1) to indicate an IP address, an Ethernet port (2) to receive operating signals from chat applications, and an indicator lamp (3) (Left). The Communication Grill Application: a screen shot of the chat application. The bar meter indicates the remaining time before power shutdown, implemented on the right side of the window (Right).
3.4 Communication Salon

There are some differences between the Communication Grill and Communication Salon. The two systems provide almost the same interaction, but the interfaces are different. The Communication Salon consists of an Internet-connected chat server, a power controller for the heater, Internet-connected client devices, a hanging scroll onto which the chat is projected and on which the remaining time before power shutdown is indicated, and a tea set (Figure 5). One of the big differences compared to the Communication Grill is the hanging scroll display of chat logs.
Fig. 5. Equipment for the Communication Salon: power-controlling device (Left), hanging scroll chat-log display (Center), and electric heater (Right)
The participants are only able to join and communicate with each other in the same location, because the chat logs are only displayed on a hanging scroll in the tearoom. The hanging scroll also shows a countdown timer for the remaining time before power shutdown. Therefore the hanging scroll is very important in this system. Every participant has to watch the hanging scroll throughout the conversation. In the typical tea ceremony, a hanging scroll is a symbol of the event, and each participant respects
the hanging scroll, which depicts the theme of the event. Thus the participants automatically see the symbol of the conversation through the use of the Communication Salon.

3.5 Chat Application

The chat application for the Communication Grill was installed on a client PC, to provide a normal chat system and a status meter indicating the remaining operating time for the grill (see Figure 4). This chat application allowed the system to be used over a network, and controlled the timer to operate the electric grill (a remark turned on the heater for 5 s). The bar meter, which used animation to indicate the remaining operating time, was placed on the right side of the chat application window. The chat application was supported by multicast IP and so did not need chat servers or an Internet connection. The Communication Salon requires only an Internet connection, a power control device (Figure 5, left) for the electric heater, a client device to display the chat log (Figure 5, center), and an electric heater (Figure 5, right).
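The Grill's serverless operation over multicast IP could look roughly like the sketch below, which joins a multicast group, sends each remark as a UDP datagram, and receives the remarks of the other clients on the same local network. The group address, port, and message format are assumptions; the paper does not describe its network protocol in this detail.

```python
# Sketch of a serverless multicast chat channel (assumed addresses and format).
import socket
import struct

GROUP, PORT = "239.1.2.3", 5007   # hypothetical multicast group and port

def open_chat_socket() -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # Join the multicast group so every client on the local network receives remarks.
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def send_remark(sock: socket.socket, nickname: str, text: str) -> None:
    sock.sendto(f"{nickname} says: {text}".encode("utf-8"), (GROUP, PORT))

def receive_remark(sock: socket.socket) -> str:
    data, _addr = sock.recvfrom(4096)
    return data.decode("utf-8", errors="replace")
```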
3.6 Demonstrations
The Communication Grill was demonstrated at several exhibitions at open events, including Ars Electronica 2003. We sampled over 16,000 remarks in English or Japanese from the participants’ conversation logs, which contained first meetings. In this research, we analyzed 6,808 remarks from a total of 16 sessions and 33 participants. The Communication Salon was exhibited at open events that included Yokohama EIZONE 2007 and SIGGRAPH 2007. Just as with the Communication Grill, we sampled over 5,000 remarks in English or Japanese from the participants’ conversation logs, which contained first meetings. In this research, we analyzed 2,200 remarks from a total of 31 sessions and 57 participants. Individuals were invited to participate at the venues of the exhibitions.
4 Results/Evaluation

4.1 Overview

The results of the chat log analysis showed that almost all of the participants tried to communicate with the others spontaneously. The following observations were also made. (1) There was not a big difference in remark frequency based on whether or not the participants were meeting for the first time. (2) Conversational topics gradually shifted from topics about the systems to more general topics. (3) It was easy to shift to a personal topic because of the virtuality of the chat, even though the participants were in the same space. (4) Some of the participants noticed and mentioned the effects of the system.

4.2 Differences between the Grill and Salon

The results with the Grill and Salon did not show big differences, except that the interval between remarks was longer for the Salon than for the Grill. This difference was
caused by the fact that the Salon system worked longer than the Grill per message (Grill: 5 s per message; Salon: 20 s per message).

4.3 Chat Log Analysis

As can be seen in Figure 6, there was no relationship between remark frequency and participation time. In the figure, the remark frequencies of two participants (who joined or left partway through the conversation) were not lower than those of the others. (All of the participants were meeting for the first time and all of the conversations were in English.) On the other hand, participant A (a non-native English speaker) spoke less than the other participants. This result showed two features of the system: it did not support language differences, and it did not stimulate meaningless remarks made just for operating the heater.
Fig. 6. Relations between remarks and participation time: there was no relationship between the remark frequencies and participation time (participant B joined in the middle of the conversation)
Figure 7 indicates that the participants gradually became friendly through the use of this system. This graph classifies the contents of the conversations into three categories: "the topic was operating the heater," "a topic that resulted from chatting (e.g. impressions about eating, chatting to maintain the heat)," and "other general topics." The graph indicates the percentages with a 20-remark resolution.
Fig. 7. The shifting process of conversation topics: the conversational topics gradually shifted from topics about the systems to more general topics
As shown in the figure, the conversations began by exchanging greetings and talking about the operation of the heater. After a while, the participants began to talk more about the topic of eating. Finally, the participants began to talk about personal topics.

4.4 Development of the Conversations

Figure 8 shows samples from the beginning and middle of a chat log. As can be seen in the figure, the conversations shifted toward personal topics. In addition, the participants came to talk more naturally. These results show the possibility that the system enables spontaneous interaction by providing a sense of achievement through the use of restrictions.

Excerpt 1 [a: Male (Austrian), b: Male (Japanese)]
Early stage:
7 b says: lets get it cook
8 b says: huhu
9 a says: huh
10 b says: try
11 a says: try
12 a says: ?
After a while:
143 a says: u wanna JP
144 a says: girlfriend?
145 b says: lost
146 b says: just a week ago
147 a says: uh hmm
148 b says: have a tea

Excerpt 2 [a: Male, b: Male, c: Female, d: Female (Japanese)]
Early stage:
3 a says: I'm very hungry
4 b says: you haveta chat!!
5 c says: hungry…
6 b says: more and more~~~~
7 a says: roast, roast
After a while:
1095 b says: tell us beginning of your love..!
1096 d says: sausages've gotten coooool
1097 b says: anyway,,
1098 b says: u tell us
1099 a says: I don't have a girl friend:-P

Excerpt 3 [a: Male (Canadian), b: Male (Japanese)]
Early stage:
1 a says: Hello
2 b says: hi!
3 a says: How are you doing?
4 b says: bit sleepy...
5 a says: Have you been here all day?
After a while:
68 a says: did you see takeo igarashi speak at all?
69 b says: i m also 1st time SIG
70 b says: origarashi?
71 a says: maybe. I may have butchered his name
72 a says: He made the teddy sketching system

Fig. 8. Shift of conversational topics: comparisons of the chat contents at an early stage and after a while. As shown in the excerpts, the conversational topics gradually shifted from topics about the systems to more general topics.
5 Discussion

5.1 The "Communication Grill/Salon" System

The goal of this text-chat system was to stimulate communication by mixing the virtual and the real worlds. Further development of this system is expected in terms of activating communication through situational elements such as urging, restriction, and reward. These situations provide environments where a user observes, and then considers, the others. Typical communication technologies pursue speed, efficiency, and accuracy. These are very important ways to promote communication. In this project, we proposed another way to promote communication that considers it from the viewpoint of culture, habits, and rewards. There are some studies that have surveyed the usability
of a communication system from these aspects [8][9]. These achievements should be applied to a greater extent in the HCI design process.

5.2 Future Work

These features can be applied to communication that requires casual and spontaneous remarks, such as counseling and brainstorming. They can also be applied to amusement-oriented communication, because the restrictions of the system promote the discernment of the other participants and the shared environment. This is a good example of using a restriction to trigger the same potential that is displayed by a blind person, who makes good use of aural information and memory to recognize environments [10]. This kind of example shows the possibility of "Real-World User Interfaces" that effectively use environmental information. In addition, this system requires quantitative analysis, by comparing it with a typical online chat in real space or by surveying related projects [11].
6 Conclusion

The proposed Communication Grill/Salon is a system that promotes spontaneous remarks by applying the virtuality of online text-chat to face-to-face communication. Observations of the chat logs showed that participants began with casual remarks about using the system. (1) The imbalance of remarks was improved by using this system in conversations, even under circumstances where the participants would normally hesitate to speak, such as first meetings or joining an ongoing conversation. (2) The system provided the participants with a motivation for conversation by requiring text-chat in order to eat. (3) In addition, as a result of the chatting, the groups became more communicative throughout the meal. In the future, this research should pursue other new interface designs that promote spontaneous communication by applying the culture or customs of daily life.
Acknowledgments

We thank all of the participants in this project, and all of the publication's support personnel and staff, who provided helpful comments on this paper. Some of the references cited in this paper are included for illustrative purposes only.
References

1. Kraut, R., Patterson, M., Lundmark, V., Kiesler, S., Mukophadhyay, T., Scherlis, W.: Internet Paradox: A Social Technology that Reduces Social Involvement and Psychological Well-being? American Psychologist 53(9), 1017–1031 (1998)
2. Bordia, P.: Face-to-Face Versus Computer-Mediated Communication: A Synthesis of the Experimental Literature. Journal of Business Communication 34(1), 99–118 (1997)
3. Kraut, R., Kiesler, S., et al.: Internet Paradox Revisited. Journal of Social Issues 58(1), 49–74 (Spring 2002)
4. Okakura, K.: The Book of Tea. G.P. Putnam (1906)
5. Hall, E.T.: The Hidden Dimension. Doubleday, New York (1966)
6. Turkle, S.: Life on the Screen: Identity in the Age of the Internet. Simon & Schuster Trade (1995)
7. Mehrabian, A.: Silent Messages. Wadsworth Publishing Company, Inc., Belmont (1971)
8. Mainwaring, S., March, W., Maurer, B.: From meiwaku to tokushita!: lessons for digital money design from Japan. In: Proc. CHI 2008, pp. 49–74 (2008)
9. Wyche, S.P., Aoki, P.M., Grinter, R.E.: Re-placing faith: reconsidering the secular-religious use divide in the United States and Kenya. In: Proc. CHI 2008, pp. 11–20 (2008)
10. Ito, K., Miyamoto, E., Tanahashi, K.: Ample information picked up by blind travelers at the train station. In: International Conference on Environment Behavior Studies for the 21st Century, pp. 323–328 (1997)
11. Rekimoto, J., Ayatsuka, Y., Uoi, H., Arai, T.: Adding another communication channel to reality: an experience with a chat-augmented conference. In: CHI 1998 Summary (1998)
Motion Capture System Using an Optical Resolver

Takuji Tokiwa1,4, Masashi Yoshidzumi2, Hideaki Nii3, Maki Sugimoto4, and Masahiko Inami5

1 Department of Mechanical Engineering and Intelligent Systems, School of Engineering, Tokyo University, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
[email protected]
2 University of Electro-Communications, 1-5-1, Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan
[email protected]
3 Graduate School of Information Science and Technology, Tokyo University, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
[email protected]
4 Media Design Institute, Graduate School of Media Design, Keio University, 4-1-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8526, Japan
[email protected]
5 Graduate School of Media Design, Keio University, 4-1-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8526, Japan
[email protected]
Abstract. In this paper, we present a novel position measurement method that makes use of a pair of plane light sources created from an IR-LED matrix array and a photo-detector. The light sources emit light with the same frequency but different phases, while the optical axes of the sources are set up orthogonally. A signal field is then spread through the space, with a different phase at each position. Finally, the signal received by the photo-detector is analyzed to determine the position.

Keywords: Motion Capture, Position Detection.
1 Introduction

With ubiquitous computing, the input and output of information for users is performed through various interfaces. Interfaces that are optimized for the location and circumstances of users are necessary, as well as a standardized interface. An optimized interface facilitates the input and output of information, disperses it effectively and offers the user the opportunity for new knowledge. It is important in creating such an interface to ensure that the design retains basic functionality, that is, a combination of the accepted requirements and an environment that makes this possible, so that information appliances can be attached if necessary.

Besides, an information environment not only responds to requests received from a user through an information terminal device, but can also detect via various sensors
the position and actions of the user if necessary. The information given to a user is presented by means of information terminal devices, and also by a combination of information appliances in the information environment. The presenting method depends on the location, circumstances and requests from the user. To realize such an environment, it is important to have technology that can easily detect both a user and the position of an information appliance in the information environment.

Consequently, techniques have been proposed and developed to detect a user and position, such as the use of RFID tags [1,2], marker detection using image analysis technology [3-5], distance measurement by supersonic waves, and so on, together with resolvers and "Polhemus" [6]. These techniques, however, have several weaknesses. For example, measurement precision depends on measurement time, while the system architecture is complicated, as detection requires high-performance computers for the analysis of images and data. Therefore, a technique has been suggested to measure information projected in the real world [7,8]. The measurement system in this case is simplified and easy to use.

Recently we proposed a technique that uses two plane light sources which flash on and off continually with a fixed phase difference and the same frequency. This creates a signal field in which the phase difference changes along a particular direction in the space, which makes the detection of a position possible when the signal is measured by a photo-detector. Using this technique, device construction is simplified. Moreover, detection does not depend on the frame rate, as is the case in movie capture using a video camera, and position detection is continuous at high speed. The implemented system has, however, been developed only as a proof of concept. The device construction is complicated and cannot really be used in a ubiquitous computing environment. Consequently, in this paper, we examine an alternative method to detect the phase difference on a computer in order to simplify device construction.

The paper is organized as follows. The basic concepts of the proposed technique are explained in the next section. An implementation as proof of concept and the results are reported in Section 3. An application of the proposed technique is described in Section 4, while potential future problems are considered in Section 5.
2 Optical Resolver

When light from two plane light sources that flicker with the same frequency but with a constant phase difference is composed in space, a signal field develops in which the phase difference changes in a specific direction. This technique relies on the following three principles.

1. The light emitted by a plane light source irradiates a finite space and has directionality. If the distance is fixed, and if the photo-detector always faces the center of the plane light source, the luminance received by the photo-detector changes according to the angle between the detector and the plane light source.
2. The synthetic light is measured at the position where the light output from multiple light sources overlaps.
3. When two signals with identical frequency but different phases are synthesized, a signal is obtained in which only the phase deviates, according to the ratio of the strengths of the two signals.

The uniaxial angle detection system developed from these principles is explained in the next section.

2.1 Uniaxial Angle Detection System

The concept of the uniaxial angle detection system is illustrated in Fig. 1. Plane light sources S1 and S2 flicker at Asin(ωt) and Bcos(ωt), where A and B denote amplitude and ω is the modulated angular frequency. The luminance, however, cannot become negative, so a positive offset is added to the sinusoid; henceforth, only the change in this alternating (AC) component is considered. Similarly, the luminance of the other light source is also treated as its alternating component.
Fig. 1. Uniaxial Angle Detector
The following are determined as depicted in Fig. 1: the origin O, X-axis, Y-axis, Z-axis, and rotation angle θ. Luminance Ls, which is measured by a photo-detector placed in the region X, Y > 0 (Z is constant), is defined by Eq. 1:

Ls = Acos(θ)sin(ωt) + Bsin(θ)cos(ωt) = √(A²cos²(θ) + B²sin²(θ)) · sin(ωt + φ)   (1)
Phase difference φ is defined as φ = tan⁻¹(Bsin(θ) / Acos(θ)). This becomes φ = θ if A = B. Angle θ, measured between the X-axis and the direction of the photo-detector, is equal to the phase difference between the composite signal detected by the light-receiving element and the drive signal of the light source. The wavefront develops radially about the Z-axis in the first quadrant of the XY plane. The phase difference φ is measurable by phase detection, and the sampling rate does not need to depend on the modulation frequency of the light source. In addition, in practice, a light-receiving element exhibits directivity. However, the problem of directivity is removed by a diffusion board installed at the front of the photo-detector, or by spherical solar batteries without directivity. As a result, the directionality of the photo-detector is not considered in Eq. 1.

2.2 Development of the Uniaxial Angle Detector

A block diagram of the system produced is shown in Fig. 2. A light source made of chip-type infrared LEDs with wide directivity arranged in an array is substituted for each plane light source. A "CL-190IRS-X" infrared LED manufactured by CITIZEN ELECTRONICS CO., LTD. was used as the LED. The plane light source mounted with the LED array is shown in Fig. 3.

According to the basic principles given in Section 2, the light source should be driven by a sine wave. However, it is not easy to make the luminance of an LED vary accurately as a sinusoid, and doing so complicates the composition of the equipment. Thus the LEDs were actually made to flicker in accordance with a 1[kHz] rectangular wave, and the higher harmonics of the rectangular wave were cut off by the filter in the phase difference detection circuit.

A "TPS615" phototransistor manufactured by the Toshiba Semiconductor Company was used as the photo-detector. This component has directionality, so in the experiment the photo-detector was installed in such a way that it always faces the direction of the intersection of the light sources.
Fig. 2. System block diagram
Fig. 3. Plane light source
phase detector from NF Corporation was used as a noise filter and to cut off the higher harmonics in the phase-difference detection circuit [9]. The "CD-505R2" consists of an input differential amplifier, two post amplifiers, a bandpass filter, a phase shifter, a phase detector, and a lowpass filter. The characteristics of each circuit stage can be set by externally attached resistors and capacitors. The phase detector uses a rectangular wave as the reference signal, takes the inner product with the signal passed through the bandpass filter, and finally outputs a direct current through the lowpass filter.
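For illustration, this inner-product phase detection can be sketched in software. The fragment below is a minimal digital analogue of such a lock-in stage, not the CD-505R2's actual implementation: the measured signal is multiplied by in-phase and quadrature square-wave references and averaged (a crude lowpass), and the phase is recovered from the two products.

```python
import numpy as np

def detect_phase(measured, ref_freq, fs):
    """Lock-in style phase detection against square-wave references."""
    t = np.arange(len(measured)) / fs
    ref_i = np.sign(np.sin(2 * np.pi * ref_freq * t))  # in-phase reference
    ref_q = np.sign(np.cos(2 * np.pi * ref_freq * t))  # quadrature reference
    i_prod = np.mean(measured * ref_i)                 # inner product + averaging
    q_prod = np.mean(measured * ref_q)                 # acts as the lowpass filter
    return np.arctan2(q_prod, i_prod)

# Example: a 1 kHz component shifted by 30 degrees, sampled at 96 kHz
fs, f0 = 96_000, 1_000
t = np.arange(0, 0.1, 1 / fs)
measured = np.sin(2 * np.pi * f0 * t + np.deg2rad(30))
print(np.rad2deg(detect_phase(measured, f0, fs)))      # approximately 30
```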
3 Improvements to the System

The system described in Section 2 was developed to verify the concepts of the proposal, and it uses a specialized, custom-made analog circuit module as well as an AD interface board to allow the measurement results to be analyzed by a computer. These components are expensive, and the equipment itself is bulky, making it difficult to embed the system in other equipment. A ubiquitous computing environment comprises various kinds of information-processing equipment. For the proposed technique to be used in such an environment, it needs to be incorporated into this equipment, and it must therefore be low-cost, small, and simple to use. On the other hand, computer performance has improved remarkably in recent years, making it possible to carry out digital signal processing without specialized circuitry. In the field of media arts, it has been proposed that the computer's audio input port, or an audio interface intended for music production, be used instead of an AD interface board [10]. Audio interfaces provide a large number of input ports at low cost and enable system development in software environments for real-time audio and multimedia signal processing, such as "MAX", "PD (Pure Data)", "jMax", and so on [11-15].
Since the precision of processing is not guaranteed in these software environments, verification is necessary. Consequently, we examined in simulation the accuracy of a phase-difference detection program created in "MAX". The results of the simulation are shown in Fig. 4. In the simulation, a square wave (5[Hz]) was used as the driving signal of the plane light sources. The angle was varied between zero and 90 degrees in steps of 10 degrees. The signal driving S1 was used as the reference signal. A bandpass filter extracts the fundamental-frequency components from the reference signal and from the signal measured by the photo-detector. The extracted signals are then segmented by means of a flip-flop method, and the difference between the starting times of the two signals is measured; this difference denotes the phase difference.
Fig. 4. Simulation results (signal field generated with a 5[Hz] square wave and measured at a sampling rate of 96k[Hz])
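Outside "MAX", the same procedure can be sketched as follows. This hypothetical Python fragment generates the composite field signal for a given angle, extracts the 5 Hz fundamental of the square-wave drive with a bandpass filter, and converts the delay between rising zero crossings of the composite and reference signals into a phase difference; the sampling rate, filter design, and variable names are illustrative assumptions, not those of the original MAX patch.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs, f_drive = 1_000, 5.0                        # illustrative rates (the paper used 96 kHz)
t = np.arange(0, 2.0, 1 / fs)

def square(phase):
    return np.sign(np.sin(2 * np.pi * f_drive * t + phase))

theta = np.deg2rad(40)                          # angle to be recovered
s1, s2 = square(0.0), square(np.pi / 2)         # S1 ~ sin drive, S2 ~ cos drive
composite = np.cos(theta) * s1 + np.sin(theta) * s2

# Bandpass around the drive frequency to keep only the fundamental
sos = butter(4, [2.5, 10.0], btype="band", fs=fs, output="sos")
ref = sosfiltfilt(sos, s1, padlen=900)
sig = sosfiltfilt(sos, composite, padlen=900)

def rising_edges(x):                            # flip-flop style edge times [s]
    return np.where((x[:-1] < 0) & (x[1:] >= 0))[0] / fs

ref_edges, sig_edges = rising_edges(ref), rising_edges(sig)
# The composite leads the reference; the time from each composite edge to the
# next reference edge gives the phase lead.
leads = [ref_edges[np.searchsorted(ref_edges, e)] - e
         for e in sig_edges[2:-2] if np.searchsorted(ref_edges, e) < len(ref_edges)]
print(np.rad2deg(2 * np.pi * f_drive * np.mean(leads)))   # roughly 40
```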
4 Application of the Proposed Technique

4.1 Two-Dimensional Position Detection System

Measurement in two dimensions becomes possible when a second detection system is installed orthogonally to the direction of the signal field formed by the two plane light sources. Multiple signal fields can coexist if a different modulation frequency is used for each field, and each field can then be separated by orthogonal detection and a bandpass filter. All angles and positions can be measured if the phase difference is detected in every signal field.
4.2 Ubiquitous Computing Environment

The proposed technique can be used in an indoor ubiquitous computing environment. Plane light sources are embedded in lighting equipment, while receivers are embedded in the information devices that make up the environment. Information devices would then be able to determine their positions autonomously whenever they are within a signal field irradiated from the light sources. If the technique is combined with an information terminal for ubiquitous computing that lets a user feel the direction of an information source, such as CoBIT [16], not only the user but also the terminal itself can grasp the direction of that source.
5 Conclusion

Previous implementations of the proposed position-sensing technique depended on a specialized, custom-made analog circuit module and an AD interface board, which made it difficult to incorporate the system into the information devices used in a ubiquitous computing environment. To address this, an experimental phase-difference detection program was developed in a real-time audio/multimedia programming environment, using an audio interface in place of the AD board, and was validated in simulation.
References 1. Tokiwa, T., Tokuhisa, S., Honna, Y., Shinozaki, T., Kusunoki, F., Nishimura, T., Iwatake, T.: Surround CoBIT: A method for presenting auditory information as a virtual acoustic field. In: International Workshop on Smart Appliances and Wearable Computing 2004 (2004) 2. Kusunoki, F., Isyama, A., Tokiwa, K., Nishimura, T.: The sensing board enhanced by interactive sound system for collaborative work. In: Proceedings of the 2004 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology (2004) 3. Kato, H.: Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In: Proc. 2nd IEEE/ACM Int. Workshop on Augmented Reality (IWAR 1999) (1999) 4. Rekimoto, J., Ayatsuka, Y.: CyberCode: Designing Augmented Reality Environments with Visual Tags. In: Designing Augmented Reality Environments (DARE 2000) (2000) 5. Kaltenbrunner, M., Bencina, R.: reacTIVision: A Computer-Vision Framework for Table-Based Tangible Interaction. In: Proceedings of the First International Conference on Tangible and Embedded Interaction (TEI 2007) (2007) 6. Polhemus: Motion Tracking, 3D Scanning, and Eye Tracking Solutions from Polhemus, http://www.polhemus.com/ 7. Kagotani, G., Kojima, M., Sugimoto, M., Nii, H., Inami, M.: PTS: Projector-based Tracking System, International VR. In: Media Art and Technology 2004 Proceedings, p. 27 (2004)
8. Raskar, R.: RFIG lamps: Interacting with a self-describing world via photosensing wireless tags and projectors. ACM Transactions on Graphics (TOG) SIGGRAPH 23, 406–415 (2004) 9. NF Corporation, CD-505R2, http://www.nfcorp.co.jp/english/pro/fm/pha/index.html 10. Jo, K.: Audio Interface as a Device for Physical Computing. In: Audio Mostly 2008 - a conference on Interaction with Sound, Piteå, Sweden, October 22-23, pp. 123–127 (2008) 11. Cycling '74: Max/MSP, http://www.cycling74.com/products/max5 12. Puckette, M., Apel, T.: Real-time audio analysis tools for Pd and MSP. In: Proceedings of the International Computer Music Conference, pp. 109–112. International Computer Music Association, San Francisco (1998) 13. Déchelle, F., Borghesi, R., De Cecco, M., Maggi, E., Rovan, B., Schnell, N.: jMax: A New JAVA-Based Editing and Control System for Real-Time Musical Applications. In: Proceedings of the 1998 International Computer Music Conference. International Computer Music Association, San Francisco (1998) 14. Geiger, G.: PDa: Real Time Signal Processing and Sound Generation on Handheld Devices. In: Proceedings of the International Computer Music Conference, Singapore (2003) 15. Breidenbruecker, M., Geiger, G., Brossier, P., Hazan, A., Barknecht, F., McCormick, C., Nordgren, A.: RjDj, http://www.rjdj.me 16. Nishimura, T., Itoh, H., Nakamura, Y., Yamamoto, Y., Nakashima, H.: A Compact Battery-Less Information Terminal for Real World Interaction. In: Ferscha, A., Mattern, F. (eds.) PERVASIVE 2004. LNCS, vol. 3001, pp. 124–139. Springer, Heidelberg (2004)
The Effects of an Anti-glare Sleeve Installed on Fluorescent Tube Lamps on Glare and Reading Comfort Shiaw-Tsyr Uang1, Cheng-Li Liu2, and Mali Chang1 1 Department of Industrial Engineering and Management, Minghsin University of Science and Technology, Hsinchu 304, Taiwan 2 Department of Industrial Management, Vanung University, Taoyuan 302, Taiwan [email protected], [email protected], [email protected]
Abstract. Our previous study demonstrated the benefits of a reflective sleeve for redirecting light and enhancing the luminous intensity of fluorescent tube lamps at certain light-projecting angles. A reflective sleeve is composed of a plastic reflector and a transparent refractor. However, the intense, centralized lighting may increase the possibility of producing glare. In this study, the transparent refractor of the sleeve is replaced with a diffuser to form an anti-glare sleeve. This study adopts measurement, optical software simulation, and experimental methods to investigate the effects of an anti-glare sleeve on redirecting light and reducing glare. The results demonstrate that the luminous intensity of a fluorescent tube lamp towards viewed objects is enhanced after adopting an anti-glare sleeve. In addition, software simulation indicates that an anti-glare sleeve increases light uniformity and reduces glare. The subjective evaluation also shows that fluorescent tube lamps with anti-glare sleeves produce less light reflection on various papers and allow more comfortable reading. Keywords: Glare, Reading comfort, Fluorescent tube lamp, Lamp sleeve.
1 Introduction

According to previous studies, up to 20% of the world's electrical energy consumption is used for lighting [1]. Compared with incandescent lamps, fluorescent lamps have a larger light-emitting area, more even light distribution, a lower tube-surface temperature, a color closer to sunlight, and a longer lifetime. Hence, fluorescent lamps are nowadays the most commonly and widely used artificial light sources in indoor spaces [2-4]. Our previous study demonstrated the benefits of installing a reflective sleeve on a fluorescent tube lamp to redirect light and to enhance luminous intensity by around 80% at certain light-projecting angles [5]. A reflective sleeve is composed of a plastic reflector and a transparent refractor that control light distribution and density. However, the intense, centralized lighting may increase the possibility of producing glare, which can cause eye discomfort and/or performance decrements.
Regarding glare reduction, Japuntich [6] found that the use of a linearly polarized light source helps to minimize specular glare by darkening the reflected image of the light source on the document. Theoretical predictions and light-measurement analyses of specular glare reduction were compared with empirical results from testing on a panel of humans with semi-gloss and matte finish papers. This study showed that, with the right alteration of the polarized light source position, specular glare may be significantly reduced, and that correlations exist between the theory, the empirical measurements, and the human response to specular glare reduction. Osterhaus [7] reviewed and discussed the advantages and limitations of existing glare indices for daylighting conditions, concluding that available assessment and prediction methods are of limited practical use in daylit situations and currently make no provision for integrated systems that combine daylighting and electric lighting. That paper also presented selected findings from a case study of daylit office environments which identify a number of important design considerations. Iwata et al. [8] measured subjective responses to intense light, or glare, caused by a wide source. Three glare indices were investigated: the Building Research Station glare index, the CIE glare index, and the Cornell daylight glare index. They also examined the glare vote and proposed a new glare evaluation scale, as well as asking the subjects to vote on each condition's acceptability. The Cornell formula most accurately predicts glare discomfort, but it was found to be inadequate for a range of wide-source glare conditions. Both the discomfort sensation and the glare ratings they proposed correlate well with the percentage of subjects dissatisfied when looking directly at the light source. Kim and Koga [9] proposed a practical method of determining the background luminance in the evaluation of discomfort glare. Two experiments were conducted, a visual sensitivity test and a glare sensitivity test. The results show that the threshold luminance and the luminance of discomfort glare are mainly determined by the luminance of the immediate background of the source, rather than by the average background luminance. Velds [10] tried to establish a relation between glare assessments and measured quantities. For this purpose two test rooms were used: one room was occupied by the subject, and the required measuring equipment was placed in the other. An electronic questionnaire was developed for these studies and installed on the subject's computer. Continuous measurements were necessary to link subjective assessments to quantities measured at the same time. The study showed that the vertical illuminance measured near the facade and the average sky luminance measured from the back of the room are good measures for monitoring visual comfort under intermediate and overcast sky conditions. In this study, the transparent refractor of the sleeve is replaced with a diffuser to compose a newly designed lamp sleeve, which we call an anti-glare sleeve (see Fig. 1). The reflector of this sleeve redirects light towards viewed objects, while the diffuser diffuses the light and thus reduces glare. Hence, the purpose of the present research is to investigate the effects of an anti-glare sleeve on redirecting light and reducing glare.
Fig. 1. A fluorescent tube lamp with an anti-glare sleeve
2 Methods

This study adopts measurement, optical software simulation, and experimental methods to investigate the effects of an anti-glare sleeve on redirecting light and reducing glare. These three methods are described in this section. First, the luminous intensity distribution curves of 10, 20, and 40 watt T8 fluorescent tube lamps with no sleeve, a reflective sleeve, and an anti-glare sleeve were measured and recorded at different light-projecting angles (0°, 45°, 90°) by a goniophotometer system. This measurement yields the maximum and average luminous intensity (unit: cd), as well as the shape of the light distribution. Fig. 2 shows the apparatus used in this study and its configuration.
Fig. 2. The measurement apparatus and their configuration
Next, the collected luminous intensity data were imported into the optical software Lumen Micro 2000. This software was then used to build a simulated lighting model of a classroom and to calculate illumination levels and glare indices in order to evaluate the effects on enhancing luminous intensity and reducing glare. Simulation has the advantage of estimating lighting conditions without installing real lamps and lighting fixtures. In addition, Lumen Micro 2000 is capable of computing a glare index (visual comfort probability, VCP) for the users. Finally, 30 subjects participated in a laboratory experiment. All 30 subjects had no difficulty in discriminating colors and had normal or corrected-to-normal visual acuity. Subjects evaluated reading effects and their visual comfort while viewing words and graphs on matt finish papers, copy papers, and dowling papers under different kinds of fluorescent tube lamps (no sleeve, a reflective sleeve, and an anti-glare sleeve). Subjects experienced all nine (3*3) experimental combinations in random order. Their evaluation was based on a 7-point Likert-type scale, in which "1" denotes "strongly disagree" and "7" denotes "strongly agree". Fig. 3 shows our experimental configuration.
Fig. 3. The experimental configuration of this study
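As a rough illustration of how the 7-point ratings from this 3*3 within-subject design can be analyzed (the two-way repeated-measures ANOVAs reported in Section 3.3), the sketch below uses statsmodels. The synthetic data, column names, and long-format layout are assumptions for illustration only, not the authors' analysis scripts.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one 7-point Likert rating per subject for
# each Sleeve x Paper combination (30 subjects x 9 conditions). Real ratings
# would come from the questionnaires; random integers stand in here.
rng = np.random.default_rng(0)
rows = [
    {"subject": s, "sleeve": sl, "paper": p, "rating": int(rng.integers(1, 8))}
    for s in range(1, 31)
    for sl in ["none", "reflective", "anti-glare"]
    for p in ["matt", "copy", "dowling"]
]
ratings = pd.DataFrame(rows)

# Two-way within-subject ANOVA: main effects of Sleeve and Paper plus interaction
result = AnovaRM(ratings, depvar="rating", subject="subject",
                 within=["sleeve", "paper"]).fit()
print(result)
```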
3 Results

This study adopts measurement, optical software simulation, and experimental methods to investigate the effects of an anti-glare sleeve on redirecting light and reducing glare. This section presents the findings from these three research methods in turn.
3.1 Results of Measurement

A goniophotometer was used to collect the luminous intensity data of T8 fluorescent tube lamps of different wattages (10W, 20W, 40W) with different sleeve conditions (no sleeve, a reflective sleeve, an anti-glare sleeve) from −90° to 90° at different light-projecting angles (0°, 45°, 90°). Twenty-seven paired t tests were conducted to compare the differences in luminous intensity, and 25 of the 27 tests were statistically significant. This implies that the adoption of a sleeve produces a significant difference in luminous intensity. To explore this finding further, the luminous intensity curves from −90° to 90° were drawn and used to compare the shapes of the light distributions. These curves show similar and consistent findings; therefore, only the diagrams of a 10W fluorescent tube lamp at 90° with the different sleeve conditions are provided in this paper (shown in Fig. 4~6). The light of a bare fluorescent tube lamp is dispersed widely from −90° to 90° (Fig. 4), whereas a reflective sleeve (Fig. 5) or an anti-glare sleeve (Fig. 6) causes the light distribution to be centralized toward the illuminated surface.
Fig. 4. Luminous intensity curve of a 10W fluorescent tube lamp in 90°
Fig. 5. Luminous intensity curve of a 10W fluorescent tube lamp with a reflective sleeve in 90°
Fig. 6. Luminous intensity curve of a 10W fluorescent tube lamp with an anti-glare sleeve in 90°
The maximum and average luminous intensities (unit: cd) at light-projecting angles of 0°, 45°, and 90° are summarized in Tables 1~3. Compared to the no-sleeve condition, a reflective sleeve or an anti-glare sleeve increases both the maximum and the average luminous intensity, and the increments for these two sleeves follow similar trends.

Table 1. The maximum and average luminous intensity in 0°

                 No sleeve               A reflective sleeve      An anti-glare sleeve
                 10W     20W     40W     10W     20W     40W      10W     20W     40W
  Maximum cd     55.4    111.8   179.7   90      211.5   373.4    90.1    219.6   414.9
  Average cd     45.6    91.7    147.8   74.2    174.3   307.9    73.7    180.5   343
Table 2. The maximum and average luminous intensity in 45°

                 No sleeve               A reflective sleeve      An anti-glare sleeve
                 10W     20W     40W     10W     20W     40W      10W     20W     40W
  Maximum cd     55.9    105.9   177.1   95.4    204     317.8    90.6    213.6   412.6
  Average cd     46.0    87.9    144.1   76.1    166.2   258.7    73.8    173.3   338.3
Table 3. The maximum and average luminous intensity in 90°

                 No sleeve               A reflective sleeve      An anti-glare sleeve
                 10W     20W     40W     10W     20W     40W      10W     20W     40W
  Maximum cd     57.2    121.3   206.7   94.4    208.5   347.1    89.5    215.5   395.7
  Average cd     52.9    111.8   171.1   74      159.6   272.4    73      174.7   322.8
3.2 Results of Simulation

The simulated space in this research is a classroom of a junior high school in Taiwan. Fig. 7 is a photo of this classroom, and Fig. 8 shows the same classroom built in Lumen Micro 2000.
Fig. 7. The photo of the classroom being simulated
Fig. 8. The simulated classroom by Lumen Micro 2000
The average horizontal illuminance measured in this classroom is 251.27 lux, which is close to the simulated illuminance of 255.7 lux. The average horizontal illuminance when using reflective sleeves is estimated by Lumen Micro 2000 to be 333.9 lux, and 377.2 lux when using anti-glare sleeves. Visual comfort probability (VCP) represents the degree of visual comfort of the occupants: the larger the VCP value, the smaller the glare, while a smaller max/min VCP ratio indicates a more uniform distribution of lighting in the space. The VCP values estimated by Lumen Micro 2000 are summarized in Table 4. An anti-glare sleeve has the largest average VCP (54.7) and the smallest max/min VCP (3.6). In other words, an anti-glare sleeve appears to be beneficial in reducing glare.

Table 4. A summary table of the visual comfort probability (VCP) values (unit: %)

  Sleeve                 Average VCP    Maximum VCP    Minimum VCP    Max/Min VCP
  No sleeve              50.5           99             16.4           6.1
  A reflective sleeve    50.1           98             20.1           4.9
  An anti-glare sleeve   54.7           97.9           27.1           3.6
3.3 Results of Experiment The experiment contained two factors: three kinds of sleeve conditions (Sleeve), no sleeve, a reflective sleeve and an anti-glare sleeve; and three types of paper materials
(Paper), matt finish papers, copy papers, and dowling papers. Each participant (n=30) experienced all nine combinations of Sleeve and Paper in a random sequence. 3*3 within-subject two-way ANOVAs were conducted to examine the main effects and interactions of Sleeve and Paper on reading effects and visual comfort. Reading effects. There are four subjective ratings related to reading effects, as described in the following. An ANOVA on "I can clearly read the words on the paper." revealed a significant interaction between Sleeve and Paper, F(4,116)=5.619, p<0.01. As shown in Fig. 9, the interaction indicates that for matt finish papers, an anti-glare sleeve significantly increased word readability (Mean=6.10) beyond the other sleeve conditions. However, the same effect was not found for copy or dowling papers.
Fig. 9. The interaction between Sleeve and Paper on “I can clearly read the words on the paper.”
An ANOVA on "I can easily identify the words on the paper." revealed two significant main effects: Sleeve, F(2,58)=10.696, p<0.01, and Paper, F(2,58)=6.990, p<0.01. Further analysis on Sleeve indicated that an anti-glare sleeve gives better word identification (Mean=5.72) than no sleeve (Mean=5.15) or a reflective sleeve (Mean=5.26). Further analysis on Paper indicated that matt finish papers give better word identification (Mean=5.60) than copy papers (Mean=5.33) or dowling papers (Mean=5.20). An ANOVA on "I feel the graph on the paper is clear." revealed one significant main effect: Sleeve, F(2,58)=7.178, p<0.01. Further analysis on Sleeve indicated that an anti-glare sleeve gives better graph clarity (Mean=5.42) than no sleeve (Mean=4.92) or a reflective sleeve (Mean=4.90). An ANOVA on "I feel the color of the graph is vivid." revealed two significant main effects: Sleeve, F(2,58)=21.648, p<0.01, and Paper, F(2,58)=5.962, p<0.01. Further analysis on Sleeve indicated that an anti-glare sleeve gives the best color vividness (Mean=5.24), followed by a reflective sleeve (Mean=4.67), with no sleeve the worst (Mean=4.39).
Further analysis on Paper indicated that matt finish papers give better color vividness (Mean=5.11) than copy papers (Mean=4.58) or dowling papers (Mean=4.61). Visual comfort. There are four subjective ratings related to visual comfort, as described in the following. An ANOVA on "I feel eye fatigue." revealed two significant main effects: Sleeve, F(2,58)=42.095, p<0.01, and Paper, F(2,58)=5.288, p<0.01. Further analysis on Sleeve indicated that no sleeve produces the highest eye fatigue (Mean=4.61), followed by a reflective sleeve (Mean=4.07), with an anti-glare sleeve producing the least eye fatigue (Mean=2.95). Further analysis on Paper indicated that matt finish papers produce higher eye fatigue (Mean=4.18) than copy papers (Mean=3.74) or dowling papers (Mean=3.71). An ANOVA on "I am aware of the light reflection on the paper." revealed a significant interaction between Sleeve and Paper, F(4,116)=5.751, p<0.01, and two main effects: Sleeve, F(2,58)=75.130, p<0.01, and Paper, F(2,58)=60.318, p<0.01. As shown in Fig. 10, the interaction indicates that an anti-glare sleeve significantly reduced light reflection regardless of paper material, and that viewing matt finish papers under a reflective sleeve or no sleeve may increase the perceived light reflection.
Fig. 10. The interaction between Sleeve and Paper on "I am aware of the light reflection on the paper."
An ANOVA on "I feel this light source harsh to my eyes." revealed one significant main effect: Sleeve, F(2,58)=67.199, p<0.01. Further analysis on Sleeve indicated that no sleeve is perceived as the harshest (Mean=5.20), followed by a reflective sleeve (Mean=4.04), with an anti-glare sleeve the least harsh (Mean=2.65). An ANOVA on "I feel this lighting condition comfortable." revealed one significant main effect: Sleeve, F(2,58)=75.025, p<0.01. Further analysis on Sleeve indicated that no sleeve is the least comfortable (Mean=2.90), followed by a reflective sleeve (Mean=3.81), with an anti-glare sleeve the most comfortable (Mean=5.38).
4 Conclusions

This research demonstrates a new way to control the light distribution and glare of fluorescent tube lamps using lamp sleeves instead of lighting fixtures. The anti-glare sleeve is made of plastic (PET) and is therefore remarkably cheaper than lighting fixtures. In addition, an anti-glare sleeve is easy to install on the lamp. Most importantly, the findings of the present study verify the benefits of an anti-glare sleeve in increasing luminous intensity towards viewed objects, reducing glare, and enabling more comfortable reading.
Acknowledgments This research is financially supported by the National Science Council of Taiwan under contract number NSC97-2221-E-159-012.
References 1. Giris, T.E.: Some Suggestions for Photovoltaic Power Generation Using Artificial Light Illumination. Solar Energy Materials & Solar Cells 90, 2569–2571 (2006) 2. Rea, M.S.: IESNA Lighting Handbook: Reference and Application. Illuminating Engineering Society of North America (1994) 3. Dizik, A.A.: Concise Encyclopedia of Interior Design, 2nd edn. Van Nostrand Reinhold, New York (1998) 4. Gluskin, E., Topalis, F.V., Kateri, I., Bisketzis, N.: The Instantaneous Light-intensity Function of a Fluorescent Lamp. Physics Letters A 353, 355–363 (2006) 5. Uang, S.-T., Liu, C.-C.: An Investigation of Adopting a Sleeve to Redistribute Lighting of a Fluorescent Tube Lamp. Journal of Illuminating Engineering (2009) (in Chinese) 6. Japuntich, D.A.: Polarized Task Lighting to Reduce Reflective Glare in Open-plan Office Cubicles. Applied Ergonomics 32, 485–499 (2001) 7. Osterhaus, W.K.E.: Discomfort Glare Assessment and Prevention for Daylight Applications in Office Environments. Solar Energy 79, 140–158 (2005) 8. Iwata, T., Kimura, K.-I., Shukuya, M., Takano, K.: Discomfort Caused by Wide-Source Glare. Energy and Buildings 15(3-4), 391–398 (1990-1991) 9. Kim, W., Koga, Y.: Effect of Local Background Luminance on Discomfort Glare. Building and Environment 39, 1435–1442 (2004) 10. Velds, M.: User Acceptance Studies to Evaluate Discomfort Glare in Daylit Rooms. Solar Energy 73, 95–103 (2002)
Electromyography Focused on Passiveness and Activeness in Embodied Interaction: Toward a Novel Interface for Co-creating Expressive Body Movement Takabumi Watanabe1, Norikazu Matsushima1, Ryutaro Seto1, Hiroko Nishi2, and Yoshiyuki Miwa1 1
Waseda University, 3-4-1, Ohkubo, Shinjuku-ku, Tokyo, Japan [email protected], [email protected], [email protected], [email protected] 2 Toyo Eiwa University, 32, Miho-cho, Midori-ku, Yokohama, Kanagawa, Japan [email protected]
Abstract. In expressive body movement created by one person and a partner, a sense of nonseparation, as if one's own body and the partner's body are united, can be experienced. For such a relationship to develop, a process of physically feeling passiveness and activeness is important. The objective of this study is to capture passiveness and activeness in bodily interaction. We focused on myoelectric (ME) potential, whose generation timing and amplitude differ between voluntary and reactive movements. A measurement system using ME potential in bodily interaction was developed, and the technique was validated with our data. Keywords: embodied interaction, expressive body movement, passiveness and activeness, surface EMG.
1 Introduction

Since embodiment plays a very important role in communication, both participants' bodies need to share a common actual field [1]. Through such embodied interactions, various expressions are co-created while each participant decides his/her own role extemporaneously. For example, in the extemporaneous bodily expression activities performed by two or more persons typified by "contact improvisation" [2], performers create one bodily expression with a new image while reading their partner's weight, movement, and mind through physical contact. If the boundary between one's own body and the other's body is opened, a nonseparable relationship, such as a sense of unity between oneself and others, is sometimes realized by both. Nishi, one of the authors, has investigated phenomenologically the above-mentioned relationship created by interactive communication through bodies [3-4]. This was based on her experience in the field of bodily expression activity with people of different ages, sexes, and bodily features. An outline of the process is described below.
In bodily expression activity with others who have a physical difference, somesthetic as well as visual differences are vividly realized. In the wake of this sense of a gap, the consciousness of accepting a partner's motion and feeling is mutually strengthened. Active movement for sustaining the bodily expression and spontaneous responses to the other's activity are generated continuously. A "sympathetic body awareness," synchronizing one's own and the other's mind and body, develops, and the two persons' minds become united. On the basis of the study mentioned above, in the process of developing a nonseparable relationship between oneself and others, the sympathetic body awareness generated from passiveness and activeness plays a significant role. If we can capture objectively how the passive–active state changes dynamically in each person and how sympathetic body awareness is generated, the generation mechanism of the nonseparable relationship may be elucidated. This knowledge may contribute to interface technology that supports the co-creation of bodily expression. We therefore attempted to measure the passive–active state in the two bodies. Specifically, paying attention to myoelectric (ME) potential, whose generation timing and amplitude differ between voluntary and reactive movements, we proposed an ME potential measurement system for bodily interaction. Based on measurements made with this system, the technique was validated.
2 Measurement Method Focused on ME

Taking contact improvisation as a typical example of the co-creation of expression through embodied interaction, we measured the passive–active state during such interaction. We initially selected a bodily expression in which two people touch each other's palms, and the joined palms were considered the measurement object. Movement of the palms is limited to one degree of freedom (back and forth) to simplify the measurement. To limit the degrees of freedom, a slidable board is used, as shown in Fig. 1. Two subjects hold this board between their palms and push or pull it.
Fig. 1. Slidable-board-mediated bodily interaction
The measurement method and its principle are described below. Whether a movement is active or passive cannot be distinguished from information on body movement alone (e.g., position, speed, acceleration). For example, when one
side moves the hand back to a previous position, it is difficult to judge whether that person pushed the hand actively or whether it was pulled by the partner. We therefore focused on the force generated in each body and proposed a method utilizing ME potential, which allows the generated force to be measured simply. It has been noted that the ME potential in voluntary movement increases several tens of milliseconds before the start of body movement, a phenomenon called "electromechanical delay" (EMD). When a movement is made in response to one's partner, the force is expected to be reactive to that movement. If these expectations are correct, the increase in ME potential of active movement should precede body movement, whereas the increase in ME potential of passive movement should lag behind it. It is also expected that the force generated by active movement is larger than that generated by passive movement, and that the ME potential increases accordingly. We attempted to judge and measure the passive–active status based on these observations. Although research on brain activity during passive movement of a restrained body part has been conducted [5-6], applying it to the present study, which deals with communication in bodily expression, is difficult.
3 Development of Measurement System

An overview of the developed system is shown in Fig. 2.
Fig. 2. Bodily movement and EMG measurement system
This system comprises a board slidable only in the front–back direction, an acceleration sensor, a magnetic tracking sensor, active electrodes for surface EMG with an amplifier, an A/D board, and a personal computer for data recording. For the acceleration measurement, the slidable board must damp vibrations and be highly stiff. An aluminum frame was therefore fixed with screws to an iron linear guide (length 800 mm), and an acrylic board (400 × 250 mm) fixed to this aluminum frame was pushed or pulled by the subject's hand.
A belt was attached for fixing the palm to the acrylic board. The EMG measurement system needed a sampling frequency above 1 kHz because of the properties of the ME signal. Active electrodes (4 ch) attached to the arm and their amplifiers (Delsys Bagnoli-2) were connected to the PC through an A/D board (Interface PCI-3135). Differential amplification (80 dB of gain) was possible using a reference electrode for this amplifier. To acquire the ME signals and the signal from the acceleration sensor synchronously, these signals were recorded through the same A/D board. Data logging can be carried out at about 1800 Hz when the ME signals (4 ch) and the acceleration sensor signal are measured simultaneously. In addition, the position of the slidable board was measured using a magnetic tracking sensor. Because positional data from the magnetic tracking sensor are transmitted over RS-232C, a time gap between the positional data and the ME potential or acceleration data arises due to the difference in transfer rate. To compensate for it, the acquisition time gap between the A/D board and the magnetic tracking sensor was measured beforehand and corrected for (the A/D board acquires data about 19 ms earlier than the magnetic tracking sensor).
4 Measurement of ME Potential during Embodied Interaction

4.1 Embodied Interaction with a Fixed Relationship between Passive and Active

First, the difference between the ME potential of passive movement and that of active movement was confirmed. One subject was directed to move the slidable board always actively, while the other subject was directed to follow it always passively. The subjects repeatedly pushed and pulled the slidable board for 30 s, three times in total, while changing the speed of movement. An active electrode was placed on the center of the deltoid muscle (which is used to pull the board) and another by the armpit on the pectoralis major muscle (which is used to push the board), using double-sided tape (Fig. 3). A reference electrode for differential amplification was placed on the left elbow (which has few muscles). A scene from an experiment is shown in Fig. 4.
Fig. 3. Placement of the active and reference electrodes
A part of the measured data is shown in Fig. 5. To clarify the rising points of the ME potential and the acceleration, the original waveforms were filtered with a first-order Butterworth low-pass filter (time constant 0.01 s). The ME potentials changed according to the changes in position and acceleration of the slidable board (Fig. 5), and the increase in ME potential on the active side tended to occur earlier than that on the passive side. Because each subject pushed and pulled repeatedly during the 30 s, the mean time lag could be ascertained: the rising time of the ME potential minus the rising time of the acceleration was computed over the 30 s (Table 1). The ME potential of active movements rose about 170 ms earlier than that of passive movements. When the slidable board was moved forward, an EMD was confirmed, but when it was moved backward, no EMD was confirmed. This is partly because the slidable board may have started to move through the arm or waist before activation of the deltoid muscle.
Fig. 4. A scene from an experiment during slidable-board-mediated bodily interaction (subjects A and B)
Fig. 5. A part of the measurement results during embodied interaction with a fixed relationship between passive and active (position [cm] and acceleration [m/s2] of the slidable board, and ME potentials [mV] of the pectoralis major and deltoid muscles on the active and passive sides; the EMD is indicated)
Table 1. Time lags between the rise of ME potential and the rise of acceleration [sec]

                      Moving forward    Moving backward    Average
  Active movement     -0.06 ± 0.03      0.125 ± 0.06       0.04 ± 0.11
  Passive movement     0.18 ± 0.07      0.23 ± 0.06        0.21 ± 0.07
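The time lags in Table 1 can be computed along the following lines. This is a hypothetical sketch of the procedure, not the authors' code: both signals are low-pass filtered, an onset is detected as the first crossing of a threshold relative to baseline, and the lag is the ME onset time minus the acceleration onset time (negative values mean the ME potential rises first, as in the EMD of active movement).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1800.0                                   # approximate sampling rate of the system [Hz]

def smooth(x, cutoff_hz=16.0):
    """First-order Butterworth low-pass (cutoff ~ 1/(2*pi*0.01 s))."""
    sos = butter(1, cutoff_hz, btype="low", fs=FS, output="sos")
    return sosfiltfilt(sos, x)

def onset_time(x, baseline_samples=200, k=3.0):
    """Time [s] at which |x| first exceeds baseline mean + k * baseline SD."""
    base = np.abs(x[:baseline_samples])
    threshold = base.mean() + k * base.std()
    idx = np.argmax(np.abs(x) > threshold)    # first index above threshold
    return idx / FS

def emg_to_acc_lag(emg, acc):
    """Rising time of the rectified ME potential minus rising time of acceleration."""
    return onset_time(smooth(np.abs(emg))) - onset_time(smooth(acc))

# Usage (hypothetical arrays, one push/pull movement each):
# lag = emg_to_acc_lag(deltoid_emg_segment, board_acceleration_segment)
# lag < 0 suggests active movement (EMD); lag > 0 suggests passive movement.
```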
The amplitudes of the ME potential of active and passive movements were then compared. In general, the ME potential is thought to increase in proportion to the force generated in the muscle, and if a large force is applied to the slidable board, the acceleration of the board also becomes large. In Fig. 6, the peak acceleration of each movement is plotted on the horizontal axis and the corresponding peak ME potential on the vertical axis. For a given acceleration of the slidable board, the ME potential generated by active movement was larger than that generated by passive movement. This suggests that active movement can be distinguished from passive movement from the amplitude of the ME potential.
Fig. 6. Relationship between the ME potential [mV] of passive/active movement and the acceleration [m/s2] of the slidable board
4.2 Embodied Interaction without a Fixed Relationship between Passive and Active

When the subjects interacted freely without deciding their passive–active roles, we attempted to distinguish active from passive movement using the waveform of the ME potential, based on the results of Section 4.1. Two subjects moved the slidable board freely for 1 min and their ME potentials were
measured. To investigate how well the passive–active state judged from the ME potential agrees with the subjective sense of "I lead" or "I am led," the subjects reported that sense in real time during the experiment by pushing buttons assigned to "I lead" and "I am led," respectively. Figure 7 shows the waveforms of the position and acceleration of the slidable board and the ME potentials of the pectoralis major and deltoid muscles. The motions of pushing and pulling changed continuously, as seen from the waveform of the position of the slidable board. The pectoralis major and deltoid muscles were often both activated, so the increase in ME potential was not clear at the start of each movement. We therefore tried to judge the passive–active state from the amplitude of the ME potential. The peak ME potential during one movement was taken as a representative value for that movement, and the mean of the few dozen representative values obtained during the 1 min experiment was calculated. When the peak ME potential was larger than this mean value, the movement was judged to be active, and when it was smaller, it was judged to be passive. Figure 8 shows the results of the judgment by ME potential and the reports made by pushing buttons. The judgment by ME potential corresponded well with the reports of subjective sense; the concordance rate was 73%, suggesting the validity of this method.
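A minimal sketch of this judgment rule is given below; the segmentation into individual movements, the example values, and the variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def judge_passive_active(peak_me_potentials):
    """Label each movement 'active' or 'passive'.

    Rule from the text: a movement whose peak ME potential exceeds the mean
    of all peak values in the 1-minute trial is active, otherwise passive.
    """
    peaks = np.asarray(peak_me_potentials, dtype=float)
    threshold = peaks.mean()
    return ["active" if p > threshold else "passive" for p in peaks]

def concordance_rate(judgments, button_reports):
    """Fraction of movements where the ME judgment matches the button report."""
    matches = [j == r for j, r in zip(judgments, button_reports)]
    return sum(matches) / len(matches)

# Hypothetical example: peak ME potentials [mV] of consecutive movements
peaks = [0.42, 0.18, 0.55, 0.20, 0.61, 0.25]
reports = ["active", "passive", "active", "passive", "active", "active"]
labels = judge_passive_active(peaks)
print(labels, concordance_rate(labels, reports))
```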
Fig. 7. Measurement results during embodied interaction without a fixed relationship between passive and active (position [cm] and acceleration [m/s2] of the slidable board, and ME potentials [mV] of the deltoid and pectoralis major muscles for subjects A and B)
Fig. 8. Real-time reports of subjective sense made by pushing buttons and judgments of the passive–active state by ME potential for subjects A and B
As the threshold for dividing passive from active, the mean value of the peak ME potentials is used. If a subject is inclined only toward activity or only toward passivity during an experiment, this method fails. However, if the relationship between the peak ME potential of active or passive movement and the acceleration of the slidable board is investigated beforehand for each person, a correspondence rate can be obtained by calibration using that relationship. Real-time reports of subjective sense were obtained in this experiment, but these reports capture only the conscious mind. Unconscious bodily movements or responses could not be measured in this way, which accounts for some of the discrepancy between the judgments by ME potential and the reports of subjective sense. From another standpoint, this measurement method goes beyond investigating the simple correspondence between ME-based judgments and subjective reports: because bodily information carries both cognition and action, the method can concurrently measure the subjective sense surfacing in consciousness and the ME potential. As mentioned above, it can thus measure the dynamics of the passive–active state, which are expected to involve embodied duality, and this research can be developed further.
5 Conclusions

To study the generation mechanism of a nonseparable relationship in which one's own body and a partner's body feel united, we measured the dynamics of the passive–active state in the two bodies during expressive body movement. Paying attention to ME potential, whose generation timing and amplitude differ between voluntary and reactive movements, we proposed and developed a measurement system for ME potential in bodily interaction. The effectiveness of this method was suggested by our data. In future work, we will study analysis methods for judging the passive–active state and the dynamics of cognitive information, combining reports of subjective sense with embodied information (including unconscious factors).
Acknowledgment This research was partially supported by Grant-in-Aid for Scientific Research, 20700249, from Japanese Ministry of Education, Science, Sports and Culture. This research also has been conducted under the project "Generation and Control Technology of Human-Entrained Embodied Media" and is supported by CREST of JST.
References 1. Shimizu, H., Kume, T., Miwa, Y., Miyake, Y.: Ba and co-creation. NTT Publishing (2000) (in Japanese) 2. Novack, C.J.: Sharing the Dance: Contact Improvisation and American Culture. Univ. of Wisconsin Pr. (1990)
3. Nishi, H., Shiba, M.: The Potentiality of Formed Interactive Synchrony in the Body Expressive Activity. Journal of Cultural Studies in Body, Design, Media, Music and Text 1(1), 23–30 (2001) (in Japanese) 4. Nishi, H., Noguchi, H.: An Education Program for Developing Sympathetic Body Awareness of Preschool Teachers: Through the Experience of Expressive Body Movement in College-Level Course Works. Research on early childhood care and education in Japan 43(2), 156–165 (2005) (in Japanese) 5. Alary, F., Simões, C., Jousmäki, V., Forss, N., Hari, R.: Cortical Activation Associated with Passive Movements of the Human Index Finger: An MEG Study. NeuroImage 15(3), 691–696 (2002) 6. Alary, F., Doyon, B., Loubinoux, I., Carel, C., Boulanouar, K., Ranjeva, J.P., Celsis, P., Chollet, F.: Event-Related Potentials Elicited by Passive Movements in Humans: Characterization, Source Analysis, and Comparison to fMRI. NeuroImage 8, 377–390 (1998)
An Integrated Approach to Emotion Recognition for Advanced Emotional Intelligence Panagiotis D. Bamidis1, Christos A. Frantzidis1, Evdokimos I. Konstantinidis1, Andrej Luneski1, Chrysa Lithari1, Manousos A. Klados1, Charalambos Bratsas1, Christos L. Papadelis2, and Costas Pappas1 1
Lab of Medical Informatics, Medical School, Aristotle University of Thessaloniki, Greece [email protected] 2 Center for Mind/Brain (CIMEC), University of Trento, Mattarello, Trentino, Italy
Abstract. Emotion identification is beginning to be considered an essential feature in human-computer interaction. However, most studies focus mainly on facial expression classification and speech recognition, and until recently not much attention has been paid to physiological pattern recognition. In this paper, an integrative approach to emotional interaction is proposed by fusing multi-modal signals. Subjects are exposed to pictures selected from the International Affective Picture System (IAPS). A feature extraction procedure is used to discriminate between four affective states by means of a Mahalanobis distance classifier. The average classification rate (74.11%) was encouraging. The induced affective state is then mirrored through an avatar, which changes its facial characteristics and generates a voice message sympathizing with the user's mood. It is argued that multi-physiological patterning in combination with anthropomorphic avatars may contribute to the enhancement of affective multi-modal interfaces and the advancement of machine emotional intelligence. Keywords: Emotion, Affective Computing, EEG, Skin Conductance, Avatar, Mahalanobis, classifier.
1 Introduction

Until recently, emotions were only rarely a discussion topic in human-computer interaction (HCI) studies of human intelligence. Lately, the arguments put forward for the significance of emotions have given birth to a new area of "emotional intelligence" within Ambient Intelligence (AmI). Moreover, recent research on the neuronal mechanisms involved in emotional processing provides some remarkable insights. There is now growing evidence that there exist two motivation systems, appetitive and aversive, activated according to the judgment of each situation as either pleasant or unpleasant. The reaction intensity is then modulated by these systems and reflects the activation level [1]. The International Affective Picture System provides a set of normative, emotionally evocative pictures that differ in their arousal and valence dimensions and may be used for such experimental investigations [2].
Emotional intelligence has been demonstrated to be crucial in the performance of several cognitive functions [3]. Furthermore, the task of emotion recognition is very important during interaction with other people; humans communicate with each other largely thanks to their skill of emotional understanding. Theoretical research has demonstrated that successful interaction between computers and humans will have to adopt the basic principles required for communication among human beings [4]. In order to achieve this goal it is essential to empower computers with emotion discrimination capabilities. Because emotional processing modulates several aspects of human communication, such as voice and facial expressions, as well as physiological reactions, emotional interactions could become more accurate if all or some of these features were combined in concert. Pattern recognition of emotional processing based on multi-physiological recordings has recently been growing as a field of its own within the HCI community [5]. Specific affective states, such as fear, melancholy, and excitement, have been demonstrated to show characteristic response patterns in both the central and the autonomic nervous systems. This is further empowered by recent technological improvements enabling the use of wearable and miniaturized sensors [6], [7], which seem promising for a wide range of new health and medical applications by acquiring large sets of recorded data during realistic daily situations. Previous studies have investigated the use of physiological pattern recognition during emotional processing. One of the first such works was conducted by the MIT Media Lab, in which a single subject intentionally expressed eight affective states over a period of more than a month [8]. During the experiment, features based on autonomic functions were extracted in order to discriminate between the eight affective states by means of various techniques and classifiers. Anger was fully discriminated from the peaceful emotions (99%). Furthermore, the eight emotions were separated into two classes according to their arousal dimension (80% for the high- and 88% for the low-arousal case), but the study faced difficulties in distinguishing between pleasant and unpleasant emotions in a robust way (82% for the pleasant and 50% for the negative ones). A later work of the same team [9] improved the results, achieving 81% recognition accuracy by seeding a Fisher Projection with the results obtained by means of Sequential Floating Forward Search. The above classification rates concerned only a single subject. A more recent work [10] gathered physiological data from the autonomic nervous system of a single subject on different days and at different times of the day. A large number of data segments (1000) was extracted and used for training a neural network classifier. Owing to the large number of feature vectors (700 for training, 150 for testing, and 150 for validation), it was feasible to robustly detect both the arousal (96.58%) and valence (89.93%) dimensions of the emotions elicited by photos from the International Affective Picture System (IAPS) set [2]. Another study [11] used a long (45 min) show of slides and movie clips to elicit emotions in a fixed order. The emotion recognition task used non-invasive wearable sensors to gather data such as heart rate, skin temperature, and phasic increases of the subject's electrodermal activity. Unlike previous studies, the sample included 31 participants.
Recognition rates were 65.38% by means of k-Nearest Neighbors (kNN) and 69.28% when using an algorithm based on Discriminant Function Analysis (DFA).
In previous work [12], we introduced a framework for the combination of multichannel psycho-physiological recordings towards emotion-aware computing. The aim of the current piece of work is to extend the emotion discrimination capacity obtained in a previous study with a user-independent classifier [13], by presenting an integrated approach to emotion recognition through the fusion of signals obtained from both the central nervous system (event-related potentials (ERPs), brain oscillatory activity) and the autonomic nervous system (skin conductance), and the utilization of machine learning techniques, such as the Mahalanobis classifier. In the remainder of this paper, the experimental procedure and the description of the system architecture are provided in Section 2. The classification results achieved by the proposed approach are presented in Section 3 and discussed in the last section.
2 Material and Methods

Healthy adult users (14 men and 14 women) were exposed to emotionally evocative stimuli (pictures selected from IAPS) presented on a PC monitor. Each picture had a specific Valence-Arousal combination (L for Low, H for High: HVHA, LVLA, LVHA, HVLA). There were 40 repetitive trials for each of the four affective-space conditions (emotion categories), and the sequence of the four conditions (blocks) was randomly selected for each subject. Each picture was presented for 1 second. ERPs were recorded from nineteen sites, with reference electrodes placed on the ear lobes, at a sampling frequency of 500 Hz. Skin conductance was recorded from the medial phalanges of the non-dominant hand. An off-line pre-processing step took place in order to remove artifacts from both signals. More specifically, the EEG signals were band-pass filtered in the frequency range 0.5-40 Hz, and the Infomax Independent Component Analysis (ICA) technique was applied to remove artifacts caused by eye blinks. The electrodermal activity was digitally filtered by means of a low-pass short IIR filter with a cut-off frequency of 2.5 Hz. The data were then formed into epochs time-locked to the stimulus onset, and finally the average signal was computed for each condition. The data from the frontal (Fz), central (Cz), and parietal (Pz) sites distributed along the anterior-posterior midline of the brain were analyzed and their main ERP components were extracted. Prominent peaks of the delta (0.5-4 Hz) and theta (4-8 Hz) oscillatory activity (Event-Related Oscillations, EROs) from all electrode sites were used as features. The main features of the phasic skin conductance response (SCR) were also computed: the rise time, latency, amplitude, and duration of the SCR. The grand average signals from the aforementioned recordings are depicted in Figure 1. A repeated-measures ANOVA with valence and arousal as within-subject factors and gender as a between-subject factor was performed on the average values of the ERPs, the EROs, and the SCR characteristics in order to estimate the discrimination capacity of each feature. Feature selection then took place according to the p-values obtained for these features.
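As an illustration of the SCR feature extraction described above, the sketch below computes latency, rise time, amplitude, and duration from a single stimulus-locked skin-conductance epoch. The onset and recovery criteria, the sampling rate, and the variable names are illustrative assumptions; they are not necessarily the thresholds used by the authors.

```python
import numpy as np
from scipy.signal import iirfilter, sosfiltfilt

FS = 500.0  # assumed skin-conductance sampling rate [Hz]

def scr_features(epoch, onset_thresh=0.01):
    """Latency, rise time, amplitude, and duration of a phasic SCR.

    `epoch` is a 1-D skin-conductance trace starting at stimulus onset.
    Onset: first sample exceeding baseline by `onset_thresh` (signal units).
    Peak: maximum after onset. Recovery: first return to half amplitude.
    """
    sos = iirfilter(4, 2.5, btype="low", ftype="butter", fs=FS, output="sos")
    x = sosfiltfilt(sos, epoch)                 # low-pass at 2.5 Hz, as in the text
    baseline = x[0]
    above = np.where(x - baseline > onset_thresh)[0]
    if len(above) == 0:
        return None                             # no detectable response
    onset = int(above[0])
    peak = onset + int(np.argmax(x[onset:]))
    amplitude = x[peak] - x[onset]
    half_rec = np.where(x[peak:] - x[onset] < amplitude / 2)[0]
    recovery = peak + (int(half_rec[0]) if len(half_rec) else len(x) - 1 - peak)
    return {
        "latency_s": onset / FS,                # stimulus onset to SCR onset
        "rise_time_s": (peak - onset) / FS,     # SCR onset to peak
        "amplitude": amplitude,                 # peak minus onset level
        "duration_s": (recovery - onset) / FS,  # onset to half recovery
    }
```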
Fig. 1. Grand average waveforms of skin conductance responses are depicted for female subjects (a) and for male subjects (b). The Event-Related Potentials (ERPs) and Event-Related Oscillations (EROs) for the delta frequency band are represented in (c) and (d) respectively. In all subplots, “blue” indicates “High Valence”, while “red” indicates “Low Valence”; “solid” curves represent “Low Arousal”; “dotted” curves represent “High Arousal”.
2.1 The Emotion Recognition Sub-system

The emotion recognition subsystem comprises the arousal and valence recognition sub-components. First, the feature vector is classified according to its arousal status. The arousal differentiation is conducted using different features according to the subject's gender, since previous neuroscience studies have reported that males respond differently from females to emotionally evocative stimuli [14]; the gender effect is therefore an important factor that should be taken into consideration in emotional interaction applications. The second part of the emotion recognition subsystem classifies the feature vector according to its valence dimension. Once more, different feature sets are used for the high- and the low-arousal data segments. The classifier used for emotion recognition is based on the Mahalanobis distance, computed as shown in equation (1):
D_M^2 = (x − m_i)^T C_i^(−1) (x − m_i),   i = 1, 2                    (1)
where C_i is the covariance matrix of the particular emotional category considered and T stands for the transposition operator. Here x is a feature vector being compared with a pattern class whose mean vector is m_i. The Mahalanobis distance yields a minimum-distance classifier, since a small value indicates a higher likelihood of membership of the vector x in the emotional group under consideration.
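A minimal sketch of such a minimum-Mahalanobis-distance classifier is given below. The class means and covariances are estimated from hypothetical training vectors; this illustrates the generic technique, not the exact gender- and arousal-dependent pipeline of the paper.

```python
import numpy as np

class MahalanobisClassifier:
    """Minimum Mahalanobis-distance classifier for two (or more) classes."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = {c: X[y == c].mean(axis=0) for c in self.classes_}
        self.inv_covs_ = {c: np.linalg.pinv(np.cov(X[y == c], rowvar=False))
                          for c in self.classes_}
        return self

    def _dist2(self, x, c):
        d = x - self.means_[c]                   # (x - m_i)
        return d @ self.inv_covs_[c] @ d         # (x - m_i)^T C_i^{-1} (x - m_i)

    def predict(self, X):
        return np.array([min(self.classes_, key=lambda c: self._dist2(x, c))
                         for x in X])

# Hypothetical usage: rows are feature vectors (e.g. ERP/ERO peaks, SCR features);
# labels 0/1 could stand for low/high arousal in the first classification stage.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(2, 1, (40, 5))])
y_train = np.array([0] * 40 + [1] * 40)
clf = MahalanobisClassifier().fit(X_train, y_train)
print(clf.predict(rng.normal(2, 1, (3, 5))))     # mostly class 1
```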
The emotion recognition unit is the core component of the proposed application. Based on the derived result, an XML file is created for the emotion description. This type of annotation provides a standard format for the record describing the classified emotion. Furthermore, it contains both quantitative (statistical) and qualitative information about the signals and the derived features [15]. A block diagram of the emotion recognition subsystem is given in Figure 2.
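The exact XML specification of [15] is not reproduced in this paper, so the sketch below only illustrates the general idea of such an annotation record; every element and attribute name is a hypothetical placeholder, not the authors' schema.

```python
import xml.etree.ElementTree as ET

def emotion_record(category, distances, features):
    """Build a hypothetical XML description of a classified emotion (illustrative schema only)."""
    root = ET.Element("emotionRecord")
    ET.SubElement(root, "classifiedEmotion",
                  valence=category[:2], arousal=category[2:]).text = category
    dist = ET.SubElement(root, "classifierOutput", metric="mahalanobis")
    for label, value in distances.items():
        ET.SubElement(dist, "distance", category=label).text = f"{value:.3f}"
    feats = ET.SubElement(root, "features")
    for name, value in features.items():
        ET.SubElement(feats, "feature", name=name).text = f"{value:.3f}"
    return ET.tostring(root, encoding="unicode")

print(emotion_record("HVLA", {"HVLA": 2.1, "LVHA": 7.8},
                     {"P300_amplitude_Pz": 5.2, "SCR_amplitude": 0.43}))
```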
Fig. 2. Block diagram of the Emotion Recognition Sub-System and visualization of the classification ratings
2.2 Avatar Behavior Generator Sub-system
Attached to the emotion classification subsystem, an intermediate sub-system has been developed to transform the emotional data into Haptek HyperText commands [16], which act as the interface between the programming language and an anthropomorphic avatar. These commands change the avatar's characteristics (e.g. mouth, eyes, energy, etc.), which form suitable facial expressions according to the emotion identified by the classification sub-system. The output is a file with a predefined structure. The proposed intermediate sub-system offers platform independence, since the avatar implementation details remain hidden; this makes it easier to add a new avatar to the system for future web-based applications. Moreover, besides changing the avatar's facial
expressions, the same sub-system causes the avatar to respond with a voice message enhanced with some basic emotional features such as laughter, excitement, etc. [17]. This message is currently predefined according to the evaluated feature, with the sole purpose of attempting to counteract or neutralize the subject's emotional mood in case of a negative feeling, or to enhance the subject's positive mood. The overall system architecture is visualized in Figure 3.
Fig. 3. Overall architecture of the system of integrated emotion recognition
As can be seen, the significance of the proposed methodology lies mainly in the use of an affective protocol capable of eliciting and extracting the neurophysiological signatures of a variety of discrete human emotions. The emotion recognition subsystem then adopts machine learning techniques, such as the Mahalanobis classifier, to recognize the elicited emotion. The outcome is provided to an avatar that, through expressions and verbal content, provides emotional feedback to the subject/user. The connectivity between the various subsystems is achieved by means of XML specifications, since these provide platform independence [15]. The use of the avatar guarantees a multi-modal interaction, alongside a kind of emotional interaction, by eliciting certain expressions (video) and speech (voice), thereby achieving human embodiment inside the computer. Consequently, the described approach aims at enhancing human-computer interaction by combining features of cerebral and sympathetic activity in a structured way, and thereby attempts to establish a complete loop of affective HCI, which is presumably closer to human-human interaction.
3 Results
The classification results indicate that pleasant and highly arousing stimuli, as well as unpleasant and low arousing stimuli, are classified in a very robust way (89.29% and 85.71%, respectively). However, the classification accuracy for unpleasant highly arousing stimuli (57.14%) and pleasant low arousing stimuli (64.29%) is much lower. The overall (average) performance obtained for the whole set of emotion categories reached 74.11%, as shown in Table 1.

Table 1. Classification results for emotionally evocative stimuli

Emotional Category      Performance Rate
HVHA                    89.29%
HVLA                    64.29%
LVHA                    57.14%
LVLA                    85.71%
Average Performance     74.11%
To investigate the inherent properties of the classification subsystem, the average performance was studied as a function of the number of features used. As shown in Figure 4, the classification performance generally increases as more features are included in the Mahalanobis distance computation.
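A sketch of how such a curve can be produced is given below: features are assumed to be ordered by their discriminative power (e.g. by ANOVA p-value), added one at a time, and a leave-one-out accuracy is recomputed at each step with a simple Mahalanobis rule. This is an illustrative reconstruction under those assumptions, not the authors' code.

```python
import numpy as np

def mahalanobis_accuracy(X, y, n_features):
    """Leave-one-out accuracy of a minimum-Mahalanobis-distance rule on the first n features."""
    Xk = X[:, :n_features]
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        stats = {c: (Xk[train & (y == c)].mean(0),
                     np.atleast_2d(np.cov(Xk[train & (y == c)], rowvar=False)))
                 for c in np.unique(y)}
        d = {c: (Xk[i] - m) @ np.linalg.pinv(C) @ (Xk[i] - m) for c, (m, C) in stats.items()}
        hits += min(d, key=d.get) == y[i]
    return hits / len(y)

# Toy data: 56 trials, 13 features, with the most discriminative features first
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 28)
X = np.hstack([rng.normal(0, 1, (56, 5)) + y[:, None], rng.normal(0, 1, (56, 8))])
curve = [mahalanobis_accuracy(X, y, k) for k in range(1, 14)]
```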
Fig. 4. Classification performance of the two components of the Emotion Recognition Sub-System in association with the number of features used by the classifier
The first subplot in Figure 4 depicts the accuracy rates for the arousal sub-component, while the second visualizes the valence discrimination task. According to the first sub-plot, the arousal discrimination task is simpler, faster and more accurate for females than for males. Regarding the valence dimension, the low arousing
stimuli can be differentiated much more easily and precisely than the highly arousing ones.
4 Discussion
The results indicate that the use of a multitude of features improves the overall classification capacity of the system. More specifically, there are certain ERP or ERO features that are very good arousal indicators. On the other hand, some features derived from the electrodermal activity (SCRs) can discriminate between pleasant and unpleasant pictures. The fusion of all these components accounts for the overall emotional discrimination result.
The use of the Mahalanobis distance as an emotion metric eliminates several drawbacks posed by linear classifiers such as the Euclidean distance. More specifically, the scaling of the coordinate axes as well as the correlations between the derived features are taken into consideration [18]. On the other hand, the computational cost is higher and the memory requirements grow quadratically as the number of derived features increases. However, in the present case none of the four Mahalanobis sub-components employed more than thirteen features. The results obtained in the present study suggest that the Mahalanobis metric is capable of classifying physiological recordings during emotional interaction paradigms.
As mentioned earlier, previous emotional interaction studies have employed various protocols, which can generally be divided into two main categories. Most of them, especially the earlier ones, used only a single subject who intentionally expressed affective states several times. These studies have generally reported better results, ranging from 66% for valence discrimination [8] to 93.5% total performance [10]. In comparison to the aforementioned studies, the performance of the Mahalanobis classifier is much lower than that of a neural network classifier (93.5%) [10], almost the same as that of a quadratic classifier for arousal discrimination (84%) [8], a bit lower than the combination of linear and quadratic algorithms with a Sequential Floating Forward Search technique (81.25%) [9], and better in the case of valence discrimination by means of a quadratic classifier (66%) [8]. However, it should be clarified that developing a user-independent classifier differs greatly from extracting the neurophysiological pattern of a single subject during emotional processing. Therefore, any straightforward comparison of results may lead to misinterpretations. In the case of user-independent classifiers based on the k-Nearest Neighbor (kNN) method and Discriminant Function Analysis (DFA), the Mahalanobis classifier seems to perform better (74.11% versus 65.38% and 69.28%, respectively) [11].
The use of anthropomorphic avatars, which adapt their facial characteristics according to the user's affective state, facilitates emotional interaction, since facial expressions are universally expressed and recognized by humans. Therefore, the proposed approach may be employed in various applications such as telemedicine, virtual or special education [19], monitoring of dangerous situations, entertainment, etc. Furthermore, the avatar is capable of taking action by generating a voice message in order to neutralize the user's negative emotions, such as fear, anger, disgust, melancholy, etc. Its beneficial use has already been reported by several studies which
demonstrated that including an avatar as part of an emotional interaction interface helps to increase human performance [20]. Consequently, our study proposes the use of anthropomorphic avatars to mirror the user's affective state. In a future evolution of the system presented herein, we envisage the creation of emotional paradigms in terms of virtual user presence (users modeled as avatars, virtually interacting with each other). This notion may be employed in several applications such as virtual games, psychotherapy groups and specific web folksonomies [21]. Last but not least, the whole approach is accompanied by a C#-based system integrating the above in an Internet-downloadable form (to be demonstrated). The whole endeavor is undertaken in accordance with previous efforts to fuse the HCI domain with ideas from neurophysiology and medical informatics [12], [22], [23], so as to enrich the multimodal affective arsenal with more robust emotion identification tools.
References 1. Bradley, M.M., Codispoti, M., Cuthbert, B.N., Lang, P.J.: Emotion and motivation I: Defensive and appetitive reactions in picture processing. Emotion 1(3), 276–298 (2001) 2. Lang, P.J., Bradley, M.M., Cuthbert, B.N.: International Affective Picture System (IAPS): Technical Manual and Affective Ratings. NIMH Center for the Study of Emotion and Attention (1997) 3. Picard, R.W., et al.: Affective learning – A Manifesto. BT Technology Journal 22(4), 253– 269 (2004) 4. Reeves, B., Nass, C.: The Media Equation: How People Treat Computers. Cambridge Univ. Press, Cambridge (1996) 5. Cockton, G.: Editorial:From doing to being:bringing emotion into interaction. Interacting with Computers 14(2), 89–92 (2002) 6. Picard, R.W., Healey, J.: Affective Wearables. Personal and Ubiquitous Computing 1(4), 231–240 (1997) 7. Konstantinidis, E.I., Bamidis, P.D., Koufogiannis, D.: Development of A Generic And Flexible Human Body Wireless Sensor Network. In: 6th European Symposium on Biomedical Engineering (ESBME 2008), Chania, Crete Island, Greece (2008) 8. Healey, J., Picard, R.W.: Digital processing of affective signals. In: International Conference on Acoustics, Speech and Signal Processing, Seattle, USA, vol. 6, pp. 3749– 3752 (1998) 9. Picard, R.W., Vyzas, E., Healey, J.: Toward Machine Emotional Intelligence: Analysis of Affective Physiological State. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(10), 1175–1191 (2001) 10. Haag, A., Goronzy, S., Schaich, P., Williams, J.: Emotion recognition Using Bio-Sensors: First Steps Towards an Automatic System. In: André, E., Dybkjær, L., Minker, W., Heisterkamp, P. (eds.) ADS 2004. LNCS, vol. 3068, pp. 36–48. Springer, Heidelberg (2004) 11. Nasoz, F., Lisetti, C.L., Alvarez, K., Finkelstein, N.: Emotion Recognition from Physiological Signals for User Modeling of Affect. In: 3rd Workshop on Affective and Attitude User Modeling, USA (2003) 12. Bamidis, P.D., Luneski, A., Vivas, A., Papadelis, C., Maglaveras, N., Pappas, C.: Multichannel physiological sensing of human emotion: insights into Emotion-Aware Computing using Affective Protocols, Avatars and Emotion. Stud. Health Technol. Inform. 129(Pt 2), 1068–1072 (2007)
13. Frantzidis, C., Lithari, C., Vivas, A., Papadelis, C., Pappas, C., Bamidis, P.D.: Towards Emotion Aware Computing: a Study of Arousal Modulation with Multichannel Event Related Potentials, Delta Oscillatory Activity and Skin Conductivity Responses. In: 8th IEEE International Conference on BioInformatics and BioEngineering (BIBE 2008), Athens, Greece, pp. 1–6 (2008) 14. Brody, L.R.: Gender differences in emotional development: A review of theories and research. Journal of Personality 53(2), 102–149 (2006) 15. Luneski, A., Bamidis, P.D.: Towards an Emotion Specification Method: Representing Emotional Physiological Signals. In: Proc. of the 20th IEEE International Symposium on Computer-Based Medical Systems (CBMS 2007), Maribor, Slovenia, pp. 313–320 (2007) 16. The free Haptek Guide, http://haptek.com/ (last access 24/2/2009) 17. Loquendo: global supplier of speech recognition and speech synthesis technology and solutions, http://www.loquendo.com/ (last access 24/2/2009) 18. Cincotti, F., et al.: The use of EEG modifications due to motor imagery for brain-computer interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering 11(2), 131–133 (2003) 19. Luneski, A., Konstantinidis, E.I., Hitoglou-Antoniadou, M., Bamidis, P.D.: Affective Computer-Aided Learning for Autistic Children. In: 1st Workshop on Child, Computer and Interaction, ICMI 2008, Chania, Crete, Greece, October 20-23 (2008) 20. Lisetti, C.L., Nasoz, F.: Using Noninvasive Wearable Computers to Recognize Human Emotions from Physiological Signals. EURASIP Journal on Applied Signal Processing 2004(11), 1672–1687 (2004) 21. Boulos, K., Maged, N., Wheeler, S.: The emerging Web 2.0 social software: an enabling suite of sociable technologies in health and health care education. Health Information and Libraries Journal 24(1), 2–23 (2007) 22. Vilon, O., Lisetti, C.L.: Toward Recognizing Individual’s Subjective Emotion from Physiological Signals in Practical Application. In: Proc. of 20th IEEE International Symposium on Computer-Based Medical Systems (CBMS 2007), Maribor, Slovenia, pp. 357–362 (2007) 23. Luneski, A., Bamidis, P.D., Hitoglou-Antoniadou, M.: Affective computing and medical informatics: state of the art in emotion-aware medical applications. Stud. Health Technol. Inform. 136, 517–522 (2008)
Addressing the Interplay of Culture and Affect in HCI: An Ontological Approach Emmanuel G. Blanchard1, Riichiro Mizoguchi2, and Susanne P. Lajoie1 1 ATLAS Laboratory, McGill Faculty of Education, 3700 McTavish Street, Montréal (QC), H3A 1Y2 Canada {Emmanuel.Blanchard,Susanne.Lajoie}@mcgill.ca 2 Institute of Scientific and Industrial Research (I.S.I.R.), Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka, 567-0047 Japan [email protected]
Abstract. Culture and affect are closely tied domains that have until now been considered separately in HCI. After carefully reviewing research done in each of these domains, we use a formal ontology engineering approach to identify and structure useful concepts for considering their interplay. Keywords: Affective Computing, Cultural Computing, Ontology Engineering, Awareness, Adaptation.
1 Introduction
At first sight, computers can be seen as "cold" agents from which cultural and affective awareness are absent by default. More than a decade ago, the groundbreaking work of Picard [21] helped establish affective computing as a field of research and, more recently, enhancing the cultural awareness of computer systems has gained some interest [2, 4, 12, 23]. Specific issues are pertinent to the study of affect and culture. One of the most notable problems is the relative dependency of both topics on folk language, which is subjective and relatively fuzzy by nature. Furthermore, both culture and affect are discussed within a range of scientific disciplines (psychology, anthropology, management, sociology, philosophy), each of them having a specific approach to research and a specific corpus of definitions. Without a shared corpus of definitions, interdisciplinary comparisons are limited, thus limiting the application of research findings from one discipline to the next. Formal ontology engineering aims at defining domains by stressing the internal structure of their related concepts (i.e. determining what their essential parts and properties are, semantic labelling being secondary) as well as inter-concept relationships. Practically speaking, formal ontologies are concept graphs (each node being a concept) [18] whose concept structures refer as much as possible to their philosophical (i.e. objective) nature [30]. While strengthening the interoperability of a domain by providing a coherent core of interrelated and well-structured concepts,
ontological representations are frequently used as a powerful means of enhancing domain awareness within computer systems. In order to promote the quality of cultural and affective human-computer interactions, we aim at developing artificial awareness of both these elements through formal ontology engineering. At first sight, cultural and affective awareness might seem distinct, since affective phenomena occur at the individual level whereas cultures emerge from a group of persons. However, there is no doubt that culture influences an individual's self-regulation, just as affects are important in human social processes. Indeed, affect and culture are inherently intertwined elements of human interactions, influencing many aspects of human behaviour, communication, and social practices, among other things. In this paper, we present our research on ontologically modelling the interplay between the cultural and affective domains. Developing a good ontology requires a clear understanding of its domain. We report on previously developed frameworks in order to ensure that existing data can be adapted to the resulting work. In section two, we establish how the cultural and the affective domains are closely intertwined, and in section three, we discuss major existing approaches to representing culture and affect. Finally, in section four, after giving a brief review of previous research analyzing the structure of the affective and cultural domains, we introduce our work by focusing on the description of basic concepts held at the intersection of both these domains.
2 Affect and Culture as Intertwined Domains
Culture has a strong influence on affective experiences. Mesquita and her colleagues [17] concluded in their review of the literature that it has been "convincingly demonstrated that there are cultural differences in the ecology of emotions". In their review, they reported that affective antecedents (events or objects that trigger an affective phenomenon), subjective experiences (feelings), appraisals, behavioural responses, and even physiological changes related to an affective experience may vary across cultures [17]. Research has also demonstrated that, depending on their cultural origins, people may be more likely to report positive or negative affects, the valence of affect itself being subject to cultural variations [29]. Furthermore, processes pertaining to emotion recall are also culturally sensitive, suggesting that cultural bias might be inherent to the post-hoc conscious assessment of affect [24, 29].
Similarly, cultural experiences frequently produce affective reactions. Thus, one of the aims of research on cultural management is to lower the risks of misunderstandings or bad communication that could lead to negative affective reactions [10, 11]. Indeed, stereotypes or the erroneous use of cultural information may trigger negative affective reactions in foreigners, ranging from revolt and pride to disgust, cynical amusement or even aggressiveness. When a speaker faces uncommon cultural elements without the resources to correctly understand and/or manipulate them, he or she can experience similar affective reactions. On the other hand, manifesting intercultural awareness and competences can enhance the trust of foreigners, thus strengthening the ethos (credibility) of a speaker. In a general way, the ability to successfully endorse cultural practices, particularly communication and
traditional norms, as well as to efficiently manipulate cultural references, frequently results in positive attitudes towards the speaker [10, 11]. For the latter it can also be a source of positive self-worth, pride, and personal satisfaction. Finally, the current affective state of an individual, as well as his or her cultural profile, are both closely tied to the cognitive domain: they influence memory load and impact several cognitive processes such as decision making or interpretation of the surrounding world (context sensitiveness). Self-regulation and motivational processes can also be affected and in turn alter behavioural management. Indeed, as mentioned previously, it is well known that affective phenomena developing through the cultural interpretation of a context can trigger culturally variable communication practices and body languages that, in their turn, could nurture cognitive and affective reactions in other interacting agents. Furthermore, members of the same cultural group are very likely to endorse similar cognitive conceptions of world elements [26] that sometimes include a culture-specific affective charge.
3 Representing Affective and Cultural Domains
Affective and cultural experiences can differ greatly. Affect mainly refers to intra-individual elements, whereas culture pertains to between-group and within-group (social) situations as well as intra-individual elements. However, as has just been shown, several similarities and interplays exist between these domains. Thus it is not surprising that research in each of those domains has led to the development of quite similar techniques, i.e. dimensional and categorical approaches, for capturing their complexity. Some of them are reviewed in the next part.
3.1 Dimensional Representation
The aim of dimensional representation methods is to reduce a complex domain to a limited set of independent dimensions expressing the major content of the domain. In the affective domain, recent research has frequently referred to the valence-arousal system [25], with valence going from positive (or pleasant) to negative (or unpleasant), and arousal going from low to high levels, in order to describe the suitability and the intensity of affective phenomena. However, several other multidimensional models have been proposed. Fontaine and his colleagues [8] have suggested a four-dimensional model including the following dimensions: evaluation-pleasantness (similar to valence), appraisal of control (over the surrounding environment), appraisal of novelty (related to surprise), and activation-arousal. However, determining the correct number of dimensions needed to perfectly report affective experiences remains an open question.
National systems of values have emerged as an important method to address the complexity of cultures within the last three decades. They are aimed at reporting tendencies that are likely to be endorsed by members of such nations. Hofstede's system of national values was the first, and is still a popular, such framework [10]. The five bipolar dimensions that Geert Hofstede identified are Power Distance, Individualism, Uncertainty Avoidance, Masculinity, and Long Term Orientation (see
[10] for a full definition of each of those dimensions). Despite the very large number of studies that have endorsed the latter approach to discuss cultural issues in several domains [14], there is a long history of debate on the pros and cons of Hofstede's work [16]. One major criticism is that such group-based analyses cannot be applied at the individual level [16]. Several other dimensional systems exist that directly include distinct, yet related, group and individual levels [11, 28].
3.2 Categorical Representation
Categorical methods focus on establishing a list of clusters (or categories) that are independent from each other. An element is thus discriminated according to its membership in a cluster. Specific properties are sometimes stated for each cluster and provide additional information to help compare instances of various categories, or to understand their effect. Categories are often established by determining thresholds in frequently used dimensions. Among well-known categorical methods for affect discrimination, one can mention lists of basic emotions [7], or distinctions between positive and negative emotions. Researchers also distinguish various affective phenomena, moods and emotions being discussed most frequently [15, 27]. Cultural discrimination is sometimes made according to geographical location (western, middle-eastern, eastern; European, Asian, American, African), historical or dominant belief system (Christian, Muslim, Atheist), socially dominant attitudes (collectivist vs. individualist; traditionalist vs. modern), or race and ethnicity (black, Caucasian, Asian, African). Nations are probably the most commonly used cultural categories [10, 11]. Blending several of these categories is also a common practice (Afro-American, Asian American…) to analyse supposedly more cohesive groups.
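As a toy illustration of the contrast between the two representation styles, the sketch below encodes an affective state and a cultural profile dimensionally and then derives coarse categories by thresholding; the thresholds, field scales and category names are arbitrary assumptions, not values proposed in the cited models.

```python
from dataclasses import dataclass

@dataclass
class AffectProfile:            # dimensional representation of an affective state
    valence: float              # -1.0 (unpleasant) .. +1.0 (pleasant)
    arousal: float              #  0.0 (low)        ..  1.0 (high)

@dataclass
class CulturalProfile:          # dimensional representation after Hofstede's five values
    power_distance: float
    individualism: float
    uncertainty_avoidance: float
    masculinity: float
    long_term_orientation: float

def affect_category(a: AffectProfile) -> str:
    """Categorical view obtained by thresholding the dimensions (illustrative thresholds)."""
    return (("pleasant" if a.valence >= 0 else "unpleasant")
            + ", " + ("high arousal" if a.arousal >= 0.5 else "low arousal"))

def cultural_category(c: CulturalProfile) -> str:
    return "individualist" if c.individualism >= 0.5 else "collectivist"

print(affect_category(AffectProfile(0.6, 0.2)))  # -> "pleasant, low arousal"
```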
4 Formalizing the Interplay of Culture and Affect through Formal Ontology Engineering
The previously presented approaches mainly focus on developing representations that allow comparisons between affective or cultural elements. However, as mentioned, the dependency of those methods on folk language and its inherent fuzziness raises risks of personal interpretation, misconception or overgeneralization, thus making objective comparisons difficult. Several scholars have already tried to bypass this issue by addressing the structural nature of affect and culture through meta-analysis techniques. Some of this research was of particular importance for our project and is introduced in the next section.
4.1 Previous Structural Analyses
Recently, Klaus Scherer produced a framework of the affect domain that shares many concerns with ontology engineering [27]. Scherer's framework is interesting for our project, first, because it tries to consider all already identified aspects of affective experiences (i.e. their cognitive, neuro-physiological and behavioural dimensions). Following this, six different affective processes, sometimes confused in folk
language, are clearly discriminated from each other, and described as multi-component processes that affect various physiological, cognitive, and behavioural subsystems (Table 1). Furthermore, several design features are also identified, whose variations distinguish the different kinds of affective processes (i.e. event focus, appraisal driven, response synchronization, rapidity of change, behavioural impact, intensity, duration). The importance of each of the latter features for identifying the different kinds of affective processes is also clearly established [27]. Regarding the specific affective process of emotion, Scherer emphasizes its distinction from "feeling", an emotion being "the total multimodal component process", whereas a feeling should be seen as "a single component [of any affective process] denoting the subjective experience process". Scherer also disambiguates aesthetic emotions, "produced by the appreciation of the intrinsic qualities of the beauty of nature, or the qualities of a work of art or an artistic performance", from utilitarian emotions, "facilitating our adaptation to events that have important consequences on wellbeing". Basic emotions generally refer to the latter category.

Table 1. Scherer's list of affective processes and their related descriptions

Emotions: "An episode of interrelated, synchronized changes in the states of all or most of the five organismic subsystems in response to the evaluation of an external or internal stimulus event as relevant to major concerns of the organism".
Moods: "Diffuse affect states, characterized by a relative enduring predominance of certain types of subjective feelings that affect the experience and behavior of a person"; "often emerge without apparent causes"; "generally of low intensity".
Preferences: "Relatively stable evaluative judgments in the sense of liking or disliking a stimulus, or preferring it or not over other objects or stimuli".
Attitudes: "Relatively enduring beliefs and predispositions towards specific objects"; "can be labeled with terms such as hating, valuing or desiring".
Affect dispositions: "Tendency of a person to experience certain moods more frequently or to be prone to react with certain types of emotions".
Interpersonal stance: "Affective style that spontaneously develop, or is strategically employed in the interaction with the person or a group of persons"; examples: "being polite, distant, cold, warm, supportive, contemptuous"; "often triggered by events (encounter of a person), but less shaped by spontaneous appraisal than by affect dispositions, interpersonal attitudes, and most importantly strategic intentions".
Several studies inform us about the structure of cultural elements. In cross-cultural psychology, Kashima [13] found that scholars identify a culture either as "a process of production and reproduction of meanings in particular actors' concrete practices (or actions or activities) in particular contexts in time and space", or as a "relatively stable system of shared meanings, a repository of meaningful symbols, which provides structure to experience".
Dawkins [6] popularized a vision of culture and its evolution inspired by genetics, where memes (i.e. cultural elements) are transmitted through individuals. If a meme provides social advantages to its owner, then it is more likely to be transmitted
and to become a genuine part of a culture. Several other scholars [9] have extensively discussed the interest and limitations of this approach, particularly in its mental (cognitive) dimension [22]. UNESCO is also a natural source of information for cultural comprehension, and defines culture as "the set of distinctive spiritual, material, intellectual and emotional features of society or a social group, [...] it encompasses, in addition to art and literature, lifestyles, ways of living together, value systems, traditions and beliefs" [32]. Representing cultural heritage, tangible or intangible, is also an important aspect to consider in our project [2]. Finally, our project is rooted in concepts developed for the YATO upper ontology project [18], which is introduced in the next section.
4.2 Overview of YATO
According to the IEEE Standard Upper Ontology Working Group, an upper ontology "is limited to concepts that are meta, generic, abstract and philosophical, and therefore are general enough to address (at a high level) a broad range of domain areas. Concepts specific to given domains will not be included; however, this standard will provide a structure and a set of general concepts upon which domain ontologies (e.g. medical, financial, engineering, etc.) could be constructed". The following is a short summary of some of YATO's main concepts. Entity, "something which exists independently of others", is divided into three sub-kinds of concepts, the first two of them being common in the ontology literature.
− Abstract entities are "things that need neither 3D space nor time to exist" (such as truth).
− Physical (or concrete) entities are things "that need both 3D space and time to exist". Physical has two sub-categories: occurrent, which can evolve mainly in the time dimension (such as a process), and continuant, which can evolve mainly in 3D space (such as an artefact).
− Semi-abstract entities are introduced in YATO as things that "need only time to exist". Representation is an important kind of semi-abstract entity. Indeed, as in philosophy, YATO clearly makes a distinction between an element and its representation, described as a "content-bearing thing".
YATO also extensively discusses the notions of quantity and quality, among other things. Readers interested in taking a closer look at YATO can browse it online at [18]. Our own project can be seen as an extension of YATO to deal with cultural issues and is implemented with the same ontology builder tool, called HOZO. HOZO is based on a theory of roles described in [19]. For instance, depending on the context, an instance of a human agent may have, say, the role of a teacher, a nurse, etc., and HOZO allows one to explicitly mention the role of a concept class in its context of use. Any kind of inter-concept relation can also be defined and used to create more cohesive models of a domain. "Is-a" links are particularly important: they allow grouping "families" of concepts (a root concept and its specializations, recursively). The internal structure of concepts can also be represented by specifying its essential part (p/o) and attribute (a/o) links.
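HOZO models are graphical and are not reproduced here; the sketch below merely illustrates in plain Python how the kinds of links just mentioned (is-a specialization, part-of (p/o) and attribute-of (a/o) slots) could be encoded, using YATO's top-level entity distinctions as example concepts. It is an illustrative encoding under those assumptions, not HOZO's actual representation or file format.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Concept:
    name: str
    is_a: Optional["Concept"] = None                 # specialization link
    parts: dict = field(default_factory=dict)        # p/o slots: essential parts
    attributes: dict = field(default_factory=dict)   # a/o slots: attributes

# YATO's top-level distinctions, as summarized in the text
entity = Concept("entity")
abstract = Concept("abstract entity", is_a=entity)         # needs neither 3D space nor time
physical = Concept("physical entity", is_a=entity)         # needs both 3D space and time
occurrent = Concept("occurrent", is_a=physical)            # evolves mainly in time (e.g. process)
continuant = Concept("continuant", is_a=physical)          # evolves mainly in 3D space (e.g. artefact)
semi_abstract = Concept("semi-abstract entity", is_a=entity)   # needs only time to exist
representation = Concept("representation", is_a=semi_abstract,
                         attributes={"described as": "content-bearing thing"})

def ancestors(c: Concept):
    """Walk the is-a chain of a concept up to the root."""
    while c.is_a is not None:
        c = c.is_a
        yield c.name

print(list(ancestors(continuant)))  # ['physical entity', 'entity']
```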
4.3 Basic Concepts at the Interplay of Culture and Affect
As seen in the previous sections, there are many domains that may be important for considering the interplay of culture and affect. The structure of the culture concept itself has to be discussed, because several of its parts can induce affective experiences. We also need to discuss the identity of an affective experience, i.e. what its essential parts and properties are, and what its different kinds are. Furthermore, it appears necessary to find a way to describe the context (or situation) of occurrence of these experiences, which is an inherent part of them. Finally, having some conceptualization of the mental (cognitive) world appears to be a prerequisite to any further development, since it is the location of much of this interplay.
Cognitive World. Two families of concepts have to be distinguished: mental atoms of information (such as thoughts in YATO) and mental processors for managing this information. Both concepts have been the subject of intense discussion in cognitive science. YATO identifies top mental processors as single mind (related to a singleton agent, such as a human being) and collective mind (shared through a complex agent such as a cultural group or a multi-agent system). We see top mental processors as compositions of several lower-level mental processors, whose identity refers to the specific processing task they are in charge of. Until now, we have mainly focused on the memory processors, in charge of memory management functions (other processors still have to be described). For a long time, research has divided the latter into three kinds of sub-modules [3]: a sensory memory module for retaining sensorial information after the end of the sensorial experience, a long-term memory module for long-term information storage, and a working memory module for temporarily storing and manipulating information. The latter is particularly interesting in the frame of this paper. It is frequently described as a limited buffer. Intense affective experiences are said to lower the amount of such memory available for cognitive processing. Elements that are culturally uncommon are said to require more working memory than those that are common. Moreover, affective cues are known to facilitate memory recall (i.e. transferring thoughts from long-term memory to working memory). Mental information, or thought, has also received a lot of attention from the research community, with two kinds of information frequently mentioned [1]: declarative memory, that is, fact-like information, and procedural memory, that is, skills or procedures. Collective thought is a concept similar to shared cognition, which has been discussed by the multi-agent research community [20] among others. All those memory concepts have been elaborated much further since this early research, leading to efficient cognitive models such as ACT-R [31], but due to space constraints we cannot present further details in this paper.
Context. The complete genesis of our conceptualization of centered context can be found in [5]. In summary, a centered context is objective. It is a subset (i.e. a set of parts) of a related world (3D world, social world, political world, cultural world, emotional world) that surrounds a context center. For each elicited dimension, contextual relations between its center and its parts are enumerated.
Primitive contexts are unidimensional contexts (spatial context, temporal context), but contexts of the real world (such as a cultural context) are mainly multidimensional. They are called composite contexts, and are elicited as an association of primitive and/or lower-level composite contexts. Note that a composite context is more complex than the sum of its parts.
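A purely illustrative encoding of this composition of contexts is sketched below: primitive contexts are unidimensional, and composite contexts aggregate primitive and/or lower-level composite contexts around a shared center. The class and field names are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class PrimitiveContext:
    dimension: str          # e.g. "spatial", "temporal"
    center: str             # the context center the relations are anchored to
    relations: List[str] = field(default_factory=list)  # relations between center and parts

@dataclass
class CompositeContext:
    name: str
    center: str
    parts: List[Union["PrimitiveContext", "CompositeContext"]] = field(default_factory=list)

# Example: a (simplified) cultural context built from primitive contexts around the same center
user = "user_1"
cultural_context = CompositeContext(
    name="cultural context",
    center=user,
    parts=[PrimitiveContext("spatial", user, ["located in Montréal"]),
           PrimitiveContext("temporal", user, ["during a lecture"])])
```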
A second kind of context, which we call mental context, is subjective and refers to the set of memories that come to mind when stating a name or a situation (for instance, when someone is asked to think about a "medical context"). Such a context depends largely on personal experience, and is thus highly culturally sensitive. It can also easily trigger affective reactions.
Affective Experience. Our conceptualization of affective experience strongly draws on Scherer's meta-analysis presented in [27]. An affective experience (Figure 1) is a multidimensional process made of a cognitive dimension, a neuro-physiological dimension and a subjective dimension (feelings). (In Figures 1 and 2, the (U) symbol indicates concepts already elicited in YATO.) Such a process has an owner (the agent in which it occurs), and is strongly sensitive to its context of occurrence. It will eventually produce a behavioural response, which may vary according to its intensity. Intensity (arousal) itself is related to both cognitive arousal and neuro-physiological arousal. The affective process may have been triggered by affective antecedent(s), which are specific parts of its context of occurrence (in one or several contextual dimensions).

Fig. 1. Structure of the affective process concept

We considered the same list of affective processes as the one defined in Scherer's framework (see Table 1). However, there are structural distinctions between mood and emotion (individual affective modulators, related to individual regulation through physiological and cognitive influence), attitude and interpersonal stance (interaction affective modulators, related to the regulation of external interactions), and preference and affect disposition (personality informers, which describe the affective dimension of personality). Finally, a blended affective process describes a multi-component affective experience whose components (i.e. affective processes) cannot be considered individually.

Fig. 2. Structure of the culture concept

Culture. Culture's identity (Figure 2) refers to cultural elements that the owner of the culture (its related cultural group) has produced or endorsed (through historical processes such as conquests). Such elements can
be artefacts, practices (such as rituals, language, or common behaviours), or ideational elements (such as norms, scientific knowledge or beliefs). Interactions of members of the cultural group with any of these elements can be characterized by affective reactions. The same members can also share perceptions of such elements that may be unknown to foreigners, as well as original representations of the world that are not specifically related to reality (such as stereotypes). Because they are difficult for a foreigner to understand, such representations may lead to misinterpretations and trigger affective reactions.
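To make the concept structures of Figures 1 and 2 concrete, the sketch below renders the main parts described in the text as plain data classes. The field names are paraphrases of the textual description, not the exact slots of the authors' HOZO models.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AffectiveProcess:
    owner: str                       # the agent in which the process occurs
    cognitive_dimension: str
    neurophysiological_dimension: str
    subjective_dimension: str        # the "feeling" component
    context_of_occurrence: str
    antecedents: List[str] = field(default_factory=list)  # parts of the context that triggered it
    intensity: float = 0.0           # related to cognitive and neuro-physiological arousal
    behavioural_response: str = ""

@dataclass
class Culture:
    owner_group: str                                        # the related cultural group
    artefacts: List[str] = field(default_factory=list)
    practices: List[str] = field(default_factory=list)      # rituals, language, common behaviours
    ideational_elements: List[str] = field(default_factory=list)  # norms, knowledge, beliefs
    shared_perceptions: List[str] = field(default_factory=list)
    representations_of_world: List[str] = field(default_factory=list)  # e.g. stereotypes
```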
5 Conclusion and Future Work
In this paper, we have explored the interplay between culture and affect. We have introduced our reflection on important elements to be considered when addressing this interplay, as part of our long-term project of developing a formal ontology to allow artificial cultural awareness. The concepts we are structuring also constitute useful information for those interested in affective computing. The work presented here could guide the development of data structures to manipulate cultural and/or affective concepts. It could also inform the development of new cognitive models for enhancing learner representation, as well as the design of more realistic autonomous agents. Developing a formal ontology is a long journey. It is only through discussions, corrections and agreements with other scholars, and through the subsequent successful development of several culturally and/or emotionally-aware systems, that we will be able to consider our project as stabilized.
References 1. Anderson, J.R.: Language, Memory and Thought. Erlbaum, Mahwah (1976) 2. Aroyo, L., Hyvönen, E., Van Ossenbruggen, J.: First Workshop on Cultural Heritage on the Semantic Web, held in conjunction with ISWC 2007, Busan, Korea (2007), http://www.cs.vu.nl/~laroyo/CH-SW/ISWC-wp9-proceedings.pdf 3. Atkinson, R.C., Shiffrin, R.M.: Human memory: A proposed system and its control processes. In: Spence, K.W., Spence, J.T. (eds.) The psychology of learning and motivation, vol. 8. Academic Press, London (1968) 4. Blanchard, E.G., Allard, D.: First Workshop on Culturally-Aware Tutoring Systems (CATS 2008), held in conjunction with ITS 2008, Montréal, Canada (2008), http://www.iro.umontreal.ca/~blanchae/CATS2008/CATS2008.pdf 5. Blanchard, E.G., Mizoguchi, R.: Designing Culturally-Aware Tutoring Systems: Towards an Upper Ontology of Culture. In: Blanchard, E.G., Allard, D. (eds.) 1st Workshop on Culturally Aware Tutoring Systems, Montreal, Canada (2008) 6. Dawkins, R.: The selfish gene. Oxford University Press, Oxford (2006) 7. Ekman, P.: An argument for basic emotions. Cognition and Emotion 6, 169–200 (1992) 8. Fontaine, J.R., Scherer, K.R., Roesch, E.B., Ellsworth, P.C.: The world of emotion is not two dimensional. Psychological Science 18(2), 1050–1057 (2007) 9. Henrich, J., Boyd, R.: On modeling Cognition and Culture: Why culture evolution does not require replication of representations. Journal of Cognition and Culture 2(2), 87–112 (2002)
10. Hofstede, G.: Culture’s consequences: Comparing values, behaviors, institutions, and organizations across nations, 2nd edn. Sage, London (2001) 11. House, R.J., Hanges, P.J., Javidan, M., Dorfman, P., Gupta, V.: Culture, leadership and organizations: The Globe study of 62 societies. Sage, Thousands Oaks (2004) 12. Ishida, T., Fussell, S., Vossen, P.: IWIC 2007. LNCS, vol. 4568. Springer, Heidelberg (2007) 13. Kashima, Y.: Conceptions of Culture and Person for Psychology. Journal of Cross-cultural Psychology 31(1), 14–32 (2000) 14. Kirkman, B.L., Lowe, K.B., Gibson, C.B.: A quarter century of culture’s consequences: a review of empirical research incorporating Hofstede’s cultural values framework. Journal of International Business Studies 37, 285–320 (2006) 15. Lazarus, R.: Emotion and Adaptation. Oxford University Press, New York (1991) 16. McSweeney, B.: Hofstede’s model of national cultural differences and their consequences: a triumph of faith – a failure of analysis. Journal of Human Relations 55(1), 89–118 (2002) 17. Mesquita, B., Frijda, N.H., Scherer, K.R.: Culture and Emotion, Handbook of CrossCultural Psychology. Basic processes and Developmental Psychology, vol. 2, pp. 255– 297. Allyn & Bacon, Boston (1997) 18. Mizoguchi, R.: Yet Another Top Ontology: YATO, Interdisciplinary Ontology Conference 2009 (2009), http://www.ei.sanken.osaka-u.ac.jp/hozo/ onto_library/upperOnto.htm 19. Mizoguchi, R., Sunagawa, E., Kozaki, K., Kitamura, Y.: A Model of roles within an ontology development tool: Hozo. J. of Applied Ontology 2(2), 159–179 (2007) 20. Panzarasa, P., Jennings, N.R.: Collective cognition and emergent in multi-agent systems. In: Sun, R. (ed.) Cognition and Multi-Agent Interaction: From Cognitive Modelling to Social Simulation, pp. 401–408. Cambridge University Press, Cambridge (2006) 21. Picard, R.: Affective Computing. MIT Press, Cambridge (1997) 22. Pyysiäinen, I.: Ontology of culture and the study of human behavior. Journal of Cognition and Culture 2(3), 167–182 (2002) 23. Rehm, M., André, E., Nakano, Y., Nishida, T.: Workshop on Enculturating Interfaces, held in conjunction with IUI 2008 (2008), http:// mm-werkstatt.informatik.uni-augsburg.de/documents/ECI/ 24. Robinson, M.D., Clore, G.L.: Belief and feelings: Evidence for an accessibility model of emotional self-report. Psychological Bulletin 128, 934–960 (2002) 25. Russell, J.: Pancultural aspects of human conceptual organization of emotion. Journal of Personality and Social Psychology 45, 1281–1288 (1983) 26. Scharifian, F.: On cultural conceptualisations. Journal of Cognition and Culture 3(3), 187– 207 (2003) 27. Scherer, K.R.: What are emotions? And how they can be measured? Social Science Information 44(4), 695–729 (2005) 28. Schwartz, S.H.: Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. In: Zanna, M.P. (ed.) Advances in experimental social psychology, vol. 25, pp. 1–65 (1992) 29. Scollon, C.N., Diener, E., Oishi, S., Biswas-Diener, R.: Emotions across cultures and methods. Journal of Cross-Cultural Psychology 35(3), 304–326 (2004) 30. Smith, B.: Ontology. In: Floridi, L. (ed.) Blackwell guide to the philosophy of computing and information, pp. 155–166. Blackwell, Oxford (2003) 31. Taatgen, N., Lebiere, C., Anderson, J.: Modelling paradigms in ACT-R. In: Sun, R. (ed.) Cognition and Multi-Agent Interaction: From Cognitive Modelling to Social Simulation, pp. 29–52. Cambridge University Press, Cambridge (2006) 32. 
UNESCO, Universal Declaration on Cultural Diversity, Mexico City (1982)
Love at First Encounter – Start-Up of New Applications Henning Breuer1, Marlene Kettner1, Matthias Wagler2, Nathalie Preuschen3, and Fee Steinhoff1 1
Deutsche Telekom Laboratories, Ernst-Reuter-Platz 7, 10587 Berlin 2 Intuity Media Lab GmbH, Feuerseeplatz 14, 70176 Stuttgart 3 T-Mobile International, Landgrabenweg 151, 53227 Bonn {henning.breuer, marlene.kettner}@telekom.de
Abstract. Whereas most research on usability focuses on known applications, we explore the first encounters. When starting up new applications, expectancy, impression management, initial dialogues and acquaintance, and the ritualizing of operations have to be handled. We present the research approach and document short histories of learning and fascination. Focussing on business users of mobile services, we conducted diary research and expert interviews, reviewed design guidelines, and conducted a pattern-driven and resource-oriented innovation workshop. We present insights and results from the synthesis of guidelines, and ideas translated into experience prototypes. Keywords: Start-up, seven touchpoints, learnability, service innovation, creativity, diary research, experience design.
1 Introduction
Users encounter new products and services along seven touchpoints: they become aware of an opportunity, inform themselves about tradeoffs, buy what they choose (or buy into), start up new devices, applications and services, and they use, change, and drop or renew them [11]. From usability testing we know the value of the first moments in which a user encounters a new artefact. Asking users to explore undirected and to report what comes to mind yields valuable insights into initial orientation and attention. Few companies like Apple manage to stretch the start-up phase up front into the first shows announcing new products. Microsoft's earcon design succeeded in making it a daily audition. Still, most companies and researchers neglect the first encounter as such. The start-up phase is of crucial importance: within this phase users decide whether they will accept a product, configure basic settings with lasting impact, and explore how to integrate it into their everyday practices. Especially for business customers start-up may be a critical phase. While for consumer products we may assume that users have approached the product through the first three touchpoints and start up with intrinsic motivation, business customers of mobile devices and applications may lack this motivation if, e.g., the IT department selected the product.
2 Previous Works
Start-up takes relatively little time, but creates a lasting impression. Still, hardly any research on human-computer interaction addresses start-up as such. Related works deal with package design and unboxing experiences, intuitive usability implying hedonistic and pragmatic factors, and learnability. Literature on persuasive technology describes tools to increase users' interest in new applications.
According to Haramundanis [9], good learnability in information design can be attributed to content (memorability, logic, reconstructability, consistency of the material) and formal aspects (legibility, readability and usability). Chak [4] describes the decision process as a transformation between four stages (browsers, evaluators, transactors and customers), providing guidelines for each stage; these stages are similar to the first three touchpoints, but gather the last stages into one. Some of his findings for "customers" could be applied to the situation we researched, such as the provision of a channel for (negative) customer feedback, responding to customer problems on a personal level, and keeping in touch. Both Cohen [5] and Chak [4] focus on the medium of web interfaces, whereas a telecommunication provider has a much broader set of channels to be used for interaction with the customer.
Fogg [7] identifies seven types of tools that can be addressed in technology to persuade users toward certain behaviour. Tunnelling technology provides users with a clear overview of the process and next steps: not having to think about what to do next helps them focus on content. Other tools include customization, simplification, suggestion technology for opportune moments, self-monitoring, surveillance by others, and reinforcing behaviour for conditioning. Fogg emphasizes the strong persuasive power of social interaction patterns being reproduced in HCI through simulation, as well as the power of knowing (or believing) that one interacts with real people. Although he applies these guidelines to digital interfaces, we believe these principles can be used in other communication channels. In different sources on persuasive design, the establishment of trust is regarded as a basis [7]; this we found strongly confirmed in our research. Still, as most of these are general principles of usability, they do not differentiate between touchpoints or user characteristics. While learnability plays an essential role in start-up, neither is fully implied in the other.
3 Research Design and Project Setup
We distinguish three roles: users, IT Experts and Sponsors (Management). The Sponsor makes the decision for or against a purchase, based on conscious and unconscious reasons, and is very prone to persuasion [7]. His goal is to make the investment a success for the company. Assisting him in the implementation is either an internal or an external IT Expert, who will set up the product and support its well-functioning. The user has to live with the product, but he determines how and whether a product will really be used (or whether, as with many devices, he will resort to a few known features and ignore the new or non-obvious ones) and turned into a successful tool for the company. Marketing and sales of new products target the initial decision makers, the Sponsor and the IT Experts. But how do we persuade the users? For them, the decision process
has to be repeated, keeping in mind that they find other benefits in a product than the Sponsor and IT Expert do, in order to really get the product accepted. For our research we were interested in two extreme user scenarios: "Innovative users" from innovation-affine industries (e.g. Software Developer), and "Follower users" from more conservative service industries (e.g. Law Firm). How should each user group be addressed, and may some things fit for all?
The product we focused on is an innovative communication tool for very small to medium-sized companies. The product integrates fixed and mobile phones with PCs. It enables a new approach to more efficient teamwork with features like a shared address book including an availability overview of staff. In order to make this product a successful tool for the team and company, the behaviour of each individual user has to change: he has to adapt to the new environment, but this is driven by the benefits he sees for himself. As the product is a team-based application, the success of most features highly depends on quick acceptance by a high number of users within the group. Looking at the product itself, we found that some features could provide initial benefits; others will show their value over time. Therefore we analyzed different types of features along a timeline of five subsequent sub-phases (Fig. 1).

Fig. 1. Five sub-phases of the start-up experience with new products

• Hearing about: The user hears about the potential availability, the upcoming introduction of a new business product in the company. Expectation management starts here. Since the user does not decide the purchase, a main challenge of the first touchpoints is winning his or her consent.
• First impression: During the first real encounter initial expectations are fulfilled or frustrated; potentials for surprise, but also for lasting reluctance to use the application, abound. As in "love at first sight" the first impression is a holistic experience: all senses play a role with the packaged product.
• Initial Dialogue: Initial user tasks include powering, set-up, and installation. First impressions are confirmed or changed. Curiosity to explore may be fed by constructivist instructional design and guidance.
• Getting to know each other: Users explore and customize the product or service. They learn how to navigate within the application and form a mental model of it.
• Establishing routines: Within the first three months users cultivate their interaction and get used to automatically handling the system according to their needs. Within activity theory [10] this sub-phase indicates the transition from action to operation.
Our question is how to design a product for love at first sight. We try to consider the user experience of the start-up process to its full extent, regarding not only physical and digital design aspects of the product, but also other channels (such as communication guidelines, special training of staff, or additional channels of information such as mail). Therefore we set up a unique research design of five concurrent phases:
• Desk research for benchmark examples from neighbouring fields, and derivation of guidelines and patterns for start-up.
• Expert interviews: Two experts from different but related fields (human resources and event management) were interviewed.
• Field research: Six users filled in user diaries for one week during their start-up phase with different akin devices.
• A design workshop targeting conceptual ideas. A resource-oriented innovation method (systematic inventive thinking) was applied and adapted.
• Synthesis and design exploration: digging into details, specifying different ideas down into one conceptual design for the start-up process, including mock-ups and instructions for important items and aspects of that scenario.
4 Guidelines for Start-Up
In order to gain a basis for the ideation workshop and to inform design decisions, we reviewed existing literature and conducted two expert interviews and a diary study. We summarize the insights from these studies into eleven guidelines. Our desk research aimed at the collection of existing concepts, best practices and design patterns for start-up. First impressions also have a significant influence on decisions made in other areas like marketing and human resource management. Hence we identified start-up experts from these fields and conducted semi-structured interviews with them to learn from their experiences. The goal was to gain insights about how people experience start-up and how the experts deal with that situation. Furthermore, we used a one-week diary study with 6 start-up users of mobile devices (MDAs, Blackberries) to gain qualitative information on how the start-up process is experienced. The diary contained a set of questions targeting experiences and actions during different stages of expectation, setup and the first days of usage. We gathered a number of citations and commonalities; e.g. the number of features used was in all cases highest shortly after receiving the device, and then levelled off to a lower degree, instead of increasing constantly as we had previously suspected. The essential guidelines we derived from these activities include the following:
• Give users a reason to join: When introducing a product, communicate clearly what it stands for, and what the user's personal advantage, his value, will be when using it. This reason might differ from other stakeholders' reasons. One simple sentence should be enough to get the idea across [1]. On Slide Rocket (www.sliderocket.com) the main message (Make great presentations) is supported not only by wording, but also by the visual language: the simplicity of layout and pictures subtly promises high-tech presentations, a simple and attractive tool. Details and additional advantages are communicated when exploring deeper into the product, in subtext (create, manage and deliver presentations online) and links.

Fig. 2. Start page of Slide Rocket

• Tell engaging stories: The consistent and well-orchestrated flow of experience, the "story", inherent in the product should lead through all phases of the start-up process (and all seven touchpoints). The core of this story ideally relates to the
Fig. 2. Start page of Slide Rocket
It should provide additional information and build up excitement at opportune moments. Good stories are not only consistent and memorable (learnable), but also stir the user's curiosity, making him explore further. Stories about a product can emphasize and add to the inherent narrative. We found in our desk research that successful stories are often told by people. On www.spotify.com, the story of music storage media is told in a cartoon with a human narrator.
• Use the mystery box: As in great stories, sometimes mystery is more important than knowledge – "holding back information intentionally is more engaging" [1]. Scarce bits of information spread about Apple products before the actual launch stir the excitement of their fans. For this principle to work, some primary interest has to be present and addressed in the user. Mystery should be used with caution and mastery in order not to frustrate, irritate and drive potential users away.
• Polaroids stick in your mind: Users judge quickly, with past experiences in mind. Addressing learned mental models can help users understand a product, or mislead them to wrong interpretations. For unknown products, affordances need to be designed to lead the user. Clues may point at new features. One participant of the diary research attributed known functionality to the central button of a new Blackberry, but it worked "like a joystick" – the initial judgement made the function harder to learn.
• Find the low-hanging fruits – challenge, reward and grow: The goal to be reached by using a new product should be worth the effort to the user. Easy initial goals reward with instant success and positive feedback, providing motivation. For new products, this may mean advertising and explaining simple functions first, levelling up as expertise increases. This mechanism is often used in gaming. Current blockbuster games like Spore and LittleBigPlanet make heavy use of this pattern to get people into the flow of the game. Lego manuals provide fast success first and then challenge the user with more difficult constructions based on the principles practiced earlier.
• Lower the first hurdle: Another way to make the goal worth the effort of getting to know a new product is to lower the first hurdle. A frictionless initial dialogue makes it much more rewarding to interact with a system, as goals are reached faster. Principles of tunnelling and simplification [7] are often used in setup wizards to guide people through the first steps.
Fig. 3. Detail from a Lego instruction manual
• Specify the effort and overall process: Expectations can be managed by letting people know what they need to do first, how long the process will take and what the next steps are; this provides a clear picture of the first hurdle. Status bars and progress meters are examples of how this can be visualized.
• Don't leave users hanging: When entering a new situation or starting a new application, users are easily irritated and usually do not know their options and the capacities of the product [1]. Clear information on where to get help at any time during the start-up process is crucial. Some participants in the diary study were irritated when familiar sources of help were missing ("there was no manual!") and not replaced by well-communicated alternatives.
• Provide input hints and prompts as additional clues to inexperienced users. On http://www.mixin.com/, an online shared calendar tool, the initial page is used to present the tool filled with sample content. This content itself thus explains the features. Users can try them one by one, in a freely chosen order, matching their own needs. Above that, the sample content gives the pleasant impression of not starting with an empty page.
• Accelerate initial connection making: For any application addressing social network mechanisms, fast connection making provides perfect "low-hanging fruits", as well as social recognition, one of the strongest persuasive powers [7]. Successful social network platforms help users build a network by inviting them, right after signing up, to search for their friends in the network. Facebook, for example, offers to send invitation mails to those of your friends who are not yet part of the community, using a snowball effect for viral marketing.
• Break the daily routines: In the interviews, we found that special events make people more receptive. Training units or team-building events serve as such occasions. An event may feel like a treat, and users are not distracted by everyday business. Informal breaks can serve the same purpose. Some companies introduced competitions for who adapts to a new system fastest or best, addressing gaming mechanisms and providing the winner with an extra benefit. One participant of the diary study stated that she did not like to try her new device at work while her colleagues worked on "real projects". For a good start-up process, it is important to provide or use such an opportunity to focus on getting acquainted with a new product.
• Make it yours – customizing: Addressing individual needs and habits can greatly increase the efficiency of the start-up process for different users. As users customize and hack, they explore and make a product their own.
Customization features – adding pictures, personalized sounds or materials to a product – increase identification with and acceptance of the product.
• Engage innovators as evangelists: Other users are usually considered the most reliable judges of a product, and people usually trust those they know more than strangers. Within most companies there are some early adopters who have more advanced knowledge of and interest in telecommunication technology than others. They may be the administrators of IT systems or simply colleagues who are regarded as specialists and are often the first asked when a problem comes up. Once these users are convinced, they can serve as evangelists promoting a product. The experts we interviewed relied heavily on such evangelist mechanisms. Companies like Adobe, Sun and Microsoft have professional evangelists who blog and answer questions in user forums. Basecamp features customer videos giving advice on features (http://www.basecamphq.com). Empowering users to serve as evangelists in public or within their immediate environment provides them with information and arguments that make them look good. Many of these advanced users may take pride in their position.
5 Systematic Idea Generation

In order to generate high-impact ideas to enhance the start-up experience we adopted the method of systematic inventive thinking [8, 2]. Starting with existing products and their characteristics, it helps generate ideas that can be easily produced and marketed with existing resources. This resource-oriented approach to innovation is supported by psychological research on "preinventive forms" [6], which may foster creative thinking more than targeting a specific purpose. The thinking "inside the box" [8] approach applies a set of patterns. Patterns are not only used to categorize ideas – as Altshuller [2], originator of the approach, originally did in his analysis of patents – but also to generate new product ideas [3]. The approach deconstructs a product or service and its immediate environment into component elements in order to reassemble them using, for example, five "patterns of innovation" [8]. To generate ideas for start-up, we worked with the patterns of multiplication – adding copies of components to achieve a qualitative change (the Gillette double-bladed razor is the classic example) – and task unification – assigning new tasks to elements that may then acquire the function of other elements (the suitcase with wheels absorbs a function of luggage carts) – but adapted the method to our needs. While the theory of systematic inventive thinking opposes customer-centric methods of generating innovation, we took existing design guidelines and insights from the user diary research and the expert interviews in neighbouring fields as input to direct the innovation process.

At first, we analyzed the values and benefits the start-up phase might provide to users. For instance, a user might want to maximise his benefit with little effort. We specified this value as "I want to learn new things fast", and further as "I want an instant sense of achievement", "I want to define myself what to achieve", and so on. This line of values was labelled self-determination. In a second step we collected the essential product features and attributes our company owns to support these values – such as its sales department, employees, clients, online services, and products
Fig. 4. “Team Screen” photo mock-up from Workshop session
with keyboards and screens. In a third step, we picked out several attributes one after another, applying one of the innovation patterns and asking how it could be employed to achieve a certain value. For each idea we listed its benefits, challenges and alternatives. In this manner a total of 34 ideas was described and elaborated upon. In order to prioritize the most promising ideas we voted on the most important values identified earlier. Insights such as the need for user participation, and open issues such as the assignment of responsibilities and roles involved in managing the start-up process, emerged from the discussion and were also documented. The most highly valued ideas were then assigned to the five subsequent sub-phases of start-up (the following are some examples):

• Hearing about: A client event where initial data is collected from the users to pre-configure the application.
• First impression: Use the packaging to communicate user values and benefits.
• Initial Dialogue: A "snowball activator" requires another participant to invite the user for an initial interaction in order to initiate the application.
• Getting to know each other: Additional gadgets may be used to provide easy access to basic functions, e.g. LED signals or an additional small desk screen showing the current state.
• Establishing routines: Different charging cables for usage at home or at work might automatically activate different usage profiles.

Other ideas, like an online support tutorial, could be used across several sub-phases. In the final part of the project, the guidelines we derived from the literature review, diary research and expert interviews, together with the prioritized ideas from the workshop, were used to synthesize results into design concepts supporting start-up as such and each of its sub-phases. For each we defined a set of aspects, as illustrated by the sample idea "Team Screen":

• Slogan: Experience team spirit together!
• User values: Proof of personal and team benefit, feedback for action, new experience, breaking routines, visualization of the existing team structure
• Objective: Visualize team dynamics (e.g. how calls are forwarded) to everyone; provide a channel for learning the effects of the product together
• Responsible: IT administrator, team members
• Description: A large screen provides an overview map or list of all team members, showing individual status and reachability settings as well as connections between users. This visualization helps users understand the product's features and effects. Introduced early in the start-up phase, this tool may help users understand the overall concept of the product and see the application on their mobile devices as a miniature part of a whole.
• Challenges: Cost and feasibility, need for an additional tool, communicating to users that their privacy is not affected in a negative way
• Related start-up guidelines: Find the low-hanging fruits; don't leave users hanging; accelerate initial connection making; break the daily routines; make it yours – customizing

Since feasibility still has to be evaluated from a business-modelling and technological point of view, we cannot ensure that the solution will actually be implemented. But as we are working towards that goal, it would provide a valuable case for real-life evaluation once it is rolled out to the market.
6 Conclusions

Because start-up is a one-time experience, evaluating start-up experiences is a challenging task. Field trials and focus groups provide initial insights. We hope that market success in comparison to benchmark products will deliver an additional proof of concept and a reason to believe in our value propositions. With initial experience prototypes we have five users go through the initial start-up of our business application suite while thinking aloud, comparing it with benchmark products, and discussing it. From the product's point of view users move along seven touchpoints; from the users' point of view these products embellish and become part of their ordinary practices. The start-up process will differ with the circumstances in each company, and is prone to technical and organisational malfunctioning – something is bound to go wrong. With this project, we provide a toolbox to help users overcome these initial problems. We will then explore how to transfer these guidelines to other products and services.
References

1. Adaptive Path (2008), http://www.adaptivepath.com/ideas/reports/signup/
2. Altshuller, G.: Innovation Algorithm: TRIZ, systematic innovation and technical creativity. Technical Innovation Center, Worcester, Mass (1999)
3. Breuer, H., Baloian, N., Matsumoto, M., Sousa, C.: Interaction Design Patterns for Classroom Environments. In: HCI International Conference, Beijing, China, July 22-27, 2007. LNCS. Springer, New York (2007)
4. Chak, A.: Submit Now: Designing Persuasive Websites. New Riders Publishers (2002)
5. Cohen, M.: Persuasive Design in Action: PET Design Expert Reviews. White Paper, Human Factors International (2008)
6. Finke, R.A.: Creative imagery: Discoveries and inventions in visualization. Erlbaum, Hillsdale (1990)
7. Fogg, B.J.: Persuasive Technology: Using Computers to Change What We Think and Do. Morgan Kaufmann Publishers, San Francisco (2003)
8. Goldenberg, J., Horowitz, R., Levav, A., Mazursky, D.: Finding your innovation sweet spot. Harvard Business Review (2003)
9. Haramundanis, K.: Learnability in Information Design. In: SIGDOC 2001, pp. 7–11 (2001)
10. Nardi, B.A. (ed.): Context and consciousness: Activity theory and human-computer interaction. MIT Press, Cambridge (1996)
11. Rogers, E.M.: Diffusion of Innovations. Free Press, New York (2003)
Responding to Learners' Cognitive-Affective States with Supportive and Shakeup Dialogues

Sidney D'Mello¹, Scotty Craig², Karl Fike², and Arthur Graesser²

¹ Department of Computer Science, ² Department of Psychology,
University of Memphis, Memphis, TN 38152, USA
{sdmello,scraig,karlfike,a-graesser}@memphis.edu
Abstract. This paper describes two affect-sensitive variants of an existing intelligent tutoring system called AutoTutor. The new versions of AutoTutor detect learners’ boredom, confusion, and frustration by monitoring conversational cues, gross body language, and facial features. The sensed cognitive-affective states are used to select AutoTutor’s pedagogical and motivational dialogue moves and to drive the behavior of an embodied pedagogical agent that expresses emotions through verbal content, facial expressions, and affective speech. The first version, called the Supportive AutoTutor, addresses the presence of the negative states by providing empathetic and encouraging responses. The Supportive AutoTutor attributes the source of the learners’ emotions to the material or itself, but never directly to the learner. In contrast, the second version, called the Shakeup AutoTutor, takes students to task by directly attributing the source of the emotions to the learners themselves and responding with witty, skeptical, and enthusiastic responses. This paper provides an overview of our theoretical framework, and the design of the Supportive and Shakeup tutors. Keywords: affect, emotion, affect-sensitive AutoTutor, ITS.
1 Introduction

Attempts to acquire a deep-level understanding of conceptual information through effortful cognitive activities such as a systematic exploration of the problem space, generating self-explanations, making bridging inferences, asking diagnostic questions, causal reasoning, and critical thinking often lead to episodes of failure, and the learner experiences a host of affective responses [1, 2]. Negative emotions are experienced when expectations are not met, failure is imminent, and important goals are blocked. For example, confusion occurs when learners face obstacles to goals, contradictions, incongruities, anomalies, uncertainty, and salient contrasts [3-5]. Unresolved confusion can lead to irritation, frustration, anger, and sometimes even rage. On the other hand, a learner may experience a host of positive emotions when misconceptions are confronted, challenges are uncovered, insights are unveiled, and complex concepts are mastered. Students who are actively engaged in the learning session may have a flow-like experience when they are so
engrossed in the material that time and fatigue disappear [6]. They may also experience other positive emotions such as delight, excitement, and even one of those rare eureka (i.e. "aha") moments. Simply put, emotions are systematically affected by the knowledge and goals of the learner, as well as vice versa [1, 2, 7]. Cognitive activities such as causal reasoning, deliberation, goal appraisal, and planning processes operate continually throughout the experience of emotion. Given this inextricable link between emotions and learning, it is reasonable to hypothesize that an Intelligent Tutoring System (ITS) that is sensitive to the affective and cognitive states of a learner would positively influence learning, particularly if deep learning is accompanied by confusion, frustration, anxiety, boredom, delight, flow, surprise and other affective experiences [8-11]. An affect-sensitive ITS would incorporate assessments of the students' cognitive, affective, and motivational states into its pedagogical strategies to keep students engaged, boost self-confidence, heighten interest, and presumably maximize learning. For example, if the learner is frustrated, the tutor would need to generate hints to advance the learner in constructing knowledge, and make supportive empathetic comments to enhance motivation. If the learner is bored, the tutor would need to present more engaging or challenging problems for the learner to work on.

However, a number of technological challenges need to be overcome before the benefits of an affect-sensitive ITS can be fully realized. An affect-sensitive ITS must be fortified with sensors and signal processing algorithms. Further, these elements must be robust enough to detect the affective states of a learner within real-time constraints. The tutoring system also needs to select pedagogical and motivational moves that maximize learning while positively influencing the learners' affect. The system might also synthesize affect through facial expressions and modulated speech.

We are in the process of implementing this two-phase strategy (affect detection and response) into AutoTutor. AutoTutor is an intelligent tutoring system that helps learners construct explanations by interacting with them in natural language and helping them use simulation environments [12]. AutoTutor helps students learn Newtonian physics, computer literacy, and critical thinking skills by presenting challenging problems (or questions) from a curriculum script and engaging in a mixed-initiative dialog while the learner constructs an answer. AutoTutor provides feedback to the student on what the student types, pumps the student for more information, prompts the student to fill in missing words, gives hints, fills in missing information with assertions, identifies and corrects misconceptions and erroneous ideas, answers the student's questions, and summarizes topics. While the current version of AutoTutor adapts to the cognitive states of learners, the affect-sensitive AutoTutor would be responsive to both the cognitive and affective states of learners [8].

The affect-detection phase focused on the development of computational systems that monitor conversational cues, gross body language, and facial features to detect the presence of boredom, engagement, confusion, and frustration (delight and surprise were excluded because they are extremely rare). These emotions were selected on the basis of previous empirical studies that used multiple methodologies (i.e.
observational, emote-aloud, and retrospective judgments by multiple judges) to monitor the emotions that learners experienced during tutoring sessions with AutoTutor [9, 13-15]. Automated systems that detect these emotions have been integrated into AutoTutor. They have been extensively discussed in previous publications [8, 16, 17] and will not be addressed here.
The other essential component of affect-sensitivity is to build mechanisms that empower AutoTutor to respond intelligently to these emotions, as well as to learners' states of cognition, motivation, social sensitivity, and so on. In essence, how can an affect-sensitive AutoTutor respond to the learner in a fashion that optimizes learning and engagement? The next phase of our research therefore focused on fortifying AutoTutor with the necessary pedagogical and motivational strategies to address the cognitive and affective states of the learner. This paper provides a synthesis of these research efforts.
2 Foundations of Affect Sensitivity Boredom, confusion, and frustration are negative emotions, and are states that, if addressed appropriately, can have a positive impact on engagement and learning outcomes. Flow, on the other hand, is a highly desirable positive affective state that is beneficial to learning. Although most tutoring environments would want to promote and prolong the state of flow, any intervention on the part of the tutor runs the risk of adversely interfering with the flow experience. Therefore, the current version of the affect-sensitive AutoTutor does not respond to episodes of flow. Instead, we focus on addressing the affective states of boredom, frustration, and confusion. At this point in science, there are no empirically proven strategies to address the presence of boredom, frustration, and confusion. Therefore, possible tutor reactions to student emotions were derived from two sources: theoretical foundations of pedagogy/affect, and recommendations made by pedagogical experts. 2.1 Theoretical Perspectives An examination of the literature provided some guidance on how best to respond to the states of boredom, confusion, and frustration. We focused on two major theoretical perspectives that address the presence of these negative emotions. These included attribution theory [18-20] and cognitive disequilibrium during learning [3-5, 13]. Attribution Theory to Address Boredom and Frustration. Attribution theory is based on the explanations people make to explain their success or failure. According to this theory, the cause of the success or failure can be based on three dichotomous factors: internal or external; stable or unstable; and controllable or uncontrollable. A basic principle of attribution theory is that a person's attributions for success or failure determine the amount of effort the person will expend on that activity in the future, and that people tend to make attributions that allow them to maintain positive views of themselves. So, success will be attributed to stable, internal, and controllable factors and major failures will be attributed to external, uncontrollable factors. However, it is important to get learners to change this failure attribution so that their failures are attributed to internal, unstable factors over which they have control (e.g., effort) [19,20]. In order to change this attribution, learners must be encouraged to focus on learning goals. People who emphasize learning goals are likely to seek challenges if they believe the challenges will lead to greater competence, and they tend to respond to failure by increasing their effort [7].
Empathy has been identified as an important emotional response for attributions [21]. In this case empathy serves two functions. First, displaying empathy portrays an awareness of blocked goals and a willingness to help. When displays of empathy are observed, the learner is more likely to anticipate the goals of the one displaying empathy [18]. In a tutoring context, for example, a tutor who displays empathy for the student signals that the tutor is attempting to help, which makes the student more likely to adopt the learning goals put forth by the tutor. Therefore, both boredom and frustration can be handled in similar ways through empathetic responses by the tutor.

Cognitive Disequilibrium Theory to Address Confusion. Cognitive disequilibrium is believed to play an important role in comprehension and learning processes [4, 5]. Deep comprehension occurs when learners confront contradictions, anomalous events, obstacles to goals, salient contrasts, perturbations, surprises, equivalent alternatives, and other stimuli or experiences that fail to match expectations [1, 22]. Cognitive disequilibrium has a high likelihood of activating conscious, effortful cognitive deliberation, questions, and inquiry that aim to restore cognitive equilibrium. When a learner enters a state of confusion due to the content they are learning, this is equivalent to entering cognitive disequilibrium. The tutor's first step should be to encourage the tutee to continue working so they can reach a state of equilibrium again and, by doing so, reap the full benefit of the state of disequilibrium. However, if the learner persists in a state of cognitive disequilibrium for too long, the tutor should display empathy with the learner's attempts, thereby acknowledging their efforts to reach their goals, and direct them out of the state of confusion before they give up.

2.2 Recommendations by Pedagogical Experts

In addition to theoretical considerations, the assistance of experts was enlisted to help create the set of tutor responses. Two experts in pedagogy, each with approximately a decade of related experience, were provided with excerpts from real AutoTutor dialogues (including both the tutor and student dialogue content, a screen capture of the learning environment, and video of the student's face, as illustrated in Figure 1). There were approximately 200 excerpts averaging around 20 seconds in length, each of which included an affective response by the student. The experts were instructed to view each of the excerpts and provide an appropriate follow-up response by the tutor. These example responses were placed into similar groups that loosely resembled production rules. For example, if a student is frustrated then the tutor should provide encouragement to continue and establish a small sub-goal, perhaps via a hint or a simplified problem. The tutor might also provide motivational and empathetic statements to alleviate frustration, because this approach has been shown to be quite effective in reducing frustration [21].
3 Strategies to Respond to Learners’ Affective States We created a set of production rules that addressed the presence of boredom, confusion, and frustration by amalgamating perspectives from attribution theory and cognitive disequilibrium theory with the recommendations made by the experts. Although
the rules created by the pedagogical experts allowed for any possible action on the part of the tutor, AutoTutor can only implement a portion of those actions. For example, one possibility to alleviate boredom would be to launch an engaging simulation or a seductive, serious game. However, the current version of the tutor does not support simulations or gaming, so such a strategy is not immediately realizable. Consequently, we only selected production rules that could be implemented with AutoTutor's current actions, which include feedback delivery (positive, negative, neutral), a host of dialogue moves (hints, pumps, prompts, assertions, and summaries), and facial expressions and speech modulation by AutoTutor's embodied pedagogical agent (EPA).

The production rules were designed to map dynamic assessments of the students' cognitive and affective states to tutor actions that address the presence of the negative emotions. There were five parameters in the student model and five parameters in the tutor model. The parameters in the student model included: (a) the current emotion detected, (b) the confidence level of that emotion classification, (c) the previous emotion detected, (d) a global measure of student ability (dynamically updated throughout the session), and (e) the conceptual quality of the student's immediate response. AutoTutor incorporates this five-dimensional assessment of the student and responds with: (a) feedback for the current answer, (b) an affective statement, (c) the next dialogue move, (d) an emotional display on the face of the EPA, and (e) emotional modulation of the voice produced by AutoTutor's text-to-speech engine.

As a complete example, consider a student who has been performing well overall (high global ability), but whose most recent contribution wasn't very good (low current contribution quality). If the current emotion was classified as boredom, with a high probability, and the previous emotion was classified as frustration, then AutoTutor might say the following: "Maybe this topic is getting old. I'll help you finish so we can try something new". This is a randomly chosen phrase from a list that was designed to indirectly address the student's boredom and to try to shift the topic a bit before the student becomes disengaged from the learning experience. This rule fires on several different occasions, and each time it is activated AutoTutor selects a dialogue move from a list of associated moves. In this fashion, the rules are context-sensitive and dynamically adaptive to each individual learner.

The subsequent section discusses each of the major components of the affect-sensitive AutoTutor. These include the short feedback, an emotional or motivational expression that is sensitive to the learners' affective and cognitive states, an emotionally expressive facial display, and emotionally modulated speech.

3.1 Short Feedback

AutoTutor provides short feedback to each student response. The feedback is based on the semantic match between the response and the anticipated answer. There are five levels of feedback: positive, neutral-positive, neutral, neutral-negative, and negative. Each feedback category has a set of predefined expressions that the tutor randomly selects from. "Good job" and "Well done" are examples of positive feedback, while "That is not right" and "You are on the wrong track" are examples of negative feedback.
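To make the rule-based mapping concrete, the following minimal Python sketch shows how one production rule of the kind just described might be represented. It is an illustration only: the field names, the numeric thresholds, the feedback cut-offs, and the choice of dialogue move and facial expression are our assumptions rather than the actual (fuzzy) rules implemented in AutoTutor; the two example phrases are quoted from the paper, but their grouping into a single list is assumed.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class StudentState:
    current_emotion: str          # e.g. "boredom", "confusion", "frustration", "neutral"
    emotion_confidence: float     # classifier confidence in [0, 1]
    previous_emotion: str
    global_ability: float         # running estimate of overall ability in [0, 1]
    contribution_quality: float   # semantic match of the latest answer in [0, 1]

@dataclass
class TutorResponse:
    feedback: str                 # one of the five short-feedback levels
    affective_statement: str
    dialogue_move: str            # hint, pump, prompt, assertion, or summary
    facial_expression: str
    speech_style: str             # label handed to the text-to-speech modulation

def feedback_level(match: float) -> str:
    """Map the semantic match score onto the five feedback levels.
    The cut-off points are invented for illustration."""
    for threshold, level in [(0.8, "positive"), (0.6, "neutral-positive"),
                             (0.4, "neutral"), (0.2, "neutral-negative")]:
        if match >= threshold:
            return level
    return "negative"

def bored_after_frustration_rule(s: StudentState) -> Optional[TutorResponse]:
    """Fires for a high-ability student whose latest answer was weak and who now
    appears bored (with high confidence) after an episode of frustration."""
    if (s.current_emotion == "boredom" and s.emotion_confidence >= 0.7
            and s.previous_emotion == "frustration"
            and s.global_ability >= 0.7 and s.contribution_quality <= 0.4):
        statement = random.choice([
            "Maybe this topic is getting old. I'll help you finish so we can try "
            "something new.",
            "Let's keep going, so we can move on to something more exciting.",
        ])
        return TutorResponse(
            feedback=feedback_level(s.contribution_quality),
            affective_statement=statement,
            dialogue_move="hint",                 # one move from the rule's list
            facial_expression="mild enthusiasm",
            speech_style="lively",
        )
    return None  # rule does not fire; other rules in the set are tried
```

A full system would hold many such rules and fall back to an affect-neutral discourse marker when none of them fires, as described in the next subsection.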
In addition to articulating the textual content of the feedback, the affective AutoTutor also modulates its facial expressions and speech prosody. Positive feedback is delivered with an approval expression (big smile and big nod). Neutral positive feedback receives a mild approval expression (small smile and slight nod). Negative
Fig. 1. Affect synthesis by embodied pedagogical agents
feedback is delivered with a disapproval expression (slight frown and head shake), while the tutor makes a skeptical face when delivering neutral-negative feedback (see Figure 1). No facial expression accompanies the delivery of neutral feedback.

3.2 Emotional Response

After delivering the feedback, the affective AutoTutor delivers an emotional statement if it senses that the student is bored, confused, or frustrated. A non-emotional discourse marker (e.g. "Moving on", "Try this one") is selected if the student is neutral. We are currently implementing two pedagogically distinct variants of the affect-sensitive AutoTutor: a Supportive and a Shakeup AutoTutor.

Supportive AutoTutor. The supportive AutoTutor responds to the learners' affective states via empathetic and motivational responses. These responses always attribute the source of the learners' emotion to the material instead of the learners themselves. So the supportive AutoTutor might respond to mild boredom with "This stuff can be kind of dull sometimes, so I'm gonna try and help you get through it. Let's go". A more encouraging response is required for severe boredom ("Let's keep going, so we can move on to something more exciting"). An important point to note is that the supportive AutoTutor never attributes the boredom to the student. Instead, it always blames itself or the material. A response to confusion would include attributing the source of confusion to the material ("Some of this material can be confusing. Just keep going and I am sure you
will get it") or the tutor itself ("I know I do not always convey things clearly. I am always happy to repeat myself if you need it. Try this one"). If the level of confusion is low or mild, then the pattern of responses entails: (a) acknowledging the confusion, (b) attributing it to the material or tutor, and (c) keeping the dialogue moving forward via hints, prompts, etc. In cases of severe confusion, an encouraging statement is included as well. Similarly, frustration receives responses that attribute the source of the frustration to the material or the tutor, coupled with an empathetic or encouraging statement. Examples include: "I may not be perfect, but I'm only human, right? Anyway, let's keep going and try to finish up this problem." and "I know this material can be difficult, but I think you can do it, so let's see if we can get through the rest of this problem."

Shakeup AutoTutor. The major difference between the shakeup AutoTutor and the supportive AutoTutor lies in the source of emotion attribution. While the supportive AutoTutor attributes the learners' negative emotions to the material or itself, the shakeup AutoTutor directly attributes the emotions to the learners. For example, possible shakeup responses to confusion are "This material has got you confused, but I think you have the right idea. Try this…" and "You are not as confused as you might think. I'm actually kind of impressed. Keep it up". Another difference between the two versions lies in the conversational style. While the supportive AutoTutor is subdued and formal, the shakeup tutor is edgier, flouts social norms, and is witty. For example, a supportive response to boredom would be "Hang in there a bit longer. Things are about to get interesting". The shakeup counterpart of this response is "Geez this stuff sucks. I'd be bored too, but I gotta teach what they tell me".

3.3 Emotional Facial Expressions

Seven facial expressions were generated for the affective AutoTutor: approval, mild approval, disapproval, empathy, skepticism, mild enthusiasm, and high enthusiasm. The Short Feedback section lists some of the conditions upon which these expressions are triggered. The supportive and shakeup responses are always paired with the appropriate expression, which can be neutral in some cases. Example affective displays are illustrated in Figure 1. The facial expressions in each display were informed by Ekman's work on the facial correlates of emotion expression [23]. For example, empathy is a sense of understanding displayed to the user. This is manifested by an inner eyebrow raise, open eyes, and lips slightly pulled down at the edges (action units 1, 5, 15; [24]). Skepticism is a combination of confusion and curiosity, characterized by a furrowing of the brow, an eye squint, and a raised outer eyebrow (action units 2, 4, 7) [25]. These displays were created with the Haptek Software Development Kit.

3.4 Emotionally Modulated Speech

The facial expressions of emotion displayed by AutoTutor are augmented with emotionally expressive speech synthesized by the agent. The emotional expressivity is obtained by variations in pitch, speech rate, and other prosodic features. Previous research has led us to conceptualize AutoTutor's affective speech along the indices of pitch range, pitch level, and speech rate [26].
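As a compact, hedged summary of Sections 3.3 and 3.4, the sketch below encodes the affective displays and the prosody dimensions as plain data. The action units for empathy and skepticism are the ones given above; the remaining action-unit entries and all numeric prosody values are placeholders, and the actual calls into the Haptek SDK and the text-to-speech engine are omitted because their APIs are not described here.

```python
# FACS action units are given in the text only for empathy and skepticism;
# the other displays are described verbally, so their AU lists are left open.
FACIAL_DISPLAYS = {
    "approval":        {"cue": "big smile and big nod", "action_units": None},
    "mild approval":   {"cue": "small smile and slight nod", "action_units": None},
    "disapproval":     {"cue": "slight frown and head shake", "action_units": None},
    "empathy":         {"cue": "inner brow raise, open eyes, lip corners down",
                        "action_units": [1, 5, 15]},
    "skepticism":      {"cue": "brow furrow, eye squint, raised outer brow",
                        "action_units": [2, 4, 7]},
    "mild enthusiasm": {"cue": None, "action_units": None},
    "high enthusiasm": {"cue": None, "action_units": None},
}

# Affective speech is conceptualized along pitch range, pitch level, and speech
# rate; the multipliers below are illustrative defaults, not reported values.
PROSODY_PROFILES = {
    "empathy":         {"pitch_level": 0.9, "pitch_range": 1.1, "speech_rate": 0.9},
    "skepticism":      {"pitch_level": 1.0, "pitch_range": 1.2, "speech_rate": 1.0},
    "high enthusiasm": {"pitch_level": 1.2, "pitch_range": 1.4, "speech_rate": 1.1},
}
```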
4 Conclusions

We have described a new version of AutoTutor that aspires to be responsive to learners' affective and cognitive states. The affect-sensitive AutoTutor aims to keep students engaged, boost self-confidence, and presumably maximize learning by narrowing the communicative gap between the highly emotional human and the emotionally challenged computer. We are currently conducting a study that evaluates the pedagogical effectiveness of the two affect-sensitive versions of AutoTutor compared to the original tutor. The original AutoTutor has a conventional set of fuzzy production rules that are sensitive to the cognitive states of the learner, but not to the emotional states of the learner. Both versions of the improved AutoTutor are sensitive to the learners' affective states in distinct ways. The obvious prediction is that learning gains and the learners' impressions should be superior for the affect-sensitive versions of AutoTutor. In addition to testing for learning gains, we will also compare learners' engagement levels while interacting with the different versions of AutoTutor. We will also test whether personality differences predict a preference for the Supportive versus the Shakeup AutoTutor. The affect-sensitive AutoTutor represents one of a handful of related efforts by researchers who share a similar vision [21, 27-30]. Our unified vision is to advance education, intelligent learning environments, and human-computer interfaces by optimally coordinating cognition and emotions. Whether the affect-sensitive AutoTutor positively influences learning and engagement awaits further development and empirical testing.
Acknowledgments This research was supported by the National Science Foundation (REC 0106965 and ITR 0325428). Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of NSF.
References 1. Mandler, G.: Another theory of emotion claims too much and specifies too little. Current Psychology of Cognition 4, 84–87 (1984) 2. Stein, N.L., Levine, L.J.: Making sense out of emotion. In: Kessen, A.O.W., Kraik, F. (eds.) Memories, thoughts, and emotions: Essays in honor of George Mandler, pp. 295– 322. Erlbaum, Hillsdale (1991) 3. Festinger, L.: A theory of cognitive dissonance. Stanford University Press, Stanford (1957) 4. Graesser, A.C., Olde, B.A.: How does one know whether a person understands a device? The quality of the questions the person asks when the device breaks down. Journal of Educational Psychology 95, 524–536 (2003) 5. Piaget, J.: The origins of intelligence. International University Press, New York (1952) 6. Csikszentmihalyi, M.: Flow: The Psychology of Optimal Experience. Harper and Row, New York (1990)
7. Dweck, C.S.: Messages that motivate: How praise molds students’ beliefs, motivation, and performance (in surprising ways). In: Aronson, J. (ed.) Improving academic achievement: Impact of psychological factors on education, pp. 61–87. Academic Press, Orlando (2002) 8. D’Mello, S., Graesser, A., Picard, R.W.: Towards an Affect-Sensitive AutoTutor. IEEE Intelligent Systems 22, 53–61 (2007) 9. Graesser, A.C., Chipman, P., King, B., McDaniel, B., D’Mello, S.: Emotions and learning with AutoTutor. In: 13th International Conference on Artificial Intelligence in Education 2007, pp. 569–571 (2007) 10. Lepper, M., Woolverton, M.: The wisdom of practice: Lessons learned from the study of highly effective tutors. In: Aronson, J. (ed.) Improving academic achievement: Impact of psychological factors on education, pp. 135–158. Academic Press, Orlando (2002) 11. Picard, R.: Affective Computing. MIT Press, Cambridge (1997) 12. Graesser, A., Lu, S.L., Jackson, G.T., Mitchell, H.H., Ventura, M., Olney, A., Louwerse, M.M.: AutoTutor: A tutor with dialogue in natural language. Behavioral Research Methods, Instruments, and Computers 36, 180–193 (2004) 13. Craig, S.D., Graesser, A.C., Sullins, J., Gholson, J.: Affect and learning: An exploratory look into the role of affect in learning. Journal of Educational Media 29, 241–250 (2004) 14. D’Mello, S.K., Craig, S.D., Sullins, J., Graesser, A.: Predicting Affective States expressed through an Emote-Aloud. Procedure from AutoTutor’s Mixed-Initiative Dialogue 16, 3–28 (2006) 15. Graesser, A.C., McDaniel, B., Chipman, P., Witherspoon, A., D’Mello, S., Gholson, B.: Detection of Emotions during learning with AutoTutor. In: Proceedings of the 28th Annual Conference of the Cognitive Science Society, Mahwah, NJ, pp. 285–290 (2006) 16. D’Mello, S.K., Craig, S.D., Witherspoon, A., McDaniel, B., Graesser, A.C.: Automatic detection of learner’s affect from conversational cues. User Modeling and User-Adapted Interaction 18, 45–80 (2008) 17. D’Mello, S., Graesser, A.: Automatic Detection of Learners. Affect from Gross Body Language 23, 123–150 (2009) 18. Batson, C.D., Turk, C.L., Shaw, L.L., Klein, T.R.: Information Function of Empathic Emotion - Learning That We Value the Others Welfare. Journal of Personality and Social Psychology 68, 300–313 (1995) 19. Heider, F.: The Psychology of Interpersonal Relations. John Wiley & Sons, New York (1958) 20. Weiner, B.: An attributional theory of motivation and emotion. Springer, New York (1986) 21. Burleson, W., Picard, R.: Evidence for Gender Specific Approaches to the Development of Emotionally Intelligent Learning Companions. IEEE Intelligent Systems 22, 62–69 (2007) 22. Schank, R.C.: Explanation Patterns: Understanding Mechanically and Creatively. Erlbaum, Hillsdale (1986) 23. Ekman, P., Friesen, W.V.: The Facial Action Coding System: A Technique For The Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto (1978) 24. Chovil, N.: Discourse-oriented facial displays in conversation. Research on Language and Social Interaction 25, 163–194 (1991) 25. Craig, S.D., D’Mello, S., Witherspoon, A., Graesser, A.: Emote aloud during learning with AutoTutor: Applying the facial action coding system to cognitive-affective states during learning. Cognition & Emotion 22, 777–788 (2008) 26. Johnstone, T., Scherer, K.R.: Vocal communication of emotion. In: Lewis, M., HavilandJones, J. (eds.) Handbook of Emotions, 2nd edn., pp. 220–235. Guilford Press, New York (2000)
27. Conati, C.: Probabilistic assessment of user’s emotions in educational games. Applied Artificial Intelligence 16, 555–575 (2002) 28. Litman, D.J., Forbes-Riley, K.: Predicting student emotions in computer-human tutoring dialogues. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, Barcelona (2004) 29. McQuiggan, S.W., Mott, B.W., Lester, J.C.: Modeling self-efficacy in intelligent tutoring systems: An inductive approach. User Modeling and User-Adapted Interaction 18, 81–123 (2008) 30. Woolf, B., Burleson, W., Arroyo, I.: Emotional Intelligence for Computer Tutors. In: Workshop on Modeling and Scaffolding Affective Experiences to Impact Learning at 13th International Conference on Artificial Intelligence in Education, Los Angeles, USA, pp. 6– 15 (2007)
Trust in Online Technology: Towards Practical Guidelines Based on Experimentally Verified Theory

Christian Detweiler and Joost Broekens

Man-Machine Interaction Group, Delft University of Technology, Delft, The Netherlands
{c.a.detweiler, d.j.broekens}@tudelft.nl
Abstract. A large amount of research attempts to define trust, yet relatively little research attempts to experimentally verify what makes trust needed in interactions with humans and technology. In this paper we identify the underlying elements of trust-requiring situations: (a) goals that involve dependence on another, (b) a perceived lack of control over the other, (c) uncertainty regarding the ability of the other, and (d) uncertainty regarding the benevolence of the other. Then, we propose a model of the interaction of these elements. We argue that this model can explain why certain situations require trust. To test the applicability of the proposed model to an instance of human-technology interaction, we constructed a website which required subjects to depend on an intelligent software agent to accomplish a task. A strong correlation was found between subjects’ level of trust in the software and the ability they perceived the software as having. Strong negative correlations were found between perceived risk and perceived ability, and between perceived risk and trust. Keywords: Trust, user modeling, empirical research.
1 Introduction

"Without trust the everyday social life which we take for granted is simply not possible" [1]. Trust is fundamental to everyday life. Relationships between people could probably never be built without trust. Clearly, trust is not only fundamental to our everyday social life, but also to many of our everyday interactions with technology. This is especially the case as we depend more and more on increasingly complex and even autonomous technology. As trust is such a fundamental aspect of everyday life, much effort has gone into defining trust. Yet, for all the effort spent on defining trust, surprisingly little effort has gone into experimentally verifying what makes any given situation require trust. What characteristics or features of a situation make that situation require trust? The purpose of the present work is to understand how the need for trust arises in given situations of human-technology interaction. We propose a model that describes the elements that characterize a trust-requiring situation and experimentally evaluate the proposed model. To achieve this, we will proceed as follows: in section 2 the topic of trust will be introduced. First, we will discuss what trust is not. Second, we will
discuss what trust is by reviewing some of the main definitions given in trust research, and distinguishing between various stages of trust. In section 3 our model will be presented. A method to test this applied model will be discussed in section 4, followed by the results of an experiment in section 5 and a discussion thereof in section 6. Finally, in section 7 conclusions will be drawn from these results and suggestions will be made for future work.
2 Background 2.1 What Trust Is Not Trustworthiness. Two terms that are often confused are trust and trustworthiness. Trustworthiness is a property of the person or thing being trusted (the trustee) as perceived by the person doing the trusting (the truster). Trust, on the other hand, is not a property of the parties involved in a situation of trust, but an attitude of the truster toward the trustee [2], or a mechanism that makes trusting behavior possible. However, trustworthiness is not an immutable perceived property. Once a truster has assessed the trustee's trustworthiness, formed an attitude of trust, and acted upon it, the outcomes of that interaction influence the perceived trustworthiness of the trustee. If the outcomes were beneficial to the truster, the perceived trustworthiness of the trustee will be confirmed or reinforced. If the outcomes of the interaction were detrimental to the truster, that is, the truster's trust was unjustified, the perceived trustworthiness of the trustee will decrease. The perceived trustworthiness of a trustee could be viewed as a record of properties of the trustee that are relevant to a situation of trust. Confidence. Trust is also frequently confused with confidence [3] [4]. Confidence can be described as a strong conviction based on substantial evidence or logical deduction [5]. On this view, confidence is an attitude that involves little regard of possible negative outcomes, because there is substantial evidence that the outcome will be positive. Trust, on the other hand, necessarily involves the consideration of possible negative outcomes. Trusting involves recognizing and especially accepting risk; it involves choosing one action in preference to others, in spite of the possibility of being disappointed [3]. One can choose not to take such a risk, but in doing so one forgoes the benefits associated with taking that risk. Faith. As a mental attitude faith is similar to the attitude of trust, though the concepts differ in an important way. Faith can be seen as an “emotionally charged, unquestioning acceptance”; it does not require evidence [5]. It is what we are left with if we remove all cognitive content from trust [6]. On this view, an attitude is formed that the outcome of the situation will be positive, but this attitude has little or no evidential basis, or no evidence is taken into account. The mental attitude of trust does involve an amount of deliberation. As mentioned earlier, it involves recognizing and accepting risk. In recognizing risk one identifies evidence for possible negative outcomes of the situation. One also willfully accepts the recognized risk based on evidence that a positive outcome is possible. “[T]rust is an expectation based on inconclusive evidence [and] is tolerant of uncertainty or risk” [5].
Reliance. The acts of reliance and trust both involve depending on someone or something. Reliance can be seen as "complete confidence, a presumptively objective state where belief is no longer necessary" [5]. Reliance does not necessarily involve the assessment of the possible outcomes of the act of reliance. One can rely on someone or something without trusting that person or thing [7]. The act of trust, on the other hand, necessarily involves forming expectations and taking risks. It involves a prior assessment of possible outcomes and an acceptance of the risk involved in taking the trusting action. It is tied to the attitude of trust that precedes the act, briefly described above.

2.2 What Trust Is

The main focus of this paper is identifying what makes trust needed, not defining what trust is. Nevertheless, it is useful to discuss some conceptualizations of trust to place the present work in perspective. Research on trust spans several disciplines, such as sociology, psychology, economics, and computer science, leading to a vast number of different definitions of trust. Some of the most influential works will be discussed here. Many definitions of trust assume the point of view of the truster. Trust is frequently defined as a set of expectations that the truster has regarding the actions of the trustee that are relevant to the truster [8]. Trust is also defined as a truster's subjective probability regarding the trustee's actions that are relevant to the truster [9]. Such views of trust are limited to the truster's expectation that someone (or something) will perform actions that are relevant to the truster's own actions. These conceptualizations describe an attitude that precedes trusting behaviors. In trusting behaviors "one party behaviorally depends on the other party" [10]. Trust is often defined as more than the truster's expectations. Many definitions assert that trust invariably involves an element of risk [6]. Trust is described as an attitude that allows for risk-taking decisions [4]. Trust is similarly defined as "the willingness of a party to be vulnerable to the actions of another party based on the expectation that the other will perform a particular action important to the trustor, irrespective of the ability to monitor or control that other party" [3]. This willingness may be one of the few characteristics common to all trust situations [11]. It also describes an attitude that precedes trusting behavior. The trusting behavior involves assuming risk [3], so the attitude of trust that precedes it must be a mechanism that enables one to cope with perceived risk.
3 Proposed Model of Trust-Requiring Situations

The studies described above and many others attempt to define what the attitude of trust is, what the attitude of trust is based on, or what the behavioral outcomes of trust are. Very few attempt to identify why trust is needed in certain situations. Here we will do exactly that: we will describe the type of situation in which trust is needed. For the remainder of this paper, we will view trust as a mechanism to cope with the uncertainty and perceived risk that the elements we describe bring forth.
3.1 Human-Human Interaction

Goals. Goals are the primary characteristic of trust-requiring situations. They can be described as desired outcomes (of an event) toward which efforts or actions are directed. Though there is a lack of explicit mention of goals in trust research, this does not mean that goals are ignored altogether. The basis of trust is indirectly explored in terms of the truster's goals by a number of authors [12]. Trust would not be a consideration if the truster did not depend on the trustee to perform actions conducive to the truster's goals [12].

Perceived Lack of Control. Goals alone are insufficient for a situation to require trust. The truster must also perceive a lack of control over the relevant actions of the trustee. Perceived control is understood here to be the power the truster perceives having to influence or direct the trustee's actions. Trust is only necessary if there is a perceived lack of adequate control, as not having control over the actions of the trustee is a source of uncertainty regarding the trustee's actions [13], and thus regarding the achievement of the truster's goals. Trusting another person might serve as a means of compensating for a perceived lack of control in a situation [14].

Perceived Ability. Some other factors have to be taken into consideration. In the type of situation described here, the truster does not merely assess his or her own goals and proceed to form a trusting attitude toward the trustee. If a truster has a goal that requires the actions of someone else to achieve, the truster will always make some form of assessment about the ability of that person to perform the actions necessary to achieve the truster's goals. Several authors have explicitly recognized the role of ability as an antecedent to trust, as reviews of the literature in [3] and [13] clearly demonstrate. On this view, ability can be considered to be the ability of the trustee that is relevant to the truster's goals. Without this ability, the truster's goals could not be achieved.

Perceived Benevolence. The truster must also perceive that the trustee is actually willing to enact the ability the truster perceives. In other words, the truster must perceive an amount of benevolence in the trustee. As a characteristic of the trustee perceived by the truster, benevolence can be understood as the intention or willingness of that trustee to carry out the actions required to achieve the truster's goals [3]. If the trustee were not willing to carry out these actions, the truster's goals could not be achieved.

Uncertainty and Perceived Risk. In most situations, the truster will be uncertain of the ability and benevolence of the trustee. This uncertainty entails a possibility, increased by the perceived lack of control, that the truster's goals will not be achieved. So, there is a perceived risk that the truster's goals will not be accomplished. In trust-requiring situations, uncertainty and risk can be seen as consequences of the elements of the model proposed here.
Interrelationship. Perceived full control eliminates the need for trust, as does certainty regarding the presence of both ability and benevolence by eliminating perceived risk. Certainty regarding the absence of either ability or benevolence leads to certainty that the goal will not be achieved, forcing the truster to find another trustee or redefine his goal and eliminating the need for trust. In all situations between these extremes, ranging from situations in which the achievement of the truster's goal is almost certain, to situations in which the truster's goal will almost certainly not be achieved, there is a need for trust. Situations toward the positive end of this spectrum require little trust, and in situations toward the negative end of the spectrum high amounts of trust are needed to depend on the trustee to perform the actions necessary to achieve the truster's goal. The amount of trust needed to depend on the actions of another is influenced by the truster's propensity to trust. A truster with a lower propensity to trust will require more trust in all situations within the described extremes. A truster with a high trust propensity will require less trust in said situations. The importance or intensity of the truster's goal also influences the level of trust needed. The interaction of these elements can be expressed with the following formula:

T = Gi (1 - C) ((1 - A) + (1 - B)) - Tp    (1)
where T is the amount of trust needed, Gi is the intensity of the goal, C is the amount of perceived control, A is the perceived amount of ability, B the perceived amount of benevolence, and Tp the propensity to trust, or baseline trust.

3.2 Human-Technology Interaction

The truster's goal, perceived lack of control, and perception of the ability of a specific technology then interact to determine the amount of trust needed to depend on that technology. Benevolence does not seem to be involved, as, strictly speaking, technology does not possess such intention or willingness. We hypothesize that the amount of trust needed to depend on a technology will increase as the intensity of the truster's goal increases, the perceived lack of control increases, and the uncertainty regarding the presence of ability increases (without there being certainty regarding inability). In a sense, the amount of trust required is a transaction cost of depending on the actions of another. Perceived control, perceived ability, and propensity to trust can lower this cost. The amount of trust the truster initially has while depending on the actions of the trustee will have to match the amount of trust required, or no dependence can take place. In the following section we describe a method used to test this model.
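To make the interaction of these elements concrete, the following minimal sketch implements equation (1) directly. It is an illustration added for this text rather than material from the original study; the assumption that every perceived quantity is scaled to [0, 1], and the option of dropping the benevolence term for a technological trustee, are ours.

```python
def trust_needed(goal_intensity, control, ability, benevolence=None, propensity=0.0):
    """Amount of trust needed to depend on a trustee, following equation (1).

    All inputs are assumed to be normalized to [0, 1]. For a technological
    trustee, benevolence may be omitted (None), reflecting the human-technology
    variant of the model in which B plays no role.
    """
    uncertainty = 1.0 - ability
    if benevolence is not None:
        uncertainty += 1.0 - benevolence
    return goal_intensity * (1.0 - control) * uncertainty - propensity


# Example: an important goal, little perceived control, moderate perceived
# ability, and no benevolence term (technological trustee).
print(trust_needed(goal_intensity=0.9, control=0.2, ability=0.6, propensity=0.1))
```

Under these assumptions, more perceived control or ability lowers the required trust, while a more intense goal raises it, matching the verbal description above.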
4 Methodology

To assess the applied model proposed here, we used an experimental survey approach to collect data from a cross-section of internet users. Various methods were used to recruit subjects. Initially, a group of 50 people was approached through personal messages on a popular social networking site and through e-mail. These messages stated that a new product was being evaluated in cooperation with a major Dutch university, and participants were needed to help evaluate it. All participants, it stated, were
eligible to win 20 Euros. A second group of 429 people was approached by e-mail. In addition, advertisements were purchased through an advertising service of a large search engine. The advertisement was displayed next to search terms that included terms such as "investing software", "online investment", and "online investing tools". The advertisement was displayed 20,217 times. We intended to attract subjects interested in the type of software the experiment attempted to mimic. Also, an invitation to participate was placed on an internet forum for computer science graduate students at a Dutch university. Of the people contacted, around 80 visited the initial page. Of these visitors, one had an IP address in the United States, one in the United Kingdom, and the rest in The Netherlands. Twenty-six people actually completed the survey (23.1% female, 76.9% male).

A website was constructed to mimic various popular online investing tools. The intention was to create the impression that an actual novel product was in development, and that subjects were testing the performance of that product. A website was chosen due to the ease with which it allowed subjects to be recruited and due to the natural setting (subjects' own computers) it allowed for. As stated by Bhattacherjee, the Internet is the most effective way to reach a population of online users [15]. After an initial test with 5 users, adjustments were made based on feedback and the experiment was launched. Participants were led to believe they were evaluating a new online investment product, an autonomous software agent that invested users' money. The interface of this supposed product was designed to resemble existing online financial products. Participants were asked to select an amount to allow the software to invest.

Participants were placed in one of four experimental situations in which the supposed ability of the software and the control of the participant over the actions of the software were manipulated. To manipulate certainty regarding perceived ability, data on the software's past performance were provided in a graph, and attributed to either a verified source or the software itself. We assumed uncertainty regarding perceived ability would be higher if performance data were attributed to the software. Perceived control was manipulated by either offering an undo function, which allowed the invested amount to be changed before fully committing to it, or not offering one. We assumed that if the delegated investment could not be changed, subjects would perceive less control over the software.

For the questionnaire, component items for perceived control were adapted from [16]. The component items for perceived ability were based on [15]. For perceived risk, component items were adapted from [17] and [16]. Items for trust propensity were based on [18]. Finally, items for trust were based on [19]. Component items for the goal construct were constructed for the present work and not based on previous work.
5 Results

Each element of the model was measured with two questions in the post-task questionnaire. Internal consistency of the scales was measured by performing a reliability analysis of the questionnaire. This analysis yielded Cronbach's alpha coefficients between .548 and .764. For an exploratory experiment with this sample size, we considered these coefficients to be acceptable.
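For illustration, Cronbach's alpha for a two-item scale can be computed from a participants-by-items response matrix as in the sketch below. The response values shown are hypothetical placeholders; this is not the study's analysis script.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (participants x items) response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical seven-point responses of five participants to a two-item scale.
responses = [[5, 6], [4, 4], [6, 7], [3, 4], [5, 5]]
print(round(cronbach_alpha(responses), 3))
```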
Following the reliability analysis, Kruskal-Wallis one-way analyses of variance were conducted to assess statistically significant differences in the various variables across the four experimental groups. This non-parametric test was chosen due to the small sample size and the lack of the normally distributed data assumed by parametric tests. The analyses did not reveal any statistically significant differences in the levels of any of the variables, or even in time spent on the task, across the four experimental groups. The lack of differences between experimental groups suggests the ability and control manipulations did not have a significant effect. The amount participants allocated to the agent was also included in the data for experimental situations B and D. A Mann-Whitney U test revealed no significant difference between the amounts allocated in situation B (Md = 13.695, n = 6) and in situation D (Md = 8.0, n = 6), U = 9, z = -1.462, p = .144, r = .42. Again, this suggests a lack of effect of the manipulations, though in this case ability was the only element manipulated across the two situations. Finally, relationships between the post-task questionnaire items were investigated using Spearman's rho correlation coefficient. There were a number of significant correlations. There was a strong correlation between ability and trust, r = .613, n = 26, p = .001, with low levels of perceived ability associated with low levels of trust. Further, risk was strongly, negatively correlated with trust, r = -.684, n = 26, p < .001, with low levels of perceived trust associated with high levels of perceived risk. Finally, ability and risk were negatively correlated, r = -.418, n = 26, p = .034; low levels of perceived ability were associated with high levels of perceived risk.
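The non-parametric tests reported above have standard implementations; the sketch below shows their general form using SciPy. The group scores, allocation amounts, and rating arrays are hypothetical placeholders rather than the study's data.

```python
from scipy import stats

# Hypothetical trust scores for the four experimental groups.
group_a, group_b = [4.0, 5.5, 3.5, 4.5], [5.0, 4.5, 6.0, 3.5]
group_c, group_d = [3.0, 4.0, 5.0, 4.5], [4.5, 5.5, 4.0, 3.0]

# Kruskal-Wallis one-way analysis of variance across the four groups.
h_stat, p_kw = stats.kruskal(group_a, group_b, group_c, group_d)

# Mann-Whitney U test comparing amounts allocated in two situations.
allocated_b, allocated_d = [10, 15, 20, 12, 14, 13], [5, 8, 9, 7, 10, 6]
u_stat, p_mw = stats.mannwhitneyu(allocated_b, allocated_d, alternative="two-sided")

# Spearman rank-order correlation between perceived ability and trust ratings.
ability, trust = [2, 3, 5, 4, 6, 5], [3, 3, 6, 4, 7, 5]
rho, p_rho = stats.spearmanr(ability, trust)

print(p_kw, p_mw, rho, p_rho)
```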
6 Discussion

Our hypothesis was that the required amount of trust would increase as the importance of the goal, the perceived lack of control, and uncertainty regarding the trustee's ability increase. We also stated that the uncertainty regarding the trustee's ability entails a perceived risk that the truster's goals will not be achieved. The correlations we found between ability and trust, and the negative correlations between trust and risk, and risk and ability, suggest that when certainty regarding perceived ability is low, the perceived amount of risk will be high. When perceived ability is low, the amount of actual trust will also be lower.

There are some limitations to the present study. Most importantly, we cannot rule out the possibility that our definitions of trust and trustworthiness, and the distinction we make between them, were not shared by test subjects. For example, the correlations found between perceived ability and trust could actually be indicative of a relationship between perceived ability and perceived trustworthiness. Also, the sample size of 26 was small. Some e-mail responses to the invitation to participate suggested the invitation was seen as an unsolicited e-mail advertisement for a commercial product, which could explain why a large number of people did not even visit the site. If it was seen as an advertisement for an actual product, on the other hand, the e-mail worked exactly as intended. The low number of people who proceeded after visiting the first page could be explained by the length of the introductory text, or by the fact that the text was in English while the majority of the people contacted were Dutch. The request to submit an e-mail address to be eligible to win the prize money could also have discouraged potential
participants from taking part. Another possibility is that potential participants could not muster enough trust to visit the site based on the e-mail, or to proceed with the experiment based on the introductory text.

A further problem was the lack of sufficient differentiation between manipulations. Possibly, participants failed to notice the statements regarding the agent's ability or the participant's control. Although participants in a preliminary test deemed the interface understandable, there remains a possibility that participants in the eventual experiment found elements of the interface unclear. More than half of the participants spent less than a minute viewing the main interface, so it is also possible the manipulated elements were simply not noticed.

Another problem with the task and subsequent questionnaire was mentioned by a participant in the preliminary test and a participant in the final experiment. They indicated that it was difficult to answer questions regarding the ability of, and trust toward, the agent, because they had not had a chance to familiarize themselves with the website over a longer period. It is possible that other participants experienced this as well; however, one of the purposes of the experiment was to study situations of initial trust with unfamiliar technology. Ideally, a balance would be found between letting participants familiarize themselves enough to be able to answer questions with some confidence, while still being able to speak of a situation of initial trust.

Finally, the present work only involved experiments with one instance of unfamiliar technology. While this gives some insight into the characteristics of trust-requiring situations involving unfamiliar technology, to build a more complete picture of the general underlying mechanisms, experiments should involve several instances of technology.
7 Conclusions and Future Work

This paper set out to examine which features underlie trust-requiring situations, particularly situations of initial trust. It further sought to assess the extent to which these mechanisms can be considered universal, that is, extend beyond interpersonal trust to human trust in the things humans use. Throughout most of the literature reviewed here, though differences abound, the consensus is clear: everyday life would be impossible without trust. To discover what defines a trust-requiring situation when humans depend on humans and when humans depend on technology, fundamental elements of the emergence of the need for trust in new situations were identified. For a situation to require trust, each of the following elements must be present: the truster's situation-specific goals, which require the actions of another to achieve; the truster's perceived lack of control over those actions; the uncertainty the truster has regarding the trustee's ability to achieve the truster's goals; and the uncertainty the truster has regarding the trustee's benevolence toward achieving the truster's goals. It was argued that this model, with the exception of benevolence, can be applied to human-technology interaction. As it is not clear that benevolence is a property of technology, it was left out of the present work. Correlations found between perceived ability, perceived risk, and trust offer support for this model.
Future work should take the limitations of the present study into account. It is important to examine these mechanisms of trust across a wide range of ages. As younger generations are born into a world in which technology is increasingly ubiquitous, it will be intriguing to see how this affects their trust in technology, and, as a result perhaps, their trust in general. There is no need for future research to be limited to online environments, though. To truly assess the general applicability of the model proposed here to human interactions with technology, as many instances of technology as possible should be tested. New forms of technology emerge every day, and models that are too tightly attuned to specific technologies will become obsolete as soon as those specific technologies do. To be able to cope with such changes, models of trust should be tested on new technology with every opportunity that arises. Ultimately, a thorough understanding of why we need to trust some of the technology we use every day will help us make technology easier to trust.
References
1. Good, D.: Individuals, Interpersonal Relations, and Trust. Trust: Making and Breaking Cooperative Relations, 31–48 (1988)
2. McLeod, C.: Trust. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy (Fall 2008)
3. Mayer, R., Davis, J., Schoorman, F.: An integrative model of organizational trust. Academy of Management Review 20(3), 709–735 (1995)
4. Luhmann, N.: Familiarity, Confidence, Trust: Problems and Alternatives. Trust: Making and Breaking Cooperative Relations, 94–107 (1988)
5. Hart, K.: Kinship, Contract, and Trust: The Economic Organization of Migrants in an African City Slum. Trust: Making and Breaking Cooperative Relations, 176–193 (1988)
6. Lewis, J., Weigert, A.: Trust as a Social Reality. Social Forces 63, 967 (1984)
7. Blois, K.: Trust in Business to Business Relationships: An Evaluation of its Status. Journal of Management Studies 36(2), 197–215 (1999)
8. Dasgupta, P.: Trust as a commodity. Trust: Making and Breaking Cooperative Relations, 49–72 (1988)
9. Gambetta, D.: Can we trust trust? Trust: Making and Breaking Cooperative Relations, 213–237 (1988)
10. McKnight, D., Chervany, N.: What Trust Means in E-Commerce Customer Relationships: An Interdisciplinary Conceptual Typology. International Journal of Electronic Commerce 6(2), 35–59 (2002)
11. Johnson-George, C., Swap, W.: Measurement of specific interpersonal trust: Construction and validation of a scale to assess trust in a specific other. Journal of Personality and Social Psychology 43(6), 1306–1317 (1982)
12. Lee, J., See, K.: Trust in Automation: Designing for Appropriate Reliance. Human Factors (2004)
13. Das, T., Teng, B.: The Risk-Based View of Trust: A Conceptual Framework. Journal of Business and Psychology 19(1), 85–116 (2004)
14. Koller, M.: Risk as a Determinant of Trust. Basic and Applied Social Psychology 9(4), 265–276 (1988)
15. Bhattacherjee, A.: Individual Trust in Online Firms: Scale Development and Initial Test. Journal of Management Information Systems 19(1), 211–241 (2002)
16. Hsu, M., Chiu, C.: Internet self-efficacy and electronic service acceptance. Decision Support Systems 38(3), 369–381 (2004)
17. Pavlou, P.: Consumer Acceptance of Electronic Commerce: Integrating Trust and Risk with the Technology Acceptance Model. International Journal of Electronic Commerce 7(3), 101–134 (2003)
18. Cheung, C., Lee, M.: Trust in Internet Shopping: A Proposed Model and Measurement Instrument. In: Proceedings of the Americas Conference on Information Systems, pp. 681–689 (2000)
19. Corritore, C., Marble, R., Kracher, B., Wiedenbeck, S.: Measuring Online Trust of Websites: Credibility, Perceived Ease of Use, and Risk. In: Proceedings of the Eleventh Americas Conference on Information Systems, Omaha, NE, pp. 2419–2427 (2005)
Influence of User Experience on Affectiveness
Ryoko Fukuda
Keio University, Faculty of Environment and Information Studies, Endoh 5322, Fujisawa-shi, Kanagawa, 252-8520 Japan
[email protected]
Abstract. Affectiveness is frequently discussed based on the first impression of a product's appearance. However, experience in using that product can also influence affectiveness. In order to clarify the influence of user experience on affectiveness, user perception of a product should be investigated in several phases of its use. In this paper, two experiments are presented: one compared user perception before and after using products, and the other investigated user perception during repeated use of products. The results suggest that user experience can affect affectiveness in several ways.

Keywords: user experience, affectiveness, attachment.
1 Affective Design and User Perception

Norman [1] suggested three levels of design: visceral, behavioral, and reflective. In his terms, visceral design is all about immediate emotional impact. Behavioral design is all about use. Reflective design is all about message, about culture, and about the meaning of a product or its use. For one, it is about the meaning of things, the personal remembrances something evokes. For another, very different thing, it is about self-image and the message a product sends to others.

Which level of design, and which phase in the purchase of a product, is related to affectiveness? The level that is "seemingly" affective is visceral design, perceived in the initial phase. In fact, behavioral design and reflective design can also be affective. If a product offers exactly the functions needed and its usability is perfect, behavioral design can be perceived after using the product. If use of a product brings special meaning for a user, he or she can gradually perceive reflective design through experience over a longer period after purchase. Therefore, user perception of a product should be investigated in several phases of its use, in order to clarify the influence of user experience on affectiveness. Previous research has mainly investigated perception either at first sight or after use, but most studies considered only one of these two phases and did not compare them.

Perception of products during repeated use should also be studied. In the early phase of using a certain product, the perception of it is rather unstable. As we repeatedly use the product, its perception becomes stable. At which point in time does the perception of the product become definite? What is the key factor for the "final" perception? These points have rarely been studied so far.
In this paper, two experiments are presented: one compared user perception before and after using products, and the other investigated user perception during repeated use of products.
2 Comparison of User Perception between before and after Using Products

In our former study [2], the influence of user experience on affectiveness was analyzed with regard to calculators. Twenty university students first evaluated the appearance of four different calculators. Subsequently, the participants accomplished numerical calculation tasks using these calculators and evaluated them again based on the same criteria as in the first evaluation. Thirty-seven criteria were applied for the evaluation: 20 criteria for Semantic Differential (SD) analysis in order to determine the factors in impression evaluation, seven criteria to evaluate the usability of each calculator, three criteria to evaluate subjects' attitude toward use of each calculator, six criteria to evaluate the degree of basic emotions (joy, surprise, sadness, anger, fear, and disgust), and one final criterion to evaluate comfort. All the criteria were evaluated on a seven-point scale. With regard to the evaluation of basic emotions, which are the focus of this paper, evaluation value "1" meant "feel that emotion not at all" and "7" meant "feel fully that emotion".

2.1 User Perception before Using Calculators

At first sight, joy and surprise were felt by some participants, whereas sadness and anger were rarely felt, and fear and disgust were elicited only by Calculator B (Figure 1). Joy is regarded as a "positive" emotion and was related to the appearance of the calculators. Participants commented on the reasons for feeling joy, for instance, "the forms of buttons and display are unusual and very stylish" (Calculator C) or "blue and translucent calculator looks nice" (Calculator D). The appearance was evaluated rather "generally". On the other hand, Calculator B elicited surprise, fear, and disgust. Calculator B was a scientific calculator, and several participants stated that they had had bad experiences with it. Other participants had no experience with a scientific
Fig. 1. Evaluation of basic emotion before using calculators
calculator, and it was unfamiliar to them. In this case, the evaluation was related to personal experience and/or sense of values. Overall, the participants felt basically neutral at first sight of the calculators. If the appearance of a calculator was good, positive emotion was elicited.

2.2 User Perception after Using Calculators

The participants' emotions after using the calculators differed from those before use (Figure 2). Calculator C in particular elicited anger and disgust, which were not elicited by its appearance. This was strongly related to performance on the numerical calculation task. The arrangement of Calculator C was different from that of the other calculators, so most of the participants had great difficulty operating it. In addition, the transparent display, placed almost vertical to the body, impaired its visibility. Due to these factors, most of the participants showed a bad performance in task accomplishment and felt anger and disgust. In the case of Calculator B, most of the participants felt joy and were positively surprised after they understood that Calculator B was not difficult to operate and that some functions were useful for task accomplishment. At the same time, the feelings of fear and disgust diminished.

Use of products elicited emotions such as joy, surprise, anger, or disgust. Negative emotions were related to difficulties in use that were not expected from the products' appearance. If the usability was unexpectedly good, it led to feelings of happiness or "positive" surprise.

2.3 Cases Where User Experience Influences Affectiveness

As mentioned above, it was confirmed that user experience could elicit some kinds of emotion. However, the interview data showed further interesting points. After the trials, the participants were asked some related questions. One of them was which they take more account of when evaluating a product: its appearance or its usability. With regard to calculators, the answer of most of the participants was usability. However, this was not always the case. For instance, many participants commented that they selected a cellular phone based on its appearance and regarded
Fig. 2. Evaluation of basic emotion after using calculators
not necessarily its usability. A good-looking cellular phone is preferred, even if its usability is bad. This difference is related to the difference in the situation of use. A calculator is used mostly in the workplace or at home, where the user is rarely observed by other people. In contrast, a cellular phone is frequently used in public spaces, where it can be observed by other people. Indeed, several participants said that they cared about what other people thought of the design of their own cellular phone. They even think that people evaluate them based on the appearance of the cellular phone they use. In such cases, usability is given little weight, and user experience with regard to task performance does not influence affectiveness. Instead, user experience with regard to the impression users make on other people has some influence on affectiveness.
3 Investigation of User Perception during Repeated Use of Products

The above-mentioned study analyzed the change in emotion over a short time by comparing user perception before and after using products. Another of our studies focused on the process of product use over a longer period. Skin lotion was employed as the stimulus in this study, because whether it suits a user's skin type can only be evaluated through daily application, and it takes a certain time before a judgment can be made. Forty female university students first evaluated four types of skin lotion and selected the best one. Then they applied and evaluated it every day for two weeks. After two weeks, they evaluated the four types of skin lotion again. Twenty criteria were applied for the evaluation. The change in evaluation values for these criteria was investigated. With regard to the data of the first day and the last day, Semantic Differential (SD) analysis was also conducted to clarify the evaluation structure.

3.1 Change in Evaluation Structure

The results of the SD analysis showed a difference in the extracted evaluation factors between the first day and the last day. Based on the data of the first day, six factors were extracted: evaluation, feeling of refreshment, feeling of moisture, feeling of absorption, feature, and stickiness. Based on the data of the last day, five factors were extracted: effect, attachment, feeling of refreshment, sharpness, and thickness. The first factor, "evaluation", on the first day consisted of preference or willingness to use. These criteria are related to a higher level of evaluation; in other words, the first factor is rather abstract. The other five factors evaluated individual characteristics of the skin lotions, which were related to a lower level of evaluation. In contrast, the first factor on the last day was "effect" (of the skin lotions). This means that the evaluation structure became more concrete through user experience. After two weeks of use, the most important factor was whether the selected skin lotion had some effect on the user's skin.

3.2 Change in User Perception

The impression of the skin lotion changed through daily application, especially with regard to the evaluation of texture, absorption, and moisture retention. Participants became aware of changes in their own impressions at various points in time: most frequently five to
seven days after starting to use the selected skin lotion. However, patterns of change were very different among participants (see Figures 3 and 4). In fact, the evaluation was strongly influenced by the users' skin condition and also by ambient conditions. Some participants therefore evaluated the feeling of use differently every day, whereas other participants hardly changed their evaluation. Exceptionally, the evaluation value of thickness became stable in a relatively early phase or remained stable for a longer period. The assumed reason is that thickness is one of the physical characteristics of a skin lotion and is not influenced by the user's skin condition. Other criteria, such as reliability, trust, satisfaction, and willingness to use, were evaluated based not on only one characteristic of the skin lotion, but on several characteristics as well as the participant's own feelings. Therefore, change was observed less frequently than in the evaluation of individual aspects of the feeling of use.
Fig. 3. Example of change in evaluation value (1)
Fig. 4. Example of change in evaluation value (2)
3.3 Attachment and Affectiveness

Unlike the experiment described in Section 2 (Comparison of User Perception between before and after Using Products), the degree of emotion was not evaluated in this experiment. The evaluation value for preference is supposed to be more or less related to emotion. After two weeks, it had improved for 18 participants, was unchanged for 11 participants, and had deteriorated for 11 participants. Each participant applied the skin lotion that she had regarded as the "best" one on the first day. The above result shows that the first impression was not always correct. In addition, it takes some time until the effect of a skin lotion becomes apparent. In the case of such a product, change in emotion occurs slowly, as the participants become aware of the effect of the product. The influence of user experience on preference, itself a kind of emotion, was thus observed in this experiment as well.

Attachment is also assumed to reflect one aspect of emotion. After two weeks, 26 of the 40 participants answered that they had become attached to their selected skin lotion. The keywords in the reasons given were "experience", "adjustment", and "accustomedness". Through experience in daily application, the participants got used to the characteristics (scent, texture, etc.) of the skin lotion and got to know its essential features. Even characteristics that initially made little impression, or a rather bad one, could elicit attachment. This result implies the importance of user experience in affective design.
4 Future Perspective

The above two studies suggest that user experience influences affectiveness. The way in which it does so differed according to the type of product and the preferences of users. Sometimes the experience of use itself influences affectiveness directly. In other cases, the perception of one's belongings by other people can have some influence on affectiveness. In any case, the relationship between user experience and affectiveness should be investigated in more detail, in order to clarify what kind of experience can strongly influence affectiveness. In addition, change in emotion should be studied over a longer period – not two weeks, but one month or several months – and with regard to various products.
References
1. Norman, D.A.: Emotional design: Why we love (or hate) everyday things. Basic Books, New York (2004)
2. Fukuda, R.: Change in emotion during use of products. In: Conference Proceedings of AHFE (Applied Human Factors and Ergonomics) International Conference (CD-ROM) (2008)
A Human-Centered Model for Detecting Technology Engagement
James Glasnapp and Oliver Brdiczka
Palo Alto Research Center (PARC), 3333 Coyote Hill Road, Palo Alto, CA, 94304, USA
{glasnapp,brdiczka}@parc.com
Abstract. This paper proposes a human-centered engagement model for developing interactive media technology. The model builds on previous interaction models for publicly located ambient displays. It is derived from ethnographic observation, with the aim of informing technological innovation from the perspective of the user. The model is presented along with technological mechanisms for detecting human behavior, with the aim of supporting the development of responsive media technology.
1 Introduction

This paper proposes a human-centered model to aid the development of interactive media technology that can detect individuals and thereby engage them with this technology. We derived the human-centered engagement model from exploratory ethnography that aimed to develop a broad conceptual model for designing responsive display media technology. The concept of engagement has a wide range of contextual meanings across different disciplines [1], [2]. For our purposes, we refer to engagement as a goal for focused interaction with display technology [12]. The technology we envision is intended to sense individuals' presence and subsequently attract them to interact with a display (or other possible technology).

Digital signage represents a vast opportunity space. Several working prototypes exploring expansion of what has been accomplished with digital signage are currently being tested or are in planning [3]. One such system selects ads and other content based on the viewing audience's size and demographics. Interactive displays will be commonplace in the near future. Design frameworks that take into account how people interact with technology will help shape the way this technology is designed to respond and interact with humans [4]. Many current conceptual models for how humans interact with displays are machine-centered; they are conceptualizations of users in terms of their interactive relationship with a display. Our research explores engagement with display technology from the perspective of mobile, transient users in public settings. We focus on understanding human behavior so that technology may be designed to respond to and engage individuals based on fundamental human interactive practices [23]. Ethnographic observations have been analyzed and serve as the basis for a human-centered model of engagement. The different stages of this model
are intended as a fine-grained description of the engagement process with a technology device and as a foundation for future detection algorithms. The remainder of this paper is structured as follows. First, we give a short review of the literature on human-computer interaction and human engagement. Our ethnographic data and methodology are then described. Finally, the human-centered model of engagement and possible detection mechanisms are detailed. A short conclusion ends the paper.
2 Related Work

Related work includes human engagement with robots as well as displays. Sidner and Lee [5], focusing on human-robot interaction, have proposed a three-part model of engagement for developing the ability of robots to engage with an individual in collaborative conversation. Their model consists of (1) initiating interaction, (2) maintaining interaction, and (3) disengaging. Research on interaction with display technology has largely focused on collaborative workspaces and environments. Izadi et al. [6] explored the design of interactive displays in shared and sociable spaces. Biehl et al. [7] studied collaboration in multiple-display workspace environments. Brignull and Rogers [8] observed people's activity patterns around large displays and identified three activity spaces: peripheral awareness at the outer boundary; focal awareness in an area where one can give attention to the display; and direct interaction, where people can straightforwardly interact with the display (Figure 1). Building on this conceptual model, Finke et al. [9] designed a display to encourage user participation in an interactive game. The intent is to move those in the periphery into direct interaction, or from being bystanders, to spectators, and finally, to actors. Benford et al. [10] used a similar framework to focus on the design of the audience experience of public interaction, referring to those on the periphery as spectators and those who are engaged as performers. Similarly, Vogel et al. [11] proposed a display-centered model of engagement for sharable, interactive public ambient displays (Figure 1). This display-centered framework offers four phases, starting with an ambient display phase at the farthest distance from the display, where users can see static and generalized information. From there, as users move into closer proximity to the display, they are in the implicit interaction phase and can be recognized by the system by their body position. When the user
Fig. 1. Conceptual models of how individuals orientate themselves in public settings
pauses, the system enters what is called the subtle introduction phase. During this phase, the system may identify relevant information. Finally, in the personal interaction phase, the user moves closer to the screen and touches items for more information. This framework for interaction allows the user to go from a state of implicit to explicit interaction. Additionally, Vogel et al. conceptualized a list of design principles to guide the development of responsive displays. These display-centric models assist in understanding how individuals orient themselves to displays and how groups of people involved in collaboration or social settings respond and relate to displays.

Studies of human interaction and conversation analysis help us to understand the practices of people in these environments, both within and outside the periphery of the display. Goffman's description of social interaction has implications for understanding how individuals behave within and outside display peripheries. Goffman [12] divides face-to-face interaction into unfocused and focused interaction, where unfocused interaction occurs when individuals are merely copresent. In this type of interaction, one or both parties might modify behavior because they know they can be observed. Such would be the case when walking in a busy shopping center, or when in close proximity to strangers in the periphery of, or interacting with, a display. In contrast, focused interaction occurs when people "agree to sustain for a time in a single focus of cognitive and visual attention, as in the conversation, a board game, or a joint task [with a] close face-to-face circle of contributors." [12] Goffman refers to social arrangements that involve participants with a cognitive focus of attention as a "focused gathering" [12]. In these encounters, an individual's presence is acknowledged through expressive signs and formal rituals. What emerges is a single thing or shared experience that both parties achieve together over time. Encounters can be accompanied by rituals that include ceremonies of entrance and departure, but always provide a central base for communication between parties as well as "corrective compensation" for deviant acts. Benford et al. [13] used the concept of frame in their conceptual framework for promoting game play so that players understand what is within the circle of play. As individuals move from subtle to personal interaction, or from bystander to actor, the encounter with the display could be considered a focused gathering. Mobile individuals outside the periphery of a display would have a different definition of their situation than those who have a focal awareness of the display, particularly if they are in a close-knit collaborative or social environment. Kendon [14] writes about how individuals agree on a 'frame of the situation'. He also argues that participants establish assumptions about situations through their own interpretive perspectives. Garfinkel [15] suggests these frames are shared because people agree on 'the constitutive expectancies of a situation.' Transitory individuals in mobile settings have a different frame of the situation that is communicated through embodied actions. To be effective, media technology would need to identify these individuals and transition them from one interpretive perspective to another, from a receptive yet mobile experience to an interactive one.
Finally, in regard to the act of engagement itself, Goffman [12] describes engagement as when an individual is "caught up by it, carried away by it, and engrossed in it – to be, as we say, spontaneously involved in it." [12] He specifies engagement as a spontaneous involvement in a joint activity in which an individual becomes an integral
part of the situation. Creating a situation in which the user is engaged with the display as an actor or performer in an interactive experience is the very goal of technology developers.
3 Methodology

Our research objective was to describe the natural practices people use as they come into contact with and interact with public displays in transitory and mobile settings. We collected ethnographic video data in public settings with the aim of observing behaviors that demonstrate engagement and disengagement with products or displays. The data include observations of individuals in public spaces such as an airport, two shopping malls, a movie theater pavilion, and a sidewalk in a commercial setting. We specifically selected areas where people had an opportunity to enter the periphery of an interactive or static self-serve kiosk or display. Data collection occurred over the course of three weeks. Approximately 8 hours of video data were collected, in addition to 10 hours of observation. Video data were analyzed on a desktop computer. We included in the analysis individuals who entered the periphery of the display and excluded those who could not be adequately observed or who did not have an opportunity to visibly show their attention to the display. We categorized behavior traits into progressive stages toward potential engagement with displays or interactive media.
4 Qualitative Analysis and Results

Based on the collected video data and observations, we inductively derived a conceptual model that we call a human-centered engagement model. This model consists of five stages that guide technological development with the goal of reaching and maintaining full user engagement: receptiveness, interest, evaluation, engagement, and disengagement. The purpose of applying this model to technology design is for technology to assist or promote a user through increasing stages of participation, to achieve and maintain full engagement between user and machine for as long as possible before disengagement. What follows are descriptions of each of the phases, including definitions, basic concepts, indicators, and possible detection mechanisms. We have adopted computer vision technology as the major cue for detecting these phases because it is relatively lightweight to deploy in an existing setting (mounting cameras, compared to, e.g., installing floor sensors or other more invasive technology) and it does not require users to carry or wear specific technology items to be detected (e.g., sensor badges or dedicated cell phones).

Receptiveness. The first stage of our model describes basic receptiveness, corresponding to the capacity and willingness of a user to receive cues like advertisements or qualified information. An individual might be available to engage with a new activity irrespective of distance, as long as the individual is able to observe and approach the technology in question, and that technology is able to sense his or her presence. Our data set consists of individuals in public areas, particularly areas in which individuals are on leisure time or have idle time (i.e., shopping malls, airports); people can be seen to walk at a more leisurely pace and demonstrate signals that they are available
Fig. 2. Human-Centered Engagement Model
and looking for something of interest. Not all individuals were receptive; some were otherwise engaged with an electronic device or in social encounters with friends, or were presumably rushing to their destination. For example, a male and a female subject strolled together past a kiosk, arms at their sides, walking jointly at a comfortable, calm stride. Their heads were upright and their attention was unfocused (on any singular object) (Figure 3, below). We also observed this behavior on public sidewalks, where individuals appeared more intent on walking as a means of transportation, but still showed signs of interruptibility and altered their pace and gaze and, in many instances, stopped to observe the informational or eye-catching displays in question.1 These observable embodied actions indicate a possible capability of a user to receive cues. Such users have a receptive mind and are accessible listeners. Indicators of receptiveness are an individual's path, pace, gaze, and the extent to which their attention is unfocused on any single object, event, or activity. A video tracking system [16] can be used to locate one or several people in real time using one or several cameras. A face detector [17] applied to different camera images can give indications about approximate gaze and focus of attention. Technology is capable of sensor-based predictions of receptiveness [18]. The ideal scenario would be similar to the intuitive salesperson who knows just how much and when to interact with customers. Our observations indicate that current display
1 We observed individuals walking down a large boulevard in a major metropolitan city in front of a real estate office and a clothing store. The real estate office displayed non-interactive rotating home information on a large plasma display. The clothing store had two display windows.
Fig. 3. A couple demonstrates receptive behavior in front of a kiosk
Fig. 4. Receptive groups and individuals strolling on busy boulevard
technology, particularly interactive technology, lacks the ability to consistently attract individuals who are otherwise receptive to interaction. For example, technology could utilize "cues to action" to move qualified individuals to notice the display and become interested in engaging with it. Assuming that the technology can recognize a qualified user, it uses cues to get the user to notice it, leveraging human senses with light, movement, sound, and potentially smell in order to solicit attention. In our observations of self-service kiosks, many opportunities to attract a potential user's attention were missed. Most individuals who appeared available for engagement were unaware of the kiosk when passing by. Even if the user had noted the kiosk with peripheral vision, she did not have sufficient information from which to dismiss or make a judgment about the target display.

Interest. The interest phase represents the point at which the user observably demonstrates curiosity or interest in an object. Once a cue from an interactive media technology has drawn the user's attention and awareness, the user will show interest through abrupt physical changes in body movement that demonstrate at least a minimum level of concentration, albeit sometimes briefly. We observed patterns of behavior that we later classified as low interest (e.g., head turns, changing the velocity of pace, changing course) and high interest (such as stopping). The technological goal for this phase is to identify potential users at the point of curiosity, increasing the likelihood of transitioning to engagement. For example, in Figure 5 below, we see two individuals exhibiting interest in a kiosk. The elderly woman pushing a cart passes by the kiosk and looks (low interest), while the man is briefly stopped in front of the sample scent tester. A reduced pace enables more precise face and gaze detection [19], which are good indicators of upcoming interest. Tracking body parts like the hands [20], as well as body posture, which can be provided by a video tracking system, further enhances the detection of interest.
Fig. 5. Low and high levels of Interest
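As a rough illustration of how the pace, face, and gaze cues described for the receptiveness and interest phases could feed a detector, the sketch below combines OpenCV's stock frontal-face cascade with a simple pace estimate over tracked positions. The tracker interface, frame rate, and thresholds are assumptions made for illustration; this is not the detection pipeline of the cited systems.

```python
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def pace(track, fps=15.0):
    """Average speed (pixels per second) over a short track of (x, y) positions."""
    track = np.asarray(track, dtype=float)
    if len(track) < 2:
        return 0.0
    steps = np.linalg.norm(np.diff(track, axis=0), axis=1)
    return float(steps.mean() * fps)

def interest_level(frame_gray, track, slow_pace=40.0):
    """Heuristic interest classification from face visibility and walking pace."""
    faces = face_cascade.detectMultiScale(frame_gray, scaleFactor=1.1, minNeighbors=5)
    facing_display = len(faces) > 0   # a frontal face suggests the person faces the camera/display
    if facing_display and pace(track) < slow_pace:
        return "high"                 # slowed or stopped while facing the display
    if facing_display:
        return "low"                  # a glance while passing by
    return "none"
```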
Evaluation. Evaluation is the stage at which the user determines her goal for any potential interaction. This stage is the link between noticing a display and engaging with it. The evaluation can be brief, but we include it because of its importance from the technological perspective: the ability to detect and respond to individuals at a decision point will increase the probability of their transition from showing interest to engagement. Based on interest detection using face, gaze, and body parts, simple temporal triggers or more complex temporal models like hidden Markov models may indicate the transition to the evaluation phase. Three important aspects of the evaluation stage (and thus of the user's considerations) are privacy, self-efficacy, and social norms. Each is important to the development of the user interface, location, and overall design. Privacy concerns are a potential design consideration, particularly in busy public settings, as has been noted and made the focus of other work [21]. Another consideration is whether individuals feel they have the capacity to successfully engage with unfamiliar technology; that is, one's own considered opinion about one's personal capability, or self-efficacy [22]. If users show interest in a display but immediately determine that further engagement would fail, then there is a missed opportunity. Finally, we consider the issue of social norms in public spaces. In shopping areas, where individuals are either alone or shopping in groups, altering the existing interaction or behavior to engage in something potentially different or beyond that to which one is accustomed could deter the user from initiating the engagement. We observed individuals who demonstrated interest but physically displayed frustration, which is the result of a negative evaluation. These cases included situations in which the display was too small to approach closely when others were located in front of it, as well as head shaking. We also noted behavior that indicated embarrassment.

Engagement. Engagement is the point at which the user is focusing her attention. We observed individuals, alone and in groups, in focused interaction with an object or display, unfocused on everything else for a period of time. Physical characteristics that exhibited concentration could be observed: arms folded, arm to the chin, pointing, head slanted to the side. The body position also changed from the other phases to a stance: arm perched on the waist, one leg bent to the side. In the figures below, note the children completely engrossed in an interactive floor display (Figure 6). Next to them, onlookers disrupt their movement and show interest in the display while the mother (to the right) tries to coax the children off the display. In the
Fig. 6. Two children engaged with floor display
other photo, two men are engaged, one pointing to the display while the other puts his hand to his chin (Figure 7). Detecting the engagement phase could be realized by using proximity information from the video tracking system, i.e., the distance of a detected entity (person) to an object of interest falls below a threshold. Additionally, gaze detection using a dedicated camera at the object of interest, and direct interaction with the object itself, are strong features for detecting this phase.

Disengagement. Disengagement is the point at which the user withdraws attention from the interaction. Presumably a user can go from this stage back to the receptiveness stage. Increasing pace and changing path and activity are indicators of this phase. A video tracking system could provide the means for detecting the pace and path of moving people.

A few other insights emerged. Groups tended to find the space around the display appropriate for viewing material. Children often created their own space on displays at eye level, often mimicking adults and interacting with made-up content. We also noted what we called engagement by default: these secondary users are people accompanying individuals who are themselves engaged. We include this insight because technology development might recognize this and provide unique content aimed at this type of user. Finally, we also noted that physical store settings within shopping malls that had brighter lights, open spaces, and easily identifiable and approachable staff attracted and engaged shoppers. We bring this up because the staff can be regarded as human sensors, and the attractiveness of these stores and the interaction they promote are the very goal of the technology we aim to create.
Fig. 7. Two individuals engaged with a display
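A minimal sketch of how the proximity, gaze, and pace cues described for the engagement and disengagement phases might be combined into a phase classifier is given below. The distance and pace thresholds, the units, and the shape of the tracker output are illustrative assumptions, not values from the study; in a deployed system the inputs would come from the video tracking and face detection components cited above.

```python
from math import hypot

def classify_phase(person_xy, display_xy, pace, gaze_on_display,
                   near=1.5, slow=0.3):
    """Assign a coarse engagement-model phase from tracking features.

    person_xy and display_xy are positions in metres, pace is speed in m/s,
    and gaze_on_display reports whether a face/gaze detector sees attention
    directed toward the display. All thresholds are illustrative only.
    """
    distance = hypot(person_xy[0] - display_xy[0], person_xy[1] - display_xy[1])
    if distance < near and gaze_on_display and pace < slow:
        return "engagement"       # close, attentive, and stopped
    if gaze_on_display and pace < slow:
        return "evaluation"       # stopped and looking, but not yet close
    if gaze_on_display:
        return "interest"         # a glance while still moving
    if pace < 1.0:
        return "receptiveness"    # strolling with unfocused attention
    return "disengagement"        # moving away quickly or attending elsewhere
```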
5 Conclusion and Future Work

We have proposed a human-centered model to describe and model human engagement, with the objective of enabling and improving interactive media technology. In contrast to current conceptual machine-centered models aimed at collaborative spaces, our research focuses on the perspective of the mobile, transient user in public settings. The five stages of the User-Centered Engagement Model are receptiveness, interest, evaluation, engagement, and disengagement. The first stage is basic receptiveness, corresponding to the capacity and willingness of a user to receive cues. The second stage, interest, represents the moment when a cue has drawn attention and curiosity; the user is affected and briefly begins to determine his or her goals for interaction. In the third stage, evaluation, the user considers whether to engage in an interaction. The fourth stage follows a positive evaluation: the user invests his or her full attention and becomes absorbed by the interaction. The last stage constitutes disengagement: when the interaction has concluded, be it positively or negatively, the user disengages and becomes receptive again.

Our observations indicate that many current systems fail to attract mobile, transitory users who are otherwise available for engagement. We believe that our user-centric model is a keystone for making these systems more appealing to people for interaction. The model is very general and can encompass a wide variety of technology uses and goals. In particular, interaction design and digital content presentation will vary from one system to another, and the system deployment will affect and possibly constrain the detection methods that are used for the different phases. Thus, we did not focus on the detection aspects or on the interaction design of an ideal system. Future work will concern the implementation and validation of the different phases of the User-Centered Engagement Model in various settings involving a higher number of users. We are taking first steps toward the implementation of an integrated system combining technological detection of the different phases with interaction design and evaluation. The implementation of a technological prototype for detecting the different phases of the model is foreseen. More focused user studies that reveal each of the proposed phases are also needed.
References
[1] Skocpol, T., Fiorina, M.P.: Civic Engagement in American Democracy. Brookings Institution Press (1999)
[2] Greenwood, C.R., Horton, B.T., Utley, C.A.: Academic Engagement: Current Perspectives in Research and Practice. School Psychology Review 31(3), 328–349 (2002)
[3] Williams, M.: Japanese billboards are watching back. IDG News Service (12/12/2008), http://www.goodgearguide.com.au/index.php?q=article/270798/japanese_billboards_watching_back&fp=&fpid (1/15/2008)
[4] Canton, J.: The extreme future - the top trends that will reshape the world in the next 20 years. Institute for global futures, Inc./Penguin books (2006)
[5] Sidner, C., Lee, C.: Engagement rules for human-robot collaborative interactions. Systems (2003)
[6] Izadi, et al.: The iterative design and study of a large display for shared and sociable spaces. In: Proceedings of the 2005 conference on Designing for User ... (2005)
[7] Biehl, J., Baker, W., Bailey, B.: Framework for supporting collaboration in multiple display environments and its field evaluation for .... portal.acm.org (2008)
[8] Brignull, H., Rogers, Y.: Enticing People to Interact with Large Public Displays in Public Spaces. Human-Computer Interaction (2003)
[9] Finke, M., Tang, A., Leung, R.: Lessons learned: game design for large public displays. In: Proceedings of the 3rd international conference on Digital ... (2008)
[10] Benford, S., Crabtree, A., Reeves, S., Flintham, M., Drozd, A., Sheridan, J., Dix, A.: The Frame of the Game: Blurring the Boundary between Fiction and Reality in Mobile Experiences. In: Proceedings of the SIGCHI conference on Human factors in ... (2006)
[11] Vogel, D., Balakrishnan, R.: Interactive public ambient displays: transitioning from implicit to explicit, public to personal .... In: Proceedings of the 17th annual ACM symposium on User ... (2004)
[12] Goffman, E.: Encounters; Two studies in the sociology of interaction. Bobbs-Merrill, Indianapolis (1961)
[13] Benford, S., Crabtree, A., Reeves, S., Flintham, M., Drozd, A., Sheridan, J., Dix, A.: The Frame of the Game: Blurring the Boundary between Fiction and Reality in Mobile Experiences. In: Proceedings of the SIGCHI conference on Human factors in ... (2006)
[14] Kendon, A.: Conducting interaction: patterns of behavior in focused encounters (Studies in Interactional Sociolinguistics). Cambridge Univ. Press, Cambridge (1990)
[15] Garfinkel, H.: Trust and stable actions. In: Harvey, O.J. (ed.) Motivation and Social interactions. Ronald Press, New York (1963)
[16] Zhou, S., Chellappa, R., Moghaddam, B.: Visual tracking and recognition using appearance-adaptive models in particle filters. IEEE Transactions on Image Processing 11, 1434–1456 (2004)
[17] Viola, P., Jones, M.J.: Robust Real-Time Face Detection. International Journal of Computer Vision 57(2), 137–154 (2004)
[18] Hudson, S., Fogarty, J., Atkeson, C., Avrahami, D.: Predicting human interruptibility with sensors: a Wizard of Oz feasibility study. In: Proceedings of the SIGCHI conference on Human factors in ... (2003)
[19] Magee, J.J., Scott, M.R., Waber, B.N., Betke, M.: EyeKeys: A Real-Time Vision Interface Based on Gaze Detection from a Low-Grade Video Camera. In: Proceedings of IEEE Computer Vision and Pattern Recognition Workshops (2004)
[20] Kolsch, M., Turk, M.: Robust hand detection. In: Proceedings of Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 614–619 (2004)
[21] O'Neill, E., Woodgate, D., Kostakos, V.: Easing the wait in the emergency room: building a theory of public information systems. In: Proceedings of the 5th Conference on Designing interactive Systems: Processes, Practices, Methods, and Techniques, DIS 2004, Cambridge, MA, USA, August 1-4, pp. 17–25. ACM, New York (2004), http://doi.acm.org/10.1145/1013115.1013120
[22] Bandura, A.: Social foundations of thought and action: A social cognitive theory, pp. 390–449. Prentice-Hall, Englewood Cliffs (1986)
[23] Sacks, H., Schegloff, E., Jefferson, G.: A simplest systematics for the organization of turn-taking for conversation. Language (1974)
Relationship Learning Software: Design and Assessment
Kyla A. McMullen and Gregory H. Wakefield
The University of Michigan, Ann Arbor, MI 48109-2121, USA
[email protected], [email protected]
Abstract. Interface designers have been studying how to construct graphical user interfaces (GUIs) for a number of years; however, adults are often the main focus of these studies. Children constitute a unique user group, making it necessary to design software specifically for them. For this study, several interface design frameworks were combined to synthesize a framework for designing educational software for children. Two types of learning, relationships and categories, are the focus of the present study because of their importance in early childhood learning as well as standardized testing. The educational game Melo’s World was created as an experimental platform. The experiments assessed the performance differences found when including or excluding subsets of interface design features, specifically aesthetic and behavioral features. Software that contains aesthetic but lacks behavioral features was found to have the greatest positive impact on a child’s learning of thematic relationships. Keywords: human computer interaction, educational technology, interactive systems design, user interface design.
1 Introduction
Interface designers have been studying how to construct graphical user interfaces (or GUIs) more suitable for humans for a number of years. However, these designers tend to focus solely on issues concerning adults and tend to neglect those that must be considered when designing GUIs for children. The capabilities of children differ from those of adults in several areas, including motor skills, literacy levels, and attention spans. Because of these differences, the study of interface design for children deserves as much attention as the study of interface design for adults. In particular, we would like to study the design of user interfaces used in educational software for children four to six years of age. The ages of four to six are the most important years of learning for children [13]. We would like to study how to most effectively design software to aid learning during these critical years. Typically, the designers of educational software for children focus on aesthetics, such as flashy graphics or intriguing sounds, in order to hold the attention of the child playing the game. Although these aesthetics
and graphics hold the child’s attention, there is no empirical evidence that learning is actually taking place. Educational software designers should perform more empirical analysis to ensure that the game interfaces created for children actually facilitate learning. Though interest in interactive multimedia continues to grow, so far its design has been driven primarily by technological advances rather than by theoretical principles [8], [14]. The goal of this study is to evaluate the effects of user interfaces for children’s learning technology in order to determine how to best facilitate learning, specifically relationship learning. Simple Adobe Flash-based games were used to develop the educational software necessary to conduct the study; they can be run on any computer with the free Flash Player installed. We hypothesize that an educational game interface that has behavioral components (an easy-to-understand interface, hints and clues, responses to every action performed by the child, etc.) and aesthetic components (bright colors, soothing sounds, etc.) will facilitate learning in an educational software environment better than an educational game interface lacking one of these components.
2 Background and Significance
2.1 Child Centered Interface Design
This study seeks to improve the quality of educational technology design for children. Interface designers sometimes forget that children are young people who constitute an entirely different computer user population with their own culture, norms, and complexities [2]. Most research on interface design focuses on adults as the primary users, neglecting the differences that children may have when interacting with educational technology. When designing software for young children, designers should focus on a particular age range, because children of different ages have vastly different preferences and levels of skill [7]. The present study focuses on the four- to six-year-old age group. As stated earlier, these are the most formative years for a child’s learning. The focus lies here because research on learning technologies for children has been conducted primarily using software for older children. Furthermore, the existing guidelines for designing learning technology for children do not distinguish between children of different age groups. General guidelines for designing materials for learning technology for children have been postulated (e.g., Jones [9], Clanton [5], Nielsen and Molich [11], Norman [12], and Buckleitner [3]). These design features are also pictured in Figure 1. All of these interface design features can be grouped into the following four categories: Aesthetic, Behavioral, Interaction, and Uncertainty. The choices made in designing the interface for Melo’s World were derived directly from these guidelines.
2.2 Relationship Learning
The game that was created for this study focuses on enriching relationship learning skills in children. As we know, objects in the world can be related in a
multitude of ways. For example, objects can be related by the ways in which they participate in the same event or theme (cats eat mice, people read books, birds build nests). These assorted external relations between objects are referred to as thematic relationships. Research has shown that thematic groupings form the basis of children’s categorization and learning [10]. Wyche has shown that children in low socioeconomic environments tend to struggle with relationship learning and categorization skills [16]. For those interested in lowering the socioeconomic status barriers often found in educational software, this study is of particular interest. These skills are critical for children to perform well in school. Classification and categorization of objects are also a major skill set necessary to perform well on standardized testing. Often a school’s standardized testing performance is used to determine the amount of funding it receives, with lower-performing school systems receiving less money than higher-performing ones. Relationship learning helps to form the foundation upon which children learn to categorize.
Fig. 1. Interface Design Features
2.3 Categorization and Classification
There is evidence to suggest that the way in which participants categorize entities externally reflects their internal, mental representation of these concepts. Card sorting is a common empirical technique for externally eliciting a representation of a person’s internal categorical structure. Card sorting originated in George Kelly’s Personal Construct Theory. Card sorts have been used historically to elicit each individual’s personal, and often semi-tacit, understanding of objects in the world and their relationships to one another. Eliciting the structure of knowledge via card sorts is a more reliable indicator of a user’s expertise and learning than quantities of facts, as demonstrated by a series of investigations by Chi and Koeske [4]. This study used the free card sorting (or open card sorting) method in order to assess the participants’ initial categorical structure. In free card sorts, neither
categories nor criteria are specified in advance and the user may arrange the cards in as many groups as they wish. In order to compare the user’s pre-existing categorical structure, as evidenced by the groups formed during the free-sort, to the correct categorical structure, a comparison metric had to be used. The metric of choice is called the Card Sort Edit Distance (CSED) metric which will later be described in more detail.
3 Melo’s World
For this study, we created an educational software game in Adobe Flash called Melo’s World. By creating and using Melo’s World, we were able to easily manipulate the game’s design features. There were three versions of Melo’s World created. All three versions contain the base features. The Aesthetic-only version contains all of the base features along with the aesthetic features. The Behavioral-only version contains all of the base features along with the behavioral features. Finally, the Aesthetic+Behavioral version contains all of the base features along with the aesthetic and behavioral features. The divisions of these features can be seen in Figure 2. At the beginning of game play, the child is able to choose the specific portion of the game that they would like to explore as seen in Figure 3. In the first scene, the child is shown a map of Melo’s World where they are allowed to pick the level they would like to play. Each level of the game focuses on enriching the child’s relationship learning skills in some familiar context (a messy room, school, the playground, grandma’s house, and the grocery store).
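To make the division of features across the three versions concrete, the sketch below expresses them as simple configuration flags. This is an illustrative sketch only: the individual feature names and the GameConfig structure are assumptions introduced here for exposition, not the actual Adobe Flash implementation; Figure 2 shows the real division of features.

```python
# Illustrative sketch only: feature names and GameConfig are assumptions for
# exposition, not the actual Adobe Flash implementation of Melo's World.
from dataclasses import dataclass

BASE_FEATURES = {"level_map", "drag_and_drop", "coloring_reward"}        # assumed
AESTHETIC_FEATURES = {"bright_colors", "soothing_sounds", "animations"}  # assumed
BEHAVIORAL_FEATURES = {"auditory_instructions", "hints_and_clues",
                       "feedback_on_every_action"}                       # assumed

@dataclass
class GameConfig:
    name: str
    features: frozenset

def build_condition(name: str, aesthetic: bool, behavioral: bool) -> GameConfig:
    """Compose one game version from the base features plus optional subsets."""
    features = set(BASE_FEATURES)
    if aesthetic:
        features |= AESTHETIC_FEATURES
    if behavioral:
        features |= BEHAVIORAL_FEATURES
    return GameConfig(name, frozenset(features))

CONDITIONS = [
    build_condition("Aesthetic-only", aesthetic=True, behavioral=False),
    build_condition("Behavioral-only", aesthetic=False, behavioral=True),
    build_condition("Aesthetic+Behavioral", aesthetic=True, behavioral=True),
]

for cond in CONDITIONS:
    print(cond.name, sorted(cond.features))
```

Under this view, running an experimental condition amounts to instantiating the game with the corresponding flag set.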
Fig. 2. Interface Design Features used to Develop Melo’s World
Fig. 3. Scene 1: Adventure Map - In this scene, the user is able to select the specific game module that they would like to play. Scene 2: Melo’s Messy Room - In this scene, the child is presented with Melo’s Messy Room. It is the child’s job to help pick up all of the objects in the room. Scene 3: Coloring Book - This is the media reward that the child receives for correctly classifying all of the items in the room.
In order to simplify the experiment, the child was only allowed to choose the module called Melo’s Home. This module consists of three scenes, as shown in Figure 3. After choosing to explore Melo’s Home, the child is brought to the next scene, called Melo’s Messy Room. In this room, a number of items are messily scattered about the floor and door. All of these items fall into either the clothing or the toys category. Along with the clothing and toys are a clothes hamper and a toy-box. In this scene, the child must clean up the room by correctly putting away all of the items strewn about, dragging each item from the floor into its proper place. All of the toys belong in the toy-box and all of the clothes belong in the clothes hamper. Following the game’s successful completion, the child is given a media reward for playing the game. This is delivered in the form of a coloring book activity (as seen in Figure 3). In the coloring book activity, the child is initially presented with a black and white picture and is allowed to color the scene in any manner they would like. After they have indicated that they are finished with the coloring portion, the child is brought back to the game’s initial screen.
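The classification mechanic described above, checking each drag-and-drop against the item’s category and recording every attempt, can be sketched as follows. The item names match the flashcard stimuli listed later; the function names and the attempt log are hypothetical illustrations, not the original Flash code.

```python
# Illustrative sketch of the clean-up mechanic; the attempt log corresponds to
# the per-item trial count used later as an evaluation measure. Names are hypothetical.
ITEM_CATEGORY = {
    "blocks": "toy", "crayons": "toy", "doll": "toy", "jack-in-the-box": "toy",
    "jeans": "clothing", "jersey": "clothing", "pants": "clothing",
    "sweater": "clothing", "tie": "clothing",
}
CONTAINER_ACCEPTS = {"toy_box": "toy", "clothes_hamper": "clothing"}

def drop_item(item: str, container: str, attempt_log: dict) -> bool:
    """Record the attempt and return True only if the item was put away correctly."""
    attempt_log[item] = attempt_log.get(item, 0) + 1
    return ITEM_CATEGORY[item] == CONTAINER_ACCEPTS[container]

# Example: a child who misplaces the tie once needs 10 attempts in total.
log = {}
drop_item("tie", "toy_box", log)         # incorrect: the tie stays on the floor
drop_item("tie", "clothes_hamper", log)  # correct
for item, category in ITEM_CATEGORY.items():
    if item != "tie":
        drop_item(item, "toy_box" if category == "toy" else "clothes_hamper", log)
print(sum(log.values()))                 # 10 attempts (9 items + 1 error)
```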
4 Experiment 1
4.1 Procedure
After first obtaining written parental and verbal participant consent, the participants were tested individually in a small room containing a table, a laptop with an external mouse, 2 chairs, and a video camera placed on a tripod behind the child. First, the Flashcard Pretest Task was performed. In this task, the stimuli presented were 9 black and white images on 8.5 x 11 cards. Each flashcard contained one image of an object that the child would encounter while playing the software. The images on the flashcards fell into one of two categories: toys (blocks, crayons, doll, jack-in-the-box) and clothing (jeans, jersey, pants, sweater, and tie). At the start of the task, the participants were given a stack of flashcards that were randomly arranged. Each child was given the instructions:
"Put these flashcards into groups of things that go together." After the child had grouped the flashcards, the researcher noted the groupings that were made. Next, each child played the educational computer game corresponding to their randomly assigned condition (if the child was assigned the control condition, the game was not played). Each child played the game for two iterations. Finally, each participant performed the Flashcard Task posttest, using the same method described above for the Flashcard pretest task.
4.2 Participants
The participants were fifty-five four- to six-year-old children (29 female and 26 male) from various after-school programs, day care centers, kindergarten classes, and Head Start centers in the Ypsilanti and Ann Arbor, Michigan, areas. All of these children had received between one and two years of formal schooling. The mean age was 5.01 years.
4.3 Evaluation Measures
First, each child’s performance was measured by evaluating the differences between their pretest and posttest scores in a free-sorting Flashcard task. We also assessed how the child’s performance changed while playing the game. The Flashcard Tasks (the pretest and the posttest) were evaluated using the CSED metric [6] mentioned earlier. The CSED metric is a measure of how many cards need to be moved in order to transform one card sort into another. Here, it was used to measure how many cards needed to be moved in order to convert the participant’s sorted groups into the correctly sorted groups. An integer score between 0 and 7 can be obtained using the CSED metric. A score of 0 indicates that no cards needed to be moved and the correct sorting had been produced. A score of 7 indicates that every card needed to be moved in order to transform the card sort created by the participant into the correct card sort. Once this score had been generated for the pretest and the posttest, the scores were compared to each other. Using the CSED metric helped to determine how well the participant was able to sort the cards into thematic categories. Furthermore, it quantifies the effect that playing the game had on the child’s learning of categories. While each participant played the software, video data were collected (from those whose parents had consented to video recording). From the video, the number of trials that each participant needed in order to correctly classify each item in the room (as either a toy or a piece of clothing) was recorded for each round of game play. This metric produces an integer score between 9 and 18 (assuming that an item could be incorrectly classified only once). A total number of trials closer to 9 would indicate that the child possessed the skill necessary to correctly classify each item. A number of trials closer to 18 would indicate that the child had incorrectly classified each item before selecting the correct classification.
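As a rough illustration of how such a score could be computed, the sketch below treats the edit distance as the number of cards left unmatched under the best one-to-one pairing of participant groups with reference groups. This brute-force version is an assumption-laden simplification for small sorts; see Deibel et al. [6] for the full CSED definition.

```python
# Brute-force sketch of the card-sort edit distance idea for small sorts.
# This is an illustrative simplification, not Deibel et al.'s implementation [6].
from itertools import permutations

def csed(participant_groups, reference_groups):
    """Minimum number of cards to move, assuming both sorts cover the same cards."""
    n_cards = sum(len(g) for g in reference_groups)
    k = max(len(participant_groups), len(reference_groups))
    p = list(participant_groups) + [set()] * (k - len(participant_groups))
    r = list(reference_groups) + [set()] * (k - len(reference_groups))
    # Cards that can stay put = best total overlap over one-to-one group pairings.
    best_overlap = max(
        sum(len(p[i] & r[j]) for i, j in enumerate(perm))
        for perm in permutations(range(k))
    )
    return n_cards - best_overlap

correct = [{"blocks", "crayons", "doll", "jack-in-the-box"},
           {"jeans", "jersey", "pants", "sweater", "tie"}]
child = [{"blocks", "crayons", "jeans"},
         {"doll", "jack-in-the-box", "jersey", "pants", "sweater", "tie"}]
print(csed(child, correct))  # 3: move 'jeans', 'doll', and 'jack-in-the-box'
```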
Fig. 4. Paired Samples t-test and ANOVA - Here a paired samples t-test is performed to determine whether there is a significant difference between the pretest and posttest scores for each condition. The conditions are abbreviated to the first letter of the condition name.
4.4 Results
Using the CSED metric to measure performance during the flashcard free-sorting task, the mean pretest score for the Behavioral+Aesthetic condition was 4.29 and the mean posttest score was 4.36. In the Aesthetic-only condition, the mean pretest score was 4.56 and the mean posttest score was 3.00. In the Behavioral-only condition, the mean pretest score was 4.20 and the mean posttest score was 3.67. In the control condition, the mean pretest score was 4.47. The children in the control condition only participated in the flashcard task pretest; they did not play the educational software game, nor did they perform the posttest. In order to determine whether there was a significant difference between the pretest and posttest scores for each condition (excluding the control condition), a paired samples t-test was performed. The results can be seen in Figure 4. In comparing the Behavioral+Aesthetic condition’s pretest and posttest scores, we obtain a paired difference significance value of 0.876. When the Aesthetic-only condition’s pretest and posttest scores are compared, we obtain a paired difference significance value of 0.023. When the Behavioral-only condition’s pretest and posttest scores are compared, we obtain a paired difference significance value of 0.318. From this, we can see that there is a significant difference between pretest and posttest scores only in the Aesthetic-only condition. In order to determine if there was a difference between the three versions of the game, and where those differences may lie, a one-way ANOVA was performed. The results are displayed in Figure 4. In this figure, one can see that the greatest difference lies between the Aesthetic-only condition and the Behavioral+Aesthetic condition, with a significance of 0.043. The differences between the other pairs of conditions (Behavioral+Aesthetic and Behavioral-only; Behavioral-only and Aesthetic-only) were not found to be significant.
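For readers who want to reproduce this style of analysis, the sketch below shows how a paired samples t-test and a one-way ANOVA can be run with SciPy. The score arrays are placeholders invented for illustration, not the study’s raw data, and the choice to run the ANOVA on pre-to-post difference scores is an assumption.

```python
# Sketch of the statistical analysis with SciPy; all arrays below are
# placeholder values, NOT the study's raw CSED scores.
from scipy import stats

# Pre/post CSED scores for one condition, paired by participant.
pre_aesthetic = [5, 4, 6, 3, 5, 4, 5, 4]    # placeholder
post_aesthetic = [3, 2, 4, 2, 3, 3, 4, 3]   # placeholder
t_stat, p_val = stats.ttest_rel(pre_aesthetic, post_aesthetic)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_val:.3f}")

# One way to compare the three game versions: a one-way ANOVA on the
# pre-to-post difference scores of each condition (an assumed choice).
diff_aesthetic = [-2, -1, -2, -1, -2]       # placeholder
diff_behavioral = [-1, 0, -1, 0, -1]        # placeholder
diff_combined = [0, 1, 0, -1, 1]            # placeholder
f_stat, p_val = stats.f_oneway(diff_aesthetic, diff_behavioral, diff_combined)
print(f"one-way ANOVA: F = {f_stat:.2f}, p = {p_val:.3f}")
```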
Each child’s performance while playing the game was also recorded. The gameplay data can be seen in Figure 5. It was observed that during the first round of game play, the mean number of attempts needed in order to correctly classify all of the items in the room was 11.50 for the Behavioral+Aesthetic condition, 11.91 for the Aesthetic-only condition, and 11.12 for the Behavioral-only condition. During the second round of game play, the mean number of attempts needed in order to correctly classify all of the items in the room was 9.40 for the Behavioral+Aesthetic condition, 10.00 for the Aesthetic-only condition, and 9.39 for the Behavioral-only condition. The distribution of classification attempts can be seen in Figure 5. From these figures one can see that in Trial 1, the participants in the Aesthetic-only condition required more attempts at the beginning of the game before they began to correctly classify the objects in the room. The number of attempts sharply decreased as game play continued. Although the Aesthetic-only condition had a steeper initial learning curve, the number of attempts needed by each child to correctly classify the objects in the room monotonically decreased. In the Behavioral+Aesthetic condition, the average number of attempts needed in order to correctly classify the items in the room did not adhere to any specific pattern. The same can be said for the Behavioral-only condition. In the second round, the participants in all three game conditions behaved similarly, with the number of necessary attempts remaining close to 1.
Fig. 5. Trials 1 and 2: Distribution of correct classification attempts - Along the abscissa is each room item. Item 1 is the first item correctly classified, Item 2 is the second item to be correctly classified, and so on. Along the ordinate is the number of incorrect categorizations observed before the item was correctly classified. Here, one can observe that although there is a steep learning curve in the Aesthetic-only condition, performance monotonically increases, leading to fewer incorrect categorizations as the room items are encountered. There is no such trend for the other conditions.
5 Discussion
From the data presented in the previous section, we conclude that the Aesthetic-only version of the software produced the highest positive difference in pretest and posttest scores. The Aesthetic-only version of the software also indicates a
monotonically increasing performance measure in a child’s initial game play. This is very different from the expected result: that the Behavioral+Aesthetic version of the software should produce the highest positive difference in pretest and posttest scores, and that the Behavioral+Aesthetic version should show the greatest improvement in performance while the child is playing the game. This finding may indicate that the addition of the behavioral components to the aesthetic components was a distraction to the child. This finding may also be due to the fact that in the Aesthetic-only condition, the auditory instructions were excluded. As a result, the participants had to actively deduce the instructions and the goal of the game by themselves. This is a form of active learning. Active learning tasks have been found to result in greater learning than passive learning tasks [1]. The Behavioral-only and Behavioral+Aesthetic conditions are passive learning tasks because the child is told the necessary information, rather than having to deduce it for themselves.
6 Conclusion
This study sought to assess the consequences of incorporating and excluding subsets of interface design features into learning technology for children. A piece of educational software called Melo’s World was developed to serve as a test facility in which we could manipulate the interface features and study the effects. It was found that software containing only aesthetic interface elements, thus possibly promoting active learning, produces the greatest positive differences in children learning thematic relationships. Most multimedia programs today fail because they merely add video and graphics to passive learning techniques. It does not matter whether that next page is text, graphics, or video, because the student is not doing anything. [15] These ineffective programs are using outdated, passive manners of instructing the child. In passive learning, people may absorb the facts, but they will be less active in interpreting and integrating them. Active learning results in greater learning and in more positive self-related affects and cognitions. [1] Creating educationally effective multimedia programs means taking seriously the idea of active learning. Good educational software promotes active learning, not passive learning. Furthermore, it ensures that the students are actively learning through the user interface by doing, and not simply watching.
Acknowledgments. I would like to thank all of the families, staff, and students at all of the kindergarten classrooms, after-school programs, and Head Start programs that supported this study. I would also like to thank the Office of Naval Research - HBCU Future Engineering Faculty Fellowship Program for funding this project.
References
1. Benware, C.A., Deci, E.L.: Quality of Learning with an Active versus Passive Motivational Set. American Educational Research Journal 21, 755–765 (1984)
2. Berman, R.A.: Preschool knowledge of language: What five-year-olds know about language structure and language use. In: Pontecorvo, C. (ed.) Writing development: An interdisciplinary view, pp. 61–76. John Benjamins Publishing, Amsterdam (1977)
3. Buckleitner, W.: The relationship between software interface instructional style and the engagement of young children. Phd Dissertation, Michigan State University (2004) 4. Chi, M.T.H., Koeske, R.D.: Network representation of a child’s dinosaur knowledge. Developmental Psychology 19, 29–39 (1983) 5. Clanton, C.: An Interpreted Demonstration of Computer Game Design. In: CHI 1998 Conference Summary on Human Factors in Computing Systems, pp. 1–2 (1998) 6. Deibel, K., Anderson, R., Anderson, R.E.: Using edit distance to analyze card sorts. Expert Systems 22(3), 129–138 (2005) 7. Gelderblom, H.: Designing software for young children: theoretically grounded guidelines. In: Proceedings of the 2004 Conference on interaction Design and Children: Building A Community, pp. 121–122 (2004) 8. Hegarty, M., Quilici, J., Narayanan, N.H., Holmquist, S., Moreno, R.: Designing multimedia manuals that explain how machines work: Lessons from evaluation of a theory-based design. Journal of Educational Multimedia and Hypermedia 8, 119–150 (1999) 9. Jones, M.K.: Human-computer interaction: A design guide. Educational Technology Publications, Englewood Cliffs (1989) 10. Markman, E.: Categorization and Naming in Children. MIT Press, Cambridge (1989) 11. Nielsen, J., Molich, R.: Heuristic evaluation of user interfaces. In: Proceedings of the SIGCHI conference on Human factors in computing systems: Empowering people, pp. 249–256 (1990) 12. Norman, D.A.: Emotion and attractive Interactions 9(4), 36–42 (2002) 13. Palmer, J.: Pre-School Education, Pros. and Cons. A Survey of Pre-School education with Emphasis on Research Past, Present, and Future. Toronto Board of Education, Ontario (1996) 14. Park, I., Hannafin, M.J.: Empirically based guidelines for the design of interactive multimedia. Educational Technology Research and Development 41(3), 63–85 (1993) 15. Schank, R.C.: Active learning through multimedia. IEEE Multimedia 1(1), 69–78 (1994) 16. Wyche, L.G.: Conceptualization Processes in Third Grade Black Children. The Journal of Negro Education 49(4), 373–384 (1980)
Relationship Enhancer: Interactive Recipe in Kitchen Island
Tsai-Yun Mou1, Tay-Sheng Jeng2, and Chun-Heng Ho3
1 Dept. of Media Arts, Kun Shan University, Tainan 710, Taiwan
[email protected]
2 Dept. of Architecture, National Cheng Kung University, Tainan 701, Taiwan
3 Dept. of Industrial Design, National Cheng Kung University, Tainan 701, Taiwan
{tsjeng,[email protected]}
Abstract. HCI research on the kitchen has focused on creating new devices to facilitate cooking work and eliminate mistakes. However, the kitchen is also a place where family and friends create meaning and memories. Therefore, we developed an interactive recipe in a kitchen island that aims to enhance social bonds and pleasure among people. The system utilizes tangible interaction for creative recipes and keeps records of people’s favorite foods. Groups of family and friends participated in the study. The results indicate that there was a cognition gap in people’s understanding of each other’s food preferences. Participants agreed that the interactive recipe increased communication when preparing food for others. In terms of creativity, the number of new dishes did not increase through collaboration, but people showed more creative dish ideas; on the other hand, individuals developed more dish variations but more ordinary recipe designs. Keywords: HCI, recipe, communication, creativity, social interaction.
1 Introduction
HCI research on the kitchen has advanced greatly in recent years. Activity in the kitchen, such as cooking, has been investigated with profound effort and technology. If we examine current studies, the major trend of HCI design in the kitchen focuses on developing tools to improve work efficiency [5][9], accuracy [11][21], or nutrition awareness [6][14]. While such technology products may compensate for human errors and mistakes, treating people’s problems with corrective technology [8] considers only part of the user experience. Besides, whether such products meet human needs remains an unanswered question, since it is through the explorative process [1] that people find pleasure and personal identity in cooking. A previous social study on food also suggests that what people regard as important in food consumption is a cheerful physical environment, including temperature, lighting, and acoustic conditions, together with good social interaction throughout the process [18]. Thus, while endeavoring to assist human frailty in the kitchen, more effort should be devoted to increasing people’s interaction. This suggests that for HCI in the kitchen, beyond treating problems with technology, increasing positive experiences in kitchen activity is more essential and has greater potential to fulfill human needs.
With the concept of good social interaction is people’s common value, we should try to develop HCI artifact that supports not only external convenience, but also internal need [19]. Terrenghi et al. [26] had tried to explore this concept in their Living Cookbook project. By setting up a camera and a recipe system in kitchen counter, people’s cooking process is captured and shared with friends. Yet their design goal did not seem to be attained through the system because people mainly responded to its functionality. Liu et al. [16] in their design of Synesthetic Recipes also contained social feature of family tastes to support food decision making. However, their design aimed at individual person especially mother; social interaction or communication among people had not been examined so far. Even though these studies contributed less than their design concept, they have set up a milestone in human-food interaction, i.e. social aspect of HCI. Hence, here we would further probe into this direction by developing a sociable artifact to increase interaction and communication among people in their food consumption process. 1.1 Design Concept The decision on what kind of medium to use and where the implement would apply to in the kitchen space has to meet the goal of being pervasive into people’s life and environment [10]. Examining current activities at home, Taylor and Swan [25] indicated that family members often communicated through some kitchen objects, such as kitchen table or refrigerator, to share their social information and daily activities. Therefore, instead of placing a display in cooking counter [9][21], a surface that could support ordinary usage and technology implementation is more appropriate. Thus we chose kitchen island for its role and function in the kitchen; since it is the place where people would place ingredients and stay around to talk to family members or friends. To develop a sociable artifact that would merge with the island, we employed reacTIVision [12] system which is suitable for multiple people interaction. The original application of reacTIVision is for music and sound creation. People could play with tangible tags and mix at their likes to visualize not only multimedia elements but also sounds. By taking advantaging of its interactive visual features and table-based characteristic, we transformed the framework into an interactive recipe on the surface of island. Since our goal is to increase communication in food consumption process, we designed the system that encouraged people to play with tangible ingredient disks to create dishes. Through this system, we hoped to foster people to discuss their preferences in food preparation stage and further stimulate for new ideas of dish. This concept is supported by Short’s [23] study of domestic cooking, which indicated that people got inspirations from one another informally. Thus through utilizing an interactive recipe system in the island, preparing for meals could be a process that people can participate in and create experiences [4] together. The preparation time people spending together could be a social moment for sharing, listening, and getting to know each other more [17]. 1.2 Objectives and Hypotheses For the design of interactive recipe is to augment celebratory aspects of HCI; thus our objectives here are to examine to what extent and how the implement affects people’s experiences. First, we want to know whether the system could increase people’s
understanding of, and communication about, each other’s food preferences. Second, we want to know whether, in the creation of new dishes, the system has a positive effect on people’s creativity and whether collaborative work produces more dish variations than individual work. Our hypotheses are as follows. First, an interactive recipe in the island could increase communication and understanding among people. Second, people could create more new dishes together than individually. Third, people could have pleasant experiences in using the system.
2 Method
2.1 Interactive Recipe Design
The main page of the interface contained My Favorite, Recipe, and Web Video. The system was not connected to web data; therefore, except for Web Video, both My Favorite and Recipe were fully functional. The design concept is to increase communication and mutual understanding during food preparation, so the island serves as a place where people can stay around and share thoughts together. In the Recipe option, people can place multiple ingredient disks, with fiducial symbols on the back, to see possible food suggestions. Nutrition facts for each ingredient are shown around the disks to remind people of, and convey indirect messages [2] about, their eating habits. Currently the system has ten ingredients and over 70 dishes in its data. Eight dishes have cooking videos, which people can play or stop by placing or removing a play disk on the table. Figure 1 is a snapshot of the pork and cabbage disks with their food suggestions. Besides the creative recipe to foster interaction among people, a record of people’s favorite foods can be added into the system as well. During the creation of dishes, people can add their favorite food by simply touching its picture and selecting a character to represent themselves. Therefore, the system serves not only as a platform for people to understand each other, but also as a reminder for the one who has to prepare a meal for family or friends. Figure 2 is the interface of a participant’s favorite dish.
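As an illustration of the suggestion behavior, the sketch below maps the set of ingredient disks currently on the table to candidate dishes. The ingredient names, dish names, and matching rule are assumptions made for exposition; they are not the system’s actual data (ten ingredients and over 70 dishes) or code.

```python
# Illustrative sketch only: ingredient and dish names and the matching rule
# are assumptions, not the system's actual data of ten ingredients and 70+ dishes.
DISHES = {
    "braised pork with cabbage": {"pork", "cabbage"},
    "pork and cabbage dumplings": {"pork", "cabbage", "flour"},
    "stir-fried cabbage": {"cabbage", "garlic"},
}

def suggest_dishes(disks_on_table):
    """Suggest every dish whose ingredient list contains all placed disks."""
    placed = set(disks_on_table)
    return [name for name, ingredients in DISHES.items() if placed <= ingredients]

print(suggest_dishes({"pork", "cabbage"}))
# ['braised pork with cabbage', 'pork and cabbage dumplings']
```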
Fig. 1. Interactive recipe interface
Fig. 2. Favorite dish interface
Fig. 3. System prototype in the kitchen
2.2 System Technical Attributes
The system’s hardware is constructed with a CCD camera, a projector, an IR pass filter, several infrared lights, a semitransparent glass surface, and a dual-core computer. Figure 3 shows the system prototype placed in the kitchen. We utilized the open-source reacTIVision [12] framework for our recipe system. A key feature of this framework is its table-based tangible interaction. The camera detects multiple fiducial symbols attached to the bottom of tangible objects, which are tracked by the reacTIVision algorithm. With the system’s own communication protocol, TUIO [13], it encodes and transmits the attributes of tangible objects, such as their presence, position, and rotation, to client applications written in languages such as C++, Java, and Processing. These messages are then decoded and graphical/musical results are presented. Besides fiducial symbols, the latest version of reacTIVision can also recognize fingers, which is intuitive for actions such as touch and select. More detailed descriptions of the reacTIVision framework can be found in Kaltenbrunner’s [12] work. In our system, we used Processing as the interface application.
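To illustrate how fiducial events reach an application, here is a minimal TUIO listener sketch. It assumes the standard TUIO 1.1 /tuio/2Dobj profile sent over OSC/UDP on port 3333 and the third-party python-osc package; the actual system used Processing with the reacTIVision client libraries.

```python
# Minimal TUIO 1.1 listener sketch using the third-party python-osc package;
# an illustration of the protocol, not the Processing client used in this system.
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def on_2dobj(address, *args):
    """Handle /tuio/2Dobj messages; 'set' carries the state of one fiducial."""
    if args and args[0] == "set":
        # TUIO 1.1 'set': session_id, fiducial_id, x, y, angle, then velocities.
        session_id, fiducial_id, x, y, angle = args[1:6]
        print(f"disk {fiducial_id}: pos = ({x:.2f}, {y:.2f}), angle = {angle:.2f}")
    elif args and args[0] == "alive":
        print("disks currently on the table (session ids):", args[1:])

dispatcher = Dispatcher()
dispatcher.map("/tuio/2Dobj", on_2dobj)

# reacTIVision sends TUIO over OSC/UDP to port 3333 by default.
server = BlockingOSCUDPServer(("127.0.0.1", 3333), dispatcher)
server.serve_forever()
```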
2.3 Participants
Since the system’s ultimate goal is to increase communication among people and thus further enhance their relationships, people with some kind of social bonding are an appropriate starting point. Two social groups were studied in the research. The first group was a nuclear family of three; the second group was three friends who had known each other and worked together for several years.
2.4 Experimental Design
To evaluate whether the system is effective in improving people’s understanding and creativity, two stages of experiments were designed. The first stage examined how well people know each other’s food choices. Before using the interactive recipe, two people from each group were separately presented with all of the dishes (with pictures) built into the system. Each participant was asked to choose 8 dishes that he/she thought the other person would like. These data were collected for later examination of people’s preferences and cognition toward each other. The second stage studied people’s experiences with the recipe system, both individually and collaboratively. Two experiments were designed. In the first experiment, the two users from each group used the system individually and were required to add his/her 8 favorite dishes into the system. This was to compare with the results from the first stage. The second task for each participant was to create new dishes within 10 minutes. This was to see how people’s creativity developed with the system. In the second experiment, the two people in each group were given a party scenario for the third person in their group. They were asked to use the system together and decide their selection of 8 dishes for the party. Likewise, the pairs in both groups were asked to create new dishes within 10 minutes. As for the third person in each group, he/she was presented with all of the dishes in the system and was asked to choose 8 of his/her favorite foods. After the experiments, a survey and a focus group interview were conducted to understand people’s experiences and their cognition/understanding of each other. The structure of the whole experimental design is shown in Figure 4.
Fig. 4. Structure of experimental design
3 Results
3.1 Participants’ Cognition of People’s Favorite Foods
By comparing the results of the stage one experiment and stage two experiment 1, we hoped to see whether there were differences in people’s cognition of each other’s food preferences. In the family group, the mother and son participated together, and the father acted as the third person in the experiment. Figure 5 shows the selection accuracy. The results showed that in the family group the mother had two dishes correct with respect to the son’s answers; however, the son got no dishes correct with respect to the mother’s preferences. In the friend group, each of them had only one dish matching the other’s answers. The accuracy of their understanding of the other person was not high in either group. This low accuracy indicated that even people with a certain social bond and relationship do not necessarily understand each other’s food preferences well. Explanations for this phenomenon were revealed in the interviews, as described here: (Friend A) We often eat out together during weekdays. There are not many choices of food store. We already get used to eating same food everyday. It’s not we like it or not, but it’s familiar and convenient. (Son) I don’t really know what kind of food my mother would like. In dinner table, my mom often cooks one or two dishes I like. Now I figure out I don’t know much about what she would like. (Mother) When I cook, I usually look what I have in the refrigerator and make dishes I am familiar with. I know my son like certain food so I would try to make it for him. For other dishes I would try to balance the meal to be healthy for my family.
Fig. 5. Participant’s correct selection of each other’s food preference
From the interview we could see that for group of friends, food choice is restricted to the surrounding places close to their work. Thus, even though they spent time eating together, there still not much understanding of partner’s like. For group of family, young generation (son) only knew personal taste but not his parents’. Although mother knew what kind of food her son would like, only two of them were correct. This suggests that from son’s perspective, there were more dish variety in the archive than what he usually liked at home. Therefore, son would love to eat some dishes that his mother had never cooked, since mother also mentioned that she only made familiar dishes. When we showed the records of participants’ favorite foods to the two people in both groups, they were interested to find out the other’s food preference. The visual presentation of dishes increased their attention and discussion on the food. It was in the discovering process that a sense of social interaction and mutual understanding occurred between them. As they mentioned in the following interview: (Friend B) I didn’t know that she likes this (dish). The display of people’s love food is quite useful information. It raised my interest to know more about my friend. (Mother) It is good to know what other choices of food my son would like. This could be a good medium for us to communicate food choice and educate healthy victuals. But even I know what my family likes from the record, I would still consider nutrition balance in the meal. Therefore, for participants, a record of favorite food here not only served as a reference for people to know more about personal taste, but also a stimulus for family to share and communicate food ingest. Nevertheless, for mother this record is not a bible to follow, since nutrition balance for the whole family is more important than particular likes. As for group of friends, they did not consider nutrition as an important factor to be a favorite food either. They thought it was useful to know each ingredient’s nutrition, but not necessary to forbid too much or less of certain food. 3.2 Participants’ Creativity In the second stage of experiment, we hoped to see whether the design of the system has positive influence on people’s creativity and thus increases interaction between each other. Since the system could be used personally as well as jointly, we compared two experiments that required participants to create new dishes in 10 minutes. Figure 6 and 7 are respectively the results of individual creation and collaborative creation of dishes. For individual creation with the recipe system, we could see that son had more dishes than other people in the experiment; following are mother, friend B and A. It is interesting that son was the only one that had never cooked before but showed more creativity than others who had experiences. In their process of creation, participants mostly played with the ingredient disks quietly and thought for new ideas of dish. From observation of each participant’s experiment, we noticed that they tended to pick up ingredients that they personally liked and created based on that combination. To better understand their ideas in food design, follow up interviews showed some clues of participants’ thinking. (Friend B) Since I don’t cook very often, I don’t know whether this dish is eatable or not. I could imagine the combination of these ingredients are ok, not too strange to swallow.
Fig. 6. Participant’s individual creation of dishes
Fig. 7. Group’s collaborative creation of dishes
(Friend A) It took me a while to think of new dishes. That’s why I came up with so few answers. And honestly speaking, I find out that my ‘new dishes’ are not new enough compared with others. They are ordinary home dish that I could think of. (Mother) When I designed new dish, what I thought about were the cooking methods and whether it was delicious or not. (Son) I don’t think too much about how to cook. Designing new dishes is fun because I can imagine any new combinations of these ingredients. From the interviews above we can see that when participants interacted with the system individually, they tended to think of new dishes from their own perspective. As we can see from Figures 6 and 7, the overall number of dishes is higher in individual creation than in collaborative design, which is opposite to our hypothesis. An explanation for this phenomenon could be that in personal interaction, a participant’s inner thinking generates ideas independently, without interruption or interaction with others. However, if we examine the degree of creativity, although individuals had more dishes, those were ordinary foods that they were familiar with and felt safe about. In the collaborative experiment, we noticed that interaction increased in both groups, with discussion that reflected their relationship. In the friend group, when they brainstormed together, they would agree or disagree with the other’s dish suggestions. The discussion atmosphere was not unpleasant; instead, it was through this communication that they achieved a common solution. Regarding the family group, the interaction between the participants was more fluent. The mother played an assistant role, guiding the son’s creative thinking into practical realization. Moreover, the mother also tried to bring the concept of food nutrition into their recipe creation. Thus, by building the discussion on the son’s ideas, the mother artfully communicated and educated the son about food knowledge.
3.3 Collaborative Decision
The last experiment is a party scenario in which participants select 8 dishes for the third person in their group. By comparing the results of the pair’s decision and the third person’s likes, we could further inspect their co-experiences [3] with the system and social interaction with each other. The results showed that the friend group had three dishes matched and the family group had four. Apparently, this indicates that people could derive the right dishes from discussion with a partner. The visual representation of dishes is a good means for them to discuss and recall related experiences, as they described in the interview:
(Friend A) It is quite helpful to see the pictures when we design the meal for party. We know some ingredients that he (third person) likes, but can’t exactly recall what kind of dishes. So the display of combinations makes dish selection easier for us. (Mother) It’s good to talk with my son to design a meal for her father. This recipe system is useful in supporting our family interaction in certain level.
4 Conclusions This research we developed an interactive recipe system in kitchen island to foster positive experiences in kitchen. The design goal is to increase social interaction in food consumption process and further understand and communicate with each other in dish preference. Also, we hope to encourage people to create their own dishes by using the system since cooking new recipes is what people express creativity [1]. The results show that even people with certain social bonding such as family and friends still have limited understanding of others’ food preference. Thus although eating is a time for social interaction [18], their understanding of each other’s food preference does not reveal in the experiment. For family, mother has better comprehension of son’s like but because she does not have a variety of new dishes for meals, only two match with the son’s answer. As for son, the data presents more than what he usually has; this could explain the low correctness of mother’s prediction. From the first stage’s results, this infers that there is a need to increase social interaction among people in food consumption. A design that could assist people to know more about each other would be supportive to positive interaction. This point is verified with the presentation of people’s favorite food in the system. Participants show interests and talk about their experiences; it is in this process that a mutual understanding and social interaction increase between each other. Regarding people’s creativity in recipe design, opposite to our hypothesis, individual demonstrates more dish variations than collaborative creation. However, if we examine from the originality of dish design, people are more creative in collaborative work than individual. Participants show more selfhood when design dishes alone with the system since they tend to choose ingredients they personally like. Also, different roles that people currently play in their life affect their design thinking [7] as well. Their previous experiences with food would influence their viewpoint and results. One exception is that son as a novice [15] in cooking, demonstrates more creative and numerical recipes than others. This implies that a novice in food could be a good inspirer for other people to join in and communicate together. Whether people with no family relation could have similar effect needs to be examined in future study. From our study we could only suggest that relationship such as friends with similar food experiences have positive interaction together in food design as well. Thus broadly speaking, the sharing and brainstorming process contributes to a desirable co-experiences [4] in both family and friends. This is further confirmed in collaborative recipe decision for the other participant in their group. People recall related experiences together and the visual representation of dishes is helpful in communication. Home, is a social place where family and friends spend time together. Eating is a social activity that everyone regardless of age and gender can enjoy and participate. In the process of food consumption, food preparation is a time that people can communicate and know more about each other [17]. HCI development in kitchen space shall
not only solve problems [27] that people encounter in the kitchen, but also extend its capability of enhancing social relationships and interaction [8]. A design that considers the social aspect of features [20] is a trend in HCI; this concept also corresponds with the idea of Sociable Design proposed by Norman [22]. Our interactive recipe system is a start in kitchen studies. Our future research will examine the artifact’s impact not only on social interaction within the family, but also beyond the domestic space into the virtual community [24]. Hence, a sociable product in the kitchen could, spatially speaking, gather people physically and virtually and, cognitively speaking, enhance people’s interaction and relationships, which is our ultimate objective.
Acknowledgements. This research was funded by the National Science Council of Taiwan (NSC 97-2218-E-006-012).
References 1. Abarca, M.E.: Authentic or Not, It’s Original. Food & Foodways: History & Culture of Human Nourishment 12(1), 1–25 (2004) 2. Aleahmad, T., Balakrishnan, A.D., Wong, J., Fussell, S.R., Kiesler, S.: Fishing for sustainability: the effects of indirect and direct persuasion. In: CHI 2008 extended abstracts on Human factors in computing systems (2008) 3. Battarbee, K.: Defining co-experience. In: 2003 international conference on Designing pleasurable products and interfaces (2003) 4. Battarbee, K., Koskinen, I.: Co-experience: user experience as interaction. CoDesign 1(1), 5–18 (2005) 5. Bonanni, L., Lee, C.-H., Selker, T.: Attention-based design of augmented reality interfaces. In: CHI 2005 extended abstracts on Human factors in computing systems (2005) 6. Chi, P.-Y., Chen, J.-h., Chu, H.-h., Chen, B.-Y.: Enabling nutrition-aware cooking in a smart kitchen. In: CHI 2007 extended abstracts on Human factors in computing systems (2007) 7. Finke, R.A.: Creative Imagery: Discovery and Inventions in Visualisation. Lawrence Erlbaum Associates, New Jersey (1990) 8. Grimes, A., Harper, R.: Celebratory technology: new directions for food research in HCI. In: Twenty-sixth annual SIGCHI conference on Human factors in computing systems (2008) 9. Hamada, R., Okabe, J., IdeI, C., Satoh, S.i., Sakai, S., Tanaka, H.: Cooking navi: assistant for daily cooking in kitchen. In: 13th annual ACM international conference on Multimedia (2005) 10. Izadi, S., Fitzpatrick, G., Rodden, T., Brignull, H., Rogers, Y., Lindley, S.: The iterative design and study of a large display for shared and sociable spaces. In: 2005 conference on Designing for User eXperience (2005) 11. Ju, W., Hurwitz, R., Judd, T., Lee, B.: CounterActive: an interactive cookbook for the kitchen counter. In: CHI 2001 extended abstracts on Human factors in computing systems (2001) 12. Kaltenbrunner, M., Bencina, R.: reacTIVision: a computer-vision framework for tablebased tangible interaction. In: 1st international conference on Tangible and embedded interaction (2007)
13. Kaltenbrunner, M., Bovermann, T., Bencina, R., Costanza, E.: TUIO - A Protocol for Table Based Tangible User Interfaces. In: 6th International Workshop on Gesture in HumanComputer Interaction and Simulation (2005) 14. Kim, E., Koh, B., Ng, J., Su, R.: myPyramid: increasing nutritional awareness. In: CHI 2006 extended abstracts on Human factors in computing systems (2006) 15. Kokotovich, V., Purcell, T.: Mental synthesis and creativity in design: an experimental examination. Design Studies 21(5), 437–449 (2000) 16. Liu, H., Hockenberry, M., Selker, T.: Synesthetic recipes: foraging for food with the family, in taste-space. In: ACM SIGGRAPH 2005 Posters (2005) 17. Locher, J.L., Yoels, W.C., Maurer, D., Van Ells, J.: Comfort Foods: An Exploratory Journey into The Social and Emotional Significance of Food. Food and Foodways 13(4), 273– 297 (2005) 18. Macht, M., Meininger, J., Roth, J.: The Pleasures of Eating: A Qualitative Analysis. Journal of Happiness Studies 6(2), 137–160 (2005) 19. Maslow, A.H.: A theory of human motivation. Psychological Review 50(4), 370–396 (1943) 20. Mutlu, B.: An empirical framework for designing social products. In: 6th conference on Designing Interactive systems (2006) 21. Nakauchi, Y., Fukuda, T., Noguchi, K., Matsubara, T.: Intelligent kitchen: cooking support by LCD and mobile robot with IC-labeled objects. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (2005) 22. Norman, D.: Sociable Design, http://jnd.org/dn.mss/sociable_design_-_introduction.html 23. Short, F.: Domestic Cooking Skills - What Are They? Journal of the HEIA 10(3), 13–22 (2003) 24. Svensson, M., Höök, K., Cöster, R.: Designing and evaluating kalas: A social navigation system for food recipes. ACM Trans. Comput.-Hum. Interact. 12(3), 374–400 (2005) 25. Taylor, A., Swan, S.L.: Artful systems in the home. In: SIGCHI conference on Human factors in computing systems (2005) 26. Terrenghi, L., Hilliges, O., Butz, A.: Kitchen stories: sharing recipes with the Living Cookbook. Personal Ubiquitous Comput. 11(5), 409–414 (2007) 27. Tran, T.Q., Calcaterra, G., Mynatt, D.E.: COOK’S COLLAGE. IFIP International Federation for Information Processing, Home-Oriented Informatics and Telematics 178, 15–32 (2005)
ConvoCons: Encouraging Affinity on Multitouch Interfaces
Michael A. Oren and Stephen B. Gilbert
Iowa State University, Graduate Program in Human Computer Interaction,
1620 Howe Hall, Ames, IA 50011, United States
{moren,gilbert}@iastate.edu
Abstract. This paper describes the design of ConvoCons, a system to promote affinity of group members working in a co-located multitouch environment. The research includes an exploratory study that led to the development of ConvoCons as well as the iterative evolution of the ConvoCon system, design tradeoffs made, and empirical observations of users that led to design changes. This research adds to the literature on social interaction design and offers interface designers guidance on promoting affinity and increased collaboration via the user interface. Keywords: Multitouch, affinity, table computing, collaboration, virtual assembly, creativity support.
1 Introduction
When individuals work together for the first time they lack knowledge of one another’s reputations and other elements typically useful for successful cooperation [1]. Strangers cooperating for the first time without a shared connection to facilitate introductions and establishing common ground may at first struggle to establish a level of affinity needed for productive cooperation [3][12]. Individuals seek affinity as a means to fill a need for interpersonal relationships and established affinity is necessary for sustained cooperative relationships [7][17]. We created a system, ConvoCons, as a means of helping strangers begin the process of building affinity and using cooperative strategies. Our research intends to answer the question of whether or not a software interface can be built that can promote affinity between group members in a co-located collaborative environment. This paper presents an exploratory study that led to the ConvoCons framework and discusses the design trade-offs and iterative evaluation process that led to our current system. At this point, our research focuses on affinity creation and does not look at the length of affinity bonds created nor does it explore whether or not affinity creation through our system promotes cooperation in a competitive environment; it simply seeks to explore a low-cost method of promoting affinity within a co-located dyad where neither partner has previous knowledge of the other. The system we created, ConvoCons, is an applied reification of Nardi’s observations suggesting that incidental communication, even if unrelated to the task at hand,
is critical to supporting productive collaborative strategies [12]. ConvoCons, defined as conversation-starting icons or other visual features, are designed to serve as icebreakers and casual distractions that encourage informal discourse between new partners. This informal discourse leads to connections that aid collaboration. We believe that the affinity bonds formed through discussing the ConvoCons lead to the critically important state of social cohesion [10]. To measure whether the ConvoCons approach increases affinity, a measurable definition is required. Nardi [12] defines affinity as a 'feeling of connection between people'. We have narrowed this definition to the "convergence of thoughts, actions, or ideas" and made the following assumptions for measurement purposes within a multitouch collaborative context. First, a group that lacks affinity will have members who are more likely to work independently of one another and more likely to enforce personal space. Signs of increasing affinity include actions such as reaching into a partner's personal space. Personal space on a multitouch device is defined as the area immediately in front of an individual [14][16]. Second, joint work is also a sign of affinity, as coordination is required. A leader-follower approach, with one person directing the other, may be a sign of affinity if the partners afterward deem the work to represent each other's ideas equally; the leader-follower dynamic can demonstrate that the partners understand each other's roles and skills. Third, communications indicating agreement and affirmation of actions are also indicators of a group that has acquired a degree of affinity. Fourth, high-affinity groups will show no hesitation about working in close proximity. Fifth, planning communications, e.g., discussion about what to make and how to make it, are indicators of a group that has acquired affinity. Sixth, communication unrelated to the task is an indicator of affinity, including reading ConvoCons to one another. Similarly, shared laughter is also an indicator of increased affinity. Given the desire to create an interface component designed to promote affinity, three main challenges arise: the types of content to use, when to display it, and how to integrate it visually with the interface at hand. The research described below covers four phases of efforts to refine the ConvoCon design based on varied approaches to addressing these challenges. Phase 1 is based on a study of Baseplate, a virtual block-assembly application, where we discovered that dyads used the abstraction present in the interface as a means of jumpstarting collaboration on a multitouch table using the SparshUI architecture [13][15]; the interface itself served as the ConvoCon. Phase 2 introduced a separate ConvoCon interface layer that could be attached to applications as a means of promoting affinity. Phase 2 used news headlines that appeared as rotating circles in the middle of the multitouch device; each headline appeared in the same color, had some transparency, and rotated in a circle so both participants could view it (see figures 1-4 below). Phase 3 displayed riddles and jokes (first the question, then the next ConvoCon would display the answer or punch line), and the text flashed on and off as it rotated in the center of the screen.
Phase 4 ConvoCons displayed a question (either a joke or a riddle) on one end of the device and the answer on the other end, each oriented to face the participant at that end. The background color of the ConvoCon was different each time. We conducted this study using a multitouch device because it allowed co-located collaboration in which both partners in a dyad could have equal control over the results but, unlike
collaboration with paper on a physical table, we could dynamically display ConvoCon items [2]. These initial phases of ConvoCon research take place in a co-located environment to optimize conditions for partners to understand one another's intended actions ("social translucence") [9]. To enable two participants to collaborate, Phases 1 and 2 employed an FTIR-based 60" multitouch table [4] with participants standing. Phases 3 and 4 used a Stantum 15.4" multitouch display with participants seated.
2 Phase 1 ConvoCons Phase 1 was originally designed as an exploratory study to evaluate the use of Baseplate, a collaborative block-assembly application, in co-located and remote environments, exploring the collaboration strategies used by participants. However, after running this study, the main findings led to our development of ConvoCons. We analyzed the conversations that created bonds within dyad groups and led to their collaboration on the tasks. [5] and [6] describe the need for collaborators to have a shared vocabulary of the task in terms of distributed cognition. When using Baseplate, participants needed a shared understanding/vocabulary of the interface to complete the tasks. From results showing that elements of the interface led to collaborative strategies, we developed the initial framework for ConvoCons, which we hoped would serve as the basis of a theoretical framework to guide future designs of co-located and remote collaborative virtual assembly environments. 2.1 Phase 1 Methods Our first experimental group consisted of five co-located dyads that used the table, having a shared physical space (the input device of the tabletop) and a shared virtual space (the Baseplate workspace) where work was performed on a 60" FTIR table [13]. The second experimental group consisted of three dyads in which one participant used Baseplate on the table and the other used Baseplate on the Stantum—in this condition participants were located in the same room and allowed to talk with each other, but they were not allowed to look at one another's devices. All participants were asked to reproduce a series of simple, 2D patterns using Baseplate for three tasks, with ten minutes per task, and were then given up to ten minutes to create a new pattern collaboratively, for a total of four tasks. Beyond instructions on how to place, rotate, and move blocks, participants were not given any information about the interface or what each block represented. Baseplate and the patterns used for Tasks 1–3 can be seen in Figure 1.
Fig. 1. Baseplate (left) and the three task patterns used in the study
Participants' hands and conversations were video recorded during task completion. Video and audio feeds were analyzed qualitatively for strategies of collaboration, while survey data were analyzed quantitatively. 2.2 Phase 1 Results and Discussion We observed that a discussion which often jumpstarted the collaborative process for dyads in both conditions focused on which block in the interface represented which block in the pattern. This discussion resulted from the ambiguity of the blocks within the interface (appearing in a 3D perspective view, whereas blocks in the pattern were shown in a 2D top-down view). The interface's ambiguity was thus a source of increased affinity. The idea that a challenging interface can lead to positive collaborative strategies may seem counterintuitive to an HCII audience, but it aligns with the MIT constructionism philosophy that participants learn precisely through such meaning-making during constructive design [8]. Developing challenging interfaces intentionally, of course, would incur significant usability costs. Seeking a solution that would provide similar affinity benefits without the cognitive or usability costs, we began the ConvoCons research. Given Nardi's observations noted above on the importance of casual conversation for affinity building, the question remained whether the key to increased affinity and effective collaboration was task-related conversation (about the interface, leading to a shared representation of the application) or whether any conversation would have helped (per Nardi).
3 Phase 2 ConvoCons For Phase 2, we designed a structured ConvoCons system that could be used with any of our multitouch applications with equal effectiveness in promoting affinity. Since our original design was focused on promoting affinity for groups standing around a 60" FTIR table, we chose to make the ConvoCons round, to indicate that they were intended for all individuals, and to rotate them, so that no single user had a privileged view that would give them ownership of the ConvoCon content. ConvoCons were circular, placed in the center of the display with a width of approximately 30% of the total table width and a 50% transparent background, so users would both be forced to pay attention to them and be able to continue working while they were displayed. Since ConvoCons were intended to serve as icebreakers, the first touch on the multitouch device triggered the display of the first ConvoCon. This first iteration used the day's news headlines as an informal icebreaker to promote affinity. ConvoCons appeared during the first 15 minutes of interaction at 1.5-minute intervals. The 15-minute time limit was set to allow their use as icebreakers but prevent participants from being distracted during the entire course of the task. The 1.5-minute interval gave the previous ConvoCon sufficient time to make a full rotation while also providing one minute of uninterrupted work time for participants. In creating this initial design for ConvoCons, certain compromises were made based on design tradeoffs. The most significant of these was that the design risked annoying and alienating users by appearing in the center of their work area; however, we chose to do this because it provided equal access to
Fig. 2. The virtual tangram application and the three patterns
all collaborators and because we wanted to force users to attend to the ConvoCons so we could observe their effects on affinity building. This was particularly an issue of possible user frustration, since ConvoCons could not be moved out of the way or made to disappear before they had completed a full rotation—this was intended to prevent one user from taking control and dominating the ConvoCons, which would prevent them from serving as a means of promoting affinity at a level of equality within the group. In addition, headlines required minimal reading time, though they required users to attend more fully to reading them than to an image or color. Headlines also had the issue that three to ten words often provide very little information with which to begin a discussion about a topic if no participants are familiar with the story. In order to evaluate the efficacy of this design, an initial pilot test of the ConvoCons system was conducted. This pilot took place on a 60" FTIR table using our Tangrams application, as seen in Figure 2. Tangrams offers puzzles that require users to combine smaller geometric shapes into a larger geometric shape. This application was chosen as the initial test bed for the ConvoCons system due to its graphical simplicity and clarity, which would ensure minimal confusion from interface abstractions that could serve as an additional means of affinity building. Participants were given instructions on how to rotate, drag, and flip the seven shapes that make up tangram puzzles. Participants were then asked to complete three tangram puzzles with the solutions seen in Figure 2 and were told they had up to ten minutes per puzzle. After participants completed all three puzzles, they were given up to ten minutes to create anything they wanted. 3.1 Phase 2 Results Phase 2 ConvoCons were evaluated with three dyads, each containing one male and one female participant, with a mean age of 27. All participants in this phase had previous experience with multitouch, and four of the six individuals reported their sociability as not very social, defined in our Likert scale as preferring tight groups, while two reported it as highly social, defined in our Likert scale as being comfortable talking to strangers. The two highly social participants were part of the same dyad; this dyad's members rated each other as acquaintances. The other two dyads rated familiarity with their partner as "seen around" and "never met," respectively. This initial iteration of ConvoCons failed almost entirely to promote affinity in a manner that would not be invasive to users. Users sometimes read the first headline and then quickly came to ignore all subsequent ConvoCon content. Users still attended to the ConvoCons on occasion after the initial headline, but conversations about the ConvoCons focused on how to get rid of them and how annoying they were while trying to complete the puzzles. In unstructured interviews after
the tasks were completed, participants were unable to remember any headline in its entirety and only had a vague notion of one headline at most. All participants stated that they found the ConvoCons to be annoying and distracting and described them with such terms as "irrelevant" and "uninteresting." Two participants, each in different dyads, noted that they felt a sense of bonding over the ConvoCons in the annoyance they shared with their partner as they tried to get the ConvoCons to go away. The last of these three groups tried a modified version where the ConvoCon text flashed; however, they found it difficult to read and annoying and ignored the ConvoCons just as much as the previous groups.
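To make the Phase 2 display policy concrete, the parameters described in this section can be collected into a small scheduling sketch. This is our own illustration under stated assumptions; the function and constant names are hypothetical, and the original SparshUI-based implementation is not reproduced here.

```python
# Sketch of the Phase 2 ConvoCon display policy: triggered by the first touch,
# active only for the first 15 minutes, a new item every 1.5 minutes, drawn as a
# rotating, half-transparent circle about 30% of the table width. Names are ours.

WINDOW_S = 15 * 60        # ConvoCons appear only during the first 15 minutes
INTERVAL_S = 90           # a new ConvoCon every 1.5 minutes
RELATIVE_WIDTH = 0.30     # circle width as a fraction of the total table width
ALPHA = 0.5               # 50% transparency so work underneath stays visible


def current_convocon(items, first_touch_s, now_s):
    """Return the headline (or riddle/punch line) to display at time now_s, if any."""
    if first_touch_s is None:              # nothing is shown before the first touch
        return None
    elapsed = now_s - first_touch_s
    if not (0 <= elapsed <= WINDOW_S):     # outside the icebreaker window
        return None
    index = int(elapsed // INTERVAL_S)
    return items[index] if index < len(items) else None


headlines = ["Headline 1", "Headline 2", "Headline 3"]
print(current_convocon(headlines, first_touch_s=0.0, now_s=100.0))  # -> 'Headline 2'
```

Tying the trigger to the first touch, rather than to wall-clock time, keeps the icebreaker aligned with the start of joint activity; later phases change only the content and placement described below.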
4 Phase 3 ConvoCons With the failure of the Phase 2 ConvoCons came the need to tweak the ConvoCon system in the hope that we could still promote affinity through their use. Phase 3 ConvoCons incorporated the same design elements as Phase 2 ConvoCons but were made harder to habituate to by having each ConvoCon appear with a randomly colored background. In this iteration, the headline text was replaced with riddles or jokes: the question was displayed first, and the next ConvoCon displayed the answer or punch line. This created the tradeoff that this iteration was heavily culturally grounded, so its global use would be highly limited. In addition, riddles and jokes tended to be significantly longer than headlines, so users had to devote more time to reading and processing the text before they could converse about it. However, unlike headlines, these required no additional background information for participants to fully understand them. Since the first portion of each riddle or joke ConvoCon was posed as a question, it provided a potential opening for users to discuss it and try to figure out the answer or punch line before receiving it from the system. The procedure for evaluating Phase 3 ConvoCons was similar to that of Phase 2 in that participants received training on the basic functionality of the system, were then asked to complete three patterns (the same three used for Phase 2), and were then given up to ten minutes to create any pattern they chose using the tangram pieces. However, unlike Phase 2, participants in this phase were not told of a time limit for completing the puzzles, as some results in Phase 2 raised concern that placing a time limit on puzzle completion may bias participants toward reducing the time spent on non-task-oriented items like reading ConvoCon text and talking to one another. This concern arose from the unstructured interviews of Phase 2, where one participant noted that her reason for ignoring the ConvoCons was a feeling that she needed to complete the task as quickly as possible given the time constraint. 4.1 Phase 3 Results Phase 3 involved six dyads recruited from the undergraduate psychology department. Our observations indicated significant confusion among participants when the first ConvoCon appeared, with one participant remarking "I didn't know there'd be a quiz." It was also observed that, due to participants' focus on the task, by the time the answer arrived they often had forgotten the question. Generally, conversations between these dyads were muted, both around the ConvoCons and about the task, with conversations focusing mostly on issues where the
solution they were independently working on was found not to work. Unstructured interviews after the tasks revealed that two of the six groups paid significant attention to the ConvoCons, with a member of group 2 stating that they originally paid more attention to the ConvoCon text than to the task. One of the two groups that attended to the ConvoCons had a modified version in which two ConvoCons were displayed simultaneously, without rotation, with one partner receiving the question and the other the answer. The group with this slight modification recalled the greatest number of ConvoCons (3 of 10). This group was also the only one of the six groups to read aloud a ConvoCon beyond the first one. The group with the modified Phase 3 ConvoCons was a mixed-sex dyad whose members had never met and who both self-reported a preference for "tight groups."
5 Phase 4 ConvoCons With the promising results from the slightly modified Phase 3 ConvoCons, we formalized the design modification of two ConvoCons on opposite sides of the multitouch display with fixed orientation toward one user, similar to the placement of numbers or letters on playing cards. This approach carried the tradeoff that this version of ConvoCons biases interaction and affinity promotion toward two individuals rather than a group. No other visual changes were made to the ConvoCons. The procedure was changed slightly for the Phase 4 ConvoCons evaluation in order to further reduce user focus on the tasks and promote user attention to the ConvoCons. This change was to remove the training on the Tangram application and instead provide five minutes for participants to play with the system, during which ConvoCons appeared from the first touch and at 1.5-minute intervals. This playtime had the additional advantage that it allowed us to evaluate the intuitiveness of the gestures employed in the application. The decision to make this change came from an observation of a tour group to which we demoed the ConvoCons-enabled Tangram application. We observed a user on the answer side covering it up while another user read the question, which suggested to us that participants may need a similarly relaxed setting in order to make use of the ConvoCons as an affinity-building mechanism. Given the power of authority and the tendency to conform to assigned roles demonstrated by Milgram and Zimbardo [11][18], participants may have been strongly focused on the puzzle tasks because they 1) heard our experimenter conduct training on the tasks and 2) knew that they would receive a departmental research credit for participating in the study. The playtime was designed to lessen these influences. We also ran dyads that used Tangrams without ConvoCons but with the playtime, in order to ensure that observations of affinity were a product of the ConvoCons and not of the playtime. 5.1 Phase 4 Results Phase 4 was evaluated using ten dyads with a ConvoCons-enabled version of Tangrams and nine dyads with ConvoCons turned off. Observations of users suggested that playtime does play a role in users attending to ConvoCons, as all groups attended to at least one ConvoCon during this playtime, and all but one ConvoCon group chose to use the entire five minutes of the playtime although all but two groups had learned all gestures
within the first two minutes of the playtime. In contrast, only three of the nine dyads working without ConvoCons used the entire five minutes of playtime, most stopping after roughly three minutes, one stopping after just over one minute, and others asking the researcher to advance to the task or sitting in awkward silence staring at the researcher or at their own hands until being asked if they wanted to start the puzzles. In unstructured interviews after the tasks were completed, dyads were able to remember at least three ConvoCons, both the general content and specifics; the dyads also stated that during the playtime the ConvoCons were not distracting or annoying, although they were at first confused about what they were and why they were there. Reactions to the ConvoCons during task time were similar to those in previous iterations in that they were often ignored, although some groups continued to pause work to read over and have a shared laugh over a joke or try to solve a puzzle—this most often occurred when a ConvoCon appeared while a dyad was having difficulty solving a puzzle. One group, in commenting on the ConvoCons, responded, "[they were] maybe not so much for getting to know each other, but for creating conversation." Another group stated that the ConvoCons probably made them talk more than they would have without them; however, they also felt the ConvoCons were irrelevant and distracting.
6 Conclusions The Phase 4 results indicate that it is possible to create a layer on top of the interface that enables users unfamiliar with one another to build affinity. Efforts to code and quantitatively compare the levels of affinity for each of the groups in Phase 4 are underway to determine the magnitude of the effects of ConvoCons. Future work will expand ConvoCons beyond dyads to small groups and will examine the effects of ConvoCon-encouraged affinity when a reward structure is present that would result in a level of competition between participants. We also hope to explore the use of ConvoCons as a way of building affinity among remote collaborators. Acknowledgements. We thank Cole Anagnost, Thomas Niedzielski, and Desirée Velázquez for developing the initial Baseplate application. We also thank Jay Roltgen, Prasad Ramanhally, Ashley Polkinghorn, and Ankit Patel for their development efforts on the ConvoCon code base and the Tangram application. This research was performed as part of a Research Experience for Undergraduates sponsored by NSF (IIS-0552522). Additional funding was provided by the Air Force Research Lab and the Grow Iowa Values Fund.
References 1. Bolton, G., Katok, E., Ockenfels, A.: Cooperation among strangers with limited information about reputation. Journal of Public Economics 89, 1457–1468 (2005) 2. Buxton, W., Hill, R., Rowley, P.: Issues and techniques in touch-sensitive tablet input. In: Proc. of SIGGRAPH 1985, pp. 215–224 (1985) 3. Convertino, G., Mentis, H., Rosson, M., Carroll, J., Slavkovic, A., Ganoe, C.: Articulating common ground in cooperative work: content and process. In: Proc. of CHI 2008, pp. 1637–1646 (2008)
4. Dohse, K., Dohse, T., Still, J., Parkhurst, D.: Enhancing Multi-user Interaction with Multitouch Tabletop Displays Using Hand Tracking. In: Proc. of Tabletop 2008, pp. 297–302 (2008) 5. Fischer, G., Arias, E., Carmien, S., Eden, H., Gorman, A.: Supporting Collaboration and Distributed Cognition in Context-Aware Pervasive Computing Environments. In: Proc. of HCIC 2004 (2004) 6. Hollan, J., Hutchins, E., Kirsh, D.: Distributed cognition: toward a new foundation for human-computer interaction research. TOCHI (2000) 7. Honeycutt, J., Patterson, J.: Affinity Strategies in Relationships: The role of gender and imagined interactions in maintaining college roommates. Personal Relationships 4, 35–46 (1997) 8. Kafai, Y., Resnick, M.: Constructionism in Practice, 1–8 (1996) 9. Kellogg, W., Erickson, T.: Social Translucence, Collective Awareness, and the Emergence of Place. In: Proc. of CSCW 2002 (2002) 10. King, J., Star, S.: Conceptual Foundations for the Development of Organizational Decision Support Systems. In: The Proceedings of the Hawaii International Conference on Systems Science, pp. 143–151 (1990) 11. Milgram, S.: Obedience to Authority. Tavistock Publications, London (1974) 12. Nardi, B.: Beyond Bandwidth: Dimensions of Connection in Interpersonal Communication. In: Proc. of CSCW 2005, pp. 91–129 (2005) 13. Ramanahally, P., Gilbert, S., Anagnost, C., Niedzielski, T., Velázquez, D.: Creating a Collaborative Multi-Touch Computer Aided Design Program. In: Proc. of WinVR 2009 (2009) 14. Scott, S., Grant, K., Mandryk, R.: System guidelines for co-located, collaborative work on a tabletop display. In: Proc. of the Eighth European Conference on Computer-Supported Cooperative Work (2003) 15. Sparsh UI (2008), http://code.google.com/p/sparsh-ui/ (retrieved September 7, 2008) 16. Tuddenham, P., Robinson, P.: Distributed Tabletops: Supporting Remote and Mixed-Presence Tabletop Collaboration. In: Proc. of IEEE International Workshop on Horizontal Interactive Human-Computer Systems 2007, pp. 19–26 (2007) 17. Whittaker, S.: Theories and Models in Mediated Communication. In: Graesser, A. (ed.) The Handbook of Discourse Processes. Lawrence Erlbaum, Cambridge (2003) 18. Zimbardo, P., Maslach, C., Haney, C.: Reflections on the Stanford Prison Experiment: Genesis, Transformation, Consequences. In: Blass, T. (ed.) Obedience to Authority: Current Perspectives on the Milgram Paradigm, pp. 193–238. Lawrence Erlbaum Associates, Mahwah (2000)
Development of an Emotional Interface for Sustainable Water Consumption in the Home Mehdi Ravandi, Jon Mok, and Mark Chignell University of Toronto Dept. of Mech. & Industrial Engineering Toronto, ON, Canada, M5S 3E4
Abstract. The design of an application to monitor, analyze and report individual water consumption within a household is introduced. An interface design incorporating just-in-time feedback, positive and negative reinforcement, ecological contextualization, and social validation is used to promote behavior change. Reducing water consumption behavior in the shower is targeted, as it is the leading source of discretionary indoor water use in a typical home. In both in-shower and out-of-shower scenarios, interface designs aim to address user needs for information, context, control, reward, and convenience to reduce water consumption. Keywords: Emotional design, Water Conservation, Home, Shower, Sustainability.
1 Introduction Depending on the duration and equipment used, taking a shower consumes, on average, 40 to 80 liters of water per day [1]. Showering is one of the more easily controllable water-consuming activities and accounts for a significant proportion of discretionary use. It is inconvenient to measure water consumption in our daily lives, and it is even harder to translate that consumption into its impact on the environment. An interface to measure the amount of water consumed during a shower can significantly improve general attitudes towards sustainability. A general lack of awareness and difficulty in connecting personal consumption with consequences for the environment is an ongoing challenge. As showering is the leading source of discretionary water usage in the home, designs are presented that adopt a user-centered approach to providing real-time feedback on water consumed in the shower.
2 Emotional Design Traditionally, advertising campaigns designed to persuade and modify consumption behavior have been the vehicle by which conservation policy is enacted. As Aronson
discovered, while such an approach can produce attitude change, the effect is frequently short-lived [2]. It is also known that while messages praising the value of water conservation succeed in changing people's attitudes, they do not translate into new behaviors [3]. Social psychologists have long been aware that the link between attitudes and behavior is problematic [4]. Simple communication of conservation policy does not guarantee reduced consumption. Flemming emphasizes not only the importance of real-time feedback, but also the method by which feedback is provided [5]. Careful consideration of user emotions in the feedback provided is needed to influence user action. In the case of showering, solutions must inform users of the volume of water used in a manner that persuades them to reduce consumption. An example of designing with emotion is WaterBot [6]. WaterBot is a device attached to household faucets that promotes water conservation. Using simple visual and auditory reminders, WaterBot rewards users for turning off the tap where possible. Donald Norman describes emotional design as "a framework for analyzing products in a holistic way to include their attractiveness, their behavior, and the image they present to the user - and of the owner" [7]. According to Norman, product design should address three different levels of cognitive and emotional processing: visceral, behavioral, and reflective. Emotional design is about "designing for affect - that is, eliciting the appropriate psychological or emotional response for a particular context - rather than for aesthetics alone" [8]. Using emotions as a primary driver of design considerations requires a greater focus on user needs, desires, and emotions.
3 Design Process 3.1 User Needs Gathering An ethnographic approach was employed to gather information about the perceptions and behaviors people have towards water usage and conservation. Using open-ended questions, information was collected around such aspects as the daily showering experience, perceptions of consumption habits, means of improving consumption, and reactions to over-consumption by members of the household. Although quantitative conclusions cannot be drawn from this line of enquiry, qualitative insights into how people perceive water conservation in the shower were collected. Typically, it is assumed that how an individual behaves is a function of their knowledge and perceptions, which in turn are directly motivated by their underlying needs. To inform design requirements and leverage user emotions, an understanding of user needs and motivations is necessary. To conserve water in the shower while satisfying user needs, it is important to support a link between individual water consumption and environmental consequences. The following represents the different user needs:
1. Information. At present, conventional shower systems in the home do not have the required mechanisms to provide feedback about an individual's water usage, in a given instance or over time. 2. Context. A disconnect exists between water usage in the home and the impact on the environment. 3. Control. Individuals desire the freedom to choose shower settings and usage targets, and would not enjoy following a prescribed solution. 4. Reward. Individuals may not always make choices for a greater good, and look for incentives linked to personal values to influence behavior change. 5. Convenience. Showering is commonly seen as a cleansing and satisfying experience; any technology introduced into the shower should enhance, not encumber, the experience. In addition, past studies on energy conservation have also shown that feedback of aggregated energy use often does little to motivate conservation over the short term [9]. In order to be effective, information feedback must be presented in combination with some other encouragement to modify behavior, such as spurring competition, setting a goal, or obtaining commitment from the consumer [9]. Applying this understanding to the problem of water conservation in the shower requires designs to incorporate the following elements: • Encourage friendly competition towards a shared objective to reduce discretionary water usage • Support individual and communal goal setting • Motivate participating individuals to want to conserve water 3.2 User Needs Scenarios Through informal interviews and group discussions with users, five underlying user needs pertaining to showering and water conservation were identified. These needs informed the development of use-case scenarios and specific interaction schemes. The scenarios are as follows: User Need for Information (Individual Consumption & Real-Time Feedback). Designs must provide users with quantitative metrics about their water consumption behavior. The following represents a summary of individual information needs: • Monitor and report consumption metrics such as time spent and volume consumed (which is affected by the water-flow rate) in real time • Maintain an audit trail of the amount of water consumed and saved relative to a desired goal • Understand water usage trends over differing periods of time such as time-of-day, weekly, monthly, and seasonal differences, etc.
User Need for Context (Relative Consumption and Associated Costs). Users must be able to connect reductions in water usage to a quantifiable benefit and avoid acting in isolation. Users see greater contextual awareness as a means to form meaningful relationships with others and achieve a sense of fulfillment. Greater contextual awareness therefore helps users: • Understand the cause and effect of incremental water usage • Identify the sources of high water consumption to allow for targeted or incremental adjustments • Learn about personal water consumption using relevant but non-traditional metrics such as economic costs and ecological impact • Achieve a broad perspective by placing and comparing personal consumption with family and community • Engage in shared experiences with others toward a common goal User Need for Control (Control Over Settings and Usage Targets). Users indicated a need to decide how their values affected their actions, whether for water conservation or otherwise. The user need for control can be represented through scenarios that speak to the ability of individuals to: • Maintain control over their shower settings, such as flow rate, temperature, and duration • Relative to a goal, set acceptable upper and lower limits for water flow User Need for Reward (Behavior-Modifying Incentives). Users expressed a need to reward good behavior while dissuading wasteful action. An individual's values alone may not be enough to produce lasting change. It is therefore desirable to reward and punish users based on the amount of water they conserve or waste. To satisfy the user need for rewards, the interface must allow individuals to: • Earn and lose valued incentives in relation to acts of consumption • Educate users about water conservation using meaningful metaphors User Need for Convenience (Unencumbered Showering Experience). Individual perceptions of showering range from it being an unavoidable necessity to an enjoyable luxury; in attempting to promote water conservation, the design must avoid inconvenient interactions. To create a convenient user experience, users need to: • Relate information in such a way as to avoid significant interaction in the shower • Minimize unnecessary time in the shower 3.3 Design Requirements Careful consideration of user needs resulted in a basic set of requirements that were used to inform design alternatives. User needs and their corresponding design requirements are captured in Table 1.
Table 1. Determined design requirements necessary for satisfying different user needs towards achieving water conservation in the shower

User Need: Information
- Scenario: Monitor their showers using relevant metrics such as time spent and volume consumed, in real time. Design requirement: Real-time graphical display of elapsed showering time and total water consumed.
- Scenario: Learn how much water is being saved or over-utilized relative to a desired amount. Design requirement: Graphical display that emphasizes the magnitude of water consumed over or under a relative goal.
- Scenario: Understand water usage trends over differing periods of time such as time-of-day, weekly, monthly, and seasonal differences, etc. Design requirement: Graphical display comparing usage trends of water consumed/saved over differing spans of time.

User Need: Context
- Scenario: Understand the cause and effect of incremental water usage. Design requirement: Use a representative metaphor to demonstrate the value in saving the natural environment.
- Scenario: Identify the sources of highest consumption to allow for targeted or incremental adjustments. Design requirement: Allow respective users to be uniquely identified and provide tailored tips and reminders during the shower, in real time, to allow the user to modify their behavior.
- Scenario: See the bigger picture, in terms of consumption relative to their household and community. Design requirement: Visually differentiate water consumption at the household and community levels.

User Need: Control
- Scenario: Maintain control over their shower settings, such as flow rate, temperature, and duration. Design requirement: Allow water settings to be user-controlled rather than automated by the system.
- Scenario: Set their own acceptable upper and lower tolerances for shower settings, relative to a goal. Design requirement: Facilitate users in creating their own action plans for water reduction over a chosen period of time; incremental goal setting via system administration.

User Need: Reward
- Scenario: Earn or lose user-valued incentives according to their water usage; reinforce positive behavior with rewards and admonish negative behavior with disincentives. Design requirement: Create a sense of urgency in the user by rewarding (earn) or punishing (lose) based on the duration of their shower relative to a stated goal; these changes should be representable in the system metaphor.

User Need: Convenience
- Scenario: Relate information in such a way as to not require significant interactions while in the shower. Design requirement: Minimize or remove user contact and allow system administration to be carried out in a non-shower context such as on a computer or mobile device.
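The requirements in Table 1 imply a simple per-user model underneath the interface: real-time metrics, a user-set target, and a reward tied to water saved. The sketch below is our own reading of those requirements; the class, the field names, and the litres-per-creature rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal per-user model suggested by Table 1. All names and the reward rule
# (one creature per 20 litres saved) are our own illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class ShowerUser:
    name: str
    daily_target_litres: float = 60.0             # user-controlled goal (Control)
    history: list = field(default_factory=list)   # litres used per shower (Information)

    def log_shower(self, flow_lpm: float, minutes: float) -> float:
        """Record one shower from its flow rate and duration."""
        litres = flow_lpm * minutes
        self.history.append(litres)
        return litres

    def litres_saved(self) -> float:
        """Cumulative saving relative to the per-shower target (Context)."""
        return sum(self.daily_target_litres - used for used in self.history)

    def creatures_earned(self, litres_per_creature: float = 20.0) -> int:
        """Reward: creatures in the shared 'pond' grow with the water saved."""
        return max(0, int(self.litres_saved() // litres_per_creature))


user = ShowerUser("A")
user.log_shower(flow_lpm=9.5, minutes=5)             # 47.5 L, under the 60 L target
print(user.litres_saved(), user.creatures_earned())  # 12.5 0 (no creature yet)
```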
4 Prototype Iterative scenario-based design sessions, followed by rapid development of prototypes, yielded several designs. Where in-shower scenarios take place, the use of an LCD display in the shower is envisioned as part of the functioning of the system,
while the out-of-shower scenarios for administering the system would be facilitated through a computer interface outside the shower. Fig. 1 shows the various user flows in the designed system for both in-shower and out-of-shower scenarios. The light-gray boxes under the left-hand column represent the various screens that a user would encounter within in-shower scenarios.
Fig. 1. Interaction scenarios. The light-gray boxes under the left-hand column represent the various screens that a user would encounter within in-shower scenarios. Dark-gray boxes under the right-hand column represent the various back-end screens a user would review in the out-of-shower context.
The following interface design shows the initial screen presented to a user upon entering the shower. As displayed in Fig. 2, users select their profile button to identify themselves and initiate tracking. Users can view the family's progress to date, as represented by a 'pond' ecosystem. Individual users and their creatures are differentiated by color.
Fig. 2. This initial screen is presented to a user upon entering the shower. The interface includes system initiation buttons for each user. The user with the highest liters saved, and therefore greatest diversity, is placed on top. The interface also represents a virtual 'pond' ecosystem showing the total volume of water collected from all four users and all creatures earned by the household. On the right, a comparative progress bar captures the number of milestones achieved by each user. Upcoming milestones are shown as a 'treasure chest' that can be 'unlocked' as users increase the amount of water saved. The interface displays the volume of water saved and the length of time since the creation of the ecosystem.
Following initialization, and when showering begins, the interface in Fig. 3 is presented. This interface allows the user to monitor real-time water usage and shower duration. To provide added context to the information presented, all data is shown relative to targets. Also, a pipe-and-valve metaphor is used to show connectivity to the various family members and their respective water reserves.
Fig. 3. This screen is displayed while showering to provide real-time feedback on water usage and shower duration. For added context, data is shown in relation to tracked targets. This interface includes a shower summary section to display the number and species of creatures earned and lost over the course of a shower. A central vessel shows the user's personal aquarium. The purple-colored water, above the "0 L/0:00" guidepost, represents the user's allotted water for daily showers. The blue-colored water displays the level of water conserved. Creatures are shown in separate 'bubbles' to convey their individual water requirements. 'Locked' items allude to the potential for users to earn creatures with increasing amounts of liters conserved. A pipe-and-valve representation, connected to a shower-head, shows the water source (by color) and the source of water withdrawal.
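The real-time values drawn in Fig. 3 reduce to a simple computation over elapsed time and the current flow rate. The sketch below is ours; the 60 L allotment and the 9 L/min flow reading are assumed values chosen only to make the example run.

```python
# Sketch of the quantities behind the in-shower display of Fig. 3: litres used so
# far and how much of the allotment remains. Allotment and flow rate are assumptions.

ALLOTMENT_L = 60.0        # the "allotted water for daily showers" above the 0 L guidepost
FLOW_L_PER_MIN = 9.0      # current reading from a hypothetical flow sensor


def in_shower_feedback(elapsed_s: float) -> dict:
    used = FLOW_L_PER_MIN * elapsed_s / 60.0
    return {
        "elapsed": f"{int(elapsed_s // 60)}:{int(elapsed_s % 60):02d}",
        "litres_used": round(used, 1),
        "litres_remaining": round(ALLOTMENT_L - used, 1),  # goes negative once over budget
    }


print(in_shower_feedback(4 * 60 + 30))
# {'elapsed': '4:30', 'litres_used': 40.5, 'litres_remaining': 19.5}
```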
Statistics on all family members are collected by the system and are accessible to members of the household outside of the shower. Fig. 4 shows one of three system functions available to users in the out-of-shower scenario. The reporting functionality allows users to compare usage statistics for different users, and reflects areas of improvement, individually and collectively.
Fig. 4. This is one of the three system functions that a user is able to select by using the out-of-shower interface. The reporting functionality allows users to view and compare usage statistics among users. Features of this screen include tabs for global navigation in the out-of-shower interface. Available data views include: At a Glance (currently shown), Daily, Weekly, Monthly and Year-To-Date summaries. The interface also captures each user's relative contribution to the communal pool of water. The system also displays each user's average showering time, the total volume of water saved and the count of creatures earned/lost over different periods of time.
5 Conclusion 5.1 Extending beyond the Shower Such an approach to water conservation can be applied to various domains. Where detailed reporting is available, ecologically representative metaphors can be extended to include all water management systems. The use of such an approach will become plausible as technology in the home converges to integrate toilets, faucets, dishwashers, laundry machines, etc. Such an approach to water conservation can also be applied to other resources such as electricity and carbon-based fuels. 5.2 Further Investigation Goal Setting. Consideration of users' perceptions of collective effort, and of their views on fairness where goal setting is concerned, was noted. Unresponsiveness to
improvement is of primary concern, as it can dissuade the collective from active participation. Understanding the drivers of user engagement will allow for more personalized designs that cater to individualized incentives. Complexity. Much discussion was devoted to representing all phases of the water management process, including the replenishment phases of the life-cycle. Further inquiry would be needed to assess users' reactions to a metaphor that conveys the total effort required to purify and process water (including procurement, processing, use, reprocessing, and re-introduction back into nature). Although adding aspects of realism would, on the one hand, increase transparency and promote understanding, it would conversely add layers of complexity to the metaphor and may diminish the simple intent of the system. Maintaining User Engagement over the Long Term. As users interact with the interface over a long period of time, the question becomes how to make the interface engaging and scalable in order to maintain user interest and engagement. The concept of indefinite growth was discussed as a possible solution, using scalable ecosystems (i.e., lakes, seas, oceans), but more thought needs to be put into understanding how these interactions might occur. A deeper level of understanding is required.
References 1. Rohles, F.H., Konz, S.A.: Showering behavior: implications for water and energy conservation. In: Energy Conservation, Consumption, & Utilization – Residential Buildings, United States, pp. 1063–1076 (1987) 2. Aronson, E.: Persuasion via self-justification: Large commitments for small rewards. In: Festinger, L. (ed.) Retrospection on social psychology, Oxford (1980) 3. Dickerson, C., Thibodeau, R., Aronson, E., Miller, D.: Using Cognitive Dissonance to Encourage Water Conservation. Journal of Applied Social Psychology 22, 841–852 (1992) 4. Wicker, A.: Attitudes versus actions: The relationship of verbal and overt behavioral responses to attitude objects. Journal of Social Issues 25, 41–78 (1969) 5. Flemming, S., Hilliard, A., Jamieson, G.: The Need for Human Factors in the Sustainability Domain. In: Human Factors and Ergonomics Society 52nd Annual Meeting (2008) 6. Arroyo, E., Bonanni, L., Selker, T.: Waterbot: exploring feedback and persuasive techniques at the sink. In: Proceedings of the SIGCHI conference on Human factors in computing systems, Portland, Oregon, USA (2005) 7. Norman, D.: Emotional Design: People and Things. Donald Norman's jnd.org (2004), http://www.jnd.org/dn.mss/emotional_desig.html 8. Reimann, R.: Personas, Goals, and Emotional Design, UX Matters (November 2005), http://www.uxmatters.com/MT/archives/000019.php 9. McCalley, L.T., Midden, C.J.H., Haagdorens, K.: Computing Systems for Household Energy Conservation: Consumer Response and Social Ecological Considerations. In: Proceedings of First International Workshop on Social Implications of Ubiquitous Computing (2005)
Influences of Telops on Television Audiences' Interpretation Hidetsugu Suto, Hiroshi Kawakami, and Osamu Katai Department of Computer Science and System Engineering, Muroran Institute of Technology, 27-1, Muzumotocho, Muroran-shi, Hokkaido, 050-8585, Japan [email protected] Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Kyoto 606-8501, Japan {kawakami, katai}@i.kyoto-u.ac.jp
Abstract. The influence of text information, known as “telops,” on the viewers of television programs is discussed. In recent television programs, textual information, i.e., captions and subtitles, is abundant. Production of a television program is facilitated by using telops, and therefore, the main reason for using this information is the producers' convenience. However, the effect on audiences cannot be disregarded when thinking about the influence of media on humans' lives. In this paper, channel theory and situation theory are introduced, and channel theory is expanded in order to represent the mental states and attitudes of an audience. Furthermore, the influence of telops is considered by using a scene of a quiz show as an example. Some assumptions are proposed based on the considerations, and experiments are carried out in order to verify the assumptions.
1 Introduction A lot of textual information, e.g., telops and subtitles, is used in television programs in Japan. In many cases, these messages request laughter and emphasize the features of the cast. Therefore, it can be said that this information is used for the convenience of program producers. However, when considering the strong impact of mass media on society, the side effects of using textual information on audiences cannot be ignored. The authors have investigated the influence of telops on the focusing points and memory of a television audience by carrying out experiments [1, 2]. However, these studies did not include the influence on the audience's interpretations. In this paper, the influence of textual information on the mental states of an audience is discussed with channel theory [3], which is a mathematical framework used for modeling information flows. Kawakami et al. have expanded channel theory in order to represent a diverse interpretation [4]. In this paper, the expanded theory is used to describe an interpretation of a television program audience. Some scenes of a quiz show are represented by mathematical models using the theory, and the effects of telops are investigated.
2 Channel Theory and Situation Theory 2.1 Intra-classification A classification A = ⟨tok(A), typ(A), ⊨_A⟩ consists of a set tok(A) of objects to be classified, called "tokens of A," a set typ(A) of objects used to classify the tokens, called "types of A," and a binary relation ⊨_A between tok(A) and typ(A) indicating the types into which tokens are classified. Given a classification A, a pair ⟨Γ, Δ⟩ of subsets of typ(A) is called a "sequent of A." A token a ∈ tok(A) satisfies ⟨Γ, Δ⟩ if, whenever a is of type α for all α ∈ Γ, a is of type β for some β ∈ Δ. If every token a ∈ tok(A) satisfies ⟨Γ, Δ⟩, then ⟨Γ, Δ⟩ is called a "constraint" supported by A, denoted Γ ⊢_A Δ. A local logic L = ⟨A, ⊢_L, N_L⟩ consists of a classification A, a set ⊢_L of sequents of A called the constraints of L, and a set N_L ⊆ tok(A) of tokens called the normal tokens of L, which satisfy all the constraints of L. L is sound if N_L = tok(A), and L is complete if ⊢_L includes all the constraints supported by A. Given a classification A, a sound and complete local logic, called Log(A), is generated from A. 2.2 Inter-classification An infomorphism is a pair of functions ⟨f, g⟩. Given two classifications A and B, an infomorphism from A to B, written A → B, satisfies g(b) ⊨_A α iff b ⊨_B f(α) for all α ∈ typ(A) and all b ∈ tok(B), where f and g are whole-part relationships. An information channel C = {f_i : A_i → C | i ∈ I} is an indexed family of infomorphisms with a common codomain C, called the "core of the channel"; I is an index set. 2.3 Situation Semantics Situation semantics [5] is one of the frameworks that explain what intentions are included in our utterances. In situation semantics, a speaker's mental states are classified into several cognitive states, e.g., see, know, believe, and assert. Furthermore, utterances made with "see" ("see reports") are classified into primary see reports and secondary see reports. The primary see report reports a direct acquisition of knowledge via perception, and the secondary see report reports an acquisition of knowledge based on perception, supplemented with what should be known based on what one sees. For instance, consider the following conversation: A: "Is Mr. C still in the room?" B: "The room was empty." In this case, the content of the primary see report by B is the fact that the room is empty. In contrast, the content of the secondary see report is the belief by B that C has already gone home.
When thinking about a situation in which an audience is watching a weather forecast on television, his/her content of the primary see report is “it may rain tomorrow,” and his/her content of the secondary see report is “I should take an umbrella tomorrow.”
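As a minimal illustration of the definitions in Sect. 2.1, a classification and its constraint check can be written out directly. The sketch below is our own toy encoding, loosely modeled on the weather-forecast example; the tokens and types are not taken from the paper.

```python
# Toy classification A = <tok(A), typ(A), |=_A> and a check of whether a sequent
# <Gamma, Delta> is a constraint supported by A. Tokens/types are illustrative only.

tok_A = {"a1", "a2"}
typ_A = {"raining", "wet", "umbrella_needed"}

# The binary relation |=_A as a set of (token, type) pairs.
models_A = {
    ("a1", "raining"), ("a1", "wet"), ("a1", "umbrella_needed"),
    ("a2", "wet"),
}


def satisfies(token, gamma, delta):
    """Token satisfies <gamma, delta>: if it is of every type in gamma,
    it is of at least one type in delta."""
    has_all_gamma = all((token, t) in models_A for t in gamma)
    has_some_delta = any((token, t) in models_A for t in delta)
    return (not has_all_gamma) or has_some_delta


def is_constraint(gamma, delta):
    """<gamma, delta> is a constraint of A iff every token satisfies it."""
    return all(satisfies(a, gamma, delta) for a in tok_A)


print(is_constraint({"raining"}, {"umbrella_needed"}))  # True for this toy relation
print(is_constraint({"wet"}, {"raining"}))              # False: a2 is wet but not raining
```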
3 An Extension of Channel Theory This section gives specific interpretations to the terms used in channel theory for modeling diverse interpretation, and then extends the theory. 3.1 Specific Interpretation of Classification and Soundness Assuming communication among humans, we interpret a classification A as a representation of the cognitive system of a person A. Each element of tok(A) represents an instance that is perceived by A, and each element of typ(A) represents a cognition of A. The next assumption is that each cognitive system A always tries to maintain the soundness of the generated local logic Log(A). Each ⊨_A relation shows a cognitive rule of A, and each abnormal token indicates a matter that cannot be understood by A in the light of ⊨_A. The existence of abnormal tokens causes unsoundness of Log(A), which can be interpreted as a confused state of A. 3.2 Indetermination of ⊨_A Originally, ⊨_A was defined as a binary relation. This paper extends the definition and allows an indeterminate state. This extension enables the following discussions. Primary See Reports and Secondary See Reports. In the case where a classification A represents a cognitive system, the state where t_1 ⊨_A τ_1 (t_1 ∈ tok(A), τ_1 ∈ typ(A)) corresponds to a primary see report of situation semantics. Basically, a subject believes t_1 ⊨_A τ_1 from direct information. However, the subject is also influenced by secondary see reports, i.e., facts that should be known based on the direct information of primary see reports. This fact implies that A has to have a state where t_1 ⊨_A τ_1 is not determined, and it is inferred from t_1 ⊨_A ω_n (ω_n ∈ typ(A)) and ω_n ⊢ τ_1. This extension enables modeling the transition of a subject's mental states caused by secondary see reports. Supplementing New Types and Tokens. Acquiring a new concept and encountering an unknown matter correspond to supplementing a new type and a new token, respectively. Generally, humans do not recall all details, but only the relevant concepts and/or previous events, when they acquire new concepts and/or experiences. This fact can be implemented by the indetermination of ⊨_A. 3.3 Relative Relation among Tokens Establishment of each constraint (∈ ⊢_A) of a classification A varies depending on each token (∈ tok(A)). In other words, constraints cannot represent the relative relationship among multiple tokens. It is as if the number of universal variables involved
in a wff of predicate logic were restricted to one. For example, the constraint equivalent to the following wff cannot be represented by a constraint:

(∀x, ∀y){clemency(x) ∧ aggressive(y) → fast(x, y)}   (1)

Given this fact, this paper defines an "accompanying classification R = ⟨tok(R), typ(R), ⊨_R⟩" for a normal classification A = ⟨tok(A), typ(A), ⊨_A⟩ as follows:

tok(R) = tok(A) × tok(A)
typ(R) ⊇ {l, r} × typ(A)
a_i ⊨_A α_j → {(a_i, ∗) ⊨_R (l, α_j)} ∧ {(∗, a_i) ⊨_R (r, α_j)}

where ∀a_i ∈ tok(A), ∀α_j ∈ typ(A). If tok(A) = {gentoo, rockhopper} and typ(A) = {clemency, aggressive}, then tok(R) = {(rockhopper, rockhopper), (rockhopper, gentoo), (gentoo, rockhopper), (gentoo, gentoo)} and typ(R) = {(l, clemency), (l, aggressive), (r, clemency), (r, aggressive), …}. Supplementing a relative relation fast to typ(R), classifications A and R are represented by Chu maps [6] as follows:

Here, 1 denotes the establishment of ⊨. In this case, the constraint equivalent to the above-mentioned wff is written as {(l, clemency), (r, aggressive)} ⊨_R {fast}, but we cannot expect a natural infomorphism between A and R [4].
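The construction of the accompanying classification can also be sketched in code. This is our own illustration: because the Chu maps themselves are not reproduced above, the classification of the two penguin tokens (gentoo as clement, rockhopper as aggressive) is an assumption made only so that the example runs.

```python
# Sketch of building the accompanying classification R from a classification A,
# following tok(R) = tok(A) x tok(A) and typ(R) ⊇ {l, r} x typ(A).
# The concrete |=_A entries below are illustrative assumptions only.
from itertools import product

tok_A = ["gentoo", "rockhopper"]
typ_A = ["clemency", "aggressive"]
models_A = {("gentoo", "clemency"), ("rockhopper", "aggressive")}  # assumed

tok_R = set(product(tok_A, tok_A))
typ_R = {(side, t) for side in ("l", "r") for t in typ_A}

# a_i |=_A alpha_j  ->  (a_i, *) |=_R (l, alpha_j)  and  (*, a_i) |=_R (r, alpha_j)
models_R = set()
for a, alpha in models_A:
    for other in tok_A:
        models_R.add(((a, other), ("l", alpha)))
        models_R.add(((other, a), ("r", alpha)))

# Supplement the relative type "fast" so that the constraint
# {(l, clemency), (r, aggressive)} -> {fast} holds for the matching pairs.
typ_R.add("fast")
for pair in tok_R:
    if (pair, ("l", "clemency")) in models_R and (pair, ("r", "aggressive")) in models_R:
        models_R.add((pair, "fast"))

print(sorted(p for p in tok_R if (p, "fast") in models_R))  # [('gentoo', 'rockhopper')]
```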
4 Influence of Telops on Audiences' Mental State In this section, the influence of telops in a television program on audiences' mental states is discussed by using the extended channel theory. 4.1 Mental States of an Observer Assuming a situation where answerer 1 ( P1 ) has given a wrong answer, and answerer 2 ( P2 ) has given the correct answer in a quiz show, an audience sees (primary SEE) facts indicating that − P1 has made a mistake, − P2 has succeeded
and believes those facts. Here, the mental states of the observer are represented with a classification R as follows: tok(R) = {P1, P2}, typ(R) = {correct, error}, P1 ⊨_R error, P2 ⊨_R correct. In this case, if the observer feels "I can answer the question," a token self is added that represents him/herself and is classified as self ⊨ correct. The feeling of "can answer" depends on a feeling of knowing [8], but has not actually been verified by testing. Accordingly, it is a state in which the observer believes "I can answer." In this case, the accompanying classification R_r for R is represented by a Chu map as follows:
Generally, media have an agenda-setting function [9], which defines "what a problem is in this situation". In situations where more than one person tries to answer questions, a type "foolish," which indicates a relative evaluation between answerers, is consequently added. In this case, an observer believes two facts: that P1 is more foolish than P2, and that P1 is more foolish than her/himself. On the other hand, when an observer does not have a feeling of knowing, R_r is represented by a Chu map as follows:
In this case, the audience does not understand the question or does not think that they can answer it.

4.2 Influence of Telops on Observers

The background to emphatic telops that display wrong answers is the producer's interpretation that "average people do not give such answers." If a classification S represents the producer's interpretation and Sr represents the accompanying classification of S, then the following constraints in Sr are translated into Rr when the telop is viewed:

{(r, average)} ⊨Sr (r, correct)    (2)
{(l, average)} ⊨Sr (l, correct)    (3)
As a result, the type "average" is added to Rr. We have a tendency to understand society via various media, to sympathize with the majority opinion, and to suppress our own assertions in order to avoid isolation [9]. In this case, self ⊨ average holds because of the consideration "I am average." Consequently, Rr transitions as follows:
Here, the audiences' mental states obey the constraint

{(l, error), (r, correct), (r, average)} ⊨R (l, isFoolish)    (4)

which means "a person who makes a mistake on a typical question is foolish." The type "isFoolish" indicates absolute relationships between tokens. Accordingly, Rr transitions as follows:
When the concept of "average" is introduced by the producers, audiences begin to think about these absolute relationships. Furthermore, self ⊨ correct and (P1, self) ⊨ foolish are derived from constraint (4). These classifications mean "I must answer the question, and answerer 1 is more foolish than me." Hence, even audiences who did not have a feeling of knowing for the question until the telop was displayed may come to think, "I should have been able to answer that correctly."

4.3 Experiments

Simple experiments were carried out to investigate the discussion in the previous section. Two movies, A and A', were used in the experiments: both included the same parts of a quiz show program, and some telops were appended to A'.
The subjects were 14 physically and mentally healthy university students. The experimental procedure was as follows:
1. The subjects were divided into two groups, G1 and G2.
2. The subjects were led into the room one by one.
3. A was presented to the members of G1, and A' was presented to the members of G2.
4. The subjects answered the questionnaire shown in Fig. 1.
Fig. 1. Part of the questionnaire sheet
Fig. 2. Results of experiments
The results of the experiments are shown in Fig. 2. A t-test shows a significant difference between the "difficult vs. easy" ratings of A and A' in the case of Q1 and Q2 (p < .05). Slight differences are also seen in "smart vs. foolish" in the case of Q1 (p < .10). These results agree with our assumptions that if a telop is displayed when an answerer makes a mistake,
− audiences feel that the question is easier than when no telop is displayed, and
− audiences recognize that the answerer is inferior to themselves.
In the case of Q2, no differences are seen in either "difficult vs. easy" or "smart vs. foolish." The reason for this is thought to be that question Q2 might have been too difficult for the subjects.
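For concreteness, the following sketch shows the kind of independent-samples t-test reported above; the rating arrays are invented placeholders, not the study's data, and the questionnaire coding is an assumption.

```python
from scipy import stats

# Hypothetical 5-point "difficult (1) vs. easy (5)" ratings for question Q1.
# group_A saw the clip without telops, group_A_prime saw it with telops.
group_A       = [2, 3, 2, 3, 2, 3, 2]   # placeholder values
group_A_prime = [4, 3, 4, 5, 4, 3, 4]   # placeholder values

# Independent-samples t-test between the two viewing conditions.
t_stat, p_value = stats.ttest_ind(group_A, group_A_prime)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Significant difference between A and A' on this scale.")
```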
5 Conclusion

In this paper, audiences' mental states while watching a quiz show were discussed in the context of extended channel theory. We proposed the assumption that if a telop is displayed when an answerer makes a mistake, audiences will recognize that the answerer is inferior to the average person. We also proposed the assumption that telops affect audiences' feeling of knowing for a question. Experiments were carried out in order to verify these assumptions, and the results largely agree with them. The authors plan to conduct more experiments with easier questions in the future.

Because an observer's recognition is classified as "I must solve the question," his/her recognition of the difficulty of a question may also change. That is to say, the assumptions "the question might be very difficult" and "P2 might be excellent" are denied, and audiences may fall into the uniform interpretation that "P1 made a mistake even though the question was easy." The authors expect that this situation can be explained by adding a "problem" token to tok(R).

Using telops is an easy solution for program providers. It is also easy for audiences because they can understand the producers' intentions without having to think deeply. However, using telops reduces the diversity of interpretation, and audiences may lapse into a uniform interpretation. To address this problem, an analysis from the perspective of the Benefit of Inconvenience [10] is expected to be effective. According to the Benefit of Inconvenience, conserving labor does not necessarily lead to the best design solution. We plan to analyze the relationship between the use of telops and the Benefit of Inconvenience in our future work.

Acknowledgements. This work was supported by a Grant-in-Aid for Scientific Research by the JSPS (no. 20500220). We would like to thank the reviewers for their helpful comments.
References

1. Suto, H., Yoshiguchi, Y.: A mathematical analysis of influence of telops on the receivers. In: Proc. 35th SICE Symposium on Intelligent System, pp. 103–108 (2008) (in Japanese)
2. Suto, H.: Influences of Telops on the Receivers' Interpretation. Informatics 1-2, 13–20 (2008) (in Japanese)
3. Barwise, J., Seligman, J.: Information Flow. Cambridge University Press, Cambridge (1997)
4. Kawakami, H., Suto, H., Handa, H., Katai, O., Shiose, T.: Analyzing Diverse Interpretation as Benefit of Inconvenience. In: Proc. Int. Sympo. on Symbiotic Nuclear Power Systems for 21st Century, vol. 2, pp. 75–81 (2008) ISBN 978-7-81133-306-0
5. Barwise, J., Perry, J.: Situations and Attitudes. MIT Press, Cambridge (1983)
6. Gupta, V.: Chu Spaces: A Model of Concurrency. Ph.D. Thesis, Comp. Sci. Dept., Stanford Univ. (1994)
7. Kawakami, H., Suto, H., Handa, H., Katai, O., Shiose, T.: Analyzing Diverse Interpretation as Benefit of Inconvenience. In: Proc. 2006 International Symposium on Humanized Systems, pp. 154–157 (2006)
8. Hart, J.T.: Memory and the feeling-of-knowing experience. Journal of Educational Psychology 56, 208–216 (1965)
9. Yoshimi, S.: Invitation to Media Cultural Studies. Yushikaku publishing Co., Ltd. (2004) (in Japanese)
10. Kawakami, H., Suto, H., Handa, H., Shiose, T., Katai, O.: A Way for Designing Artifacts based on Profit of Inconvenience. In: Proc. 11th Asia-Pacific Workshop on Intelligent and Evolutionary Systems, CD-ROM (2007)
Extracting High-Order Aesthetic and Affective Components from Composer's Writings

Akifumi Tokosumi and Hajime Murai

Department of Value and Decision Science, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan
{akt, h_murai}@valdes.titech.ac.jp
Abstract. A digital humanities technique for the network analysis of words within a text is applied to capture the subtle and sensitive contents of essays written by a contemporary composer of classical music. Based on the analysis findings, the possible contributions of digital humanities to affective technology are discussed. This paper also provides a systematic view of digital humanities and affective technology.

Keywords: high-order cognition, emotion, music, art, network analysis, digital humanities.
1 Introduction: Affective Technology Meets Digital Humanities

Texts are the most essential research materials for many areas of the humanities. Emotions and sensibilities are sometimes regarded as fundamental conceptual devices for art-oriented areas of the humanities, including literary studies and musicology. Despite their central importance, there have been few systematic treatments of texts and their affective contents within traditional humanities. It would seem that higher-order aesthetic and affective contents can only be analyzed through the subtle sensibilities possessed by humanities researchers. However, collaborations between affective technology and digital humanities may change that situation.

From a methodological perspective, the area of digital humanities can be defined by its set of techniques for handling materials within traditional humanities. As shown in Fig. 1, we have been attempting to develop new techniques to process literary and religious texts; namely, a knowledge-based approach to model reader responses [1], and a network-oriented approach to capture the cognitive components of religious thought [2]. The same research strategy can be applied to texts that contain affective contents as their essential components. In this paper, we briefly summarize how we have approached the texts written by a prominent contemporary composer of classical music and discuss its relevance to the enterprise of affective technology.
Fig. 1. Affective technology, digital humanities, and text genres
2 Extracting Cognitive Components from Texts

Expecting a full range of cognitive components, from the conceptual and evaluative to the affective and aesthetic, we selected Toru Takemitsu's essays as our research target. Takemitsu was one of the most important Japanese composers of the twentieth century, and he left behind a wealth of writings on music and other topics, as well as a wide range of musical works from concert pieces to film soundtracks. While our larger research project focuses on the complete collection of his writings, the present paper focuses on two essays to demonstrate how the different natures of their contents can be accurately extracted. Two analysis methods were applied to the texts, as described below. Our previous report [3] describes in detail our findings concerning one of the essays.

2.1 Network Analysis

The purpose of the network analysis was to identify keywords and their surrounding words within the text. After applying a morphological analysis parser to the text, the network creation software Polaris [4] was employed to create a network for the extracted keywords with the KeyGraph algorithm. The nodes of the network are nouns from the text, because nouns are the objects of cognitive action in the content analysis. The edges of the network represent words that co-occur with the keywords in the text; co-occurring words were used because such words tend to appear when similar concepts are being explained. Fig. 2 presents the overall results for a 19,551-word essay entitled Sound and Silence, as Measurable Each Other, which consists primarily of reflections on music in general. There are clearly several node clusters, and a number of
Fig. 2. A keyword network of Sound and Silence, as Measurable Each Other (1971)
Fig. 3. A keyword network of Citation of Dreams (1984)
measures of centrality for the network indicate the multi-centrality of the keyword space. In contrast, Fig. 3 indicates the singular-centered nature of the keyword space extracted from a 15,547-word essay entitled Citation of Dreams, which is a collection of writings about films. Despite Takemitsu's admitted affection for films, his cognitive space for films is clearly far simpler than his cognitive space for music.

2.2 Ontology and Content Analysis

Based on the keywords obtained from the network analyses, we constructed a word ontology for Takemitsu's music. The text corpus for the essays was then parsed in
order to carry out a content analysis at the semantic level. The aim of the content analysis was to extract the structures within the concepts employed by Takemitsu in talking about music. The interesting results include the findings that a) his aesthetic vocabulary is strongly associated with an abstract-thinking vocabulary, and b) more ordinary emotion words tend to be associated with lower-level music entities (pop music). These findings seem to substantiate the layered model of affective processes proposed in our previous report [1].
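As a rough, hypothetical illustration of the co-occurrence step described in Sect. 2.1 (the study itself used a morphological parser, Polaris, and the KeyGraph algorithm, none of which are reproduced here), a noun co-occurrence network could be sketched as follows; the sentences are toy stand-ins.

```python
import itertools
from collections import Counter

import networkx as nx

# Toy stand-in for parsed sentences; in the study, nouns were extracted
# from Takemitsu's essays with a morphological analysis parser.
sentences = [
    ["sound", "silence", "music"],
    ["silence", "music", "time"],
    ["film", "image", "music"],
]

# Count how often two nouns co-occur within the same sentence.
cooccurrence = Counter()
for nouns in sentences:
    for a, b in itertools.combinations(sorted(set(nouns)), 2):
        cooccurrence[(a, b)] += 1

# Build an undirected graph: nodes are nouns, edge weights are co-occurrence counts.
graph = nx.Graph()
for (a, b), weight in cooccurrence.items():
    graph.add_edge(a, b, weight=weight)

# Degree centrality as a crude proxy for the "multi-centrality" observation.
print(nx.degree_centrality(graph))
```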
3 Conclusions

Two digital humanities techniques demonstrate that it is possible to automate text analysis to levels that are comparable to those of more traditional humanities. In the Affective Technology towards Affective Society session, we will argue for the following points:
・ Network analysis of the words within a text can provide a better basis for text analysis.
・ Ontology and content analysis can provide different perspectives from text analysis.
・ The modeling of affective processes and text analysis may be mutually beneficial.
References

1. Tokosumi, A.: A Computational Literary Theory: The Ultimate Products of the Brain/Mind Machine. In: Kitamura, T. (ed.) What Should be Computed to Understand and Model Brain Function?, pp. 43–51. World Scientific Pub., Singapore (2001)
2. Murai, H., Tokosumi, A.: A Network Representation of Hermeneutics Based on Co-citation Analysis. WSEAS Transactions of Information Science and Applications 1(6), 1513–1517 (2004)
3. Aoshima, Y., Tokosumi, A.: Extracting Musical Concepts from Written Texts: A Case Study of Toru Takemitsu. In: Proceedings of the International Conference on Kansei Engineering and Emotion Research, CD-ROM B-10, Sapporo (2007)
4. Polaris, http://www.chokkan.org/software/polaris/
Affective Technology, Affective Management, towards Affective Society

Hiroyuki Umemuro

Department of Industrial Engineering and Management, Tokyo Institute of Technology, Tokyo 152-8552, Japan
[email protected]
Abstract. In this paper, the term affective is defined as "being capable to evoke affects in people's mind" or "being capable to deliberate affects to be evoked in people's mind". This paper discusses the potential impact of the concept of affectiveness on the development of technological products and services, on management, and on the value systems of societies.

Keywords: Affect, emotion, feeling, management, mood, quality, usability.
1 Introduction: Beyond Usability

1.1 Contributions and Limits of Human Factors

In the late twentieth century, products and services using technologies continued to become more complex and, thus, more difficult to interact with. The more people have to adapt to technologies, the more errors they tend to make, and the more likely they are to forget how to use them. One of the goals of human factors and ergonomics is to make products, systems, and/or environments safe and usable, in other words, to enhance usability. For this purpose, human factors researchers and practitioners have made efforts to understand the characteristics of potential users and to reflect them in design. As pioneers including Norman [1] and Nielsen [2] established and propagated the concept of usability and the methodology of usability engineering, the concept and importance of usability became widely recognized in design communities.

As a result of the efforts industries made during the last couple of decades of the twentieth century to enhance usability, most of the products and services seen in markets today have high usability. Except for those targeted at special users and special purposes, products and services with low usability can never be successful in the market. That means usability is now considered one of the attributes that every product or service must have; usability alone no longer makes a product or a service attractive and distinguishable from its competitors [3]. Therefore, from the end of the twentieth century into this century, a new idea has emerged across a broad range of fields: it is not enough for a design to be simple and usable; it is now essential to design products and services that users
themselves want to use and to continue to use [4]. Thus researchers started to seek what is necessary in addition to traditional usability.

1.2 Beyond Usability

In his recent book Emotional Design [5], Norman argued that design had placed too much emphasis on the usability aspect of products, i.e., on enabling users to accomplish their own goals safely and efficiently. Norman also discussed the significance of products' appeal to users' emotions.

Some researchers have focused on factors such as fun and pleasure. Csikszentmihalyi [6] defined flow as a state in which a person is completely absorbed in an activity, forgetting time. According to Csikszentmihalyi, flow is the happiest and most productive moment in people's lives. In the human factors field, Fulton [7] claimed the importance of introducing the concept of pleasure into the human factors approach. Jordan [3] categorized pleasure into four categories, physio-, psycho-, socio-, and ideo-pleasures, and proposed approaches to designing products that evoke these pleasure categories as a pleasure-based approach to human factors. Hancock and colleagues [8] coined hedonomics for design and scientific studies that aim at pleasurable interaction between humans and technology. They categorized the goals of human factors into five hierarchical layers: safety, functionality, usability, pleasurable experience, and individualization. They argued that conventional ergonomics deals with the layers from safety up to part of usability, while usability, pleasurable experience, and individualization should be pursued by hedonomics. Furthermore, an emerging interdisciplinary field that studies fun is called funology, which attracts researchers from engineering, human factors, philosophy, history, education, and psychology, as well as practitioners [9].

On the other hand, scientific studies on aesthetics have also attracted more attention. There is growing evidence that products designed to be aesthetically beautiful not only obtain higher subjective evaluations but also improve actual task performance and perceived usability [10], [11]. There is also new evidence that aesthetics is a major factor that determines users' engagement with technologies [12]. At the same time, as symbolized by media arts, the boundary between technology and art has become increasingly blurred [13], [14]. Aesthetic design and engineering design are now considered one inseparable activity.

One common idea among the studies above is that providing emotionally or affectively good experiences, such as pleasure, fun, and aesthetics, may lead users to develop affection for technological products and services and to continue using them.
2 Affect and Affection

2.1 Cognition and Affect: Rationality and Irrationality of Humans

In the history of human evolution, affects developed much earlier than rational thinking. In many situations, affect can respond to information (stimuli) from the neurosensory system and send signals (responses) to the body for appropriate reactions much faster than rational thinking can. This mechanism has been important for human survival.
However, modern Western philosophies have tended to recognize affects as primitive and irrational aspects of human, and thus, emphasized on rational and logical thinking as the characteristics that differentiate human from other animals. In the trend towards “globalization” in the last century, rationalism and market-based principles in which efficiency and cost reduction are measured in money became the fundamental rule to participate in competition in the global market. This rationalistic trend in modern thoughts is also apparent in emergence of cognitivism in psychology. Cognitivism is the approach to recognize human mental activity, or cognitive process, as information processing, and to try to model human mental activities as information processing models. Irrational behaviors of human are treated as exceptions from rational information processing and called heuristics or biases. This approach became a major stream called cognitive science, and has given significant influences on various fields even outside of psychology. Psychological studies on affects, on the other hand, have long been laboratory studies and field studies on human emotional responses, understanding, and expressions. However, there were some major developments in this field in the end of the twentieth century. One was the emergence of the modeling approach that had been successful in cognitivism, which tried to model human affective processes to build computational models of affects. Another was the development of technologies for direct observation of brain activities such as positron-emission tomography (PET) and functional magnetic resonance imaging (fMRI). These technologies enabled high precision observations of brain cells, and resulted in significant advance in neuroscience. These two approaches, analytic approach with models and empirical approach based on direct observation of human brain, provided broader ways for scientific research on affects. In addition, demand for affective studies was also claimed from the fields of social sciences. Goleman [15] pointed out that human emotion could be a significant factor for various modern social problems and argued the importance of people’s ability to understand and control their own emotion. In the marketing field, consumer psychology is now establishing a new research paradigm called neuro-marketing, with the help of fMRI technology. As discussed above, since the end of twentieth century to the beginning of this century, studies on affects have become a major multidisciplinary stream. Fujita and colleagues [16] coined affective science as a scientific research field on human affects, rather irrational aspects of human activities, contrasting to conventional cognitive science that studies human cognition, subjecting rational aspects of human. Fujita claimed that affective science is not to replace conventional cognitive science; affective science and cognitive science are to focus on two different aspects of human, and influence each other. Furthermore, as discussed above, emerging interdisciplinary studies on fun and pleasure are not limited within psychology field. 2.2 Affect and Affectiveness In psychology, the term affect is used to represent human affects in general, including emotion, feeling, and mood. Affects include both positive and negative status. However, the English word affection is usually used only for positive meanings such as love and gentle care.
In this paper, the term affective is defined as “being capable to evoke affects in people’s mind” or “being capable to deliberate affects to be evoked in people’s mind”. For example, affective products might mean “products that are capable to evoke appropriate affects in users” or “products that were designed carefully considering possible affects users might have.” In the same way, the term affectiveness is used for the meanings of “how capable to evoke affect in people’s mind,” or “to what extent affects that people might have are thoroughly considered.” As stated above, affects include both positive and negative status. Thus an affective product may evoke positive or negative affects in users. What are desired in many situations in the world should be positively affective products, or products that evoke positive affects and avoid negative affects among users. In some specific situations, it might be necessary to be negatively affective. A good example can be roller coasters. By providing negative affect of fear, roller coasters may give riders higher positive affects such as exhilaration or accomplishment. In general, however, careless misuse of negative affects may result in serious damage in human relationships and social climate. Thus being negatively affective, at its heart, requires thorough understanding of human affects and advanced skills. Affect is not a simple one-dimensional characteristic, such as rational-irrational or cognitive-affect. Firstly, it is necessary to understand the multi-layered nature of affects, which consists of at least two layers: basic emotion including fear and anger, and higher affective responses based on individual memories and value systems. Norman [5] proposed three-level model of human information processing; in addition to behavioral level in which rational cognitive information processing is conducted, there are also lower visceral level that is responsible to instinctive responses, and higher reflective level that is meta-cognition based on human individual strategies and value systems. These three levels are concurrently working, influencing each other. Basic emotions described above are corresponding to the visceral level, while higher affective responses are considered to be processed in the reflective level. Secondly, it is also important to understand multi-dimensionality of affects. Many psychologists agree that there exist six basic emotions: anger, fear, disgust, sad, happiness, and surprise. Furthermore, it is also considered that these basic emotions can be mapped onto a two-dimensional space that is spanned by valence and arousal. There are at least five modalities of sensory stimulus that evoke basic emotions. Individual factors that may relate to higher affective responses may include value systems, memories, experiences, generations, cohorts, social groups, cultures, and religions. As seen above, affective responses and their causes should be perceived in a multi-dimensional way.
3 Affective Technology 3.1 Perspectives on Affectiveness Research Idea of designing technological products and services to be affective is not very new. In Europe, there has been a long tradition of affective design, or designing artifacts to evoke specific affects, in many cases positive ones, especially in industrial design field.
One of pioneering works of research about relationship between technology and affect might be a series of work called as affective computing [17]. In 1990s, however, major focuses of affective computing research had been how to let computers to understand, express, and have affects. It was not until this century that research focus started to shift to the affective responses computers might evoke among users. This section tries to organize various researches that have the common viewpoint of how technological products and services might evoke affects among people, both ongoing and supposed to be pursued in near future, including the new trends in affective computing noted above. Research topics are categorized into five groups and discussed on their research significance and research questions. However, these five categories are neither definitive nor exclusive; some of research topics may be categorized into more than one category. 3.2 Affective Technology Today, as discussed above, products should not only be excellent in the conventional aspects of multi-functionality and usability any more, but also be those that users themselves want to use and continue to use: products to make owners pleased and proud of their owning, products that are comfortable and enjoyable in use, and/or products that provide remarkable affective experience such as excitement and deep satisfaction. Such technological products and services can be affective technologies. There are a number of questions to be answered in order to create such affective technologies. Firstly, in what situations or conditions do people experience affective experiences such as fun or pleasure in the context of technology usage? Secondly, what factors of technological products and services might evoke affects? Furthermore, as discussed in section 2, some of affective responses, especially higher ones, are expected to vary across individuals significantly. Thus individual differences in affective responses and its consideration in design should also be studied. 3.3 Affective Quality What are the factors to make technological products and services affective? Various qualities built into products and services as a whole provide affective experiences: color and shape, material and finish, weight and balance, softness, torque and click of movement, sound, lightness and readability, latency, information provided to users, efficiency and comfort of task, temperature and smell, and so on. Various operational activities throughout whole organizations may contribute to customer’s affective experiences: not only aesthetic design, but also production technology, acquisition, information design, usability engineering, marketing and advertisement that form anticipation among users before they actually see and get products and services, and various services to enhance satisfaction after customers have obtained them. All of these kinds of qualities of products or services that contribute to people’s affective experiences are called affective quality [18]. As it is generally not easy to provide objective standards to these affective qualities, they are difficult to be numerically measured, and thus methodologies to design these qualities are often empirical in practice and still not systematically
established. In addition, as discussed above, affective qualities are created not only by design but also as results of various operations of whole organizations, thus it is not easy to understand affective quality comprehensively. The constructions of affective quality, as well as organizational activities and methodologies to produce affective quality, should be investigated. 3.4 Affective Design There has been a long tradition of researches on aesthetic and attractive design, particularly in Europe. Those designs, including not only ones aesthetically beautiful, but also those to give a sense of wonder, sensual or vibrant ones, and those make their owners proud of them, may give a variation of affective experiences to people who own or see them. People irresistibly pay much money accidentally for these excellent designs, or become tolerant even if they have poor usability [5]. Such affective designs are not limited only within the category of fine arts, but also seen in everyday things such as kitchen ware and stationary. What are essences of such affective designs? For a long time, they have been considered as the territory of arts, or based on quite personal attributes of product designers such as their own sensitivity, skills or talents. However, now that the boundary between aesthetic design and engineering design is blurred, there are emerging researches to systematically investigate the fundamental elements of those fine arts. 3.5 Affective Communication Researches on communication technology have mainly focused on how to transmit as much information as possible to remote sites reliably. In other words, their efforts have been made towards conveying as realistic and as high-quality information as possible to remote counterparts. As a result, in today’s workplaces, videoconference systems using high-speed lines and high-definition imaging technology are conveying realistic vision of your colleagues on the other side of the earth. However, when you look at home settings, is the communication technology today connecting our minds with our loved ones? For example, are cell phones and e-mails nowadays connecting hearts of family members living far apart? If you introduce a high-definition videoconference system between households of you and your remote family members, can you feel stronger bond? Communication systems that connect hearts of people far apart and make them warm-hearted feeling emotional bonds with counterparts can be called affective communication. Those kinds of communication might not be those that convey true and accurate information of users, such as videoconference systems in workplaces. Then what is the difference between communication systems that can make people smile and those in workplaces? Factors essential for affective communication should be clarified in future study, and actual systems realizing those factors should also be proposed in order to demonstrate the significance of this idea in our life. 3.6 Affective Service Services, especially that human provides to human, are different in nature from the cases discussed above where human interacts with technology. The difference is that
both provider and receiver of a service are human, and that affects of the people on the both sides should be considered. It would be ideal if both providers and receivers of a service feel happiness from the time they started a service until the end of it. Such ideal form of services can be called affective services. However in reality, in broad types of jobs centered in customer-care, it is often required for workers to play or pretend particular emotions that may be different from their own but are necessary to do the job. Hochschild [19] called this kind of jobs as emotional labor. This type of selling one’s emotions off in pieces often demands heavily on workers’ mental health. They may feel stressed, and if they failed to cope with it, they may suffer from serious mental illness such as burnout syndrome. In order to realize the ideal form of services described above, it is necessary to study on multiple issues including how services should be designed, what kinds of customer experiences should be provided, as well as how providers should attend customers and what kind of management is necessary to protect workers from hard feelings.
4 Affective Management 4.1 Affective Management It has been widely believed, and actually practiced, that decision-making in management should be done in principle based on objective measurements such as sales, costs, benefits, and efficiency. In recent years, however, there emerged a new idea that such numerical indices may not be sufficient as a basis of decision. For example, if a company wants to develop a new product that evokes deep affective experiences among users, it is necessary to deliberately build high design, quality materials, skilled finish and tuned movement into the product, usually resulting in additional costs. If the management board emphasized reduction of costs, these elaborations of quality might be replaced with easier and cheaper ways. On the other hand, if the management board recognized that the affective experiences these qualities might bring to users would add much value to the product, these elaborations would never be the target of cost-reduction, and could even be given higher priorities. The example above illustrates the difference in affectiveness of management, or whether the management takes into considerations “what affect this decision may result in customer’s mind,” and if it gives higher priority to this value system. In other words, a management may clearly recognize the importance of factors that appeal to people’s affect such as aesthetics and pleasure, give higher priority to them, and do not allow sacrificing them. Such management that emphasize the criteria that “if this decision affective or not” in addition to the conventional numerical indices may be called as affective management. The concept of affective management is neither neglecting nor thinking little of conventional numerical and rational indices. This concept is to value affectiveness equally to them. In practice, however, it is very difficult to provide common numerical measure of affectiveness that can be directly compared with the conventional numerical indices. As shown in the example above, it is not easy to convert affectiveness into money amount. If there were already precedent products of
competitors in the market, such as the case of luxury automobiles, it may be possible to estimate the money amount of affective quality value of your product that is planned to get into the market in the future. However, the more innovative the new product is, the harder it is to estimate its affective value in advance. In order to promote the concept of affective management, it is urgently needed to research and develop these quantitative measurements of affectiveness. It is not limited to affects of customers that management board has to deliberate. For all stakeholders including shareholders, employers, business partners, and society, managements need to consider on what affective experience their managerial decisions may give those people, what affective experience they should provide, and what priority they should give to these issues comparing with other criteria. This is not a very new idea. A number of excellent managements have known, based on their sensibility and experiences, what affective effect of their decisions and behaviors might have on whom, and whether they should dare to do them or not. However, in many times this skill existed as implicit knowledge of excellent managements, and never be the one everyone can own. Affective management is one of the important values that have firmly existed among Japanese managements, which have not been explicitly claimed and thus might have remained in the shadows of “rational” decision making. 4.2 Affective Organization Affective management is not the issue only for the top managements. In order to practice the organization’s philosophy, principles, and strategies effectively, they should be shared among all members of the organization, including middle managements and employees in the fields. If the top managements emphasize affectiveness in their operations, this value system should be shared throughout the organization. In affective management, as discussed above, potential affective responses on all stakeholders should be taken into considerations. Among them, the people who are considered to be more important comparable to customers are employees. Most of the managerial issues inside organizations have been discussed focusing on productivity and efficiency. On the other hand, issues such as climate of workplaces have generally got little attention in such a way that it is always better to maintain a good climate in workplace, though it was actually given lower priority than productivity. Recent researches, however, have shown a number of new evidences suggesting that affective factors in workplaces have influence on productivity and creativity (e.g. [20], [21]). In the near future when this fact is widely recognized, whether workers in the workplace can have positive affect may be another important criteria to evaluate workplaces in addition to quantitative efficiency and productivity. Then managers would be required to maintain good affective climate in their workplace as one of their management skills. It is the emotional labor discussed in section 3 that managers have to pay particular attention to positive affects of employees. The emotional labor often demands heavily on workers’ mental health, and occasionally they may suffer from serious mental illness. In addition to mental cares such as coping, managerial supports in the workplace is said to be particularly important in order to take care of these issues.
5 Towards Affective Society In the new era when the concept of affectiveness have penetrated among managements as discussed in section 4, affectiveness will also be called into question as a qualification of a manager as a person; ability to deliberate possible emotional responses in other’s mind. It would also be true for individuals in general. Affectiveness may also be one of the evaluation criteria at recruiting or personnel evaluations. In that era, it will be quite usual for people to think about their own affectiveness. That means, idea of “to think about other’s affect” or “to behave with deliberating possible impacts on other’s emotion” are widely accepted as an essential value system in the society. Such a society can be called as an affective society. Again, this idea is not really new. This can be a reflection on an old issue that Goleman [15] warned more than a decade ago, but has been buried in the shadows of the trends of market fundamentalism and cost-reductionism and has not well reflected by people. In an affective society, people will be required to grow out of the over-simplified codes of conduct such as they can do anything as long as it is not regulated by law or berated by others, or they should pay efforts for visible results objectively evaluated but not for elusive matters such as human minds. However, it goes without saying that such affective society is the society that is gentle to people, with much less stress, and comfortable and peaceful place to live.
References 1. Norman, D.A.: Psychology of Everyday Things. Basic Books, New York (1988) 2. Nielsen, J.: Usability Engineering. Moragan Kaufamann, San Diego (1993) 3. Jordan, P.W.: Designing Pleasurable Products: An Introduction to the New Human Factors. Taylor and Francis, London (2000) 4. Carroll, J.M.: Beyond Fun. Interactions 11(5), 38–40 (2004) 5. Norman, D.A.: Emotional Design: Why We Love (Or Hate) Everyday Things. Basic Books, New York (2004) 6. Csikszentmihalyi, M.: Flow: The psychology of optimal experience. Harper & Row, New York (1990) 7. Fulton, J.: Physiology and Design. American Center for Design Journal 7(1), 7–15 (1993) 8. Hancock, P.A., Pepe, A.A., Murphy, L.L.: Hedonomics: The Power of Positive and Pleasurable Ergonomics. Ergonomics in Design 13(1), 8–14 (2005) 9. Blythe, M.A., Overbeeke, K., Monk, A.F., Wright, P.C. (eds.): Funology: From Usability to Enjoyment. Kluwer, Dordrecht (2004) 10. Kurosu, M., Kashimura, K.: Apparent Usability vs. Inherent Usability: Experimental Analysis on the Determinants of the Apparent Usability. In: Conference Companion on Human Factors in Computing Systems, Denver, CO, pp. 292–293 (1995) 11. Tractinsky, N., Katz, A.S., Ikar, D.: What is Beautiful is Usable. Interacting with Computers 13(2), 127–145 (2000) 12. Solves Pujol, R.: Personal Factors Influencing People’s ICT Interaction: A Study of Engagement, Quality of Experience, Creativity and Emotion. Unpublished master thesis. Department of Industrial Engineering and Management, Tokyo Institute of Technology, Tokyo, Japan (2007)
13. Dunne, A.: Hertzian Tales. MIT Press, Cambridge (2005) 14. Fishwick, P.A. (ed.): Aesthetic Computing. MIT Press, Cambridge (2006) 15. Goleman, D.: Emotional Intelligence: Why It Can Matter More Than IQ. Bantam Dell, New York (1995) 16. Fujita, K. (ed.): Affective Science. Kyoto University Press, Kyoto (2007) 17. Picard, P.W.: Affective Computing. MIT Press, Cambridge (1997) 18. Zhang, P., Li, N.: The Importance of Affective Quality. Communications of the ACM 48(9), 105–108 (2005) 19. Hochschild, A.R.: The Managed Heart: Commercialization of Human Feeling. University of California Press, Berkeley (1983) 20. Grawitch, M.J., Munz, D.C.: Individual and Group Affect in Problem-Solving Workgroups. In: Hartel, C.E.J., Zerbe, W.J., Ashkanasy, N.M. (eds.) Emotions in Organizational Behavior, pp. 119–142. Lawrence Erlbaum Associates, Mahwah (2005) 21. Meisiek, S., Yao, X.: Nonsense Makes Sense: Humor in Social Sharing. In: Hartel, C.E.J., Zerbe, W.J., Ashkanasy, N.M. (eds.) Emotions in Organizational Behavior, pp. 143–165. Lawrence Erlbaum Associates, Mahwah (2005)
Bio-sensing for Emotional Characterization without Word Labels Tessa Verhoef1, Christine Lisetti2, Armando Barreto3, Francisco Ortega2, Tijn van der Zant4, and Fokie Cnossen4 1 University of Amsterdam 1012 VT Amsterdam, The Netherlands [email protected] 2 School of Computing and Information Sciences [email protected], [email protected] 3 Department of Bio-medical Engineering Florida International University, Miami, 33199 USA [email protected] 4 Artificial Intelligence Institute University of Groningen 9747 AG Groningen, The Netherlands {C.M.van.der.Zant,f.cnossen}@ai.rug.nl
Abstract. In this article, we address some of the issues concerning emotion recognition from processing physiological signals captured by bio-sensors. We discuss some of our preliminary results, and propose future directions for emotion recognition based on our lessons learned. Keywords: Emotion Recognition, Affective Computing, Bio-sensing.
1 Introduction

In the past few years, a number of psychologists [1-3] have challenged the classical notion that emotion can be categorized by labels, using words such as 'anger', 'fear', and 'happiness' [4], and have proposed dimensional representations of emotions for more realistic categorization of emotional states. Emotion labeling has also been found to be dangerously ethnocentric and misleading [5]. Our current reported work is one of the first attempts to approach automatic emotion recognition with the novel method proposed by Peter and Herbon [6], which moves away from the notion of labeling emotional states with discrete categorical words.

In the following section we describe which physiological modalities associated with emotions we chose to capture, the bio-sensors that we used, and the emotion elicitation method used with participants. The section after that explains how we created a data set suitable for training and testing emotion classifiers, and the processing of these bio-physiological signals for classification. Finally, we discuss some of the lessons learned from this experiment and propose future directions toward emotion recognition.
2 Emotion Recognition without Labeling Emotion with Words

Peter and Herbon [6] have proposed a method for avoiding the use of words for automatic emotion recognition and provided guidelines on how to structure emotions as a dimensional representation for use in human-machine interaction. Labeling emotions can be problematic because the category borders are blurry and the word 'anger', for instance, can describe many different emotional states. The method described by [6] avoids these problems because it abandons labeling emotions with words. The procedure to classify emotions for automatic emotion recognition proposed in [6] consists of the following four steps:
• Step 1: Elicit emotions while measuring physiological signals and ask test subjects to self-report in a way that can be translated into a dimensional structure.
• Step 2: Assign the physiological measurements to the related ratings.
• Step 3: Group emotions into clusters with similar physiology and place them in the dimensional structure.
• Step 4: Identify characteristic patterns in physiology for each cluster.
In this section we describe which sensor modalities we chose to capture and process the physiological signals that are associated with emotional states. We describe the emotion elicitation method, based on psychological findings, designed to collect data while eliciting emotions from the participants. With these experiments we completed steps one and two of the procedure described by [6].

2.1 Bio-sensors Used to Collect Data

Our data set consisted of multimodal physiological evidence about the affective state of a user: galvanic skin response (GSR) and blood volume pressure (BVP). For a survey of the different modalities associated with emotional states and the recognition methods used to process these various modalities to date, see [7].

Galvanic Skin Response (GSR): The GSR2 Thought Tech LTD device (http://www.thoughttechnology.com/gsr.htm) shown in Figure 1.a was used to measure the Galvanic Skin Response (GSR). This method was introduced in the early 20th century and is based on the idea that an electric current is conducted more easily on moist skin. The autonomic nervous system, which consists of two subsystems, the parasympathetic and the sympathetic, influences the control of the sweat glands. In the case of higher sympathetic activity, the sweat glands become more hydrated and skin conductance increases. The sweat glands thus act as resistors, and skin conductance can be measured with the GSR device by passing a small electric current across two electrodes that touch the skin.

Blood Volume Pressure (BVP): The Pulse Plethysmograph (http://www.ufiservingscience.com/Pig1.html) shown in Figure 1.b was used for measuring the Blood Volume Pulse (BVP), a signal from which information about the Heart Rate Variability (HRV) can be computed. HRV has been
linked to emotional processes and the autonomic nervous system [8]. In addition, information about vasoconstriction (constriction of the blood vessels) can be inferred by detecting a decrease in the amplitude of the BVP signal. Vasoconstriction is said to be related to emotional processing as well [9]. The sensing device shown in Figure 1.b. is a finger clip that uses an infrared emitter and receiver to measure the amount of light that is reflected back by the skin.
Fig. 1. (a) The GSR2 skin conductance device (Thought Tech LTD). (b) Pulse Plethysmograph.
2.2 Emotion Elicitation

We created an experimental set-up with the sensors described above in which test subjects were exposed to emotion-eliciting stimuli, and data was captured from the sensing devices during that exposure. During the experiment, the physiological signals were measured with the non-invasive sensors while the participant was asked to keep the arm attached to the sensors as motionless as possible (to avoid generating noise associated with body movements).

Stimulus design: The stimuli used for emotion elicitation consisted of movie fragments that were known to elicit a range of different emotions. Gross and Levenson [10] conducted a very thorough study to provide a selection of movie fragments best suited to elicit certain emotions. Using a large number of test subjects and a wide selection of movie fragments, they were eventually able to reduce it to a reliable set in terms of emotion discreteness and intensity. More recently, Nasoz et al. [11] performed another panel study in which they tested the movie selection again and created a modified version that proved more appropriate. In order to allow easy comparison with previous emotion recognition work, our set of emotion-eliciting movie clips was based on the selection of [11]. Some changes were made, though, because during a pre-testing stage it appeared that people responded inappropriately to some of the movies. For example, The Shining, originally meant to elicit fear, is so well known and by now so old that people often show a smile of recognition instead of fear. Similarly, Drop Dead Fred caused people to be annoyed more than amused. To find out whether these two movies should be
replaced, a small pilot study was conducted in which we showed the two movie clips as well as two alternatives (The Ring for fear and the Pixar short movie Boundin' for happiness) and asked people to rate the emotion that they experienced while watching each clip. They were asked to choose one of the following possibilities [Happy, Angry, Sad, Disgusted, Surprised, Afraid, Neutral or None of the above] and to rate the intensity of the felt emotion on a scale from 1 to 5. Fifteen test subjects participated in this pilot study, of whom 7 were female and 8 were male. Their ages varied from 22 to 57. The results are shown in Table 1. The difference in eliciting success (defined as the percentage of subjects who reported having felt the intended emotion) and average reported intensity between the two movies for 'happy' is smaller than the difference for the 'fear' movies, but in both cases the alternative movie scores better; we therefore decided to replace them both. The final selection of movies for the main experiment was: The Champ for sadness, Schindler's List for anger, The Ring for fear, Capricorn One for surprise, Boundin' for happiness, and an episode of Fear Factor for disgust.

Table 1. Results of the pilot study on the eliciting abilities of the movie clips: eliciting success, average intensity rating, and standard deviation (SD) of the intensity rating.

Movie clips 'happy'    Eliciting success    Average intensity    SD
Drop Dead Fred         67 %                 2.8                  0.79
Boundin'               73 %                 3.1                  1.39

Movie clips 'fear'     Eliciting success    Average intensity    SD
The Shining            87 %                 2.6                  1.14
The Ring               100 %                3.9                  1.16
Procedure: During the main experiment, the user watched the six selected movie fragments which were separated with a reasonably long pause to make sure that the subject would be relaxed and in a neutral state again before the next movie started. Before the movies started there was such a pause as well, in which relaxing music was played and the participant was asked to breathe slowly and try to relax. After each fragment, the user was asked to self-report about which emotion he or she felt during the movie via a questionnaire. The written questionnaire was designed by adapting the concept of the Emotion Wheel [3] shown in Figure 2, which has been proposed to be an intuitive tool [3] for participants to (a) identify which emotions they are experiencing using labels from 16 different emotion families such as anger, fear, happiness (as most people are used to do when reporting on their emotional states) and (b) grade the intensity of their emotion within that family (e.g. the anger label can be fine-tuned to refer to rage or to annoyance, by respectively rating up or down the intensity of the experience). There are ongoing discussions among psychologists about what is the minimum number of dimensions sufficient to differentiate the variety of emotions that humans can experience and about what these dimensions should be [1], [2], [3]. Although the Emotion Wheel proposes 16 dimensions as “spikes” around the wheel, it also incorporates a mapping from these labels to a continuous 2-dimensional structure of emotional states: the valence dimension indicates whether an emotional state is
Bio-sensing for Emotional Characterization without Word Labels
697
associated with a positive or a negative experience, and the power dimension indicates the coping potential or how well the emotional situation can be handled. In addition, the Emotion Wheel allows for self-report of the intensity or how strongly the emotional state is experienced, with increasingly intense emotions radiating from the center of the wheel.
Fig. 2. The Emotion Wheel and Our Associated Questionnaire Notes, both adapted from [3]
Explanation: 16 different emotion families are arranged in a circular fashion. Please note that the word or label that represents each family can stand for a whole range of similar emotions. Thus, the Anger family also covers emotions such as rage, vexation, annoyance, indignation, fury, exasperation, or being cross or mad. First identify approximately how you felt during the movie and choose the emotion family that best corresponds to the kind of feeling you experienced. Then determine with which intensity you experienced the respective emotion and check one of the circles in the "spike" corresponding to this emotion family -- the bigger the circle and the closer it is to the rim of the wheel, the stronger the emotional experience. If you felt no emotion at all, check the ‘neutral’ circle in the center. The duration of the complete procedure was approximately 45 minutes. 25 test subjects participated in the experiment who varied in age from 21 to 41. The group consisted of 16 males and 9 females and the division of their ethnicities was as follows: 40 % Caucasian, 40 % Latin American, and 20 % Asian. Some of the data had to be excluded from the data set. Whenever a participant self reported an emotion that did not match the intended emotion for a movie fragment, the data for that movie fragment was not used in the data set. Another reason to exclude data was unsuccessful recording of the signals. Sometimes participants moved the arm with the sensors too much which caused interruptions in the signals and made it impossible to compute the features in the signal. These recordings were therefore also excluded.
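A small, purely illustrative sketch of this exclusion step might look as follows; the record layout and field names are invented for the example, not taken from the study.

```python
# Each recording pairs the movie's intended emotion with the participant's
# Emotion Wheel self-report (family label plus intensity) and a validity flag
# for the physiological signals.
recordings = [
    {"intended": "sadness", "reported": "sadness",  "intensity": 4, "signal_ok": True},
    {"intended": "fear",    "reported": "surprise", "intensity": 3, "signal_ok": True},
    {"intended": "disgust", "reported": "disgust",  "intensity": 5, "signal_ok": False},
]

# Keep a recording only if the self-report matches the intended emotion and
# the signals were recorded without interruptions.
dataset = [r for r in recordings
           if r["reported"] == r["intended"] and r["signal_ok"]]
print(len(dataset), "of", len(recordings), "recordings kept")
```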
3 Physiological Data Classification toward Emotion Recognition

3.1 Feature Extraction

For each test subject the experiment resulted in two signals: the raw GSR signal and the raw BVP signal. To create the data set, we computed features from the recorded signals. The features we computed were the same as those assessed by Barreto et al. [12], with the only difference that we did not treat each movie as a single segment over which each feature is computed; instead, we assessed the signals in intervals of 40 seconds, in order to obtain a sequence of feature values for each elicited emotion.

A typical GSR signal consists of several temporary increases: the skin conductance responses (SCRs). Figure 3 shows an example of such a GSR response. An electrodermal response is often described with a few specific characteristics: amplitude, rise time, and half-recovery time [12]. The specific features that we computed from the GSR signal are: the number of GSR responses, the mean value of the GSR, the average amplitude of the GSR responses, the average rising time of the GSR responses, and the average energy of the responses (the total area under the half-recovery time). All these features were computed as described in [12].
Fig. 3. Graphical annotated rendering of a Galvanic Skin Response
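As an illustration of the feature extraction described above, the following sketch computes the five GSR features from a raw conductance array. It is only an approximation under stated assumptions (peak detection by prominence, a simple walk-back onset search, and a trapezoidal area for the energy term); the exact procedures of [12] may differ, and all function and variable names are our own.

```python
import numpy as np
from scipy.signal import find_peaks

def gsr_features(gsr, fs):
    """Illustrative GSR feature extraction (a sketch, not the authors' code).

    gsr : 1-D array of skin conductance samples
    fs  : sampling rate in Hz
    """
    peaks, _ = find_peaks(gsr, prominence=0.05)          # candidate SCR peaks
    amplitudes, rise_times, energies = [], [], []
    for p in peaks:
        # walk back to the local minimum preceding the peak (SCR onset)
        onset = p
        while onset > 0 and gsr[onset - 1] <= gsr[onset]:
            onset -= 1
        amp = gsr[p] - gsr[onset]
        # walk forward until the signal has recovered half of the amplitude
        half = p
        while half < len(gsr) - 1 and gsr[half] > gsr[onset] + amp / 2:
            half += 1
        amplitudes.append(amp)
        rise_times.append((p - onset) / fs)
        # "energy": area under the response from onset to half-recovery
        energies.append(np.trapz(gsr[onset:half + 1] - gsr[onset], dx=1 / fs))
    return {
        "n_responses": len(peaks),
        "mean_gsr": float(np.mean(gsr)),
        "mean_amplitude": float(np.mean(amplitudes)) if amplitudes else 0.0,
        "mean_rise_time": float(np.mean(rise_times)) if rise_times else 0.0,
        "mean_energy": float(np.mean(energies)) if energies else 0.0,
    }

# toy usage with a synthetic signal containing two SCR-like bumps
fs = 10.0
t = np.arange(0, 40, 1 / fs)
demo = 2.0 + 0.3 * np.exp(-((t - 10) ** 2) / 4) + 0.2 * np.exp(-((t - 25) ** 2) / 9)
print(gsr_features(demo, fs))
```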
As mentioned earlier, the BVP signal can be used to compute features such as the Heart Rate Variability (HRV). A typical BVP beat is shown in Figure 4. As described in [12], the individual heart beats are first separated by finding the Inter Beat Intervals (or periods), i.e., the times between two consecutive peaks in the signal. This series is usually analyzed by studying different frequency bands, in which the Low Frequency (LF) (0.05-0.15 Hz) band reflects sympathetic activity whereas the High Frequency (HF)
(0.16-0.40 Hz) band reflects parasympathetic activity. The LF/HF ratio is computed as one feature, as well as the mean Inter Beat Interval, the standard deviation of the Inter Beat Interval, and the mean amplitude of the individual beats detected in the segment. Some problems were encountered with one of the six emotion categories because of the nature of the elicitation method: the surprising part of the movie clip that elicited surprise lasted only a few seconds. The way we compute the features requires segments of at least 40 seconds because, for instance, a single GSR response can last that long. This made it unfeasible to use the signals recorded for surprise in the way the data is processed, so we decided to use only the data for the other five emotions and built a classifier for this 5-class problem.
Fig. 4. Graphical annotated rendering of a Blood Volume Pressure beat
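The HRV-related features can be sketched in a similar way. The snippet below is a hedged illustration, not the authors' implementation: it detects BVP peaks, forms the Inter Beat Interval series, resamples it onto a uniform grid, and estimates LF (0.05-0.15 Hz) and HF (0.16-0.40 Hz) power with Welch's method. The resampling rate and peak-detection settings are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks, welch

def bvp_features(bvp, fs):
    """Illustrative BVP/HRV feature extraction (a sketch only)."""
    peaks, _ = find_peaks(bvp, distance=int(0.4 * fs))   # beats at least 0.4 s apart
    ibi = np.diff(peaks) / fs                            # inter-beat intervals in seconds
    # resample the irregular IBI series onto a uniform 4 Hz grid before the PSD
    t = np.cumsum(ibi)
    t_uniform = np.arange(t[0], t[-1], 0.25)
    ibi_uniform = np.interp(t_uniform, t, ibi)
    f, pxx = welch(ibi_uniform - ibi_uniform.mean(), fs=4.0,
                   nperseg=min(256, len(ibi_uniform)))
    lf_band = (f >= 0.05) & (f < 0.15)
    hf_band = (f >= 0.16) & (f <= 0.40)
    lf = np.trapz(pxx[lf_band], f[lf_band])
    hf = np.trapz(pxx[hf_band], f[hf_band])
    return {
        "lf_hf_ratio": lf / hf if hf > 0 else np.nan,
        "mean_ibi": float(np.mean(ibi)),
        "std_ibi": float(np.std(ibi)),
        "mean_beat_amplitude": float(np.mean(bvp[peaks])),  # peak height as a proxy
    }
```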
3.2 Normalization

The physiological response to emotion-eliciting stimuli differs considerably from person to person. Therefore the data from the two biosensors was normalized so that it reflects the proportional difference in reaction to the different stimulus segments, and also to re-scale the individual baselines. These normalization steps follow the example of those used in [12] with similar data. The first normalization step uses a set of features computed during the relaxation interval that preceded the first movie clip of the experiment. This interval represents the baseline of the user's reaction. If X_e is one of the features for the segment eliciting emotion e, and X_r is the same feature recorded during the relaxation interval, then Equation 1 computes the feature value after the first normalization step by dividing the original value by the one in relaxation.
X'_e = \frac{X_e}{X_r}    (1)
The second normalization step was also aimed at reducing the influence of individual differences, equalizing the baselines and the strength of the responses across individuals. If X'_e is one feature value after the first normalization step, for the segment that elicited emotion e, the second normalization step follows Equation 2: the value is divided by the average individual response for that feature, computed over all n feature vectors for all segments corresponding to the six emotions.
X''_e = \frac{X'_e}{\frac{1}{n}\sum_{i=1}^{n} X'_{e_i}}    (2)
The last step normalized the features to a uniform range in order to eliminate differences in dynamic range, which could otherwise make some features dominate others. This min-max normalization step follows Equation 3 and maps all computed feature values to the range from zero to one.
X_{norm} = \frac{X''_e - X''_{min}}{X''_{max} - X''_{min}}    (3)
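Applied to a per-subject feature matrix, the three normalization steps of Equations 1-3 could look as follows. This is a minimal sketch; whether the min-max step is applied per subject or over the whole data set is not specified above, so the per-subject variant shown here is an assumption.

```python
import numpy as np

def normalize_features(X, x_rest):
    """Apply the three normalization steps of Eqs. (1)-(3) -- a sketch only.

    X      : (n_segments, n_features) raw feature matrix for one subject
    x_rest : (n_features,) features computed during the relaxation interval
    """
    X1 = X / x_rest                            # Eq. (1): divide by baseline features
    X2 = X1 / X1.mean(axis=0, keepdims=True)   # Eq. (2): divide by per-subject mean response
    X_min = X2.min(axis=0, keepdims=True)
    X_max = X2.max(axis=0, keepdims=True)
    return (X2 - X_min) / (X_max - X_min)      # Eq. (3): min-max scaling to [0, 1]

# toy usage with two features over three segments
X = np.array([[2.0, 10.0], [3.0, 12.0], [4.0, 8.0]])
print(normalize_features(X, x_rest=np.array([2.0, 10.0])))
```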
3.3 Classification

Corresponding with step 3 in the process described by Peter and Herbon [6], we grouped the emotions by applying the K-means clustering algorithm [13], which can find the centers of clusters that are naturally present in data. Then, with this new class labeling (not associated with emotion words), we moved on to step 4 to identify patterns in the physiological signals. We used a Static Bayesian Network approach to train and test a classifier with attribute selection. The best performance reached so far with this method was 60.0% of instances classified correctly.
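A sketch of this clustering-then-classification procedure, using scikit-learn with placeholder data; the Gaussian naive Bayes classifier below merely stands in for the Static Bayesian Network with attribute selection used in the study, and the number of clusters is chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Placeholder feature matrix: one row per 40-second segment. In practice this
# would hold the normalized GSR/BVP feature vectors; random data is used here
# only so the sketch runs end to end.
rng = np.random.RandomState(0)
X = rng.rand(150, 9)

# Step 3 (Peter and Herbon): find clusters naturally present in the data and
# use them as class labels that are not tied to emotion words.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Step 4: learn to recognize those clusters from the physiological features.
clf = GaussianNB()
print("mean cross-validated accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```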
4 Lessons Learned and Future Work

The investigation described in the previous sections of this paper was exploratory in nature and has been successful in providing direction for our future work. For example, it reinforced our belief that attempts at describing emotional states through discrete emotion labels may be self-limiting. This view is also supported by the experimental results of Barrett and Russell, which showed that subjects may experience several emotions at the same time [14]. Accordingly, we will approach future experiments using a methodology similar to that suggested by Peter and Herbon [6]. Thus, to specify emotional states we intend to use a multi-dimensional representation, such as the three-dimensional space proposed by Russell and Mehrabian [2], in combination with personality data about the subject. The latter will be important for understanding the differences between subjects.
The work described here also highlighted the critical nature of the emotion elicitation component of the experimental protocol. In the future, we may still use elicitation techniques similar to those described in this paper. However, we will extend the length of elicitation to 60 seconds per movie clip. In addition, we will experiment with more realistic life experiences, as suggested by Peter and Herbon [6]. This type of elicitation may prove better suited for emotional state recognition. To analyze the data resulting from the experiments we may explore other forms of cluster analysis (e.g., Kohonen learning), which may provide a clearer picture of an n-dimensional emotional state space. For classification we will revisit the methods used in this paper and add further techniques if needed. Some specific recommendations for future work follow:
• More modalities: The combination of only the two sensors that we used for recording the physiological signals may not have contained enough discriminating power to distinguish between five or more emotions. Combining physiological signals with facial expression data and other modalities such as vocal intonation is likely to enhance accuracy.
• More participants: In the present research only 25 test subjects participated in the data collection, and much of this data had to be excluded, which resulted in a rather small data set. In a follow-up experiment, more participants should be invited so that the problem of insufficient data can be eliminated.
• Other elicitation methods: The way we elicited the emotions in the experiment was one of the causes of the unfortunate composition of our data set. The stimuli should be of longer duration, so that dynamic analysis methods can also be applied, and they should have the same duration for all emotions, in order to create a balanced data set containing the same number of examples for each category.

Acknowledgments. This work was partially supported by NSF grants HRD-0833093 and CNS-0520811.
References 1. Russell, J.A.: A circumplex model of affect. Journal of Personality and Social Psychology 39(6), 1161–1178 (1980) 2. Russell, J.A., Mehrabian, A.: Evidence for a three-factor theory of emotions. Journal of Research in Personality 11(3), 273–294 (1977) 3. Scherer, K.R.: What are emotions? and how can they be measured? Social Science Information 44(4), 695–729 (2005) 4. Ekman, P.: An argument for basic emotions. Emotion: Themes in the Philosophy of the Mind (1992) 5. Wierzbicka, A.: Defining emotion concepts. Cognitive Science 16(4), 539–581 (1992) 6. Peter, C., Herbon, A.: Emotion representation and physiology assignments in digital systems. Interacting with Computers 18(2), 139–170 (2006)
7. Lisetti, C.L., Nasoz, F.: Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP Journal on Applied Signal Processing 2004(11), 1672–1687 (2004) 8. Dishman, R.K., Nakamura, Y., Garcia, M.E., Thompson, R.W., Dunn, A.L., Blair, S.N.: Heart rate variability, trait anxiety, and perceived stress among physically fit men and women. International Journal of Psychophysiology 37(2), 121–133 (2000) 9. Hilton, S.M.: The defence-arousal system and its relevance for circulatory and respiratory control. J. Exp. Biol. 100(1), 159–174 (1982) 10. Gross, J.J., Levenson, R.W.: Emotion elicitation using films. Cognition & Emotion 9(1), 87–108 (1995) 11. Nasoz, F., Alvarez, K., Lisetti, C.L., Finkelstein, N.: Emotion recognition from physiological signals using wireless sensors for presence technologies. Cognition, Technology & Work 6(1), 4–14 (2004) 12. Barreto, A., Zhai, J., Adjouadi, M.: Non-intrusive Physiological Monitoring for Automated Stress Detection in Human-Computer Interaction. In: Lew, M., Sebe, N., Huang, T.S., Bakker, E.M. (eds.) HCI 2007. LNCS, vol. 4796, pp. 29–38. Springer, Heidelberg (2007) 13. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification and scene analysis. Wiley, New York (1973) 14. Barrett, F.L., Russell, J.A.: Independence and bipolarity in the structure of affect. Journal of Personality and Social Psychology 74(4), 967–984 (1998)
An Affect-Sensitive Social Interaction Paradigm Utilizing Virtual Reality Environments for Autism Intervention

Karla Conn Welch1, Uttama Lahiri2, Changchun Liu1, Rebecca Weller2, Nilanjan Sarkar2,1, and Zachary Warren3

1 Department of Electrical Engineering and Computer Science
2 Department of Mechanical Engineering
3 Vanderbilt Kennedy Center and Department of Pediatrics
Vanderbilt University, Nashville, TN 37212
[email protected]
Abstract. This paper describes the design and development of both software to create social interaction modules on a virtual reality (VR) platform and individualized affective models for affect recognition of children with autism spectrum disorders (ASD), which includes developing tasks for affect elicitation and using machine-learning mathematical tools for reliable affect recognition. A VR system will be formulated that can present realistic social communication tasks to the children with ASD and can monitor their affective response using physiological signals, such as cardiovascular activities including electrocardiogram, impedance cardiogram, photoplethysmogram, and phonocardiogram; electrodermal activities including tonic and phasic responses from galvanic skin response; electromyogram activities from corrugator supercilii, zygomaticus major, and upper trapezius muscles; and peripheral temperature. This affect-sensitive system will be capable of systematically manipulating aspects of social communication to more fully understand its salient components for children with ASD. Keywords: Human-computer interaction, Physiological responses, Virtual Reality, Autism, Affective model.
1 Introduction

The development of technological tools that can make application of productive intensive treatment more readily accessible and cost effective is an important new direction for research on autism spectrum disorders (ASD) [1, 2]. A growing number of studies have been investigating the application of advanced interactive technologies to address core deficits related to autism, namely computer technology [3-5], robotic systems [6-8], and virtual reality environments [9-11]. There is increasing consensus in the autism community that development of assistive tools that exploit advanced technology will make application of intensive intervention for children with ASD more efficacious.
Virtual reality (VR) represents a medium well-suited for creating interactive intervention paradigms for skill training in the core areas of impairment for children with ASD (i.e., social interaction, social communication, and imagination). VR-based therapeutic tools can partially automate the time-consuming, routine behavioral therapy sessions and may allow intensive intervention to be conducted at home [6,10]. However, no existing VR-based system specifically addresses the issue of how to detect and flexibly respond to affective cues of children with ASD within an intervention paradigm. Affective cues are insights into the emotions and behaviors of children with ASD, and as such the ability to utilize the power of these cues is critical given the importance of human affective information in human-computer interaction [12], the significant impacts of the affective factors of children with ASD on intervention [13], and the core social and communicative vulnerabilities that limit the ability of individuals with ASD to accurately self-identify affective experiences [14].

There are several modalities such as facial expression, vocal intonation, gestures and postures, and physiology that can be utilized to evaluate the affective states of individuals interacting with a computer. In this work, the affective models are based on physiological data for several reasons. Children with ASD often have communicative impairments (both nonverbal and verbal), particularly regarding expression of affective states. These vulnerabilities place limits on traditional conversational and observational methodologies; however, physiological signals are continuously available and are arguably not directly impacted by these difficulties [15]. As such, physiological modeling may represent a methodology for gathering rich data despite potential communicative impairments. Furthermore, there is evidence for typical individuals that the physiological activity associated with various VR experiences is differentiated [16], and that the transition from one affective state to another is accompanied by dynamic shifts in indicators of autonomic nervous system activity [17].

The ultimate objectives of this work are to develop technologies capable of flexibly responding to subtle affective changes in individuals with ASD during social paradigms and to take current VR-based ASD intervention technology to a higher level such that it can present itself as a realistic and powerful intervention platform. This paper describes the current VR task design and affective modeling techniques that could eventually be used during closed-loop interactions in which the VR system autonomously responds to the affective cues of a child with ASD. The framework of the affect-sensitive VR system during closed-loop interaction with a child with ASD is presented in Fig. 1. The physiological signals from the children with ASD are recorded while they are interacting with the VR system. These signals are processed in real time to extract features, which are fed as input into the developed affective models. The models map the features to an intensity level of an affective state and return this information as an output. The affective information is used by a controller to decide the next course of action for the VR system. The child who engages with the system is then influenced by the system's behavior, and the closed-loop interaction cycle begins anew.
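The closed-loop cycle of Fig. 1 can be summarized in code. The sketch below is purely illustrative: every class, method, feature, and threshold is a placeholder we invented, not part of the actual system.

```python
import random

# Minimal stand-in components for the closed loop of Fig. 1. All names and
# decision rules here are hypothetical, chosen only to make the loop runnable.
class Sensors:
    def acquire_window(self):
        return [random.random() for _ in range(64)]          # fake signal window

def extract_features(window):
    return [sum(window) / len(window), max(window)]          # toy features

class AffectiveModel:
    def predict(self, features):
        return "high_anxiety" if features[0] > 0.5 else "engaged"

class Controller:
    def decide(self, affect):
        # e.g., soften the social task when anxiety is detected
        return "reduce_gaze" if affect == "high_anxiety" else "continue"

def closed_loop(n_cycles=3):
    sensors, model, controller = Sensors(), AffectiveModel(), Controller()
    for _ in range(n_cycles):
        features = extract_features(sensors.acquire_window())  # real-time feature extraction
        affect = model.predict(features)                       # map features to an affective state
        action = controller.decide(affect)                     # next course of action for the VR system
        print(affect, "->", action)                            # the VR system would apply this action

closed_loop()
```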
In particular, the affect-sensitive VR system will monitor the affective states of anxiety and engagement.
Fig. 1. Framework of the affect-sensitive VR system during closed-loop interaction
These states will be measured by physiological signals that vary with the specific communication factors (e.g., proximity and eye contact) presented in the VR environment. The discriminating capability of the physiological features will be measured to identify those that have a significant influence during social communication in the VR environment for children with ASD. Assessing the prediction accuracy of the affective models could be useful in developing intervention applications that help children with ASD explore social interaction dynamics in an adaptive and customized manner.
2 Physiological Features and Affective Modeling

More than one physiological signal will be examined, an approach judged to be favorable [18]. The collected physiological signals are cardiovascular activities including electrocardiogram (ECG), impedance cardiogram (ICG), photoplethysmogram (PPG), and phonocardiogram (PCG)/heart sound; electrodermal activities (EDA) including tonic and phasic responses from galvanic skin response (GSR); electromyogram (EMG) activities from corrugator supercilii, zygomaticus major, and upper trapezius muscles; and peripheral temperature. These signals were selected because they are likely to demonstrate variability as a function of the targeted affective states, can be measured non-invasively, and are relatively resistant to movement artifacts [19]. Signal processing techniques will be used to derive the relevant features from the physiological signals. For example, the inter beat interval (IBI) is the time interval between two "R" waves in the ECG waveform. Time-domain features of IBI, the mean and standard deviation (SD), are computed from the detected R peaks. Power spectral analysis is performed on the IBI data to localize the sympathetic and parasympathetic nervous system activities associated with different frequency bands. The high-frequency power (0.15-0.4 Hz) is associated with parasympathetic nervous system activity. The low-frequency power (0.04-0.15 Hz) provides an index of
sympathetic effects on the heart. Very-low-frequency power is associated with the frequency band below 0.04 Hz. The ratios of different frequency components are also computed as input features for affective modeling.

The PPG signal measures changes in the volume of blood in the finger tip associated with the blood volume pulse (BVP) cycle. Pulse transit time (PTT) is estimated by computing the time between systole at the heart (as indicated by the R-wave of the ECG) and the peak of the BVP wave reaching the peripheral site. Besides PTT, the mean and SD values of BVP peak amplitudes are also extracted as features. Pre-ejection period (PEP), derived from ICG and ECG, measures the latency between the onset of electromechanical systole and the onset of left-ventricular ejection. The time intervals between the successive peaks of the ICG time-derivative and "R" peaks of the ECG are calculated to obtain the value of PEP. The features obtained are the mean of PEP and the average time interval between two peaks of the ICG time-derivative. The features extracted from the heart sound signal consist of the mean and SD of the third (138-275 Hz), fourth (69-138 Hz), and fifth (34-69 Hz) level coefficients of the Daubechies wavelet transform.

EDA consists of two main components: tonic response and phasic response. The phasic skin conductance detection algorithm uses the following heuristics for considering a particular peak as a valid skin conductance response: (i) the slope of the rise to the peak should be greater than 0.05 μSiemens/min; (ii) the amplitude should be greater than 0.05 μS; and (iii) the rise time should be greater than 0.25 s. Once the phasic responses are identified, the rate of the responses and the mean and maximum amplitude are determined as features. All the signal points that are not included in the phasic response constitute the tonic part of the skin conductance signal. The slope of tonic activity is obtained using linear regression. Another feature derived from the tonic response is the mean tonic amplitude.

The EMG signal from the corrugator supercilii muscle (eyebrow) detects the tension in that region, and the EMG signal from the zygomaticus major muscle (cheek) captures the muscle movements while smiling. Upper trapezius muscle EMG activity measures the tension in the shoulders, one of the most common sites in the body for developing stress. Time-domain features, the mean, SD, and slope, are calculated from the EMG signals after performing a bandpass filtering operation (10-500 Hz). The analysis of the EMG activities in the frequency domain involves applying a fast Fourier transform to a given EMG signal, integrating the EMG spectrum, and normalizing it to [0,1] to calculate the two features of interest: the median frequency and mean frequency for each EMG signal. The blink-related features are determined from the corrugator supercilii EMG signals after being preprocessed by a low-pass filter (10 Hz). The peripheral temperature signal is down-sampled by 10 and filtered to remove high-frequency noise, from which the mean, SD, and slope are calculated as features.

In order to have reliable reference points to link the physiological feature sets to the affective states, subjective reports on the affective states from a therapist, each participant's parent, and the child himself/herself are collected.
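As one concrete example, the three heuristics for accepting a phasic skin conductance response could be applied as in the following sketch; peak and onset detection are simplified here, and the sampling-rate handling is our assumption rather than a detail from the paper.

```python
import numpy as np
from scipy.signal import find_peaks

def valid_scrs(conductance, fs):
    """Flag candidate peaks that satisfy the three phasic-response heuristics
    quoted above (slope, amplitude, rise time). A sketch, not the actual code."""
    peaks, _ = find_peaks(conductance)
    valid = []
    for p in peaks:
        # walk back to the local minimum preceding the peak (response onset)
        onset = p
        while onset > 0 and conductance[onset - 1] <= conductance[onset]:
            onset -= 1
        amplitude = conductance[p] - conductance[onset]                     # microsiemens
        rise_time = (p - onset) / fs                                        # seconds
        slope = amplitude / (rise_time / 60.0) if rise_time > 0 else 0.0    # microsiemens per minute
        if slope > 0.05 and amplitude > 0.05 and rise_time > 0.25:
            valid.append((onset, p, amplitude))
    return valid
```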
Each participant has a dataset comprising both the objective physiological features and the corresponding subjective reports from the three reporters on the intensity of the target affective states. Using support vector machines (SVM), we can build affective models that map the physiological features to the intensity (i.e., high/low) of a particular affective state as reported on the questionnaires.
Fig. 2. Overview of affective modeling when the therapist’s subjective reports are used
As illustrated in Fig. 2, a therapist-like affective model (i.e., a model that captures the therapist's ability to assess affective states) can be developed when the therapist's reports are used. This process of differentiating high/low levels of the target affective states from physiological signals attempts to emulate present autism intervention practices and to experimentally demonstrate the feasibility of affective modeling for children with ASD via psychophysiological analysis. In this work we chose anxiety and engagement as the target affective states. Anxiety was selected for two primary reasons. First, anxiety plays an important role in various human-machine interaction tasks and can be related to task performance [20]. Second, anxiety is not simply a frequently co-occurring disorder; in some ways it is also a hallmark of autism [21]. Engagement has been regarded as one of the key factors for children with ASD to make substantial gains in academic, communication, and social domains [22].
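A minimal sketch of such an affective model: a support vector machine is trained to separate epochs rated as high versus low anxiety, with placeholder data, standard scaling, and cross-validation added as reasonable defaults rather than details taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder data: one row of physiological features per epoch and the
# therapist's high (1) / low (0) anxiety report for that epoch. Random values
# are used only so the sketch runs; real features would come from the sensors.
rng = np.random.RandomState(0)
X = rng.rand(24, 30)                 # 24 epochs, 30 features (illustrative sizes)
y = rng.randint(0, 2, size=24)

# A therapist-like affective model: scale the features and learn a binary SVM
# that predicts high versus low reported anxiety for unseen epochs.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=4)
print("cross-validated accuracy: %.2f" % scores.mean())
```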
3 Experimental Design

3.1 VR Task Design

We created realistic VR scenarios for social interaction with virtual human characters (i.e., avatars) using the Vizard Virtual Reality Toolkit (www.worldviz.com). To prevent possible "cybersickness" [23], and since in ASD intervention VR is often effectively experienced on a desktop system using standard computer input devices [2], our participants view the VR environment on a computer monitor from the first-person perspective. Within the controllable VR environment, components of the
interaction are systematically manipulated to allow the participants to explore different social compositions. The social parameters of interest, eye gaze and social proximity, are examined in a 4x2 experimental design. These parameters were chosen because they play significant roles in social communication and interaction [24], and manipulation of these factors may elicit variations in physiological responses [25]. The eye gaze parameter dictates the percentage of time an avatar looks at the participant (i.e., staring straight out of the computer monitor) or away from the participant. Four types of eye gaze will be examined. These are tagged as "straight," "averted," "normal," and "flip of normal," which correspond to the avatar staring straight ahead 100%, 0%, 30%, and 70% of the time during the interaction, respectively. The social proximity parameter is characterized by the distance between the avatar and the user. Two types of social proximity, termed "invasive" and "decorum," will be examined. For invasive proximity, the avatar stands approximately 1.5 ft from the main view of the scene. This invasive distance is characterized by eliciting uncomfortable feelings and attempts to increase the distance to achieve a social equilibrium consistent with comfortable social interaction [26]. For decorum proximity, the avatar stands approximately 4.5 ft from the main view of the scene, and research indicates this distance results in a more comfortable conversation experience than the invasive distance [26]. Using the Vizard software we will project avatars who display different eye gaze patterns at different distances (two examples are shown in Fig. 3). Each social interaction situation is presented three times, which creates 24 trials/epochs in the experiment. Each epoch includes one avatar for one-on-one interaction with the participant. In each epoch, participants are instructed to watch and listen as the avatar tells a 2-min first-person story. At the end of the story, the avatar asks the participant a question about the story. The questions were designed to facilitate interaction and to serve as a possible objective measure of engagement. Thus, the task can be likened to having different people introduce themselves to the user, which is comparable to research on social anxiety and social conventions [26], [27]. Other social parameters, such as facial expression and vocal tone, have been kept as neutral as possible.
Fig. 3. At left an avatar displays straight gaze at the invasive distance, while on the right an avatar stands at the decorum distance and looks to her right in an averted gaze
However, we also attempt to make the task interesting enough so that participants do not become excessively detached due to habituation or dull content. Our design is currently being evaluated by lab members and a pool of undergraduate students to determine to what extent these objectives are being met. The Virtual Human Interaction Lab at Stanford University has provided distinct avatar heads created from front and side 2D photographs of college-age students using 3DMeNow software. The stories the avatars share were adapted from DIBELS (Dynamic Indicators of Basic Early Literacy Skills) reading assessments (dibels.uoregon.edu/measures/). The voices for the avatars were gathered from teenagers and college-age students from the regional area. Their ages (range = 13-22 years, mean = 18.5 years, SD = 2.3 years) are similar to the ages of the people used for the avatar heads and to our participant pool.

3.2 Experimental Procedure

The experiment will include 24 social interaction epochs, broken up over two 1-h sessions. Each session will take place on a different day to avoid bias in the data due to habituation. Participants will relax in a seated position and read age-appropriate leisure material during a 3-min baseline recording at the beginning of each session to offset day-variability. A second audio-only baseline including components of the subsequent epochs (e.g., story and corresponding question), except for the appearance of an avatar, will also be recorded for comparison. Subsequently, emotional stimuli induced by the VR tasks will be applied in epochs of 2 min in length. While interacting with the VR environment, each participant's physiological signals will be recorded using the Biopac MP150 data acquisition system (www.biopac.com). The physiological sensors are small, lightweight, non-invasive, and FDA approved. ECG will be measured from the chest using a two-electrode configuration. ICG will be measured by four pairs of surface electrodes that are longitudinally configured on both sides of the body along the neck and torso. A microphone specially designed to detect heart sound waves will be placed on the chest to measure PCG. PPG, peripheral temperature, and EDA will be measured from the middle finger, the thumb, the index finger, and the ring finger of the non-dominant hand, respectively. EMG will be measured by placing surface electrodes on two facial muscles (corrugator supercilii and zygomaticus major) and an upper back muscle (upper trapezius). As shown in Fig. 4, the equipment setup for collecting physiological data for psychophysiological analysis includes a computer dedicated to the social interaction tasks where the participants interact with the VR environment (Task Computer C1), biological feedback equipment (labeled Biopac System), and another PC that is dedicated to acquiring signals from the biological feedback equipment (Biopac Computer C2). The Vizard software runs on computer C1, which is connected to the Biopac System via a parallel port to transmit task-related event markers. The physiological signals along with the event markers are acquired by the Biopac System and sent over an Ethernet link to the Biopac computer C2. We video record the sessions to cross-reference observations made during the experiment. The signal from the video camera is routed to a television.
Fig. 4. Experimental setup for collecting physiological data and subjective reports
The signal from the participant's computer screen where the task is presented is routed to a separate computer monitor (2nd Monitor, M2). The therapist and the participant's parent, seated at the back of the experiment room, will be able to follow the full experiment by watching the participant on the TV through the video camera's view and by observing how the task progresses on the separate monitor. Self-reports from each participant on their perceived affective states will be collected after each epoch via dialog windows on C1 and automatically stored in corresponding data-logger files. However, children with ASD may have deficits in identifying and describing their own emotions on a self-report [14]. Therefore, a therapist and a parent will observe the experiment and provide subjective reports about how they think the participant is feeling. Given that a therapist's judgment based on his/her expertise is the state of the art in most autism intervention approaches, and given the reliability of the subjective reports in our completed studies [28], the reports from the therapist may provide the most reliable reference points to link the objective physiological data to each child's subjective affective states. Since the remarks of parents, based on their everyday experience, are also sought after in the autism community, reports from each participant's parent will also be collected to compute any correlation with the therapist's and the child's reports. The three reports regarding the intensity level (i.e., high/low) of the target affective states will be collected after each of the 24 social interaction epochs.
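For illustration, the 24 epochs of the 4x2 design (three repetitions of each gaze/proximity combination) could be generated and split across the two sessions as below; the shuffling policy and the even 12/12 split are our assumptions rather than details stated above.

```python
import itertools
import random

GAZE = {"straight": 1.00, "averted": 0.00, "normal": 0.30, "flip_of_normal": 0.70}
PROXIMITY = {"invasive": 1.5, "decorum": 4.5}        # avatar distance in feet

# 4 gaze patterns x 2 proximities x 3 repetitions = 24 epochs
epochs = [
    {"gaze": g, "gaze_fraction": GAZE[g],
     "proximity": p, "distance_ft": PROXIMITY[p], "repetition": r}
    for g, p, r in itertools.product(GAZE, PROXIMITY, range(3))
]

random.seed(0)                  # randomized presentation order (an assumption)
random.shuffle(epochs)
session_1, session_2 = epochs[:12], epochs[12:]      # two sessions of 12 epochs each (an assumption)
print(len(epochs), "epochs;", len(session_1), "per session")
```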
4 Measurement and Analysis

Due to the phenomenon of person stereotypy [19], an individual-specific approach will be applied when creating the affective models. However, patterns of physiological response that may be related to presumed core impairments of ASD may be
detectable. Post-experiment analysis will identify physiological reactions to the social communication parameters and any correlations to the subjective reports. These techniques have provided highly reliable results for typical adults [20] and children with ASD during performance-based computer and robot tasks [28]. We will further explore the discriminating capability of those features during social communication in a VR environment. For example, GSR is generally accepted as an indicator of anxiety. Therefore, if GSR increases significantly when an avatar uses direct eye contact, the child could likely have a communication deficit related to this social situation, and the subjective reports may reveal high anxiety.
5 Conclusions

The proposed design of integrating biofeedback sensor technology and VR social interaction tasks is novel, yet relevant to the current priorities of computer-assisted ASD intervention. This research is expected to result in the development of a physiology-based VR system for assessing physiological response during social interaction. Planned user studies will compare typically developing children and children with ASD to define social vulnerabilities within the interaction scenarios. The experimental design and methods could potentially produce a valuable tool for clinical application of this technology.

Acknowledgements. This work is supported by an Autism Speaks Pilot Study award.
References 1. Rogers, S.J.: Interventions That Facilitate Socialization in Children with Autism. Journal of Autism and Developmental Disorders 30(5), 399–409 (2000) 2. Parsons, S., Mitchell, P.: The potential of virtual reality in social skills training for people with autistic spectrum disorders. J. Intellect. Disabil. Res. 46(Pt 5), 430–443 (2002) 3. Bernard-Opitz, V., Sriram, N., Nakhoda-Sapuan, S.: Enhancing social problem solving in children with autism and normal children through computer-assisted instruction. J. Autism. Dev. Disord. 31(4), 377–384 (2001) 4. Blocher, K., Picard, R.W.: Affective social quest: emotion recognition therapy for autistic children. In: Dautenhahn, K., Bond, A.H., Canamero, L., Edmonds, B. (eds.) Socially Intelligent Agents: Creating Relationships with Computers and Robots. Kluwer Academic Publishers, Dordrecht (2002) 5. Swettenham, J.: Can children with autism be taught to understand false belief using computers? Journal of child psychology and psychiatry, and allied disciplines 37(2) (1996) 6. Dautenhahn, K., Werry, I.: Towards interactive robots in autism therapy: background, motivation and challenges. Pragmatics & Cognition 12(1), 1–35 (2004) 7. Kozima, H., Nakagawa, C., Yasuda, Y.: Interactive robots for communication-care: A case-study in autism therapy. In: IEEE International Workshop on Robot and Human Interactive Communication, Nashville, TN, pp. 341–346 (2005) 8. Michaud, F., Theberge-Turmel, C.: Mobile robotic toys and autism. In: Dautenhahn, K., Bond, A.H., Canamero, L., Edmonds, B. (eds.) Socially Intelligent Agents: Creating Relationships with Computers and Robots, pp. 125–132. Kluwer Academic Publishers, Dordrecht (2002)
9. Parsons, S., Mitchell, P., Leonard, A.: The use and understanding of virtual environments by adolescents with autistic spectrum disorders. J. Autism. Dev. Disord. 34(4), 449–466 (2004) 10. Strickland, D., Marcus, L.M., Mesibov, G.B., Hogan, K.: Brief report: two case studies using virtual reality as a learning tool for autistic children. J. Autism. Dev. Disord. 26(6), 651–659 (1996) 11. Tartaro, A., Cassell, J.: Using Virtual Peer Technology as an Intervention for Children with Autism. In: Lazar, J. (ed.) Towards Universal Usability: Designing Computer Interfaces for Diverse User Populations. John Wiley and Sons, Chichester (2007) 12. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997) 13. Seip, J.A.: Teaching the autistic and developmentally delayed: A guide for staff training and development, Delta, British Columbia (1996) 14. Hill, E., Berthoz, S., Frith, U.: Brief report: cognitive processing of own emotions in individuals with autistic spectrum disorder and in their relatives. J. Autism. Dev. Disord. 34(2) (2004) 15. Ben Shalom, D., Mostofsky, S.H., Hazlett, R.L., Goldberg, M.C., Landa, R.J., Faran, Y., McLeod, D.R., Hoehn-Saric, R.: Normal physiological emotions but differences in expression of conscious feelings in children with high-functioning autism. J. Autism. Dev. Disord. 36(3), 395–400 (2006) 16. Meehan, M., Razzaque, S., Insko, B., Whitton, M., Brooks Jr., F.P.: Review of four studies on the use of physiological reaction as a measure of presence in stressful virtual environments. Applied psychophysiology and biofeedback 30(3), 239–258 (2005) 17. Bradley, M.: Emotion and motivation. In: Cacioppo, J.T., Tassinary, L.G., Berntson, G. (eds.) Handbook of Psychophysiology. Cambridge University Press, New York (2000) 18. Bethel, C., Salomon, K., Murphy, R., Burke, J.: Survey of psychophysiology measurements applied to human-robot interaction. In: IEEE International Symposium on Robot and Human Interactive Communication, Jeju, Korea (2007) 19. Lacey, J.I., Lacey, B.C.: Verification and extension of the principle of autonomic response-stereotypy. The American Journal of Psychology 71(1), 50–73 (1958) 20. Rani, P., Sims, J., Brackin, R., Sarkar, N.: Online stress detection using psychophysiological signal for implicit human-robot cooperation. Robotica 20(6), 673– 686 (2002) 21. Gillott, A., Furniss, F., Walter, A.: Anxiety in high-functioning children with autism. Autism 5(3), 277–286 (2001) 22. Ruble, L.A., Robson, D.M.: Individual and environmental determinants of engagement in autism. Journal of Autism and Developmental Disorders 37(8) (2006) 23. Cobb, S.V.G., Nichols, S., Ramsey, A., Wilson, J.R.: Virtual reality-induced symptoms and effects. Presence 8, 169–186 (1999) 24. Bancroft, W.J.: Research in Nonverbal Communication and Its Relationship to Pedagogy and Suggestopedia. ERIC (1995) 25. Groden, J., Diller, A., Bausman, M., Velicer, W., Norman, G., Cautela, J.: The development of a stress survey schedule for persons with autism and other developmental disabilities. J. Autism and Dev. Disord. 31, 207–217 (2001) 26. Argyle, M., Deal, J.: Eye-contact, distance and Affiliation. Sociometry 28(3) (1965) 27. Schneiderman, M.H., Ewens, W.L.: The cognitive effects of spatial invasion. The Pacific Sociological Review 14(4), 469–486 (1971) 28. Conn, K., Liu, C., Sarkar, N., Stone, W., Warren, Z.: Towards affect-sensitive assistive intervention technologies for children with autism. In: Jimmy, O. (ed.) 
Affective Computing: Focus on Emotion Expression, Synthesis and Recognition, pp. 365–390. ITech (2008)
Recognizing and Responding to Student Affect

Beverly Woolf1, Toby Dragon1, Ivon Arroyo1, David Cooper1, Winslow Burleson2, and Kasia Muldner2

1 Department of Computer Science, University of Massachusetts Amherst, 140 Governors Drive, Amherst MA 01003, USA
{dragon,bev,ivon,dcooper}@cs.umass.edu
2 School of Computer Science and Informatics/Arts, Media and Engineering, Arizona State University, Tempe AZ 85287, USA
{winslow.burleson,Katarzyna.Muldner}@asu.edu
Abstract. This paper describes the use of wireless sensors to recognize student emotion and the use of pedagogical agents to respond to students with these emotions. Minimally invasive sensor technology has reached such a maturity level that students engaged in classroom work can use sensors while using a computer-based tutor. The sensors, located on the chair, mouse, monitor, and wrist of each of 25 students, provide data about posture, movement, grip tension, facially expressed mental states and arousal. This data has demonstrated that intelligent tutoring systems can provide adaptive feedback based on an individual student's affective state. We also describe the evaluation of emotional embodied animated pedagogical agents and their impact on student motivation and achievement. Empirical studies show that students using the agents increased their math value, self-concept and mastery orientation.

Keywords: intelligent tutoring systems, wireless sensors, student emotion, pedagogical agents.
1 Introduction

Sophisticated methods have been developed for building the interfaces of intelligent tutoring systems, but thus far they have focused almost exclusively on the cognitive side of teaching. If computers are to interact naturally with humans, they must recognize affect and express social competencies. However, the study of the role of affect in instruction is at best in its infancy. The possibility of tutoring systems that trace students' emotions is an attractive concept; our primary research goal is to systematically examine the relationship(s) between student affective state and desired outcomes, i.e., to identify whether a dependency exists between students' reported emotions and their learning, motivation, and attitudes toward mathematics. Various classroom studies have linked interpersonal relationships between teachers and students to increased student motivation over the long term [1,2]. Thus there is great interest in embedding affective support into tutoring applications. Since affect recognition is a key aspect of tailored affective support, research has focused on automated detection of affective states in a variety of learning contexts [e.g., 3-7]. Hardware sensors have the
potential to provide information on students' physiological responses that have been linked to various affective states [e.g., 4]. Research has explored various sensors' potential for affect recognition; e.g., Burleson [14] developed a learning companion that depended on a sensor framework (incorporating a mouse, posture chair, video camera, and skin conductance bracelet) to recognize and respond to student affect. Currently there is no gold standard either for labeling a person's emotional state or for responding to it. Our approach is to triangulate among three different inputs: sensor data, student self-reports, and human observation of students. While we accept that there will never be a definitive categorization of a human's emotional state, we plan to use this triangulation to identify clear examples of certain emotions (frustration, flow, etc.) that can be labeled using sensor information.
2 Recognizing Student Emotion: Physiologic Sensors

To date, much of the existing work on detecting emotion has focused on inferring students' affective states with software (e.g., machine learning) [e.g., 3, 8]. However, hardware sensors have the potential to provide information on students' physiological responses that have been linked to various affective states [e.g., 6, 7]. Our sensor platform includes four physiological sensors (Fig. 1) and is currently being tested in high school computer labs, demonstrating that the platform is unobtrusive enough to be used by students in a typical setting and resource-conscious enough to run on the average computers available in school labs. These sensors collect raw data about the physical activity and state of a student, and the challenge remains to map this data into models of emotional states and to use this information productively. The sensors are similar to Burleson's more costly sensors used in previous studies [9]. We describe how our sensors compare to the earlier sensors as well as some of their past uses. Major improvements focus on the overall production cost and the noninvasive nature of the sensors.
Fig. 1. Sensors used in the classroom (clockwise): mental state camera, skin conductance bracelet, pressure sensitive mouse, pressure sensitive chair
Skin Conductance Bracelet. The Affective Computing Group at the MIT Media Lab has been advancing the development of wireless wearable skin conductance sensors
for over a decade. Various implementations include the galvactivator, a glove that could illuminate an LED when its user had heightened levels of skin conductance [10], and HandWave, which used a custom-built Printed Circuit Board (PCB) and a 9V battery to provide Bluetooth wireless transmission of skin conductance data at rates up to 5 Hz [11]. The current system used in our research employs the next generation of HandWave electronics developed at MIT, providing greater reliability, lower power requirements through wireless RFID transmission, and a smaller form. This smaller form was redesigned to minimize the visual impact and increase the wearability of previous versions. ASU integrated and tested these electronic components into a wearable package suitable for students in classrooms. Our version reports at 1 Hz.

Pressure Sensitive Mouse. The pressure mouse was developed by the Affective Computing Group at MIT. It uses six pressure sensors embedded in the surface of the mouse to detect the tension in users' grip and has been used to infer elements of users' frustration [12, 13]. Our endeavors replicated MIT's pressure mouse through a production run of 30 units. The new design minimized the changes made to the physical appearance of the original mouse in order to maintain a visually noninvasive sensor.

Pressure Sensitive Chair. The posture state chair that was used in Burleson's affective Learning Companion [14, 15] utilized a donated Tek-Scan pressure system; the commercial cost of this system was and continues to be upwards of $10,000. A software system was developed and trained to detect nine states of posture and activity level [16] and was subsequently used in conjunction with the Blue-Eyes camera hardware and additional software to detect levels of interest, boredom, and break-taking while students engaged in an educational game. A greatly simplified chair sensor system was developed at ASU using a series of eight force-sensitive resistors as pressure sensors dispersed throughout the seat and back of a readily available seat cover cushion. This posture chair sensor was developed at ASU at an approximate cost of $500 per chair for a production volume of 30 chairs.

Mental State Camera. The facial expression recognition system used in Burleson's Affective Learning Companion [14, 15] utilized IBM Research's Blue-Eyes camera hardware, which used a digital camera board with an adjustable-focus lens augmented with a custom PCB that provided differential LED illumination (e.g., conical illumination in close circumferential positions around the lens, used to obtain retroreflection from users' retinas, and axial illumination that reflects off the front surface of users' eyeballs). Using the differential images to detect pupil location, pattern recognition models were trained to detect head nod and shake behaviors, blinking, pupil dilation, mouth fidgets and smiles. One limitation of this system was the need to integrate the custom PCB with the camera, adding to the expense; a second limitation was its development in Linux, which made it less compatible with most operating systems found in schools today. In our current research we use a standard web camera to obtain 30 fps at 320x240 pixels. This is coupled with el Kaliouby's MindReader applications [17]. We developed a Java Native Interface (JNI) wrapper around the MindReader library. The interface starts a version of the MindReader software, and can be queried at any time
to get the most recent mental state values that have been computed by the library. In the version used in the experiments, only the six mental state features were available, but in future versions we can train it on new mental states. In our framework, each feature source from each student is a separate stream of data. Hence we have streams of data that each report asynchronously and at different rates. In order to merge all of the data sources, each source needed to provide a student ID and the time of the report. We have a database table with a row for every time stamp and wrist ID pair, and a column for each reported sensor value and tutor data value. Each cell in a row represents the latest report from the corresponding data source.
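A minimal sketch of such a merged table, using SQLite with invented column names: each asynchronous source updates only its own column for the (time stamp, wrist ID) row, so every cell holds the latest report from that source. This is an illustration of the idea, not the schema of the actual system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sensor_log (
        ts             INTEGER,     -- report time stamp (seconds)
        wrist_id       TEXT,        -- identifies the student
        skin_cond      REAL,        -- latest skin-conductance report
        mouse_pressure REAL,
        chair_posture  REAL,
        mental_state   REAL,
        tutor_event    TEXT,
        PRIMARY KEY (ts, wrist_id)
    )""")

def report(ts, wrist_id, column, value):
    """Update one source's column for the row, inserting the row if needed.
    (In real code the column name would be validated against a whitelist.)"""
    cur = conn.execute(
        "UPDATE sensor_log SET %s = ? WHERE ts = ? AND wrist_id = ?" % column,
        (value, ts, wrist_id))
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO sensor_log (ts, wrist_id, %s) VALUES (?, ?, ?)" % column,
            (ts, wrist_id, value))

report(1, "w42", "skin_cond", 0.61)
report(1, "w42", "mouse_pressure", 0.20)
print(conn.execute("SELECT * FROM sensor_log").fetchall())
```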
3 Testbed Application: Wayang

The sensor data collection and analysis system described above is stand-alone software and can provide input for any system. Currently we are using it in tandem with Wayang Outpost, a multimedia tutoring system for high school geometry and algebra (see Fig. 2) [18, 19]. Students learn by working through problems both in a real-world context and in a test-like environment in preparation for standardized testing such as the SAT and other state exams. Prior research has shown that examining a student's affect is critical to learning mathematics, since affect (particularly frustration and boredom) has been found to be an important factor in student learning [2]. Wayang is adaptive in that it iterates through different topics (e.g., the Pythagorean theorem) and multiple hints based on student need. Within each topic section, Wayang adjusts the difficulty of the problems provided depending on past student performance. Students are presented with a problem and asked to choose the solution from a list of multiple-choice options. As students solve problems, they may ask the tutor for one or several multimedia hints via the help button, which combine text messages, audio and animations. Wayang has been used with thousands of students in the past and has demonstrated improved learning gains on state standard exams [20]. Wayang collects student interaction features in order to predict the level of effort of each student.
4 Experiments

We conducted three studies during Fall 2008 involving the use of sensors and Wayang Outpost. One study involved 35 students in a public high school (HS) in Massachusetts; another involved 29 students at the University of Massachusetts (UMASS); and the final study involved 29 undergraduates from Arizona State University (AZ). In the HS and UMASS studies, students used the software as part of their regular math class for 4-5 days and covered topics in the traditional curriculum. In the AZ lab study, students came into a lab for a single session. The three experiments yielded the results of 588 Emotional Queries from 80 students who were asked about their emotion, e.g., "How confident do you feel?" Responses were given on a 1-5 scale, and the queries were separated into four emotion variables: 149 were about confidence/anxiety, 163 about excitement/depression, 135 about
interest/boredom, and 141 about frustrated/not frustrated. In 16 cases the student gave no answer to the Emotional Query. Models were created to automatically infer student emotions from the physiological data from the sensors. Students produced self-reports of emotions, and all queries include valid data from at least one sensor. In order to select a subset of the available features, a stepwise regression was done with each of the emotions as the dependent variable, and tutor and sensor features as the independent variables. Since some students had missing sensor data, separate models were run pairing the tutor with one sensor at a time, and then finally with all of the sensors. Results from the regression show that the best models for confident, frustrated, and excited came from the subset of examples where all of the sensor data was available, and the best model for interested came from the subset of examples with mouse data available. Summaries of student physiological activity, in particular data streams from the facial detection software, helped to predict more than 60% of the variance of students' emotional states, which is much better than predicting emotions from other contextual variables from the tutor when these sensors are absent. In order for the user model system to provide feedback to the tutor, the available sensor and tutor features can be fed into a classifier that reports when a user is likely to report a high value of a particular emotion (Table 1). This likelihood could reduce and possibly eliminate the need for querying the user about their affective state. To test the efficacy of this idea, we made a classifier based on each linear model in the table. For each model we performed leave-one-student-out cross validation. We recorded the number of True Positives, False Negatives, True Negatives, and False Positives at each test. Table 1 shows results from the best classifier of each emotion in terms of accuracy. The best classification results are obtained by training only on examples that are not in the middle of the 1-5 scale. This is likely because the middle values (3) indicate student indifference.

Table 1. Results of the best classifier of each emotional response. The accuracy of no classifier is a prediction that the emotional state is always not high. Values in parentheses include the middle values in the testing set as negative examples.
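The leave-one-student-out validation can be sketched with scikit-learn as below. The data are placeholders and logistic regression merely stands in for the classifiers built from the stepwise linear models; the point is that each fold holds out all queries from one student.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Placeholder arrays: sensor+tutor features per Emotional Query, a binary
# label (high vs. not-high self-report), and the ID of the student who
# produced each query. Random data is used only so the sketch runs.
rng = np.random.RandomState(0)
X = rng.rand(200, 12)
y = rng.randint(0, 2, size=200)
students = rng.randint(0, 20, size=200)      # 20 students contribute queries

# Leave-one-student-out cross validation: every fold holds out all of the
# queries from a single student, so accuracy reflects generalization to
# students the classifier has never seen.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, groups=students, cv=LeaveOneGroupOut())
print("mean held-out accuracy: %.2f" % scores.mean())
```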
5 Responding to Student Affect

Providing empathy or support strongly correlates with learning [21, 22], and the presence of someone who cares, or at least appears to care, can be motivating. Various studies have linked interpersonal relationships between teachers and students to motivational outcomes [1, 23]. Can this noted human relationship be reproduced,
in part, by apparent empathy from a computer character? Apparently the answer is yes [24]. People seem to relate to computers in the same way they relate to humans, and some of these relationships are identical to real social relationships [25]. For example, students continue to engage in frustrating tasks on a computer significantly longer after an empathetic computational response [26], show immediately lowered stress levels (via skin conductance) after empathy and after an apology [27], and relational skills improve long-term ratings of caring, trust, respect, and the desire to keep working [28]. Computer agents impact student learning, affect and motivation based on the gender, ethnicity and realism of the agent [29].
Fig. 2. Pedagogical Agents act out their emotion and talk with the student expressing full sentences of cognitive, meta-cognitive and emotional feedback
This is not to say that the inferences, movements and interventions of computer agents can exactly replace those of people, nor that peer theories map exactly to the human peer-tutoring case; however, computer control does allow for careful testing of hypotheses about how to use virtual peer support for learning [23]. Wayang includes gendered learning companions (see Fig. 2) that provide support and encouragement, emphasize the importance of perseverance, express emotions and offer strategies (e.g., "Use the help function"). These learning companions (LCs) are empathetic in that they visually reflect the last emotion reported by the student (queried within the system every five minutes); they act out their emotion and talk with students, expressing full sentences of meta-cognitive and emotional feedback. They are non-intrusive: they work on their own computer to solve the problem at hand, and react only after the student has answered the question. Agents respond with some of Carol Dweck's [30] recommendations about disregarding success and valuing effort. This adds a new dimension to the traditional success/no-success feedback generally given to students.
Fig. 3. Learning companions make mathematics more interesting. Students with LCs reported more interest in math sessions. Lines represent best-fit curves, which in this case are 4th degree polynomials.
We measured the impact of LCs on student motivation and achievement (Woolf et al., submitted) and integrated controlled exploration of their communicative factors (facial expression and mirroring postures) as the student/agent relationship developed. Empirical studies show that students who used LCs increased their math value (e.g., questions such as "Mathematics is an important topic"), self-concept (e.g., "I am good in mathematics") and mastery orientation (Woolf et al., submitted). Students tend to become more bored (less interested) towards the end of any instructional session. Yet students using LCs maintained higher levels of interest and reduced boredom after 15 minutes of tutor use (see Fig. 3). They reported higher mean confidence, interest and excitement. Although these results were not statistically significant, this relative advantage for LCs indicates that they might alleviate students' boredom as the session progresses.
6 Discussion and Future Work

This paper described the use of wireless sensors to recognize student emotion and the use of pedagogical agents to respond to students. We presented a user model framework to predict emotional self-reports. The framework works in classrooms of up to 25 students with four sensors per student. By using stepwise regression we have isolated key features for predicting user emotional responses in four categories of emotion. This is backed up by cross validation, and shows a small improvement using a very basic classifier. This data has demonstrated that intelligent tutoring systems can provide adaptive feedback based on an individual student's affective state. We also described the evaluation of emotional embodied animated agents and their impact on student motivation and achievement. There are a number of places for improvement in our system. The first is that we used summary information of all sensor values. We may find better results by considering the time series of each of these sensors. In addition, the software in the Mental State Camera can be trained for new mental states. This is one avenue of
Another place for improvement is to look at individual differences in the sensors. Creating a baseline for emotional detection before using the tutor system could help us better interpret the sensor features. Emotional predictions from sensors and agents as described above are a first step towards personalized feedback for students in classroom environments. We propose that the tutor will identify desirable (e.g., flow) and non-desirable (e.g., boredom) student states. Different interventions will be tested in an attempt to keep students in desirable states as much as possible (e.g., a confused student might be invited to slow down, reread the problem and ask for a hint). Part of this approach includes embedding a user model into the tutor to provide instructional recommendations. Intervention algorithms are being developed based on tutor predictions, e.g., mirror student emotion, support student effort, provide more immediate feedback on student progress, and allow students increased control of their experience. Acknowledgement. This research was funded by awards from the National Science Foundation, 0705554, IIS/HCC Affective Learning Companions: Modeling and Supporting Emotion During Teaching, Woolf and Burleson (PIs) with Arroyo, Barto, and Fisher, and by the U.S. Department of Education to Woolf, B. P. (PI) with Arroyo, Maloy and the Center for Applied Special Technology (CAST), Teaching Every Student: Using Intelligent Tutoring and Universal Design to Customize the Mathematics Curriculum. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies. We acknowledge contributions to the system development from Rana el Kaliouby, Ashish Kapoor, Selene Mota and Carson Reynolds. We also thank Joshua Richman, Roopesh Konda, and Assegid Kidane at ASU for their work on sensor manufacturing. We thank Jerry Chen and William Ryan for their contributions to agent development.
References 1. Wentzel, K., Asher, S.R.: Academic lives of neglected, rejected, popular, and controversial children. Child Development 66, 754–763 (1995) 2. Royer, J.M., Walles, R.: Influences of gender, motivation and socioeconomic status on mathematics performance. In: Berch, D.B., Mazzocco, M.M.M. (eds.) Why is math so hard for some children, pp. 349–368. Paul. H. Brookes Publishing Co., Baltimore (2007) 3. Conati, C., Maclare, H.: Evaluating a Probabilistic Model of Student Affect. In: Lester, J.C., Vicari, R.M., Paraguaçu, F. (eds.) ITS 2004. LNCS, vol. 3220, pp. 55–66. Springer, Heidelberg (2004) 4. D’Mello, S., Graesser, A.: Mind and Body: Dialogue and Posture for Affect Detection in Learning Environments. Paper presented at the Frontiers in Artificial Intelligence and Applications (2007) 5. McQuiggan, S., Lester, J.: Diagnosing Self-Efficacy in Intelligent Tutoring Systems: An Empirical Study. In: Ikeda, M., Ashley, K., Chan, T.W. (eds.) Eighth International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan (2006) 6. Graesser, A.C., Chipman, P., King, B., McDaniel, B., D’Mello, S.: Emotions and Learning with AutoTutor. In: Luckin, R., Koedinger, K., Greer, J. (eds.) 13th International Conference on Artificial Intelligence in Education (AIED 2007), pp. 569–571. IOS Press, Amsterdam (2007)
7. D’Mello, S.K., Picard, R.W., Graesser, A.C.: Towards an Affect-Sensitive AutoTutor. Special issue on Intelligent Educational Systems. IEEE Intelligent Systems 22(4), 53–61 (2007) 8. Conati, C., Mclaren, H.: Evaluating A Probabilistic Model of Student Affect. In: Lester, J.C., Vicari, R.M., Paraguaçu, F. (eds.) ITS 2004. LNCS, vol. 3220, pp. 55–66. Springer, Heidelberg (2004) 9. Burleson, W.: Affective Learning Companions: Strategies for Empathetic Agents with Real-Time Multimodal Affective Sensing to Foster Meta-Cognitive Approaches to Learning, Motivation, and Perseverance. MIT PhD thesis (2006), http://affect.media.mit.edu/pdfs/06.burleson-phd.pdf 10. Picard, R.W., Scheirer, J.: The galvactivator: A glove that senses and communicates skin conductivity. In: 9th International Conference on Human-Computer Interaction, New Orleans, August 2001, pp. 1538–1542 (2001) 11. Strauss, M., Reynolds, C., Hughes, S., Park, K., McDarby, G., Picard, R.: The handwave bluetooth skin conductance sensor. In: Tao, J., Tan, T., Picard, R.W. (eds.) ACII 2005. LNCS, vol. 3784, pp. 699–706. Springer, Heidelberg (2005) 12. Qi, Y., Picard, R.: Context-sensitive bayesian classifiers and application to mouse pressure pattern classification. In: 16th International Conference on Pattern Recognition, Proceedings, vol. 3, pp. 448–451 (2002) 13. Dennerlein, J., Becker, T., Johnson, P., Reynolds, C., Picard, R.W.: Frustrating computer users increases exposure to physical factors. In: Proceedings of International Ergonomics Association, Seoul, Korea, pp. 24–29 (2003) 14. Mota, S., Picard, R.W.: Automated posture analysis for detecting learner’s interest level. In: Computer Vision and Pattern Recognition Workshop, vol. 5, p. 49 (2003) 15. Kapoor, A., Burleson, W., Picard, R.W.: Automatic prediction of frustration. International Journal of Human-Computer Studies 65(8), 724–736 (2007) 16. Burleson, W., Picard, R.W.: Gender-specific approaches to developing emotionally intelligent learning companions. IEEE Intelligent Systems 22(4), 62–69 (2007) 17. el Kaliouby, R.: Mind-reading Machines: the automated inference of complex mental states from video. PhD thesis, University of Cambridge (2005) 18. Arroyo, I., Cooper, D., Burleson, W., Woolf, B.P., Muldner, K.: Empathetic Pedagogical Agents. Submitted to AIED (2009) 19. Arroyo, I., Ferguson, K., Johns, J., Dragon, T., Mehranian, H., Fisher, D., Barto, A., Mahadevan, S., Woolf, B.: Repairing Disengagement With Non Invasive Interventions. In: International Conference on Artificial Intelligence in Education, Marina del Rey, CA (2007) 20. Dragon, T., Arroyo, I., Woolf, B.P., Burleson, W., El Kaliouby, R., Eydgahi, H.: Viewing Student Affect and Learning through Classroom Observation and Physical Sensors. In: Woolf, B.P., Aïmeur, E., Nkambou, R., Lajoie, S. (eds.) ITS 2008. LNCS, vol. 5091, pp. 29–39. Springer, Heidelberg (2008) 21. Graham, S., Weiner, B.: Theories and principles of motivation. In: Berliner, D., Calfee, R. (eds.) Handbook of Educational Psychology, pp. 63–84. Macmillan, New York (1996) 22. Zimmerman, B.J.: Self-Efficacy: An Essential Motive to Learn. Contemporary Educational Psychology 25, 82–91 (2000) 23. Picard, R.W., Papert, S., Bender, W., Blumberg, B., Breazeal, C., Cavallo, D., Machover, T., Resnick, M., Roy, D., Strohecker, C.: Affective Learning–A Manifesto. BT Journal 2(4), 253–269 (2004) 24. Bickmore, T., Picard, R.W.: Establishing and Maintaining Long-Term Human-Computer Relationships. Transactions on Computer-Human Interaction 12(2), 293–327 (2004)
25. Reeves, B., Nass, C.: The media equation: How people treat computers, television and new media like real people and places. CSLI, New York (1998) 26. Klein, J., Moon, Y., Picard, R.W.: This Computer Responds to User Frustration: Theory, Design, Results, and Implications. Interacting with Computers 14(2), 119–140 (2002) 27. Prendinger, H., Ishizuka, M.: The Empathic Companion: A Character-Based Interface that Addresses Users’ Affective States. Applied Artificial Intelligence 19(3-4), 267–285 (2005) 28. Bickmore, T., Picard, R.W.: Establishing and Maintaining Long-Term Human-Computer Relationships. Transactions on Computer-Human Interaction 12(2), 293–327 (2004) 29. Baylor, A.: The Impact of Pedagogical Agent Image on Affective Outcomes. In: Proceedings of Workshop on Affective Interactions: Computers in the Affective Loop, International Conference on Intelligent User Interfaces, San Diego, CA (2005) 30. Dweck, C.: Messages that motivate: How praise molds students’ beliefs, motivation, and performance (In Surprising Ways). In: Aronson, J. (ed.) Improving academic achievement. Academic Press, New York (2002)
Usability Studies on Sensor Smart Clothing Haeng Suk Chae, Woon Jung Cho, Soo Hyun Kim, and Kwang Hee Han Cognitive Engineering Lab, Yonsei University, Seoul, Korea {acechae19,chrischo,puellang,khan}@yonsei.ac.kr
Abstract. This paper presents an approach to usability evaluation of sensor smart clothing. The methodology is divided into two categories: 1) a usability evaluation that gathers data from actual users of sensor smart clothing, and 2) an investigation of the weight values calculated for each evaluation item. The results of the usability evaluation show that the sensor controller (SC) influences the overall usability of sensor smart clothing. The effective item/module pairs are social acceptance for the SC, wearability for the general connector (GC) and platform appearance (PA), usefulness for the GC and PA, and maintenance (400) for the PA and SC. To evaluate the sensor smart clothing, a task process was applied and the components of users' responses were investigated. This study was performed to determine the effects of the properties of sensor smart clothing. Our results suggest that usability evaluation is an important part of the design process for sensor smart clothing. Keywords: Smart Clothing, Usability, Evaluation, Sensor, Wearable Computing, Wearability.
1 Introduction Wearable computing has arisen within human-computer interaction design, which involves human cognition, context of use, platform of access, task analysis and user experience [8]. "Wearable" has been defined as implying the use of the human body as a support for some product [2]. "Smart clothing" may be understood as an intelligent garment augmented with electrical or non-electrical components, including those for safety and entertainment [6]. User-centered smart clothing design must be related to user tasks [9], since smart clothing is constantly with the user and should respond to the user's needs [7]. This paper focuses on developing sensor smart clothing as a product in terms of user experience. Usability provides the design frame through which people can use technology in meaningful ways. Daily activities can be classified into two categories. Some are characterized by the way the human body is being used, such as sitting, standing and walking; these activities are best recognized directly where they occur, using body-worn sensors, and recent work has used this approach [4], [5]. Other activities are defined by the usage of certain objects or a sequence of objects; sensors in the environment can track these conditions [4], [5].
There are many sensing technologies that could be envisioned for context recognition, for instance audio, video, photography, acceleration, light, air, body temperature, heat, humidity, pressure, and heart rate. Many physiological sensors require skin contact or special outfits, so there is clearly a trade-off between informative and unobtrusive sensing; battery life and price are further considerations. To reach user acceptance, a wearable device must be small, unobtrusive and, if possible, fashionable. Several accessories are already widely accepted, such as belts, watches, necklaces, cell phones or pagers on a belt clip. Miniaturization of hardware has made it possible to integrate sensors into such devices, whereas nobody would want to wear sensors strapped around all their legs and arms. This work therefore focuses on a minimal set of sensors. Several studies using physiological sensors were conducted. The aim of this usability study of sensor smart clothing is to determine whether the evaluation items developed for wearable computers are useful here. The study also partially investigates the weights used to calculate scores for each evaluation item.
2 Method 2.1 Participants Twelve participants took part in this research. All participants were undergraduate students, and all were male because the prototype sensor clothing was a men's garment. 2.2 Procedure Participants could put on and handle the sensor smart clothing directly. The study used a paper-based questionnaire, with all questions rated on a 7-point scale from 1 to 7 [1], [3]. The evaluation contents are as follows. The item evaluation covers social acceptance (100), wearability (200), usefulness (300), maintenance (400), and safety (500); the modules are pa (platform appearance), pm (platform material), sc (sensor controller), sd (sensor detector), gc (general connector) and e (satisfaction). In total there were 153 questions (pa 36, pm 8, sc 23, sd 7, gc 14 and e 65 questions). All usability evaluations followed the same procedure. First, participants performed a simple task to become accustomed to the smart clothing. They then filled in the questionnaire and were interviewed by the experimenter about the evaluation. The entire procedure was recorded on videotape and lasted approximately 60 minutes. Table 1 describes the rating scheme for the item evaluation. Participants sat in a usability testing (UT) room. The moderator explained the sensor smart clothing, conducted an unstructured interview, and let participants talk freely about the sensor clothing. After the interview was completed, participants were asked to fill out the questionnaire. The participants were debriefed after the survey.
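To illustrate how the item (100-500) and module (pa, pm, sc, sd, gc) scores described above might be aggregated, the following is a minimal sketch. The module and item codes follow the paper, but the example responses, the weights and the aggregation rule are illustrative assumptions rather than the study's actual weight values or computation.

```python
# Hypothetical sketch: pool 7-point ratings into (module, item) means, then
# collapse items into one weighted score per module.
from collections import defaultdict

# (module, item) -> list of 7-point ratings pooled over participants/questions
responses = {
    ("sc", "social_acceptance_100"): [3, 2, 4, 3],
    ("sc", "maintenance_400"):       [3, 3, 2, 4],
    ("pa", "wearability_200"):       [5, 4, 5, 4],
    ("gc", "usefulness_300"):        [5, 5, 4, 4],
}

# Assumed item weights (illustrative only, not the study's values).
item_weights = {
    "social_acceptance_100": 0.25, "wearability_200": 0.25,
    "usefulness_300": 0.25, "maintenance_400": 0.15, "safety_500": 0.10,
}

def item_means(resp):
    """Mean rating for every (module, item) cell."""
    return {key: sum(vals) / len(vals) for key, vals in resp.items()}

def module_scores(resp, weights):
    """Weighted average of item means within each module."""
    sums, total_w = defaultdict(float), defaultdict(float)
    for (module, item), mean in item_means(resp).items():
        w = weights.get(item, 0.0)
        sums[module] += w * mean
        total_w[module] += w
    return {m: sums[m] / total_w[m] for m in sums if total_w[m] > 0}

print(module_scores(responses, item_weights))
```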
Table 1. Item Evaluation Description
3 Results 3.1 Survey The sensor controller's social acceptance was rated lower than the other items. In addition, the sensor controller's maintenance score was low because of the controller's size. A graph comparing the mean of each item with the overall satisfaction item shows that the two curves follow a similar trend, suggesting that overall satisfaction can be estimated from the questions that assess each individual item (Figure 1). Figure 2 shows the modules pa (platform appearance), pm (platform material), sc (sensor controller), sd (sensor detector), and gc (general connector) and their influence on social acceptance, wearability, usefulness, maintenance and safety. Latent outcomes showed
Fig. 1a. Module and Item Evaluation
Fig. 1b. Evaluation item on sensor smart clothing
that social acceptance was influenced by the sensor detector, wearability by the appearance and connector, usefulness by the appearance and connector, and maintenance by the appearance and sensor controller (Table 2). Figure 2 indicates that the main factor influencing usability, regardless of the individual item, was the sensor controller.
Table 2. Evaluation Code and Question Screening Results
3.2 Interview Beyond the quantitative measurements from the questionnaire, we also briefly interviewed the participants. The suggestions reported here are not statistically meaningful, but they are valuable as subjective measures.
Fig. 2. Influence of the sensor controller on usability
Weight imbalance. As already indicated in the questionnaire results, the controller was heavy, and the resulting loss of symmetry in the clothing was raised as a problem. Information display. If the user wants to check the sensor and its display, this is only possible by bending the head down deeply; participants felt the system should inform the wearer rather than requiring a self-check. Function concentration. Smart clothing is viewed positively for offering convenient functions, but participants thought it should also inform a person who is exercising, for example whether the current activity is being performed properly and how much exercise has been done.
4 Discussion The present conclusions about the evaluation of modules and items suggest that the sc (sensor controller) influences the overall usability of sensor smart clothing across evaluation items (Figure 2). The effective modules for social acceptance (100) are the sc (sensor controller) and sd (sensor detector); for wearability (200), the gc (general connector) and pa (platform appearance); for usefulness (300), the gc (general connector) and pa (platform appearance); and for maintenance (400), the pa (platform appearance) and sc (sensor controller). This study had two main purposes. First, the usability evaluation of the sensor smart clothing produced mean values above average for every evaluation item except social acceptance and maintenance; social acceptance and maintenance were affected by the fact that the experiment used an imperfect prototype. Second, because the item validation procedure only revealed tendencies, it is not easy to draw firm conclusions. However, for an unfamiliar, technology-centered product, when one factor such as the sensor controller is a problem, that factor also affects the other items. The appearance and material of smart clothing influence factors beyond technological function because of the character of clothing itself. For a sensor smart clothing prototype, measurement based on physiological signals is essential. The purpose of sensor smart clothing is to help the user
to reach an optimal condition. Usability evaluation for this smart clothing is therefore tied to its technology. To evaluate the sensor smart clothing, mechanical stimulation was applied and the components of users' responses to the smart clothing were investigated. A simple user study was performed to determine the effects of the properties of sensor smart clothing. To evaluate user behavior, daily measurements over a period of a month are necessary. Acknowledgements. The authors would like to thank the project team on technology development of smart clothing for future daily life. This work has been supported by the Ministry of Knowledge Economy, Republic of Korea.
References 1. Chae, H.S., Hong, J.Y., Cho, H.S., Park, S.H., Han, K.H., Lee, J.H.: The Development of Usability Evaluation for Wearable Computer: An Investigation of Smart Clothing. Korean Journal of the Science of Emotion & Sensibility 9(3), 265–276 (2006) 2. Gemperle, F., Kasabach, C., Stivoric, J., Bauer, M., Martin, R.: Design for wearability. In: The Second International Symposium on Wearable Computers, pp. 116–122. IEEE Computer Society, Los Alamitos (1998) 3. Hong, J.Y.: Study on Requirement Investigation and Development of Usability Evaluation for Wearable Computers: Toward Smartwear. Doctoral dissertation, Cognitive Science Program, Yonsei University (2007) 4. Intille, S.S., Larson, K., Beaudin, J.S., Nawyn, J., Munguia Tapia, E., Kaushik, P.: A living laboratory for the design and evaluation of ubiquitous computing technologies. In: Extended Abstracts of the 2005 Conference on Human Factors in Computing Systems. ACM Press, New York (2005) 5. Kern, N., Schiele, B.: Context-Aware Notification for Wearable Computing. In: Proceedings of the IEEE International Symposium on Wearable Computing, ISWC 2003 (2003) 6. Rantanen, J., Impio, J., Karinsalo, T., Malmivaara, M., Reho, A., Tasanen, M., Vanhala, J.: Smart Clothing Prototype for the Arctic Environment. Personal and Ubiquitous Computing 6, 3–16 (2002) 7. Lyons, K., Starner, T.: Mobile capture for wearable computer usability testing. In: Proceedings of the IEEE International Symposium on Wearable Computing (ISWC 2001), Zurich, Switzerland (2001) 8. Macdonald, N.: About: Interaction Design. A Design Council paper series on design issues, London, UK (2003) 9. Smailagic, A., Siewiorek, D.P.: Modalities of interaction with CMU wearable computers. IEEE Personal Communications (3), 14–25 (1996)
Considering Personal Profiles for Comfortable and Efficient Interactions with Smart Clothes Sébastien Duval1, Christian Hoareau2,3, and Gilsoo Cho1 1
Yonsei University, Smartwear Research Center, Department of Clothing and Textile, 134 Sinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea [email protected] 2 National Institute of Informatics, Hitotsubashi 2-1-2, Chiyoda-ku, Tokyo 101-8430, Japan [email protected] 3 The Graduate University for Advanced Studies, Shonan Village, Hayama, Kanagawa 240-0193, Japan [email protected]
Abstract. Profiles describing the abilities and specificities of individual wearers enable smart clothes to fundamentally and continuously personalize their behavior, suggesting or selecting useful, comfortable and efficient services and interaction modes. First, we suggest foundations for the design of personal profiles for the general public based on perception, bodily characteristics, culture, language, memory, and spatial abilities. Then, we sketch reactions towards profiles for oneself and one’s family based on a 2008 pilot study in Japan. Accordingly, we discuss the creation, update, use and dissemination of profiles, and finally perspectives for future social investigations. Keywords: General public, Interaction, Smart clothes, Sociology, Ubiquitous computing, Personal Profile, User profile.
1 Introduction Individuals may benefit daily in private, public or semi-public places from services proposed or adapted based on personal profiles describing users' specificities, because our past differentiates us, ubiquitous services diversify [1], and contacts with personal, shared and public systems affect quality of life. For instance, partially sighted people may access guidance systems of little use to fully sighted people; terminals may offer sound along with altered graphics to compensate color deficits. Young children may navigate digital encyclopedias through associative links while adults access logical links. Tourists may hear explanations adapted to their culture in a familiar language. Considering activities, aging and disabilities, profiles would benefit most people sooner or later, possibly with mobile networked devices handling local services. With accepted well-thought standards, designers worldwide could greatly enhance systems and favor universal access in a single year, exploiting simple profiles stored in contact cards. Moreover, using detailed profiles in cellular phones or smart clothes (clothes containing computers), designers could bring singular services to challenged
as well as non-challenged individuals on a time-scale contingent on the technology concerned. Smart clothes are most challenging because their size, shape and location allow for more numerous and more diverse sensors and actuators as well as for novel services such as health monitoring based on bio-sensors. Knowledge about the wearer appears necessary to mitigate the resulting complexity, e.g. by suggesting inputs and outputs to ignore or prioritize. Advanced smart clothes may thus paradoxically simultaneously provide and require personal profiles. Profiles make sense for the general public since computer users diversified (e.g. kids and elders), access to shared systems spread (e.g. terminals in train stations), and affordable mobile devices emerged (e.g. Apple’s iPhone); the commercialization of smart clothes worn most of the time will strengthen these three motivations. However, standard personal profiles to share information through time, between devices or services appear non-existent. Computer users can barely adapt operating systems to their abilities and needs: accessibility features are static patches with limited options, linguistic choices exclude proficiency levels, and personality and intellect are ignored. On top, local applications are secretive about users, and online services request data to improve business rather than interaction. Only e-learning commonly exploits profiles but they are usually tightly linked to systems themselves, and usually lack sensory, linguistic and psychological dimensions. As multiform ubiquity exists [1] and as huge sums were invested in it worldwide for a decade, one may expect at least embryonic research on all fundamental issues, including profiles. However, the community pursued a monolithic agenda suggested by Weiser’s vision [2], focusing on context awareness: systems were to adapt to the user’s location, access to devices, identity of bystanders, etc. Personal profiles were systematically ignored although, ironically, they could promote the humane face of Weiser’s vision called “calm technology” [3] by uniquely shaping information to provide at the center and periphery of a given person’s perception. Outside ubiquitous computing, the creation of profiles was probably hindered by the intuitive difficulty of cheaply gathering and maintaining reliable sensori-motor and psychological information, accordingly adapting systems, and respecting privacy. To these inherent obstacles, we may add the excessive focus on technology and the over-specialization of scientists; for example works abound to adapt Internet services to various screen sizes independently of users. Finally, some may argue that instead of creating personal profiles to adapt systems we should improve general usability and, when necessary, create dedicated devices. This would be unsatisfactory for the modeled majority of users as improving general usability cannot lead to an optimal design: different users may have conflicting priorities. This would also be unsatisfactory for others as they may not obtain smart clothes–even remotely–appropriate to their specificities, especially if concurrent conditions affect them. Besides, adaptive garments may be very valuable when situations quickly change, for instance when one breaks her leg or when aging degrades senses. Thus, we should simultaneously pursue personalization and improvements in usability. 
Hereafter, we suggest foundations for the design of personal profiles based on perception, bodily characteristics, culture, language, memory, and spatial abilities. Next, we sketch reactions towards profiles for oneself and one’s family based on a 2008 pilot study in Japan. Accordingly, we discuss the creation, use and dissemination of profiles. Finally, we conclude on perspectives for future social investigations.
2 Current Knowledge Suffices to Create Partial Profiles To firmly found personal profiles for the world general public, we cover aspects relevant to everyday life with smart clothes. Some additional aspects are discussed in section 4.1. For more information on impairments and functional abilities, see [4]. Deep knowledge of vision and hearing was successfully applied to design cinemas, glasses and cochlear implants [5]. Complex, touch is underexploited by still emerging technologies. We omit smell and taste because the former is badly understood and the latter of little interest in clothing. Beside human senses, bodily features are critical for clothes, culture and language for global markets and mobile users, and memory and spatial abilities for proper uses of most computers. 2.1 Vision We can characterize vision with contrast sensitivity (to distinguish light and dark), visual acuity (to discriminate details), visual field, and color perception, which all affect human-computer interactions [6]. A significant part of the world population is partially sighted, and suffers from important disadvantages when compared to sighted people (e.g. 50 times slower to select icons [6]). Contrast sensitivity may be assessed with a Pelli-Robson chart, visual acuity with a Snellen or Bailey-Lovie chart, the visual field with standardized automated perimetry, and color vision with the Farnsworth D-15 color vision test [6]. The human visual field approximates 200 degrees but disabilities may modify it, affecting everyday life and immersion in data spaces provided by e.g. semi-transparent glasses. Similarly, inter-ocular distance and data about an eye disease or loss helps manage stereovision. 2.2 Hearing and Speech We can characterize hearing with frequency-specific auditory thresholds (in dB) from which a sound is heard; important to notice and identify sounds, and to evaluate the distance to their source. The average intensity required to hear pure tones at 500, 1000 and 2000Hz could provide a unique reference [4]. Head-Related Transfer Functions (HRTF) also characterize hearing; describing the behavior of sound within a listener’s body, especially at the outer ears, they provide spatial clues [7] useful for stereophonic music and some augmented-reality services. Diseases, drug consumption, exposure to loud sounds, and genetic expression may degrade hearing [5]. We may easily characterize speech by its impairments based on audibility, intelligibility, and functional efficiency [4] but such data is difficult to effectively exploit. Age typically alters speech, elders multiplying repeats and restarts in sentences. 2.3 Touch Complex, touch informs about pressure, vibration, temperature, pain, and position thanks to cutaneous and kinesthetic systems. Fully profiling it seems unrealistic as numerous various receptors heterogeneously cover the body; interfaces exist even for the inside of the mouth [8]. Dedicated research is here necessary to establish the data
to gather and standard measurements, taking into account bodily differences (e.g. fat, gender) and the influences of age on the short- and long-term. 2.4 Bodily Characteristics Anthropometric information such as citizens’ height and arm length is typically gathered by textile/clothing organizations to evaluate their market or by governments to evaluate the health and evolution of the population. These measurements provide a basis to profile the body of users of smart clothes but should be complemented with dynamic information regarding e.g. joints or excess fat during movements (useful to select or tune interaction algorithms). Besides, data on dominant and non-dominant hands could help adapt interfaces. Finally, age may be a useful proxy for missing information (in or beyond the scope of a profile) because it allows assumptions about users’ abilities; for instance, children, young adults and older adults fatigue at different rates. Recommended interaction and navigation techniques may change according to the developmental phases of children and potential problems of elders. 2.5 Culture and Language Culture affects abilities, common sense, preferences, and world views. In the broad sense of the word, we all simultaneously belong to several cultures, related or not to a national identity. Standardizing classifications of non-regional cultures is a challenge but, as a starting point, one may easily be defined by the culture of her citizenship. Linguistic abilities can be defined by a list of spoken/written languages, associated to proficiencies characterized by the knowledge of vocabulary, semantic structures, usage patterns, and writing features. A common standard lacks but independent ones exist for several languages (e.g. Test Of English for International Communication, Japanese Language Proficiency Test). 2.6 Memory and Spatial Abilities Memory can be divided into sensory, short-term/work, and long-term [4], respectively dealing with immediate sensory information, items for 20-30 seconds, and generic permanent storage. Interacting with computers usually mainly involves short-term memory, easily evaluated by the number of items a person can remember for 20-30 seconds. Smart clothes may promote sensory memory due to their actuators, as well as long-term memory linked to new communication and interaction patterns. Spatial abilities cover spatial perception, spatial cognition, navigation, movement, and manipulation in three-dimensional spaces (e.g. mental rotations), and can notably be evaluated with subsets of Intelligence Quotient (aka I.Q.) tests.
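To make the preceding survey of profile dimensions more concrete, the following is a minimal sketch, with hypothetical field names and value ranges, of how such a personal profile might be encoded so that garments and nearby devices can read it. It is an illustration under stated assumptions, not a proposed standard.

```python
# Hypothetical encoding of the profile dimensions discussed in Section 2.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class VisionProfile:
    contrast_sensitivity: Optional[float] = None  # e.g. Pelli-Robson log units
    visual_acuity: Optional[float] = None          # e.g. decimal Snellen score
    visual_field_deg: Optional[float] = None       # horizontal extent, degrees
    color_deficiency: Optional[str] = None         # e.g. "deutan", None if none

@dataclass
class HearingProfile:
    # Frequency-specific thresholds in dB HL, keyed by frequency in Hz.
    thresholds_db: Dict[int, float] = field(default_factory=dict)

@dataclass
class BodyProfile:
    height_cm: Optional[float] = None
    arm_length_cm: Optional[float] = None
    dominant_hand: str = "right"

@dataclass
class PersonalProfile:
    languages: Dict[str, str] = field(default_factory=dict)  # code -> proficiency
    cultures: List[str] = field(default_factory=list)
    short_term_memory_items: Optional[int] = None   # items retained 20-30 s
    spatial_ability_percentile: Optional[float] = None
    vision: VisionProfile = field(default_factory=VisionProfile)
    hearing: HearingProfile = field(default_factory=HearingProfile)
    body: BodyProfile = field(default_factory=BodyProfile)

# Example: a partially sighted, bilingual wearer (values are illustrative).
wearer = PersonalProfile(
    languages={"ko": "native", "en": "intermediate"},
    cultures=["KR"],
    short_term_memory_items=6,
    vision=VisionProfile(visual_acuity=0.3, color_deficiency="deutan"),
)
print(wearer.vision.visual_acuity)
```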
3 Reactions of the Public in Tokyo Suggest Acceptable Designs We gathered surface information about the general public’s perception of potential everyday life uses of personal profiles with paper self-completion questionnaires, and clarified results with informal interviews in Tokyo (Japan) from March to November
2008. 84 citizens from Japan and 11 other countries, aged 11-75, including as many males as females, participated in this pilot study. Accessible and featuring leading edge technologies, Tokyo was appropriate to evaluate people's interest in personal profiles for ubiquitous services, and collect sufficient quality information to prepare large-scale investigations planned for 2009-2010. Background information of interest included problems experienced in everyday life with familiar systems. Core issues were feelings about physical or mental profiles for oneself and for one's family, differences in reactions for private, public and shared systems, wishes for services and practices. We designed a 6-page questionnaire, including 4 series of background questions, 17 series of core closed-ended questions, and one open-ended question. We revised it based on comments from a test group, and provided checked versions in Japanese and English. For core questions, participants rated assertions on a 5-point scale: 1-strongly disagree, 2-disagree, 3-neither agree nor disagree, 4-agree, and 5-strongly agree. Respondents were not shown screenshots, photos or videos to avoid bias. A short text introduced the study as research to evaluate existing information systems, and design future devices and services for use by the general public in everyday life. Systems were defined as any computer equipment, electronic device or software. Questions evoked familiar technologies, wearable computers, smart spaces and robots. We clarified the collected data with interviews to establish hypotheses for future studies. We limited statistics to medians, the simplest but least controversial measure for opportunity-based sampling, which provides weak data. The reactions of respondents from diverse cultures expressed a clear pattern presented hereafter. 3.1 Familiar Everyday Devices Already Pose Problems Example of assertion to rate: I already experienced movements problems (example: accurately and quickly use a mouse or type on a keyboard) when using cell-phones or computers. Cellular phones and computers raise interaction-related problems in everyday life. Most respondents experienced significant difficulties with systems due to movements (median: 4), and sometimes to language (median: 3) and spatial abilities, e.g. difficulty finding appropriate menus or icons (median: 3). However, respondents think they had no problem due to vision (median: 2), hearing (median: 1) or memory (median: 2). 3.2 Profiles Seem Useful for Personal Systems Used by One's Self and Family Examples of assertions to rate: It would be useful to adapt my children's systems to their ability to see, hear, feel objects, and move and To automatically adapt public systems (for example when buying train tickets or getting guidance in shopping malls), I would provide information about my vocabulary (example: meaning of software menus). As seen in Figure 1, respondents feel it would be useful to adapt their systems to their vision, hearing, movements, spatial abilities, memory, and vocabulary (median: 4 all). However, they are ambivalent about providing information on their cognitive abilities (median: 3).
Although respondents think adapting public systems is also useful, they refuse to provide information for it (medians: 1.5 to 2.5) except for vision (median: 3).
Fig. 1. Perceived usefulness of adaptations, and perceived willingness to disclose information, for use with one’s personal systems
Finally, most respondents feel it would be useful to adapt their children’s and older relatives’ systems to their perception, spatial abilities, memory, language, (median: 4 all) and possibly to their Intelligence Quotient (median: 3). Questionnaires being long, we omitted questions on profiling relatives to adapt public systems and on disclosing information to profile relatives. 3.3 Monitoring of Abilities Seems Fine for Oneself and Elders, Not Children Examples of assertions to rate: I would feel comfortable and agree for my systems to monitor my memory and I would feel comfortable and agree for my children’s systems to monitor their mastery of vocabulary and grammar. Respondents would seemingly quietly let their systems monitor their perception (median: 4), language (median: 3.5), and spatial abilities (median: 4), yet less their memory (median: 3) or Intelligence Quotient (median: 3). Respondents are even more enthusiastic about the monitoring of their older relatives’ language and memory (median: 4 both). However, they express mixed feelings for the monitoring of all children’s abilities (median: 3 all). 3.4 Disrupted Settings Strongly Promote Adaptations Example of assertion to rate: I would like services to adapt to me when I am pregnant. Respondents indicated strong wishes for adaptations of services in disrupted settings: when pregnant (median: 5), physically or mentally challenged (median: 5 both), ageing (median: 5), or abroad (median: 4). Most respondents also wished adaptations when using other people’s devices (median: 4) yet this possibility elicited negative reactions from several respondents.
3.5 Profiles Should Be Stored on the User, Not Online Examples of assertions to rate: I would feel comfortable with storing a description of my ability to see, hear, feel objects, and move in a card sending information only in contact and I would feel comfortable with storing a description of my intellectual abilities such as finding objects, remembering things, and solving complex problems online with a personal service from e.g. Google. As seen in Figure 2, respondents would feel comfortable with storing information about their vision, hearing and movements in a contact-card or cell-phone (median: 4 all) but less about spatial abilities, memory and language (median: 3 all). Although online storage is usually rejected, it is the preferred method of some respondents, who argue that companies like Google would do their best to protect the data in order to safeguard their reputation.
Fig. 2. Projected comfort when storing personal profiles according to supports
4 Discussion The actual feasibility of useful personal profiling and its apparent attractiveness to manage everyday systems suggest that smart clothes may successfully store and use a wearer’s profile to continuously adapt their behavior and that of nearby devices. How shall we create, update, exploit and promote such profiles? 4.1 Practical Creations and Updates Necessitate Much Thinking Smart clothes typically involve multi-modal interactions, fabric-embedded sensors, and ubiquity. Personal profiles should thus logically cover (1) senses, spatial abilities and memory, (2) bodily characteristics, and (3) culture and language to ensure access to, and understanding of, information, as well as to render wearers’ interactions more comfortable and efficient. However, we need research to define standards for culture, language, smell, and touch because they are complex. Although personality and preferences may serve, they would provide weak foundations to personal profiles because their subjective nature favors incompatible standards, and reduces their usefulness to adapt arbitrary services.
Doctors and psychologists may assess elements in section 2 to create or update profiles but this approach is unrealistic due to their busyness and costs. Alternatively, profiles could be initialized from templates then updated, which makes sense for homogeneous (e.g. kids) but not heterogeneous (e.g. elders) groups. Monitoring [9, 10] or evaluations through games [11] by smart clothes are cheap, convenient and enjoyable update methods; the former requires new algorithms [12] but the latter already works. Profile elements may be specifically updated at appropriate moments, taking into account developmental phases for kids [13], risk assessments for elders, cultural events (e.g. rites of passage), policies (e.g. age for English tests), and environments (e.g. noise pollution). Updates should be possible based on context-awareness or on demand to quickly adapt to disrupted settings, a key wish of respondents. Various creation and update methods should be considered to allow for unequal wealth, technology and expertise worldwide, but also for different levels of patience (e.g. children) or health (e.g. elders). 4.2 Designers May Easily Enhance Wearers’ Comfort and Efficiency With profiles, designers may easily select interaction modes e.g. vision for fully sighted wearers but touch for the blind, or adapt stimulations e.g. size/color of icons, loudness/type of alarms, pattern/location of vibrations for immersive games or affective telepresence. Knowledgeable designers may enhance interactions based on bodily features by correcting the results of accelerometers due to fat, on memory by applying memorable lengths to messages, and on spatial abilities by switching menu styles. Smart clothes may define referenced words and concepts to children and travelers based on their linguistic proficiency. The Intelligence Quotient is a natural–weak–proxy for memory and spatial abilities; age may similarly serve for homogeneous cohorts (using e.g. developmental phases [13]), before old age. Integrating data from separate domains, smart clothes may enrich face-to-face communication with displays on their surface of visual metaphors adapted to an interlocutor’s age and culture. Finally, staff in game and shopping centers may check profiles to suggest simulators or new smart clothes. Smart clothes should be carefully exploited for medical, psychological, social, and legal reasons. We need research on the effects of body-wide stimulations and continuous adaptations, especially for children and elders, and during pregnancy; meanwhile, designers should diversify stimuli. Besides, providers should acknowledge erroneous and incomplete profiles because updates may lag, wearers may be cautious towards e.g. public systems, and uses may be rejected (e.g. touch for women [14]) or taboo. Finally, online storage should be avoided as international privacy laws may restrict its use, which matches the public’s apparent concerns and preferences. 4.3 Ubiquity May Spread Profiles before Smart Clothes Become Mainstream To avoid familiar problems, respondents are open to adapting ubiquitous services with personal profiles for themselves, their children and older relatives, especially in disrupted settings. This family-friendly view should foster adoption as parents equip their children and sometimes also their own parents. 
Monitoring by personal systems is a key enabler for up-to-date profiles so worries about children’s monitoring should be investigated and, if possible, alleviated; other predictable concerns should also be researched to save this confidence capital from risky or inappropriate first products.
Embedding perceptual profiles in contact cards or cellular phones seems a first step with few drawbacks; cognitive elements may spread later, after they prove valuable to e.g. stimulate young and elder first adopters. However, adding the Intelligence Quotient seems dangerous. To deal with distrusted services, users should be able to prevent access to their profile, provide a vaguer version or a common archetype. Profiles may be adopted on a larger scale in populations that are heterogeneous (due to aging, pollution, healthcare) or share devices (due to cost or availability), and in environments where diverse networked mobile services are available. Considering initial reactions and the potential of smart clothes equipped with profiles, early adopters may be travelers, challenged people, children, elders, and pregnant women. 5 Perspectives Personal profiles already appear feasible and acceptable to the public. Exploring their creation, update, use, and adoption, we suggested bases for comfortable and efficient interactions in ubiquity-enabled countries, notably with smart clothes as daily support from youth to old age. We will check, deepen and extend our hypotheses with large-scale studies in Seoul in 2009-2010 using better sampling and various methods to study variations due to age, gender and culture, e.g. drawings and scenarios with children. As proof of concept, we will also define XML descriptions of personal profiles to adapt the behavior of smart clothing prototypes from the Smartwear Research Center. Considering uses, we may ask: When are profiles thought useful for children and elders, and why? What are influential fears, hopes and values? What risks do personal profiles raise? How shall we provide affordable, reliable profiles worldwide? Considering design, we may ask: What tools would help design regarding memory, spatial abilities, language, culture, and concurrent disabilities? How shall we combine profiles about users and about garments? How shall we test and validate adaptations? Besides, we may ask: How will gender, culture and spirituality shape standards and the adoption of personal profiles, considering the symbolic role of clothes, body images, perceptions of technology, and the importance of individualism? How will body implants, contact cards, cellular phones and smart clothes relate to each other? What will be the energy costs and benefits? How important would integrated profiling be to enable advanced smart clothes? Finally, although beyond the core scope of technological research, we must consider the potential for abuse or misuse of information created by ubiquitous transfers of so far unavailable personal detailed information, leading to e.g. oppression of human rights with impacts on denial of dignity, risks to personal security or to the ability to shape one's own future. On the contrary, personal profiles could support cultural diversity and reduce wasted (human) potential.
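As a complement to the XML profile descriptions planned above, the following is a minimal, self-contained sketch of the kind of profile-driven adaptation evoked in Section 4.2. The element names, thresholds and adaptation rules are illustrative assumptions, not a defined standard or the Smartwear Research Center's actual format.

```python
# Hypothetical profile-driven adaptation: read a (made-up) XML profile and
# derive a few interaction settings from it.
import xml.etree.ElementTree as ET

profile_xml = """
<profile>
  <vision acuity="0.2" colorDeficiency="deutan"/>
  <hearing avgThresholdDb="45"/>
  <language code="en" proficiency="basic"/>
</profile>
"""

def adapt(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    acuity = float(root.find("vision").get("acuity"))
    hearing_db = float(root.find("hearing").get("avgThresholdDb"))
    proficiency = root.find("language").get("proficiency")
    return {
        # Low acuity -> larger icons/text, prefer non-visual output.
        "icon_scale": 2.0 if acuity < 0.3 else 1.0,
        "preferred_output": "audio" if acuity < 0.1 else "visual",
        # Raised hearing thresholds -> louder alarms plus a vibration cue.
        "alarm_gain_db": max(0.0, hearing_db - 20.0),
        "vibration_alerts": hearing_db > 40.0,
        # Limited proficiency -> simplified vocabulary in messages.
        "simplified_wording": proficiency in ("basic", "none"),
    }

print(adapt(profile_xml))
```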
5 Perspectives Personal profiles already appear feasible and acceptable to the public. Exploring their creation, update, use, and adoption, we suggested bases for comfortable and efficient interactions in ubiquity-enabled countries, notably with smart clothes as daily support from youth to old age. We will check, deepen and extend our hypotheses with big-scale studies in Seoul in 2009-2010 using better sampling and various methods to study variations due to age, gender and culture, e.g. drawings and scenarios with children. As proof of concept, we will also define XML descriptions of personal profiles to adapt the behavior of smart clothing prototypes from the Smartwear Research Center. Considering uses, we may ask: When are profiles thought useful for children and elders, and why? What are influential fears, hopes and values? What risks do personal profiles raise? How shall we provide affordable reliable profiles worldwide? Considering design, we may ask: What tools would help design regarding memory, spatial abilities, language, culture, and concurrent disabilities? How shall we combine profiles about users and about garments? How shall we test and validate adaptations? Besides, we may ask: How will gender, culture and spirituality shape standards and the adoption of personal profiles, considering the symbolic role of clothes, body images, perceptions of technology, and the importance of individualism? How will body implants, contact cards, cellular phones and smart clothes relate to each other? What will be the energy costs and benefits? How important would integrated profiling be to enable advanced smart clothes? Finally, although beyond the core scope of technological research, we must consider the potential for abuse or misuse of information created by ubiquitous transfers of so far unavailable personal detailed information, leading to e.g. oppression of human rights with impacts on denial of dignity, risks to personal security or to the ability to shape one’s own future. On the contrary, personal profiles could support cultural diversity and reduce wasted (human) potential. Acknowledgments. Thanks to Testuro Kamura (NII, Japan) for helping create the questionnaires and for translating them into Japanese. Thanks to Thomas Martin (L3I, France) for information about adaptive games and adaptive learning. Thanks to Eric Platon (NII, Japan), Christian Sandor (Canon, Japan), Kaori Fujimura (NTT DoCoMo, Japan), and Masaaki Fukumoto (NTT DoCoMo, Japan) for references.
References 1. Bell, G., Dourish, P.: Yesterday's Tomorrows: Notes on Ubiquitous Computing's Dominant Vision. Personal and Ubiquitous Computing 11(2), 133–143 (2007) 2. Weiser, M.: The Computer for the 21st Century. Scientific American 265(3), 66–75 (1991) 3. Weiser, M., Brown, J.S.: The Coming Age of Calm Technology (ch. 6). In: Beyond Calculation: The Next Fifty Years of Computing, pp. 75–85. Springer, Heidelberg (1997) 4. Jacko, J., Vitense, H.: A Review and Reappraisal of Information Technologies within a Conceptual Framework for Individuals with Disabilities. Universal Access in the Information Society 1(1), 56–76 (2001) 5. Wilson, B., Dorman, M.: Interfacing Sensors With the Nervous System: Lessons From the Development and Success of the Cochlear Implant. Sensors Journal 8(1), 131–147 (2008) 6. Jacko, J., Rosa, R., Scott, I., Pappas, C., Dixon, M.: Visual Impairment: The Use of Visual Profiles in Evaluations of Icon Use in Computer-Based Tasks. International Journal of Human-Computer Interaction 12(1), 151–164 (2000) 7. Bowman, D.A., Kruijff, E., LaViola, J.J., Poupyrev, I.: 3D User Interfaces: Theory and Practice. Addison-Wesley, Reading (2005) 8. Tang, H., Beebe, D.: An Oral Tactile Interface for Blind Navigation. IEEE Transactions on Neural Systems and Rehabilitation Engineering 14(1), 116–123 (2006) 9. Gemmell, J., Bell, G., Lueder, R.: MyLifeBits: A Personal Database for Everything. Communications of the ACM 49(1), 88–95 (2006) 10. My Life Assist Service by NTT DoCoMo, http://pr.mylifeassist.jp/ 11. Jimison, H., Pavel, M., McKanna, J., Pavel, J.: Unobtrusive Monitoring of Computer Interactions to Detect Cognitive Status in Elders. IEEE Transactions on Information Technology in Biomedicine 8(3), 248–252 (2004) 12. Nack, F.: You Must Remember This. Multimedia 12(1), 4–7 (2005) 13. Piaget, J.: Le Langage et la Pensée chez l'Enfant. Delachaux et Niestlé, Paris (1923) 14. Duval, S., Hashizume, H.: Satisfying Fundamental Needs With Wearables: Focus on Face-To-Face Communication. Transactions of the Virtual Reality Society of Japan 10(4), 495–504 (2005)
Interaction Wearable Computer with Networked Virtual Environment Jiung-yao Huang1, Ming-Chih Tung2, Huan-Chao Keh3, Ji-jen Wu3, Kun-Hang Lee1, and Chung-Hsien Tsai4 1
Dept. of Computer Science and Information Engineering, National Taipei University, San Shia, Taipei, 237 Taiwan [email protected], [email protected] 2 Department of Computer Science and Information Engineering, Ching Yun University, 229 Chien-Hsin Road, Jung-Li, Taoyuan County 320, Taiwan [email protected] 3 Department of Computer Science and Information Engineering, Tamkang University Tamsui 251, Taiwan [email protected], [email protected] 4 Dept. of Computer Science and Information Engineering, National Central University, Taoyuan County 32001, Taiwan [email protected]
Abstract. The goal of this research is to propose a technique for integrating a mobile reality system into a legacy networked virtual environment. The research spans two essential domains: networked virtual environments (NVE) and mobile computing. With the proposed technique, a user can employ a mobile device to join a networked virtual environment and interact with desktop users of the same virtual environment. To achieve this goal, three technical issues have to be solved: mobile networking, resource shortage and coordinate reconciliation. The paper presents solutions to all of these issues. Further, a Mobility Supporting Server (MSS) is proposed to implement the presented solutions in an existing networked virtual environment, the 3D virtual campus in Taiwan. The results of this experimental research point to the possibility of building a Multiplayer Mobile Mixed Reality (M3R) environment in the near future. Keywords: Networked Virtual Environment (NVE), Mobile Computing, Mobile Supporting Server, Multiplayer Mobile Mixed Reality.
1 Introduction The Networked Virtual Environment (NVE) integrates distributed systems, network communications, and virtual reality into a graphical multi-player interactive system. In this synthetic environment, each player is embodied by a graphical representation called an avatar, which conveys his identity, presence, location, and activities to others [1]. In step with the development of computing technology,
research on Mixed Reality began in 1994 [2]. Mixed reality research focuses on how to integrate the digital world and the physical world into one mixed space. Depending upon the degree of integration, mixed reality can be classified as Augmented Reality or Augmented Virtuality. As computers have become smaller and more powerful, the concept of a portable, high-performance computer system for augmented reality has become feasible and popular in recent years [3]. Mobile Augmented Reality (MAR) applies augmented reality techniques to mobile computing systems. The significance of MAR is that it enriches mobile computing applications with the friendly interfaces of augmented reality technology. With the help of mobile computing, MAR frees conventional augmented reality applications from the desktop [4]. Mobile augmented reality allows computer-generated virtual objects to be overlaid on live images while the user navigates the real world, which opens augmented reality to friendlier, everyday applications. However, one drawback of MAR is that it still isolates users from each other: a user can only interact with the objects inside the MAR environment and does not know of the existence of other players. To further expand the applications of MAR, integrating interactive networking capability into MAR has become a trend in recent research [5]. However, none of these previous networked MAR studies attempts to integrate NVEs with MAR so that MAR users can interact with conventional NVE users within a shared mixed reality space. This study refers to such an integrated environment as the Multiplayer Mobile Mixed Reality (M3R) environment. The basic idea of M3R is to integrate NVEs with MAR so that remote MAR users can interoperate with each other. In this study, a user who joins the virtual environment through a desktop computer is called a desktop player, while a mobile player is a user who uses a wearable computer or a notebook to wirelessly log in to the virtual environment and interact with others while moving in the physical world. In the following sections, the research issues involved in allowing MAR users to join a conventional NVE are given first. Proposed solutions to those research issues follow in the next section. Finally, the implementation of a prototype M3R system is given in the last section.
2 Research Issues of the M3R Environment The goal of the M3R environment is to enable the mobile player to join a legacy networked virtual environment and to interact with others while moving in the physical world. To enable the mobile player to interact with desktop users, three important issues must be solved. First, a method is required to keep the data link between the mobile user and the desktop user alive and smooth despite the fragility of the wireless network environment. Second, given the constrained and limited computing resources of the mobile device, the balance of computing load between the desktop device and the mobile device, and the balance of transmitted data between wired and wireless networks, are hard to keep stable. Finally, the mobile user relies on a positioning device (such as a Global Positioning System receiver) to provide his location information to the virtual
world; whether this data, expressed in the Geographical coordinate system of the physical space, can be matched with position data in the Cartesian coordinate system adopted by the virtual world therefore affects the interaction between mobile and desktop players. The above issues can be broadly classified into three categories: the stability and bandwidth problems of the mobile network, the limited computation resources of the mobile device, and the data correlation between the Geographical coordinate system and the Cartesian coordinate system. The upper part of Figure 1 depicts these three issues and their sub-problems.
Fig. 1. Hierarchies of research issues and proposed approach of M3R environment
2.1 Mobile Networking (Box Labeled A) The mobile user joins the virtual shared space through a wireless network, so the signal stability of the wireless network affects the interaction between mobile players and desktop players. The advantage of a wireless network is that it allows the mobile device to keep computing while its carrier moves within the coverage of an access point (AP). The research in [6] pointed out that the fragile wireless signal induces connection stability and handoff problems. The connection stability issue further includes disconnection and bandwidth variability problems. Signal instability, affected by the surrounding environment and the moving speed of the carrier, leads to the disconnection problem, while bandwidth variability is often related to the signal strength between the AP and the mobile device. The handoff problem occurs when the mobile player moves away from the coverage of one AP and enters the coverage range of another. The choice of wireless network includes GPRS, 3G, WiMAX, WiFi, etc., and each of them has different factors that cause the above mobile networking issues. Hence, this study does not focus on any specific wireless network but explores the logical linking problem only. That is, this study targets how to keep the logical link between the mobile player and the server alive while the signal strength remains within an acceptable threshold.
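Section 3.1 below adopts a DIS-style heartbeat together with dead reckoning to handle brief disconnections. The following is a minimal, server-side sketch of such a liveness check; the timeout values, class names and the simple constant-velocity extrapolation are assumptions for illustration, not the system's actual implementation.

```python
# Hypothetical server-side liveness check for a mobile player: dead-reckon the
# avatar between heartbeats, drop the player after a long silence.
import time

HEARTBEAT_PERIOD = 1.0   # seconds between heartbeats (assumed)
DROP_THRESHOLD = 10.0    # seconds of silence before kicking the player (assumed)

class MobilePlayerState:
    def __init__(self, position, velocity):
        self.position = list(position)   # last reported position (x, y, z)
        self.velocity = list(velocity)   # last reported velocity
        self.last_heartbeat = time.time()
        self.connected = True

    def on_heartbeat(self, position, velocity):
        """Called whenever a heartbeat/status message arrives."""
        self.position, self.velocity = list(position), list(velocity)
        self.last_heartbeat = time.time()
        self.connected = True

    def estimated_position(self, now=None):
        """Dead reckoning between heartbeats: p' = p + v * dt."""
        dt = (now if now is not None else time.time()) - self.last_heartbeat
        return [p + v * dt for p, v in zip(self.position, self.velocity)]

    def check_liveness(self, now=None):
        """Mark the player disconnected once silence exceeds the threshold."""
        silence = (now if now is not None else time.time()) - self.last_heartbeat
        if silence > DROP_THRESHOLD:
            self.connected = False
        return self.connected

# Usage: the server polls check_liveness() periodically and forwards
# estimated_position() to other players while heartbeats are missing.
player = MobilePlayerState(position=(0.0, 0.0, 0.0), velocity=(1.0, 0.0, 0.0))
time.sleep(0.1)
print(player.estimated_position(), player.check_liveness())
```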
2.2 Resource Shortage (Box Labeled B) This issue concerns the performance problems caused by the limitations of the mobile device. Mobile devices are designed under constraints on display capability, power consumption and CPU, and these constraints significantly affect computing performance. To solve this problem, the computing load of the mobile device has to be reduced and shared by others. Further, since the wireless network is the only way for the mobile device to communicate with the server, bandwidth variation cannot guarantee that interactive messages are transmitted between the mobile device and the server in time. This situation, in turn, influences the interaction among players in the same virtual world. These limitations of the mobile device consequently affect the realism of interaction when the mobile device communicates interactively with other computing devices. Hence, the resource shortage issue can be further divided into two sub-problems: computing load sharing and data transmission control. Because, apart from exchanging messages with other players, the major work of the mobile device is to display 3D information, rendering becomes its main load. Therefore, to address load sharing, graphics-related computations have to be examined, such as the visibility of objects, the realism of object appearance and the realism of object animation. The visibility problem aims to limit the visibility of distant objects, whereas the realism of appearance and animation focuses on reducing the complexity of the appearance and animation, respectively, of distant visible objects. 2.3 Coordinate Discrepancy (Box Labeled C) Since the mobile player uses a position sensing device (such as GPS) to navigate the virtual world and to animate his remote avatar, this location information has to be converted into position data recognizable by the virtual world. However, since the data from GPS is expressed in the Geographical coordinate system whereas the virtual world uses a Cartesian coordinate system, this conversion may cause spatial and temporal inconsistency between the mobile player and the desktop player. That is, as long as the mobile player is moving in the physical world, the difference between these two coordinate systems may corrupt the causal order relationships of events within the shared virtual world. Hence, coordinating positional data and events between these two coordinate systems significantly affects the realism of interaction. Further, the moving speed of the mobile player can be misinterpreted by the server, which in turn causes temporal relation errors as well as errors in the interaction among players.
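To make the gap between the two coordinate systems concrete, the following is a minimal sketch of the conversion pipeline that Section 3.3 later describes: geodetic (latitude, longitude, height) coordinates from GPS are converted to Earth-Centered, Earth-Fixed (ECEF) coordinates, which can then be mapped into the virtual world's Cartesian frame. The WGS-84 constants are standard; the sample coordinates and the final offset into the virtual world's local frame are application-specific assumptions and are only hinted at here.

```python
# Geodetic (WGS-84) to ECEF conversion, the first stage of the pipeline.
import math

WGS84_A = 6378137.0                    # semi-major axis (m)
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, h_m):
    """Convert geodetic latitude/longitude/height to ECEF X, Y, Z in meters."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + h_m) * math.sin(lat)
    return x, y, z

# Example: two hypothetical campus points; the virtual world's local frame is
# crudely approximated by subtracting a reference ECEF point (no ENU rotation).
ref = geodetic_to_ecef(25.0145, 121.4672, 30.0)
pos = geodetic_to_ecef(25.0150, 121.4675, 30.0)
local = tuple(p - r for p, r in zip(pos, ref))
print(local)
```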
3 Proposed Solutions The solutions for the M3R interaction issues of the previous section draw on the research domains of networking, graphics and coordinate systems. Each issue and sub-problem has already been discussed, and solutions have been proposed, within its own research
domain. Each solution has its pros and cons. This study examines the various solutions from their respective domains and recommends the most suitable one from the viewpoint of overall performance. The lower part of Figure 1 depicts the proposed solutions to the corresponding problems. 3.1 Solutions for Mobile Networking (Box Labeled A) Since the bandwidth-variation problem depends heavily on the underlying wireless network deployed, such as GPRS, 3G, WiMAX or WiFi, this study does not focus on it. For the disconnection problem, a simple yet effective solution is to periodically detect the connection status of the mobile player. If the duration of a disconnection is within a given threshold, the server can simply adjust the message update rate of the mobile player to avoid wasting bandwidth. The most well-known mechanism for this solution is the heartbeat function from DIS [7]. Under the heartbeat mechanism, the mobile player periodically sends a connection message to the server. If this heartbeat sequence is interrupted, the server becomes aware of the instability of the wireless link. If the disconnection lasts longer than a given threshold, the server kicks out the mobile player, who must re-login later. Otherwise, the server can compensate for the missing data with the dead-reckoning algorithm [8] defined in the DIS protocol; that is, the server uses the past history of the mobile player to predict his movement and forwards this prediction to the other players. Notice that the proper threshold is related to the frequency of re-logins: with an unstable wireless signal, frequent re-logins can easily flood the wireless bandwidth with duplicated messages from the mobile host. The last mobile networking problem is handoff. Solutions to this problem range from the physical layer to the application layer. For the M3R environment, the research in [9] points out that the soft handoff model is a reasonable approach; however, the actual solution depends on the wireless network technique in use and is therefore not fully discussed in this paper. 3.2 Solutions for Resource Shortage (Box Labeled B) The mobile device always has inferior computing resources compared with the desktop device. These can be categorized into two types: the computation power of the mobile device itself and the communication resources to the outside world. The resource-shortage service attempts to solve these problems based on the characteristics of a networked virtual environment. The inferior computation power can be addressed by shifting some of the computing load of the mobile device to the server site, while the limited communication resources can be handled by controlling the data flow between the server and the mobile device so as to maintain their interaction. The rationale of computation sharing can be derived from the limitations of human visual perception. For example, when a human being is moving, distant objects become less perceptible. Hence, the depth perception [10] technique can be adopted to reduce the number of rendered objects and thus solve the
problem of the visibility of objects. According to the depth-perception technique, if the server site uses the status of a mobile player to compute the remote objects that are within his field of view (FOV) [11] and forwards this information to the player, the computing load of that mobile device is significantly reduced. Once a distant object is within the perception distance of a player, further load sharing can be achieved by reducing the realism of its appearance and animation. For the realism of appearance, the level-of-detail (LOD) [12] technique is a conventional approach that renders an object at a resolution depending on its distance from the viewer. If an object is far away from a viewer, less polygonal information is required for the viewer to identify it, which in turn implies less computation. However, deciding the model resolution of each object for each viewer becomes computationally intensive as the number of objects within the virtual environment increases. Hence, if the server site also forwards the resolution information of a remote object when passing its status information to the mobile device, the computing load of the mobile device can be saved. In other words, the server site pre-computes the resolution of a remote object before forwarding its status information to the mobile player, and the mobile device simply renders objects according to the received resolution information. To further adapt to bandwidth variance during the interaction, the data-priority approach [13] is adopted to sort the messages transmitted to the mobile device. According to [13], messages flowing within the virtual environment are prioritized according to their importance to the interaction. Low-priority event messages can be preempted by high-priority event messages, and if a low-priority event message is preempted for longer than a predefined period, it is regarded as stale and discarded. With this approach, the data-transmission control issue is resolved by requesting the server to send messages to the mobile player based on their priority; meanwhile, the mobile player reduces the frequency of message transmission when the bandwidth drops. 3.3 Solutions for Coordinate Discrepancy (Box Labeled C) The differences between the coordinate systems used by the virtual world and the physical world can cause both spatial and temporal inconsistency between the mobile player and the desktop player. To solve the spatial inconsistency, the coordinate correlation between the physical world and the virtual world needs to be computed beforehand. However, there is no direct translation from a point in the geographical coordinate system to its corresponding coordinate in the Cartesian coordinate system. For this purpose, a third coordinate system, called Earth-Centered, Earth-Fixed (ECEF) [14], is used as a mediator: GPS data are first translated from the geographical coordinate system into the ECEF coordinate system, and then into the Cartesian coordinate system. This translation pipeline consists of a sequence of complex matrix manipulations that does not suit the computation requirements of an NVE; especially when the client is a mobile device, the computational complexity may not be acceptable.
Fig. 2. The geographic markers (left) and the virtual markers (right)
To solve this coordinate-reconciliation problem, a simple yet effective translation is required. First, a Cartesian coordinate system is manually assigned to the virtual world when it is designed, and each object within the virtual world is placed according to this coordinate system. Second, fixed positions within the virtual world are carefully chosen as virtual markers. In the physical world, the geographic markers corresponding to those virtual markers are then determined, as illustrated in Figure 2; in other words, each geographic marker is a physical location that is mapped to a virtual marker. The geometric relationship between each geographic marker and its virtual marker is then calculated, and this relationship becomes the equation used to transform a geographic position in the physical world into a coordinate in the virtual world and vice versa. In addition, the proportional scale between the virtual world and the physical world can cause a temporal inconsistency between the mobile player and the desktop player. For example, when a mobile player is walking across the road, his motion may be misinterpreted by a desktop computer as flying because of the scaling difference in the input data. "A shared sense of presence" is one of the five common features of a networked virtual environment [1]. This feature is achieved by allowing each player to control the motion of an avatar inside the shared space. For a desktop player, the mouse and keyboard are the two legacy input devices for navigation and interaction within the virtual environment. The mobile player, on the other hand, navigates the virtual world with GPS data, so the GPS receiver becomes the input device that controls the motion of his remote avatar. That is, when the mobile player walks in the physical world, his location is tracked by the location sensor and transmitted to the other players through the server; from the viewpoint of a desktop player, the mobile player is navigating the virtual world by his own motion. Hence, the geographical coordinate from the GPS receiver and the orientation from the electronic compass need to be correctly mapped to a position and direction inside the virtual world. This mapping is achieved by first computing the geometric ratios between the given position and the geographic markers that were set when the virtual world was designed; the Cartesian coordinate of the position is then derived from these ratios with respect to the virtual markers. This approach allows fast translation between geographical and Cartesian coordinates with an acceptable spatial inconsistency. This technique also
tolerates different moving speeds while the mobile player is walking in the real world: only when the walking speed falls below a predefined value is the accurate Cartesian coordinate actually computed.
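A minimal sketch of the marker-based translation described above, assuming only two geographic markers that span the area of interest, virtual axes aligned with latitude/longitude, and a small physical area; the marker coordinates and virtual positions below are hypothetical, not the actual NTPU campus survey data.

```python
def geo_to_virtual(lat, lon, geo_a, geo_b, virt_a, virt_b):
    """Map a GPS fix to virtual-world coordinates by linear interpolation
    between two reference markers (sketch; assumes the virtual axes are
    aligned with latitude/longitude and the covered area is small)."""
    (lat_a, lon_a), (lat_b, lon_b) = geo_a, geo_b
    (xa, za), (xb, zb) = virt_a, virt_b
    u = (lat - lat_a) / (lat_b - lat_a)   # fraction along the latitude span
    v = (lon - lon_a) / (lon_b - lon_a)   # fraction along the longitude span
    return xa + u * (xb - xa), za + v * (zb - za)

# Hypothetical markers: two opposite corners of a campus area and the
# virtual-world positions chosen for them at design time.
geo_sw, geo_ne = (24.9410, 121.3670), (24.9450, 121.3720)
virt_sw, virt_ne = (0.0, 0.0), (400.0, 500.0)
print(geo_to_virtual(24.9430, 121.3695, geo_sw, geo_ne, virt_sw, virt_ne))  # (200.0, 250.0)
```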
4 Implementation Besides the proposed solutions to the interaction issues, the implementation approach also affects the performance of the M3R environment. The most straightforward implementation is to let the mobile device communicate with the multi-user server directly and to implement the proposed solutions on the multi-user server. However, this type of implementation would overload the server and hence decrease the performance of the M3R environment. To avoid this problem, as illustrated in Figure 3, an intermediate server called the Mobile Support Server (MSS) was designed to realize the proposed solutions. The MSS plays the role of a data mediator between the multi-user server and the mobile player: to the multi-user server, the MSS is a desktop player that controls multiple avatars inside the virtual world, while the mobile player treats the MSS as a special-purpose server that shares some of its computation load.
Fig. 3. The architecture of M3R environment
The Mobile Support Server (MSS) is implemented on the Windows XP platform. To further verify the validity of the proposed solutions, the MSS was integrated into an existing networked virtual environment called the 3D virtual campus [15], a networked virtual environment of National Taipei University (NTPU), Taiwan. With the MSS added, both mobile players and desktop players can join this 3D campus environment. The position of the mobile player is obtained from the GPS receiver; the mobile device, i.e., a wearable computer or a notebook, translates and transmits the received GPS data to the server, which then forwards the data to the other players so that they can render his avatar remotely. Hence, the 3D campus project allows more vivid interactive experiences when the user navigates these overlapped virtual and physical worlds. The virtual-world view changes with the movement of the mobile player; Figure 4 shows a snapshot of the mobile device.
Fig. 4. The snapshot of the mobile device
Skype [16] capability is further embedded in the 3D virtual campus to enable live chatting among players. When a user wants to chat with another player on the user list, he can simply click that player's name, and the system launches the Skype software to connect to that specific player. For example, if the user "sennin32" wants to voice chat with, say, "annheilong", he clicks the receiver's name, "annheilong", on the right sub-window or on top of the avatar. As shown in Figure 5, a calling notification then pops up on the receiver's browser, and the receiver can accept or deny the call by clicking the buttons on the pop-up window.
Fig. 5. Live chatting through Skype software
5 Conclusion and Future Work This paper studies the techniques needed to integrate mobile computing into a networked virtual environment. The integrated environment is referred to as the Multiplayer Mobile Mixed Reality (M3R) environment, and it enables a user wearing a mobile device to interact with conventional desktop players in a shared virtual space. The paper discusses the technical issues in designing such a networked mixed reality environment. The first issue is keeping a steady connection between the mobile device and the server over an unstable wireless link. The second issue is maintaining interaction performance under the constraint of the limited computing resources of the mobile
device. The last issue is solving the data inconsistency caused by the difference between the two coordinate systems, the geographical coordinate system and the Cartesian coordinate system. Solutions to these research issues are presented along with the proposed implementation approach. To fully support the mobile player, a Mobile Support Server (MSS) is added to the traditional NVE as a data mediator between the mobile player and the desktop player. In essence, the MSS is the mechanism that takes over part of the computation of the mobile device while the mobile player interacts with the mixed reality environment. Finally, the architecture of an NVE with an MSS is also presented. Although the implementation of the 3D virtual campus project successfully verifies the effectiveness of the MSS in supporting an M3R environment, more research issues remain for further study. For example, experience so far shows that notebooks are unsuitable for mobile users to operate while moving; the ultimate goal for the mobile device is an optical see-through mobile augmented reality system running on a wearable computer. In addition, the orientation of the mobile player, and hence the corresponding display of the virtual world, is another important issue that requires further investigation. In the M3R system, a digital compass is used to detect the rotation of the mobile player, but the digital compass has well-known inaccuracy and instability problems. Consequently, auxiliary orientation sensors require further exploration; research on using optical-flow technology to detect the mobile player's rotation is currently under way. Finally, the performance of the MSS is another important topic that deserves further study.
References
1. Singhal, S.K., Zyda, M.J.: Networked Virtual Environments: Design and Implementation. Addison Wesley, Reading (1999)
2. Milgram, P., Kishino, F.: A Taxonomy of Mixed Reality Visual Displays. IEICE Transactions on Information Systems E77-D(12), 1321–1329 (1994)
3. Rosenblum, L.: Virtual and Augmented Reality 2020. IEEE Computer Graphics and Applications 20(1), 38–39 (2000)
4. Feiner, S., MacIntyre, B., Höllerer, T., Webster, T.: A touring machine: Prototyping 3D mobile augmented reality systems for exploring the urban environment. In: Proc. ISWC 1997 (First IEEE Int. Symp. on Wearable Computers), Cambridge, MA, October 13-14, pp. 208–217 (1997); also in Personal Technologies
5. Kortuem, G., Bauer, M., Heiber, T., Segall, Z.: NETMAN: The Design of a Collaborative Wearable Computer System. Mobile Networks and Applications 4(1), 49–58 (1999)
6. Forman, G.H., Zahorjan, J.: The challenges of mobile computing. IEEE Computer 27(4), 38–47 (1994)
7. Distributed Interactive Simulation, IEEE Standard 1278.1a, http://ieeexplore.ieee.org/ISOL/standardstoc.jsp?punumber=5896
8. Singhal, S.K.: Effective remote modeling in large-scale distributed simulation and visualization environments. PhD thesis, Stanford University, Stanford, CA (1996)
9. Huang, J.Y., Tsai, C.H.: A Wearable Computing Environment for the Security of a Large-Scale Factory. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4551, pp. 1113–1122. Springer, Heidelberg (2007)
10. Wann, J.P., Rushton, S.K., Mon-Williams, M.: Natural problems for stereoscopic depth perception in virtual environments. Vision Research 19, 2731–2736 (1995)
11. Kolb, H., Fernandez, E., Nelson, R.: The Organization of the Retina and Visual System, http://webvision.med.utah.edu/index.html
12. Clark, J.: Hierarchical geometric models for visible surface algorithms. Communications of the ACM 19(10), 547–554 (1976)
13. Witmer, B.G., Singer, M.J.: Measuring presence in virtual environments: A presence questionnaire. Presence: Teleoperators and Virtual Environments 7, 225–240 (1998)
14. ECEF datum transformation, http://www.satsleuth.com/GPS_ECEF_Datum_transformation.htm
15. Huang, J.Y., Tung, M.C., Keh, H.C., Wu, J.J., Lee, K.H., Tsai, C.H.: A 3D Campus on the Internet – A Networked Mixed Reality Environment. The Transactions on Edutainment (to appear, 2009)
16. Prasolova-Førland, E., Sourin, A., Sourina, O.: Cybercampuses: Design Issues and Future Directions. The Visual Computer 22(12), 1015–1028 (2006)
The Impact of Different Visual Feedback Presentation Methods in a Wearable Computing Scenario
Hendrik Iben1, Hendrik Witt1, and Ernesto Morales Kluge2
1 Center for Computing Technologies, University of Bremen, Germany {hiben,hwitt}@tzi.de
2 BIBA - Bremen Institute for Industrial Technology and Applied Work Science, Bremen, Germany [email protected]
Abstract. Interfaces for wearable computing applications have to be tailored to task and usability demands. Critical information has to be presented in a way that allows fast absorption by the user while not distracting from the primary task. In this work we evaluated the impact of different information presentation methods on the performance of users in a wearable computing scenario. The presented information was critical to fulfilling the given task and was displayed on two different types of head-mounted displays (HMD). The representations were further divided into two groups: the first consisted of qualitative representations, while the second focused on quantitative information. Only a weak significance could be determined for the effect the different methods have on performance, but there is evidence that familiarity has an effect. A significant effect was found for the type of HMD.
1 Introduction
We chose a simple task, applicable to wearable computing scenarios, where the information can be presented in different ways. This task serves as an abstraction of a real task, in a similar way to how Witt and Drugge [2006] simulated a primary task. Participants were asked to calibrate a rectangular table by adjusting the height of the four table legs using an open-jaw wrench. The status of the calibration was represented by the angles formed between the floor and two orthogonal axes on the table's surface. The deviation from the calibrated state on each axis was presented to the participants via an HMD while they performed the adjustments. This task was chosen as an example of a typical maintenance scenario. The goal of the study was to determine how the method of information representation and the use of either a monocular or binocular HMD affect performance in the calibration task.
1.1 Experiment Setup
The apparatus used in the study consists of a wearable computer (OQO), a MicroOptical SV-6 non-transparent HMD, a Zeiss ProVi 2D non-transparent HMD, and a special textile vest designed and tailored to unobtrusively carry all equipment as well as the cabling needed for the HMD (Figure 2). The task given to the participants consists of aligning a rectangular table with the floor. To accomplish this task, the height of the table legs has to be changed by turning an adjustment screw with the open-jaw wrench.
Fig. 1. Experiment setup
Fig. 2. Vest with wearable computing equipment
Ideally, the task can be solved in three steps by adjusting the height of three table legs to match the height of the remaining one. To measure the alignment of the table, an Xsens MT9 inertial sensor is used to acquire the needed pitch and roll values. For post-hoc analysis of subject motion and to determine problem-solving strategies depending on the feedback method presented, a button is mounted on each leg of the table; subjects first have to press the button on a leg to indicate that they are about to adjust the height-adjustment screw of that particular leg. Button-press events are logged in the central log file of each user. Figure 1 provides a schematic overview of the setup.
1.2 Representation Methods
Three different groups of representation methods were tested, where each group contains a qualitative and a quantitative method. The groups are shape-based, bargraph-based and textual representations.

Shapes. Qualitative shape feedback (Figure 3) is based on the form or size of a shape that indicates whether a certain threshold value has been reached. The calibration is adjusted correctly when a (green) circle is displayed. Increasing deviation from the calibrated state is indicated by the representation morphing into a (red) triangle; the morphing is performed by removing vertices from the polygon approximating the circle until only three vertices remain. In the subject's display this output modality is shown for each angle. The quantitative shape-based feedback (Figure 4) consists of two needles that are correlated with the measured angles in the physical setup. The area defined by the space between the two needles indicates the state of the measured object, and the setup is adjusted correctly when a rectangular area can be seen. In addition to the graphical representation, the measured values are also displayed as text.

Bargraph-Area. The qualitative bargraph (Figure 5a) has two axes which represent the two measured angles without any quantitative data, showing only the modulus of the measured value.
Fig. 3. Qualitative shape based feedback
Fig. 4. Quantitative shape based feedback
(a) qualitative
(b) quantitative
Fig. 5. Bargraph-area representations
The setup is adjusted correctly when the graphs disappear into the origin. The quantitative bargraph (Figure 5b) covers a scale from negative to positive values for both axes; furthermore, the values are displayed at the axes, so that an explicit value can be read.

Textual Feedback. The qualitative textual feedback is a message indicating whether the angles have to be increased or decreased: the screen displays 'increase' if the value of the angle is lower than the desired value and 'decrease' if the value is higher. The quantitative textual feedback is realized by displaying explicit values on the display; for a measured value of, e.g., -4 the system will display '-4'. When the system displays '0', the table has been calibrated on that axis.
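To make the qualitative/quantitative distinction concrete, the sketch below maps one measured deviation angle to the two textual variants described above; the tolerance value and the 'ok' state are assumptions added for illustration and are not part of the original design.

```python
def textual_feedback(angle_deg, quantitative=False, tolerance=0.5):
    """Return the feedback string for one axis (sketch; tolerance hypothetical)."""
    if quantitative:
        # Quantitative variant: show the explicit value, '0' once calibrated.
        return "0" if abs(angle_deg) <= tolerance else f"{angle_deg:+.0f}"
    # Qualitative variant: only the direction of the required correction.
    if abs(angle_deg) <= tolerance:
        return "ok"
    return "increase" if angle_deg < 0 else "decrease"

for a in (-4.0, 0.2, 3.0):
    print(a, textual_feedback(a), textual_feedback(a, quantitative=True))
```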
2 Methods
A total of 20 subjects participated in the experiment. The study used a within-subject design with the feedback method as the single independent variable, meaning that all subjects tested every method, while the type
of HMD was evenly distributed among the subjects. To avoid learning effects, the subjects were divided into counterbalanced groups in which the order of the methods differed. A single test session consisted of one practice round, in which the subject got to understand each feedback method, followed by one experimental round during which data was collected for analysis. The time to complete the primary task naturally varies depending on how quick the subject is, so task completion times were normalized before comparison. At the end of the experiment, subjects were given questionnaires to record qualitative data for later evaluation, e.g., to obtain user-acceptance measures. User acceptance was measured by asking the participants to rank all six methods according to their preference, resulting in a rank from one to six for each method, where one denotes the best method.
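The paper does not state which normalization was applied to the task completion times; one common choice is to divide each subject's times by that subject's own mean, as in this hypothetical sketch.

```python
def normalize_per_subject(times_by_subject):
    """times_by_subject: {subject_id: {method: tct_seconds}}.
    Returns the same structure with each value divided by the subject's mean,
    so that slow and fast subjects become comparable (one possible choice)."""
    normalized = {}
    for subject, times in times_by_subject.items():
        mean_tct = sum(times.values()) / len(times)
        normalized[subject] = {m: t / mean_tct for m, t in times.items()}
    return normalized

print(normalize_per_subject({"s01": {"bar_quant": 110.0, "text_qual": 220.0}}))
```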
3 Results
The average task completion times (see Table 1) for each presentation method were computed for the two types of HMD used and for the average of both groups.
Table 1. Task completion times (TCT, in seconds)

Device      Shape qual.      Shape quant.     Bargraph qual.    Bargraph quant.
Monocular   131.21±71.20     132.08±67.37     135.66±44.82      129.57±103.64
Binocular   105.97±48.04     134.79±84.07     175.34±113.47     119.78±90.98
Average     118.59±60.51     133.43±74.16     155.50±86.40      124.67±95.05

Device      Text qual.       Text quant.
Monocular   274.77±226.66    180.92±175.36
Binocular   162.98±85.49     131.63±44.85
Average     218.88±176.31    156.27±127.11
The results show large differences in task completion time across subjects, regardless of the information presentation method (see Figures 6a and 6b). There is also no clear difference between the percentage of time needed for each presentation method when using a monocular or binocular HMD (see Figure 6c). To verify whether the method or the type of HMD had a significant effect on the task completion times, a two-way ANOVA was performed. For the type of HMD, a p-value of 0.203412 was calculated, suggesting no significant effect on the times. A p-value of 0.056378 was calculated for the presentation method, suggesting only a weak significance. The combined effect of HMD and presentation type resulted in a p-value of 0.361591.
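An analysis of this kind can be reproduced with standard statistical tooling. The sketch below uses statsmodels; the data frame, its column names and the numbers in it are hypothetical stand-ins, not the study's raw data.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical long-format data: one row per trial (HMD type x method).
df = pd.DataFrame({
    "tct": [131, 120, 106, 98, 275, 240, 163, 150, 130, 125, 120, 115],
    "hmd": ["mono", "mono", "bino", "bino"] * 3,
    "method": ["shape_qual"] * 4 + ["text_qual"] * 4 + ["bar_quant"] * 4,
})

# Two factors (HMD type, presentation method) plus their interaction.
model = smf.ols("tct ~ C(hmd) * C(method)", data=df).fit()
print(anova_lm(model, typ=2))
```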
(a). task completion times, monocular
(b). task completion times, binocular
(c). comparison of task completion time
(d). subject preferences (lower is better)
(e). needed attempts
Fig. 6. Evaluations
Despite the statistical weakness of the measurements, the evaluation of the questionnaires draws an interesting picture (see Figure 6d). When asked for their preference, the subjects ranked the non-shape quantitative methods far better and the qualitative text representation as the worst. Participants ranked the presentation methods in this order (best to worst, averaged over both groups): bargraph quantitative, text quantitative, bargraph qualitative, shape quantitative, shape qualitative, text qualitative. In addition to the task completion time, the number of attempts needed to complete the task was also recorded (see Figure 6e). An analysis of the average number of attempts shows a difference between the groups using monocular and binocular HMDs. A two-way ANOVA was performed on the average number of
attempts, resulting in a p-value of 0.0098, which suggests a significant effect. Users of binocular HMDs needed fewer attempts in all modalities; the modality with the fewest average attempts (3.8) was the qualitative shape-based feedback method, while the most attempts (5.3) were needed with the qualitative text-based method. For monocular HMDs, the modality with the fewest attempts (5.7) was the quantitative bargraph method, and the most attempts (10.4) were needed with the qualitative text-based method. On average, the fewest attempts (5.2) were needed with the qualitative shape-based method, while the most (7.85) were needed with the qualitative text-based method. The quantitative bargraph method was ranked as the best method on average by the participants, and its task completion time is also the lowest on average; the number of attempts it needed, while not the lowest, is also very small compared with the other methods.
4 Discussion and Future Work
From a statistical point of view, no significant effect could be found for the type of presentation method, although the analysis suggests that the presentation has some relevance to the task completion time. The evaluation of the questionnaires asking for a ranking of the methods shows a clear preference for quantitative methods, while the text-based qualitative method was ranked worst by far. The participants' best-ranked method was the quantitative bargraph representation. Although the quantitative shape-based representation is very similar to this method in terms of visual feedback, it was not ranked very highly. A possible explanation is that bargraphs are a very common and well-understood concept, whereas the quantitative shape-based method uses geometric properties to provide visual feedback; the method may have been rejected by the participants because it is an unfamiliar concept. The use of binocular HMDs shows a significant decrease in the number of attempts needed to complete a task. It is still unknown what aspect causes this advantage. One possible explanation could be the expected effect of binocular rivalry associated with the use of monocular displays; a simpler explanation could be that the binocular HMD used was easier to wear for the majority of the participants, leading to better performance. However, neither argument explains the significant increase in needed attempts while no significant increase in completion time was observed. Further studies should concentrate more on different aspects of information display in wearable computing scenarios and on the differences between monocular and binocular HMDs with respect to the attention paid to the information. It is reasonable to assume that some presentation techniques are more efficient for certain types of information than others, and an in-depth evaluation of presentation methods is necessary to find suitable methods for common types of information. It is unclear whether the negative effect of binocular rivalry when using monocular HMDs can be compensated for by presentation techniques. More studies have to be performed to find reliable presentation techniques for monocular
and binocular HMDs in wearable computing scenarios. When designing information representation methods, the differences between the two types of displays and the conceptual approach to encoding the information have to be considered carefully.
Acknowledgements. This study was conducted as part of the WearIT@Work project, which is funded in part by the Commission of the European Union under contract 004216 WearIT@Work.
Reference
[2006] Witt, H., Drugge, M.: Hotwire: An apparatus for simulating primary tasks in wearable computing. In: Proc. CHI 2006: Extended Abstracts on Human Factors in Computing Systems (April 2006)
Gold Coating of a Plastic Optical Fiber Based on PMMA
Seok Min Kim, Sung Hun Kim, Eun Ju Park, Dong Lyun Cho, and Moo Sung Lee
Faculty of Applied Chemical Engineering and Center for Functional Nano Fine Chemicals, Chonnam National University, Gwangju 500-757, South Korea
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. We investigated the adhesion between gold thin films and poly(methyl methacrylate) (PMMA) and poly(vinylidene fluoride-co-hexafluoropropylene) (P(VDF-co-HFP)) substrates with the aim of imparting electrical conductivity to plastic optical fibers (POFs). The two polymers are used as the core and the cladding of the POF, respectively. A gold thin film of 50 nm thickness was deposited by ion sputtering onto the polymers and also onto the POF. Several approaches that are well known to be effective in enhancing the adhesive strength between gold and polymers were applied in this study: introduction of polar functionality on the substrate surface by plasma treatment, insertion of a buffer layer, and physical surface roughening. The variation of wettability and adhesion with the plasma conditions was investigated through water contact angle measurements and the cross-hatch cut test. Even though the contact angles of the substrates decreased after Ar or O2 plasma treatment, irrespective of the polymer type, the adhesion of the gold layer to the polymers was very poor. A Ti buffer layer of 5 nm thickness, deposited between the PMMA substrate and the gold layer, did not improve the adhesion either. However, P(VDF-co-HFP) substrates with a rough surface of 13.44 nm RMS showed class 3B adhesion to gold in the cross-hatch tape test. The gold-coated POF showed an electrical conductivity of 1.35×10³ S cm⁻¹ without significant optical loss. The result may be used for developing a medical device capable of simultaneously applying electrical and optical stimuli. Keywords: plastic optical fiber, POF, sidelight, overcoating.
1 Introduction Light therapy consists of exposure to daylight or to specific wavelengths of light using lasers, LEDs, dichroic lamps or very bright, full-spectrum light, for a prescribed amount of time and, in some cases, at a specific time of day. Laser light waves penetrate the skin with no heating effect, no damage to the skin and no side effects. Laser light directs biostimulative light energy to the body's cells, which convert it into chemical energy to promote natural healing and pain relief. It has proven effective in treating acne vulgaris, seasonal affective disorder and neonatal jaundice, and is part of the standard treatment regimen for delayed sleep phase syndrome [1].
Acupuncture is a technique of inserting and manipulating fine filiform needles into specific points on the body to relieve pain or for therapeutic purposes. Most acupuncture needles are made from stainless steel, but they can also be made of silver or gold. Among the several types, plastic needles made from plastic optical fibers are very interesting because they collect light from a light source and radiate light or heat into the affected body part, producing improved remedial effects [2]. More recently, an optical/electrical acupuncture needle has been introduced for the purpose of simultaneously applying pulsed electrical energy and colored light into body tissue [3]. The needle body is formed of a central optical transmission core, preferably with an intermediate opaque non-conductive cladding layer atop the core and an outer conductive layer positioned atop the cladding layer. Plastic or polymer optical fibers (POFs) are fiber-type optical waveguides and can transmit light over up to several hundred meters. The light transmitted through a POF includes digital light pulses carrying digital information, sunlight, therapeutic laser light, and so on. Since the optical signal propagates along the fiber by total internal reflection, a POF has a core-sheath (or cladding) structure, and the refractive index of the cladding must be lower than that of the core, i.e., n_cladding < n_core. Usually, poly(methyl methacrylate) (PMMA, n = 1.49) and fluoropolymers with n ~ 1.35 to 1.43 are used as the core and cladding polymers, respectively [4, 5]. Since POF is flexible and tough, it can be used in effective medical devices. For example, flexible light diffusers made from POF can be adjusted during treatment, especially on complex body surfaces, unlike many current inflexible light diffusers [6]. In this study, as preliminary work on the realization of such an optical/electrical acupuncture needle, we attempt to impart electrical conductivity to a POF by coating a thin gold layer on its surface. Gold was selected for its high conductivity of 4.52×10³ S cm⁻¹ and from an aesthetic point of view. Since the cohesive energy of metals is typically two orders of magnitude higher than that of polymers, the interaction between the two is very weak. The durability and performance of the needle are closely related to the adhesion between the POF and the gold layer, so strong adhesion between gold and POF is a prerequisite for the development of a gold-coated POF needle. Several approaches that are well known to be effective in enhancing the adhesive strength between gold and polymers are applied in this study: buffer layer insertion, plasma treatment and physical surface roughening.
2 Experimental 2.1 Materials Two different polymers, poly(methyl methacrylate) (PMMA) and poly(vinylidene fluoride-co-hexafluoropropylene) (P(VDF-co-HFP)), were purchased from Aldrich Chemical Co. and vacuum dried at 50°C for 24 h before use. They are the same types of polymer as those used for the core and the cladding of the POF, respectively. The POF used in this study was a step-index type with a diameter of 0.5 mm, obtained from Nuvitech Co., Ltd., Korea.
2.2 Sample Preparation The polymer substrates, with dimensions of 20×20×1 mm, were prepared by compression molding using a Carver press. In order to reduce the surface roughness introduced during preparation, they were annealed above the melting and glass transition temperatures of the polymers for 20 min: at 170°C for PMMA and at 120°C for P(VDF-co-HFP), respectively. The surfaces of the substrates were then modified by plasma treatment in a home-made plasma reactor. Three different working gases, Ar, O2, and a mixture of Ar/O2 (1:2 by volume), were used in this study. Except for the processing time, the other processing parameters were kept constant at 60 W plasma power and 30 sccm gas flow rate. A gold thin film of 50 nm thickness was deposited using a DC/RF magnetron sputtering system onto the substrates, whether plasma-treated or not, and also onto the POF. The system was equipped with two different targets, titanium and gold, and sputtering was performed at a DC power of 300 W and a working pressure of 10×10⁻³ mTorr for 30 seconds; the thickness of the gold layer was thereby controlled to be 50 nm. 2.3 Characterization The effects of the plasma treatment conditions (working gas and treatment time) on the surface of the polymer substrates were investigated using contact angle measurements and X-ray photoelectron spectroscopy (XPS), which is widely accepted as one of the most powerful techniques for polymer surface chemical analysis. The adhesion of the gold layer to the polymer substrates was determined by a cross-hatch cut tape test (ASTM D3359-02). This test method was originally designed to assess the adhesion of coating films up to 125 μm thick by applying and removing pressure-sensitive tape. A cross-hatch cutter with multiple preset blades was used to make sure that the incisions were properly spaced and parallel. After the tape had been applied and pulled off, the cut area was inspected and rated according to the percentage of the squares remaining on the test specimen. In this study, a 1 mm (11-tooth) cutter and 12.3125 N/25 mm adhesive tape were used. The electrical conductivity of the gold-coated POF was measured using the 4-probe method, with silver paste used for the electrical contacts.
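Given the 4-probe resistance, the conductivity of the coating follows from sigma = L / (R · A), where L is the spacing of the voltage probes and A the cross-section of the gold shell around the fiber. In the sketch below, only the fiber diameter and coating thickness come from the paper; the probe spacing and resistance are assumed values chosen for illustration.

```python
import math

def shell_conductivity(resistance_ohm, probe_spacing_cm, fiber_radius_cm, coating_cm):
    """sigma = L / (R * A) for a thin conductive shell around a fiber (sketch)."""
    outer = fiber_radius_cm + coating_cm
    area = math.pi * (outer**2 - fiber_radius_cm**2)  # shell cross-section, cm^2
    return probe_spacing_cm / (resistance_ohm * area)

# 0.5 mm diameter POF, 50 nm gold; resistance and spacing are hypothetical.
sigma = shell_conductivity(resistance_ohm=940.0, probe_spacing_cm=1.0,
                           fiber_radius_cm=0.025, coating_cm=50e-7)
print(f"{sigma:.3g} S/cm")  # on the order of 1e3 S/cm
```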
3 Results and Discussion Figure 1 shows the change in the atomic force microscopy (AFM) topography of P(VDF-co-HFP) substrates before and after melt annealing. This polymer is used as the cladding material for the POF. The root-mean-square (RMS) surface roughness of the substrate decreases from 13.44 nm to 2.96 nm after melt annealing as a result of mass transfer dominated by surface diffusion. Unless otherwise mentioned, the samples with the smoothened surface are used for the further study. XPS spectra of the plasma-treated polymer substrates are shown in Figure 2. For the PMMA substrate in Figure 2(a), no new peaks appear, but the relative intensity of the O1s peak to the C1s peak increases after plasma treatment as compared
with that of the untreated surface. On the other hand, for the P(VDF-co-HFP) substrate, a new peak corresponding to the O1s atom is clearly seen at a binding energy of 528 eV. The O1s content increases to 6.78% and 5.25% in Ar and O2 atmospheres, respectively. This indicates the introduction of oxygen atoms onto the hydrophobic surface of P(VDF-co-HFP), and from this result a change in its surface properties is expected. The effect of the type of working gas on the surface functionality of the substrates is not large.
Fig. 1. AFM images of P(VDF-co-HFP) substrates: (a) as-prepared; (b) annealed at 120°C for 20min
Fig. 2. XPS spectra of (a) PMMA and (b) P(VDF-co-HFP) substrates, plasma-treated at 60 W plasma power and 30 sccm gas flow rate. For visual convenience, the curves are shifted vertically from bottom to top.
Figure 3 shows the variation of the water contact angle (θ) with the time of plasma treatment. The measurement was performed 24 h after the plasma treatment, considering that the gold thin layers are coated on the polymer substrates after the same interval. Irrespective of the type of substrate, the angle decreases at first and then is not
changed after 3 min of treatment. The decrease in the contact angle, i.e., the improvement of the wettability of the polymer surfaces, is the result of the increase in surface polar groups introduced during plasma treatment. The Ar plasma shows a better effect in improving the wettability than the O2 plasma for the P(VDF-co-HFP) substrate, while for the PMMA substrate the difference is marginal. This corresponds well to the content of O1s atoms introduced by the plasma treatment, as shown in Figure 2.
Fig. 3. Variation of water contact angle with the time of plasma treatment using Ar, O2, or a mixture of Ar and O2 (1:2) as the working gas: (a) PMMA; (b) P(VDF-co-HFP) substrates
The adhesion between gold and the polymer substrates was assessed by the cross-hatch tape test. The surfaces of the substrates were treated with Ar plasma for 5 min before depositing the gold layer; these conditions were chosen because the contact angles did not decrease any further after this time. Figure 4 shows photographs of the substrates after the cross-hatch cut tape tests. The adhesion of the gold thin film to PMMA and to P(VDF-co-HFP) was very poor; both were rated classification 0B, since flaking occurred over the cross-cut area. The plasma-treatment conditions applied in this study did not contribute at all to enhancing the adhesion. Since the adhesive properties of substrates are affected by their chemical nature, topography and the cohesive strength of the surface regions [7], the lack of improvement indicates that the functionality introduced onto the polymer surfaces is not enough to change the cohesive energy of the polymer surfaces. Extra experiments to introduce different types of functional groups, such as thiol groups, are in progress, and the details will be given at the conference. Figure 5 shows the effect of the surface roughness of the P(VDF-co-HFP) substrate on the adhesion of the gold layer. The substrate that was not melt annealed and has an RMS surface roughness of 13.44 nm, as shown in Figure 1(a), was used for this test. The change in surface roughness gives a significant increase in adhesion: the adhesion level increases from classification 0B to 3B. Since the same plasma-treatment conditions were applied, the increase seems to be closely related to the increase in surface area. Although Ar or a mixture of Ar/O2 gases is well known to be an effective etching
gas for polymers, the conditions applied in this study, especially the equipment power of 60 W, are not enough to induce a change in surface roughness. Higher-power equipment and longer processing times would be required for this purpose.
PMMA
P(VDF-co-HFP)
Fig. 4. Photographs of gold-coated polymer substrates after the adhesion test. The substrates with smoothened surface by melt annealing are used for coating.
P(VDF-co-HFP) Rough surface
Fig. 5. Photographs of gold-coated P(VDF-co-HFP) substrate after the adhesion test. The substrate with rough surface is used for coating.
A POF having a smooth cladding surface of a fluoropolymer similar to P(VDF-co-HFP) was gold-coated under the same sputtering conditions applied to the polymer substrates (see Figures 6(a) and (b)). The plasma-treatment condition applied before sputtering was not changed either. As expected from Figure 4, the deposited gold layer was easily peeled off by hand, indicating no adhesion between the POF and the gold layer. This makes it impossible to use such a gold-coated POF as an acupuncture needle capable of simultaneously delivering electrical and light stimuli to body tissue. Since surface roughness is the key factor for improving the adhesion between gold and P(VDF-co-HFP), the surface of the POF was slightly damaged by rubbing it with sandpaper. The depth and width of the introduced defects were not quantitatively monitored, but the defects are clearly visible in Figure 6(c). The sample obtained by gold-coating the specimen
of Figure 6(c) is shown in Figure 6(d). It shows adhesion strong enough that the gold layer cannot be peeled off by hand. More precise and reproducible control of the surface roughness is required for proper adhesion.
Fig. 6. (a) and (c) SEM micrographs of POF surfaces, (a) neat; (c) slightly damaged with sandpaper; (b) and (d) photographs of the sample obtained by gold-coating of the specimens of Figures 6(a) and (c), respectively
The electrical conductivity of the gold-coated POF shown in Figure 6(d) was measured to be 1.35×10³ S cm⁻¹, about one-third of that of neat gold. The optical loss of the gold-coated POF did not increase significantly after the physical damaging and gold coating, presumably owing to the highly reflective capability of gold.
4 Summary In this study, we have realized an acupuncture needle capable of delivering optical/electrical stimuli into body tissue. A plastic optical fiber based on PMMA was used as the light-carrying medium. In order to impart electrical conductivity to the POF, a gold thin film of 50 nm thickness was deposited by ion sputtering onto its surface. From preliminary work on the adhesion between gold and polymers, it was revealed that the surface roughness of the polymer substrate is the key factor for strong adhesion. The plasma treatment conditions applied in this study were not adequate, and higher-power conditions would be required. The gold-coated POF, whose surface had been slightly rubbed with sandpaper, showed an electrical conductivity of 1.35×10³ S cm⁻¹ without significant optical loss. The result may be used for developing a medical device capable of simultaneously applying electrical and optical stimuli into body tissue.
Acknowledgments. This work was supported by Grant No. RTI 04-03-03 from the Regional Technology Innovation Program of the Ministry of Knowledge Economy (MKE), Korea. SMK, SHKim and EJP also acknowledge financial support from the BK21 program of the MKE.
References
1. http://en.wikipedia.org/wiki/Light_therapy
2. Ideguchi, E., Yamada, S.: U.S. Patent 5,250,068 (1993)
3. Jin Zhao, R.: U.S. Patent 6,916,329 (2005)
4. Marcou, J., Robiette, M., Bulabois, J.: Plastic Optical Fibers: Practical Applications. John Wiley & Sons, Chichester (1997)
5. Daum, W., Krauser, J., Zamzow, P.E., Ziemann, O.: POF: Polymer Optical Fibers for Data Communication. Springer, Berlin (2002)
6. Khan, T., Unternährer, M., Buchholz, J., Kaser-Hotz, B., Selm, B., Rothmaier, M., Walt, H.: Performance of a contact textile-based light diffuser for photodynamic therapy. Photodiagnosis and Photodynamic Therapy 3(1), 51–60 (2006)
7. Grythe, K.F., Hansen, F.K.: Surface Modification of EPDM Rubber by Plasma Treatment. Langmuir 22(14), 6109–6124 (2006)
Standardization for Smart Clothing Technology Kwangil Lee and Yong Gu Ji Dept. of Information and Industrial Engineering, Yonsei University, Seoul, Korea {party38,yongguji}@yonsei.ac.kr
Abstract. Smart clothing is the next generation of apparel. It is a combination of new fabric technology and digital technology, which means that the clothing is made with new signal-transfer fabric technology and installed with digital devices. Since smart clothing is still under development, many problems have occurred due to the absence of technology standardization; the efficiency of technology development can therefore be strengthened through industrial standardization. This study consists of three phases. The first phase is selecting standardization factors to propose a standardization road map. The second phase is researching and collecting related test evaluation methods for smart clothing; for this, we selected two categories, clothing properties and electricity/electron properties. The third phase is establishing a standardization road map for smart clothing. In this study, the test evaluations have not yet been conducted and validated; however, the study shows how to approach standardization, and we expect that it will be valuable for developing smart clothing technology and standardization in the future. Keywords: smart clothing, standardization, new fabric technology, clothing property, electricity/electron property.
1 Introduction Since many people are growing more and more interested in developing digital technology and highly technical clothes, there is a growing number of studies on smart clothing adapted with digital technology. Smart clothing can be defined as 'new clothes that are convenient for use by IT-based applications' [1]. However, since smart clothing is still being developed, there is no standard definition yet, so it is often referred to as 'digital smart clothing,' 'digital clothes,' and 'intelligent clothes.' Since 1998, smart clothing has been developed in the United States and Europe. At the beginning stage, many people tried to adapt computers into clothes and use them [2]. For instance, the Industrial Clothing Design + line jackets developed by Levi Strauss in collaboration with PRL (Philips Research Laboratory) allow wearers to use a remote-controlled microphone embedded in the collar for mobile phones and digital MP3 players [3]. Smart clothing is now being developed for everyday life, with the market expanding into the military, health and medical care, business and leisure industries [4].
Even though smart clothing technology has just started to develop, the economic outlook is bright. According to NPD (National Purchase Diary) Group's recent research, 50% of male clothing will be equipped with high-technology functions by 2010, with women's clothing reaching the same level by 2012 [5]. Smart clothing technology has been actively developed domestically for use in easing daily life [6][7]. However, standardization of the technology has not started yet. Thus, this study proposes a methodology and a standardization road map for smart clothing technology development. This study will also be helpful in establishing an effective strategy for smart clothing development.
2 Background on Smart Clothing Standardization
2.1 Standardization Trends An analysis of the standardization activity regarding smart clothing shows that, until June 2008, there was no application for ISO or IEC certification [8][9]. Technology development and commercialization have been expedited in the United States, Japan and Europe. According to a recent study, partial technology development is also in progress in Southeast Asia. However, there are no studies or data related to standardization except in Japan, whose fire-fighting uniforms are certified by the IEC [9].
2.2 Necessity of Standardization It is judged that competition among the United States, Europe and other developed countries has begun in the global apparel market. In order to dominate the growing market, one must urgently focus on technology development. However, many problems have occurred due to the absence of technology standardization; for instance, we currently manufacture all kinds of subsidiary materials to aid us in developing the technology. Therefore, the efficiency of technology development can be strengthened through smart clothing standardization.
2.3 Expected Effect of Standardization
− Revitalization of the fashion industry: Recently, fashion industries have changed from labor-intensive to knowledge-intensive industries, and such a knowledge-intensive, high-value textile industry is expanding rapidly. As standardization is very important to a high-technology industry such as smart clothing, it will revitalize the clothing industry in parallel with developing technologies.
− Acquisition of international competitiveness: The standardization of specifications, telecommunication, protocols and systems for a product's functions can encourage competition in the international market, easy entry to the market and leadership in the industry. Therefore, we will be able to take the lead in the global smart clothing market through standardization.
− Increasing market share: It is clear that technology development is required for increasing market share. Therefore, the market share of smart clothing in the international market can be maximized by standardization along with technology and product development.
− Positioning in the international standardization community: If we dominate technology areas such as digital textiles and e-textiles before international specifications have been ratified, we will be able to take the lead in the world apparel market. The responsibility for standardization of the technology can reside with a chairman, director or other high-ranking official of ISO/IEC. Therefore, our profile will be high and we will be a step ahead when competing with the other developed countries' monopolies and the challenges posed by emerging markets.
3 Methodology This study consists of three phases. The first phase is selecting the standardization factors to propose the standardization road map. The second phase is researching and collecting the related test evaluation methods for smart clothing; for this, we selected two categories, clothing properties and electricity/electron properties. The third phase is establishing a standardization road map for smart clothing. Figure 1 below shows the methodology of this study.
Fig. 1. Methodology for smart clothing standardization
3.1 Standardization Analysis This phase establishes the smart clothing standardization analysis according to the smart clothing classification. In this study, the smart clothing classification is a definition of the related products and technologies among the smart clothing categories. Through the smart clothing classification, each standardization factor is selected. Experts analyze these standardization factors to make a list according to three
important, weighted factors. The factors that need to be prioritized are market competitiveness, product life cycle and the possibility of commercial launch. Experts give marks to each product based on these factors, and the products are listed in priority order.
3.2 Gathering the Related Performance Evaluations The next step is to collect data about smart clothing evaluation. The features of smart clothing are divided into two categories, clothing properties and electricity/electron properties. The following are the definitions of each property:
− Clothing property: Smart clothing is basically considered regular clothing. Therefore, it should have basic clothing functions such as wearability and fashionability. In terms of these functions, abrasion resistance, bending resistance and washability need to be addressed, as well as the design of the clothing. Based on these features, data needs to be collected regarding research, recommendations and safety standards in international organizations such as ISO.
− Electricity/electron property: Smart clothing not only has the characteristics of clothing but also the characteristics of electric/electronic elements. For example, certain levels of the life cycle should satisfy the standards that are developed; also, despite being adapted as clothes, the devices should work properly; lastly, the clothing should be safe for the human body. Therefore, safety standards for electricity also need to be established. International safety standards from organizations such as ISO/IEC and product standards need to be collected.
3.3 Standardization Road Map Based on the research, testing samples and standardization factors mentioned above, a standardization road map was established. The road map was evaluated based on technology level, technology life cycle, marketability and time expectations. The standardization road map has a positive impact on standardization for smart clothing technology.
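The expert prioritization described in Sect. 3.1 can be made concrete as a weighted scoring of each candidate item. The weights, score scale and example items below are purely hypothetical and only illustrate the mechanism.

```python
def prioritize(items, weights):
    """items: {name: {criterion: score}}; weights: {criterion: weight}.
    Returns item names sorted by weighted total score, highest priority first."""
    def total(scores):
        return sum(weights[c] * scores[c] for c in weights)
    return sorted(items, key=lambda name: total(items[name]), reverse=True)

weights = {"market_competitiveness": 0.4, "product_life_cycle": 0.3,
           "commercial_launch": 0.3}
items = {  # hypothetical 1-5 expert scores
    "Fabric signal wire": {"market_competitiveness": 5, "product_life_cycle": 4,
                           "commercial_launch": 4},
    "ECG measurement":    {"market_competitiveness": 4, "product_life_cycle": 3,
                           "commercial_launch": 3},
}
print(prioritize(items, weights))  # ['Fabric signal wire', 'ECG measurement']
```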
4 Case Study 4.1 Standardization Analysis of Smart Clothing Establishing smart clothing classification - As suggested in Table 1 below, smart clothing can be categorized into four areas: body & environment monitoring clothing, entertainment clothing, photonic clothing and extra functional clothing. We defined the clothing products and related technology among these four categories. This smart clothing classification will be the first step toward a future standard. We will select the standardization factors according to this smart clothing classification.
Table 1. Smart clothing classification
− Body & environment monitoring clothing. Corresponding technology: ECG electrode, GPS built-in, GSR built-in, bio-monitoring. Related technology: ECG measurement, GPS module, GSR module, temperature/humidity sensor module, ultraviolet rays/ozone measurement module. Common technology: signal sensor.
− Entertainment clothing. Corresponding technology: MP3 player built-in, cell phone built-in, media. Related technology: fabric signal wire, fabric key pad, signal transmissibility materials. Common technology: conductive fabric.
− Photonic clothing. Corresponding technology: POF (Plastic Optical Fiber), photoelectron & LED (Light Emitting Diode). Related technology: POF fabric manufacture, photoelectron weaving textile, LED manufacture. Common technology: thin-film electroluminescent technology.
− Extra functional clothing. Corresponding technology: color/sound responding, heat emitting function, air-conditioning function, masking in army uniforms, thermal resistance for fire fighters. Related technology: color/sound sensor, heat emitting system, air-conditioning system, anti-radar, thermal resistance fabric. Common technology: digital conversions.
Standardization factor analysis - These factors are divided into two categories, standardization of products and of performance. Next, four elements are applied: technology level, marketability, urgency and compatibility, rated from high to low. The evaluation was done by five professional groups in the area of smart clothing. The results will be used for performance evaluation standards. Table 2 below is the standardization factor analysis.
Table 2. Standardization factor analysis. For each smart clothes category (a living body signal & environment monitoring clothes, entertainment clothes, photonic clothes, extra functional clothes), the following products were assigned a test type (product or performance) and standardization priority marks for technology level, marketability, urgency and compatibility: ECG measurement; temperature/humidity sensor module; GSR module; ultraviolet rays/ozone measurement module; environment sensor; fabric signal wire; fabric key pad; signal transmissibility materials; photonic clothes module; POF fabric manufacture; light emitting diode; photoelectron textiles materials; electron activity textiles materials; smart clothes interface; interface & components for smart clothes; smart clothes usage test; smart clothes terminology.
※ Note: Standardization priority: High (●), Medium (), Low (◎).
4.2 Test Evaluation Factors for Standardization
− Clothing property: Data collection in terms of clothing function refers to ISO (International Organization for Standardization), BS (British Standards), DIN (Deutsche Industrie Normen), JIS (Japan Industrial Standards), ASTM (American Society for Testing Materials) and KS (Korea Standards), as well as KCA (Korea Consumer Agency) and KITRI (Korea Apparel Testing & Research Institute) [12][13][14][15][17][18]. Professionals evaluated relevancy, importance and possibility using three levels: High, Medium and Low. Twenty-four evaluation factors were found, as listed in Table 3 below.
Table 3. Clothing property test evaluation factors (Nos. 1-1 to 1-24), each rated High (●), Medium or Low (◎) for relevancy, importance and possibility within the categories appearance change, mechanical properties, color fastness and product test: 1-1 size change rate; 1-2 iron size change rate; 1-3 form change; 1-4 breaking strength and elongation; 1-5 tearing resistance; 1-6 abrasion resistance; 1-7 seam strength; 1-8 flexibility; 1-9 wide fabric tensile strength; 1-10 water resistance; 1-11 insulation; 1-12 bending resistance; 1-13 durable press; 1-14 infrared reflectance; 1-15 fire resistance; 1-16 color fastness to washing; 1-17 color fastness to crocking; 1-18 color fastness to dry cleaning; 1-19 color fastness to water; 1-20 color fastness to ironing; 1-21 industrial production; 1-22 production of rubber; 1-23 button; 1-24 zipper.
− Electricity/electron property: The research was done in terms of reliability, safety and performance. We obtained information from ISO/BS/DIN/JIS/ASTM/KS, and professionals evaluated relevancy, importance and possibility [12][13][14][15][17][18] using three levels: High, Medium and Low. Twenty-five evaluation factors were found, as listed in Table 4 below.
Table 4. Electricity/electron property evaluation factors (Nos. 2-1 to 2-25), each rated for relevancy, importance and possibility within the categories reliability, safety and electricity: 2-1 key pad and earphone jack durability test; 2-2 ISO/IEC 9126-2 8.2.1 fault rate; 2-3 ISO/IEC 9126-2 8.2.1 problem rate; 2-4 ISO/IEC 9126-2 8.2.2 down rate; 2-5 ISO/IEC 9126-2 8.2.2 recovery rate. The safety and electricity factors (2-6 to 2-25) are drawn from standards for the safety of household and similar electrical appliances (general requirements and particular requirements for electric irons, spin extractors, washing machines, shavers and hair clippers, and floor treatment and wet scrubbing machines), safety requirements for audio, video and similar apparatus, appliance couplers for household and similar general purposes, ballasts and auxiliaries (a.c. supplied ballasts) for tubular fluorescent lamps, single-capped fluorescent lamp safety specifications, information technology equipment, safety requirements for electrical equipment for measurement, control and laboratory use, hand-held motor-operated electric tools, fixed capacitors for use in electricity equipment, and testing and measuring equipment.
4.3 Standardization Road Map
Based on the previous research, Table 5 shows the application of the standardization factors from the tests. These factors were re-evaluated in terms of technology level, life cycle and marketability. We applied the results from the previous steps with high relevancy and high marks to up to six categories. We will arrange the smart clothing standardization items in order of each characteristic.
Table 5. Standardization road map (2008–2011). For each smart clothing category, the road map lists the standardization items by year together with technology level, life time, market rating and the related clothing and electricity/electron property test numbers from Tables 3 and 4. Body signal & environment monitoring clothes: ECG measurement standardization; temperature/humidity sensor module standardization; GSR module standardization; ultraviolet rays/ozone measurement module standardization; environment-related sensor standardization. Entertainment clothes: textile-based transmission line standardization; textile-based key pad standardization; signal transmissibility materials standardization. Photonic clothes: photonic clothes module standardization; POF fabric manufacture technology standardization; light emitting diode (LED) standardization; photoelectron textiles materials standardization; electron activity textiles materials standardization. Extra functional clothes: smart clothing interface standardization; smart clothing components standardization; smart clothing terminology standardization (the definition of related technology and products).
5 Conclusion and Further Study
Smart clothing is a new form of clothing. However, standardization of smart clothing technology has rarely been considered, and little research has focused on solving these problems. Evaluation and technology standards have not existed so far, even though the technology has been developing. For efficient technology development, evaluation and technology standards need to be presented. This study has not yet conducted a systematic standardization. Nonetheless, it proposed a methodology for the standardization of smart clothing and presents how strategic thinking may help toward achieving it. As a result, we believe that our standardization road map can be used to gain a competitive advantage in the smart-wear market. In addition, we expect to occupy a more competitive position for standardization. Further study, testing and evaluation will be conducted on the standardization road map. More research will be continued on standardization trends and international technology. Through it, we expect to have a positive impact on smart clothing development.
References 1. Cho, G.S.: Latest Clothing Material. Sigma Press, Seoul (2006) 2. Mann, S.: Smart clothing: Wearable multimedia computing and personal imaging to restore the technological balance between people and their environments. In: Proceedings of the 4th ACM International Conference on Multimedia, pp. 163–174 (1996) 3. Schreiner, K.: Stepping into smart clothing. IEEE Multimedia Web Engineering Part II, 16–18 (2001) 4. Cho, G., Cho, J.: The technological development of smart-wear for future daily life. Fiber (2007) 5. NPD Group (National Purchase Diary), Smart wear prospecting (2008), http://www.ebn.co.kr/news/n_view.html?kind=menu_code&keys=70&id=294447 6. Moon, H.S., Cho, H.S., Lee, J.H., Jung, H.I.: An Investigation on the Development of Healthcare Smart Clothing. Korea Society for Emotion and Sensibility 9(1), 77–84 (2006) 7. Lee, J.H.: Digital clothing for everyday life. Textile Technology and Industry 8(1), 11–18 (2004) 8. ISO (International Organization for Standardization) (2008), http://www.iso.org/iso/home.htm 9. IEC (International Electrotechnical Commission) (2008), http://www.iec.ch/ 10. Comprehensive Merchandising Support. Eleksen, a Peratech company (2007), http://www.eleksen.com/?page=news/index.asp&newsID=78 11. JIS (Japan Industrial Standards) (2008), http://www.jisc.go.jp/eng/ 12. American Society for Testing Materials (ASTM) (2008), http://www.astm.org/ 13. BS (British Standards) (2008), http://www.standardsuk.com/ 14. DIN (Deutsche Industrie Normen) (2008), http://www.din.de/cmd?level=tpl-home&languageid=en 15. Korea Apparel Testing & Research Institute, Quality standard information (2008), http://www.cleaningq.co.kr/quality.php 16. Korea Consumer Agency textile test team, Summary of amendment on recommended quality standard of textile products, Seoul, Korea (2003) 17. KETI (Korea Electronics Technology Institute), Reliability evaluation method (2008), http://www.keti.re.kr/ 18. KS (Korea Standards) (2008), http://www.kats.go.kr/
Wearable ECG Monitoring System Using Conductive Fabrics and Active Electrodes Su Ho Lee1, Seok Myung Jung2, Chung Ki Lee2, Kee Sam Jeong3, Gilsoo Cho6, and Sun K. Yoo1,2,4,5,* 1
Brain Korea 21 Project for Medical Science, Yonsei University College of Medicine 2 Graduate Programs in Biomedical Engineering Yonsei University 3 Department of Medical Information System, Yongin Songdam College 4 Department of Medical Engineering, Yonsei University College of Medicine 5 Human Identification Research Center, Yonsei University 6 Department of Clothing and Textiles, Yonsei University [email protected], [email protected], [email protected], [email protected], [email protected] [email protected]
Abstract. The aim of this paper is to develop a nonintrusive ECG monitoring system based on an active electrode with conductive fabric. The developed electrode can measure the ECG signal without the electrolyte gel or adhesives that cause skin trouble. For stable measurement of the ECG signal, a buffer amplifier with high input impedance and a noise-bypassing shield made of conductive fabric were developed. The system provides real-time ECG signal monitoring and wireless communication using the ZigBee protocol. We show experimental results for the developed wearable ECG monitoring system and demonstrate how it can be applied to the design of a nonintrusive electrode with conductive fabric. Keywords: active electrode, conductive fabric, wearable, ZigBee, portable.
1 Introduction
Wearable smart clothing and medical technologies are novel research areas within the textile and healthcare industries, respectively. Recently, there have been many studies on the convergence of these fields for measuring several biosignals in daily life. The ECG (electrocardiogram) is the primary diagnostic signal in cardiology. In the acquisition of the ECG signal, measurement of the surface voltage at standard positions is normally based on galvanic contact of the electrodes with the skin [1]. Attaching Ag-AgCl electrodes filled with conductive gel to the skin has been a popular method. This conventional methodology has several problems during long-term monitoring in daily life. Continuous contact may cause discomfort during long-term monitoring. The Ag-AgCl gel-filled electrode is limited in durability and reusability. More importantly, the probability of skin trouble is increased with the wet-electrode (Ag-AgCl gel-filled) type of system [2].
* Corresponding author.
To overcome these restrictions, we designed a belt-type portable ECG measurement system and implemented an active electrode with a conductive fabric electrode. Without the use of electrolyte gel, the impedance between the electrode and the skin surface is increased and unstable. We considered the problems of the conventional ECG monitoring system and developed a belt-type portable ECG monitoring system based on conductive fabric.
2 Materials and Methods
Our system is composed of a portable device, a PC receiving module and a Windows-based application program. The analog part is shown on the left side of Figure 1. The conductive fabric electrodes of the preamp are connected to the skin. Two operational amplifiers are used as buffers for the two differential input signals. The ECG module performs signal amplification and noise filtering on the output signals from the buffers. In the digital part, an MCU (Micro Control Unit) is used to convert the ECG signal from analog to digital and to control the wireless communication.
2.1 Conductive Fabrics
There are several types of conductive fabric. The surface of the active electrode in this system is made of electroless metal-plated fabric, which has several good characteristics such as good conductivity, even metal plating, free tailorability, and easy sewability [3].
Fig. 1. Block diagram of portable ECG monitoring system
2.2 Active Electrodes
The contact between the conductive fabric and the skin surface cannot be well established without the use of adhesives such as electrolyte gel. Therefore, the variance of the contact impedance of the conductive fabric electrode is much larger than that of the Ag-AgCl gel-filled type. If the input impedance of the ECG module is not much larger than that of the skin contact area, the attenuation of the source signal will be high and unbalanced. To acquire a valid ECG signal in this condition, a buffer with very high input impedance is essential. We use the OPA129 (National Instrument Corporation) operational amplifier to meet this requirement. The differential input impedance of the OPA129 is about 10^13 Ω. The more the impedance of the contact point increases, the more the power of the penetrating signal decreases. When the power of the signal delivered through the contact point is decreased, the influence of outer noise can increase. To prevent noise interference under the high contact impedance condition, we covered the preamp with an electrical shield made of conductive fabric.
2.3 Wireless Data Transmission
For nonintrusive monitoring, ZigBee (IEEE 802.15.4) [4] is used in this system in a miniaturized form. The maximum data rate and the coverage area of ZigBee are 250 kbps and a radius of approximately 70 m, respectively. Our system samples the ECG signal 500 times per second, using two bytes for each sampled value. The required transmission speed is 8 kbps (16 bits × 500). Therefore, the speed of ZigBee is sufficient for this system. To implement the ZigBee communication, we use the CC2420 RF transceiver (Chipcon Corporation). The CC2420 needs a minimum of passive elements, has low power consumption, and uses SPI (Serial Peripheral Interface) for the connection with the MCU.
2.4 Integration of the Measurement System
All modules are integrated into a belt-shaped prototype (Figure 2). The conductive fabric active electrode has five snap buttons – two are for the outputs of the buffers,
Fig. 2. The belt-typed ECG monitoring system: (a) Electrode surface (Active circuit, Conductive fabric) (b) ECG module (Analog amplifier, MCU, Zigbee)
another two are for the dual power supply, and the last one is for the common node. The active electrode is connected to the center of the belt using these buttons (Figure 2-b). The ECG and ZigBee transmission modules are connected to the belt on its other side (Figure 2-a). A battery (7.4 V Li-ion cell) is connected to the ECG module. Within the belt, the snap buttons are connected to each other.
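As a quick check of the data-rate budget described in Sect. 2.3, the following sketch (an illustration, not the authors' firmware) computes the required ECG throughput and compares it with the nominal ZigBee capacity.

```python
# Throughput check for the ECG link (values from Sect. 2.3; illustrative only).
SAMPLE_RATE_HZ = 500        # ECG samples per second
BITS_PER_SAMPLE = 16        # two bytes per sample
ZIGBEE_MAX_BPS = 250_000    # nominal IEEE 802.15.4 data rate

required_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # = 8,000 bit/s
print(f"required: {required_bps/1000:.1f} kbps, "
      f"available: {ZIGBEE_MAX_BPS/1000:.0f} kbps, "
      f"headroom: {ZIGBEE_MAX_BPS/required_bps:.0f}x")
```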
3 Results
The critical factor in reliable ECG measurement is the contact impedance. If the contact impedance is unbalanced or extremely high, data acquisition can be difficult. We analyzed the characteristics of the contact impedance of the conductive fabric electrode to verify the usability of this system. One equivalent model of the impedance of the human body is a parallel circuit composed of a resistor and a capacitor (Figure 3-a) [5, 6]. The resistance does not change as the applied frequency varies, while the capacitive impedance is inversely proportional to the frequency. Therefore, the total impedance of the contact point between the skin and the conductive fabric decreases as the frequency increases (Figure 3-b). In general, the important frequency band in ECG measurement lies between 0.5 and 100 Hz. As shown in Figure 3-b, the total impedance is very high below 1 Hz. From the data sheet, the input impedance of the active electrode of our system is 10^13 Ω. This value is much larger than the contact impedance, so the signal attenuation is tolerable. Figure 4 presents the ECG waveforms obtained with the developed conductive fabric active electrode and the Ag-AgCl gel-filled type, respectively. To compare these electrodes, we measured the ECG signal under three different conditions: sitting on a chair, walking and running. The two signals from the developed electrode and the Ag-AgCl electrode were acquired synchronously at the same time and at the same location on the skin surface. As shown in Figure 4, the signal of the conductive fabric electrode has a little more baseline drift than that of the Ag-AgCl gel-filled type because of the time-varying contact impedance caused by motion artifacts. However, the signal quality of the developed electrode was comparable to the Ag-AgCl electrode.
Fig. 3. (a) The impedance model between the electrode and the skin. (b) The impedance-frequency response of the contact point.
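A minimal numerical sketch of the parallel-RC contact model of Fig. 3 is given below; the component values R and C are assumed examples (the paper does not report measured values), while the buffer input impedance is the figure quoted in Sect. 2.2.

```python
# Sketch of the parallel-RC skin-electrode impedance model and the resulting
# attenuation at the buffer input. R and C are assumed example values.
import math

R = 1e6          # skin-electrode resistance [ohm] (assumed)
C = 50e-9        # skin-electrode capacitance [F] (assumed)
Z_BUFFER = 1e13  # OPA129 differential input impedance [ohm] (Sect. 2.2)

def contact_impedance(f_hz):
    """|Z| of a resistor R in parallel with a capacitor C at frequency f."""
    omega = 2 * math.pi * f_hz
    return R / math.sqrt(1 + (omega * R * C) ** 2)

for f in (0.5, 1, 10, 100):
    z = contact_impedance(f)
    # Voltage divider: fraction of the source signal seen by the buffer input.
    gain = Z_BUFFER / (Z_BUFFER + z)
    print(f"{f:6.1f} Hz  |Z| = {z/1e3:8.1f} kOhm  attenuation factor = {gain:.9f}")
```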
Fig. 4. ECG waveforms of developed electrode and Ag-AgCl electrode with three different conditions: (a) normal (sitting on a chair), (b) walking and (c) running
At the peak region, the impedance is lower than in other regions because of the high-frequency components. Therefore, the amplitude at the peak for the active electrode is higher than that of the Ag-AgCl gel-filled type.
4 Conclusion
The goal of this study is to develop a user-friendly and nonintrusive ECG monitoring system. Although the Ag-AgCl electrode has generally been used to measure the ECG signal, it cannot avoid unpleasant feelings and skin troubles. We solved these problems by applying several advanced technologies. With the conductive fabric electrode, the variance of the contact impedance, which is related to signal quality, is larger than that of the Ag-AgCl gel-filled type. To solve this problem we used an operational amplifier buffer with a very high input impedance. Figure 4 showed that there was no significant difference between the output of our conductive fabric type and that of an Ag-AgCl gel-filled type. The wireless data transmission system was also portable and nonintrusive for users. We conclude that the belt-type ECG monitoring system is more user-friendly than the Ag-AgCl gel-filled type. Therefore, it can be useful for daily and long-term monitoring of the ECG signal. The results of this study show that the developed system can guarantee nonintrusive measurement of biosignals in real time for users at home as well as for hospital patients.
Acknowledgments. This work was financially supported by a Grant-in-Aid for Next-Generation New Technology Development Programs from the Ministry of Knowledge Economy (MKE) (10016447) and by the Ministry of Knowledge Economy (MKE) and Korea Industrial Technology Foundation (KOTEF) through the Human Resource Training Project for Strategic Technology (2008-I01-032).
References 1. Oegler, M., Ling, V., Melhorn, K., Schilling, M.: A multichannel portable ECG system with capacitive sensors. Physiol. Meas. 29, 783–793 (2008) 2. Hoffmann, K.-P., Ruff, R.: Flexible dry surface-electrodes for ECG long-term monitoring. In: Proceedings of the 29th annual International Conference of the IEEE EMBS Cite Internationale, Lyon, France, August 23-26 (2007) 3. Cho, J., Moon, J., Jeong, K., Cho, G.: Application of PU-sealing into Cu/Ni Electroless Plated Polyester Fabrics for E-Textiles. Fibers and Polymers 8(3), 330–334 (2007) 4. http://www.ieee802.org/15/pub/TG4.html 5. Liedtke, R.J.: Fundamental of Bioelectrical Impedance Analysis. RJL Systems – Publications, http://rjlsystems.com/research/bia-fundamentals.html 6. Vuorela, T., Kukkonen, K., Rantanen, J., Jarvinen, T., Vanhala, J.: Bioimpedance Measurement System for Smart Clothing. In: Proceedings of the Seventh IEEE International Symposium on Wearable Computers (2003)
Establishing a Measurement System for Human Motions Using a Textile-Based Motion Sensor Moonsoo Sung1, Keesam Jeong2, and Gilsoo Cho1 2
1 Department of Clothing and Textiles, Yonsei University, Seoul 120-749, Korea Department of Medical Information Systems, Yong-in Songdam College, Yong-in 449-710, Korea {mssung, gscho}@yonsei.ac.kr, [email protected]
Abstract. We developed a human motion measurement system using textile-based motion sensors whose electrical resistance changes with textile length. Eight body locations were marked and used for measurement, based on previous studies investigating the relationship between human muscles and activities. Five male subjects participated in the experiment, walking and running while the electrical resistance of each sensor was measured. Measuring and analyzing the variations in the electrical resistances of our sensors allowed us to successfully evaluate body postures and motions. Keywords: human motion, human posture, measurement, textile-based motion sensor, electronic textile.
1 Introduction
Continuously and precisely measuring human postures and movements is critical to the monitoring of diverse human activities. Therefore, many studies have investigated the measurement of human motions with cameras or accelerometers since the 1990s [1, 2, 3]. Nowadays, many researchers use conductive fibers and yarns to develop motion sensors. Studies done with camera capture systems and accelerometer-based systems achieved stable measurement techniques thanks to the accumulation of technology. However, human activities are difficult to measure in real time because of excessive measurement system sizes and the new setup needed for each measurement. On the other hand, textile motion sensors are simple and light. Paradiso et al. [4] and Catrysse et al. [5] have developed piezoresistive textile sensors and measured respiration volumes by attaching them around the chest. Because they were developed for respiration, these piezoresistive textile sensors are not sensitive enough to measure body movements, nor the direction and angle of movements. Wijesiriwardana [6] developed textile-based motion sensors that exploit magnetic field gradients. Magnetic coil bands made of copper wires and spandex fabric were attached around an upper arm (coil 1) and a forearm (coil 2). They predicted the angle of an arm's movement by analyzing variations in coil 1 and coil 2. The magnetic coil bands provided correct data, but magnetic waves had to be continuously supplied to the coils.
We, the Smartwear Research Center, developed textile-based motion sensors from knitted fabric made of 100% stainless steel yarns [7]. The electrical resistance of these textile-based motion sensors changes according to fabric elongation. A prediction model for measuring body movements was developed by monitoring changes in electrical resistance resulting from bending at the elbows. The Smartwear Research Center then developed textile-based motion sensors from stainless steel yarns and spandex yarns to extend the previous studies, measuring changes in electric resistance at the outward points of the knees [8]. We planned the current study based on the finding that, in everyday life, distinguishing postures and motions is more useful than measuring exact joint angles. In this study, we focused on developing textile-based sensors better than the previous ones. Our purpose is not to measure joint angles but to identify movements. In the previous study, the sensors were made with a knitted structure and their elasticity was measured. In this study, the sensors are made with spandex fibers, which are elastomeric, solving the problem of the previous sensors. We marked eight points on the lower and upper body to measure body movements.
2 Experiments
2.1 Development of the Textile-Based Motion Sensor
The textile-based sensor was braided from 60% filament yarns, made of polyester yarns (75 denier) covered with spandex yarns (75 denier), and 40% stainless steel multifilament yarns (Fig. 1). The narrow band was used as a textile sensor, and the changes in its electrical resistance were mapped to changes in length extension. The spandex yarns distributed and smoothed the stretchability and recovery of the sensor, while the steel yarns conducted electricity, with changes in resistance corresponding to changes in length due to better contacts between the stainless steel filaments. The electrical resistance of the textile sensors was measured with a FLUKE PM6304 multimeter. As a preliminary test, we cut the sensor to 5 cm and measured the electric resistance at regular stretch intervals of 0.2 cm. The electric resistance decreased to about one third, from 13.8 Ω to 4.2 Ω, at a maximum stretch length of 6.2 cm (24% extension).
2.2 Measurement Points of the Textile-Based Motion Sensors in Clothing
The measurement points of the textile-based motion sensors in clothing were selected considering previous studies [9, 10, 11, 12]. Each sensor location matches a muscle commonly used in walking and running. Eight points were marked: two on the left and right sides of the mesosternum (P.1, P.2), two 5 cm below the axillae (P.3, P.4), two at the bottom (P.5, P.6), and two outward of the knees (P.7, P.8). Five-centimeter-long sensors were fixed as shown in Figure 3.
Fig. 1. Textile-based motion sensor
Fig. 2. Electrical resistance of the fabric sensor according to stretch and shrink
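As a rough illustration of how the resistance reading of Fig. 2 can be turned back into an elongation estimate, the sketch below linearly interpolates between the two calibration endpoints reported in Sect. 2.1 (13.8 Ω at rest, 4.2 Ω at 24% extension). The linear mapping is an assumption for illustration; the measured curve need not be linear.

```python
# Hypothetical calibration: map sensor resistance to estimated elongation,
# using only the two endpoints of Sect. 2.1 and assuming linearity.
R_REST, R_MAX_STRETCH = 13.8, 4.2   # ohms at 0% and at 24% extension
EXT_AT_R_MAX = 0.24                  # fractional extension at maximum stretch

def estimate_extension(resistance_ohm):
    """Return the estimated fractional extension (0.0-0.24) for a reading."""
    r = min(max(resistance_ohm, R_MAX_STRETCH), R_REST)  # clamp to calibrated range
    frac = (R_REST - r) / (R_REST - R_MAX_STRETCH)       # 0 at rest, 1 at max stretch
    return frac * EXT_AT_R_MAX

print(estimate_extension(13.8))  # 0.0  -> relaxed sensor
print(estimate_extension(9.0))   # ~0.12 -> roughly half of the calibrated range
```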
2.3 Suit Containing the Sensors
We used a commercial suit produced by a sports brand for our measurements because it shows identical extension through all sections. The upper garment was a men's short-sleeve shirt consisting of 78% polyester, 13% nylon and 9% spandex. The pants were 71% polyester, 21% polypropylene and 5% spandex.
2.4 Subjects and Protocol
Subjects. We selected five male college students; their average age was 24.8 (±2.86) years, their average height was 177 (±1) cm and their average weight was 67 (±0.8) kg.
Protocol. The subjects wore the suit and were asked to walk and run for 3 minutes each. A cycle of walking or running was defined as moving the right arm and left leg in the first step, then the left arm and right leg in the second step. The walking speed was 1.2 seconds per cycle and the running speed was 0.8 seconds per cycle.
Fig. 3. Measurement points for the sensors
2.5 Real-Time Measurement of Electrical Resistance
The electrical resistance was measured with an Agilent 34970A data acquisition/switch unit and the Agilent BenchLink data logger. The changes in electrical resistance were measured in real time.
3 Results and Discussion 3.1 Real-Time Changes in Electrical Resistance While Walking Figure 4 and table 1 show the changes in electric resistance and features in each body part due to the walking motions of the 5 subjects.
Fig. 4. Real-time changes in electrical resistance while walking: (a) upper body, (b) lower body
Table 1. Three characteristics of electric resistance during one walking cycle (maximum value, minimum value and average gradient between maximum and minimum, per measurement point):
P.1: max 11.96 ±0.24 Ω, min 9.91 ±0.24 Ω, gradient 3.40
P.2: max 11.66 ±0.25 Ω, min 9.66 ±0.23 Ω, gradient 3.32
P.3: max 10.31 ±0.11 Ω, min 7.69 ±0.11 Ω, gradient 13.13
P.4: max 10.32 ±0.10 Ω, min 7.66 ±0.11 Ω, gradient 13.13
P.5: max 10.60 ±0.08 Ω, min 7.86 ±0.07 Ω, gradient 2.74
P.6: max 10.58 ±0.08 Ω, min 7.44 ±0.09 Ω, gradient 3.13
P.7: max 9.94 ±0.20 Ω, min 8.68 ±0.18 Ω, gradient 2.08
P.8: max 10.11 ±0.19 Ω, min 8.86 ±0.18 Ω, gradient 2.08
The curve shown in figure 4 was produced by averaging the data of the five subjects, after discarding unexpected values, from repeated testing. Although depending on the subject there were minute time differences in moving each body part, the 5 subjects showed a similarly shaped curve. The electric resistance change measured for the upper body showed a smaller overall slope than that for the lower body, because there is less motion in the upper body than in the lower body while walking.
Electric resistance change trends characteristically appear at the mesosternum for the upper body and at the knee for the lower body. In the case of the mesosternum, a stable period of steady resistance occurs during the transition time between motions. A large electric resistance change is observed at the left and right sides of the axillae, which show gradual changes and a short stable period. With the walking motion, significant individual differences appeared in the range of electric resistance change measured at the four points on the upper body, due to the impact of individual walking habits.
3.2 Real-Time Changes in Electrical Resistance While Running
Figure 5 and Table 2 show the changing electric resistance and the features of each body part due to the running motions of the 5 subjects.
Fig. 5. Real-time changes in electrical resistance while running: (a) upper body, (b) lower body
Table 2. Three characteristics of electric resistance during one running cycle (maximum value, minimum value and average gradient between maximum and minimum, per measurement point):
P.1: max 14.88 ±0.17 Ω, min 11.93 ±0.16 Ω, gradient 7.36
P.2: max 14.90 ±0.17 Ω, min 12.32 ±0.17 Ω, gradient 6.44
P.3: max 13.06 ±0.03 Ω, min 9.38 ±0.07 Ω, gradient 9.19
P.4: max 13.10 ±0.01 Ω, min 8.98 ±0.02 Ω, gradient 10.28
P.5: max 13.17 ±0.10 Ω, min 8.24 ±0.18 Ω, gradient 8.20
P.6: max 12.85 ±0.08 Ω, min 7.74 ±0.18 Ω, gradient 8.51
P.7: max 10.49 ±0.10 Ω, min 9.63 ±0.12 Ω, gradient 2.15
P.8: max 10.40 ±0.07 Ω, min 9.77 ±0.02 Ω, gradient 1.55
As with the walking motion, the electric resistance data was averaged from repeated testing after discarding unexpected values, and then analyzed. For the running motion, there were virtually no subject-dependent motion time differences. The main difference observed between running and walking is that the stable periods found at the exterior of the knee and at the left and right side of the mesosternum are very brief due to the acceleration. Besides, the difference between the minimum
and maximum electric resistances increases, because the motion becomes significantly bigger when running. As the stable period and the duration of a motion cycle from start to completion shorten, the slope of the "running" curve steepens, and the impact of individual walking habits becomes insignificant.
3.3 Establishing a System to Measure Human Motions
Based on the "walking" and "running" data, a way to link electric resistance changes to motions was considered. Three features emerge from our analyses. First, when resistance change is produced due to motion, a fluctuation period and a stable period co-occur. In both the walking and running motions, the sensors attached to the exterior of the knees accurately show the fluctuation and stable period patterns. The grounds for motion differentiation could be established based on the existence and nonexistence of such fluctuation and stable periods and their lengths. As presented in the summary of our experiments, the data collected from the sensors attached to the exterior of the knees showed a relatively long stable period for walking and a shorter one for running. Second, the cycle lengths collected from the sensors attached to the upper and the lower body differ. This indicates differences in speed, and allows the evaluation of not only walking or running speed but also the balance of the upper and lower body. For example, if the lower body appears slower than usual compared to the upper body, the subject's motion is unbalanced. Third, depending on the movement form, the range of electric resistance change measured by the sensor attached to the same area, that is, the difference between the maximum and minimum value, differs. When the data from the walking and the running motions in this study are compared, the difference between the maximum and minimum value for each sensor is greater for running than for walking. Thus, as the motion becomes larger and faster, the change in electric resistance increases. In other words, the higher the slope of the measurement in a section, the larger the difference between the maximum and minimum value. This conclusion will be verified with data collection and analysis through additional tests. If a program exploits these three characteristics, the motion form, speed and acceleration may be measured simply from electric resistances. The following table is a numeric summary, focusing on the aforementioned three characteristics, of the data obtained in this study. Table 3. Three key characteristics of the electric resistance of textile-based motion sensors
Walking (one cycle): stable section of electric resistance change 504 ± 22.6 ms; cycle duration 999 ± 122.5 ms; average gradient between max and min (left and right sides of the knees) 2.93.
Running (one cycle): stable section of electric resistance change 204 ± 54.9 ms; cycle duration 644 ± 28.2 ms; average gradient between max and min (left and right sides of the knees) 8.36.
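To make the link from these three characteristics back to a motion label concrete, here is a minimal sketch of a rule-based discriminator. The thresholds are simply placed midway between the walking and running values in Table 3; they are illustrative assumptions, not values proposed by the authors.

```python
# Illustrative rule-based walking/running discrimination from the three
# characteristics of Table 3 (stable period, cycle duration, knee-sensor gradient).
# Thresholds are assumed midpoints between the reported walking and running values.

STABLE_MS_THRESHOLD = (504 + 204) / 2     # ms
CYCLE_MS_THRESHOLD = (999 + 644) / 2      # ms
GRADIENT_THRESHOLD = (2.93 + 8.36) / 2    # knee-sensor slope

def classify_motion(stable_ms, cycle_ms, knee_gradient):
    """Vote over the three cues; ties are resolved toward 'walking'."""
    running_votes = sum([
        stable_ms < STABLE_MS_THRESHOLD,
        cycle_ms < CYCLE_MS_THRESHOLD,
        knee_gradient > GRADIENT_THRESHOLD,
    ])
    return "running" if running_votes >= 2 else "walking"

print(classify_motion(stable_ms=480, cycle_ms=1020, knee_gradient=3.1))  # walking
print(classify_motion(stable_ms=210, cycle_ms=660, knee_gradient=7.9))   # running
```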
4 Applications
As summarized in the results, we can determine the form and overall motion speed as well as acceleration from the motion measurement system using resistance changes in textile sensors. Reflecting these abilities, we propose the following four applications. First, we may correct the posture of athletes. For example, consider a batter's or a golfer's swing: the swing form, the overall swing speed and the instantaneous acceleration at a given point can be measured and then corrected. Athletes such as batters and golfers produce records through the athletic form of their swings. To improve their performance, while increasing the overall swing speed is important, the overall body balance during a swing and the acceleration at the instant the ball is hit by a bat or club are the most important factors. To make such corrections, it is possible to construct a system to detect and analyze an athlete's motion and posture using textile-based sensors. Second, with the increase of the elderly population, the measurement system can be applied in the health care of aged patients. Because of physical infirmities, the activities of elderly patients are limited. Although their health may improve through concentrated care, such care must be accompanied by appropriate activities or exercises. Without carrying around a heavy and uncomfortable device or having it attached, but just by wearing clothing, it will be possible to determine if a patient is getting enough exercise and, based on this, prescribe him or her appropriate exercises and activities. Third, the system can be used as an input device for various instruments. By attaching textile-based motion sensors at finger joints or the body's major muscles, the movements can be measured, serving, for example, as a glove-type input device or a video game controller. Fourth, it can also be used for simple medical purposes. Currently the most researched bio-monitoring area is the ECG measurement field founded on textile-based electrode systems. Although the current technology cannot be used for precision medical diagnosis, it can be used to detect abnormal symptoms occurring in everyday life and then prompt a further diagnostic checkup to prevent serious diseases. Such ECG monitoring wear only needs to know the patient's current state. Simply put, it is natural for the heart beat to quicken and breathing to become hard when running fast. However, if the heart beat speeds up and the breathing becomes hard while resting or walking slowly, it is abnormal. If the symptom is intense, the patient may of course go to a hospital for an examination. But if the patient does not feel or is not greatly conscious of the symptom, then most of the time no hospital visit is made. Therefore, serious diseases can be prevented if a motion measurement system employing textile-based motion sensors and bio-monitoring smartwear using textile-based ECG electrodes are developed and used.
5 Conclusion
In this study, a resistance-change textile was made and applied to clothing to realize a textile-based motion measurement system. Through tests measuring electric resistance in real time, particular changes were found for walking and running motions. We also determined that designing a movement measurement system using electric resistance
changes in textiles is possible if the measured data is combined with a program exploiting their characteristics. Although very simple, this sensor can provide quantitative data expressed numerically in various ways depending on human needs.
6 Future Works Since 5 subjects with similar physical characteristics were selected and used for tests in this study, generalizations may be unreasonable. Thus, we will focus subsequent studies on generalization based on movement data obtained from subjects with diverse physical characteristics. The electric resistance measurement device used in this study could not be carried around. We need to replace this with a portable instrument for a realistic construction of a textile-based motion measurement system. To do this, a wireless measurement board capable of measuring 8 to 12 channels is currently being developed, and a portable device combining this with a wireless transmitter-receiver is also being designed. Acknowledgments. This study was done with the support of the Ministry of Knowledge Economy’s Strategic Technology Development Project, “Technology Development of smart wear for future daily.” We thank the MKE for this support.
References 1. Motoi, K., Ikeda, K., Kuwae, Y., Ogata, M., Fujita, K., Oikawa, D., Yuji, T., Higashi, Y., Fujimoto, T., Nogawa, M., Tanaka, S., Yamakoshi, K.: Development of a Wearable Sensor System for Monitoring Static and Dynamic Posture together with Walking Speed for Use in Rehabilitation. IFMBE Proceedings 5(16), 2977–2980 (2006) 2. Koyanagi, M., Shino, K., Yoshimoto, Y., Inoue, S., Sato, M., Nakata, K.: Effects of changes in skiing posture on the kinetics of the knee joint. Knee Surgery, Sports Traumatology, Arthroscopy 14, 88–93 (2006) 3. Hoshino, R., Arita, D., Yonemoto, S., Taniguchi, R.: Real-Time Human Motion Analysis Based on Analysis of Silhouette Contour and Color Blob. In: Perales, F.J., Hancock, E.R. (eds.) AMDO 2002. LNCS, vol. 2492, pp. 92–103. Springer, Heidelberg (2002) 4. Paradiso, R., Loriga, G., Taccini, N., Gemignani, A., Ghelarducci, B.: WEALTHY - a Wearable Healthcare System: New Frontier on E-textile. Journal of Telecommunications and Information Technology, 105–113 (2005) 5. Catrysse, M., Puers, R., Hertleer, C., Langenhove, L.V.: Towards the integration of textile sensors in a wireless monitoring suit. Sensors and Actuators A: Physical 114, 302–311 (2004) 6. Wijesiriwardana, R.: Inductive Fiber-Meshed Strain and Displacement Transducers for Respiratory Measuring Systems and Motion Capturing Systems. IEEE Sensors Journal 6(6), 571–579 (2006) 7. Sung, M., Baik, K., Yang, Y., Cho, J., Jeong, K., Cho, G.: Characteristics of Low-cost Textile-based Motion Sensor for Monitoring Joint Flexion. In: Proceedings of the 11th International Symposium on Wearable Computers - Student Colloquium Proposals, pp. 29–31 (2007)
8. Sung, M., Cho, G.: Development of Motion Capture Clothing for Monitoring Body Movement of Textile-based Motion sensor. In: Symposium of the Korean fiber society, pp. 66–69 (2008) 9. Keller, T.S., Weisberger, A.M., Ray, J.L., Hasan, S.S., Shiavi, R.G., Spengler, D.M.: Relationship between vertical ground reaction force and speed during walking, slow jogging, and running. Clinical Biomechanics 11(5), 253–259 (1996) 10. Sasaki, K., Neptune, R.R.: Differences in muscle function during walking and running at the same speed. Journal of Biomechanics 39, 2005–2013 (2006) 11. Sasaki, K., Neptune, R.R.: Muscle mechanical work and elastic energy utilization during walking and running near the preferred gait transition speed. Gait & Posture 23, 383–390 (2006) 12. Lichtwarka, G.A., Bougoulias, K., Wilson, A.M.: Muscle fascicle and series elastic element length changes along the length of the human gastrocnemius during walking and running. Journal of Biomechanics 40, 157–164 (2007)
A Context-Aware AR Navigation System Using Wearable Sensors Daisuke Takada1, Takefumi Ogawa2, Kiyoshi Kiyokawa3, and Haruo Takemura3 1
Graduate School of Information Science and Technology, Osaka University 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan [email protected] 2 Information Technology Center, The University of Tokyo 2-11-16 Yayoi, Bunkyo, Tokyo 113-0032, Japan [email protected] 3 Cybermedia Center, Osaka University 1-32 Machikaneyama Toyonaka, Osaka 560-0043, Japan {kiyo,takemura}@ime.cmc.osaka-u.ac.jp
Abstract. We have been developing a networked wearable AR system that determines the user's current context to provide appropriate annotations. This system allows for annotation management based on the relationship between annotations and the real environment along with data transfer routines that dynamically calculate annotations' priority to transfer just enough data from the annotation server to the wearable PC worn by the user. Furthermore, this system recognizes the user's activity to predict the kind and level of detail of annotations the user needs at a given time. This information can be used for dynamic annotation filtering and switching of rendering modes.
1 Introduction
There is much active work in AR (Augmented Reality) technology, which overlays computer-generated information onto the real environment using display devices like HMDs (Head Mounted Displays). Among AR's various applications, wearable AR systems have been studied for a variety of purposes, including navigation in a real environment [1]. To share annotation data among users and to allow information providers to update annotation data from time to time, it is possible to employ a networked wearable AR system that follows a client-server model. In such a system, a filtering technique for annotations is necessary because the client has to use an unreliable network, such as PHS, and all of the server's information cannot be received at once. Although different kinds of annotation information (i.e., shops, transit, and public services) and different kinds of annotation data types (i.e., simple text, multimedia content like images and video, and detailed descriptions) are available in the system, having the user manually select what kind of annotation information and what annotation data type he or she wants is a burden on the user. Therefore, a technique to detect this automatically from the user's current situation (context) is necessary.
In this paper, we present a networked wearable AR system that estimates the user's current context to provide appropriate annotations. This system allows for annotation management based on the relationship between annotations and the real environment along with data transfer routines that dynamically calculate annotations' priority to transfer just enough data from the annotation server to the wearable PC worn by the user. Furthermore, this system recognizes the user's activity to predict the kind and level of detail of annotations the user needs at a given time. This information can be used for dynamic annotation filtering and switching of rendering modes. After a discussion of related works in Section 2, overview and details about the proposed navigation system and techniques used in the system are presented in Section 3. Finally, in Section 4, a summary and future directions are detailed.
2 Related Works
A number of studies have been conducted on distribution and management techniques for annotation information shared among multiple users and on context recognition using wearable sensors.
2.1 Priority Evaluation for Annotation Transfer
Many evaluation methods for objects' sending priority have been investigated for use in distributed virtual environments. Beeharee et al. [2] assign higher priority to objects that are more visually attractive, using color, contrast, orientation, speed, and other characteristics of the objects. Park et al. [3] evaluate sending priority by simultaneously using each user's individual interest in objects and the total interest of all users in objects. Chan et al. [4] and Li et al. [5] assume the user's environment to be a desktop-style virtual environment. They predict the user's movement in virtual space using the user's position in virtual space and the movement of the user's mouse on the desk, and evaluate priority using the results of the prediction. However, these methods are focused on virtual environments only. In an AR system, data includes annotations on the real world, so it is not appropriate to prioritize by visual attractiveness as in [2]. Furthermore, networked wearable AR presents further problems related to the movement of the user's viewing direction. In a networked wearable AR system, the movement of the user's head directly affects the user's viewing direction, and the movement will often be fast, especially when the user looks around, compared to the movement in a desktop-type distributed virtual environment, where movement is controlled by the user's mouse. [4] and [5] use the user's position and orientation to evaluate priority, and the weights for each parameter are constants, while in networked wearable AR the weights must be changed dynamically in order to adapt to the user's context. A filtering method suited to mobile AR has been proposed by Julier et al. [6]. Their method considers the task which the user is performing to evaluate the priority of objects. However, this method has some problems. One is that an expert for each task must choose the elements which should be considered for the task, and another is that the parameters used to evaluate priority must be set manually.
2.2 Hierarchical Annotation Management
NPSNET [7] divides geometry data into grids of fixed size and manages the data in a quad-tree to achieve efficient data management in a virtual environment. Although there are many techniques to construct databases hierarchically, several of these methods simply divide the data into grids of fixed size and do not consider the relation between the data and the real environment. Kolbe et al. [8] proposed a hierarchical data structure to express a city environment. In this data structure, each object can have associated semantic information such as "building", "bridge", and "monument". However, filtering by this semantic information has not been considered.
2.3 Context Estimation
There are a number of studies on context recognition using wearable sensors, with accelerometers being a popular sensor for performing the recognition. Kern et al. [9] use 12 body-worn 3D accelerometers to obtain the user's activity. Their system can detect 8 kinds of user behavior --- "sitting", "standing", "walking", "upstairs", "downstairs", "shake hands", "write on board", and "keyboard typing". Ward et al. [10] use two microphones and accelerometers to achieve continuous context recognition of a user's assembly tasks.
3 Proposed System
3.1 System Overview
Figure 1 gives an overview of a networked wearable AR system. A networked wearable AR system consists of a server that stores and manages all of the annotation information in a database and wearable PCs that are worn by users and work as clients. The annotation database on a server can be updated by a system administrator and users with wearable PCs. The wearable PC measures the user's position and orientation, sends this information to the server, receives appropriate annotation data from the server based on the user's context, and renders the received data accordingly.
Fig. 1. Overview of a networked wearable AR system
Fig. 2. Detailed overview of the system (devices: GPS and orientation sensor, wearable sensors, HMD; wearable PC: position detection, context, cache, rendering and networking modules; server: networking and database modules with the annotation database)
Figure 2 shows the data flow in the system. On the client, the position detection module samples the user's position and orientation using GPS and a gyro sensor. The context recognition module samples various user states from wearable sensors and estimates the user's context. The networking module sends this information to the server. The server uses the user's position, orientation, and context to retrieve appropriate annotations using the database module. The networking module sends this annotation information to the client, and finally, the rendering module renders the received annotations on the user's HMD.
3.2 Hierarchical Data Structure Considering the Real Environment
This section describes the method for managing annotations on the server. The areas covered by the server are hierarchically subdivided into small areas, and each area corresponds to a space in the real environment. Annotations belong to one of these areas. Figure 3 shows a hierarchical structure of areas and annotations.
Fig. 3. Hierarchical data structure (example hierarchy: campus → buildings such as the Cybermedia Center and the Library → floors → rooms such as a seminar room or experiment room → annotations such as a projector, TV, bookshelf or dome display)
Each area has information about the real space corresponding with the area. According to the definition of an area, a category for the area (e.g., "Room," "Floor," "Building," "Area," "Town," "City," "State/Prefecture") can be set. These categories are used to find annotation information. Each user can select a yes-or-no policy for each category. Eight different policies can be used:
1. When the user is in the area:
   a. add annotations in the area into the search result
   b. traverse to each child area for a search
   c. traverse to the parent area for a search
2. When the area is traversed to from a child area:
   a. add annotations in the area into the search result
   b. traverse each child area for a search
   c. traverse the parent area for a search
3. When the area is traversed to from the parent area:
   a. add annotations in the area into the search result
   b. traverse each child area for a search
For example, when a user is in a room and needs annotations which are only in that room or in the buildings around the area, the user can set "yes" for policies 1(a), 1(c), and 2(c) for the categories "Room" and "Floor"; "yes" for 2(a), 2(c), and 3(a) for "Building"; and "yes" for 2(b) for "Area" to restrict the target area when searching for annotations. A sketch of this policy-driven search is given below.
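The following sketch illustrates how such per-category policies could drive the hierarchical search. The Area class, the policy encoding and the function names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the policy-driven hierarchical annotation search.
# Policies per category: flags 1a/1b/1c, 2a/2b/2c, 3a/3b as described above.

class Area:
    def __init__(self, name, category, annotations=None, children=None):
        self.name, self.category = name, category
        self.annotations = annotations or []
        self.children = children or []
        self.parent = None
        for child in self.children:
            child.parent = self

def search(area, policies, mode="in"):
    """mode: 'in' (user is here), 'from_child', 'from_parent'."""
    rules = policies.get(area.category, {})
    prefix = {"in": "1", "from_child": "2", "from_parent": "3"}[mode]
    results = []
    if rules.get(prefix + "a"):                              # add this area's annotations
        results += area.annotations
    if rules.get(prefix + "b"):                              # descend into child areas
        for child in area.children:
            results += search(child, policies, "from_parent")
    if mode != "from_parent" and rules.get(prefix + "c"):    # climb to the parent area
        if area.parent is not None:
            results += search(area.parent, policies, "from_child")
    return results

# Example: annotations only in the current room plus the current floor.
room = Area("Seminar Room", "Room", annotations=["Projector", "TV"])
floor = Area("4F", "Floor", children=[room])
policies = {"Room": {"1a": True, "1c": True}, "Floor": {"2a": True}}
print(search(room, policies))   # ['Projector', 'TV'] plus any floor-level annotations
```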
3.3 Dynamic Priority Control Technique
The system receives a set of annotations that satisfy the search criteria and then evaluates the transfer priority for each annotation, sending the annotations according to this priority. Figure 4 shows the position of user A and annotation O. In this case, the priority of O for user A, p(O,A), is defined by expression (1) below.
Fig. 4. Parameters used for priority evaluation (the figure illustrates d(O,A), θ(O,A) and h(O,A); in the depicted example, h(O,A) = 2)
p(O,A) = (1 − h(O,A)/H(A)) · [ α·(1 − d(O,A)/D(A)) + (1 − α)·(1 − θ(O,A)/π(A)) ]        (1)
where d(O,A) is the distance between the annotation O and the user A, θ(O,A) is the angle between the user's viewing direction and the line between O and A, and h(O,A) is the distance between O and A in the hierarchical structure of the database; α is a constant in the interval 0 < α < 1.
When α is small, annotations in front of the user are prioritized, and when α is large, annotations closer to the user are prioritized. In addition, when the annotation's area is farther from the user's current area in the tree structure, the priority becomes lower, because each area should correspond to a semantic space in the real environment as described above. D(A), π(A), and H(A) are constants used to normalize the variables d(O,A), θ(O,A), and h(O,A), and are given by the user when the connection is established. When the user's viewing direction is moving quickly and spread over a large area, it is assumed that the user is looking around. When the user's viewing direction is stable, it is assumed that the user needs the information which is in the current direction. In this case, the priority depending on the user's behavior can be evaluated by relating the movement of the user's viewing direction to α, and equation (1) is expanded as follows:
p(O,A) = (1 − h(O,A)/H(A)) · [ s·(1 − d(O,A)/D(A)) + t·(1 − θ(O,A)/π(A)) ]        (2)
s and t are defined as:
s = S(A)·α + (1 − S(A))·(w(A)/W(A))
t = S(A)·(1 − α) + (1 − S(A))·(1 − w(A)/W(A))
where w(A) is the angular velocity that expresses the movement of the user's viewing direction. When w(A) is large, annotations that are closer to the user are prioritized, and when w(A) is small, annotations that are closer to the user's viewing direction are prioritized by equation (2). W(A) is a constant for normalizing w(A) and is given by the user when the connection is established, as with the other constants. It might not be required to determine the weights for d(O,A) and θ(O,A) automatically using the angular velocity. For such a case, there is the constant S(A) in equation (2); S(A) lies in the interval 0 < S(A) < 1. Figure 5 shows the change in priority when the user's viewing direction is moving. When the angular velocity is large, annotations that are closer to the user are prioritized, and when the angular velocity is small, annotations that are closer to the user's viewing direction are prioritized. When the angular velocity is in the middle, annotations that are closer to the user's viewing direction are prioritized, but the smaller the distance between the annotation and the user is, the higher its priority becomes.
Fig. 5. Difference in priority. The distance to the user is emphasized when ω(A) = 1.0·W(A), both distance and angle are emphasized when ω(A) = 0.5·W(A), and the angle to the viewing direction is emphasized when ω(A) = 0.
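A compact sketch of how the priority of equations (1) and (2) could be computed is shown below; the function signature, argument names, and default values are illustrative assumptions rather than the authors' implementation.

```python
def priority(d, theta, h, omega, D, PI, H, W, S=0.5, alpha=None):
    """Transfer priority p(O, A) of an annotation O for user A.

    d      distance between O and A              (normalized by D)
    theta  angle between viewing direction and O (normalized by PI)
    h      hierarchical distance in the database (normalized by H)
    omega  angular velocity of the viewing direction (normalized by W)
    S      bias constant in (0, 1); if alpha is given, the distance
           weight is fixed as in equation (1) instead of being derived
           from the angular velocity as in equation (2).
    """
    if alpha is None:
        # Equation (2): blend the two weights from the angular velocity.
        s = S * omega / (S * omega + (1.0 - S) * (W - omega))
        t = 1.0 - s
    else:
        # Equation (1): fixed trade-off between distance and angle.
        s, t = alpha, 1.0 - alpha
    hier = 1.0 - h / H          # lower priority deeper in the area tree
    dist = 1.0 - d / D          # closer annotations score higher
    angle = 1.0 - theta / PI    # annotations in front score higher
    return hier * (s * dist + t * angle)
```

With omega close to W the distance term dominates (the user is looking around), and with omega close to 0 the angle term dominates, matching the behavior shown in Figure 5.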
3.4 Context Estimation
We focused on outdoor user contexts, specifically five contexts: "sitting", "standing", "walking", "running", and "biking". When the user is sitting, the user's thighs are horizontal, and when the user is standing, they are vertical. When the user is walking, running, or biking, the angle of the user's thighs changes periodically. Furthermore, the period of this movement is shorter when the user is running than when the user is walking, and shorter when the user is walking than when the user is biking. Therefore, we used two triaxial accelerometers attached to the fronts of the user's thighs to sample the static and dynamic acceleration from the front of the thigh to the back. Figure 6 shows the workflow of the context recognition mechanism. A short-term Fourier transform (STFT) is employed to compute the signal energy distribution of the thigh acceleration in the joint time-frequency domain, and a support vector machine (SVM) is employed to classify the resulting power spectrum. We trained the SVM in advance using labeled training signals. Finally, we considered the tendency of a change from one state to another. For example, a change of context from "walking" to "running" is likely, whereas it is unlikely that a user will go directly from "running" to "biking". In this case, the transition tendency from "walking" to "running" should be set higher than the one from "running" to "biking". This technique can reduce recognition errors.
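The pipeline described above can be sketched roughly as follows; the choice of scipy's STFT, scikit-learn's SVC, the window length, the sampling rate, and the transition-tendency weights are all illustrative assumptions, not the authors' actual parameters.

```python
import numpy as np
from scipy.signal import stft
from sklearn.svm import SVC

CONTEXTS = ["sitting", "standing", "walking", "running", "biking"]

def features(accel_window, fs=50):
    """Joint time-frequency energy of one window of thigh acceleration
    (shape: samples x channels; two triaxial accelerometers = 6 channels)."""
    feats = []
    for ch in range(accel_window.shape[1]):
        _, _, Z = stft(accel_window[:, ch], fs=fs, nperseg=64)
        feats.append(np.abs(Z).mean(axis=1))       # average power spectrum
    return np.concatenate(feats)

def train(windows, labels):
    X = np.array([features(w) for w in windows])
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X, labels)
    return clf

# Transition tendencies: likely changes (e.g. walking -> running) get a
# higher weight than unlikely ones (e.g. running -> biking).
TENDENCY = {("walking", "running"): 1.0, ("running", "biking"): 0.1}

def recognize(clf, window, previous):
    probs = clf.predict_proba(features(window).reshape(1, -1))[0]
    scores = {}
    for ctx, p in zip(clf.classes_, probs):
        w = 1.0 if ctx == previous else TENDENCY.get((previous, ctx), 0.5)
        scores[ctx] = p * w
    return max(scores, key=scores.get)
```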
Fig. 6. Context recognition workflow. Training phase: sensor data → STFT → power spectrum → SVM training → user context model. Recognition phase: sensor data → STFT → power spectrum → classification with the trained model → user context.

3.5 Context-Aware Annotation Visualization
The context recognition module outputs five contexts: "sitting", "standing", "walking", "running", and "biking". Our system changes the appearance of the annotations based on the user's current context. When the user is "running" or "biking", it is assumed that the user is in a hurry and cannot focus on annotations; therefore, the system does not render any annotations.
Fig. 7. Change of annotation visualizations: (a) context: sitting; (b) context: standing / walking; (c) context: running / biking
When the user is "walking", it is considered that the user is not in a hurry and can check annotations, so the system renders them. When the user is "sitting", it is considered that the user's view will not change suddenly and that the user wants detailed information about the view, so the system additionally renders a detailed description of the object in front of the user. Figure 7 shows the result of context-aware annotation visualization. Images are used as annotations, and detailed descriptions are displayed at the bottom-right corner of the screen. As shown in Figure 7(a), both the annotation and the detailed description are rendered when the user is sitting. When the user is standing or walking, the detailed description is not rendered, as shown in Figure 7(b). When the user is running or biking, neither the annotation nor the detailed description is rendered, as shown in Figure 7(c).
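The visualization rules of this section amount to a small mapping from the recognized context to the elements that are drawn; the sketch below summarizes them, with the function name and returned flags chosen only for illustration.

```python
def annotation_view(context):
    """Return which elements to render for the current user context."""
    if context in ("running", "biking"):     # user is in a hurry: show nothing
        return {"annotations": False, "detail": False}
    if context in ("standing", "walking"):   # glanceable annotations only
        return {"annotations": True, "detail": False}
    if context == "sitting":                 # stable view: show detail as well
        return {"annotations": True, "detail": True}
    return {"annotations": False, "detail": False}
```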
4 Conclusion In this paper, we presented a context-aware wearable AR navigation system. In this system, we employed a data structure for efficient filtering of annotations, a dynamic priority control technique using the movement of the user's viewing direction, a context recognition mechanism, and a view switching mechanism depending on the user's context. Annotations handled by the proposed method belong to areas that correspond to the real environment, and the proposed system manages annotations hierarchically using a tree structure. The server filters annotations using this information and the policy decided by each user. The priority control technique considers that the user's
head movement varies from slow to fast in a wearable AR system and changes the priority of each annotation based on the distance between the user and the annotation and on the angle between the user's viewing direction and the line from the user to the annotation, depending on the angular velocity of the viewing direction. Context recognition estimates five different states using two accelerometers on the user's thighs, and the recognized context is used to change the appearance of annotations. At present, we have implemented these mechanisms and tested them only in simulation; future work includes an experiment to confirm their effectiveness in a real environment.
References
1. Feiner, S., MacIntyre, B., Hollerer, T.: A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment. In: Proc. of the 1st International Symposium on Wearable Computers (ISWC 1997), pp. 208–217 (1997)
2. Beeharee, A.K., West, A.J., Hubbold, R.: Visual Attention Based Information Culling for Distributed Virtual Environments. In: Proc. of the ACM Symposium on Virtual Reality Software and Technology (VRST 2003), October 2003, pp. 213–222 (2003)
3. Park, S., Lee, D., Lim, M., Yu, C.: Scalable Data Management Using User-Based Caching and Prefetching in Distributed Virtual Environments. In: Proc. of the ACM Symposium on Virtual Reality Software and Technology (VRST 2001), pp. 121–126 (2001)
4. Chan, A., Lau, R.W.H., Ng, B.: A Hybrid Motion Prediction Method for Caching and Prefetching in Distributed Virtual Environments. In: Proc. of the ACM Symposium on Virtual Reality Software and Technology (VRST 2001), pp. 135–142 (2001)
5. Li, T.Y., Hsu, W.H.: A data management scheme for effective walkthrough in large-scale virtual environments. The Visual Computer 20(10), 624–634 (2004)
6. Julier, S., Lanzagorta, M., Baillot, Y., Rosenblum, L., Feiner, S., Hollerer, T., Sestito, S.: Information Filtering for Mobile Augmented Reality. In: Proc. of the International Symposium on Augmented Reality (ISAR 2000), pp. 3–11 (2000)
7. Macedonia, M.R., Zyda, M.J., Pratt, D.R., Barham, P.T., Zeswitz, S.: NPSNET: A Network Software Architecture for Large Scale Virtual Environments. MIT Press Presence 3(4), 265–287 (1994)
8. Kolbe, T.H., Groger, G.: Towards Unified 3D City Models. In: Proc. of the ISPRS Comm. IV Joint Workshop on Challenges in Geospatial Analysis, Integration and Visualization II (2003)
9. Kern, N., Schiele, B.: Recognizing context for annotating a live life recording. Personal and Ubiquitous Computing (2005)
10. Ward, J.A., Lukowicz, P., Troester, G., Starner, T.E.: Activity Recognition of Assembly Tasks Using Body-Worn Microphones and Accelerometers. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(10), 1553–1567 (2006)
11. Nakamura, T., Ogawa, T., Kiyokawa, K., Takemura, H.: User Context Recognition for Distinguishing Standing Inside or Outside a Train. In: Proc. of the 11th Multimedia, Distributed, Cooperative, and Mobile Symposium (DICOMO 2007), pp. 1496–1501 (2007) (in Japanese)
12. Takada, D., Ogawa, T., Kiyokawa, K., Takemura, H.: A Hierarchical Annotation Database and a Dynamic Priority Control Technique of Annotation Information for a Networked Wearable Augmented Reality System. In: Proc. of the 18th International Conference on Artificial Reality and Telexistence (ICAT 2008), pp. 226–233 (2008)
Emotional Smart Materials Akira Wakita, Midori Shibutani, and Kohei Tsuji Keio University, Faculty of Environment and Information Studies, 5322 Endo, Fujisawa-shi, Kanagawa 252-8520, Japan {wakita,lestari,s07626kt}@sfc.keio.ac.jp
Abstract. To build affective and emotional interaction, we focus on the materials that interfaces are made of. We introduce two non-emissive displays as examples of our concept. Fabcell is a fabric pixel that changes its color in a non-emissive manner; a matrix arrangement of Fabcells enables an information display with the texture of fabric. Jello Display is composed of gel blocks that are moist, cold, and soft, and their unique look and feel enables an organic information display. These kinds of haptic and organic information displays can add rich affectivity to the artifacts used in our everyday life. Keywords: smart material, affective computing, ubiquitous computing, tangible interface.
1 Introduction
Recently, applications of ubicomp and tangible user interfaces have kept expanding their domain, and a variety of smart objects have been created. The interaction accomplished by these kinds of smart objects will cover not only generalized, cold functions but also affective, warm functions. Thus, human-computer interaction keeps evolving from a dry to a wet manner. To fulfill this requirement, huge numbers of sensors have been developed and distributed on the market as readily available electronic parts. Moreover, easy-to-use physical computing toolkits such as Gainer [1], Arduino [2], and Wiring [3] have been developed. These rapid-prototyping environments enable creators of digital content to shift their field of creation from the cyber world to the physical world. Sensors can convert many physical phenomena into electric signals, for example light, pressure, sound, temperature, acceleration, touch, and proximity. The combination of sensor fusion and artificial intelligence techniques can be applied to recognizing and interpreting human emotion [4]. On the other hand, there exists only a limited number of actuators. Compared with sensors, there are really only a few output phenomena, for example light, movement, and temperature. This is one of the reasons why it is difficult to accomplish rich and emotional communication using computer-embedded artifacts: the expressiveness of actuators is not enough to strike a chord with users. To address this problem, we pay attention to the materials of actuators. We expect that calm and embodied interfaces can be built by utilizing materials that are soft and warm, both mentally and physically.
2 Emissive and Non-emissive
As mentioned above, most works use emissive materials, i.e., LEDs and projected images, for output. As a consequence, styles of interaction are biased, because these works tend to push information to users as a stimulus. We have to discuss the natural inclination of emissive and non-emissive materials. Generally, emissive materials are suitable for showing urgency or degrees of risk, and are not suitable for conveying emotions or hospitality. For instance, hospital emergency rooms are compatible with emissive materials because they push information frequently and directly. On the other hand, obstetrics and gynecology wards are more compatible with non-emissive materials: pushing information through blinking lights or high-luminance displays is not appropriate for spaces in which high hospitality is required. Let us also think about the relationship between scale and light emission. Large-scale objects such as cities and vehicles contain many emissive functions: street lamps and billboards glitter at night, and an automobile seems like a composition of pushing functions, for example headlight illumination, noise, and the horn. On the other hand, small-scale objects such as house interiors and clothing contain many non-emissive functions: the wallpaper of a room, a bed, a sofa, underwear, and so on. The closer artifacts are used to our bodies, the more non-emissiveness is required. It is also observed that the closer to our bodies, the softer the material becomes, both physically and mentally. We humans have not acquired a culture of wearing emissive materials in our everyday life; the cell phone may be the first such artifact in our history. Hence, it will take a long time to overcome the social, cultural, and psychological problems of emissive artifacts.
3 Emotional Displays
3.1 Fabcell – A Fabric Pixel
Based on the above-mentioned ideas, we are developing material-oriented smart actuators for emotive interaction. The first example is Fabcell [5], a kind of fabric pixel. A Fabcell consists of conductive yarns and heat-sensitive ink. The surface temperature of the textile changes when the conductive yarns are energized, and a unique color corresponding to each temperature emerges on the surface of the Fabcell. The color changes from red through green to blue. Digital images can be displayed by arranging Fabcells in a matrix. Digital images and animations on a fabric texture are a quite unique and warm expression. By utilizing Fabcells, any kind of artifact made of fabric can be turned into an interactive medium showing rich and warm information.
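As a rough sketch of the Fabcell principle (regulating the heating current through the conductive yarns until the thermochromic ink reaches the temperature band of the desired color), one could model a single cell as follows; the temperature bands, the proportional control, and all numeric values are invented for illustration and are not taken from the actual Fabcell hardware.

```python
# Conceptual model of one Fabcell: the heat-sensitive ink shows a color that
# depends on the yarn temperature, so displaying a color means regulating the
# heating current toward the matching temperature.  All numbers are assumptions.
COLOR_TO_TEMP_C = {"red": 31.0, "green": 35.0, "blue": 39.0}

def heating_duty(current_temp_c, target_color, gain=0.1):
    """Proportional control of the PWM duty cycle (0..1) driving the yarn."""
    error = COLOR_TO_TEMP_C[target_color] - current_temp_c
    return max(0.0, min(1.0, gain * error))

def displayed_color(temp_c):
    """Color the ink would show at a given temperature (nearest band)."""
    return min(COLOR_TO_TEMP_C, key=lambda c: abs(COLOR_TO_TEMP_C[c] - temp_c))
```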
Fig. 1. Fabcell – A Fabric Pixel. Colors change from red, green to blue
Fig. 2. Jello Display / Keyboard
3.2 Jello Display and Keyboard
The second example is the Jello Display, made of gelled pixels. Each gelled pixel consists of a block of Japanese agar and photochromic ink. An ultraviolet LED attached to the back of the gelled pixel is turned on and off, enabling the photochromic ink to show its predefined color. The Jello Display has a unique look and feel, shaking like jelly and giving off a semi-transparent brilliance. It shows not only digital images but also qualitative codes through its unique texture. If an agar block is shaped like a key and a touch sensor is attached at the bottom, it becomes a keyboard that is moist, cold, and soft. By utilizing the Jello Display, wet and organic interaction may be realized.
4 Calm and Embodied
Mark Weiser used two impressive keywords to explain the vision of ubicomp: "calm technology" and "embodied interaction" [6]. The former means that the existence of computers becomes invisible in our living space; unfortunately, most applications of ubicomp and TUI still use emissive materials such as LEDs and LCDs. This situation turns our living space into an extraordinary venue and disturbs an affective and emotional ambient atmosphere. The latter means that computers connected to the network become smaller and are embedded in every object in our living space. However, while modern manufacturing processes utilize rigid and solid materials such as plastics and metals, soft materials such as gels and textiles remain untouched by ubicomp and TUI. Because these kinds of materials are well suited for affective and emotional information display, cultivating such unused materials holds much potential.
References
1. Gainer, http://gainer.cc/
2. Arduino, http://www.arduino.cc/
3. Wiring, http://www.wiring.org.co/
4. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
5. Shibutani, M., Wakita, A.: Fabcell: Fabric Element. In: SIGGRAPH 2006 Emerging Technologies (2006)
6. Weiser, M.: The Computer for the 21st Century. Scientific American 265, 94 (1991)
Novel Stretchable Textile-Based Transmission Bands: Electrical Performance and Appearance after Abrasion/Laundering, and Wearability Yoonjung Yang and Gilsoo Cho Department of Clothing and Textiles, Yonsei University, Seoul 120-749, Korea [email protected], [email protected]
Abstract. In this paper, we (1) compare the electrical performances and appearance changes of two textile-based transmission bands after repeated abrasion and laundering, and (2) evaluate their wearability with MP3 player jackets. The bands were made with non-stretchable Teflon-coated stainless steel yarns, or stretchable silicon-coated stainless steel yarns and spandex. The electrical resistance of the bands after repeated abrasion and laundering was measured with a RCL (resistance capacitance inductance) meter. The appearance changes were observed using a digital microscope. For wear tests, five subjects evaluated the degree of convenience while doing specific actions and other wear sensations using questionnaires with a 7-point Likert-type scale. Both non-stretchable and stretchable transmission bands were evaluated as excellent on electrical performances. Appearance changes after abrasion were tolerable, and there were neither exposure nor disconnection of stainless steel yarns. Convenience and other wear sensations for the MP3 player jacket using stretchable silicon-coated bands were evaluated as better than non-stretchable Teflon-coated bands. Keywords: stretchable textile-based transmission band, silicon-coated stainless steel multifilament yarn, abrasion, laundering, electrical resistance, image analysis, MP3 player jacket, wear sensation.
1 Introduction A typical smart clothing system comprises five basic functions–interaction, communication, energy supply, data management, and processing–and textile-based transmission lines are an effective solution for the communication component [1]. Textile-based transmission lines functioning in smart clothing like flat cables in standard electronic devices are e-textiles. They integrate the functions of electric components and the characteristics of woven fabrics. Thus, the properties of both domains should be considered when developing e-textiles. Early textile-based transmission bands generally use metallic multifilament yarns covered with regular threads as signal transmission lines [2, 3, 4]. This structure poses problems for safety and operation, such as the risk of causing interferences between neighboring lines during use and the deterioration of electric insulation. As a solution,
we had developed a textile-based transmission band using Teflon-coated stainless steel multifilament yarns. The physical and electrical performances of the band were improved [5]. Until now, research about textile-based transmission bands has focused on verifying the stability of signal transmission, preventing interferences between neighboring lines, and improving the durability in physical and electrical aspects [5, 6, 7, 8]. However, to commercialize textile-based transmission bands, wearability as well as physical and electrical performance must be satisfactory. When we used preexisting textile-based transmission bands to develop smart clothing, we had to use bands longer than the required length, because they were not elastic, in order to accommodate the wearer's body movements. Accordingly, we need to develop a textile-based transmission band with elasticity that keeps the wearer's movements comfortable without using additional length. Preexisting textile-based transmission bands using Teflon-coated stainless steel multifilament yarns were too rigid to be bent into the curved shapes needed for a stretchable construction. Therefore, we developed a new type of textile-based transmission line using silicon-coated stainless steel multifilament yarns, described hereafter.
2 Experiments
This study was carried out in two phases. One was investigating the electrical resistance and appearance changes of two band types after abrasion and laundering. The other was evaluating the wearability of MP3 player jackets using the two types of bands for electrical signal transmission.
2.1 Phase I: Electrical Performance and Appearance
Specimen. The textile-based transmission bands were woven with yarns of polyester filaments. When weaving, six strands of signal transmission lines were placed 2.55 mm from each other in the warp direction of the band, and the width of the band was set at 19 mm to allow easy connections with regular connectors. Two types of signal transmission lines were used to manufacture the bands. The characteristics of the bands are shown in Table 1. Type B was stretchable whereas Type A was non-stretchable.
Table 1. Characteristics of Specimen
Type A: base yarn 100% polyester filament; conductive yarn 100% stainless steel filament, ø = 63 ㎛, 20 filaments, Teflon coating (wall thickness 0.18 mm, outer diameter 0.7 mm)
Type B: base yarn 100% polyester filament; conductive yarn 100% stainless steel filament, ø = 11 ㎛, 180 filaments, silicon coating (wall thickness 0.20 mm, outer diameter 0.7 mm)
To make the stretchable textile-based transmission band, the coated stainless steel yarns were curled and then wrapped with spandex yarns to fix the curling, so that the band remains stretchable despite repeated extension and contraction.
Abrasion Tests. To evaluate the abrasion durability of the bands, we used equipment and procedures recommended by general textile evaluation standards. The bands were tested following the inflated diaphragm method (ASTM D3786) [9] using a Universal Textile Abrasion Tester (Intec Co. Ltd., SEC. 28 NO. AR-1). The abrasion repetitions were set at 500, 1,000, and 1,500 cycles, and the test was repeated three times for each type of sample. The test was conducted by KATRI (Korea Apparel Testing & Research Institute) [10].
Laundering Tests. We laundered the bands up to 10 times using a drum washer (Samsung, SEW-5HR120) at 30 RPM, with a water temperature of 20 ℃ and 70 g of detergent for 5 kg of total laundry weight with dummies, according to the testing conditions of ISO 6330 [11]. A single washing cycle consisted of 15 min of washing, 13 min of rinsing, and 13 min of spin-drying.
Measurement of Electrical Resistances. For electrical performance, we measured the electrical resistances of the bands that went through abrasion and laundering. We verified whether the electrical resistance of the transmission lines was maintained after the treatments using an RCL meter (Fluke, PM6304). Checking the six strands of transmission lines within each band, we measured the resistance of each line and averaged the results, stated in Ω/cm.
Image Analysis. The appearance of the bands before and after abrasion and laundering was observed with a digital microscope (ANMO, AM-311S) with 10 times magnification.
2.2 Phase II: Wearability Test
Clothing Prototype. Two identical MP3 player jackets were prepared with different textile-based transmission lines. As shown in Fig. 1, textile-based transmission bands were used in this clothing to connect a textile-based keypad placed on the left sleeve to an MP3 player in the left inner pocket.
Fig. 1. MP3 Player Jacket and Inner Pocket of the Jacket
Subjects. We selected five female university students aged 25 to 28 years as subjects. They had similar body sizes and were in the Korean standard size range.
Wear sensation measurements. To evaluate the jackets according to the type of textile-based transmission line, the two garments were presented to the subjects in random order. The subjects rated the convenience of specific actions, the feeling of the presence of the MP3 player and band, hardness, and overall satisfaction for each garment. The questionnaire consisted of six questions using a 7-point Likert-type scale. Convenience was evaluated with three questions about representative actions for MP3 player jackets: connecting the MP3 player to the band, and putting the MP3 player into and taking it out of the inner pocket of the jacket. Three further questions, about annoyance due to feeling the presence of the system, about its hardness, and about the overall sensation, were developed in a previous study [12].
3 Results and Discussion 3.1 Changes of Electrical Resistance after Abrasion and Laundering When we examined the effect of abrasion and laundering on electrical resistance, the bands appeared excellent before and after physical stress. As shown in Fig. 2 and 3, the electrical resistance of Type A and B were very low, revealing that electric current was flowing well even after abrasion and laundering were applied to the bands.
Fig. 2. Electrical Resistance Before and After Abrasion
Thus, the two types of textile-based transmission bands maintained their conductivity and showed excellent electrical durability. Therefore, both Teflon and silicon-coated stainless steel multifilament yarns can be used as signal transmission lines. As we used Teflon-coated stainless steel multifilament yarns and silicon-coated stainless steel multifilament yarns manufactured by companies and commercially available, the electrical resistance of Type A and B differ.
Fig. 3. Electrical Resistances Before and After Laundering
3.2 Appearance Changes after Abrasion and Laundering
The appearances of the bands before and after abrasion are presented in Table 2. After repeated abrasion, there was neither exposure of the stainless steel yarns outside the Teflon or silicon coating nor disconnection, for either type of band. However, as the abrasion cycles increased, more of the transmission lines protruded from the band and the base polyester fabric was partly disconnected. Even though disconnection of the polyester fabric does not influence electrical performance, it can influence wear sensation. Therefore, manufacturing methods that fix the transmission lines tightly to the base fabric must be developed. Both types of coated transmission lines remained unexposed outside the polyester fabric after laundering. As shown in Table 3, Type A had almost the same appearance before and after washing. Although the curvy shape of Type B became irregular and the edges of the band slightly shrank, its appearance barely differed after washing.

Table 2. Microscopic Images Before and After Abrasion (Types A and B, before abrasion and after 500, 1,000, and 1,500 abrasion cycles)
Table 3. Microscopic Images Before and After Laundering (Types A and B, before laundering and after being laundered 10 times)
Fig. 4. Wearability Evaluation of an MP3 Player Jacket According to Band Types
3.3 Wearability
The results of the wearability evaluations are shown in Fig. 4. Both the Teflon-coated textile-based transmission line (Type A) and the silicon-coated line (Type B) were rated as convenient for connecting the MP3 player (1.2; 1.4). Four of the five subjects gave the same score to Type A and Type B for this action. Therefore, the convenience of connecting an MP3 player to the band was barely influenced by the elasticity of the textile-based transmission band. Type A was evaluated as less convenient than Type B for putting the MP3 player into (-1.6; 1.04) and taking it out of (1.0; 1.6) the inner pocket of the jacket. Putting the band into the pocket was rated as especially uncomfortable for Type A. We suppose that because Type A was non-stretchable and rigid, the band could not be easily moved into the pocket, in contrast to Type B. When wearing the MP3 player jacket, all subjects felt annoyance due to feeling the presence of the system (-1.2; -0.2) and its hardness (-1.6; -0.6) regardless of the band type. However, the jacket using Type A was evaluated more negatively than the one using Type B. The silicon-coated transmission lines used in the Type B band were soft and stretchable; in contrast, the Teflon-coated ones used in Type A were rigid and non-stretchable. Therefore, when the Type A band was put into the pocket, the volume of the inner pocket increased. This means that the wearer feels not only the MP3 player but also additional annoyance because of the band, which makes the wearer significantly feel
the presence of the system and its hardness. However, the overall wear sensation was not evaluated negatively (0.0; 0.8), which means that this kind of sensation was not enough to make the subjects uncomfortable. In conclusion, the jacket using the Type B band for signal transmission was evaluated as more convenient and superior regarding wearability. Therefore, to improve the wearability of an MP3 player jacket, it is more efficient to use the stretchable textile-based transmission band. Moreover, using soft and flexible materials such as silicon to build the stretchable structure also improves wearability.
4 Conclusions
In this study, we developed stretchable textile-based transmission bands using silicon-coated stainless steel multifilament yarns to improve the wearability of smart clothing. The changes in appearance of the Teflon-coated and silicon-coated transmission lines, both based on polyester fabric, before and after abrasion and laundering showed no exposure or disconnection of the stainless steel yarns. In addition, the electrical resistance was perfectly maintained in both types of bands. We proved that the electrical performance and the abrasion and laundering durability of silicon-coated transmission lines are as good as those of Teflon-coated lines, even though the silicon coating is soft and flexible. The evaluations of the convenience and wearability of MP3 player jackets using non-stretchable transmission bands based on Teflon-coated stainless steel yarns and stretchable bands based on silicon-coated stainless steel yarns showed that the stretchable band was more convenient and comfortable. Therefore, we developed better signal transmission lines that possess good physical durability and electrical performance as well as improved wearability.
Acknowledgements. The authors would like to thank the Korean Ministry of Knowledge Economy for funding this effort.
References
1. Tao, X.: Wearable electronics and photonics, ch. 1. Woodhead Publishing Ltd., Cambridge (2005)
2. Cottet, D., Grzyb, J., Kirstein, T., Tröster, G.: Electrical characterization of textile transmission lines. IEEE Transactions on Advanced Packaging 26(2), 182–190 (2003)
3. ELEKTRISOLA FEINDRAHT AG (February 2009), http://www.textile-wire.ch
4. Offray Specialty Narrow Fabrics (February 2009), http://www.osnf.com
5. Yang, Y., Sung, M., Cho, J., Jeong, K., Cho, G.: The Influence of Bending and Abrasion on the Electrical Properties of Textile-based Transmission Line Made of Teflon-coated Stainless Steel Yarns. In: International Conference on Intelligent Textiles 2007, Seoul, pp. 154–156 (2007)
6. Cho, J., Moon, J., Jeong, K., Cho, G.: Application of PU-sealing into Cu/Ni Electro-less Plated Polyester Fabrics for E-Textiles. Fibers and Polymers 8(3), 330–334 (2007)
7. Cho, J., Moon, J., Jeong, K., Cho, G.: Development of Transmission Threads for Smart Wear; By Coating Stainless Steel Yarns with Insulating Materials. In: International Conference on Intelligent Textiles 2006, Seoul, pp. 67–68 (2006)
8. Cho, J., Moon, J., Jeong, K., Cho, G.: Design and Evaluation of Textile-based Signal Transmission Lines and Keypads for Smart Wear. In: Human Computer Interaction International, Beijing, pp. 1078–1085 (2007)
9. American Society for Testing and Materials, http://www.astm.org/Standards/D3786.htm
10. Korea Apparel Testing & Research Institute (February 2009), http://www.katri.re.kr/
11. International Organization for Standardization, http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=32908
12. Chea, H., Hong, J., Kim, J., Kim, J., Han, K., Lee, J.: Usability Evaluation and Development of Design Prototyping for MP3 Smart Clothing Product. Emotion & Sensibility 10(3), 331–342 (2007)
Author Index
Ablaßmeier, Markus 211 Abou Khaled, Omar 473 Aedo, Ignacio 228 Alarcon, Rosa 67 Al Faraj, Khaldoun 3 Alam´ an Rold´ an, Xavier 330 Ando, Hideyuki 284 Andr´e, Elisabeth 340 Anupam, Vinod 11 Arase, Yuki 239 Argyros, Antonis 397 Arroyo, Ivon 713 Asahi, Toshiyuki 114 Bamidis, Panagiotis D. 565 Barreto, Armando 693 Bekiaris, Evangelos 377 Bellotti, Victoria 114 Billinghurst, Mark 387 Blach, Roland 407 Blanchard, Emmanuel G. 575 Blum, Rainer 314 Bratsas, Charalambos 565 Brdiczka, Oliver 621 Breuer, Henning 585 Broekens, Joost 605 Bubb, Heiner 211 Burleson, Winslow 713 Caelen, Jean 446 Carrino, Stefano 473 Chabbi Drissi, Houda 473 Chae, Haeng Suk 725 Chang, Mali 544 Chignell, Mark 660 Cho, Dong Lyun 760 Cho, Gilsoo 731, 778, 784, 806 Cho, Woon Jung 725 Cloud-Buckner, Jennifer 248 Cnossen, Fokie 693 Cobos Perez, Ruth 330 Cooper, David 713 Craig, Scotty 595 D’Mello, Sidney 595 Dai, Linong 125
Detweiler, Christian 605 D´ıaz, Paloma 228 Doke, Mamoru 257 Dong, Xiao 21 Dos Santos, Philippe 446 Drabe, Christian 501 Dragon, Toby 713 D¨ unser, Andreas 387 Duval, S´ebastien 731 Eom, Ju Il
416
Fagel, Sascha 349 Fan, Mingming 133 Fike, Karl 595 Frantzidis, Christos A. 565 Frommhagen, Klaus 501 Fukuda, Ryoko 615 Fukuzumi, Shin’ichi 114 Gallimore, Jennie 248 Georgalis, Yannis 397 George, Jonas 314 Germanakos, Panagiotis 320 Gerwig, Christian 501 Ghinea, Gheorghita 30 Gilbert, Stephen B. 651 Glasnapp, James 621 Go, Gihoon 433 Graesser, Arthur 595 Grammenos, Dimitris 397 Grasset, Rapha¨el 387 Grønli, Tor-Morten 30 Guerrero, Luis 67 Halkia, Matina 483 Hamaguichi, Narichika 257 Hamberger, Werner 211 Han, Kwang Hee 725 Hara, Takahiro 239 Hayashi, Yugo 267 Heimg¨ artner, R¨ udiger 275 Hermann, Fabian 407 Hermann, Marc 58 Ho, Chun-Heng 641 Hoareau, Christian 731
816
Author Index
Holman, Jerome 77 Hong, Sung Soo 416 Huang, Jiung-yao 741 Huang, Wan-Fu 141
Lahiri, Uttama 703 Lajoie, Susanne P. 575 Lakner, Hubert 501 Laquai, Florian 189 Lebrun, Yoann 446 Lee, Chil-Woo 423 Lee, Chung Ki 778 Lee, Kun-Hang 741 Lee, Kwangil 768 Lee, Moo Sung 760 Lee, Su Ho 778 Lee, Youngho 456 Lekkas, Zacharias 320 Lepreux, Sophie 446 Lisetti, Christine 693 Lithari, Chrysa 565 Liu, Changchun 703 Liu, Cheng-Li 544 Luneski, Andrej 565
Iben, Hendrik 752 Iizuka, Hiroyuki 284 Imamura, Naoki 294 Imanaka, Takeshi 151 Inami, Masahiko 536 Inoue, Seiki 257 Ishii, Koji 526 Islam, Md. Zahidul 423 Janssen, Doris 407 Jeng, Tay-Sheng 641 Jeon, Jaewook 433, 520 Jeong, Kee Sam 778, 784 Ji, Hyunjin 39 Ji, Yong Gu 768 Jung, Seok Myung 778 Jung, Soonmook 433, 520 Kallinen, Kari 95 Kalogirou, Kostas 377 Kaneko, Hiroyuki 257 Kartakis, Sokratis 397 Katai, Osamu 670 Kato, Toshikazu 294 Kawakami, Hiroshi 670 Kawazu, Kosuke 304 Keh, Huan-Chao 741 Kettner, Marlene 585 Khakzar, Karim 314 Kim, Kyungdoh 48 Kim, Seok Min 760 Kim, Soo Hyun 725 Kim, Sung Hun 760 Kim, Taeyong 39 Kiyokawa, Kiyoshi 793 Klados, Manousos A. 565 Klein, Thorsten 407 Klinker, Gudrun 159 Kolski, Christophe 446 Kono, Yasuyuki 151 Konstantinidis, Evdokimos I. Kotilainen, Niko 179 Kratky, Andreas 440 Kubicki, S´ebastien 446 K¨ uhnel, Christine 349 Kwon, Keyho 433, 520
565
Maeda, Taro 284 Maeno, Takashi 304 Mahler, Thorsten 58 Mase, Masayoshi 304 Matsui, Takahiro 151 Matsushima, Norikazu 554 McMullen, Kyla A. 631 Millard, Mark O. 464 Miwa, Kazuhisa 267 Miwa, Yoshiyuki 554 Miyaki, Takashi 526 Mizoguchi, Riichiro 575 Mojahid, Mustapha 3 Mok, Jon 660 Morales Kluge, Ernesto 752 M¨ oller, Sebastian 349 Moreno-Llorena, Jaime 330 Morizane, Masao 492 Mou, Tsai-Yun 641 Mourlas, Constantinos 320 Mourouzis, Alexandros 377 Mugellini, Elena 473 Mukawa, Naoki 368 Muldner, Kasia 713 M¨ uller, Boris 483 Murai, Hajime 679 Murao, Masakazu 304 Nagel, Till 483 Nakano, Yukiko 340, 492
Author Index Nakao, Yusuke 114 Nebe, Karsten 169 Nestler, Simon 159 Nii, Hideaki 536 Nishi, Hiroko 554 Nishio, Shojiro 239 Ochoa, Sergio 67 Ogawa, Takefumi 793 Ogino, Akihiro 294 Oh, Chi-Min 423 Ohta, Takeru 304 Oren, Michael A. 651 Ortega, Francisco 693 Osaki, Kouzi 358 Paelke, Volker 169 Panou, Mary 377 Papadelis, Christos L. 565 Pappas, Costas 565 Park, Eun Ju 760 Park, Jihwan 520 Partarakis, Nikolaos 397 P´erez-Qui˜ nones, Manuel A. 77 Pham, Thien Cong 520 Pierroz, St´ephane 473 Pirhonen, Antti 179 Plocher, Thomas 86, 105 Poitschke, Tony 189 Preuschen, Nathalie 585 Proctor, Robert W. 48 Pschetz, Larissa 483 Pyla, Pardha S. 77 Qu, Weina
86, 105
Rapino, Marco 95 Rau, Pei-Luen Patrick Ravaja, Niklas 95 Ravandi, Mehdi 660 Rehm, Matthias 340 Rekimoto, Jun 526 Rigoll, Gerhard 189
21
Saari, Timo 95 Sainathuni, Bhanuteja 248 Salminen, Mikko 95 Salvendy, Gavriel 48 Samaras, George 320 Sandner, Thilo 501 Sarkar, Nilanjan 703 Sarmis, Thomas 397
Schenk, Harald 501 Schmid, Falko 199 Scholles, Michael 501 Schuller, Andreas 407 Seichter, Hartmut 387 Sellick, Michael 248 Seto, Ryutaro 554 Shi, Yuanchun 133 Shibutani, Midori 802 Shimohara, Katsunori 492 Shin, Choonsung 456 Sixsmith, Andrew 511 Song, Taehoun 433, 520 Soylu, Firat 464 Spath, Dieter 407 Spies, Roland 211 Stefaner, Moritz 483 Steinhoff, Fee 585 Stephanidis, Constantine 397 Sueda, Koh 526 Sugimoto, Maki 536 Sugimura, Yukinobu 221 Sun, Xianghong 86, 105 Sung, Moonsoo 784 Suto, Hidetsugu 670 Suzuki, Shunsuke 114 Takada, Daisuke 793 Takemura, Haruo 793 Tanev, Ivan 492 Tokiwa, Takuji 536 Tokosumi, Akifumi 679 Tokunaga, Hiroko 368 T¨ onnis, Marcus 159 Tourlakis, Panagiotis 397 Tsai, Chung-Hsien 741 Tsianos, Nikos 320 Tsuji, Kohei 802 Tung, Ming-Chih 741 Tungare, Manas 77 Uang, Shiaw-Tsyr 544 Umemuro, Hiroyuki 683 van der Zant, Tijn 693 Vargas, Mario Rafael Ruiz Verhoef, Tessa 693 Vigouroux, Nadine 3 Wada, Chikamune Wagler, Matthias
221 585
228
817
818
Author Index
Wakefield, Gregory H. 631 Wakita, Akira 802 Wang, Li 86, 105 Warren, Zachary 703 Watanabe, Takabumi 554 Watanabe, Tomio 358 Weber, Michael 58 Wechsung, Ina 349 Weiss, Benjamin 349 Welch, Karla Conn 703 Weller, Rebecca 703 Witt, Hendrik 752 Woo, Woontack 456 Woolf, Beverly 713 Wu, Chih-Fu 141
Wu, Ji-jen 741 Wu, Tung-Chen 141 Yamamoto, Michiya 358 Yang, Betty 248 Yang, Yoonjung 806 Yee, Nick 114 Yoneda, Yu 221 Yoo, Sun K. 778 Yoshidzumi, Masashi 536 Yuasa, Masahide 368 Zabulis, Xenophon 397 Zarraonandia, Telmo 228