Human-Computer Interaction. New Trends: 13th International Conference, HCI International 2009, San Diego, CA, USA, July 19-24, 2009, Proceedings, Part I (Lecture Notes in Computer Science: Information Systems and Applications, incl. Internet/Web, and HCI)
Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
5610
Julie A. Jacko (Ed.)
Human-Computer Interaction New Trends 13th International Conference, HCI International 2009 San Diego, CA, USA, July 19-24, 2009 Proceedings, Part I
Volume Editor
Julie A. Jacko
University of Minnesota, Institute of Health Informatics
MMC 912, 420 Delaware Street S.E., Minneapolis, MN 55455, USA
E-mail: [email protected]
Library of Congress Control Number: 2009929048
CR Subject Classification (1998): H.5, I.3, I.7.5, I.5, I.2.10
LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-642-02573-0 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02573-0 Springer Berlin Heidelberg New York
Foreword

The 13th International Conference on Human–Computer Interaction, HCI International 2009, was held in San Diego, California, USA, July 19–24, 2009, jointly with the Symposium on Human Interface (Japan) 2009, the 8th International Conference on Engineering Psychology and Cognitive Ergonomics, the 5th International Conference on Universal Access in Human–Computer Interaction, the Third International Conference on Virtual and Mixed Reality, the Third International Conference on Internationalization, Design and Global Development, the Third International Conference on Online Communities and Social Computing, the 5th International Conference on Augmented Cognition, the Second International Conference on Digital Human Modeling, and the First International Conference on Human Centered Design. A total of 4,348 individuals from academia, research institutes, industry and governmental agencies from 73 countries submitted contributions, and 1,397 papers that were judged to be of high scientific quality were included in the program. These papers address the latest research and development efforts and highlight the human aspects of design and use of computing systems. The papers accepted for presentation thoroughly cover the entire field of human–computer interaction, addressing major advances in the knowledge and effective use of computers in a variety of application areas. This volume, edited by Julie A. Jacko, contains papers in the thematic area of Human–Computer Interaction, addressing the following major topics:
• Novel Techniques for Measuring and Monitoring
• Evaluation Methods, Techniques and Tools
• User Studies
• User Interface Design
• Development Approaches, Methods and Tools
The remaining volumes of the HCI International 2009 proceedings are:
• Volume 2, LNCS 5611, Human–Computer Interaction––Novel Interaction Methods and Techniques (Part II), edited by Julie A. Jacko
• Volume 3, LNCS 5612, Human–Computer Interaction––Ambient, Ubiquitous and Intelligent Interaction (Part III), edited by Julie A. Jacko
• Volume 4, LNCS 5613, Human–Computer Interaction––Interacting in Various Application Domains (Part IV), edited by Julie A. Jacko
• Volume 5, LNCS 5614, Universal Access in Human–Computer Interaction––Addressing Diversity (Part I), edited by Constantine Stephanidis
• Volume 6, LNCS 5615, Universal Access in Human–Computer Interaction––Intelligent and Ubiquitous Interaction Environments (Part II), edited by Constantine Stephanidis
• Volume 7, LNCS 5616, Universal Access in Human–Computer Interaction––Applications and Services (Part III), edited by Constantine Stephanidis
• Volume 8, LNCS 5617, Human Interface and the Management of Information––Designing Information Environments (Part I), edited by Michael J. Smith and Gavriel Salvendy
• Volume 9, LNCS 5618, Human Interface and the Management of Information––Information and Interaction (Part II), edited by Gavriel Salvendy and Michael J. Smith
• Volume 10, LNCS 5619, Human Centered Design, edited by Masaaki Kurosu
• Volume 11, LNCS 5620, Digital Human Modeling, edited by Vincent G. Duffy
• Volume 12, LNCS 5621, Online Communities and Social Computing, edited by A. Ant Ozok and Panayiotis Zaphiris
• Volume 13, LNCS 5622, Virtual and Mixed Reality, edited by Randall Shumaker
• Volume 14, LNCS 5623, Internationalization, Design and Global Development, edited by Nuray Aykin
• Volume 15, LNCS 5624, Ergonomics and Health Aspects of Work with Computers, edited by Ben-Tzion Karsh
• Volume 16, LNAI 5638, The Foundations of Augmented Cognition: Neuroergonomics and Operational Neuroscience, edited by Dylan Schmorrow, Ivy Estabrooke and Marc Grootjen
• Volume 17, LNAI 5639, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris
I would like to thank the Program Chairs and the members of the Program Boards of all thematic areas, listed below, for their contribution to the highest scientific quality and the overall success of HCI International 2009.
Ergonomics and Health Aspects of Work with Computers
Program Chair: Ben-Tzion Karsh
Arne Aarås, Norway; Pascale Carayon, USA; Barbara G.F. Cohen, USA; Wolfgang Friesdorf, Germany; John Gosbee, USA; Martin Helander, Singapore; Ed Israelski, USA; Waldemar Karwowski, USA; Peter Kern, Germany; Danuta Koradecka, Poland; Kari Lindström, Finland; Holger Luczak, Germany; Aura C. Matias, Philippines; Kyung (Ken) Park, Korea; Michelle M. Robertson, USA; Michelle L. Rogers, USA; Steven L. Sauter, USA; Dominique L. Scapin, France; Naomi Swanson, USA; Peter Vink, The Netherlands; John Wilson, UK; Teresa Zayas-Cabán, USA
Human Interface and the Management of Information
Program Chair: Michael J. Smith
Gunilla Bradley, Sweden; Hans-Jörg Bullinger, Germany; Alan Chan, Hong Kong; Klaus-Peter Fähnrich, Germany; Michitaka Hirose, Japan; Jhilmil Jain, USA; Yasufumi Kume, Japan; Mark Lehto, USA; Fiona Fui-Hoon Nah, USA; Shogo Nishida, Japan; Robert Proctor, USA; Youngho Rhee, Korea; Anxo Cereijo Roibás, UK; Katsunori Shimohara, Japan; Dieter Spath, Germany; Tsutomu Tabe, Japan; Alvaro D. Taveira, USA; Kim-Phuong L. Vu, USA; Tomio Watanabe, Japan; Sakae Yamamoto, Japan; Hidekazu Yoshikawa, Japan; Li Zheng, P.R. China; Bernhard Zimolong, Germany
Human–Computer Interaction
Program Chair: Julie A. Jacko
Sebastiano Bagnara, Italy; Sherry Y. Chen, UK; Marvin J. Dainoff, USA; Jianming Dong, USA; John Eklund, Australia; Xiaowen Fang, USA; Ayse Gurses, USA; Vicki L. Hanson, UK; Sheue-Ling Hwang, Taiwan; Wonil Hwang, Korea; Yong Gu Ji, Korea; Steven Landry, USA; Gitte Lindgaard, Canada; Chen Ling, USA; Yan Liu, USA; Chang S. Nam, USA; Celestine A. Ntuen, USA; Philippe Palanque, France; P.L. Patrick Rau, P.R. China; Ling Rothrock, USA; Guangfeng Song, USA; Steffen Staab, Germany; Wan Chul Yoon, Korea; Wenli Zhu, P.R. China
Engineering Psychology and Cognitive Ergonomics
Program Chair: Don Harris
Guy A. Boy, USA; John Huddlestone, UK; Kenji Itoh, Japan; Hung-Sying Jing, Taiwan; Ron Laughery, USA; Wen-Chin Li, Taiwan; James T. Luxhøj, USA; Nicolas Marmaras, Greece; Sundaram Narayanan, USA; Mark A. Neerincx, The Netherlands; Jan M. Noyes, UK; Kjell Ohlsson, Sweden; Axel Schulte, Germany; Sarah C. Sharples, UK; Neville A. Stanton, UK; Xianghong Sun, P.R. China; Andrew Thatcher, South Africa; Matthew J.W. Thomas, Australia; Mark Young, UK
Universal Access in Human–Computer Interaction
Program Chair: Constantine Stephanidis
Julio Abascal, Spain; Ray Adams, UK; Elisabeth André, Germany; Margherita Antona, Greece; Chieko Asakawa, Japan; Christian Bühler, Germany; Noelle Carbonell, France; Jerzy Charytonowicz, Poland; Pier Luigi Emiliani, Italy; Michael Fairhurst, UK; Dimitris Grammenos, Greece; Andreas Holzinger, Austria; Arthur I. Karshmer, USA; Simeon Keates, Denmark; Georgios Kouroupetroglou, Greece; Sri Kurniawan, USA; Patrick M. Langdon, UK; Seongil Lee, Korea; Zhengjie Liu, P.R. China; Klaus Miesenberger, Austria; Helen Petrie, UK; Michael Pieper, Germany; Anthony Savidis, Greece; Andrew Sears, USA; Christian Stary, Austria; Hirotada Ueda, Japan; Jean Vanderdonckt, Belgium; Gregg C. Vanderheiden, USA; Gerhard Weber, Germany; Harald Weber, Germany; Toshiki Yamaoka, Japan; Panayiotis Zaphiris, UK
Virtual and Mixed Reality
Program Chair: Randall Shumaker
Pat Banerjee, USA; Mark Billinghurst, New Zealand; Charles E. Hughes, USA; David Kaber, USA; Hirokazu Kato, Japan; Robert S. Kennedy, USA; Young J. Kim, Korea; Ben Lawson, USA; Gordon M. Mair, UK; Miguel A. Otaduy, Switzerland; David Pratt, UK; Albert "Skip" Rizzo, USA; Lawrence Rosenblum, USA; Dieter Schmalstieg, Austria; Dylan Schmorrow, USA; Mark Wiederhold, USA
Internationalization, Design and Global Development
Program Chair: Nuray Aykin
Michael L. Best, USA; Ram Bishu, USA; Alan Chan, Hong Kong; Andy M. Dearden, UK; Susan M. Dray, USA; Vanessa Evers, The Netherlands; Paul Fu, USA; Emilie Gould, USA; Sung H. Han, Korea; Veikko Ikonen, Finland; Esin Kiris, USA; Masaaki Kurosu, Japan; Apala Lahiri Chavan, USA; James R. Lewis, USA; Ann Light, UK; James J.W. Lin, USA; Rungtai Lin, Taiwan; Zhengjie Liu, P.R. China; Aaron Marcus, USA; Allen E. Milewski, USA; Elizabeth D. Mynatt, USA; Oguzhan Ozcan, Turkey; Girish Prabhu, India; Kerstin Röse, Germany; Eunice Ratna Sari, Indonesia; Supriya Singh, Australia; Christian Sturm, Spain; Adi Tedjasaputra, Singapore; Kentaro Toyama, India; Alvin W. Yeo, Malaysia; Chen Zhao, P.R. China; Wei Zhou, P.R. China
Online Communities and Social Computing
Program Chairs: A. Ant Ozok, Panayiotis Zaphiris
Chadia N. Abras, USA; Chee Siang Ang, UK; Amy Bruckman, USA; Peter Day, UK; Fiorella De Cindio, Italy; Michael Gurstein, Canada; Tom Horan, USA; Anita Komlodi, USA; Piet A.M. Kommers, The Netherlands; Jonathan Lazar, USA; Stefanie Lindstaedt, Austria; Gabriele Meiselwitz, USA; Hideyuki Nakanishi, Japan; Anthony F. Norcio, USA; Jennifer Preece, USA; Elaine M. Raybourn, USA; Douglas Schuler, USA; Gilson Schwartz, Brazil; Sergei Stafeev, Russia; Charalambos Vrasidas, Cyprus; Cheng-Yen Wang, Taiwan
Augmented Cognition
Program Chair: Dylan D. Schmorrow
Andy Bellenkes, USA; Andrew Belyavin, UK; Joseph Cohn, USA; Martha E. Crosby, USA; Tjerk de Greef, The Netherlands; Blair Dickson, UK; Traci Downs, USA; Julie Drexler, USA; Ivy Estabrooke, USA; Cali Fidopiastis, USA; Chris Forsythe, USA; Wai Tat Fu, USA; Henry Girolamo, USA; Marc Grootjen, The Netherlands; Taro Kanno, Japan; Wilhelm E. Kincses, Germany; David Kobus, USA; Santosh Mathan, USA; Rob Matthews, Australia; Dennis McBride, USA; Robert McCann, USA; Jeff Morrison, USA; Eric Muth, USA; Mark A. Neerincx, The Netherlands; Denise Nicholson, USA; Glenn Osga, USA; Dennis Proffitt, USA; Leah Reeves, USA; Mike Russo, USA; Kay Stanney, USA; Roy Stripling, USA; Mike Swetnam, USA; Rob Taylor, UK; Maria L. Thomas, USA; Peter-Paul van Maanen, The Netherlands; Karl van Orden, USA; Roman Vilimek, Germany; Glenn Wilson, USA; Thorsten Zander, Germany
Digital Human Modeling
Program Chair: Vincent G. Duffy
Karim Abdel-Malek, USA; Thomas J. Armstrong, USA; Norm Badler, USA; Kathryn Cormican, Ireland; Afzal Godil, USA; Ravindra Goonetilleke, Hong Kong; Anand Gramopadhye, USA; Sung H. Han, Korea; Lars Hanson, Sweden; Pheng Ann Heng, Hong Kong; Tianzi Jiang, P.R. China; Kang Li, USA; Zhizhong Li, P.R. China; Timo J. Määttä, Finland; Woojin Park, USA; Matthew Parkinson, USA; Jim Potvin, Canada; Rajesh Subramanian, USA; Xuguang Wang, France; John F. Wiechel, USA; Jingzhou (James) Yang, USA; Xiu-gan Yuan, P.R. China
Human Centered Design
Program Chair: Masaaki Kurosu
Gerhard Fischer, USA; Tom Gross, Germany; Naotake Hirasawa, Japan; Yasuhiro Horibe, Japan; Minna Isomursu, Finland; Mitsuhiko Karashima, Japan; Tadashi Kobayashi, Japan; Kun-Pyo Lee, Korea; Loïc Martínez-Normand, Spain; Dominique L. Scapin, France; Haruhiko Urokohara, Japan; Gerrit C. van der Veer, The Netherlands; Kazuhiko Yamazaki, Japan
In addition to the members of the Program Boards above, I also wish to thank the following volunteer external reviewers: Gavin Lew from the USA, Daniel Su from the UK, and Ilia Adami, Ioannis Basdekis, Yannis Georgalis, Panagiotis Karampelas, Iosif Klironomos, Alexandros Mourouzis, and Stavroula Ntoa from Greece. This conference would not have been possible without the continuous support and advice of the Conference Scientific Advisor, Prof. Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the Communications Chair and Editor of HCI International News, Abbas Moallem.
I would also like to thank the members of the Human–Computer Interaction Laboratory of ICS-FORTH, and in particular Margherita Antona, George Paparoulis, Maria Pitsoulaki, Stavroula Ntoa, and Maria Bouhli, for their contribution toward the organization of the HCI International 2009 conference.

Constantine Stephanidis
HCI International 2011
The 14th International Conference on Human–Computer Interaction, HCI International 2011, will be held jointly with the affiliated conferences in the summer of 2011. It will cover a broad spectrum of themes related to human–computer interaction, including theoretical issues, methods, tools, processes and case studies in HCI design, as well as novel interaction techniques, interfaces and applications. The proceedings will be published by Springer. More information about the topics, as well as the venue and dates of the conference, will be announced through the HCI International Conference series website: http://www.hci-international.org/
General Chair
Professor Constantine Stephanidis
University of Crete and ICS-FORTH
Heraklion, Crete, Greece
Email: [email protected]
Table of Contents

Toward EEG Sensing of Imagined Speech . . . 40
   Michael D'Zmura, Siyi Deng, Tom Lappas, Samuel Thorpe, and Ramesh Srinivasan
Monitoring and Processing of the Pupil Diameter Signal for Affective Assessment of a Computer User . . . 49
   Ying Gao, Armando Barreto, and Malek Adjouadi
Usability Evaluation by Monitoring Physiological and Other Data Simultaneously with a Time-Resolution of Only a Few Seconds . . . 59
   Károly Hercegfi, Márton Pászti, Sarolta Tóvölgyi, and Lajos Izsó
Study of Human Anxiety on the Internet . . . 69
   Santosh Kumar Kalwar and Kari Heikkinen
The Research on Adaptive Process for Emotion Recognition by Using Time-Dependent Parameters of Autonomic Nervous Response
   Jonghwa Kim, Mincheol Whang, and Jincheol Woo
Automated Analysis of Eye-Tracking Data for the Evaluation of Driver Information Systems According to ISO/TS 15007-2:2001 . . . 105
   Christian Lange, Martin Wohlfarter, and Heiner Bubb
Brain Response to Good and Bad Design . . . 111
   Haeinn Lee, Jungtae Lee, and Ssanghee Seo
An Analysis of Eye Movements during Browsing Multiple Search Results Pages . . . 121
   Yuko Matsuda, Hidetake Uwano, Masao Ohira, and Ken-ichi Matsumoto
Development of Estimation System for Concentrate Situation Using Acceleration Sensor . . . 131
   Masashi Okubo and Aya Fujimura
Psychophysiology as a Tool for HCI Research: Promises and Pitfalls . . . 141
   Byungho Park
Assessing NeuroSky's Usability to Detect Attention Levels in an Assessment Exercise . . . 149
   Genaro Rebolledo-Mendez, Ian Dunwell, Erika A. Martínez-Mirón, María Dolores Vargas-Cerdán, Sara de Freitas, Fotis Liarokapis, and Alma R. García-Gaona
Effect of Body Movement on Music Expressivity in Jazz Performances . . . 159
   Mamiko Sakata, Sayaka Wakamiya, Naoki Odaka, and Kozaburo Hachimura
A Method to Monitor Operator Overloading . . . 169
   Dvijesh Shastri, Ioannis Pavlidis, and Avinash Wesley
Decoding Attentional Orientation from EEG Spectra . . . 176
   Ramesh Srinivasan, Samuel Thorpe, Siyi Deng, Tom Lappas, and Michael D'Zmura
On the Possibility about Performance Estimation Just before Beginning a Voluntary Motion Using Movement Related Cortical Potential
   Satoshi Suzuki, Takemi Matsui, Yusuke Sakaguchi, Kazuhiro Ando, Nobuyuki Nishiuchi, Toshimasa Yamazaki, and Shin'ichi Fukuzumi
Evaluation of User-Interfaces for Mobile Application Development Environments . . . 204
   Florence Balagtas-Fernandez and Heinrich Hussmann
User-Centered Design and Evaluation – The Big Picture . . . 214
   Victoria Bellotti, Shin'ichi Fukuzumi, Toshiyuki Asahi, and Shunsuke Suzuki
Web-Based System Development for Usability Evaluation of Ubiquitous Computing Device . . . 224
   Jong Kyu Choi, Han Joon Kim, Beom Suk Jin, and Yonggu Ji
Evaluating Mobile Usability: The Role of Fidelity in Full-Scale Laboratory Simulations with Mobile ICT for Hospitals . . . 232
   Yngve Dahl, Ole Andreas Alsos, and Dag Svanæs
A Multidimensional Approach for the Evaluation of Mobile Application User Interfaces . . . 242
   José Eustáquio Rangel de Queiroz and Danilo de Sousa Ferreira
Development of Quantitative Usability Evaluation Method . . . 252
   Shin'ichi Fukuzumi, Teruya Ikegami, and Hidehiko Okada
Reference Model for Quality Assurance of Speech Applications . . . 259
   Cornelia Hipp and Matthias Peissner
Toward Cognitive Modeling for Predicting Usability . . . 267
   Bonnie E. John and Shunsuke Suzuki
Webjig: An Automated User Data Collection System for Website Usability Evaluation . . . 277
   Mikio Kiura, Masao Ohira, and Ken-ichi Matsumoto
ADiEU: Toward Domain-Based Evaluation of Spoken Dialog Systems . . . 287
   Jan Kleindienst, Jan Cuřín, and Martin Labský
Interpretation of User Evaluation for Emotional Speech Synthesis System . . . 295
   Ho-Joon Lee and Jong C. Park
Multi-level Validation of the ISOmetrics Questionnaire Based on Qualitative and Quantitative Data Obtained from a Conventional Usability Test . . . 304
   Jan-Paul Leuteritz, Harald Widlroither, and Michael Klüh
What Do Users Really Do? Experience Sampling in the 21st Century . . . 314
   Gavin S. Lew
Evaluating Usability-Supporting Architecture Patterns: Reactions from Usability Professionals . . . 320
   Edgardo Luzcando, Davide Bolchini, and Anthony Faiola
Heuristic Evaluations of Bioinformatics Tools: A Development Case
   Barbara Mirel and Zach Wright
A Prototype to Validate ErgoCoIn: A Web Site Ergonomic Inspection Technique
   Marcelo Morandini, Walter de Abreu Cybis, and Dominique L. Scapin
What Do Users Want to See? A Content Preparation Study for Consumer Electronics . . . 413
   Yinni Guo, Robert W. Proctor, and Gavriel Salvendy
"I Love My iPhone... But There Are Certain Things That 'Niggle' Me" . . . 421
   Anna Haywood and Gemma Boguslawski
Acceptance of Future Technologies Using Personal Data: A Focus Group with Young Internet Users . . . 431
   Fabian Hermann, Doris Janssen, Daniel Schipke, and Andreas Schuller
Analysis of Breakdowns in Menu-Based Interaction Based on Information Scent Model . . . 438
   Yukio Horiguchi, Hiroaki Nakanishi, Tetsuo Sawaragi, and Yuji Kuroda
E-Shopping Behavior and User-Web Interaction for Developing a Useful Green Website . . . 446
   Fei-Hui Huang, Ying-Lien Lee, and Sheue-Ling Hwang
Interaction Comparison among Media Internet Genre . . . 455
   Sang Hee Kweon, Eun Joung Cho, and Ae Jin Cho
Comparing the Usability of the Icons and Functions between IE6.0 and IE7.0 . . . 465
   Chiuhsiang Joe Lin, Min-Chih Hsieh, Hui-Chi Yu, Ping-Jung Tsai, and Wei-Jung Shiang
Goods-Finding and Orientation in the Elderly on 3D Virtual Store Interface: The Impact of Classification and Landmarks
   Cheng-Li Liu, Shiaw-Tsyr Uang, and Chen-Hao Chang
Designing for Change: Engineering Adaptable and Adaptive User Interaction by Focusing on User Goals
   Bruno S. da Silva, Ariane M. Bueno, and Simone D.J. Barbosa
Enabling Interactive Access to Web Tables . . . 760
   Xin Yang, Wenchang Xu, and Yuanchun Shi
Integration of Creativity into Website Design . . . 769
   Liang Zeng, Robert W. Proctor, and Gavriel Salvendy
Part V: Development Approaches, Methods and Tools

YVision: A General Purpose Software Composition Framework . . . 779
   Antão Almada, Gonçalo Lopes, André Almeida, João Frazão, and Nuno Cardoso
Collaborative Development and New Devices for Human-Computer Interaction . . . 789
   Hans-Jörg Bullinger and Gunnar Brink
Orchestration Modeling of Interactive Systems . . . 796
   Bertrand David and René Chalon
An Exploration of Perspective Changes within MBD . . . 806
   Anke Dittmar and Peter Forbrig
Rapid Development of Scoped User Interfaces . . . 816
   Denis Dubé, Jacob Beard, and Hans Vangheluwe
PaMGIS: A Framework for Pattern-Based Modeling and Generation of Interactive Systems . . . 826
   Jürgen Engel and Christian Märtin
People-Oriented Programming: From Agent-Oriented Analysis to the Design of Interactive Systems . . . 836
   Steve Goschnick
Visualization of Software and Systems as Support Mechanism for Integrated Software Project Control . . . 846
   Peter Liggesmeyer, Jens Heidrich, Jürgen Münch, Robert Kalcklösch, Henning Barthel, and Dirk Zeckzer
Collage: A Declarative Programming Model for Compositional Development of Web Applications . . . 856
   Bruce Lucas, Rahul Akolkar, and Charlie Wiecha
Hypernetwork Model to Represent Similarity Details Applied to Musical Instrument Performance . . . 866
   Tetsuya Maeshiro, Midori Maeshiro, Katsunori Shimohara, and Shin-ichi Nakayama
Open Collaborative Development: Trends, Tools, and Tactics . . . 874
   Kathrin M. Moeslein, Angelika C. Bullinger, and Jens Soeldner
Investigating the Run Time Behavior of Distributed Applications by Using Tiny Java Virtual Machines with Wireless Communications
   Tsuyoshi Miyazaki, Takayuki Suzuki, and Fujio Yamamoto
Automatic Method for Measuring Eye Blinks Using Split-Interlaced Images

Kiyohiko Abe¹, Shoichi Ohi², and Minoru Ohyama³

¹ College of Engineering, Kanto Gakuin University, 1-50-1 Mutsuura-higashi, Kanazawa-ku, Yokohama, Kanagawa 236-8501, Japan
² School of Engineering, Tokyo Denki University, 2-2 Kandanishiki-cho, Chiyoda-ku, Tokyo 101-8457, Japan
³ School of Information Environment, Tokyo Denki University, 2-1200 Muzaigakuendai, Inzai-shi, Chiba 270-1382, Japan
[email protected], [email protected], [email protected]
Abstract. We propose a new eye blink detection method that uses NTSC video cameras. This method utilizes split-interlaced images of the eye. These split images are the odd- and even-field images in the NTSC format and are generated from NTSC frames (interlaced images). The proposed method yields a time resolution that is double that of the NTSC format; that is, the detailed temporal change that occurs during the process of eye blinking can be measured. To verify the accuracy of the proposed method, experiments were performed using a high-speed digital video camera, and the results obtained using the NTSC camera were compared with those obtained using the high-speed digital video camera.

Keywords: Eye Blink, Interlaced Image, Natural Light, Image Analysis, High-Speed Camera.
1 Introduction

Our proposed method uses NTSC video cameras. It utilizes split-interlaced images of the eye captured by an NTSC video camera. These split images are the odd- and even-field images in the NTSC format and are generated from NTSC frames (interlaced images). The proposed method yields a time resolution that is twice that of the NTSC format; therefore, the detailed temporal change that occurs during the process of eye blinking can be measured. To verify the accuracy of the proposed method, we performed experiments using a high-speed digital video camera and compared the results obtained using the NTSC camera with those obtained using the high-speed digital video camera. This paper also presents experiments that evaluate the proposed automatic method for measuring eye blinks.
2 Open-Eye Area Extraction Method by Image Analysis

In general, eye blinks are estimated by measuring the open-eye area [2] or on the basis of characteristics of specific moving points between the upper and lower eyelids [3]. Many of these methods utilize image analysis. It is possible to measure the wave pattern of an eye blink if the entire process of the blink is captured [3]. Furthermore, the type of eye blink and/or its velocity can be estimated on the basis of this wave pattern. However, it is difficult to measure the wave patterns of eye blinks using the video cameras commonly employed for this purpose, because the resulting eye images include high noise content owing to changes in lighting conditions.

We have developed a new method for measuring the wave pattern of an eye blink. This method can be used with common indoor lighting sources such as fluorescent lights, and it measures the wave pattern automatically. Hence, our proposed measurement method can be used under a variety of experimental conditions. In this method, the wave pattern is obtained by counting the number of pixels in the open-eye area of the image captured by a video camera. This image is enlarged to capture the detailed eye image.

We proposed an algorithm for extracting the open-eye area in a previous study [4]. It utilizes color information from eye images. We have adapted this algorithm to our proposed method for measuring the wave pattern of eye blinks. The algorithm was originally developed for our eye-gaze input system, in which it compensates for and traces head movement [5]. Furthermore, the algorithm has been used under common indoor sources of light for prolonged periods. Hereafter, we describe in detail our image-processing algorithm for extracting the open-eye area.

2.1 Binarization Using Color Information on Image

Many methods have been developed for skin-color extraction; these methods are primarily focused on facial image processing, including those that utilize color information from a facial image. They mostly determine threshold skin-color values statistically or empirically [6]. We have developed an algorithm for estimating skin-color thresholds automatically. Our algorithm can extract the open-eye area from the eye image on the basis of skin color.
Using our algorithm, the skin-color threshold is determined from the histogram of the color-difference signal ratio Cr/Cb of each pixel, calculated from the YCbCr image transformed from the RGB image. The histogram of the Cr/Cb values has two peaks, indicating the skin area and the open-eye area. The Cr/Cb value at the minimum between the two peaks is designated as the threshold for open-eye area extraction.
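A minimal sketch of this thresholding step follows, assuming OpenCV for the color conversion and NumPy for the histogram. The paper does not specify its peak-finding procedure, so the two largest local maxima of the Cr/Cb histogram are taken here as the skin and open-eye peaks; this is an illustrative assumption, not the authors' exact implementation.

```python
import cv2
import numpy as np

def estimate_cr_cb_threshold(bgr_image, bins=256):
    """Estimate the skin/open-eye threshold from the Cr/Cb histogram.

    The per-pixel Cr/Cb ratio is assumed to form a bimodal histogram;
    the ratio at the minimum between the two peaks is the threshold.
    """
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    ratio = cr / np.maximum(cb, 1.0)              # guard against division by zero
    hist, edges = np.histogram(ratio, bins=bins)

    # Two largest local maxima, assumed to be the skin and open-eye peaks.
    peaks = [i for i in range(1, bins - 1)
             if hist[i] >= hist[i - 1] and hist[i] >= hist[i + 1]]
    if len(peaks) < 2:
        raise ValueError("expected a bimodal Cr/Cb histogram")
    p_lo, p_hi = sorted(sorted(peaks, key=lambda i: hist[i])[-2:])

    # Histogram minimum (valley) between the two peaks.
    valley = p_lo + int(np.argmin(hist[p_lo:p_hi + 1]))
    return 0.5 * (edges[valley] + edges[valley + 1])
```

Pixels whose Cr/Cb ratio falls on the eye side of the returned value would then be labeled as the open-eye area.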
2.2 Binarization by Pattern Matching Method
The method described in Subsection 2.1 can extract the open-eye area almost completely. However, the extraction results sometimes leave deficits around the corner of the eye, because in certain subjects the Cr/Cb value around the eye corner is similar to that of the skin. To resolve this problem, we have developed a method for open-eye extraction without deficits by combining two extraction results. One is the binarized image obtained using color information, as described in Subsection 2.1. The other is a binarized image obtained using light intensity information, which includes the area around the corner of the eye in the extraction result. Binarization using light intensity information utilizes a threshold estimated by a pattern matching method, which determines the matching point by using the color-based binarized image as reference data. Hence, the threshold level is estimated automatically. The original image and the extracted open-eye area are shown in Fig. 1(a) and Fig. 1(b).
Fig. 1. Original eye image (a) and extracted open-eye area (b)
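The paper does not detail how the pattern matching scores candidate intensity thresholds; the sketch below uses pixel-wise agreement with the color-based mask as a plausible matching criterion (an assumption), then takes the union of the two masks to fill the deficits around the eye corner.

```python
import numpy as np

def combined_open_eye_mask(gray, color_mask):
    """Combine color- and intensity-based binarization (Section 2.2 sketch).

    `gray` is an 8-bit grayscale eye image; `color_mask` is the boolean
    open-eye mask obtained from the Cr/Cb threshold of Subsection 2.1.
    """
    best_t, best_score = 0, -1.0
    for t in range(256):
        candidate = gray < t                      # dark pixels taken as eye
        score = np.mean(candidate == color_mask)  # agreement with reference mask
        if score > best_score:
            best_t, best_score = t, score

    intensity_mask = gray < best_t
    return intensity_mask | color_mask            # union fills corner deficits
```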
3 Measurement Method of Wave Patterns of Eye Blinks Using Split-Interlaced Images

Commonly used NTSC video cameras output interlaced images. One interlaced image contains two field images, designated as the odd and even fields. If an NTSC camera captures a fast movement such as an eye blink, there is a great divergence between the captured odd- and even-field images. Therefore, the area around the eyelids in the captured image shows comb-like noise. This phenomenon occurs because the two field images capture different instants of the fast eyelid movement. An example of an interlaced image during eye blinking is shown in Fig. 2. To show this phenomenon clearly, Fig. 2 was captured at low resolution (145 × 80 pixels).
If one interlaced image is split by scanning the even- and odd-numbered lines separately, two field images are generated. Thus, the time resolution of the motion images doubles, but the amount of information in the vertical direction decreases by half. These field images are captured at 60 fields/s, whereas NTSC interlaced moving images are captured at 30 fps; therefore, this method yields a time resolution that is double that available in the NTSC format. The duration of a conscious blink is a few hundred milliseconds; therefore, it is difficult to measure the wave pattern of an eye blink accurately using NTSC cameras directly. However, the detailed wave pattern of an eye blink can be measured using our proposed method. The split-interlaced images are shown in Fig. 3. The two eye images shown in Fig. 3 are enlarged in the vertical direction and were generated from the interlaced image shown in Fig. 2. Our proposed method measures the wave patterns of eye blinks from these images.
Fig. 2. Blinking eye image (interlaced)
Fig. 3. Split-interlaced image generated from Fig. 2
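Splitting a frame into its two fields is a simple line-slicing operation; a sketch with NumPy follows. Which field is temporally first depends on the camera's field order, which is an assumption left to the caller here.

```python
import numpy as np

def split_fields(frame):
    """Split one interlaced frame into its two field images (Section 3).

    Even-numbered scan lines form one field and odd-numbered lines the
    other, giving 60 fields/s from 30 frames/s at half the vertical
    resolution.
    """
    return frame[0::2], frame[1::2]   # lines 0,2,4,... and lines 1,3,5,...

def enlarge_vertically(field):
    """Repeat each scan line so a field matches the frame height (cf. Fig. 3)."""
    return np.repeat(field, 2, axis=0)
```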
4 Evaluation Experiment for Proposed Method

Five subjects participated in the experiment described in Subsection 4.1, and four subjects in the experiment described in Subsection 4.2. The experimental setup includes an NTSC DV camera (for home use), a high-speed digital video camera, and a personal computer (PC). The PC analyzes sequenced eye images captured by the video cameras. The DV camera captures interlaced images at 30 fps, and the high-speed digital video camera captures non-interlaced images at 300 fps. In the experiments performed using these video cameras, the wave pattern of eye blinks is measured from sequenced eye images. The experimental setup is shown in Fig. 4.
Fig. 4. Hardware configuration of experimental system (PC, display, user, and NTSC or high-speed digital video camera)
4.1 Experiment for Eye Blink Measurement Using NTSC Camera

In this experiment, sequenced eye images were captured using the DV camera at 30 fps in NTSC format. In addition, split-interlaced images were generated from these interlaced NTSC images. These split-interlaced images have a time resolution of 60 fields/s. The wave pattern of eye blinks was measured from both the interlaced NTSC images and the split-interlaced images. The binarization threshold for open-eye area extraction was determined automatically from the first field image of the experimental moving images. This threshold was estimated by the method described in Section 2. A typical result from this experiment is shown in Fig. 5.
Fig. 5. Wave patterns of eye blinks measured by DV (30 fps and 60 fps); vertical axis: normalized pixels of open-eye area, horizontal axis: sampling point (1/60 sec)
In Fig. 5, the vertical and horizontal axes indicate the pixels of the open-eye area and the sampling point (interval: 1/60 sec), respectively. To compare the two wave patterns of eye blinks, the plots are normalized using the pixels of the open-eye area in the first field image. The bottoms of the plots indicate the eye-closed condition. Our proposed algorithm classifies the eyelid outline and cilia into the open-eye area; therefore, the pixel counts at the bottoms of the plots are not reduced to zero. From Fig. 5, it is evident that sequenced images at 60 fields/s can be used to estimate the detailed wave pattern of an eye blink. During the eye blink, there is a great difference between the two plots of open-eye area pixels; this difference does not depend on the individual subject.
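Putting the pieces together, the blink waveform is the per-field open-eye pixel count, normalized to the first field as in Fig. 5. A minimal sketch, with the binarization of Section 2 passed in as a function:

```python
import numpy as np

def blink_waveform(fields, open_eye_mask_fn):
    """Normalized open-eye pixel counts over a 60 fields/s sequence.

    `fields` is the sequence produced by splitting each interlaced frame;
    `open_eye_mask_fn` binarizes one field into a boolean open-eye mask.
    """
    counts = np.array([open_eye_mask_fn(f).sum() for f in fields], dtype=float)
    return counts / counts[0]   # 1.0 corresponds to the fully open eye
```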
Results of the wave pattern measurements of eye blinks for five subjects are shown in Fig. 6, where the vertical and horizontal axes again show pixels of the open-eye area and the sampling point, respectively. These plots are normalized in the same manner as those in Fig. 5. From Fig. 6, it is evident that there are great differences between the results for individual subjects.

Fig. 6. Wave patterns of eye blinks of 5 subjects (A–E) measured by DV (60 fps); vertical axis: normalized pixels of open-eye area, horizontal axis: sampling point (1/60 sec)
4.2 Experiment for Eye Blink Measurement Using High-Speed Video Camera

To verify the accuracy of the proposed method that utilizes split-interlaced images, experiments were conducted with four subjects; this experiment and the one described in Subsection 4.1 were conducted separately. Subjects A and E (shown in Fig. 6) also participated in this experiment. Sequenced images at three different frame rates (30, 60, and 150 fps) were generated from moving images captured by the high-speed digital video camera. These sequenced images were then analyzed to measure the wave pattern of eye blinks. The results of eye blink measurements performed using the sequenced images at the three lower frame rates were compared with those taken at 300 fps. Typical examples of the measurement results are shown in Fig. 7, Fig. 8, and Fig. 9, which display results at 30, 60, and 150 fps, respectively. From Fig. 7 and Fig. 8, it is evident that the accuracy of measurement at 60 fps is higher than that at 30 fps. The minimum of the wave pattern (the bottom of the curve) is highly characteristic of when an eye blink occurs. The results at 60 fps show that the bottom of the plot is measured with a high degree of accuracy. Therefore, sequenced images at this frame rate are suitable for measuring eyelid movement velocity. Moreover, our proposed method using split-interlaced images (described in Section 3) utilizes two field images generated from one interlaced image; that is, the
spatial information of these field images is decreased by half. We have confirmed, via an experiment using sequenced images at 60 fps, that this decrease in spatial information does not affect measurement accuracy. The sequenced images at 60 fps were generated from moving images captured by the high-speed digital video camera. In this experiment, we generated half-sized eye images by extracting the odd-numbered scan lines from the sequenced images at 60 fps. We then estimated the wave patterns of eye blinks using these half-sized images. Our results show that the measured open-eye area decreases by half, in agreement with the results shown in Fig. 8.
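A sketch of how the lower-rate and half-sized sequences could be derived from the 300 fps master recording; the exact decimation used in the experiment is not described, so uniform frame skipping is assumed here.

```python
def decimate(frames_300fps, target_fps):
    """Derive a 30, 60, or 150 fps sequence by uniform frame skipping."""
    step = 300 // target_fps          # 10, 5, or 2
    return frames_300fps[::step]

def half_size(frame):
    """Keep odd-numbered scan lines only, mimicking a split field image."""
    return frame[1::2]
```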
Fig. 7. Wave pattern of eye blinks measured by high-speed video camera (30 fps vs. 300 fps); vertical axis: pixels of open-eye area, horizontal axis: sampling point (1/300 sec)
Fig. 8. Wave pattern of eye blinks measured by high-speed video camera (60 fps vs. 300 fps); vertical axis: pixels of open-eye area, horizontal axis: sampling point (1/300 sec)
Fig. 9. Wave pattern of eye blinks measured by high-speed video camera (150 fps vs. 300 fps); vertical axis: pixels of open-eye area, horizontal axis: sampling point (1/300 sec)
4.3 Discussion

On the basis of Fig. 5, it is evident that by using split-interlaced images, the time resolution of measurement is double that obtained in previous studies. These split images are the odd- and even-numbered field images in the NTSC format, generated from NTSC frames. The method can be applied to any subject under common indoor lighting sources, such as fluorescent lights; the wave patterns of eye blinks for five subjects are shown in Fig. 6. From the results shown in Fig. 7, Fig. 8, and Fig. 9, it is evident that measurement accuracy increases with increasing frame rate. A closer estimate of eye blinking velocity can be achieved if the wave pattern of an eye blink is measured with higher accuracy; in other words, the type of eye blink can then be classified with a high degree of accuracy. In addition, our proposed method can measure the wave patterns of eye blinks efficiently even using half-sized eye images. As shown by the experimental results presented above, we have verified the reliability of our proposed method described in Section 3. Thus, detailed wave patterns of eye blinks can be measured using our proposed method.
5 Conclusions

We have presented a new automatic method for measuring eye blinks. Our method utilizes split-interlaced images of the eye captured by an NTSC video camera. These split images are the odd- and even-numbered field images in the NTSC format, generated from NTSC moving images. Using this method, the time resolution of measurement increases to 60 fields/s, double that of conventional methods. Besides measuring eye blinks automatically, our method can be used under common indoor lighting sources, such as fluorescent lights. In the evaluation experiments, we measured the eye blinks of all subjects without problems.
To verify the accuracy of our proposed method, we performed experiments using a high-speed digital video camera. Comparing the results obtained using NTSC cameras with those obtained using the high-speed digital video camera shows that measurement accuracy increases with increased time resolution. Additionally, the decreased area of the split-interlaced image has no adverse effect on the results of eye blink measurements. We confirmed that our proposed method is capable of measuring the wave pattern of eye blinks with high accuracy using an NTSC video camera. In the future, we plan to develop a new method for classifying types of eye blinks using the measurement method reported above. That method will profile eye blinks according to the velocity of open-eye area changes. We also plan to apply it to more general ergonomic measurements.
References
1. Grauman, K., Betke, M., Gips, J., Bradski, G.R.: Communication via Eye Blinks – Detection and Duration Analysis in Real Time. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Lihue, HI, pp. 1010–1017 (2001)
2. Morris, T., Blenkhorn, P., Zaidi, F.: Blink Detection for Real-Time Eye Tracking. J. Network and Computer Applications 25(2), 129–143 (2002)
3. Ohzeki, K., Ryo, B.: Video Analysis for Detecting Eye Blinking Using a High-Speed Camera. In: Proc. of Fortieth Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, pp. 1081–1085 (2006)
4. Abe, K., Ohyama, M., Ohi, S.: Eye-Gaze Input System with Multi-Indicators Based on Image Analysis under Natural Light. J. The Institute of Image Information and Television Engineers 58(11), 1656–1664 (2004) (in Japanese)
5. Abe, K., Ohi, S., Ohyama, M.: An Eye-Gaze Input System Using Information on Eye Movement History. In: Proc. of 12th International Conference on Human-Computer Interaction, HCI International 2007, Beijing, vol. 6, pp. 721–729 (2007)
6. Garcia, C., Tziritas, G.: Face Detection Using Quantized Skin Color Regions Merging and Wavelet Packet Analysis. IEEE Trans. on Multimedia 1(3), 264–277 (1999)
A Usability Study of WebMaps with Eye Tracking Tool: The Effects of Iconic Representation of Information

Özge Alaçam and Mustafa Dalcı

Human Computer Interaction Research and Application Laboratory, Computer Center, Middle East Technical University, 06531 Ankara, Turkey
{ozge,mdalci}@metu.edu.tr
Abstract. In this study, we conducted usability tests on different WebMap sites with eye movement analysis. Overall task performance, the effects of the iconic representation of information, and the efficiency of pop-up usage were evaluated. Eye tracking technology was used in this study to follow the position of the users' eye gaze. The results show that there are remarkable differences in task performance between WebMaps. In addition, the WebMaps differ in their use of iconic representations according to the results of the users' evaluations. It was also found that the efficiency of pop-up window usage has an effect on task performance.

Keywords: Web mapping, usability, eye tracking, cognitive processes, iconic representations, efficiency of pop-ups.
processes [5]. The use of eye tracking in the usability field started in the 1950s [6]. However, due to the difficulty of analyzing the huge amount of data obtained from eye tracking tools, it lost its popularity in the 1970s. With improvements in eye tracking technology, eye tracking tools have regained their impact on the usability field [10], and nowadays they are accepted as a tool for improving computer interfaces. In one of the studies on WebMap usability, conducted by Nivala et al. [13], the severity of usability problems was investigated. In our study, we aim to conduct additional analyses to find the reasons for these usability problems and to make them clearer by analyzing the eye movements of the users. The focus of this study is to analyze the effects of the iconic representation of information and to investigate whether pop-ups are used efficiently by the users. An eye tracking tool is used in this study to follow the position of the users' eye movements, which helps to measure the attended location on the map. It is known that eye movements provide information about cognitive processes such as perception, thinking, decision making, and memory [1, 3, 4, 12, 14]. Evaluating eye movements gave us the opportunity to focus on iconic representations, the efficiency of pop-up windows, and their effects on map comprehension in different WebMaps.
2 Method and Materials

Twenty-six subjects (12 female, 14 male), either university students or graduates, between 18 and 32 years of age, participated in this study. A questionnaire was administered to gather information about their prior knowledge of WebMap usage, their opinions on the comprehensibility of the icons, and their preferences regarding the WebMaps. Each subject evaluated two different WebMaps for different places in the US, in random order. The six tasks shown in Table 1 were used in the experiment. Users were told that they could give up a task, or the experiment, whenever they wanted to. The tasks included finding a given address, finding specific places represented by icons (such as an airport, metro station, or hospital), and showing the route to specific locations. The experiments were conducted at the Human-Computer Interaction Research and Application Laboratory at Middle East Technical University. The eye movements of users were collected with a Tobii 1750 Eye Tracker and analyzed with Tobii Studio.

Table 1. Task Description

Task No  Task Description
Instruction: Welcome to X City/State. You are planning to look at the city map to specify the locations that you want to visit before starting your trip.
1  Point to the nearest highway intersection to X International Airport.
2  You want to go from X International Airport to X University. Could you describe how to arrive at that location?
3  Find the address of the hospital nearest to X Park.
4  Now, you are in X Location. Show the nearest metro/railway station to this place.
5  You are searching for the library nearest to X place. Find and point to it on the map.
6  Show the intersection point of the given address with X street.
In Nivala et al.'s study [13], four different web mapping sites were evaluated: Google Maps, MSN Maps & Directions, MapQuest, and Multimap. However, since MSN Maps & Directions and Multimap are based on Microsoft Virtual Earth, we replaced these sites with Live Search Maps, which is also based on MS Virtual Earth. Since these sites are well known and all have zooming and panning options in their 2D map applications, they are very good candidates for usability testing. Despite the common properties mentioned above, they differ in terms of their use of icon representation and their pop-up window properties. We conducted usability testing of Google Maps, Live Search Maps, MapQuest, and Yahoo Maps, and investigated the effect of the iconic representation of information and of pop-up windows by analyzing eye movements. We use the term "iconic representation of information" to denote the relationship between an icon's semantics and its appearance. In addition to task completion performance (task completion score and time), eye tracking data such as fixation length, fixation count, and observation length were collected.
3 Results

The results are presented under three categories: task performance, analysis of the iconic representations, and analysis of pop-up windows.

3.1 Task Performance

Users were grouped into two categories according to their WebMap usage experience: experienced users (14 users) with a high usage frequency and inexperienced users (12 users) with a low usage frequency. A one-way ANOVA was conducted to compare task completion time across experience levels. The result shows that the users' experience level has a significant effect on task completion time, F(1,52) = 5.30, p < .05. One of the criteria for comparing the usability of WebMaps is the users' task completion scores. Task completion was evaluated under three categories: accomplished tasks, unaccomplished tasks, and partially accomplished tasks, in which the users thought they had accomplished a task when they actually had not. Table 2 provides the percentage of users who accomplished, partially accomplished, and did not accomplish each task; an overall score was also calculated for each WebMap site. Fig. 1 shows the overall completion score for each map. One-way ANOVA results show that the task completion score of Google Maps is significantly different from those of MapQuest and Yahoo Maps, F(3,48) = 8.629, p < .05. It is also worth noting that the significance value of the difference between Live Search Maps and Yahoo Maps is .05. In addition to the analysis of task completion scores, the mean fixation length for each task was analyzed individually (see Fig. 2 for a comparison). Only fixation length on accomplished and partially accomplished tasks was counted. The results show that, for the first task, there is no significant difference in fixation length according to map type. For the fixation length during task two, a significant difference was found between Live Search Maps and MapQuest, Google Maps and MapQuest, and Yahoo Maps and MapQuest, F(3,43) = 12.538, p < .05. For the third task, Google Maps is significantly
different from MapQuest, F(3,33) = 3.768, p < .05. For the fourth task, Google Maps is significantly different from MapQuest and Live Search Maps, F(3,23) = 5.398, p < .05. For fixation length on the fifth task, Google Maps is significantly different from MapQuest, Yahoo Maps, and Live Search Maps, F(3,35) = 12.058, p < .05. For task six, only the difference between Live Search Maps and MapQuest is significant, F(3,41) = 2.444, p < .05. Statistical analysis of the mean fixation count for each task also shows that there are significant differences between the pairs given above.

Table 2. Percentage of users' task completion scores
Fig. 2. Fixation Length (sec.) on each task according to WebMaps
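The one-way ANOVA comparisons reported above can be reproduced with SciPy. A sketch follows; the per-participant scores below are placeholders, since only the summary statistics (F values and p levels) are given in the paper.

```python
from scipy import stats

# Hypothetical per-participant task completion scores for each WebMap;
# the study's actual data are summarized in Table 2 and Fig. 1.
google   = [1.00, 0.83, 1.00, 0.83, 0.67, 1.00]
live     = [0.83, 0.67, 0.83, 0.50, 0.67, 0.83]
mapquest = [0.50, 0.67, 0.50, 0.33, 0.67, 0.50]
yahoo    = [0.67, 0.50, 0.50, 0.67, 0.33, 0.50]

# One-way ANOVA across the four WebMaps.
f_stat, p_value = stats.f_oneway(google, live, mapquest, yahoo)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```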
3.2 Analysis of the Iconic Representations

In order to investigate the efficiency of the iconic representations in these WebMaps, the observation length on icons was counted frame by frame for specific tasks. To analyze icon usage, the first, third, and fourth tasks were selected, since these tasks involve specific places that can be represented by icons (the airport icon for task one, the hospital icon for task three, the metro/railway icon for task four, and the pointers that appear after the users' searches for all three tasks). Other icons displayed on the map during these tasks were also investigated. The fixation length on each icon and the time for which it was displayed on the map were counted, and the percentage of looking time on the iconic representation was then calculated. One-way ANOVA results show that there is no significant difference in observation length on icons across the WebMaps, F(3,48) = 1.859, p > .05. In addition, no correlation between the icon looking time percentage and the completion score was found (Spearman's rho = .37, p > .05). Icons were divided into two categories: task-related icons and task-unrelated icons. However, since not every map has a representation for each icon, these were investigated individually, without statistical analysis. Since MapQuest has only pointer icons, these were not compared with the specific icons in the other maps. Table 3 provides the percentage of looking time for each icon in each WebMap. The observation length on the airport icon in Yahoo Maps is 13.3% of the time that it appears on the map; however, since the other WebMaps (Google Maps, MapQuest, and Live Search Maps) have no icon for the airport, no comparative analysis is possible. The pointers that represent the searched location have approximately the same looking time for all maps. For the metro and railway icons, the Google Maps icon has the largest looking time percentage, followed by Live Search Maps and Yahoo Maps, respectively. Since MapQuest and Live Search Maps do not have hospital icons, the looking times of these icons were investigated only on Yahoo Maps and Google Maps. Even though Google Maps and Yahoo Maps contain an icon for "Hospital", the users are expected to zoom close
enough for that icon to be visible. However, none of the users zoomed to that distance while performing their tasks. The eye movement analysis of the users who performed the experiment on Live Search Maps shows an interesting outcome: some task-unrelated icons (such as the park, hotel, and sponsored link icons) were fixated on remarkably often by the users.

Table 3. The percentage of looking time for each icon

Icon Type     Icon            Google Maps  Live Search  MapQuest  Yahoo Maps
Task-Related  Airport         Na*          Na           Na        13.3
              Pointers        10.6         10.4         10.9      10.5
              Metro/railway   13.5         10.8         Na        7.2
Unrelated     University      3.5          Na           Na        Na
              Hospital        0.0          Na           Na        0.0
              Park            1.4          Na           Na        Na
              Hotel           Na           5.5          Na        Na
              Sponsored link  Na           1.7          Na        Na

*Not applicable.
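The looking time percentages in Table 3 follow from the frame-by-frame counts described above. A minimal sketch, assuming boolean per-frame indicators for gaze-on-icon and icon visibility (the study's actual AOI definitions come from Tobii Studio and are not reproduced here):

```python
import numpy as np

def looking_time_percentage(gaze_on_icon, icon_visible):
    """Percentage of an icon's on-screen time spent fixating on it."""
    gaze_on_icon = np.asarray(gaze_on_icon, dtype=bool)
    icon_visible = np.asarray(icon_visible, dtype=bool)
    shown_frames = icon_visible.sum()
    if shown_frames == 0:
        return float("nan")   # icon never displayed ("Na" in Table 3)
    return 100.0 * (gaze_on_icon & icon_visible).sum() / shown_frames
```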
A task-based investigation was also made of the iconic representations. Correlations between the icon looking time percentage, the completion score, and the observation length for Tasks 1, 3, and 4 were examined individually. For Task 1, no correlation between these parameters was found. In the analysis conducted for the third task, correlations were found between the icon looking time percentage and the completion score (Pearson's r = .523; p < .05) and between the icon looking time and the observation length (Pearson's r = -.368; p < .05). For Task 4, a correlation was found between the icon looking time and the observation length (Pearson's r = -.289; p < .05). On the other hand, no correlation was found between the icon looking time percentage and the completion score (Pearson's r = .165; p > .05). In addition to the eye movement analysis, a user evaluation questionnaire was carried out. The users were asked to predict the meaning of the icons and rate their comprehensibility on a scale from 1 to 10 (the results are given in Table 4). This gave us the opportunity to evaluate the efficiency of the relationship between the icons' semantics and their appearance and to show whether there is any ambiguity. The comprehensibility ratings were counted only for correct predictions. It can be claimed that an icon's appearance and semantics are consistent when both the correct prediction rate and the comprehensibility rating of the icon are high. Although the comprehensibility rating given by the users is high for the hospital icon in Google Maps, only 6.3% of the users predicted the meaning of the icon correctly. The comparison of the results for the metro/railway icons indicates that the iconic representations in Google Maps and Yahoo Maps are comprehended more easily by the users than those in Live Search Maps. None of the users predicted the meaning of the sponsored link and hotel icons in Live Search Maps. Looking at the looking time percentages of these icons in Table 3, it can be concluded that the users notice them without attaching a meaning to them.
Table 4. Users' ratings on iconic representations
Moreover, the users were asked to specify their icon preferences and to rate the usability of the maps they used during the experiment. The icons can be grouped into three main categories: pictorial, textual, and numerical/alphabetical icons. Textual icons are the most preferred (37.5%); 31.3% of the users prefer pictorial icons, and another 31.3% prefer numerical/alphabetical icons. Additionally, the users' usability ratings of the WebMaps parallel the task completion scores given in Table 2. Their ratings are 8.3 for Google Maps, 5.1 for Live Search Maps, 4.6 for Yahoo Maps, and 3.4 for MapQuest.

3.3 Analysis of Pop-Up Windows

Additional analysis was conducted to investigate the usage of pop-up windows. The aim of a pop-up is to direct the user's focus to something else, where they are provided with additional information, whether related to their task or not. Moreover, these pop-ups can facilitate additional searches on locations that the task requires. Therefore, pop-up windows are very important parts of WebMaps, and they frequently appear during map usage. In order to investigate the efficiency of pop-up usage in these WebMaps, the sections that contain pop-up windows in the map area were extracted for each task. The fixation length and fixation count on pop-up windows and their display time in the map area were counted. The analysis of the looking time on pop-up windows when they appear on the map gives us an idea of whether the users prefer to use them. The results of this analysis showed that Google Maps uses pop-up windows most efficiently, since 64.8% of the fixations on the map are in the pop-up area. It is followed by MapQuest, Live Search Maps, and Yahoo Maps, respectively (see Fig. 3). One-way ANOVA results also indicate that there is a significant difference between Google Maps and Live Search Maps in pop-up usage percentage.
Fig. 3. Observation Length on Pop-up / Map for each WebMap
Google Maps also differs significantly from Yahoo Maps, and there is likewise a difference between Yahoo Maps and MapQuest (F(3,51) = 5.939, p < .05). Additionally, correlations were found between the fixation count on pop-ups and the overall completion score (Spearman's rho = .396; p < .05) and between the fixation length on pop-ups and the overall completion score (Spearman's rho = .423; p < .05).
4 Conclusion

By analyzing eye movements, which are indicators of cognitive processes, we examined iconic representations and pop-up window usage and their roles in the usability of these mapping sites. Since map comprehension involves very complicated cognitive processes, decreasing the cognitive load by making the icons more comprehensible and the pop-ups more usable will improve the effectiveness and efficiency of web mapping sites. The task performance evaluation shows a significant difference between these WebMaps. As the tasks become more complicated (e.g., finding the address of a specific location near another location), the differences between the WebMaps in terms of task completion time and score become apparent. Beyond differences in overall display organization, at a micro level we examined the effect of the iconic representation of information and the efficiency of pop-up window usage. The analysis indicates a gap between the users' ratings of icons and their looking time percentages: even icons that were correctly predicted and rated highly comprehensible by the users received little looking time. It is also worth noting that the looking time percentage is low even for task-related icons, meaning that users have difficulty detecting them while performing their tasks although the icons are highly relevant to those tasks. Making the icons more visible could help users notice them and increase task completion performance. Moreover, the analysis of the efficiency of pop-up windows, which are other
widely used elements of WebMaps, shows that users' pop-up usage differs significantly according to WebMap type and that this parameter is positively correlated with task completion score. In addition, as expected, experience level plays a significant role in WebMap usage performance.
5 Further Studies

Icons for local traffic signs (e.g., highway numbers) were highly fixated during the tasks; however, these findings were disregarded because of the users' lack of familiarity with the locations. Follow-up studies investigating these icons could be conducted with users native to the locations. In addition, evaluating the interaction between particular areas of the site (e.g., search bars, menus, the map area, and the information area that shows the results of a location search) would give additional information about the efficiency and effectiveness of WebMap usability.

Acknowledgements. We thank the Computer Center of Middle East Technical University and TÜBİTAK (for support under grant SOBAG 104K098) for providing the eye tracking system. We also thank Yasemin Saatiçioğlu Oran and our colleagues in the METU Computer Center for their valuable support.
Feature Extraction and Selection for Inferring User Engagement in an HCI Environment Stylianos Asteriadis, Kostas Karpouzis, and Stefanos Kollias National Technical University of Athens, School of Electrical and Computer Engineering, Image, Video and Multimedia Systems Laboratory, GR-157 80 Zographou, Greece [email protected], {kkarpou,stefanos}@cs.ntua.gr
Abstract. In this paper, we present our work toward estimating a person's engagement with the information displayed on a computer monitor. Deciding whether a user is attentive or not, and frustrated or not, helps adapt the displayed information in special environments such as e-learning. The aim of the current work is the development of a method that works user-independently, without requiring special lighting conditions, and with minimal hardware requirements: a computer and a web camera. Keywords: User engagement, Head Pose, Eye Gaze, Facial Feature tracking.
reported in [12], where facial symmetry and Gabor filters are used to estimate head pose and eye gaze, respectively; a look-up table then maps the resulting eye gaze and head pose to the final focus-of-attention estimate. Here, we propose a method that can be summarized as follows: face detection [11] followed by facial feature detection is the first step, and tracking follows. Based on the facial features' motion, a series of biometric measurements are extracted and their appropriateness is evaluated for inferring a user's level of frustration or attentiveness in a human-computer interaction scenario. Our algorithm is able to recover and re-initialize in cases of occlusion or tracking failure.
2 Facial Points Detection and Tracking

The method reported in [1] is used to localize the face, the eye centres, and the mouth corners (here enhanced with upper and lower lip points). For the detection of the eye corners (left, right, upper and lower eyelids), a technique similar to that described in [13] is used. In the current work, the point between the nostrils and two points on each eyebrow are also used, as discussed later. For nostril detection, a search area is extended along a segment of the perpendicular to the inter-ocular line, starting from the midpoint between the eyes. The darkest row of this area is taken as the vertical position of the nostrils, and the middle point of this row is used in our experiments. In a similar manner, two points on each eyebrow are extracted as the darkest points in a neighborhood above the eye corners. These steps are illustrated in Fig. 1, where the luminance values of two search areas have been projected onto the vertical axis; the minimum of each projection corresponds to the feature in search (a sketch of this projection step follows Fig. 1). Tracking is done using a three-pyramid Lucas-Kanade algorithm. Geometrical face models and prototypes of natural human motion are employed for recovering from erroneous tracking (see subsection 3.1).
Fig. 1. Eyebrow and nose detection search regions
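For concreteness, the projection-minimum step described above can be outlined as follows; this is a minimal sketch of the idea, and the function name and array layout are our own, not part of the cited method.

```python
import numpy as np

def darkest_row(search_region):
    """Project the luminance of a grayscale search region onto the
    vertical axis and return the index of the darkest row; the minimum
    of the projection corresponds to the feature in search (nostril
    line, eyebrow points) as described above."""
    projection = search_region.mean(axis=1)   # one mean luminance per row
    return int(np.argmin(projection))
```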
3 Feature Extraction

The features extracted in our method are the following: head pose, eye gaze, eyebrow movements, horizontal and vertical components of head speed, horizontal and vertical mouth opening, and relative movements of the user back and forth.
3.1 Head Pose Estimation

Head rotation is calculated by examining the translation of the midpoint of the inter-ocular line with regard to its position when the user faced the camera frontally (see Fig. 2). This provides the Head Pose Vector p = [px py], where px and py are the horizontal and vertical components of the eye midpoint's translation, respectively, normalized by the inter-ocular distance calculated at start-up to cater for scale variations. The ratio of the inter-ocular distance to the vertical distance between the eyes and the mouth is monitored, and if it remains within certain limits of its value at the frontal position, no rotation is decided. Since tracking often fails, giving false estimates of head pose (as well as of the other features), a series of rules were integrated. After large rotations, some features are occluded and cannot be recovered. In this case, when the user returns to a frontal position after nt1 frames, the pose vector becomes shorter but its length remains above a certain threshold, because one of the eyes is not well tracked and its centre is no longer in the same neighborhood as at start-up. The algorithm can then re-initialize. The above can be modeled as in equations (1) and (2):
‖p(n)‖ < thr1 · ‖p(n − nt1)‖                                  (1)

var(‖p(n − nt2 : n)‖) < thr2                                  (2)
where ‖·‖ denotes a vector length metric. Equations (1)-(2) are interpreted as follows: if the Head Pose Vector length at the current frame n is smaller than a fraction thr1 of its value at frame n − nt1, and its variance over the last nt2 frames (with nt2 < nt1) is smaller than thr2, then the algorithm re-initializes.
Fig. 2. Pose changes during a video of a person in front of a monitor
In our experiments we used nt1 = 10, nt2 = 7, thr1 = 0.7, and thr2 = 0.05. If the above conditions are met but the user has not turned frontally, face detection fails and frontal rotation is not decided. Under general conditions, however, the algorithm re-initializes by re-detecting the face and facial features and restarting tracking. Further constraints concern the displacement of features in subsequent frames: assuming an orthographic projection over the interval between two subsequent frames, features are expected to shift in a uniform way. Finding outliers and re-calculating the mean shift from the remaining features allows erroneous points to be moved to positions that agree with the other features' shift. As experiments showed, this refinement converges after 7-10 iterations per frame.
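A minimal sketch of the re-initialization test in equations (1)-(2), assuming the per-frame Head Pose Vectors are kept in a history array (the function and variable names are illustrative):

```python
import numpy as np

def should_reinitialize(pose_history, n_t1=10, n_t2=7, thr1=0.7, thr2=0.05):
    """Test re-initialization conditions (1)-(2) on a history of
    Head Pose Vectors (one 2-D vector per frame, most recent last)."""
    if len(pose_history) <= n_t1:
        return False
    lengths = np.linalg.norm(np.asarray(pose_history, dtype=float), axis=1)
    # (1): current pose length fell below a fraction of its value
    # n_t1 frames ago, suggesting a return toward a frontal position.
    cond1 = lengths[-1] < thr1 * lengths[-1 - n_t1]
    # (2): pose length has been stable over the last n_t2 frames.
    cond2 = np.var(lengths[-n_t2:]) < thr2
    return bool(cond1 and cond2)
```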
Fig. 3. Gaze changes during a video of a person not moving his head in front of a monitor
3.2 Eye Gaze Estimation

Eye gaze is extracted by monitoring the eye centre movements with regard to a coordinate system defined by the positions of the eye corners and eyelids at each frame (see Fig. 3). The resulting displacement provides the Eye Gaze Vector g = [gx gy], where gx and gy are the horizontal and vertical components, respectively.
3.3 Extraction of Further Features

The vertical movements of the eyebrows with regard to the upper eyelids are also extracted, and the horizontal and vertical components of the head movement speed (in pixels per frame) are calculated. Furthermore, mouth opening is calculated with reference to the initial distance between the mouth corners and the lip distance at start-up. Finally, changes in the inter-ocular distance are monitored and
calculated as fractions of the eye centres' distance in the first frame of each initialization of the system. In this way, when changes in inter-ocular distance are not due to head rotations, qualitative measurements of user movement back and forth are obtained.
4 Feature Selection

The experiments were conducted on a database of children with learning difficulties between the ages of 8 and 10. The recorded videos were 720×576 pixels with a frame rate of 25 fps. A total of about 10,000 and 12,250 frames were used for the attention/non-attention and frustration/non-frustration problems, respectively. The videos were annotated by experts. One of the difficulties of the dataset was that the positive instances (frustration, non-attentiveness) were very few compared to the negative ones, which limited the training prototypes. To evaluate the appropriateness of each feature, Fisher's exact test [4] was used: the 3-bin histogram of each feature was calculated, and the resulting distribution for positive instances was compared against the distribution of the same feature throughout all videos, regardless of state. We chose Fisher's exact test rather than another method (e.g., the chi-square test) because it is well suited to small samples; in many cases (for example, the horizontal head speed when the user is frustrated), only a few instances fall into the low- or high-value histogram bins, and Fisher's exact test is ideal for such small counts. For the event of non-attentiveness, the tests showed that for Head Pose, Eye Gaze, Inter-Ocular Distance Changes, and Head Speed the null hypothesis (that the observed and expected distributions do not differ) should be rejected with higher confidence than for the remaining features. For the event of frustration, Head Pose, Horizontal and Vertical Head Speed, and Eye Gaze depart from the null hypothesis more than the remaining features do, as expected. Figure 4 justifies the rejection of some features on account of high p-values, while Figures 5 and 6 illustrate examples of data for each class.
Fig. 4. p-values for feature selection in attention/non-attention and frustration/non-frustration scenarios
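The construction of the contingency tables is not spelled out above; one plausible reading, sketched below, tests each histogram bin with a 2×2 table of in-bin versus out-of-bin counts for the positive instances against all instances. The function name and table layout are our assumptions, not the study's exact procedure.

```python
import numpy as np
from scipy.stats import fisher_exact

def bin_p_values(feature_positive, feature_all, bin_edges):
    """Compare a feature's 3-bin histogram over positive instances
    (e.g., frustration frames) against its histogram over all frames.
    Each bin is tested with a 2x2 table -- (in bin, out of bin) for the
    positive group versus the overall group -- returning one p-value
    per bin; low values flag bins where the distributions differ."""
    pos, _ = np.histogram(feature_positive, bins=bin_edges)
    overall, _ = np.histogram(feature_all, bins=bin_edges)
    p_values = []
    for k in range(len(pos)):
        table = [[pos[k], pos.sum() - pos[k]],
                 [overall[k], overall.sum() - overall[k]]]
        _, p = fisher_exact(table)
        p_values.append(p)
    return p_values
```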
Fig. 5. Features used for attention/non-attention classification
Fig. 6. Features used for frustration/non-frustration classification
5 Experimental Results

To test the accuracy of our system, a Sugeno-type fuzzy inference system [9] was built for each case. The motivation for fuzzy systems is that behavioral states do not necessarily belong to crisp classes; rather, they are fuzzy concepts. For example, frustration or distraction can be given confidence values, and the outputs of fuzzy systems are ideal for this. Prior to training, our data were clustered using the subtractive clustering algorithm described in [2]. Instead of using a grid partition of the data, this algorithm clusters them and thus leads to fuzzy systems free of the curse of dimensionality; the number of clusters it creates determines the optimal number of fuzzy rules. After defining the fuzzy inference system architecture, its parameters (membership function centers and widths) were acquired by applying a least-squares and back-propagation gradient descent method [5]. Tables 1 and 2 summarize the overall accuracy of our system in estimating the behavior of a user in the attention/non-attention and frustration/non-frustration experiments, using different sets of low-p-value features as inputs and 1 or 0 as the target states. Testing was done using a leave-one-out protocol.
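For reference, the clustering step of [2] admits a compact sketch along the following lines, with the usual simplified stopping rule; the radius and threshold defaults are illustrative, not the settings used in this study. Each returned centre corresponds to one fuzzy rule.

```python
import numpy as np

def subtractive_clustering(X, r_a=0.5, r_b=0.75, eps=0.15):
    """Subtractive clustering after Chiu [2], simplified stopping rule.
    Each point's 'potential' sums Gaussian contributions from all other
    points; the highest-potential point becomes a cluster centre, its
    influence is subtracted, and the process repeats. Assumes features
    normalized to [0, 1]; each centre yields one fuzzy rule."""
    X = np.asarray(X, dtype=float)
    alpha, beta = 4.0 / r_a**2, 4.0 / r_b**2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. dists
    potential = np.exp(-alpha * d2).sum(axis=1)
    centers, p_first = [], potential.max()
    while potential.max() > eps * p_first:
        k = int(potential.argmax())
        centers.append(X[k])
        potential -= potential[k] * np.exp(-beta * d2[k])  # squash neighborhood
    return np.array(centers)
```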
From Tables 1 and 2 it can be seen that, in the case of frustration, although eye gaze has a low p-value (see Fig. 4), excluding it from the experiments does not degrade the results; they are in fact marginally higher. This is because, in our dataset, the eye gaze vector length was strongly correlated with the head pose vector length in cases of frustration. Similarly, although head speed (horizontal and vertical) has low p-values in the attention tests, results showed that excluding these parameters from our decision systems improves the results. Closer observation of our data and the corresponding annotation suggested the following explanation: head speed is only large at the beginning of those time segments in which a person is turning his or her head away from the camera. At those moments the head pose vector has small values but head speed is high; however, such movements can also occur during attention, as a reader very often makes small rapid movements without changing head pose much. For this reason, head speed was excluded from our experiments in the case of attention estimation. The database we used was acquired under normal lighting conditions with very challenging subjects: children with learning difficulties. Testing our system on such a dataset is demanding, not only because of its nature but also because the annotation is subjective. Nevertheless, the results obtained are extremely promising.

Table 1. Neuro-Fuzzy System decision accuracy for two different sets of low p-value features for detecting User Attention

Features                                              Overall success rates
Head Pose, Eye gaze, Distance changes, Head speed     84.00%
Head Pose, Eye gaze, Distance changes                 88.00%
Table 2. Neuro-Fuzzy System decision accuracy for two different sets of low p-value features for detecting User Frustration

Features                                                  Overall success rates
Head Pose, Horizontal and Vertical Head speed             82.00%
Head Pose, Horizontal and Vertical Head speed, Eye Gaze   80.63%
6 Conclusions and Future Work

We presented a method for automatically estimating the behavior of a person in front of an HCI environment. Our system is non-intrusive, leaving room for spontaneous behavior, and it does not depend on controlled lighting conditions, which makes it suitable for a variety of settings. Furthermore, since the system requires no a priori knowledge of the user or the camera, it needs no prior training or calibration. Future extensions of our work will include a common framework for discriminating among a set of states simultaneously. To this end, we will build a database suitable for our research and work on developing a facial feature tracker highly specialized for such applications.
Acknowledgments This work has been funded by the FP6 IP CALLAS (Conveying Affectiveness in Leading-edge Living Adaptive Systems), Contract number IST-34800 and by the IST Project ’FEELIX’, (under contract FP6 IST-045169).
References

1. Asteriadis, S., Nikolaidis, N., Pitas, I., Pardàs, M.: Detection of facial characteristics based on edge information. In: 2nd International Conference on Computer Vision Theory and Applications (VISAPP), Barcelona, Spain, pp. 247–252 (2007)
2. Chiu, S.L.: Fuzzy Model Identification Based on Cluster Estimation. Journal of Intelligent and Fuzzy Systems 2(3), 267–278 (1994)
3. D'Orazio, T., Leo, M., Guaragnella, C., Distante, A.: A visual approach for driver inattention detection. Pattern Recognition 40(8), 2341–2355 (2007)
4. Fisher, R.A.: Statistical Methods for Research Workers. Hafner Publishing (1970)
5. Jang, J.S.R.: ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Transactions on Systems, Man, and Cybernetics 23, 665–684 (1993)
6. Matsumoto, Y., Ogasawara, T., Zelinsky, A.: Behavior recognition based on head pose and gaze direction measurement. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Takamatsu, Japan, pp. 2127–2132 (2000)
7. Otsuka, K., Takemae, Y., Yamato, J.: A probabilistic inference of multiparty-conversation structure based on Markov-switching models of gaze patterns, head directions, and utterances. In: ICMI, pp. 191–198 (2005)
8. Smith, P., Shah, M., Lobo, N.D.V.: Determining driver visual attention with one camera. IEEE Transactions on Intelligent Transportation Systems 4, 205–218 (2003)
9. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics 15(1), 116–132 (1985)
10. Victor, T., Blomberg, O., Zelinsky, A.: Automating driver visual behavior measurement. In: 9th Vision in Vehicles Conference, Australia (2001)
11. Viola, P.A., Jones, M.J.: Rapid object detection using a boosted cascade of simple features. In: Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 511–518 (2001)
12. Weidenbacher, U., Layher, G., Bayerl, P., Neumann, H.: Detection of head pose and gaze direction for human-computer interaction. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Weber, M. (eds.) PIT 2006. LNCS, vol. 4021, pp. 9–19. Springer, Heidelberg (2006)
13. Zhou, Z.H., Geng, X.: Projection functions for eye detection. Pattern Recognition 37(5), 1049–1056 (2004)
Informative or Misleading? Heatmaps Deconstructed Agnieszka (Aga) Bojko User Centric, Inc. 2 Trans Am Plaza Dr, Ste 100, Oakbrook Terrace, IL 60181 USA [email protected]
Abstract. Eye tracking heatmaps have become very popular and easy to create over the last few years. They are very compelling and can be effective in summarizing and communicating data. However, heatmaps are often used incorrectly and for the wrong reasons. In addition, many do not include all the information that is necessary for proper interpretation. This paper describes several types of heatmaps as representations of different aspects of visual attention, and provides guidance on when to use and how to interpret heatmaps. It explains how heatmaps are created and how their appearance can be modified by manipulating different display settings. Guidelines for proper use of heatmaps are also proposed. Keywords: Heatmaps, attention maps, eye tracking.
EyeTools. Anyone with an eye tracker and the right software can create a heatmap. No knowledge of eye movements or of how heatmaps are created is required. As a result, heatmaps are often generated unnecessarily or are misinterpreted by those who do not understand what the visualizations are really showing or, perhaps even more importantly, not showing. Heatmaps can be deceptive because they look so intuitive that we often do not realize how much we actually do not understand. This paper describes different heatmap types and their limitations, as well as settings used to manipulate the appearance of heatmaps. It also discusses when heatmaps should and should not be used. Proposed guidelines for using heatmaps correctly conclude this work.
2 Types of Attention Heatmaps

Heatmaps are often shown with little, if any, description of what it is they are representing. The assumption is that they are showing "attention" or "eye movements," but knowing that is certainly not enough to be able to truly understand a heatmap. There are different aspects of eye movements that heatmaps can represent. Examples include fixation count, absolute or relative gaze duration, and the percentage/proportion of participants who fixated on each area of the stimulus. Choosing the right heatmap to present depends on the study objectives and the eye movement measures that address these objectives. For example, if search efficiency was of interest to the researchers, one of the measures collected and analyzed might be the number of fixations prior to acquiring the target [1]. Therefore, assuming that the analysis would benefit from data visualization, a fixation count heatmap should be presented. A fixation count heatmap would also be appropriate if the study goal was to determine the amount of interest generated by various elements of the stimulus during a free-view task (i.e., a task with no specific instructions) [1]. However, if the noticeability of a particular element was of interest, the percentage of participants who fixated on the element could be used as a measure (in addition to, for example, time to first fixation), which would warrant a participant percentage heatmap. Because each heatmap type has different limitations that impact its interpretation, it is important not only to be aware of these limitations, but also to know the types of all heatmaps included in papers, reports, and presentations.

2.1 Fixation Count Heatmap

A visual fixation can be loosely defined as a relatively stationary eye position focused on a particular location of the stimulus (a more precise definition is discussed in section 3.1). Fixations are important events that provide insight into human cognition because during each fixation we extract visual information that we process [2]. A fixation count heatmap (see Fig. 1) shows the accumulated number of fixations across participants. Each fixation made by each participant adds a value to the color map at the location of the fixation [3]. This value is the same for each fixation regardless of its duration, so a 100 ms fixation is represented in the same way as a 900 ms fixation. Thus, when looking at a fixation count heatmap, we cannot assume that areas of the same color received similar total gaze time.
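As an illustration of this accumulation, a fixation count map might be built as below; the Gaussian "splat" and its spread are our own illustrative choices, not necessarily the kernel used by any particular eye tracking package [3].

```python
import numpy as np

def fixation_count_heatmap(fixations, width, height, sigma=30.0):
    """Accumulate a fixation count map: every fixation adds the same
    Gaussian 'splat' at its (x, y) location, regardless of duration.
    `fixations` pools (x, y) pixel coordinates across participants."""
    yy, xx = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width))
    for x, y in fixations:
        heat += np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
    return heat

# Weighting each splat by the fixation's duration instead would give an
# absolute gaze duration map; normalizing each participant's durations
# by his or her total viewing time would give a relative one.
```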
Fixation count heatmaps can also be biased towards individuals who show high interest in elements that others do not. For example, two elements can be the same color although one attracted ten fixations from a single participant, while the other attracted the attention of ten participants, one fixation from each. Therefore, we cannot assume that areas that appear similar in terms of "heat" are equivalent in terms of the number of participants who looked at them. Another limitation of fixation count heatmaps is that they can be skewed towards individuals who had a longer exposure to the stimulus and thus an opportunity to produce more fixations. For example, if participant A spent twice as much time on the stimulus as participant B, participant A's data would impact the heatmap twice as much as participant B's data. This is something to always keep in mind when viewing heatmaps created from unequal exposure times.

2.2 Absolute Gaze Duration Heatmap

An absolute gaze duration heatmap (see Fig. 2) shows the accumulated time participants spent looking at the different areas of the stimulus. Each fixation made by each participant adds a value to the color map that is proportional to its duration [3]. For example, a 900 ms fixation will be nine times higher in color value than a 100 ms fixation. Because fixation duration is an indicator of cognitive processing [4], a heatmap that is scaled by fixation duration not only shows which areas were attended to but also represents the level of cognitive processing that the areas required.
An absolute gaze duration heatmap can be misleading because it displays different phenomena in the exact same way. For example, this type of heatmap will make one 900 ms fixation look the same as nine 100 ms fixations. A 900 ms fixation on an element indicates that one person looked at it for a while, while nine 100 ms fixations could mean, for example, that one person made nine brief fixations on the element or that nine people made one brief fixation each. In addition, like fixation count heatmaps, absolute gaze duration heatmaps can be biased towards individuals who spent more time looking at the stimulus. To eliminate any bias due to unequal exposure times, the gaze duration data can be normalized to create relative gaze duration heatmaps.

2.3 Relative Gaze Duration Heatmap

A relative gaze duration heatmap shows the accumulated time each participant spent fixating on the different areas of the stimulus relative to the total time the participant spent looking at the stimulus [3]. In other words, if participant A spent 6 seconds on a web page, including 2 seconds on the navigation, and participant B spent 60 seconds on the same page, including 20 seconds on the navigation, this type of heatmap gives their data, as it relates to the navigation, the same weight. Like the absolute gaze duration heatmap, this heatmap shows one individual's long gaze time (proportional to his or her total viewing time) the same way as several individuals' short gaze times (proportional to their total viewing times). If the exposure time is equal across participants (e.g., all participants saw the page for 12 s), which is the case in the examples presented in this paper, the relative and absolute gaze duration heatmaps will be identical.

2.4 Participant Percentage Heatmap

A participant percentage heatmap (see Fig. 3) shows the percentage of participants who fixated on the different areas of the stimulus. Each participant who looked at any given location adds a value to the color map. This value is the same for each participant regardless of the number of fixations he or she made or their durations. Thus, an area that was briefly fixated once by each participant will be presented in the same color as an area that each participant fixated multiple times with much longer fixations.
3 Display Settings for Creating Heatmaps

The appearance of heatmaps can be modified by manipulating various display settings. While adjusting the settings cannot change the relative distribution of attention, certain areas can be made to appear "hotter" or "colder." This can be done, for example, by changing the fixation criteria used for the analysis, showing raw data instead of fixation data, changing the upper threshold definition of the color scale, or modifying the time segment for the presented data. Since heatmap display settings can have a great impact on the appearance of heatmaps, they must be properly selected and communicated to ensure accurate interpretation.

3.1 Changing Fixation Criteria

There are several algorithms that can be used to define a fixation. A common algorithm used in commercial eye tracking software is based on duration and dispersion threshold identification. To define a fixation using this algorithm, two parameters need to be specified: minimum fixation duration (e.g., 80 ms) and maximum dispersion threshold (e.g., 0.5 degree of visual angle) [5]. A fixation defined by 80 ms and 0.5° will encompass all consecutive eye movements that occurred within 0.5° of each other for at least 80 ms. Unless noted otherwise, the heatmaps in this paper were created using these settings. Manipulating the duration and dispersion thresholds will change the number of fixations in the data. For example, increasing the minimum fixation duration from 80 ms to 200 ms will decrease the number of fixations because it will exclude all the fixations between 80 ms and 200 ms. Increasing the maximum dispersion threshold from 0.5 degree to 1 degree of visual angle will also decrease the number of fixations because some of the fixations that are closer together will be combined. Conversely, reducing the minimum fixation duration and maximum dispersion threshold will increase the number of fixations. More fixations in the data will increase the amount of "heat" in the heatmap, as shown in Figure 4.
Fig. 4. Fixation count heatmaps based on the same data (n = 13; 0 – 12 s; free-view task). On the left: minimum fixation duration = 200 ms; on the right: minimum fixation duration = 80 ms.
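For readers who want the mechanics, the duration-and-dispersion idea of [5] can be sketched as follows; the parameter defaults echo the 80 ms / 0.5° settings used in this paper, and the exact windowing details are our simplification rather than any tool's implementation.

```python
import numpy as np

def idt_fixations(points, t, max_disp=0.5, min_dur=80):
    """Dispersion-threshold identification in the spirit of [5]: a
    fixation is a run of gaze samples lasting at least `min_dur` ms
    whose dispersion (x-range + y-range) stays within `max_disp`
    (here, degrees of visual angle). Returns (start, end, centroid)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)

    def dispersion(a, b):                        # x-range + y-range of a..b
        w = pts[a:b + 1]
        return np.ptp(w[:, 0]) + np.ptp(w[:, 1])

    fixations, i = [], 0
    while i < n:
        j = i
        while j < n and t[j] - t[i] < min_dur:   # fill the minimum duration
            j += 1
        if j >= n:
            break
        if dispersion(i, j) <= max_disp:
            while j + 1 < n and dispersion(i, j + 1) <= max_disp:
                j += 1                           # grow while compact enough
            fixations.append((t[i], t[j], tuple(pts[i:j + 1].mean(axis=0))))
            i = j + 1
        else:
            i += 1                               # slide the window forward
    return fixations
```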
The lack of explicit fixation definition is an issue that does not pertain specifically to heatmaps but to entire papers and reports. Many user experience studies that analyze fixation data never mention how these fixations were defined. However, this information is very important for two reasons. First, the results of different studies are
not comparable unless it is clear that fixations were defined in the same way. Second, if the definition is not provided, it cannot be verified whether or not the fixation criteria were appropriate for the stimuli used in the study. For example, the fixation duration threshold for image viewing should be higher than for reading, because image viewing tends to produce longer fixations than reading, due to the fact that more information is being processed in a single fixation [6]. The fixation definition used to create heatmaps should match the definition used for data analysis. Changing the fixation duration to obtain a visualization of a particular intensity is not a good practice.

3.2 Displaying Raw Data Instead of Fixation Data

One step further from decreasing the fixation duration and dispersion thresholds is presenting raw data instead of fixation data. Raw data consists of meaningful eye movements (raw fixation points) and "noise" – eye movements that have little meaning in most user experience research. The noise includes eye movements that take place during saccades (rapid eye movements between fixations) as well as drifts, tremors, and flicks that occur during fixations [5]. Adding all the noise intensifies the heatmap, increasing the area covered in red (see Fig. 5).
Fig. 5. Fixation count heatmaps based on the same data (n = 13; 0 – 12 s; free-view task). On the left: fixation data; on the right: raw data.
As a general rule, fixation data rather than raw data should be used when creating visualizations, unless the stimulus has moving elements and the heatmap has to show smooth pursuit eye movements. We can assume that fixation data was used if the paper or report specifies how the researchers defined a fixation.

3.3 Changing the Definition of the Color Scale Upper Threshold

Another way to manipulate the amount of heat in heatmaps is by changing the definition of the upper threshold on the scale, which is usually indicated by the red color. If the requirements for an area to be red are lowered, the amount of red in the heatmap will increase. Lowering the upper threshold of the scale can be achieved by decreasing the minimum number of fixations in fixation count heatmaps (see Fig. 6) or by decreasing the minimum gaze length in absolute gaze duration heatmaps.
There is no set process for choosing the right upper threshold. The rule of thumb is to make sure that the heatmap properly captures the range of values that are of interest to the study. Threshold selection can be compared to setting the maximum value on the Y axis of a graph, where the Y axis indicates values of the dependent variable. If the Y axis is too short, the data points that exceed the maximum on the axis are cut off. As a result, we only know that these data points are higher than the maximum, but we do not know what they are exactly or how they differ from one another. On the other hand, if the Y axis is too high, the graph data will appear compressed and the differences between the data points will look smaller. Similarly, if a heatmap's upper threshold is set too low, many areas will be covered in red with no differentiation between the amount of attention each attracted. Conversely, setting a heatmap's upper threshold too high will limit the range of colors (e.g., constricting it to yellow or orange as the maximum) and no areas will be covered in red. Regardless of what criteria have been selected, they should be explicitly communicated in the figure legend or caption (e.g., "red = 10+ fixations" or "red color indicates areas that accumulated 10 s or more of gaze time"). It is also useful to put this value in context by providing the average number of fixations each participant made on the stimulus or the average time each participant spent looking at it. The heatmaps presented in this paper were created based on data with an average of 42 fixations on the page per participant and a consistent exposure time of 12 seconds. If heatmaps are generated for different experimental conditions or participant groups, their upper threshold definition should be identical so that the heatmaps can be compared. If the display settings are not the same, the differences between the heatmaps may be due to factors other than the data itself.
Fig. 6. Fixation count heatmaps based on the same data (n = 13; 0 – 12s; free-view task). On the left: red = 10+ fixations; on the right: red = 3+ fixations.
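The thresholding itself is simple to express in code. The sketch below clamps the color scale at an explicit upper value so that a stated convention such as "red = 10+ fixations" is honored; the plotting choices are illustrative, not those of any specific eye tracking package.

```python
import matplotlib.pyplot as plt

def render_heatmap(heat, upper=10, label="fixation count"):
    """Render an accumulated map with an explicit color-scale upper
    threshold: every value at or above `upper` maps to the top (red)
    color. Reusing the same `upper` across conditions keeps the
    resulting heatmaps comparable."""
    plt.imshow(heat, cmap="jet", vmin=0, vmax=upper)
    plt.colorbar(label=f"{label} (red = {upper}+)")
    plt.show()
```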
3.4 Modifying the Time Segment

Sometimes it may be appropriate to present data based on a shorter time segment than the total time during which a stimulus was shown to the participants. This will obviously decrease the amount of data, thus reducing the size of red areas in all heatmap types mentioned in this paper except for the relative gaze duration heatmap (see Fig. 7). Therefore, if a heatmap presents data from a time segment that is shorter than the total viewing time, this needs to be specified in the figure legend or caption.
In addition to the time segment, it should also be clear what participants were trying to do when the data presented in the heatmap was being collected. All too often heatmaps are shown without any context of the task. Eye movements are very task-dependent [7]: participants trying to log in to a website will produce a very different attention distribution than participants trying to find a product. Even if there was no specific task, it should still be noted that the data was collected in a free-view situation, which is how the heatmaps included in this paper were obtained.
Fig. 7. Fixation count heatmaps of the same free-view task (n = 13). On the left: data from 0 to 6 seconds; on the right: data from 0 to 12 seconds.
4 Heatmap Usage

4.1 Common Mistakes

There are a few common mistakes when it comes to using heatmaps. The biggest of them is a belief that heatmaps are appropriate for just about any user experience study and for any research question. Sometimes creating heatmaps even becomes the objective of the study (e.g., "we just wanted to see where people look"). This popular "let's-track-and-see-what-happens" approach has limited value because of its lack of focus (i.e., the study design does not target specific questions) and frequent lack of proper data analysis. Researchers often draw conclusions or make recommendations based on the results of those studies, which is inappropriate. For example, we cannot say that the reason why an element did not get much attention was its suboptimal placement or insufficient size unless we have tested other conditions (i.e., alternative placements or sizes) and compared the data using appropriate statistics. Even if different conditions were tested, sometimes conclusions regarding differences between conditions are made just by looking at the heatmaps. However, heatmaps do not lend themselves to any systematic comparison. Without any data analysis, it is impossible to tell if there are real differences between heatmaps, even if the heatmaps appear to be different.

4.2 Proper Usage Examples

Heatmaps should be used in a purposeful way and only when they add value. They can serve as illustrations of participants' viewing behavior and distribution of attention.
While they can communicate data, they cannot explain it or help analyze it. Therefore, heatmaps can rarely stand on their own. To maximize their usefulness and reduce ambiguity, heatmaps should accompany a quantitative analysis. One of our studies investigated a new standardized label template for prescription drug labels [8]. The goal was to determine the impact of the template on pharmacists’ drug selection speed and accuracy as compared to the existing label designs. The eye tracking measures included the number of fixations prior to target selection as an indicator of search efficiency, average fixation duration as a measure of information processing difficulty, and pupil diameter as a measure of cognitive workload. The results were presented in the form of statistical analyses but no heatmaps were included in the report. The study was a quantitative assessment of the effectiveness of the new labels, and heatmaps showing attention distribution were simply of no value. In another study, we evaluated a new homepage design for a professional organization against the original homepage [9]. Our objective was to identify which design was better and why based on a series of tasks during which participants attempted to locate the correct entry point on the homepage. Measures of search efficiency such as the number of fixations and the number of eye visits to the target prior to target selection were analyzed. The analysis was supplemented with heatmaps to show the distribution of attention on the page and to help account for any inefficiencies that occurred. For example, several tasks were more efficient using the new design which had a more centralized navigation. The heatmaps of the original design showed scattered fixations covering multiple navigation areas, while heatmaps of the new design revealed fixations focused mostly around the targets.
5 Guidelines for Using Heatmaps

Even though heatmaps are very compelling and seemingly easy to understand, they should be used with caution and according to the following guidelines, summarized based on the discussion in the previous sections of this paper:

A. Generate heatmaps only if they add value to the research.
B. Use heatmaps for data visualization instead of data analysis.
C. Use heatmaps to support quantitative analysis rather than on their own.
D. Understand the different heatmap types and only use the ones that represent measures which address your study objectives (e.g., when analyzing gaze time, use a gaze duration heatmap).
E. Specify the type of data the heatmap is representing (e.g., fixation count or absolute fixation duration).
F. Know the limitations of each heatmap type to avoid incorrect interpretation.
G. When creating heatmaps, use fixation data rather than raw data.
H. Provide fixation definition and keep it consistent for analyses and visualizations within a study (e.g., min fixation duration = 100 ms and max dispersion threshold = 0.5°).
I. Provide the definition for the upper threshold of the heatmap color scale (e.g., red = 10+ fixations).
J. Put the upper threshold value in context (e.g., average number of fixations on the stimulus per participant).
K. Specify the time segment based on which the heatmap was created (e.g., the first 10 seconds of exposure).
L. Provide task context for each heatmap (e.g., data obtained from participants during the checkout task).
M. Use the same heatmap settings (e.g., upper threshold and time segment) for conditions that you are comparing.
N. If a paper or report does not provide important information about its heatmaps (e.g., type, fixation definition, upper threshold definition, time segment), ask the authors for clarification before making any assumptions.
6 Conclusion

Blinded by the attractiveness and apparent intuitiveness of heatmaps, we often do not realize how much information, in addition to the visualization itself, is necessary to fully understand a heatmap and properly interpret the data it represents. In other words, the biggest danger involved in creating and reading heatmaps is that we are often unaware of what we do not know, and thus we do not look or ask for the missing information. This paper has exposed some of these gaps in our meta-knowledge in the hope of encouraging more critical thinking about the usage of heatmaps.
References

1. Jacob, R.J.K., Karn, K.S.: Eye Tracking in Human-Computer Interaction and Usability Research: Ready to Deliver the Promises. In: Hyona, J., Radach, R., Deubel, H. (eds.) The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research, pp. 573–605. Elsevier Science, Amsterdam (2003)
2. Liversedge, S.P., Findlay, J.M.: Saccadic Eye Movements and Cognition. Trends in Cognitive Sciences 4, 6–14 (2000)
3. Tobii: Tobii Studio 1.X User Manual (2008)
4. Duchowski, A.: Eye Tracking Methodology: Theory and Practice. Springer, Heidelberg (2003)
5. Salvucci, D.D., Goldberg, J.H.: Identifying Fixations and Saccades in Eye-Tracking Protocols. In: Proceedings of the Eye Tracking Research and Applications Symposium (2000)
6. Castelhano, M.S., Rayner, K.: Eye Movements During Reading, Visual Search, and Scene Perception: An Overview. In: Rayner, K., Shen, D., Bai, X., Yan, G. (eds.) Cognitive and Cultural Influences on Eye Movements, pp. 3–33. Psychology Press (2008)
7. Yarbus, A.L.: Eye Movements and Vision. Plenum Press (1967)
8. Bojko, A.: Measuring the Effects of Drug Label Design and Similarity on Pharmacists' Performance. In: Tullis, T., Albert, W. (eds.) Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics, pp. 271–280. Morgan Kaufmann, San Francisco (2008)
9. Bojko, A.: Using Eye Tracking to Compare Web Page Designs: A Case Study. Journal of Usability Studies 1, 112–120 (2006)
Toward EEG Sensing of Imagined Speech Michael D’Zmura, Siyi Deng, Tom Lappas, Samuel Thorpe, and Ramesh Srinivasan Department of Cognitive Sciences, UC Irvine, SSPB 3219, Irvine, CA 92697-5100 {mdzmura,sdeng,tlappas,sthorpe,r.srinivasan}@uci.edu
Abstract. Might EEG measured while one imagines words or sentences provide enough information for one to identify what is being thought? Analysis of EEG data from an experiment in which two syllables are spoken in imagination in one of three rhythms shows that information is present in EEG alpha, beta and theta bands. Envelopes are used to compute filters matched to a particular experimental condition; the filters' action on data from a particular trial lets one determine the experimental condition used for that trial with appreciably greater-than-chance performance. Informative spectral features within bands lead us to current work with EEG spectrograms. Keywords: EEG, imagined speech, covert speech, classification.
2 Methods

Four subjects participated in an experiment with six conditions determined factorially through combination of two syllables and three rhythms (see Figure 1). A single experimental session comprised 20 trials for each of the six conditions; conditions were presented in block-randomized order. Each subject participated in six such sessions for a total of 120 trials per condition. EEG was recorded using a 128-channel Sensor Net (Electrical Geodesics) in combination with an amplifier and acquisition software (Advanced Neuro Technology). The EEG was sampled at 1024 Hz and average-referenced on-line. Subjects were instructed to keep their eyes open in the dimly-lit recording room and to avoid eye and other movements during the six seconds following the cue, during which speech was imagined without any vocalization whatsoever.
Fig. 1. Timelines for the six conditions, labeled at left, in the imagined speech experiment. Durations are indicated in eighths of a second. The syllable /ba/ was imagined in one of three different rhythms in conditions 1-3; the syllable /ku/ was imagined in conditions 4-6. The condition for each trial was cued during an initial period of duration 4.5 sec (12/8 sec = 1.5 sec; 3 x 1.5 sec = 4.5 sec). During this initial period, subjects heard through Stax electrostatic earphones either a spoken "ba" or a spoken "ku" followed by a train of clicks (arrows) indicating the rhythm to be reproduced. Subjects were instructed to speak in their imagination the cued syllable, illustrated for condition 1 by {ba}, using the cued rhythm and tempo. Desired imagined syllable onset times reproduce the cued rhythm and are indicated by asterisks.
3 Analysis and Results

Our aims were to classify single trials offline according to condition and to discern the condition signatures needed to create online filters. EEG waveform envelopes provide encouraging results when used to classify trials according to experimental condition.
These envelopes were computed for each electrode in the theta (3-8 Hz), alpha (8-13 Hz) and beta (13-18 Hz) frequency bands and used to construct matched filters. These filters work well to classify trials according to condition, yet further results with spectra lead us to explore a finer-grained analysis using spectrograms.

3.1 Matched-Filter Classification Using Envelopes

We used waveform envelopes to compute matched filters, one for each of the six conditions (see Figure 2). The inner product of each matched filter with a particular trial's envelope provides six numbers. The matched filter which gives rise to the largest inner product is the best match, and the condition to which that matched filter corresponds is declared the best guess as to the experimental condition for that trial. Offline preprocessing steps include:

• segment the EEG data to provide time-varying waveforms for each condition, electrode and trial;
• remove from further consideration the 18 electrodes most sensitive to electromyographic artifact: those with the lowest positions about the head, spanning locations close to the eyes, low on the temple, at or below the ear, and at or below the external occipital protuberance;
• remove the mean and linear trend from each segmented waveform;
• low-pass filter the detrended, segmented waveforms to remove 60 Hz line noise;
• use thresholds to identify and remove from further consideration filtered waveforms likely contaminated with electromyographic artifact.

Alpha-, beta- and theta-band activity were computed for each electrode and trial using band-pass elliptic filters. These band-pass filtered waveforms (we[t]; bottom right plot in Fig. 2) were Hilbert-transformed to provide the corresponding envelopes (ve[t]; middle right plot). The envelopes serve both as input to the matched-filter classification and as the data used to construct the matched filters. An electrode's average envelopes, found by averaging across trials for each of the six conditions, serve as matched filters after one further step: for each electrode, the six conditions' average envelopes are pseudoinverted to provide six filters (Fe,c[t]; top right plot). This inversion yields filters which return an inner product of one for the corresponding condition's average envelope and an inner product of zero for all other conditions' average envelopes. Information may be integrated across electrodes by summing the measures pe,c across electrodes and determining the maximum of the resulting six numbers. Information may be integrated across bands by weighting the individual response measures for the three bands by band reliability estimates or by a voting procedure.

Classification results shown in Table 1 indicate that the beta band (13-18 Hz) is, for all but one subject, the most informative frequency band; theta (3-8 Hz) and alpha (8-13 Hz) are comparable for all subjects but S1. The per-condition classification performance is shown for subject S2 in Fig. 3. Rows refer to a trial's actual condition, while columns refer to the condition of the matched filter for which the maximal response was obtained. Darker values indicate higher percentages of trials. These dark values describe diagonals, which indicates a match between a trial's actual condition and the condition whose matched filter provides the greatest response.
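A minimal sketch of the envelope and matched-filter computations, assuming data sampled at 1024 Hz; the filter order and ripple values are illustrative, and the classification step is the argmax over the six inner products described above.

```python
import numpy as np
from scipy.signal import ellip, filtfilt, hilbert

def band_envelope(x, fs=1024, lo=13.0, hi=18.0):
    """Band-pass one electrode's waveform w_e[t] with an elliptic filter
    (defaults: beta band) and return its Hilbert envelope v_e[t].
    The filter order and ripple settings here are illustrative."""
    b, a = ellip(4, 0.5, 40.0, [lo, hi], btype="band", fs=fs)
    w = filtfilt(b, a, x)            # zero-phase band-limited waveform
    return np.abs(hilbert(w))        # instantaneous amplitude envelope

def matched_filters(avg_envelopes):
    """Pseudoinvert the 6 x T matrix of per-condition average envelopes
    so that filter c returns an inner product of 1 for its own
    condition's average envelope and 0 for the other five."""
    return np.linalg.pinv(avg_envelopes)   # T x 6; column c is F_{e,c}[t]

# Classifying one trial at one electrode:
#   scores = band_envelope(trial) @ matched_filters(avg_envelopes)
#   predicted_condition = int(np.argmax(scores))
```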
Fig. 2. Preprocessed and band-pass-filtered waveform we[t], recorded by electrode e, is Hilbert-transformed to provide envelope ve[t]. Shown at bottom right is the preprocessed waveform from an electrode with mid-parietal location from subject S1 for a trial during which {ba} is spoken in imagination at times 1.5, 3.0 and 4.5 sec (condition 1). Shown at middle right is the corresponding envelope. The inner product <,> of this envelope with each matched filter Fe,c[t], one filter per condition per electrode, provides six numbers pe,c. These six numbers measure how well the particular trial's envelope matches the filter for each condition. The maximum of these six numbers is used to determine the most likely condition c̃e. Shown at top right is a matched filter for condition c = 1.
The distributions of classification performance across the scalp are similar across the four subjects; those for subject S4 are shown in Figure 4. The most informative electrodes lie largely near the top of the head (vertex) where electromyographic artifacts have their least influence. These distributions of information differ significantly from the distributions of envelope amplitude, which follow well-known patterns for alpha activity (parietal) and theta activity (frontal).
Table 1. Classification performance using matched filters for envelopes in three frequency bands. The fraction of correctly-classified trials (720 trials per subject, identified in the left column) is indicated. The chance performance level in this classification among six conditions is 1/6 (0.17).

        alpha   beta   theta
S1      0.38    0.80   0.63
S2      0.63    0.87   0.59
S3      0.44    0.68   0.46
S4      0.64    0.62   0.59
Fig. 3. Classification matrices for subject S2. Values along the diagonals indicate correct classification, while non-white values off the diagonal indicate errors. The black gray-value in the middle panel (beta-band) for actual condition 1 and matched filter condition 1 (top left square) represents 91% of the trials; lighter shades indicate smaller values (through white indicating 0%). Perfect performance would be indicated by all off-diagonal entries set to white.
Fig. 4. Distributions of classification performance across the scalp for alpha (α), beta (β) and theta (θ) bands for subject S4. Darker values indicate the positions of electrodes providing better classification performance.
3.2 Spectral Features
Analysis of trials’ power spectral densities shows that differences in power within single frequency bands can provide information concerning trial condition. Spectral
differences within a single band are completely invisible to the previous analysis, which grouped together activity at all frequencies within a single band. Data for subjects S1 and S2 show that, for condition 3, there is a peak in the power spectrum in the range 5-6Hz, relative to baseline power measured at 3.5-4.5Hz. This peak is largely absent for condition 1. This difference between the average power spectra for conditions 3 and 1 is localized to electrodes with front central locations on the scalp (see Fig. 5, leftmost panels). This spectral difference alone, within the theta band, provides ~75% correct two-way classification between these two conditions. The peak and the localized spatial distribution do not exist for subjects S3 and S4 (see Fig. 5, rightmost panels).
Fig. 5. Difference between condition 3 and condition 1 theta sub-band spectral power localized primarily to electrodes with front, center locations for subjects S1 and S2; this pattern is not evident for S3 and S4
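One simple way to express such a sub-band feature, assuming a Welch estimate of the power spectral density (the estimator and its window length are our assumptions, not necessarily the analysis actually used):

```python
import numpy as np
from scipy.signal import welch

def theta_peak_ratio(x, fs=1024):
    """Ratio of 5-6 Hz power to the 3.5-4.5 Hz baseline for one
    electrode's trial waveform; values well above 1 mark the
    condition-3 peak described above for subjects S1 and S2."""
    f, pxx = welch(x, fs=fs, nperseg=2 * fs)    # 0.5 Hz resolution
    peak = pxx[(f >= 5.0) & (f <= 6.0)].mean()
    base = pxx[(f >= 3.5) & (f <= 4.5)].mean()
    return peak / base
```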
3.3 Spectrographic Analysis
The spectral data suggest several things. The first is that one would do well to represent EEG data spectrographically: record power as a fine-grained function of both frequency and time. The second and third are more cautionary. There is a tremendous amount of trial-by-trial variability in EEG recordings. Adding channels, like further frequency bands of narrow range as in spectrograms, may have the effect of masking signals more effectively. Furthermore, fine subdivision of the frequency domain may reveal individual differences (as in Fig. 5), so that averages across subjects become less meaningful. These caveats notwithstanding, our current work focuses on spectrographic means of identifying imagined speech.
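A spectrographic representation of the sort described here can be computed directly; the window length and overlap below are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 1024
x = np.random.randn(6 * fs)          # placeholder for one 6 s trial
# Power as a fine-grained function of both frequency and time:
f, t, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=448)
# Sxx[i, j] is the power near frequency f[i] (2 Hz resolution) at time t[j]
```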
4 Discussion

The syllables /ba/ and /ku/ have little semantic content in and of themselves, so that differences in the EEG records underlying classification performance are unlikely to reflect semantic contributions to imagined speech production. Furthermore, each experimental trial starts with a period during which the cue to the condition is presented; cue recognition and response planning occur during the cue period and are likely absent during the following period, in which the EEG recording of imagined speech is made. The EEG recordings analyzed here most likely concern immediate aspects of imagined speech production based on readout from working memory.
EEG electrodes record not only signals from the brain's cortex but also electromyographic activity. Common sources of the latter are muscles responsible for eye and head movements and other muscles near recording electrodes, like those in the temples and neck. Electrodes atop or close to such muscles are most sensitive to electromyographic activity. The most flagrant such offenders were excluded from analysis in the work reported here. Furthermore, recordings for which EEG signals (detrended and filtered to exclude 60 Hz line noise) exceeded 30 μV in absolute value were excluded from consideration. This threshold eliminated many (but not all) recordings contaminated with artifact. The most informative electrodes are those positioned on the scalp near the top of the head (Fig. 4), and this suggests that the information is due more to cortical than to electromyographic activity.

Subjects were instructed to think the imagined speech without any vocal or subvocal activity: without moving any muscles involved in producing overt speech. This contrasts with what is the most successful method yet for recognizing speech produced without any acoustic signal: surface electromyographic (SEMG) recordings from face and throat muscles. Substantial progress has been made in automatic speech recognition from surface EMG recordings of vocal tract articulators made during both spoken speech [5-10] and subvocalized speech [11-17].

Further non-acoustic methods for extracting imagined speech information potentially include magnetoencephalography (MEG) and invasive methods, like electrocorticography (ECoG), which involve implanted electrodes. While there are no published positive results yet for classification of imagined speech using MEG, its high temporal resolution and moderate spatial resolution (comparable to high-density EEG) suggest its value, as does success in classifying heard speech [18-21]. MEG is most sensitive to cortical signals from surfaces perpendicular to the scalp (in sulci), while EEG is most sensitive to signals from parallel surfaces (gyri) [22], so that the two methods are complementary. While the use of electrodes implanted in the brain is limited to small clinical populations for which these electrodes are indicated in combination with neurosurgery, it can provide very useful information concerning imagined speech [23]. Functional magnetic resonance imaging (fMRI) studies have contributed significantly to our neuroscientific knowledge of language perception and production [24-27]. Yet the poor temporal resolution of fMRI and of functional near-infrared imaging (fNIR) makes these unlikely candidates for speech recognition.

EEG is a non-invasive, non-injurious method for probing cortical activity which has high temporal resolution, moderate spatial resolution (~2 cm), relatively low cost, and increasing portability. We feel that its greatest contribution as a brain-computer interface is likely to be found in circumstances where a human subject is trained to produce activity discernible by EEG [28]. Such training can reduce the variability associated with imagined speech production and also help one emphasize aspects of imagined speech which are most easily discerned by EEG. Our next step is to close the loop by providing real-time feedback concerning imagined speech signals.
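The artifact screening described at the start of this section (detrending, 60 Hz line-noise removal, and the 30 μV rejection threshold) might be sketched as follows; the notch-filter design parameters are assumptions, not the authors' settings.

import numpy as np
from scipy.signal import detrend, iirnotch, filtfilt

def passes_artifact_screen(x, fs, limit_uv=30.0):
    """x: one-channel EEG in microvolts; returns False if contaminated."""
    x = detrend(x)                            # remove linear trend
    b, a = iirnotch(w0=60.0, Q=30.0, fs=fs)   # 60 Hz line-noise notch
    x = filtfilt(b, a, x)                     # zero-phase filtering
    return np.max(np.abs(x)) <= limit_uv      # 30 uV absolute-value test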
Acknowledgements We thank David Poeppel for suggesting the value of an experiment like this. This work was supported by ARO 54228-LS-MUR.
References

1. Dewan, E.M.: Occipital alpha rhythm, eye position and lens accommodation. Nature 214, 975–977 (1967)
2. Farwell, L.A., Donchin, E.: Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalography and Clinical Neurophysiology 70, 510–523 (1988)
3. Suppes, P., Lu, Z.-L., Han, B.: Brain wave recognition of words. Proceedings of the National Academy of Sciences USA 94, 14965–14969 (1997)
4. Suppes, P., Han, B., Lu, Z.-L.: Brain wave recognition of sentences. Proceedings of the National Academy of Sciences USA 95, 15861–15866 (1998)
5. Morse, M.S., O'Brien, E.M.: Research summary of a scheme to ascertain the availability of speech information in the myoelectric signals of neck and head muscles using surface electrodes. Computers in Biology and Medicine 16, 399–410 (1986)
6. Chan, A.D.C., Englehart, K., Hudgins, B., Lovely, D.F.: Hidden Markov model classification of myoelectric signals in speech. IEEE Engineering in Medicine and Biology 21, 143–146 (2002a)
7. Chan, A.D.C., Englehart, K., Hudgins, B., Lovely, D.F.: Multiexpert automatic speech recognition using acoustic and myoelectric signals. IEEE Transactions on Biomedical Engineering 53, 676–685 (2002b)
8. Bu, N., Tsuji, T., Arita, J., Ohga, M.: Phoneme classification for speech synthesizer using differential EMG signals between muscles. In: Proceedings of the 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, pp. 5962–5966 (2005)
9. Jou, S.-C., Maier-Hein, L., Schultz, T., Waibel, A.: Articulatory feature classification using surface electromyography. In: Acoustics, Speech and Signal Processing, ICASSP 2006 Proceedings, pp. I-605–I-608 (2006)
10. Jou, S.-C., Schultz, T., Walliczek, M., Kraft, F., Waibel, A.: Towards continuous speech recognition using surface electromyography. In: Interspeech 2006 – ICSLP, pp. 573–576 (2006)
11. Jorgensen, C., Lee, D.D., Agabon, S.: Sub-auditory speech recognition based on EMG signals. In: Proceedings of the International Joint Conference on Neural Networks, Portland, Oregon (July 2003)
12. Betts, B.J., Jorgensen, C.: Small vocabulary recognition using surface electromyography in an acoustically harsh environment. NASA/TM-2005-213471 (2005)
13. Maier-Hein, L.: Speech recognition using surface electromyography. Diplomarbeit, Universität Karlsruhe (2005)
14. Maier-Hein, L., Metze, F., Schultz, T., Waibel, A.: Session independent non-audible speech recognition using surface electromyography. In: 2005 IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 331–336 (2005)
15. Jorgensen, C., Binsted, K.: Web browser control using EMG based sub vocal speech recognition. In: Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS 2005), pp. 294–301 (2005)
16. Binsted, K., Jorgensen, C.: Sub-auditory speech recognition. In: HICSS (2006)
17. Walliczek, M., Kraft, F., Jou, S.-C., Schultz, T., Waibel, A.: Sub-word unit based non-audible speech recognition using surface electromyography. In: Interspeech 2006 – ICSLP, pp. 1487–1490 (2006)
18. Numminen, J., Curio, G.: Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neuroscience Letters 272(1), 29–32 (1999)
19. Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., Merzenich, M.M.: Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proceedings of the National Academy of Sciences 98, 13367–13372 (2001)
20. Houde, J.F., Nagarajan, S.S., Sekihara, K., Merzenich, M.M.: Modulation of the auditory cortex during speech: an MEG study. Journal of Cognitive Neuroscience 15, 1125–1138 (2002)
21. Luo, H., Poeppel, D.: Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54, 1001–1010 (2007)
22. Nunez, P.L., Srinivasan, R.: Electric Fields of the Brain: The Neurophysics of EEG, 2nd edn. Oxford University Press, New York (2006)
23. Barras, C.: Brain implant helps stroke victim speak again. New Scientist (July 2008)
24. Indefrey, P., Levelt, W.J.M.: The spatial and temporal signatures of word production components. Cognition 92, 101–144 (2004)
25. Hickok, G., Poeppel, D.: Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition 92, 67–99 (2004)
26. Hickok, G., Poeppel, D.: The cortical organization of speech processing. Nature Reviews Neuroscience 8, 393–402 (2007)
27. Poeppel, D., Idsardi, W.M., van Wassenhove, V.: Speech perception at the interface of neurobiology and linguistics. Philosophical Transactions of the Royal Society B 363, 1071–1086 (2007)
28. Fetz, E.: Volitional control of neural activity: implications for brain-computer interfaces. Journal of Physiology 579, 571–579 (2007)
Monitoring and Processing of the Pupil Diameter Signal for Affective Assessment of a Computer User
Ying Gao, Armando Barreto, and Malek Adjouadi
Department of Electrical and Computer Engineering, Florida International University, Miami, FL 33174, USA
{ygao002,barretoa,adjouadi}@fiu.edu
Abstract. The pupil diameter (PD) has been found to respond to cognitive and emotional processes. However, the pupillary light reflex (PLR) is known to be the dominant factor in determining pupil size. In this paper, we attempt to minimize the PLR-driven component in the measured PD signal through an Adaptive Interference Canceller (AIC) with the H∞ time-varying (HITV) adaptive algorithm, so that the output of the AIC, the Modified Pupil Diameter (MPD), can be used as an indication of the pupillary affective response (PAR) after some post-processing. The results of this study confirm that the AIC with the HITV adaptive algorithm is able to minimize the PD changes caused by the PLR to an acceptable level, facilitating the affective assessment of a computer user through the resulting MPD signal.

Keywords: Pupil diameter (PD), Pupillary light reflex (PLR), Pupillary affective response (PAR), Adaptive Interference Canceller (AIC), H∞ time-varying (HITV) adaptive algorithm.
psychological factors controlling pupil size, such as emotional processes, have recently been investigated. For example, in 2003, Partala and Surakka found, using auditory emotional stimulation, that pupil size variation can also be seen as an indication of affective processing during human-computer interaction [3]. Therefore, in this paper, we focus our study on monitoring and processing the pupil diameter (PD) signal for the affective assessment of a computer user. To achieve the required separation of the PLR-driven component from PD changes due to Pupillary Affective Responses (PAR), we propose an Adaptive Interference Canceller (AIC), which is known to be able to remove an unwanted interference component z(k) that pollutes a measured signal s(k), using an independent measurement of the interference n(k) [4] (see Figure 1). The core portion of this AIC system is an Adaptive Transversal Filter (ATF), which performs the adaptive algorithm that implements the interference-cancelling function. To ensure successful noise removal by the AIC, it is important to select an adaptive algorithm that possesses good robustness properties. The H∞ adaptive algorithm, introduced in robust control theory, is an attempt at addressing this need, with features that safeguard against the worst case of model uncertainties and make no assumption about the (statistical) nature of the signals [5].
Thus, we propose to use an H∞-based adaptive technique, namely the H∞ time-varying (HITV) algorithm, in the Adaptive Interference Canceller for the removal of the PLR-driven component from the pupil diameter variation. Our intent is to use the output of the AIC, the Modified Pupil Diameter (MPD), further refined by additional processing (median evaluation over a sliding window), to indicate the occurrence of affective changes (e.g., the onset of stress) in the computer user.
2 Methods

2.1 AIC Overview

A basic AIC block diagram is illustrated in Figure 1, above. The signal of interest, polluted with an uncorrelated noise signal, is transmitted over the top channel and constitutes the primary input to the AIC. The bottom channel receives a signal which is uncorrelated with the signal of interest but correlated with the interference, constituting the reference input. The output of the AIC is expected to provide a signal from which the correlated interference has been removed. As described in the previous paragraph, the key equations describing the AIC are:

Primary input:    d(k) = s(k) + z(k)    (1)
Reference input:  r(k) = n(k) + u(k)    (2)
Output:           e(k) = d(k) − y(k) = d(k) − ẑ(k) = ŝ(k)    (3)
where:
• s(k) is the signal of interest (PD driven by PAR);
• z(k) is the interference in the primary sensor (PD driven by PLR);
• n(k) is the actual source of the interference (illumination changes);
• u(k) is the measurement noise (in this study, u(k) is assumed to be zero).
In the AIC system, the core element is an adaptive transversal filter (ATF), in which the reference signal r(k) is processed to produce an output y(k) = ẑ(k) that is an approximation of z(k). The state-space model of the ATF is given by [5]:

w(k+1) = w(k) + Δw(k)    (4)
d(k) = rᵀ(k) w(k) + v(k)    (5)
z(k) = rᵀ(k) w(k) + υ(k)    (6)
v(k) = s(k) + υ(k)    (7)

In these equations:
• w(k), the system state vector, is the ATF coefficient vector of size m × 1 (m is the order of the ATF);
• d(k), the measurement sequence, is the observed pupil diameter (PD) signal;
• z(k) is the sequence to be estimated;
• Δw(k), the process noise vector, represents the time variation of the ATF weights w(k);
• v(k), the measurement noise vector, includes s(k) (PD driven by affective changes) and the model uncertainties υ(k);
• r(k) = [r(k), r(k−1), ..., r(k−m+1)]ᵀ is the interference vector of size m × 1.
As shown in Figure 1, for this study we define the primary input of the AIC as the recorded pupil diameter signal, which is composed of the signal of interest s(k) (PD driven by PAR) and the interference z(k) (PD driven by PLR). Although the reference input comprises both the actual source of the interference n(k) (actual illumination changes) and the measurement noise u(k), under the assumption that the measurement noise is negligible, an independent measurement of illumination in the neighborhood of the eye of the subject (IL) is used as the reference input.
We expect the adaptive transversal filter (ATF) to emulate the transformation of the illumination variations into pupil diameter changes, which would convert the noise n(k) into a close-enough replication of the PLR-driven components of PD (the output y(k)). Therefore, the error e(k) (the Modified Pupil Diameter) would be the estimate of the desired signal s(k), i.e., the PD variations due exclusively to affective changes (PAR).

2.2 H∞ Time-Varying Adaptive Algorithm
In this study, the adaptive algorithm we applied in the Adaptive Interference Canceller system is the H∞ time-varying (HITV) adaptive algorithm, which aims to remove the noise from the recorded signal by adaptively adjusting the impulse response of the ATF. The robustness of this HITV algorithm is derived from its minimization of the maximum energy gain from the disturbances to the estimation errors, with the following solutions [6]:

P̃⁻¹(k) = P⁻¹(k) − εg⁻² r(k) rᵀ(k)
Here, g(k) is the gain factor; εg, η and ρ are positive constants. Note that ρ reflects a priori knowledge of how rapidly the state vector w(k) varies with time, and η reflects a priori knowledge of how reliable the initial estimate of the state vector w(0) is.
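Since only part of the HITV update survives in the text above, the following sketch substitutes a standard normalized-LMS (NLMS) adaptive transversal filter to illustrate the canceller's structure; it is not the authors' H∞ algorithm, and the step size and regularization constant are arbitrary assumptions.

import numpy as np

def adaptive_interference_canceller(d, r, m=120, mu=0.1, eps=1e-6):
    """d: primary input (measured PD); r: reference input (illumination).
    Returns e, the interference-cancelled output (the MPD estimate)."""
    w = np.zeros(m)               # ATF coefficient vector w(k)
    e = np.zeros(len(d))
    for k in range(m, len(d)):
        rk = r[k - m:k][::-1]     # reference vector r(k), most recent first
        y = w @ rk                # y(k) = estimate of the interference z(k)
        e[k] = d[k] - y           # e(k) = d(k) - y(k), approximates s(k)
        w += mu * e[k] * rk / (rk @ rk + eps)  # NLMS weight update
    return e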
3 Experiments

3.1 Subjects

We have collected data from twenty-two volunteer students who completed the protocol described below. All the subjects reported having normal color vision and had experience using computers. This paper presents results achieved by applying the proposed processing methods to a subset of the subject pool recorded to date.

3.2 Task
In order to observe the response of pupil diameter changes to affective stimuli, the "Stroop Color-Word Interference Test", implemented as a Flash program [7], was used to elicit mild mental stress in the participating subjects during the experiment.
Fig. 2. Sample of Stroop Test Interface
Fig. 3. Stimuli schedule of the Stroop Test
In the test, a word presented to the subject designates a color that may or may not match the font color of the word. The subjects are instructed automatically by the program to select (out of 5 possible choices) the screen button that indicates the font color of the word presented, by clicking on it (an example showing the appearance of the test interface is given in Figure 2). The stimuli schedule of the test in the experiment is shown in Figure 3. The complete protocol is composed of three consecutive sections, each containing four segments:

• 'IS' – the Introductory Segment, to let the subject get used to the task environment;
• 'C' – the Congruent segment of the Stroop Test, in which the subject was asked to click the on-screen button naming the font color of a word that correctly spelled the font color being displayed;
• 'IC' – the Incongruent segment of the Stroop Test, in which the subject was asked to click the on-screen button naming the font color of a word that spelled the name of a different color;
• 'RS' – a Resting Segment, to let the subject relax for some time.
The incongruent Stroop segments (IC) were expected to elicit mild mental stress in the subject, according to previous research in the psychophysiological literature [8]. In contrast, the congruent Stroop segments (C) were expected to allow the subject to continue in a relaxed state. The binary numbers shown in Figure 3 represent the demultiplexed output of the stimulus generator, which is used to insert the corresponding values (1, 2, 3) into the event channel of the PD data file, and the corresponding time marks into the illumination measurement data, recorded simultaneously.

3.3 Instruments
In this study, a desk-mounted eye tracking system (TOBII T60) was used to measure the pupil diameter signals from both eyes of the subjects at 60 samples/sec. The average ((L+R)/2) was recorded as the "Measured PD" signal, which corresponds to d(k) in Figure 1. Simultaneously, the illumination intensity level present in the area around the eyes of the subjects was recorded by a system built for that purpose. This system is composed of a BS500B0F photo-diode (Sharp), placed on the forehead of the subject and connected to an amplification circuit that provides an analog output voltage proportional (~0.0043 V/lux) to the illumination intensity level [9]. The "Measured IL" signal, r(k), shown in Figure 1, was finally obtained by sampling the analog
output of the luminance meter at a frequency of 360 Hz with a Data Acquisition (DAQ) system (PCI-DAS6023 board from Measurement Computing Co.).

3.4 Procedure
In our experiments, participants were asked to remain seated in front of the TOBII screen, interacting with the "Stroop Test" for about 30 minutes, while wearing a headband with the photo-diode. During that time, all the normal lights in the room were kept on, but an additional level of illumination provided by a desk lamp placed above the eye level of the subject was switched ON and OFF alternately, at intervals not previously known by the subject, using a dimmer. This was done to repeatedly introduce passages of high and low illumination into the experiment, which would trigger the pupillary light reflex.
4 Results

Before its application to the adaptive interference canceller, the recorded pupil diameter signal is pre-processed by a blink-removal algorithm implemented in MATLAB, which is able to:

1. Detect the PD data interruptions due to eye blinks (identified as a value of "4" in the validity code provided by the TOBII system);
2. Compensate for the missing data by linear interpolation;
3. Filter out the blink responses through a low-pass, 512th-order FIR filter designed for a cutoff frequency of 0.13 Hz.

Figure 4 illustrates the stages of the blink-removal process on data collected from subject 13.
Fig. 4. PD signal before and after blink removal. (Panels, top to bottom: the measured PD signal; the validity codes from both eyes; the PD signal after blink compensation; and the PD signal after the low-pass filter, with segments C1, IC1, C2, IC2, C3 and IC3 marked.)
The PD signal obtained after blink removal is applied to the AIC system as the primary input signal d(k), and the reference input r(k) is the simultaneously measured illumination intensity level signal, which is down-sampled from 360 Hz to 60 Hz to share the same sampling rate as the PD signal. A MATLAB program was created to apply the HITV adaptive algorithm for the ATF with 120 weights and the parameter settings η = 0.001 and εg = 2.0, as well as a time-varying parameter ρ, changed according to the IL value to give the AIC system a quicker response when there is a sudden increase in IL. The output of the AIC, the MPD, is shown in the bottom plot of Figure 5. This signal is further processed as described below to become a useful indicator of pupillary affective response in the subject.
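The down-sampling step alone might look like this minimal sketch; the variable names are hypothetical and the use of an anti-aliasing decimator is our assumption.

import numpy as np
from scipy.signal import decimate

il_360hz = np.random.rand(360 * 60)  # stand-in: 60 s of IL at 360 Hz
il_60hz = decimate(il_360hz, q=6)    # anti-aliased downsampling to 60 Hz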
Fig. 5. AIC implementation with the HITV algorithm. (Panels, top to bottom: the AIC primary input, the PD signal; the AIC reference input, the IL signal; the AIC time-varying parameter ρ; and the AIC output, the MPD signal, with segments C1, IC1, C2, IC2, C3 and IC3 marked.)
The affective state of stress is expected to cause a dilation of the pupil [2]. Therefore, the negative portions of the MPD signal are zeroed to isolate significant MPD increases, which indicate the emergence of stress in the subject. The result of applying this non-negative restriction to the MPD signal is shown in the bottom panel of Figure 6.
A sliding window with a width of 1200 samples is applied throughout the non-negative MPD signal to calculate the median value within each window (a sketch of this post-processing is given after the figure captions below). The effect of this process on both the original PD signal and the non-negative MPD signal is compared in Figure 7. In this figure, it is clear that the significant increases isolated in the processed MPD signal correlate closely with the occurrence of IC segments, regardless of the presence of higher-illumination passages during segments IC2 and C3. It should also be noted that the same post-processing applied to the PD signal obtained directly from the eye gaze tracking system does not set apart the IC segments as clearly.
Fig. 6. MPD signal non-negative processing for stress indication. (Panels: the AIC output, the MPD signal; and the non-negative MPD signal, with segments C1, IC1, C2, IC2, C3 and IC3 marked.)

Fig. 7. Comparison of the sliding-window median analysis on PD and MPD. (Panels: the sliding-window median analysis of PD; and the sliding-window median analysis of MPD, with segments C1, IC1, C2, IC2 and C3 marked.)
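A sketch of the post-processing just described, i.e., the non-negative restriction followed by a 1200-sample sliding-window median (20 s at 60 samples/s); the edge handling at the ends of the record is an assumption.

import numpy as np

def stress_indicator(mpd, window=1200):
    nn = np.maximum(mpd, 0.0)                  # non-negative restriction
    out = np.empty_like(nn)
    half = window // 2
    for k in range(len(nn)):
        lo, hi = max(0, k - half), min(len(nn), k + half)
        out[k] = np.median(nn[lo:hi])          # sliding-window median
    return out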
Fig. 8. Signal processing results of Subject 11. (Traces, top to bottom: GSR, MPD, non-negative MPD, and the sliding-window median of MPD, with segments C1, IC1, C2, IC2, C3 and IC3 marked.)
A similar result is observed for the data collected from Subject 11, as shown in Figure 8, where the three top signals (GSR, which was recorded simultaneously, MPD and non-negative MPD) have been shifted up to make the display clear. In this case, also, the most significant increases of the bottom trace, after the initial adaptation that precedes segment C1, occur during the incongruent segments (IC1, IC2 and IC3). The figure shows the appearance of Skin Conductance Responses (SCRs) and an overall elevation of the GSR signal during the incongruent segments. Furthermore, the bottom plot (the result of the proposed post-processing) also shows significant increases during IC1, IC2 and IC3, with minimal output in the other segments after the initial adaptation.
5 Conclusions

This study implemented an H∞ time-varying adaptive algorithm in an Adaptive Interference Canceller to discount the influence of the Pupillary Light Reflex from a measured PD signal, so that the output of the AIC, the MPD, with the application of a non-negative constraint and a sliding-window median analysis, can be used as an indicator of Pupillary Affective Responses due to, for example, subject stress. This indicates that the approach might be useful for the affective assessment of a computer user, even in the presence of illumination changes. A comparison of this result with the outcome obtained by applying the same post-processing directly to the recorded pupil diameter signal points out the advantage of the adaptive implementation, whose output showed relatively more distinctive increases when the incongruent Stroop segments occurred. Data from other subjects who participated in our experiment have revealed similar results. These outcomes, therefore, encourage the continued exploration of adaptive processing algorithms applied to pupil diameter signals as a non-invasive mechanism to achieve affective assessment of computer users in ordinary environments.

Acknowledgements. This work was sponsored by NSF grants CNS-0520811, HRD-0833093 and CNS-0426125. Ms. Ying Gao is the recipient of a Dissertation Year Fellowship from Florida International University.
References

1. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
2. Beatty, J., Lucero-Wagoner, B.: The Pupillary System. In: Cacioppo, Tassinary, Berntson (eds.) Handbook of Psychophysiology, 2nd edn., pp. 142–162. Cambridge University Press, Cambridge (2000)
3. Partala, T., Surakka, V.: Pupil size variation as an indication of affective processing. International Journal of Human-Computer Studies 59, 185–198 (2003)
4. Widrow, B., Stearns, S.D.: Adaptive Signal Processing, pp. 337–339. Prentice-Hall, Englewood Cliffs (1985)
5. Hassibi, B., Kailath, T.: H∞ adaptive filtering. In: Proc. Int. Conf. Acoustics, Speech, Signal Processing, pp. 945–952 (1995)
6. Puthusserypady, S.: H∞ adaptive filters for eye blink artifact minimization from electroencephalogram. IEEE Signal Processing Letters 12, 816–819 (2005)
7. Zhai, J., Barreto, A.: Significance of Pupil Diameter Measurements for the Assessment of Affective State in Computer Users. Biomedical Sciences Instrumentation 42, 495–500 (2006)
8. Renaud, P., Blondin, J.-P.: The stress of Stroop performance: physiological and emotional responses to color-word interference, task pacing, and pacing speed. International Journal of Psychophysiology 27, 87–97 (1997)
9. Todoroki, A., Hana, N.: Luminance change method for cure monitoring of GFRP. Key Engineering Materials 321–323, 1316–1321 (2006)
Usability Evaluation by Monitoring Physiological and Other Data Simultaneously with a Time-Resolution of Only a Few Seconds
Károly Hercegfi, Márton Pászti, Sarolta Tóvölgyi, and Lajos Izsó
Budapest University of Technology and Economics, Department of Ergonomics and Psychology, Egry J. u. 1, 1111 Budapest, Hungary
{hercegfi,tim,ebediyen,izsolajos}@erg.bme.hu
Abstract. This paper outlines the INTERFACE methodology developed by researchers of our department. It is based on the simultaneous assessment of Heart Period Variability (HPV), Skin Conductance (SC), and other data. The objective and significance of this paper are (1) showing its capability of identifying quality attributes of software elements with a time-resolution of only a few seconds and (2) presenting its practical applicability in the evaluation phase of a real software development process. The Department of Ergonomics and Psychology at the Budapest University of Technology and Economics carried out a contract-based applied research project for the Generali-Providencia Insurance Co. Ltd. The Company was in the process of further developing the software used in its customer centers, and our Department contracted to assess the user interface. Both analytical and empirical usability evaluation methods were applied. In this paper, we highlight the new experiences gained with the INTERFACE testing methodology.

Keywords: Usability testing and evaluation, empirical methods, case study, Heart Period Variability (HPV), Skin Conductance (SC).
This software is used by approximately 1500 users, 300 of whom use it daily. Most of the users work in personal customer centers; a smaller part work as call center operators. Although the latter is a smaller group, the time pressure of their job underlines the importance of usability factors. The Department of Ergonomics and Psychology at the Budapest University of Technology and Economics joined the software development process to carry out the usability assessment. The main goal of our project was to collect problems of the user interface (UI). We compiled these findings into a list that was given to the developers; developing solutions for the problems found can be a subsequent project. A further goal of the Company was to support a scientifically interesting research project, transferring the results of foundational research into a methodology genuinely applicable in a real software development process.
2 Applied Methods

The preparation of the project started in November 2007. The contract was signed in March 2008. We got in touch with the developers at the beginning of April 2008. We started our project by studying the software and collecting data and facts. While we were studying it, the system was updated, so it was sometimes difficult to stay up to date. In the second half of April we started to observe the users of the software, and we also interviewed them. We performed the observations and interviews at three locations: the firm's biggest customer center, a smaller customer center of a different type, and a call center. We obtained some logfile data to establish objective initial data for the subsequent assessment. We applied analytical methods (usability inspection methods) in May and June 2008. The backbone of the analytical evaluation was a Guideline Review, supported by Cognitive Walkthrough elements. A GOMS model-based analysis was also carried out. The main part of the assessment was a carefully planned, deep empirical series of experiments applying the INTERFACE methodology described in the following section. The series of experiments was carried out in July 2008; the analysis of the collected records was performed in August 2008.
3 Description of Our Main Methodology: The INTERFACE

Figure 1 shows the conceptual arrangement of the INTERFACE (INTegrated Evaluation and Research Facilities for Assessing Computer-users' Efficiency) workstation.
Fig. 1. Conceptual arrangement of the INTERFACE user interface testing workstation. (The data collecting and processing frame system receives the observable behavior, the current screen content, keystrokes and mouse clicks, and the physiological signals recorded by ISAX.)
The advantage of the methodology applied in our study lies in its capability of recording continuous on-line data characterizing the user's current mental effort (derived from Heart Period Variability, HPV) and the user's emotional state (indicated by Skin Conductance, SC, parameters), simultaneously and synchronized with other characteristics of Human-Computer Interaction (HCI). This way, a very detailed picture can be obtained, which serves as a reliable basis for a deeper understanding and interpretation of the psychological mechanisms underlying HCI. Elementary steps of HCI, like the different mental actions of users followed by a series of keystrokes and mouse clicks, are the basic and usually critical components of using software. These steps can be modeled and analyzed by experts, but empirical studies of real users' interactions often highlight new HCI issues or give more objective results than expert analyses. One of the key aspects of the empirical methods is measuring mental effort, as laid down, e.g., in the earlier international standard of software product evaluation (ISO/IEC 9126:1991). Hence we need methods capable of monitoring users' current mental effort during these elementary steps. To attain the above, a complex methodology was developed earlier at the Budapest University of Technology and Economics by Prof. Lajos Izsó and his team [3, 4, 5]. This study presents an improved methodology and a new case study. The INTERFACE simultaneously investigates the following:

• Users' observable actions and behavior
  − keystroke and mouse events;
  − video record of the current screen content;
  − video records of users' behavior: (1) mimics, (2) posture and gestures.
• Psycho-physiological parameters
  − the power spectrum of Heart Period Variability (HPV), regarded as an objective measure of current mental effort – we have applied this signal successfully for more than 15 years [1, 2, 3, 4];
  − Skin Conductance (SC) parameters, indicating mainly emotional reactions – recently integrated into our system.
A number of studies [1, 2, 3, 5, 7, 8, 9, 10] have shown that an increase in mental effort causes a decrease in the mid-frequency (MF) peak of the HPV power spectrum. The main advantage of the assessment method of the spectral components integrated into our system, over previously existing HPV-based methods, is that the MF component of HPV shows changes in mental effort in the time range of several seconds (as opposed to the earlier methods, with a resolution of tens of seconds at best). This feature was achieved by an appropriate windowing data-processing technique and the application of an all-pole auto-regressive model with built-in recursive Akaike's Final Prediction Error criteria and a modified Burg's algorithm. We monitor the Alternating Current (AC) component of the Skin Conductance (SC) responses, focusing mainly on the emotional aspects of the HCI, in addition to our well-tried approach to mental effort. An interesting series of experiments analyzing SC responses has been completed by one of our colleagues [6]. It is a good example of a promising way to use data-mining techniques in empirical usability studies.
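The system's HPV profile uses the all-pole autoregressive estimator with a modified Burg algorithm described above, which we do not reproduce here. As a simpler stand-in, the following sketch estimates mid-frequency HPV power with Welch's method; the MF band edges (~0.07–0.14 Hz) and the uniform resampling rate are our assumptions, not the authors' settings.

import numpy as np
from scipy.signal import welch

def mf_power(rr_ms, fs_resampled=4.0):
    """rr_ms: RR intervals in ms, resampled to a uniform rate (Hz).
    Returns the mid-frequency HPV power in ms^2."""
    f, pxx = welch(rr_ms - np.mean(rr_ms), fs=fs_resampled,
                   nperseg=min(len(rr_ms), int(60 * fs_resampled)))
    band = (f >= 0.07) & (f <= 0.14)   # assumed MF band
    return np.trapz(pxx[band], f[band])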
Fig. 2. The experimental arrangement applied during the sessions of the INTERFACE usability testing, installed on a standard workstation of the call center. (Annotated elements: the participant, a call center operator with the IP headset regularly used in the call center, seated at a standard workstation running the currently tested software, with the Skin Conductance (SC) electrodes visible on the left hand and the ECG electrodes on the torso; a motorized, zoomable camera recording the facial expressions; a camera recording the body posture; the ISAX equipment recording the physiological signals; and the computer of the experimenter, on which the online curves of the physiological signals, the video images of the cameras and the editor window for the comments can be seen during the session.)
4 Applying INTERFACE in the Current Series of Experiments

The empirical sequence of experiments applying the previously introduced INTERFACE assessment was performed with the call center operators in the middle of July 2008. Due to the impersonality caused by the phone calls, on the one hand, the simulation was more authentic than it could have been in the personal customer centers. On the other hand, because of the specifics of the call center, we could count on concentrated, quick problem solving. What is more, usability problems are the most critical in call centers due to the time pressure. The tasks arising in the call center are usually not solved by means of Genesys alone, but the other software packages were not examined in this project.
Fig. 3. The INTERFACE Viewer screen with a record of the empirical test of the Genesys software. (Annotated elements: the screen just seen by the user; two cameras showing facial expression and body posture; keyboard and mouse actions; the experimenter's comments; the upper, blue curve is the AC component of the Skin Conductance (SC), where higher deviation means a more emotional event; the remaining signals are derived from the ECG and relate to mental effort: the red RR curve in the middle shows the periods between subsequent heart beats in ms, and the last, green profile curve shows the Mid-Frequency (MF) power of the variability of the RR curve, whose low values mean significant mental effort and whose peaks mean relief and relaxation.) As can clearly be seen, the user is currently making a significant mental effort, shown by the facial expression and gesture and by the low value of the green MF profile curve at the cross-hair.
However, we had to give the users controlled, simulated tasks so that they could solve them within the Genesys system. These simulated tasks enabled us to compare the 12 sessions. We used the version of Genesys from the test server, which is substantially equal to the version actually in use. 12 real operators were involved as participants, and a one-hour-long session was recorded with each of them. The quantity of data gained from these sessions is really significant, given the depth of the enquiry. Due to the real-life situation, the users participating in the series of experiments were more or less disturbed by their colleagues' calls and talks. Since these are the employees' real conditions, they are used to them, and so they could work at typical, real workstations of their workplace. These advantages gave reason not to hold this usability testing in a laboratory. Nevertheless, we chose a workstation located in the corner of the operators' room, in order not to disturb the others. Behind this workstation is the team leader's glass wall, so our staff could sit and make simulated phone calls from behind this pane. As mentioned before, during the recorded sessions the users were given tasks from real life, with quasi-real data, names, problems, questions, etc.; the main difference was that the customers were not real customers but members of our staff. Three ECG electrodes were put on the users' torso and one electrode on the left hand (in the case of a right-handed person) for measuring Skin Conductance. After that, the users put on their headsets and adjusted their seats. Figure 2 shows the experimental arrangement applied during the sessions of the INTERFACE usability testing installed on a standard workstation of the call center. Figure 3 shows the INTERFACE Viewer screen with a record of the empirical test of the Genesys software. At the beginning of the session, we asked the users to relax for two minutes. We told them the aim and the details of the assessment. We always emphasized that we were assessing not the participants themselves but the Genesys software, with their help. Relaxation was followed by two minutes of mental effort: mental arithmetic. The result of the counting was not important; we only wanted to generate mental effort. These periods were planned for "calibrating" the physiological curves. After that, the hard work started. The first customer (from our staff) rang the phone and asked some questions. To answer these questions, the operator had to use the software under testing. Later, four more similar calls came. Each call contained 2 to 4 questions. Some of the questions were just to "warm up"; others were really difficult. The subtasks were based on the interviews, observations and expert analyses performed earlier. The last part of the INTERFACE session was an interview.
5 Validation

As mentioned, the periods of relaxation and mental arithmetic were planned for "calibrating" the physiological curves. The curves shown in the upper part of Figure 4 were recorded during session #11; the ones shown at the bottom were recorded during session #10.
Fig. 4. The typical pattern of the relaxation and the mental arithmetic in the cases of two participants. (Each record is annotated with relaxation, mental arithmetic, and relief sections.)
In both cases, the three curves are the blue curve of the AC of Skin Conductance (SC), the red RR curve (heart periods), and the green profile curve of the Mid-Frequency (MF) power of Heart Period Variability (HPV). The blue curve of the AC of SC is relatively smooth during both the relaxation and the mental arithmetic. During these sections there are no emotional peaks, and these two participants can be characterized as the "stable" type according to the typology of physiology. However, the beginnings and the ends of the sections are followed by peaks. During relaxation, the MF component of the HPV increases: the red RR curve shows zigzags, and the green profile curve is relatively high. (In the case of perfect relaxation, the profile curve should be consistently high. However, this is not expected in this experimental situation. The curve can be considered high, especially in comparison with the next section.) During mental arithmetic, the red RR curve gets smoother, and the green profile curve is significantly low. After the "calibration" tasks, the participants experience genuine relief. During this short period of relief, the participants get more relaxed than during the conscious, intended relaxation: the green curves have their highest peaks here.
These "calibration" tasks provide a validation of our method. The values of the MF power of HPV were significantly higher during relaxation than during mental arithmetic. A non-parametric statistical method, the Wilcoxon Signed Ranks Test, confirms the difference (sig. 0.037; Figure 5).
Fig. 5. Validation of measuring the Mid-Frequency (MF) power of Heart Period Variability (HPV) as an indicator of mental effort: the mean MF power of HPV [ms²] was significantly higher during relaxation than during mental arithmetic (sig. 0.037).
It is a significant difference, in spite of the imperfect relaxation. However, the mental arithmetic task works even better: the difference between the values of the MF power of HPV during mental arithmetic and those during the whole software usage section in general is more significant, with the Wilcoxon test giving sig. 0.002. The values of the deviation of the AC component of Skin Conductance (SC) do not differ significantly between the relaxation and the mental arithmetic. As described earlier, this is the expected result. However, the deviations of the AC of SC during the relaxation and the mental arithmetic are significantly lower than during the whole software usage section in general: the Wilcoxon tests give sig. 0.009 and sig. 0.017. Given these results, we can say that a low value of the curve of the MF power of HPV really means mental effort, and a high deviation of the AC of SC probably means heightened emotions. Then, in the software usage section, we look for moments with relatively high (and unwanted) mental effort and high (unwanted, and not positive) emotions. This method gives us the key to finding the problems of the UI.
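The validation test reported above might be reproduced along these lines; the twelve per-session values below are placeholders matching the number of sessions, not the study's data.

from scipy.stats import wilcoxon

# Hypothetical per-session mean MF power of HPV [ms^2].
mf_relax = [95.0, 60.2, 71.4, 55.1, 80.3, 66.7,
            73.9, 58.8, 90.1, 62.5, 77.0, 69.3]
mf_arith = [40.1, 35.7, 50.2, 30.9, 45.6, 38.4,
            42.0, 33.3, 48.8, 36.1, 44.5, 39.7]

# Paired, non-parametric comparison of the two "calibration" periods.
stat, p = wilcoxon(mf_relax, mf_arith)
print(f"Wilcoxon signed-rank: statistic={stat}, p={p:.3f}")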
6 A Sample UI Problem Identified by the INTERFACE Methodology

Commercial sensitivities prevent publication of most of the details of the particular software problems found. However, Figure 6 gives an illustration.
Fig. 6. The 11th participant during the first task of the second call. The mental effort can clearly be seen: it is shown by the facial expression and gesture, and by the low value of the last, green profile curve of the Mid-Frequency (MF) power of the Heart Period Variability (HPV) at the cross-hair. In this case, the problem was caused by a poor design solution for choosing a time period for a list view. This problem can also be found by analytical methods, but the INTERFACE highlighted it and gave objective evidence for it.
7 Conclusion

Based on the results presented here as well as in related papers, it can be stated that the INTERFACE methodology in its present form is capable of identifying the relative weak points of the HCI. With this methodology and the related workstation, it was possible to study events occurring during the HCI with a level of detail and objectivity that would not have been possible using other methods presently known to us. The sophisticated Heart Period Variability (HPV) profile function integrated into the INTERFACE system is a powerful tool for monitoring events in such a narrow time frame that it can practically be considered a time-continuous recording of the relevant elementary events. Measuring the Skin Conductance (SC) is a new opportunity to complement the results.

Acknowledgements. The authors would like to thank Dr. Eszter Láng for the earlier developments, the Generali-Providencia Insurance Co. Ltd. for the support of the deep research, and the participants of the series of experiments for their valuable contribution.
References

1. Chen, D., Vertegaal, R.: Using Mental Load for Managing Interruptions in Physiologically Attentive User Interfaces. In: Proc. CHI 2004, pp. 1513–1516. ACM Press, New York (2004)
2. Hercegfi, K., Kiss, O.E., Bali, K., Izsó, L.: INTERFACE: Assessment of Human-Computer Interaction by Monitoring Physiological and Other Data with a Time-Resolution of Only a Few Seconds. In: Proc. ECIS 2006, ECIS Standing Comm., pp. 2288–2299 (2006)
3. Izsó, L.: Developing Evaluation Methodologies for Human-Computer Interaction. Delft University Press, Delft (2001)
4. Izsó, L., Hercegfi, K.: HCI Group of the Department of Ergonomics and Psychology at the Budapest University of Technology and Economics. In: Ext. Abstracts CHI 2004, pp. 1077–1078. ACM Press, New York (2004)
5. Izsó, L., Láng, E.: Heart Period Variability as Mental Effort Monitor in Human Computer Interaction. Behaviour & Information Technology 19(4), 297–306 (2000)
6. Laufer, L., Németh, B.: Predicting User Action from Skin Conductance. In: Proc. IUI 2008, pp. 357–360. ACM Press, New York (2008)
7. Lin, T., Imamiya, A.: Evaluating Usability Based on Multimodal Information: An Empirical Study. In: Proc. ICMI 2006, pp. 364–371. ACM Press, New York (2006)
8. Mulder, G., Mulder-Hajonides van der Meulen, W.R.E.H.: Mental Load and the Measurement of Heart Rate Variability. Ergonomics 16, 69–83 (1973)
9. Orsilia, R., Virtanen, M., Luukkaala, T., Tarvainen, M., Karjalainen, P., Viik, J., Savinainen, M., Nygard, C.-H.: Perceived Mental Stress and Reactions in Heart Rate Variability – A Pilot Study Among Employees of an Electronics Company. International Journal of Occupational Safety and Ergonomics (JOSE) 14(3), 275–283 (2008)
10. Rowe, D.W., Sibert, J., Irwin, D.: Heart Rate Variability: Indicator of User State as an Aid to Human-Computer Interaction. In: Proc. CHI 1998, pp. 18–23. ACM Press, New York (1998)
Study of Human Anxiety on the Internet
Santosh Kumar Kalwar and Kari Heikkinen
Lappeenranta University of Technology, Department of Information Technology, Lappeenranta, Finland
[email protected], [email protected]
Abstract. In this paper a conceptualization of human anxiety on the Internet is introduced; it is built on an understanding of human behavior with regard to technology. The objective of this paper is to conceptualize human anxiety. An integral part of this understanding is an inter-disciplinary (psychological science, cognitive science, behavioral science and communication technology) literature review, of which an overall summary is presented. The understanding is conceptualized by designing, implementing and evaluating a developed user study model. The preliminary results of utilizing the developed user study found seven particular anxiety areas which need further study.

Keywords: Human, study, anxiety, internet.
skyrocketed to 541 million [2]. Facts and digital data from various sources confirm that the number of humans accessing the Internet is growing at a rapid pace. As the Internet evolves in terms of the number of humans online, it appears to be evolving into a social community. Worldwide, the Internet population is growing at a rapid pace. The number of people getting access to information, learning, and going online is booming like never before. It should be remembered that, as late as 1988, only a few countries were connected to the Internet. According to the 2004 CIA World Factbook, over 50 countries have at least one million humans using the Internet [3]. Humans once found it difficult and expensive to communicate in the times of voice telephones, but with rapid technological development communication has improved drastically. The distance to family, friends and sought-after information has been shortened, which can be seen as a replacement for very important daily interactions. Boundaries of time, distance and identity are broken, from simple applications like e-mail to the complex world of virtual communities. Together with this positive growth, negative effects are growing too. According to the U.S. Department of Justice, the Internet is an anonymous and effective way for many predators to find and groom children for illegal activities [4]. The fear of using the Internet is further amplified by social disintegration and by psychological and cognitive implications.

1.1 Objective and Scope

The main research question is: "how can we address the challenges such as Internet addiction, psychology and human computer interaction it is currently facing now?" To understand the objectives, some of the questions were researched in detail. The paper aims to find answers to the following hypotheses:
• Do users show an increased or reduced anxiety level when using the Internet?
• What kinds of behaviors are shown when using the Internet?
• What is the role of the content?
• What types of anxiety behaviors can be found?
• How do humans process information at the Internet interface?

The scope of the work includes designing, implementing and evaluating methods used in understanding the behavior of humans using Internet technology. The scope is wide, ranging from psychological perspectives to cognitive science, behavioral science and communication technology, and it is complex. However, we investigate how to study humans and their tasks, and how to relate this information to design style, human behavior theories, standards, procedures or guidelines, in order to build an appropriate model of interaction with the help of some existing methods.

1.2 Internet Anxiety
The scope of the work includes design, implement and evaluate methods used in understanding the behavior of humans using the Internet technology. It has wide range of scope from the field of psychological perspective to cognitive science, behavioral science and communication technology. The scope is complex. However, To study human and their tasks and how to relate information to design style, human behavior theories, standards, procedures or guidelines in order to build an appropriate model of interaction with the help of some existing methods is investigated. 1.2 Internet Anxiety A lot has been written in past about the negative use of the Internet anxiety, Internet addiction and full dependence on the internet is welling up [5] [6] [7]. The service disruption because of network faults, software bugs, administrator mistakes and version upgrade could seem less tolerable. Millions of human around the world use Internet to search, inform, find, communicate, work and play. Internet should not be
viewed only as negative, in terms such as addiction and pathology, nor should it be vilified. One must be aware of the negative consequences of overuse of the Internet by understanding one's own behavior and that of others. Four types of Internet anxiety were identified by Presno, C., using a qualitative study method [8]:

• Internet terminology anxiety: anxiety produced by an introduction to a host of new vocabulary words and acronyms.
• Net search anxiety: anxiety produced by searching for information in a maze-like cyberspace.
• Internet time delay anxiety: anxiety produced by busy signals, time delays, and more and more people clogging the Internet.
• General fear of Internet failure: a generalized anxiety produced by fear that one will be unable to negotiate the Internet, or complete required work on the Internet.

Three additional areas of Internet anxiety were found in the qualitative study conducted in this research:

• Experience anxiety: anxiety produced by lack of concentration or focus.
• Usage anxiety: a generalized anxiety produced by excessive usage of the Internet.
• Environment and attraction anxiety: anxiety produced by content on the Internet, for example interactive games, pornography and a large number of colorful applications.

Table 1. Types of anxiety recorded for Subject I and Subject II. (Rows: terminology, search, time delay, general fear, experience, usage, and environment & attraction anxiety; columns: participants p1–p5 for Subject I and p1–p5 for Subject II, with an x marking each anxiety type observed for a participant.)
2 Inter-disciplinary Literature Review Sociability is important as well as usability of applications in the Internet. While usability is concerned with making sure that the application, software and system is consistent, predictable, and easy and satisfying to use, sociability and the social aspect of building and maintaining an online community focuses on processes and styles of social aspect in interaction that support human behavior on the internet to some extent. Research has shown certain social groups to be under-represented on the Internet [9] [10] not simply because of a lack of access, but more because of cognitive, motivational and affective factors [11]. Psychology therefore has an important role in advancing the understanding of why humans choose to use or not to choose the usage of the Internet [12]. There will always be an argument to model psychology with technology or technology with psychology however; combining psychology with technology will give rise to new technology called psycho technology. To understand the Internet technology in broader ways, interaction between human and the technology through the Human Computer Interaction becomes essential. Brain Computer
Interface (BCI) techniques were only studied in this research work; no practical implementation was carried out due to the limitation of resources. In BCI, the skill developed by a human involves proper control of electrophysiological signals, which are easily adapted and modulated by the brain for better feedback. In understanding human anxiety, BCI techniques could be used for testing and analyzing different activities on the Internet. The ethnographic study of the Internet can be divided into two categories: user-based and content-based. User-based analysis is the investigation, examination and study of humans using the Internet, whereas content-based analysis is mainly focused on text. Humans are capable of providing reasons to support their points of view if asked a question such as "What color is my shirt?", and are capable of knowing, without explicit deduction or reasoning, answers to questions like "If I were you, I would hate myself", whereas computers cannot function without specific programming instructions.
3 Methods, Design and Implementation

The user study model used both qualitative and quantitative research methods. The qualitative research was conducted using interviews and observational analysis. The quantitative research was conducted in three iterations by using questionnaires and surveys. The questions were analyzed with the help of the different types of questionnaires constructed. Participants were intentionally given very general types of questions to answer. Two questionnaire modes were used: via the Internet, and via the pen-and-paper method. Task calculation was carried out by dividing each task into subtasks. Two types of subjects were categorized based on skill level (novice, intermediate and expert): Subject I and Subject II. A usability test recording form was used for each participant, to record both the verbal and non-verbal behaviors of humans using the Internet.
Fig. 1. The user study model implemented to test behavior of humans using the Internet
The chosen task was based on the Internet. The task contained three modules, divided into task 1, task 2 and task 3 based on the level of difficulty. It was found from the task analysis that humans using the Internet took less than a second to complete all of these tasks. The participants' overt behavior was divided into two major general categories: verbal and nonverbal. Verbal behavior includes anything a participant says, and nonverbal behaviors include the various activities that participants actually do. The nonverbal behavior shown by the participants consisted mostly of facial expressions, such as smiling, looks of surprise or furrowing of the brow, and body gestures, such as leaning close to the screen or rubbing the head; measurable nonverbal channels include facial expression, eye tracking, pupil diameter and skin conductance, among others.
4 Results and Discussion

The goal of the study was to answer the question: “How can we address challenges such as Internet addiction that psychology and human-computer interaction are currently facing?” To evaluate it, the main research question was broken down into hypotheses; five hypotheses were formulated at the beginning of the research. Let us now discuss these hypotheses to see whether our method, design, evaluation, and analysis supported each of them (fully supported, partially supported, or not supported).

H1: Do users show an increased or reduced anxiety level when using the Internet? Hypothesis 1 was fully supported. Humans showed signs of an increased level of anxiety when using the Internet; it appears that, with any given task on the Internet, anxiety in humans increases. Most participants said “yes” to five or more items from QS 2, which indicates problematic Internet usage. Using the HADS and PHQ-9, it was observed that one participant appeared to present a case of a higher depression score. Therefore, in this particular case, users showed an increased anxiety level when using the Internet.

H2: What kinds of behaviors are shown when using the Internet? Hypothesis 2 was fully supported. The literature review revealed that humans using the Internet show two types of behavior: verbal and non-verbal. During the observation of the participants' behavior, most of the time the participants were laughing, smiling, drumming their fingers on the table, and looking aimlessly around. These behavior patterns were both verbal and non-verbal and were observed among a narrowly selected group of participants.

H3: What is the role of the content? Hypothesis 3 was partially supported. The role of content could determine predicted or unpredicted human behavior on the Internet, such as addiction, anxiety, and stress from using the Internet. Since the participants were able to complete the task with ease, it could be predicted that any given task is very easy for humans to perform on the Internet. Therefore, the role of content has a principal impact on how humans behave on the Internet.
H4: What types of anxiety behaviors can be found? Hypothesis 4 was fully supported. It was found that, in this particular case, there are seven main areas of anxiety in humans: Internet terminology anxiety, Internet search anxiety, Internet time delay anxiety, general fear of Internet failure anxiety, experience anxiety, usage anxiety, and environment and attraction anxiety. Using the observation methodology and comparing the two types of subjects, Subject I and Subject II, we conclude that all the participants showed the anxieties cited above.

H5: How do humans process information at the Internet interface? Hypothesis 5 was partially supported. When a human interacts with the Internet interface, it appears that everything the human senses, such as sight, hearing, touch, smell, and taste, is processed as information in the mind. This information can result in verbal and non-verbal behavior. Even if a behavior initially disappears, it may partially return as undamaged parts of the brain reorganize their linkages. Human information processing, in its totality, includes internal cognitive processes that can result in observable behavior. A realistic way to think of information processing at the Internet interface is to picture mental processes as several railroad lines that all feed into the same terminal.

Two schools of thought have emerged that agree with the hypothesis that there are two types of behavior while using the Internet: verbal and non-verbal. In more general terms, humans can use the content available on the Internet in two different ways: positively or negatively. The gestures or types of behavior shown can lead to anxiety. Seven major types of anxiety were studied and validated: Internet terminology anxiety, Internet search anxiety, Internet time delay anxiety, general fear of Internet failure anxiety, experience anxiety, usage anxiety, and environment and attraction anxiety. Two types of behavior (verbal and non-verbal) were formulated from the relevant literature, empirical analysis, and evaluation. The review of brain-computer interfaces suggests that signals could be sent to the human brain physically to control and observe human behavior; however, BCI techniques were not used in this study, and even without them, the study found a sample of humans showing an increased level of anxiety when using the Internet. The task completion behaviors of humans were measured. By the end of this discussion, we can conclude that, to reduce Internet anxiety, addiction, and depression scores, it is important to have many multicultural experiences and control over one's own behavior in order to accumulate successful behavioral experiences.
5 Conclusions and Future Work

Taking the results and discussion to their logical conclusion, it appears, to the authors' knowledge, that the Internet has lulled humans into a sense of dependency to a great extent. Five hypothetical questions were answered in this study: Do users show an increased or reduced anxiety level when using the Internet? What kinds of behaviors are shown when using the Internet? What is the role of the content? What types of anxiety behaviors can be found? And how do humans process information at the Internet interface? Seven major types of anxiety were studied
and validated: Internet terminology anxiety, Internet search anxiety, Internet time delay anxiety, general fear of Internet failure anxiety, experience anxiety, usage anxiety, and environment and attraction anxiety. Two types of behavior (verbal and non-verbal) were formulated from the relevant literature, empirical analysis, and evaluation. The review of brain-computer interfaces suggests that signals could be sent to the human brain physically to control and observe human behavior; however, BCI techniques were not used in this study, and even without them, the study found a sample of humans showing an increased level of anxiety when using the Internet. The first passage of Sir Tony Hoare's book Communicating Sequential Processes reads, “Forget for a while about computers and computer programming, and think instead about objects in the world around us, which act and interact with us and with each other in accordance with some characteristic pattern of behavior.” The same idea is followed in this study of human anxiety on the Internet.

• A larger sample size, a different demographic structure, and the discovery of a refined user study model are needed for larger impact and generalization.
• In contrast to the several findings of negative effects in the Internet addiction, anxiety, and depression group, some positive effects could be identified in the future by building a framework for learning through imagination, investigation, and innovation.
• An in-depth analysis and comparison of the human brain and the network Open Systems Interconnection (OSI) model could be performed.

Despite the above limitations, the Internet has undoubtedly provided a collection of applications that is having a profound effect on mankind. Like the wheel, the plow, and steam power before it, it is proving to be a truly transformative tool in our world, changing the very ways in which we interact with each other. Progress is relatively easy to recognize if we follow technology exploration; a greater challenge is to find the technology with which we want to change ourselves and our civilization. Understanding the human factors in the design and development of technology, systems, and services, so as to ensure a successful application environment, is a major concern. The forms of anxiety identified suggest areas for future Internet development and research.
References

1. ISOC.org (2008), http://www.isoc.org/internet/history/brief.shtml
2. ISC.org: Millions of hosts on the internet (2008), https://www.isc.org/
3. Central Intelligence Agency: The DigiWorld in the global economy. In: DigiWorld 2008 (2008), https://www.cia.gov/library/publications/the-world-factbook/geos/xx.html; https://www.cia.gov/library/publications/the-world-factbook/fields/2184.html
4. Golden, S.M.J.: Protecting children in the internet age (2008), http://www.senate.state.ny.us/sws/Protecting%20Children%20in%20the%20Internet%20Age.pdf
5. Kraut, R.E. (2008), http://www.cs.cmu.edu/~kraut/RKraut.site.files/articles/Bessiere06-Internet-SocialResource-DepressionL.pdf
6. Chou, C.: Incidence and correlates of internet anxiety among high school teachers in Taiwan. Computers in Human Behavior 19, 731–749 (2003)
7. Skinner, B.F. (2008), http://www.bfskinner.org/aboutfoundation.html; http://www.bfskinner.org/f/Science_and_Human_Behavior.pdf
8. Presno, C.: Taking the byte out of internet anxiety: Instructional techniques that reduce computer/internet anxiety in the classroom. J. Educ. Comput. Res. 18, 147–161 (1998)
9. Jackson, L.A.: Social psychology and the digital divide. In: The 1999 Conference of the Society for Experimental Social Psychology (1999)
10. Sax, L.J., Ceja, M., Teranishi, R.T.: Technological preparedness among entering freshmen: the role of race, class and gender. Journal of Educational Computing Research 24, 363–383
11. Jackson, L.A., Ervin, K.S., Gardner, P.D., Schmitt, N.: Gender and the internet: Women communicating and men searching. Sex Roles 44(5), 363–379 (2001)
12. Jackson, L.A., Ervin, K.S., Gardner, P.D., Schmitt, N.: The racial digital divide: Motivational, affective and cognitive correlates of internet use. Journal of Applied Social Psychology 31, 2019–2046 (2001)
The Research on Adaptive Process for Emotion Recognition by Using Time-Dependent Parameters of Autonomic Nervous Response

Jonghwa Kim1, Mincheol Whang2, and Jincheol Woo1

1 Dept. of Computer Science, Sangmyung University, 7 Hongji-dong, Jongno-Gu, Seoul, Korea
{rmx2003,mcun}@naver.com
2 Dept. of Digital Media Technology, Sangmyung University, 7 Hongji-dong, Jongno-Gu, Seoul, Korea
[email protected]
Abstract. This study proposes a new method of physiological signal processing for emotion recognition, called TDP (time-dependent parameter) analysis. The TDPs consist of delay, activation, half recovery, and full recovery. The TDPs were determined from the running average and normalization of physiological signals, in order to identify the tonic and phasic responses to emotion over the entire time range from emotion stimulation to recovery. The results of this study show that TDP analysis and adaptive TDP analysis enhanced the accuracy of emotion recognition in comparison with tonic analysis. Specifically, TDP analysis enhanced the accuracy, while adaptive TDP analysis reduced the individual difference in accuracy.

Keywords: Physiological signal, GSR, ECG, PPG, Skin temperature, emotion recognition, accuracy.
Physiological responses are characterized by individual differences: the same emotion may be expressed through different regulation of physiological responses. Therefore, a strategy or rule for emotion recognition should consider individual characteristics. Some findings have shown that an emotion recognition algorithm can set physiological variation automatically based on verification of subjective emotion, and that this process enhances the accuracy of emotion recognition [9]. Therefore, emotion can be recognized well when noise reduction, discrimination between the tonic level and phasic response of physiological signals, and individualization are taken into account. Considering these issues, this study suggests a new analysis method for physiological signals, called TDP (time-dependent parameter) analysis, and attempts to show its effectiveness for emotion recognition.
2 Method

2.1 Research Purpose

This study proposes a new analysis method for physiological responses and aims to prove that the method is effective for emotion recognition. The research compared the accuracies of emotion recognition obtained from three different methods: tonic analysis, TDP analysis, and adaptive TDP analysis.

2.2 Definition of TDP

The TDPs (time-dependent parameters) of a physiological response are defined in this study as shown in Fig. 1. The delay is the time difference between the stimulation and the activation. The activation is the time of the peak, measured from the beginning. The half recovery is the time at half peak, and the full recovery is the time at which the signal returns to the base state. The full recovery can be inferred from the half recovery when it cannot be measured directly. In this study, ECG (electrocardiogram), RSP (respiration), PPG (photoplethysmogram), GSR (galvanic skin resistance), and SKT (skin temperature) signals were processed to construct the TDP curve shown in Fig. 1.
Fig. 1. TDP (time dependent parameter) of physiological measurement
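As an illustration of how the TDPs of Fig. 1 might be computed, the sketch below extracts delay, activation, half recovery, and full recovery from a baseline-normalized signal. The paper does not publish its extraction code; the 10% onset threshold and the 5% near-baseline criterion for full recovery are our assumptions, not values from the study.

```python
import numpy as np

def time_dependent_parameters(signal, fs, onset_idx):
    """Extract the TDPs (delay, activation, half recovery, full recovery)
    from a baseline-normalized physiological signal (neutral level ~ 0).
    fs is the sampling rate in Hz; onset_idx is the stimulus onset sample."""
    post = np.asarray(signal, dtype=float)[onset_idx:]
    peak_idx = int(np.argmax(np.abs(post)))        # strongest deviation
    peak = abs(post[peak_idx])

    # Delay: onset of the response, taken here as the first sample exceeding
    # 10% of the peak amplitude (the 10% threshold is an assumed choice).
    rising = np.nonzero(np.abs(post[:peak_idx + 1]) >= 0.1 * peak)[0]
    delay = rising[0] / fs if rising.size else None

    activation = peak_idx / fs                     # time from onset to peak

    # Half recovery: first return to half the peak amplitude after the peak.
    after = np.abs(post[peak_idx:])
    half = np.nonzero(after <= 0.5 * peak)[0]
    half_recovery = (peak_idx + half[0]) / fs if half.size else None

    # Full recovery: return to near baseline (5% of peak, again an assumed
    # threshold); if never reached within the record, infer it from the half
    # recovery by linear extrapolation, as the paper notes it can be inferred.
    full = np.nonzero(after <= 0.05 * peak)[0]
    if full.size:
        full_recovery = (peak_idx + full[0]) / fs
    elif half_recovery is not None:
        full_recovery = activation + 2.0 * (half_recovery - activation)
    else:
        full_recovery = None

    return {"delay": delay, "activation": activation,
            "half_recovery": half_recovery, "full_recovery": full_recovery}
```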
2.3 Emotion Induction

Emotion was induced using images. The images were chosen based on a previous study [8], in which 100 university students (33 females and 67 males), none of them visually impaired, participated and were asked to score their subjective emotion after watching the images. Images that significantly induced emotion were categorized according to a two-dimensional emotion model [10]. Six images were selected for evoking unpleasantness-arousal emotion and ten for pleasantness-relaxation emotion, as shown in Fig. 2. In this study, pleasantness-relaxation is referred to as the positive emotion and unpleasantness-arousal as the negative emotion.
Fig. 2. The images evoking emotions
2.4 Experiment

Four university students (average age 26.5 years), all healthy and with no vision problems, participated in the experiments. The 24 prepared images were presented to the participants to induce emotion, and PPG, GSR, RSP, and SKT were measured while the images were presented. The experimental procedure is shown in Fig. 3. A participant first experienced a non-image (reference) state for 30 seconds, followed by the presentation of an image for 10 seconds; then a non-image state, called the neutral state, was presented for 30 seconds. One procedure consisted of presenting four images, and each participant went through the procedure six times. One procedure took 190 seconds, and the total experimental time was 1,440 seconds per participant.
Fig. 3. Experimental procedure
3 Analysis

3.1 Data Acquisition

Six data sets, each consisting of four physiological signals and a subjective emotion score, were collected. For the purpose of analysis, the data unit was set at 70 seconds, comprising a 30-second neutral state, a 10-second stimulation, and another 30-second neutral state. In total, 74 data units (6 sets × 4 pictures × 4 participants) were prepared for tonic analysis, TDP analysis, and adaptive TDP analysis.

3.2 Running Average for Noise Reduction and Normalization

The running average is effective for noise reduction [3]. In this research, the time interval for the running average was determined by the response rate of each signal. The time intervals of the physiological signals were set for noise reduction, and the determination was made by visual inspection to confirm signal stability. The time interval for GSR and SKT was set at 0.5 seconds, while that for RSP was 3 seconds. PPG was converted to HR (heart rate) by frequency analysis; therefore, the time interval for PPG and HR was set at 2 seconds. The running average was computed with a sliding window at the pre-determined time interval on all physiological signals. Then, the stimulus state of each physiological signal was normalized with respect to the neutral state, which made it possible to observe the activation (tonic) level of the signal. Normalization was computed by equation (1) and performed every 0.5 seconds.

Normalized state = (Stimulus state − Neutral state) / Neutral state    (1)
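The smoothing and normalization steps above can be sketched as follows. The 0.5-second window and equation (1) come from the text, while the synthetic GSR trace and its parameters are placeholders for illustration.

```python
import numpy as np

def running_average(x, fs, window_s):
    """Smooth signal x (sampled at fs Hz) with a sliding-window mean of
    length window_s seconds, as used for noise reduction in the paper."""
    n = max(1, int(round(window_s * fs)))
    kernel = np.ones(n) / n
    return np.convolve(x, kernel, mode="same")

def normalize_to_neutral(stimulus, neutral):
    """Equation (1): (stimulus - neutral) / neutral, with the neutral level
    taken as the mean of the preceding neutral-state segment."""
    baseline = float(np.mean(neutral))
    return (np.asarray(stimulus) - baseline) / baseline

# Example with assumed values: a synthetic GSR trace sampled at 200 Hz,
# smoothed with the paper's 0.5-s window; the first 30 s are the neutral state.
fs = 200
gsr = np.random.default_rng(0).normal(10.0, 0.2, 70 * fs)
smooth = running_average(gsr, fs, 0.5)
neutral, stimulus = smooth[:30 * fs], smooth[30 * fs:40 * fs]
norm = normalize_to_neutral(stimulus, neutral)
```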
3.3 TDP Rule for Emotion Recognition

The TDP rule for emotion recognition was determined from a previous study [8]. Visual stimuli from the prepared images induced the corresponding emotions, as shown in Fig. 2. The physiological signals were then analyzed into TDPs, and threshold values were set to construct the rule for emotion recognition, as shown in Table 1. Table 1 gives the mean and standard deviation of the physiological response times for each emotion, based on the TDP definition. The recognition range for the respective neutral, positive, and negative emotions was defined as the mean plus or minus one standard deviation.

3.4 Adaptive TDP Rule for Individualization

Since the physiological response to the same emotion differs between individuals, the TDP rule for emotion recognition needed to be individualized. The process of the adaptive TDP rule is shown in Fig. 4. First, emotion is recognized by the non-adaptive TDP rule. Second, the difference between the measured subjective emotion and the emotion estimated from the physiological signals is calculated. If a difference exists, the TDP rule is adaptively reset using the individual's input of subjective emotion; otherwise, emotion recognition proceeds. Through these processes, the rule adaptively becomes individualized and more accurate for a particular person.
Fig. 4. The process of adaptive TDP rule for individualization
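One adaptation step of the loop in Fig. 4 might look like the following sketch. The paper does not specify how the thresholds are reset on a mismatch, so the range representation and the update rate below are purely illustrative assumptions.

```python
def adapt_tdp_rule(rule, subjective, recognized, tdp_features, rate=0.2):
    """One step of the adaptive TDP rule (a sketch of Fig. 4; the update
    rate and the range representation are assumptions, not from the paper).

    rule         : dict mapping emotion -> {parameter: (low, high)} ranges
    subjective   : emotion reported by the participant
    recognized   : emotion recognized by the current rule
    tdp_features : measured TDPs, e.g. {'delay': 1.2, 'activation': 4.0}
    """
    if recognized == subjective:
        return rule  # recognition matched; no adaptation needed
    # Mismatch: shift each range for the true emotion toward the observed TDPs.
    for param, value in tdp_features.items():
        low, high = rule[subjective][param]
        center, half_width = (low + high) / 2.0, (high - low) / 2.0
        center += rate * (value - center)      # move center toward observation
        rule[subjective][param] = (center - half_width, center + half_width)
    return rule
```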
4 Result

The results show the accuracies of emotion recognition obtained from the three different methods, tonic analysis, TDP analysis, and adaptive TDP analysis, as presented in Tables 2-4. The accuracy was determined as the match rate between the subjective emotion and the emotion determined from the physiological signals. The tonic response rule was derived from previous research on autonomic response patterns for emotion [5, 11]: if a negative emotion is evoked, the electrodermal and cardiovascular responses increase and the thermal response decreases, and vice versa [11]. As shown in Table 2, the accuracy of emotion recognition was about 62% for negative emotion and about 50% for positive emotion. Under the same conditions, the accuracy was enhanced, rising to 60% or more when TDP analysis was used, as shown in Table 3; it was about 70% for the recognition of both positive and negative emotion. Thus, emotion recognition by TDP analysis could capture more responses than tonic analysis. There were also findings regarding the individualization of emotion recognition, as shown in Table 4. The adaptive TDP rule enhanced the accuracy a little more; interestingly, for participants whose accuracy was lower than 70%, the accuracy increased to 70% or more. Therefore, adaptive TDP analysis can be effective in increasing the accuracy for a particular person whose accuracy is low. Figure 5 shows the overall accuracies of the three analyses. TDP analysis showed an improvement in accuracy, but adaptive TDP did not add much; since the accuracy improved only for individuals whose accuracy had been low, it did not contribute much to the overall accuracy.
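The accuracy computation itself is simple; for example, participant A's row of Table 2 below yields the reported 61.1%.

```python
# Accuracy as defined in the paper: the match rate between subjective
# emotions and emotions determined from physiological signals.
# The numbers are participant A's row from Table 2.
neg_subj, neg_match = 8, 5     # negative: subjective / correctly recognized
pos_subj, pos_match = 10, 6    # positive: subjective / correctly recognized
accuracy = (neg_match + pos_match) / (neg_subj + pos_subj)
print(f"{accuracy:.1%}")       # -> 61.1%
```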
Table 2. Accuracy of emotion recognition from tonic analysis

Participant | Negative emotion (subjective / recognized by physiological signals) | Positive emotion (subjective / recognized by physiological signals) | Accuracy
A           | 8 / 5   | 10 / 6  | 61.1%
B           | 6 / 3   |  9 / 4  | 46.7%
C           | 7 / 4   |  8 / 4  | 53.3%
D           | 8 / 6   |  9 / 4  | 58.8%
Sum         | 29 / 18 | 36 / 18 |
Accuracy    | 62.1%   | 50.0%   | 55.0%
Table 3. Accuracy of emotion recognition by TDP rule

Participant | Negative emotion (subjective / recognized by physiological signals) | Positive emotion (subjective / recognized by physiological signals) | Accuracy
A           | 8 / 7   | 10 / 7  | 77.8%
B           | 6 / 3   |  9 / 6  | 60.0%
C           | 7 / 4   |  8 / 7  | 73.3%
D           | 8 / 6   |  9 / 5  | 64.7%
Sum         | 29 / 20 | 36 / 25 |
Accuracy    | 69.0%   | 69.4%   | 69.0%
Table 4. Accuracy of emotion recognition by adaptive TDP rule

Participant | Negative emotion (subjective / recognized by physiological signals) | Positive emotion (subjective / recognized by physiological signals) | Accuracy
A           | 8 / 6   | 10 / 7  | 72.2%
B           | 6 / 4   |  9 / 6  | 66.7%
C           | 7 / 4   |  8 / 7  | 73.3%
D           | 8 / 6   |  9 / 6  | 70.6%
Sum         | 29 / 20 | 36 / 26 |
Accuracy    | 69.0%   | 72.2%   | 70.1%
Fig. 5. Accuracy comparison of the three analyses: tonic, TDP, and adaptive TDP analysis
5 Conclusion

TDP and adaptive TDP analyses were newly proposed for analyzing physiological responses for emotion recognition, and the methods succeeded in enhancing recognition accuracy. Adaptive TDP was effective in accounting for individual differences in physiological response. Comparing the accuracies of emotion recognition among tonic analysis, TDP analysis, and adaptive TDP analysis, this study reaches the following conclusions. First, TDP analysis achieved higher accuracy than tonic analysis: the average accuracy was 55% for tonic analysis and 69% for TDP analysis. Second, adaptive TDP analysis reduced the individual differences in accuracy. With the TDP rule, individual accuracies ranged between 60% and 77%, while with the adaptive TDP rule they ranged between 66.7% and 73.3%. The results show that adaptive TDP could enhance accuracy for participants whose accuracy was relatively low, but contributed less for participants whose accuracy was already high. Therefore, the TDP and adaptive TDP methods may be useful for emotion recognition and for observing significant details of physiological responses.
References

1. Lisetti, C.L., Nasoz, F.: Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP J. Appl. Signal Process., 1672–1687 (2004)
2. Allanson, J., Fairclough, S.H.: A research agenda for physiological computing. Interacting with Computers 16, 857–878 (2004)
3. Haag, A., Goronzy, S., Schaich, P., Williams, J.: Emotion recognition using bio-sensors: First steps towards an automatic system. In: André, E., Dybkjær, L., Minker, W., Heisterkamp, P. (eds.) ADS 2004. LNCS, vol. 3068, pp. 36–48. Springer, Heidelberg (2004)
4. Mandryk, R.L., Atkins, M.S.: A fuzzy physiological approach for continuously modeling emotion during interaction with play technologies. International Journal of Human-Computer Studies 65, 329–347 (2007)
5. Boucsein, W.: Electrodermal Activity. Plenum Press, New York (1992)
6. Whang, M.: The emotional computer adaptive to human emotion. Phillips Research: Probing Experience 8, 209–219 (2008)
7. Whang, M., Lim, J., Boucsein, W.: Preparing computers for affective communication: a psychophysiological concept and preliminary results. The Journal of the Human Factors and Ergonomics 45, 623–634 (2003)
8. Kim, J., Whang, M., Kim, J., Woo, J.: The study on emotion recognition by time-dependent parameters of autonomic nervous response. Korean Journal of the Science of Emotion & Sensibility 11, 637–644 (2008)
9. Fredrickson, B.L., Losada, M.F.: Positive Affect and the Complex Dynamics of Human Flourishing. American Psychologist 60, 678–686 (2005)
10. Russell, J.A.: A circumplex model of affect. Journal of Personality and Social Psychology 39, 1161–1178 (1980)
11. Whang, M., Chang, G., Kim, S.: Research on Emotion Evaluation using Autonomic Response. Korean Journal of the Science of Emotion & Sensibility 7, 51–56 (2004)
Students’ Visual Perceptions of Virtual Lectures as Measured by Eye Tracking

Yu-Jin Kim, Jin Ah Bae, and Byeong Ho Jeon

Dept. of Media Image Art & Technology, Kongju National University, Sinkwandong, Gongju, South Korea
{yujinkim,jinabae,bhjeon}@kongju.ac.kr
Abstract. In this paper, we used eye tracking methodologies to investigate students’ visual perceptions of lectures using 3D real-time virtual studio technology. For measuring learning performance, we also gave the students multiple-choice paper quizzes at the end of the lectures. Three virtual lectures were created with different types of lecture materials (text-centered, image-centered, and lecturer-centered) and 3D virtual sets (classroom, cyberspace, and lecture-theme space). Through analyzing students’ eye movements in viewing still and moving scenes of the virtual lectures, we found that layouts and movements of design elements on lecture screens significantly influenced students’ scanpaths and areas of interest (AOIs). Lecture material types affected learning performance while 3D virtual sets had no effect due to students’ inattention to the virtual background areas. We discuss effective ways to develop virtual lectures and design lecture screens for better presentation of lecture content and higher learning performance.

Keywords: Virtual lectures, virtual studios, eye tracking, visual perception, learning performance, user-centered screen design.
2 Related Work

2.1 Lectures Using Virtual Studio Technology

Virtual education, through the use of cyberspace, eliminates spatial and temporal limitations by removing the need for the lecturer and students to be present at an instructional site at a designated time [4]. Recently, the proliferation of information and communication technologies (ICTs) has spawned a boom in virtual education by creating a variety of virtual lecture content production techniques such as Flash, Web 3D, 3D real-time virtual studio, and others. Among these production technologies, several researchers have explored the effectiveness of applying virtual studios to lectures. In 2003, Morozov, Debelov, and Zhmulevskaya claimed that the application of virtual studio systems enhances the efficiency of instructional processes by providing lecturers more possibilities for diverse forms of education [5]. According to Brown and Cruickshank (2003), a virtual lecture can be more effective than a classroom lecture by achieving significant savings in the costs of lecture delivery and student support, though a very large investment of time and effort is required of the lecturer [6]. Along with the previously mentioned research interests in the educational effects of applying new technology, researchers have studied the factors affecting the delivery of lecture content in virtual education in order to increase the motivation and effects of learning. These factors include screen design [7, 8], content design [9], content suggestion style, and lecture material [10].

2.2 Eye Tracking in Visual Information Processing

Eye movements are driven both by properties of the visual world and by processes in a person’s mind [11]. Therefore, many scholars have used eye-tracking techniques to explore the relations between eye movements and visual information processing in diverse research fields. In particular, researchers in HCI have tracked eye movements to understand visual and display-based information processing, as well as to discover the factors that may impact the usability of system interfaces [12]. In fact, eye movement data provide more detailed and specific information about a user’s cognitive processes in many different kinds of displays [13, 14]. They also help researchers determine the roots of some usability problems and then come up with effective solutions. Meanwhile, the recent trend of eye-tracking studies in HCI shows that eye-tracking methods have been rapidly adopted for usability testing of websites. Laarni et al. (2003) tracked the eye movements of users while they were selecting and reading online news items on a small computer [15]. Cowen (2004) asked subjects to perform two tasks on a website and measured their total fixation duration, the number of total fixations, average fixation duration, and the distribution of fixations on the screen [13]. Through these measurements, he suggested effective ways to design webpages in terms of user interface. In 2007, Cutrell and Guan also measured web surfers' eye movements to observe their web search strategies [16]. Like the aforementioned studies, many researchers have analyzed the interfaces of and interaction with web content, mainly focusing on sequences of still scenes of web pages.
However, there have been increasing interests in employing eye-tracking methods in moving visual images along with the proliferation of more dynamic web content, which is developed by adopting new types of web authoring tools and media, such as Flash, Web 3D, Virtual Studio, and others. 2.3 Eye Tracking Metrics The main measurements used in eye-tracking research are “fixations,” which are moments when eyes are relatively stationary, taking in or encoding information, and “saccades,” which are quick eye movements occurring between fixations [12]. From these basic measurements, a multitude of metrics are also derived [17]: (1) “gaze duration,” the cumulative duration and average spatial location of a series of consecutive fixations within an area of interest; (2) “area of interest (AOI),” the area of a display or visual environment that is of interest to the research or design team and, thus, defined by them (not by the participant); and (3) “scanpath,” a special arrangement of a sequence of fixations.
3 Hypotheses

The focus of our study is the investigation of students’ visual perceptions in virtual lectures. In order to identify the patterns of students’ visual perceptions, we tracked their eye movements according to different layouts and movements of design elements on lecture screens. We also studied the effects of lecture material types and 3D virtual sets on students’ learning performance. In line with these purposes, we investigated the following research hypotheses:

Hypothesis 1a. Students’ eye movements (AOIs, scanpaths) are affected by the layouts of design elements on lecture screens.
Hypothesis 1b. Students’ eye movements (AOIs, scanpaths) are affected by the movements of design elements on lecture screens.
Hypothesis 2a. Students’ learning performance varies according to lecture material types.
Hypothesis 2b. Students’ learning performance differs according to 3D virtual set types.
4 Experiment

4.1 Virtual Lecture Prototypes

In order to test the aforementioned hypotheses, we designed and conducted an experiment to analyze students’ eye movements and learning performance in the context of virtual studio-based lectures. As experimental material, three virtual lecture prototypes were created under the theme of “Media and Culture” with varying layouts and movements of lecture screen design elements (lecturer, lecture board, 3D virtual sets, and text, images, and movies on the lecture board). The three prototypes (see Figure 1) were also produced with different types of lecture materials (text-centered, image-centered,
and lecturer-centered) and 3D virtual sets (classroom, lecture-theme space, and cyberspace): (1) Prototype 1 - text-centered lecture materials in classroom background sets; (2) Prototype 2 - lecturer-centered lecture materials in cyberspace background sets; and (3) Prototype 3 - image-centered lecture materials in lecture-theme space background sets. Figure 1 shows screen shots of these three prototypes. Three screen shots located in the first row display their 3D background sets. The other three shots in the second row illustrate how the same lecture content about the phenomena of mass communication is delivered by different types of lecture materials.
Fig. 1. Screen shots of three prototypes (left: Prototype 1, middle: Prototype 2, and right: Prototype 3)
After modeling and animating the above 3D virtual sets using 3D Studio Max 9.0, we combined these 3D background sources with the live-action footage of lecturers using the real-time chroma key technique of the "VS2000" system. In addition, we created animation buttons for controlling lecture screen design elements using the VS scripts of the "HotAction" program.

4.2 Participants

Forty participants were selected for the experiment, but ten of them had problems with the calibration of the eye-tracking system. Calibration was a fine-tuning process for the experiment, so 30 participants successfully completed it. The 30 subjects were divided into three groups of ten, each of which participated in the experiment with one of the three kinds of virtual lectures. The subjects were freshmen and sophomores at K University and had almost no previous knowledge of the content of the experimental lectures. They were between the ages of 18 and 23, and the gender ratio was 1:1. Twenty-four of the students had experienced distance lectures, and eight of them had been exposed to VR lectures.

4.3 Apparatus

This experiment used the "Eyegaze Development System" hardware developed by LC Technologies, Inc. This device automatically tracks the x-y coordinates of the participant’s
gazepoint on the computer screen using the pupil-center-corneal reflection (PCCR) method, which directs infrared light into the eye. The system generates raw eyegaze point location data at a camera field rate of 60 Hz [18] and distinguishes three kinds of gazepoints: (1) a moving point, which a participant looks at for only 1/60 second before looking at other points; (2) a fixating point, around which a participant's gaze is settling; and (3) a fixation-completed point, on which a participant maintains the gaze. This study used the software "EyeTrack v1.0" and “EyeTrackMovie v1.0” for analyzing eye movements in viewing still and moving scenes, respectively. These programs were developed by the Human Computer Interaction Laboratory (HCIL) of KAIST for the visual monitoring and analysis of eye-tracking results expressed in coordinates [19]. “EyeTrack” and “EyeTrackMovie” provide “Replay” and “Analysis” modes. The "Fixation Mark" and "Color Variation" options enable the monitoring of eye-tracking data with a variety of graphic effects in chronological order. For an efficient analysis of eye movement patterns, the software also provides five analysis options: (1) "shadow," which shows the area that subjects focused on and renders the rest in shades with gradation effects; (2) "frequent area," which marks a clear boundary between the area that the subjects focused on and the rest, highlighting the former; (3) "hotspot," which shows the amount of gaze (red meaning more gaze and green meaning less gaze); (4) "selected area," which shows the amount of gaze in a certain area and its duration in numbers; and (5) "priority order," which shows the duration of a gaze over time in circles. All participants’ eye movements could be monitored at once, since the programs can analyze multiple data sets simultaneously. The patterns of AOIs and scanpaths, described in Section 2.3, were derived from these eye movements.

4.4 Experimental Design and Procedure

The experiment was performed in four stages, and it took about an hour for a participant to complete: (1) questionnaires on demographics, media education level, and cyber-learning experiences; (2) still-scene eye tracking with captured scenes according to distinguishable types of screen layouts (Prototype 1: 6 scenes; Prototype 2: 9 scenes; and Prototype 3: 7 scenes); (3) moving-scene eye tracking with the three prototype movies; and (4) a quiz with multiple-choice questions. Eye movements were tracked twice, for still and moving scenes of the virtual lectures. Compared with the eye tracking of moving scenes, the eye tracking of still scenes allowed more detailed and accurate analysis of eye movements for the different screen layout types. On the other hand, the eye tracking of the moving scenes enabled an additional analysis of the effects of the movements of screen layout elements on eye movements.
5 Results

5.1 AOIs According to the Layouts and Movements of Lecture Screen Design Elements

Through analyzing students’ eye movements in viewing still and moving scenes of the virtual lectures, we found that the layouts and movements of lecture screen design
elements (the lecturer, lecture board, and 3D background set) significantly influenced students’ scanpaths and areas of interest (AOIs) [H1a and H1b]. In the case of AOIs, the parts that students paid close attention to were similar in both the still and moving scene tests. Examining the design elements that received the most attention in the still scenes, we found that students generally gazed at the lecturer’s face in all of the still scenes featuring a lecturer (Prototype 1: 4 scenes, Prototype 2: 5 scenes, and Prototype 3: 5 scenes), regardless of the lecturer’s size and position on the screen. Figure 2 shows the “hotspot” option analysis results of three scenes (left: Prototype 3, middle: Prototype 2, and right: Prototype 3), in which different sizes of a lecturer appear in different positions. We calculated the distribution of fixations on the face of the lecturer in the left scene of Figure 2 using the “selected area” option and found that 52.5% of the total fixations were on the lecturer’s face.
Fig. 2. “Hotspot” option analysis results (Note: The amounts of fixations gradually increase from green to red, and the reddest parts are marked with circles)
In addition, we found that the students’ points-of-regard are likely to stay not just on the lecturer’s face, but also on the faces of people illustrated in the lecture material images (see Figure 3). Another finding was that students’ attention to people shown in profile or back view decreased compared with front-view images.
Fig. 3. “Selected area” option analysis results (1. Front view (28%), 2. Profile view (11.9%), and 3. Back view (12.5%) of people in the image)
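An AOI analysis along the lines of the “selected area” option can be sketched as follows: given detected fixations (e.g., from the I-DT sketch in Section 2.3) and rectangular AOIs, it reports the share of total fixation time spent in each AOI. The AOI names and coordinates in the example are invented for illustration; the "selected area" option itself is part of the EyeTrack software, not this code.

```python
def aoi_shares(fixations, aois):
    """fixations: list of dicts with 'x', 'y', 't_start', 't_end';
    aois: dict name -> (x_min, y_min, x_max, y_max) in pixels."""
    total = sum(f["t_end"] - f["t_start"] for f in fixations) or 1.0
    shares = {name: 0.0 for name in aois}
    for f in fixations:
        for name, (x0, y0, x1, y1) in aois.items():
            if x0 <= f["x"] <= x1 and y0 <= f["y"] <= y1:
                shares[name] += (f["t_end"] - f["t_start"]) / total
    return shares

# Hypothetical AOIs for one lecture scene (coordinates are invented):
aois = {"lecturer_face": (850, 80, 1000, 230),
        "lecture_board": (100, 60, 800, 600)}
```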
Fig. 4. Comparing visual attention to text areas (3. Body (40.1%) > 2. Subtitle (29.5%) > 1. Title (7.3%))
Along with the aforementioned visual attention to people’s faces in the still scenes, we also found a tendency in the moving scenes for students’ points-of-regard to be directed mostly at the lecturer’s face as well as the faces of people illustrated in the lecture material images. In addition, the analysis results suggest that students’ eye movements and text sizes on the lecture boards were inversely proportional: there were fewer eye movements and shorter gazes on larger texts. In fact, visual attention was generally highest on the body, then the subtitles, and then the title (see Figure 4). In other words, considering that titles take up more space than subtitles but receive less visual attention, it would be effective to deliver new and important messages using subtitles.

5.2 Scanpaths According to the Layouts and Movements of Lecture Screen Design Elements

In order to measure the students’ scanpaths and scanpath durations, we selected a student whose eye movement pattern could represent the visual attention of the ten students participating in the same still scene experiment and then analyzed his/her eye movements over time using the “fixation map” and “priority order” options. The results showed that the farther from the center objects are positioned, the
Fig. 5. Scanpath tracking (left: from left to right, right: from top to bottom)
more likely they are to be out of the scanpath or stared at later. This analysis also verified the commonly accepted idea that people in Korean-speaking culture read words and sentences from left to right and from top to bottom (see Figure 5). On the other hand, students’ scanpaths showed different patterns in the still and moving scene tests in terms of the starting point of the eye’s gaze. In the still scene test, the starting points of the students’ gazes were related to screen design element types as well as their positions. For example, the majority of students first gazed at the lecturer’s face, regardless of its position, and then moved their gazes to other elements located around the center of the screen. In the moving scene test, the movement of screen design elements, rather than their type, dominantly influenced the starting points, as illustrated in Figure 6. We also found that students moved their gazes to empty parts of the screen where the next lecture content was expected to appear. Properly timed anticipation could enable students to better perceive animated lecture content by preparing them for what comes next in the lecture.
Fig. 6. Starting points of students' gazes according to the lecture screen layout type
5.3 Students’ Learning Performance According to Lecture Materials and 3D Virtual Sets

The average quiz scores of the three groups who respectively watched the text-centered, image-centered, and lecturer-centered virtual lectures (12.9, 9.7, and 10.2, respectively) differed significantly at the alpha level of 0.05 (F=4.694). These results support H2a, which suggests a relationship between students’ learning performance and lecture material types. Students could easily understand and memorize lecture content with text-centered lecture material, which presented the lecturer’s
explanations in clear print on the screen. While lecture material types affected learning performance, the 3D virtual sets had no effect on learning performance, owing to students’ lack of attention to the background areas. Consequently, H2b, which suggests a strong relationship between students’ learning performance and 3D virtual set type, was not supported. In fact, the effects of the virtual sets on the learning process are very small because students’ visual attention is drawn to the background only when they find some noticeable images or figures in the set.
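The reported group comparison corresponds to a one-way ANOVA over the three groups' quiz scores. The sketch below shows the computation; since the paper reports only the group means, the per-student scores are invented placeholders and will not reproduce F=4.694 exactly.

```python
# A minimal sketch of the one-way ANOVA behind the reported F statistic;
# the per-student quiz scores below are hypothetical illustration data.
from scipy import stats

text_centered     = [14, 12, 13, 15, 11, 13, 12, 14, 12, 13]
image_centered    = [10, 9, 11, 8, 10, 9, 10, 11, 9, 10]
lecturer_centered = [11, 10, 9, 12, 10, 11, 9, 10, 11, 9]

f_stat, p_value = stats.f_oneway(text_centered, image_centered, lecturer_centered)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")  # significant if p < 0.05
```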
6 Conclusions

This study investigated students’ visual perceptions of virtual lectures by analyzing their AOIs and scanpaths in the still and moving scenes of the lectures. It also explored how to develop lecture materials and 3D background sets for virtual lectures in order to improve learning performance, by testing the students’ comprehension with multiple-choice questions. Our research revealed that the following issues in virtual lecture production should be carefully considered for better presentation of lecture content: (1) the lecturer’s size and position on lecture screens, and the postures of people illustrated in lecture materials; (2) the sizes and positions of text in lecture materials; (3) the movements of screen design elements (the lecturer, the lecture board, and animating parts on the lecture board); and (4) balanced layouts between the lecturer, lecture materials, and virtual background sets. Educators and material designers should also consider that text-centered lecture materials were more effective for higher learning performance than image-centered and lecturer-centered ones. Even though statistical significance was not found between 3D background sets and learning performance, an effective background set design suited to the lecture objectives and content is necessary for enhancing students’ learning interest and motivation. Finally, it is hoped that this study will assist lecturers in understanding students’ visual information processing in virtual lectures, in designing virtual lecture content more effectively, and in improving the instructional effects of virtual lectures.
Acknowledgment We thank Kun-pyo Lee and Jung-mi Park for their comments and help in conducting our eye-tracking experiment.
References

1. Fukaya, T., Fujikake, H., Yamanouchi, Y., Mitsumine, H., Yagi, N., Inoue, S., Kikuchi, H.: An Effective Interaction Tool for Performance in the Virtual Studio - Invisible Light Projection System. NHK Science & Technical Research Laboratories, Japan (2003)
2. Dolgovesov, B.S., Morozov, B.B., Shevtsov, M.Y.: The System for Interactive Virtual Teaching Based on “Focus” Virtual Studio. In: International Conference Graphicon 2003, Moscow, Russia (2003)
3. In this paper, a virtual lecture means instructional content that is created using virtual studio techniques for virtual education
4. Kuroda, K., Shanawez, H.D.: Strategies for Promoting Virtual Higher Education: General Considerations on Africa and Asia. Africa and Asian Studies 2(4), 565–575 (2003)
5. Morozov, B.S., Develov, B.B., Zhmulevskaya, M.Y.: The System for Interactive Virtual Teaching Based on “Focus” Virtual Studio. In: International Conference Graphicon, Moscow, Russia (2003), http://www.graphicon.ru/
6. Brown, S., Cruickshank, I.: The Virtual Studio. International Journal of Art & Design Education 22(3), 281–288 (2003)
7. Kim, M.R.: Strategies on Screen Design of Learner-Centered Web-based Instructional Systems. Journal of Educational Technology 16(4), 51–65 (2008) (in Korean)
8. Shon, M., Chung, H.H.: An Analysis on the Learning Hindrance Factors in Blended-Learning Environment. Journal of Educational Information and Media 13(2), 251–276 (2007) (in Korean)
9. Ryu, I.: Factors Influencing the Effectiveness of Web-Based Distance Learning. Management Education Review 6(2), 7–27 (2003) (in Korean)
10. Kang, M.H., Gu, M.H., Moon, H.N., Jung, S.Y., Chung, J.Y., Kim, J.S.: Examining the Effects of Tutor Delivery Modes on Cognitive Presence and Learning Outcomes in Online Lectures. Journal of Educational Information and Media 13(4), 155–181 (2007) (in Korean)
11. Richardson, D.C., Spivey, M.J.: Eye-Tracking: Characteristics and Methods. In: Wnek, G., Bowlin, G. (eds.) Encyclopedia of Biomaterials and Biomedical Engineering, pp. 1–9. Informa HealthCare (2004)
12. Poole, A., Ball, L.J.: Eye Tracking in Human-Computer Interaction and Usability Research: Current Status and Future Prospects. In: Ghaoui, C. (ed.) Encyclopedia of Human Computer Interaction. Idea Group (2004)
13. Cowen, L.: An Eye Movement Analysis of Web-Page Usability. Masters by Research in the Design and Evaluation of Advanced Interactive Systems (2001)
14. Lohse, G.L.: Consumer Eye Movement Patterns on Yellow Pages Advertising. Journal of Advertising 26(1), 61–73 (1997)
15. Laarni, J., Isotalus, P., Kojo, I., Kärkkäinen, L.: Reading News from a Pocket Computer: An Eye-Movement Study. In: Harris, C.D., Duffy, V., Smith, M.J., Stephanidis, C. (eds.) Human-Centered Computing: Cognitive, Social and Ergonomic Aspects. The Proceedings of HCI International 2003. Lawrence Erlbaum, Mahwah (2003)
16. Cutrell, E., Guan, Z.: What Are You Looking for?: An Eye-Tracking Study of Information Usage in Web Search. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, San Jose, California, USA (2007)
17. Jacob, R.J.K., Karn, K.S.: Eye Tracking in Human-Computer Interaction and Usability Research: Ready to Deliver the Promises (Section Commentary). In: Hyönä, J., Radach, R., Deubel, H. (eds.) The Mind’s Eye: Cognitive and Applied Aspects of Eye Movement Research. Elsevier Science, Oxford (2003)
18. Eyegaze Development System Information, http://www.eyegaze.com
19. Park, J., Lee, K.: Eyetrack - Developing Eyegaze Analysis Visualization Software for Designers’ Use. In: KEER 2007, Sapporo, Japan, p. 10 (2007)
Toward Constructing an Electroencephalogram Measurement Method for Usability Evaluation

Masaki Kimura*, Hidetake Uwano, Masao Ohira, and Ken-ichi Matsumoto

Graduate School of Information Science, Nara Institute of Science and Technology
{masaki-k,hideta-u,masao,matumoto}@is.naist.jp
Abstract. This paper describes our pilot study toward constructing an electroencephalogram (EEG) measurement method for usability evaluation. The measurement method consists of two steps: (1) measuring the EEGs of subjects for several tens of seconds after the events or tasks that are the targets of evaluation, and (2) analyzing how much of the alpha and/or beta rhythm components the measured EEGs contain. However, there exists only an empirical rule for the measurement time length of EEGs for usability evaluation. In this paper, we conduct an experiment to reveal the optimal time length of EEGs for usability evaluation by analyzing changes in EEGs over time. From the results of the experiment, we found that the time length suitable for usability evaluation was 0~56.32 seconds or more.
(1) experimenters (or usability experts) measure a subject's EEG for several tens of seconds after the subject finishes the target tasks for usability evaluation, and (2) experimenters analyze how much the measured EEG contains the alpha and beta rhythm components, which indicate a comfortable or uncomfortable state of the subject's mind, respectively. The results of the evaluation will change according to the time length over which the EEG is analyzed. After the tasks, subjects' EEGs return to the usual condition as time passes; hence, if the time length of the analysis is too long, the proportion of the EEG changed by the tasks decreases. Conversely, if the time length is too short, evaluation accuracy decreases because the ratio of noise to the entire EEG increases. However, the measurement time for EEG analysis has so far been decided by experiential standards. We therefore need to analyze EEGs as time series for usability evaluation. In this paper, we try to obtain the proper time length of EEG data.
2 Related Research

In this paper, we quantitatively evaluate the psychological state of computer users while a system is in use, using the alpha and beta rhythms that compose the frequency components of brain waves. The power spectra of the alpha and beta rhythms obtained by the discrete Fourier transform, the ratios of the alpha and beta rhythms to all brain waves, and beta/alpha, the ratio of the beta rhythm to the alpha rhythm, are often used as common indicators for observing the psychological state of human beings. Matsunaga et al. developed a brain wave measurement system for evaluating the satisfaction of human beings and validated the hypothesis that people feel comfortable if the amount of information processing in the brain is small, while they feel uncomfortable if it is large [5]. In this paper, we also use the ratios of the alpha and beta rhythms to all brain waves and beta/alpha as indicators. Since these indicators are often used in studies of brain waves, our experimental results are easy to compare with the implications and insights of previous work.
3 Experiment

3.1 Overview

Using Microsoft Excel 2003 and Excel 2007, the most popular spreadsheet software, participants performed eight kinds of spreadsheet tasks, and we measured the participants' EEGs after each task. Excel 2003 and Excel 2007 are different versions of the software, released in 2003 and 2007, respectively. Both versions have almost the same functions but a different look and feel in their graphical user interfaces (GUIs). Excel 2007 has a new GUI called the "ribbon," which is designed to improve task performance and user experience. However, the newly designed ribbon interface introduced a lot of changes to menus, tool bars, and working windows. So, even if users would like to use a familiar function such as "Save As," they need to select a menu or button with a different name and/or position in the two versions. Furthermore, even if the names and positions of menus are unchanged, the design of the working window displayed after selecting a menu often differs from Excel 2003. In this way, not only new users but also existing users of Excel 2003 need to learn how to operate the new interface of Excel 2007.
Table 1. Subjects’ usage frequency of Excel 2003 and Excel 2007

Usage frequency          | Excel 2003 | Excel 2007
never                    | 0          | 6
several times per year   | 2          | 1
several times per month  | 3          | 2
several times per week   | 5          | 1
Using Excel 2003 and Excel 2007, we can investigate the relationship between software experience and the attributes of brain waves without the effect of functional differences. In our experiment, we also analyzed the relationship between the results of subjective evaluation by questionnaire and the attributes of brain waves.

3.2 Participants

Ten master's students from the graduate school of information science participated in the experiment. Table 1 shows the participants' usage frequency of Excel 2003 and Excel 2007. All participants had experience with Excel 2003 and understood its basic operations and functions, but half of the participants had never used Excel 2007.

3.3 Task

Participants performed eight tasks (four types of tasks for each version of Excel) operating spreadsheets given in advance. Table 2 shows the list of tasks used in the experiment. All tasks can be performed in both Excel 2003 and Excel 2007. The content of the data file used in this experiment (a grade report) is the same for all tasks. Participants could continue a task until the task time exceeded five minutes. We counterbalanced the order of the tasks to minimize learning and fatigue effects. The details of each task are as follows.

Same Place Task. The participant selects a particular menu that has the same name in the same position in both versions of Excel. The task is completed when the participant selects the objective menu.

Different Place Task. The participant selects a menu that has a different name in a different position in the two versions of Excel. The task is completed when the participant selects the objective menu.

Same Interface Task. This task uses functions that have the same modal dialog interfaces in the two Excel versions. The menu name and position were given to the participant before the task.

Different Interface Task. This task uses functions that have different dialog interfaces in Excel 2003 and Excel 2007. As with the Same Interface Task, the menu name and position were given to the participant before the task.
Table 2. The task list used in the experiment

Task type           | Task name                      | Description
Same place          | Open Clip Art Pane             | Open the clip art pane to select clip art from a list.
Same place          | Filter Setting                 | Set options for data filtering.
Different place     | Display of version information | Display the version information of Excel.
Different place     | Record of macros               |
Same interface      | Format Cells                   | Change date formats from Mar-01 to 03/01.
Same interface      | Page Orientation               | Change the page orientation to landscape and set margins.
Different interface | Conditional Formatting         | Indicate cells that have less than “C” or “Absence” as red font.
Different interface | Insert Bar Chart               | Insert a stacked bar chart of students' scores with chart/axis titles.
3.4 Environment

The Emotional Spectrum Analysis System ESA-16 was employed to record the participants' EEGs. After each task, we recorded the participant's EEG for two minutes at a 200 Hz sampling frequency in an eyes-closed, resting condition. Electrode locations were based on the International 10-20 System, shown in Figure 1. We adopted referential derivation to observe the EEG, using the right earlobe (A2) as the reference electrode. The center of the forehead (Fpz) was employed as the ground electrode, and the center of the parietal region (Pz) was used as the exploring electrode to minimize electromyogram (EMG) artifacts. We also recorded an electrocardiogram (ECG) from both arms. In addition, we used a headrest and an elastic net bandage to secure the electrodes placed on the head. Before the first task, each participant adjusted the height of the chair and the position of the mouse and keyboard.

3.5 Procedure

The procedure of the experiment was as follows.
1. Preparation: The authors informed the participant about the experiment and the EEG measurement.
2. Environment setting: The electrodes were placed on the participant at the points described in Section 3.4, and the EEG analyzer was set up.
3. Practice tasks: The participant performed two practice tasks to understand the procedure of EEG measurement. These tasks were excluded from the analysis.
4. Task: The participant performed one of the main tasks described in Table 2.
5. EEG measurement: After each task, the participant's EEG was measured.
6. Repetition: The participant performed tasks repeatedly until finishing all eight tasks and EEG measurements.
7. Questionnaire: After the tasks, the participant filled out the questionnaire described in Section 3.6.
3.6 Questionnaire

After the eight tasks, participants answered a questionnaire to investigate their subjective satisfaction with each version of Excel and their usage frequency of each function used in the tasks. The questionnaire was created by the authors based on the Questionnaire for User Interaction Satisfaction (QUIS). Each question about usage frequency used a four-point scale (from "Never" to "Few times per week"), and each question about subjective satisfaction used a seven-point scale (from "Strongly disagree" to "Strongly agree"). Figure 2 shows a part of the questionnaire sheet used in the experiment.
Fig. 1. Electrode Locations in the International 10-20 System
Brain wave time[sec]
Fig. 2. Questionnaire Sheet
Fig. 3. Two Kinds of Analysis Methods for Electroencephalogram
4 Analysis for EEG

We applied power spectral analysis to the EEG data we collected at a sampling frequency of 200 Hz. To understand clearly how the frequency components of brain waves changed over time in the setting of our experiment, and how the analysis results varied according to the length of the analysis window, we used the following two analysis methods for the EEG data. Figure 3 illustrates the difference between the analysis methods.
Method 1. Power spectral analysis at even intervals. This analysis aims to observe how brain waves change over time. We analyzed the EEG data in intervals of 5.12 seconds by cutting the entire EEG record into non-overlapping 5.12-second segments. Nineteen intervals (with start times from 0 to 92.16 seconds) were analyzed.

Method 2. Power spectral analysis using different lengths of time window. This analysis aims to observe how the analysis results differ according to the length of the analysis window. We analyzed the EEG data by increasing the time length of the analysis window in steps of 5.12 seconds, without changing the start position of the analysis. The time length was increased from 5.12 seconds (min.) to 97.28 seconds (max.) (i.e., 0~5.12 sec., 0~10.24 sec., ..., and 0~97.28 sec.).

Next, the target data was filtered to reduce artifacts from eye blinking, myoelectric activity, and so on. We used a high-pass filter (HPF, 3 Hz cutoff frequency, +6 dB/oct attenuation), a low-pass filter (LPF, 60 Hz cutoff frequency, -6 dB/oct attenuation), and a band-elimination filter (BEF, 60 Hz center frequency, 47.5 Hz~72.5 Hz stopband, second order). The band-elimination filter was used to remove the influence of the alternating-current power supply. After the EEG data was multiplied by a Hamming window and processed with the fast Fourier transform (FFT), we obtained the power spectrum of the EEG data.

From the obtained power spectrum, we calculated the respective proportions of the alpha rhythm and the beta rhythm to all brain waves, as well as beta/alpha, the ratio of the beta-rhythm proportion to the alpha-rhythm proportion. Following the standard EEG frequency classification, we set the frequency ranges of the alpha rhythm and the beta rhythm to 8~13 Hz and 13~30 Hz respectively, and the range of all brain waves to 3~30 Hz. Since the proportions of alpha and beta rhythms to all brain waves are widely used for observing various activities in the brain, we decided to use them as indexes for measuring the physiological state of subjects after the tasks. However, because the proportions and intensity of alpha and beta rhythms vary from individual to individual, comparing brain waves by absolute values would be inappropriate. In this paper, we therefore normalized each subject's EEG data by the average value of that subject's power spectrum before comparison.
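The processing chain just described (artifact filtering, Hamming windowing, FFT, band proportions, and the two windowing schemes) can be summarized in a short sketch. This is a minimal illustration, not the authors' implementation: Butterworth filter designs stand in for the paper's 6 dB/oct filters, and all function names are our own.

```python
# Minimal sketch of the EEG power-spectral pipeline described above.
import numpy as np
from scipy import signal

FS = 200  # sampling frequency [Hz], as in the experiment

def filter_eeg(eeg, fs=FS):
    """Apply HPF (3 Hz), LPF (60 Hz), and a 47.5-72.5 Hz band-elimination filter."""
    b, a = signal.butter(1, 3.0, btype="highpass", fs=fs)
    eeg = signal.filtfilt(b, a, eeg)
    b, a = signal.butter(1, 60.0, btype="lowpass", fs=fs)
    eeg = signal.filtfilt(b, a, eeg)
    b, a = signal.butter(2, [47.5, 72.5], btype="bandstop", fs=fs)
    return signal.filtfilt(b, a, eeg)

def band_proportions(window, fs=FS):
    """Return (alpha, beta, beta/alpha) proportions for one analysis window."""
    spec = np.abs(np.fft.rfft(window * np.hamming(len(window)))) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    total = spec[(freqs >= 3) & (freqs <= 30)].sum()   # "all brain waves": 3-30 Hz
    alpha = spec[(freqs >= 8) & (freqs < 13)].sum() / total
    beta = spec[(freqs >= 13) & (freqs <= 30)].sum() / total
    return alpha, beta, beta / alpha

# Method 1: non-overlapping 5.12 s segments (19 segments, start times 0..92.16 s).
def method1_windows(eeg, fs=FS, step_s=5.12, n=19):
    step = int(step_s * fs)
    return [eeg[i * step:(i + 1) * step] for i in range(n)]

# Method 2: windows growing by 5.12 s from a fixed start (5.12 s up to 97.28 s).
def method2_windows(eeg, fs=FS, step_s=5.12, n=19):
    step = int(step_s * fs)
    return [eeg[:(i + 1) * step] for i in range(n)]
```

The per-subject normalization described above would then divide each proportion by the average over that subject's own windows before any between-version comparison.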
5 Results

5.1 Results of the Power Spectral Analysis at Even Intervals

Figures 4, 5, and 6 respectively show the mean and the standard deviation of the alpha rhythm, the beta rhythm, and beta/alpha in the power spectral analysis at even intervals. In each graph, the left y-axis, the right y-axis, and the x-axis represent the mean, the standard deviation, and time, respectively. Figure 4 indicates that the mean of the alpha rhythms for Excel 2003 was larger than that for Excel 2007 after 56.32 seconds, and that the difference in alpha rhythms between Excel 2003 and 2007 was largest at 81.92 seconds. The standard deviation was comparatively small throughout and lowest at 56.32 seconds. Figure 5 shows that the mean of the beta rhythms for Excel 2007 was larger than that for Excel 2003 after 46.08 seconds, and the difference in beta rhythms between Excel 2003 and 2007
was greatest at 87.04 seconds. The standard deviation was higher on the whole than that of the alpha rhythms; the lowest standard deviation was observed at 40.96 seconds. Figure 6 shows that the mean of beta/alpha for Excel 2007 was larger than that for Excel 2003 after 46.08 seconds, and that the difference in beta/alpha between Excel 2003 and 2007 was largest at 81.92 seconds. The standard deviation was larger than those of the alpha and beta rhythms and smallest at 40.96 seconds.

5.2 Results of the Power Spectral Analysis Using Different Lengths of Time Window
Figures 7, 8, and 9 respectively show the mean and the standard deviation of the alpha rhythm, the beta rhythm, and beta/alpha in the power spectral analysis using different lengths of time window. In each graph, the left y-axis, the right y-axis, and the x-axis represent the mean, the standard deviation, and the time-window length, respectively.
Figure 7 shows that the mean of the alpha rhythms for Excel 2003 was larger than that for Excel 2007 when time windows over 40.96 seconds were used. As the time window became longer, the alpha rhythms for Excel 2003 tended to increase and the standard deviation decreased. Figure 8 shows that the mean of the beta rhythms for Excel 2007 was larger than that for Excel 2003 when time windows over 56.32 seconds were used. As the time window became longer, the standard deviation tended to decrease, as with the alpha rhythms. Figure 9 indicates that the mean of beta/alpha for Excel 2007 was larger than that for Excel 2003 when time windows over 40.96 seconds were used. As the time window became longer, the standard deviation tended to decrease, as with the alpha and beta rhythms.

Table 3. Results of the Questionnaire

                     Usage frequency  Understand  Productivity  Simple to use  Interface  Easy to use  Satisfaction
Excel 2003 Average        3.3            5.0         5.4            4.8           4.9         5.3          5.0
Excel 2003 SD             0.82           1.33        1.26           1.32          1.45        1.34         1.15
Excel 2007 Average        1.8            3.5         3.5            4.0           3.1         3.4          3.3
Excel 2007 SD             1.14           2.17        1.72           1.63          2.13        1.90         1.95
p < 0.05                  yes            no          no             yes           yes         yes          yes
5.3 Results of Questionnaire

Table 3 shows the mean, the standard deviation, and the result of a two-sample t-test for each questionnaire item. In the table, there were significant differences in "Productivity," "Interface," "Easy to use," and "Satisfaction" between Excel 2003 and Excel 2007. Our subjects gave Excel 2007 lower scores than Excel 2003.
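The per-item comparison behind Table 3 is a standard two-sample t-test. A minimal sketch with made-up scores (the variable names and values are hypothetical, not the study's data):

```python
# Hypothetical sketch of Table 3's per-item two-sample t-test.
from scipy import stats

# Seven-point satisfaction scores for one questionnaire item (made-up values).
excel2003 = [5, 6, 5, 4, 6, 5, 5, 4, 6, 5]
excel2007 = [3, 4, 2, 3, 4, 3, 5, 2, 4, 3]

t, p = stats.ttest_ind(excel2003, excel2007)
print(f"t = {t:.2f}, p = {p:.3f}, significant at 0.05: {p < 0.05}")
```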
6 Discussion

From the results of the power spectral analysis at even intervals, we could confirm that all three indexes tended to be relatively stable after 56.32 seconds. Specifically, the alpha rhythms for Excel 2003 were larger than those for Excel 2007, and the beta rhythms and beta/alpha for Excel 2007 were larger than those for Excel 2003. The standard deviations of all three indexes were also lowest, or relatively low, from 40.96 seconds to 56.32 seconds. These results imply that the EEG data from 56.32 seconds to 61.44 seconds is stable across subjects, is less influenced by artifacts, and appropriately reflects the influence of the difference between Excel 2003 and 2007. However, for all indexes, the difference in influence between the two versions of Excel and the standard deviations fluctuated constantly. We consider that this might be due to individual differences in subjects' brain waves, fatigue from the long experiment duration, myogenic potentials caused by postural changes, and so forth. Therefore, to conduct an accurate usability evaluation, it is necessary to use a long time window rather than a short one.
From the results of the power spectral analysis using different lengths of time window, we observed that the alpha rhythms for Excel 2003 were stably higher when time windows over 40.96 seconds were used, the beta rhythms for Excel 2007 were stably higher when time windows over 56.32 seconds were used, and beta/alpha for Excel 2007 was stably higher when time windows over 40.96 seconds were used. For all indexes, the standard deviations tended to become smaller as the time window became longer. This might be because the proportion of the EEG influenced by artifacts becomes smaller as the time window becomes longer. These results suggest that we can analyze EEG data with little influence from artifacts by using a time window of over 56.32 seconds.

The results of our analysis showed that the alpha rhythms for Excel 2003 were larger than those for Excel 2007, and that the beta rhythms and beta/alpha for Excel 2007 were larger than those for Excel 2003. Previous studies on EEG measurement have clarified that the amount of alpha rhythm decreases, and the amount of beta rhythm and the value of beta/alpha increase, when a subject's mental workload is high. The results of the questionnaire showed that our subjects preferred Excel 2003 to Excel 2007. The results of our questionnaire and analysis thus agree with this previous work on EEG, and we can conclude that EEG measurement would be useful for evaluating software usability.
7 Conclusion and Future Work

In this paper, we have conducted an experiment to gain a clear understanding of the appropriate timing and length of the time window for analyzing EEG data for accurate usability evaluation. From the results of the experiment, we obtained the following insights.
• A short time window (e.g., 5.12 seconds) is not suitable for usability evaluation because the frequency components of brain waves fluctuate constantly.
• The accuracy of usability evaluation can be improved by using a time window of 56.32 seconds or longer.
In this experiment, we could observe only EEG influenced by the tasks, not the normal (resting) state of the EEG. In the near future, we need to conduct another experiment to observe the latter.
References
1. Ericsson, K.A., Simon, H.A.: Protocol Analysis: Verbal Reports as Data. MIT Press, Cambridge (1993)
2. Osgood, C.E., Suci, G.J., Tannenbaum, P.H.: The Measurement of Meaning. University of Illinois Press, Urbana (1957)
3. Chin, J.P., Norman, K.L., Shneiderman, B.: Subjective User Evaluation of CF PASCAL Programming Tools. Technical Report CAR-TR-304 (1987)
4. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload, pp. 139–183. Elsevier, Amsterdam (1988)
5. Matsunaga, H., Nakazawa, H.: A Study on Human-Oriented Manufacturing System (HOMS) – Development of Satisfaction Measurement System (SMS) and Evaluation of Element Technologies of HOMS using SMS. In: Int. Conf. Manufacturing Milestones toward 21st Century, pp. 217–222 (1997)
Automated Analysis of Eye-Tracking Data for the Evaluation of Driver Information Systems According to ISO/TS 15007-2:2001

Christian Lange, Martin Wohlfarter, and Heiner Bubb

Lehrstuhl für Ergonomie, Technische Universität München, Boltzmannstrasse 15, 85747 Garching
{lange,m.wohlfarter,bubb}@lfe.mw.tum.de
Abstract. First, the most important content of the ISO/TS 15007-2:2001 standard for performing eye-tracking experiments is described. The text then gives a detailed description of how gaze experiments using the Dikablis eye-tracking system are conducted according to the above-mentioned standard, and of how statistical evaluations of the recorded data can be automated and visualized.

Keywords: ISO/TS 15007-2:2001, Eye tracking, Driver Assistance Systems, Driver Information Systems.
1 Introduction

With the guidelines of the European Statement of Principles (ESoP), the need for good and less distracting design of driver information systems will grow enormously. Future driver information systems will have to hinder or distract the driver from the driving task as little as possible. The advantage of standardized experimental conduct lies in improved comparability between experiments and in faster, less error-prone processing; the duration of the analysis of the collected data is thereby enormously reduced. In the following, we show that the standardized conduct of gaze experiments using the Dikablis Toolkit complies with the ISO/TS 15007-2:2001 standard.
Fig. 1. Workflow schema for standardized testing and experimentation with Dikablis according to ISO/TS 15007-2:2001
With the Recording Software, the gaze behavior of the test subject can be recorded precisely. The test subject wears the Dikablis Head Unit, on which two cameras are installed. One camera is directed towards the test subject's eye and is used to track the subject's gaze behavior (pupil movements). The other camera is directed straight ahead in front of the subject and monitors the subject's environment. By processing both of these video streams, the gaze direction of the test subject can be determined almost precisely. Offline re-calibration and post-processing of the eye detection can be performed with the Dikablis Analysis Software on the recorded data, even after the recording software has finished recording. These possibilities to adjust and post-process help guarantee clean and analyzable test results under almost any circumstances.

The D-LAB module contains a package for conducting experiments, from test planning and the definition of Areas of Interest (AOIs) visible within the gaze region, to the automatic calculation of glance durations and the graphical presentation of these results. The procedure for standardized testing and experimentation is described subsequently. Herein, the capital and lowercase letters always refer to the procedural (workflow) plan shown in Figure 1 and to the relationships between the small angled boxes located within the figure.
2.1 Construction of an Experimental Plan

The test partitioning required by ISO/TS 15007-2:2001 into "experimental condition," "task," and "subtask" can be defined in one test plan with D-LAB (see Figure 2, left). This way, a test can be represented in the form of intertwined and nested intervals in which:
• an "experimental condition" spans an entire experiment (e.g., driving on country roads);
• a "task" defines the interaction with a particular system within the experiment (e.g., operation of a navigation system);
• a "subtask" is a specification of a "task" (e.g., operation of a navigation system via a touch-screen display).
D-LAB offers the possibility to additionally define "subsubtasks" as a fourth layer, e.g., to mark the appearance of critical events within the experiment (e.g., sharp braking situations) or for automatic analysis of a display screen (e.g., an individual input screen within the navigation system, such as inputting a destination address).
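Such a nested interval structure (experimental condition, then task, subtask, and subsubtask) can be pictured as a simple tree. The sketch below is purely illustrative and is not D-LAB's actual data model:

```python
# Illustrative sketch (not D-LAB's actual data model) of the nested
# trial plan: experimental condition -> task -> subtask -> subsubtask.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Interval:
    name: str
    level: str                      # "condition", "task", "subtask", "subsubtask"
    children: List["Interval"] = field(default_factory=list)

plan = Interval("driving on country roads", "condition", [
    Interval("operate navigation system", "task", [
        Interval("input via touch screen", "subtask", [
            Interval("enter destination address", "subsubtask"),
        ]),
    ]),
])
```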
Fig. 2. Left: constructing a test sequence; Right: Automatically created block shifting diagram/interface for online test segment sequence marking
2.2 Record Gaze Behavior and Mark the Beginning and End Points of Each Trial Interval

Using one of the integrated D-LAB applications, one can automatically create a shifting-block interface from the trial definitions, shown on the right side of Figure 2. Each shifting block represents a test segment. By pressing a block, a network event is created, which marks the start or end of a task interval directly in the Recording Software. The functionality of the Dikablis recording software is described
in detail in Lange et al., 2006a and Lange et al., 2006b. These events mark the beginning and the end, respectively, of a trial segment and are saved synchronously with the calculated gaze data. The events can also be triggered from another data recorder, such as a driving simulator.

2.3 Validating Gaze Data and Trial Intervals

The Dikablis analysis software supports validating and optimizing the gaze data after a trial. In preparation, the gaze data is re-calibrated after the trial if necessary, so as to optimize offline pupil recognition (see Lange et al., 2006a and Lange et al., 2006b). After this optional adjustment, further processing follows in D-LAB. The first step consists of checking whether all trial intervals were marked correctly. The marked trial intervals are shown under the gaze player window, synchronized to the playtime line on the user interface (see Figure 3). D-LAB also offers a testing function which automatically identifies inconsistencies within the trial segments, as sketched below. Segments can be manually adjusted or changed, and tasks can be added or deleted. Figure 3 shows the D-LAB interface for the management of a trial interval.
Fig. 3. Validation and post-processing of a trial segment in D-LAB
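The consistency check mentioned above essentially verifies that every start event has a matching end and that intervals on the same level do not overlap. A rough sketch of such a check (our own formulation, not D-LAB code):

```python
# Rough sketch (not D-LAB code) of checking marked trial intervals:
# every interval needs a start and a matching end, and sibling
# intervals must not overlap.
def check_intervals(intervals):
    """intervals: list of (name, start_s, end_s); end_s may be None."""
    problems = []
    for name, start, end in intervals:
        if end is None:
            problems.append(f"{name}: start event without matching end")
        elif end <= start:
            problems.append(f"{name}: end ({end}) not after start ({start})")
    closed = sorted((i for i in intervals if i[2] is not None),
                    key=lambda i: i[1])
    for (n1, _, e1), (n2, s2, _) in zip(closed, closed[1:]):
        if s2 < e1:
            problems.append(f"{n1} overlaps {n2}")
    return problems

# Example: the second interval was never closed; the third overlaps the first.
print(check_intervals([("task A", 0.0, 12.5),
                       ("task B", 12.5, None),
                       ("task C", 10.0, 20.0)]))
```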
2.4 Definition of Areas of Interest

The areas of interest used in the calculation of glance durations in relevant gaze regions (e.g., the navigation system display, the street view, the rear-view mirror, the left side mirror, the dashboard, etc.), as required by ISO/TS 15007-2:2001, are defined using D-LAB's functionality for marking specific AOIs. In D-LAB, an arbitrary number of AOIs can be defined in the form of polylines and labeled with names. Figure 4 shows the Head-Up Display and the Dashboard as defined AOIs.
Fig. 4. Defining selected AOIs within the gaze region such as the Head-Up Display and the Dashboard
2.5 Automatic Calculation of Glance Metrics for the Defined AOIs

In order to automatically calculate glance duration and glance frequency, the pupil of the test subject as well as his/her head position in relation to the test environment (relative to the defined AOIs) must be recognized. The determination of the head position is carried out with the help of so-called markers (see the square black-and-white object in Figure 4), which provide the environment reference. These markers are found using image processing in the image of the field camera and are used to calculate the head position of the test subject. For every defined AOI, the D-LAB function "Calculate Gaze Durations" computes when the test subject glanced at the AOI and when their glance left it. The result of this calculation is displayed analogously to the graphical representation of the trial intervals, under the gaze film player window and synchronous with the playtime line. Hence, the operator can check the calculation at any time and manually correct it if necessary.

The gaze-specific values from the ISO/TS 15007-2 standard can be calculated from the automatically determined glance durations on the defined AOIs. The operator can input the values to be calculated for a gaze experiment in the form of an "Analysis Series"; for this, the specific values pertaining to specific tasks and AOIs must be defined. The following values are available to choose from (a sketch of how such metrics can be derived follows the list):
• Total Glance Time
• Glance Frequency
• Time off road-scene-ahead
• Total glance time as a percentage
• Fixation probabilities
• Link value probabilities
• Maximum Glance Duration
• Mean Glance Duration
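As a rough illustration, several of these metrics can be derived directly from the per-AOI glance intervals. The data layout and function names below are our own assumptions; this is not D-LAB's implementation.

```python
# Sketch (assumed data layout, not D-LAB's implementation) of deriving
# glance metrics from a list of glance intervals on one AOI.
def glance_metrics(glances, task_duration_s):
    """glances: list of (start_s, end_s) glance intervals on one AOI
    within a single task interval."""
    durations = [end - start for start, end in glances]
    total = sum(durations)
    return {
        "total_glance_time_s": total,
        "glance_frequency": len(durations),
        "total_glance_time_pct": 100.0 * total / task_duration_s,
        "max_glance_duration_s": max(durations, default=0.0),
        "mean_glance_duration_s": total / len(durations) if durations else 0.0,
    }

# Example: three glances at the navigation display during a 60 s task.
print(glance_metrics([(2.0, 3.1), (10.4, 11.0), (25.2, 26.9)], 60.0))
```

Time off road-scene-ahead, fixation probabilities, and link value probabilities would additionally require the glance intervals of the other AOIs and the transition sequence between them.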
D-LAB then calculates the requested values and saves the results in the form of a text file. These automated calculations can be conducted for all defined AOIs, all terms of the experiment plan, and all gaze metrics. The text file is structured so that it can be opened directly for further statistical analysis in the SPSS statistics program.

2.6 Graphical Representation of Calculated Metrics

In addition to the calculation of values, D-LAB offers graphical representation of trial results. For this purpose, several graphical diagrams are displayed which support the interpretation of the results. Figure 5 shows two examples. At the top, the progressive course of glance duration on the defined AOIs for a single subtask is shown for four independent test subjects. At the bottom, the course of the users' mean glance fixation duration in a critical situation is shown, in order to allow conclusions about mental strain.
Fig. 5. Top: Glance duration on the defined AOIs on all trial experiments. Bottom: User glance fixation duration in critical situations.
References
1. ISO/TS 15007-2:2001: Road vehicles - Measurement of driver visual behavior with respect to transport information and control systems - Part 2: Equipment and procedures
2. Lange, C., Yoo, J.-W., Wohlfarter, M., Bubb, H.: Dikablis - Operation mode and evaluation of the human-machine interaction. In: Spring Conference of the Ergonomics Society of Korea, Seoul, May 12 (2006)
3. Lange, C., Wohlfarter, M., Bubb, H.: Dikablis - engineering and application area. In: Proceedings of IEA 2006, 16th World Congress on Ergonomics, Maastricht, The Netherlands (2006)
Brain Response to Good and Bad Design

Haeinn Lee (1), Jungtae Lee (2), and Ssanghee Seo (2)

(1) 107A Kiehle Visual Arts Center, 720 Fourth Avenue South, St. Cloud, MN 56301-4498, USA
(2) 6409-1 Dept. of Computer Science & Engineering, Pusan National University, Jangjeon-dong, Geumjeong-gu, Busan 609-735, Republic of Korea
Abstract. This paper examines how the decision of whether a design is good or bad results from human brain processes. Our research team used functional MRI and electroencephalogram (EEG) techniques to address the question of how the brain responds while subjects view different designs. Classifying the designs as good or bad, subjects pressed a mouse button to indicate their perception, and we analyzed their patterns of EEG rhythms and fMRI. The fMRI results showed that the perception of different feelings about designs is associated with the frontal lobe and the occipital lobe. After analyzing the EEG with the event-related brain potential (ERP) method, we also found that the amplitude of ERP components in the perception of bad design is greater, and the latency shorter, than for good design. The human brain therefore responds sooner and more strongly to the perception of a bad feeling.

Keywords: Human Behaviors, EEG, fMRI, ERP, Interaction and Interface Design, Usability Test, Brainwork, Visual Brain.
Brain research has explored diverse aspects of many different major areas such as medical science, engineering, psychology, biotechnology, linguistics, economics, music, etc. This research has provided new insight into underlying disease mechanisms and is beginning to suggest new treatments. Using the Neuro-Linguistic Programming (NLP) method, researchers also seek to understand human behavior and treat the human mind to lead a better life [2]. The brain controls body activities, feels the five senses, and shapes thoughts, hopes, dreams, and imagination. In short, the brain is what makes people human [1].

Recently, there has been much brain research relating to linguistics and musical activities. For example, a person hears two different sentences, "He takes coffee with cream" and "He takes coffee with dog," while his or her brain activity is examined. There are clear distinctions between the two sentences: the human brain reacts strongly when people hear the latter, abnormal sentence. Likewise, there are differences in brain activity when people hear natural and unnatural melodies in music [3,4,5].

This can be linked to the areas of art and design, so that research can address how the human brain acts when people engage in artistic activities, and how people feel when they look at artistic works or designs. If people look at the smile of the Mona Lisa, they usually get a good feeling. On the other hand, if people look at a photo showing a terrible car accident, they get a bad feeling. Accordingly, if we can clarify and measure how the brain works when people get a good or bad feeling, we can arrive at an objective evaluation of which aspects of a design give people a good or bad feeling, and determine the main aspects of a design with good value.

We have used functional MRI and EEG to address the question of how the brain responds while subjects view different designs. Twenty-five subjects participated in this research and viewed fifty design images in random order. Classifying the designs as good or bad, subjects pressed a mouse button to indicate their perception. We analyzed their patterns of EEG rhythms and fMRI to find the characteristics of brain activity for good and bad designs.
2 Brain Structure and Movement

2.1 Brain Structure

As I mentioned before, the brain is divided into four sections: the occipital lobe, the temporal lobe, the parietal lobe, and the frontal lobe, and functions such as vision, hearing, and speech are distributed across these regions. The occipital lobe is located at the back of the brain and plays a role in processing visual information (Fig. 1). The parietal lobe plays a role in sensory processes, particularly spatial sense and navigation, attention, and language. The frontal lobe has a role in controlling movement and in planning and coordinating behavior. The temporal lobe is involved in auditory processing and is home to the primary auditory cortex [1].
Fig. 1. Brain Classifications [1]
2.2 Brain Movement and Technology

Functional magnetic resonance imaging (fMRI). People use MRI, which provides high-quality, three-dimensional images of organs and structures inside the body, to examine body structure. To examine brain activity, however, one needs fMRI, which is the most popular neuroimaging technique today. This technique compares brain activity under resting and active conditions, with high spatial resolution, on a signal that is a correlate of neuronal activity. It allows detailed maps of the brain areas underlying human mental activities in health and disease. Our team used fMRI to find out which parts of the brain are more activated when subjects view good or bad designs [1].

Electroencephalogram (EEG). Many of the recent advances in understanding the brain are due to the development of techniques that allow scientists to directly monitor neurons. The electroencephalogram is the recording of electrical activity along the scalp produced by the firing of neurons within the brain. In this method, electrodes placed at specific locations on the scalp, which vary depending on which sensory system is being tested, make recordings that are then processed by a computer [1].

EEG has several strengths as a tool for exploring brain activity; for example, its time resolution is very high (on the level of a single millisecond), and it measures the brain's electrical activity directly, whereas other methods of observing brain activity, such as PET and fMRI, have a time resolution between seconds and minutes. EEG can also be used simultaneously with fMRI, so that high-temporal-resolution data can be recorded at the same time as high-spatial-resolution data [6].

Our team used EEG to address the question of how the brain responds while subjects viewed images. The most important consideration in using EEG is where to place the electrodes to measure the brain waves. Electrode locations and names are specified by the 10-20 system for most research applications. This system is an internationally recognized method of describing and applying the location of electrodes in the context of an EEG test [6] (Fig. 2).
Fig. 2. Electrodes location to check the EEG (10-20 System) [8]
Event-related potential (ERP). During the experiment, we used event-related brain potentials (ERPs), a method of examining the electroencephalogram at the moment a subject receives an event, such as viewing a design. An ERP is any measured brain response that is directly the result of a thought or perception. It can be reliably measured using the EEG, a procedure that measures the electrical activity of the brain through the skull and scalp [7].

There are two important components in the ERP waveform: the P300 and the N400. The N400 component is described as a negative voltage deflection occurring approximately 400 ms after stimulus onset, whereas the P300 component is described as a positive voltage deflection approximately 300 ms after stimulus onset. The presence, magnitude, topography, and timing of these signals are often used as metrics of cognitive function in decision-making processes. While the neural substrates of these ERPs remain hazy, the reproducibility of the signals makes them a common choice for related research [6].
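As an illustration of how P300 amplitude and latency can be read off an averaged ERP waveform, consider the following sketch. The 250-500 ms search window and the data layout are our assumptions, not taken from the paper.

```python
# Sketch (assumed search window and data layout, not the paper's code)
# of reading P300 amplitude and latency off an averaged ERP waveform.
import numpy as np

def p300_peak(erp_uv, fs, window=(0.25, 0.50)):
    """erp_uv: averaged waveform in microvolts, time-locked to stimulus onset;
    fs: sampling rate in Hz. Returns (amplitude_uv, latency_s) of the most
    positive deflection inside the assumed 250-500 ms search window."""
    t = np.arange(len(erp_uv)) / fs
    mask = (t >= window[0]) & (t <= window[1])
    i = np.argmax(erp_uv[mask])
    return erp_uv[mask][i], t[mask][i]
```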
3 Brain Response to Good and Bad Design

Our team used functional MRI and EEG to address the question of how the brain responds while subjects viewed different designs. Twenty-five subjects participated in this research and viewed fifty design images in random order. First, as I am a professional graphic designer, I subjectively chose twenty-five designs that evoked good feelings and twenty-five that evoked bad feelings (Fig. 3). Because the author's judgment of good or bad design was purely subjective, the individual subjects were asked to categorize each design as good or bad themselves. As they viewed each image, they clicked the right mouse button to classify the image as good, or the left mouse button to classify it as bad. As they did this, we analyzed their patterns of EEG rhythms and fMRI.
Fig. 3. Design image examples for experiment
Fig. 4. The fMRI response of good design
3.1 The Results of the fMRI Based on Good and Bad Design

Our team used an fMRI machine (ISOL FORTE) in the Brain Science Research Center at the Korea Advanced Institute of Science and Technology (KAIST). The scanning parameters were as follows: TR=3000 ms, TE=35 ms, number of slices=25, FOV=24 cm, image matrix=64x64, slice thickness=5 mm. The scenario of the experiment was as follows:
1. Show "+" on a screen for 3 seconds.
2. Show a good or bad design, chosen randomly, for 3 seconds.
3. Show "+" on a screen for 3 seconds again.
We repeated this process to show all fifty designs. At the same time, subjects pressed a mouse button to indicate their perception of good or bad design.

Figure 4 presents the parts of the brain that showed activity when one subject saw a good design. The top-left brain image shows the midsagittal section, the top-right image shows the coronal section, and the bottom image shows a horizontal section. The left part of the midsagittal section is the location of the occipital lobe, and the right part is the location of the frontal lobe. The top part of the horizontal section corresponds to the right part of the midsagittal section, and the bottom part to the left part.
Fig. 5. The fMRI response of bad design
The table at the bottom of Figure 4 shows the coordinates at which the brain was activated. Using these coordinates, we can identify the Brodmann areas among the activated brain regions. The most activated areas were Brodmann areas 19, 7, 11, and 6, which lie in the occipital lobe, the parietal lobe, the prefrontal cortex, and the frontal lobe. The decisions to judge images as either good or bad are made in the occipital lobe and the frontal lobe. Brodmann area 7, which is involved in visual-motor coordination, is located in the parietal lobe, but it is close to the occipital lobe.

Figure 5 presents the parts of the brain that showed activity when one subject saw a bad design. When subjects judged a design as bad, the brain was activated in Brodmann areas 18, 19, 7, 11, 6, and 20, which lie in the occipital lobe, parietal lobe, prefrontal cortex, frontal lobe, and temporal lobe. Brodmann area 20, which is involved in high-level visual processing and recognition, is located in the temporal lobe. As with good designs, the occipital lobe, frontal lobe, and parietal lobe were activated while subjects looked at a bad design. We noticed differences in the specific activated areas; for example, the brain activity when subjects had a bad feeling was much stronger than when they had a good feeling. Also, there was little difference in activity between the left and right brain when the subject perceived the design as good, but left-brain activity was strong when subjects perceived the design as bad.

3.2 The Results of the EEG Based on Good and Bad Design

Based on the fMRI results, our researchers decided to attach electrodes in the areas of the occipital and frontal lobes, i.e., Fp1, Fp2, O1, and O2 in the 10-20 system, and to examine the EEG in these areas (see Fig. 2). Twenty-five subjects participated in this research, and the scenario of the experiment was as follows:
1. Show "+" on a computer screen for 1 minute to bring the brain to a steady state.
2. Show a good or bad design, chosen randomly, for 3 seconds.
3. Subjects press a mouse button to indicate their perception of good or bad design.
4. Show "+" on a computer screen for 1 minute to return the brain to a steady state.
We repeated this process for all fifty designs. During the experiment, we used event-related brain potentials (ERPs) to check the EEG at the moments when subjects received the different feelings.

Figure 6 shows the amplitude at the occipital lobe when the subjects perceived a design as good or bad. The latency of the P300 component at channels O1 and O2 was shorter when subjects had a bad feeling than when they had a good feeling. The amplitude for the perception of a bad feeling was also higher than that for the perception of a good feeling. Figure 7 shows the difference in latency based on the subjects' perception of good or bad design. In this figure, the x-axis represents the latency (B) for a bad design subtracted from the latency (G) for a good design, and the y-axis is the subject number. Most points lie in the positive area, which means that the latency (G) for a good design is longer than the latency (B) for a bad design. Therefore, the human brain responds sooner and more strongly when subjects perceive a design as bad. We also examined the latency at the frontal lobe when the subjects perceived a design as good or bad, and the result was similar to that for the occipital lobe: as mentioned before, the human brain responds sooner to the perception of a bad feeling.
However, these results are mean values across subjects; there were individuals who responded sooner to the perception of a good feeling.
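The per-subject comparison in Figure 7 then reduces to a simple latency difference, latency(G) minus latency(B). A sketch with hypothetical values (not the study's data):

```python
# Hypothetical sketch of the Fig. 7 analysis: per-subject P300 latency
# difference, latency(good) - latency(bad). Positive values mean the
# brain responded sooner to designs perceived as bad. Values are made up.
latency_s = {
    "S01": {"good": 0.41, "bad": 0.36},
    "S02": {"good": 0.38, "bad": 0.35},
    "S03": {"good": 0.33, "bad": 0.37},  # an exception: faster for good
}
for subject, lat in latency_s.items():
    print(subject, f"{lat['good'] - lat['bad']:+.3f} s")
```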
Fig. 6. Comparison of the occipital lobe characteristic
Fig. 7. Comparison of the occipital lobe latency
4 Conclusion

With advances in brain measurement and imaging technologies such as EEG, fMRI, PET, and SPECT, brain research has become a contemporary issue. Based on this, many different major areas such as medical science, engineering, psychology, linguistics,
biotechnology, economics, and music have explored diverse aspects of brain research. Hence, there should be many possibilities for brain research related to art and design.

When one decides that something is good or bad, this decision is a result of human brain processes. It is obvious that the brain works differently depending on whether the feeling is good or bad; in other words, the brain will respond in different ways when people look at good design or bad design. Although judgments of design value differ between individuals, in the case of a masterpiece most people acknowledge its artistic value. Accordingly, if we can clarify how the brain works when people look at good and bad design, we can arrive at an objective evaluation of how people judge design value, and determine the main aspects of a design with good value.

We have used functional MRI and EEG to address the question of how the brain responds while subjects viewed different designs. The fMRI results show that the perception of different feelings about designs is associated with the frontal lobe and the occipital lobe. The occipital lobe is located at the back of the brain and contains the visual cortex, so it processes visual information. The frontal lobe is located at the front of the brain and takes care of complicated functions such as thinking, planning, and deciding. Based on the fMRI results, our researchers decided to attach electrodes in the areas of the occipital and frontal lobes and examine the EEG there. During the experiment, we used the ERP method to check the EEG at the moments when subjects received events such as viewing designs. After analyzing the EEG by the ERP method, we found that the amplitude of ERP components in the perception of bad design was higher, and the latency shorter, than for good design.

From all examinations, we found significant distinctions between individuals due to brain characteristics: some people show strong left-brain activity and others do not. But in general, the human brain responds sooner and more strongly to the perception of a bad feeling. Thus, we assume that a slower response with a longer latency can serve as an objective indicator of good design value. Considering this, we can also determine what aspects create good design value and apply them as a solution to create better designs. In the future, this research can be applied not only to pure design but also to human interface design. One of the most important issues in human interface design is usability testing, which aims to grasp a person's individual feeling and measure emotional satisfaction. The idea of this paper can serve as a method of usability testing that classifies interface designs as good or bad and links this with ease of use.

Acknowledgments. I want to give a special thank you to my research team and family: my father, mother, and brother, for their constant and unconditional love, encouragement, and support. I am deeply indebted to them and dedicate this study to them.
References
1. Society for Neuroscience: Brain Facts, A Primer on the Brain and Nervous System (2005), http://www.sfn.org
2. Bear, M.F., Connors, B.W., Paradiso, M.A.: Neuroscience: Exploring the Brain, 3rd edn. Lippincott Williams & Wilkins (2006)
3. Lee, J.Y.: Neurophysiology and Brain-imaging Study of Music - music & language, music & emotion. Nangman Music Magazine 18(3) (2006)
4. West, W.C., Rourke, T., Holcomb, P.J.: Event-Related Brain Potentials and Language Comprehension: A Cognitive Neuroscience Approach to the Study of Intellectual Functioning. Tufts University (1998)
5. Lu, H., Wang, M., Yu, H.: EEG Model and Location in Brain when Enjoying Music. In: Proceedings of the 2005 IEEE Engineering in Medicine and Biology Conference, Shanghai, China, pp. 2695–2698 (2005)
6. Wikipedia: http://en.wikipedia.org/wiki/EEG
7. Coles, M.G.H., Rugg, M.D.: Event-related brain potentials: an introduction. In: Electrophysiology of Mind, pp. 1–27. Oxford Scholarship Online Monographs (1996)
8. Kim, D.S.: Electroencephalogram. Korea Medical (2001)
An Analysis of Eye Movements during Browsing Multiple Search Results Pages

Yuko Matsuda, Hidetake Uwano, Masao Ohira, and Ken-ichi Matsumoto

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5, Takayama, Ikoma, Nara, Japan
{yuko-m,hideta-u,masao,matumoto}@is.naist.jp
Abstract. In general, most search engines display a certain number of search results on a search results page at one time, separating the entire search results into multiple search results pages. Therefore, lower-ranked results (e.g., the 11th-ranked result) may be displayed in the top area of the next (second) page and might be more likely to be browsed by users than results displayed at the bottom of the previous (first) results page. To better understand users' activities in web search, it is necessary to analyze the effect of the display positions of search results while browsing multiple search results pages. In this paper, we present the results of our analysis of users' eye movements. We conducted an experiment to measure eye movements during web search and analyzed how long users spend viewing each search result. From the analysis results, we found that search results displayed at the top of the latter page were viewed for a longer time than those displayed at the bottom of the former page.

Keywords: Eye tracking, Web search, User activity, Search results page.
When a large number of search results exist, most search engines separate the entire results into multiple search results pages and display one results page at a time (e.g., 10 search results per page; the number of results displayed on one page depends on user preference). In this case, lower-ranked results (e.g., the 11th result) may be displayed in the top area of the next page and might be more likely to be browsed by users than results displayed at the bottom of the previous (first) results page. Therefore, it is important to analyze and understand user activities in searching multiple search results pages in order to provide users with a means or a new interface that enables them to more naturally browse search results in the ranked order recommended by a search engine. In this paper, we conduct an experiment to observe the effects of search result positions on user activities in web search. We measured eye movements during web search tasks and analyzed them to understand how long users spend browsing each result.
2 Related Work

There have been many studies on user activities in web search. A popular approach to analyzing user activities in web search is to use browsing histories or access logs [6][7][8]. Although using history data helps us understand a user's access paths or interests in a specific web page, it cannot be used for analyzing a user's attention to search results during web search or the influence of the displayed positions of search results on web search activities.

Another approach is to use eye-tracking instruments to capture user activities based on eye movements. Cutrell et al. [1] used eye tracking to analyze the influence of the length of the site summary (snippet text) presented in web search results on a user's search activities. The results of their experiments indicated that a long site summary decreases search time and increases a user's search correctness in informational search tasks, where users are trying to find web pages that include some kind of information. In contrast, in navigational search tasks, where users are trying to find a specific web site or homepage, a long summary increases search time and decreases the user's search correctness. Guan et al. [2] also measured eye movements and analyzed the influence of the positions of the target results that users are looking for. They found that users took longer to find the target results and were less successful in finding them when the targets were placed in low positions on a search results page. Although that study provides useful insights into the design of a new interface for web search engines, it only focused on the first search results page, so it still fails to capture user activities in browsing multiple search results pages. Lorigo et al. [9] analyzed differences between task types in web search. They reported that users performing informational search tasks took longer to complete those tasks than navigational ones and spent more time on the pages linked from search results. However, that study also did not focus on the time spent on each search result or on user interactions with multiple search results pages.

In this paper, using multiple search results pages and two types of tasks (i.e., informational and navigational tasks), we measure and analyze users' eye movements on each set of search results to provide a new understanding of user activities in web search.
3 Experiment

3.1 Overview

To observe how users look at each set of search results while browsing multiple search results pages, we analyzed the total time of eye movements on each search result. In the experiment, participants were asked to search for appropriate web pages (target results) from Google search results pages to find particular information with predetermined words. The information and words were specified by the experimenters, who measured the eye movements of the participants during the tasks. Analyzing users' eye movements helps us understand how users browse search results. In the experiment, WebTracer [10] was used as the eye-tracking system. WebTracer allows us to collect and analyze data on a user's eye movements and operations (e.g., mouse and keyboard operations) during web search. After the tasks, participants answered a questionnaire about their usual search activities and were interviewed about the interactions observed in the tasks. The participants were 21 undergraduate students studying information science. All participants used web search in daily life and used Google as their main search engine.

3.2 Apparatus

In the experiment, the following equipment was used.
• Display: 21-inch LCD monitor (viewable screen size: H30 x W40 cm, resolution: 1,024 x 768 pixels)
• Distance from subject's face to display: approx. 50 cm
• Device for measurement of sight line: NAC EMR-NC (view angle: 0.28 degrees, resolution on the screen: approx. 2.4 mm)
• Recording and playback of sight-line data: WebTracer

3.3 Task

The tasks performed by participants were (1) to search for appropriate web pages linked from the search results pages, (2) to find particular information specified by the experimenters, and (3) to bookmark the target pages.
Fig. 1. Example of Rearrangement Search Results
The time limit for each task was ten minutes, whether or not participants could complete the task. In the experiment, participants had to use predetermined words and were prohibited from changing the search words during the tasks. Since the purpose of this experiment was to observe users' activities in using multiple web search results pages, participants were only permitted to move to web pages linked from Google's search results. The order of the tasks was counterbalanced to control for learning effects. The tasks themselves were based on the test collection provided by NTCIR (NTCIR4 WEB) [11][12]. In this experiment, two types of tasks were selected as follows.
• Informational Task: required participants to find specific information (e.g., web pages including information on university entrance exams). The task was completed by finding three web pages linked from target results and bookmarking them.
• Navigational Task: required participants to find specific web pages (e.g., the official web page of a university). The task was completed by finding a web page linked from a target result and bookmarking it.
Each participant performed ten tasks (five tasks of each type).

3.4 Design of Web Search Results Pages

To prevent bias from the number of target results and their positions, we modified the results pages, which were saved on a local computer when we searched with Google. The participants performed the search tasks with the modified search results pages. A previous study showed that users search about 2.35 pages [3]; therefore, we prepared three search results pages and allocated target results randomly on them. Advertisements and information irrelevant to the web search were removed. We used Google's default setting, in which 10 search results are displayed at a time. In addition, Google's original (unmodified) search results pages were used for the fourth and later search results pages. Note that each search result and the display positions of the search results followed Google's page rank. To prevent participants from finishing their search on the first page, we allocated target results on the second or third page. Figure 1 shows an example of inserting target search results.
Fig. 2. Inserted Positions of Target Search Results
For the design of the search results pages, we prepared four rearrangement patterns of search results (Figure 2). In the Informational Task, we displayed target results at the top (I-1), middle (I-2), and bottom (I-3), or distributed evenly (I-4), on the second and third search results pages. In the Navigational Task, we displayed target results at the top (N-1, N-3) and bottom (N-2, N-4) of the second and third search results pages. Since participants might notice the experimenters' intention (i.e., that target results were displayed only from the second page onward), we inserted dummy tasks (with original Google search results) among the tasks.

3.5 Experimental Procedure

1. Explanation of the experiment and preparation: Experimenters explained the experiment and the eye-tracking system to participants.
2. Configuration of the eye-tracking system: We configured the devices for measuring the sight line and calibrated them.
3. Practice task: To understand the flow of the experiment, participants practiced one task. The task was an Informational Task, and original Google search results pages were used.
4. Performing the tasks: Experimenters explained each search task and participants started to search. This was repeated until all the tasks were finished.
5. Questionnaire: At the end of the experiment, participants were asked to answer a questionnaire about their daily use of web search engines.
6. Interview: Participants were also interviewed about characteristic activities observed during the tasks.
4 Results

4.1 Eye Movement

Figure 3 shows an example of the eye movements gathered in the experiment. The vertical axis shows the position (rank) of the search results and the horizontal axis shows the time at which the sight line appeared. In the figure, the horizontal lines describe eye movements on search results and the circle shows a user click on a search result. The figure shows that this user scanned the results from top to bottom, and that the total time of eye movements on the clicked result was longer than on the other results.
Fig. 3. Eye Movement and Clicked Search Result during the Task
Table 1. Classification of Search Completion Pages

Group  Last Search Result  Last Search Results Page  Informational Tasks  Navigational Tasks
G0     ~5                  Less than 1               6                    5
G1     6~15                1                         18                   21
G2     16~25               2                         33                   38
G3     26~                 3 or over                 27                   20
4.2 Analytic Procedure

To calculate the total time of eye movements, we classified the tasks by their search completion pages. The search completion page is determined from the position of the lowest search result looked at by each subject. Table 1 shows the classification of search completion pages by group. In this paper, we analyze users' activities when searching multiple search results pages; hence, we analyze the tasks which finished on the second page (G2) and on or after the third page (G3).

We use the length of time to analyze the eye movements. Even if a user's gaze falls on a certain search result, it does not necessarily mean that the user is interested in that result. Hence, we have to distinguish a focus (i.e., interest) within users' eye movements. In this paper, we defined a focus as the eye remaining on a certain search result for more than 100 ms.

To increase the correctness of the analysis, we also removed eye movements that stayed a long time at a particular position. When the user reads a search result intensively, the time of eye movements on the result greatly increases. However, this increase is not an effect of display position but of the content of the result itself, that is, the title of the web page, the snippet (description of the web page), and the URL. To identify a user's intensive reading, the average reading time of clicked search results was adopted as a threshold. Basically, users read a result before clicking it, to decide whether to move to the web page. Hence, it is reasonable to remove eye movements on search results that stay longer than the average time for clicked search results. In this experiment, the average time was 3.18 seconds in the Informational Task and 2.44 seconds in the Navigational Task.

4.3 Analysis Result

Figures 4 and 5 show the mean time of eye movements on each search result for the tasks classified as G2 and G3, respectively. The vertical axis shows the mean time of eye movements and the horizontal axis shows the rank of the results. In both groups, users tended to view search results longer in informational tasks than in navigational tasks. To evaluate the effects of search result positions on user activities in web search, we calculated the total time of eye movements on top search results and bottom search results. Table 2 shows the average time of eye movements for the three results that were ranked high and displayed at the bottom of a page (HB), and for the three results that were ranked low and displayed at the top of a page (LT), in G2. Table 3 shows the corresponding times for G3.
Fig. 4. Mean Time of Eye Movements on each Search Result in Informational Task (square) and Navigational Task (triangle) of G2
Fig. 5. Mean Time of Eye Movements on each Search Result in Informational Task (square) and Navigational Task (triangle) of G3

Table 2. Mean Time of Eye Movements for High-Rank Results Displayed in the Bottom Area (HB) and Low-Rank Results Displayed in the Top Area (LT) in G2

Group  Search Results  Informational (sec)  Navigational (sec)
LT_1   1~3             1.67                 1.24
HB_1   8~10            1.46                 1.04
LT_2   11~13           1.51                 0.96
HB_2   18~20           0.66                 0.79
Table 3. Mean Time of Eye Movements for High-Rank Results Displayed in the Bottom Area (HB) and Low-Rank Results Displayed in the Top Area (LT) in G3

Group  Search Results  Informational (sec)  Navigational (sec)
LT_1   1~3             1.38                 0.95
HB_1   8~10            1.56                 0.88
LT_2   11~13           1.43                 0.84
HB_2   18~20           1.25                 0.83
LT_3   21~23           1.48                 0.95
HB_3   28~30           0.75                 0.61
The tables show that the mean time of eye movements on LT is almost the same as that on HB, or longer in some cases (e.g., between HB_2 and LT_3 in G3). This result indicates that users did not view the search results in proportion to their page rank.
In particular, we focused on the top three search results (LT) of each page (see Figures 4 and 5). In Figure 4 (eye movements of G2), the mean time of eye movements on the first result of each page (ranks 1 and 11) was shorter than on the second and third results (ranks 2, 3, 12, and 13) for both task types. In G3 as well (Figure 5), eye movements on the first result of each page were shorter than on the second and third results.
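The dwell-time filtering of Section 4.2, which underlies the times in Tables 2 and 3, reduces to a simple pass over per-result dwell times. A sketch under an assumed data layout (not the authors' code):

```python
# Sketch (assumed data layout) of the Section 4.2 filtering: keep dwells
# over 100 ms as "focus", and drop dwells longer than the average reading
# time of clicked results (3.18 s informational, 2.44 s navigational).
def total_view_time(dwells, clicked_avg_s):
    """dwells: list of (result_rank, dwell_s); returns rank -> total seconds."""
    totals = {}
    for rank, dwell in dwells:
        if dwell <= 0.1 or dwell > clicked_avg_s:
            continue  # below focus threshold, or intensive reading
        totals[rank] = totals.get(rank, 0.0) + dwell
    return totals

# Example for an informational task (threshold 3.18 s): rank 1 is below
# the focus threshold and the long dwell on rank 3 counts as intensive reading.
print(total_view_time([(1, 0.05), (2, 1.4), (2, 0.8), (3, 4.0)], 3.18))
```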
5 Discussion

5.1 Effect of Task Differences

The results of the experiment show that users tend to view search results longer in informational tasks than in navigational tasks. In the Informational Task, users read the snippet of the search result and then decide whether or not to click it. On the other hand, in the Navigational Task, users read the title and URL of the result instead of the snippet to decide whether to click. Reading the title and URL requires less time than reading the snippet; therefore, the eye movements in Informational Tasks are longer than in Navigational Tasks. This result suggests that when users browse multiple web search results pages, they adopt different reading patterns for each task type.

5.2 Effect of Position within a Result Page/Screen

The results of the experiment show that the time of eye movements on LT is longer than on HB. This shows that users are influenced not only by the rank but also by the position of the search results within the results page. That is, the length of eye movements on search results is influenced by the position within a results page. The detailed analysis showed that the time of eye movements on the second and third results of each page was longer than on the first result. This suggests that users' eye movements are attracted by the position on the screen. In the experiment, users viewed the middle area of the screen more than the top area. The second and third results are displayed in the middle of the screen when the user arrives at a search results page, whereas the first result is displayed at the top of the screen. The first result is moved out of view by scrolling; hence, the users' interest moved to the second and third results. This assumption was supported by interviews with the subjects.

5.3 Design Implications

Using the results of the experiment, we propose a design that encourages users to browse the search results based on rank. To increase the time of users' eye movements, the results displayed at the bottom of the page should be emphasized to get more attention. In the Navigational Task, users concentrated their eye movements on the title of the web page or the URL, since the result page itself was the goal of the task. Hence, a thumbnail and/or an attribute (e.g., official page or blog) of the web page are useful information for users searching for a specific web site.
6 Conclusion

In this paper, we experimentally analyzed the effect of result position on search results pages. In the experiment, we measured users' eye movements during web search tasks to analyze how long users spend on each result of the results pages. We found that the results displayed at the bottom of a page were viewed for a shorter time than the results displayed at the top of the next page. We also found a tendency for the second and third results of each page to be viewed longer than the first result. As future work, we will analyze the effect of display position on the screen.

Acknowledgments. We would like to thank all the participants. In this paper, we used part of the data of the NTCIR-4 WEB task, which is sponsored by the National Institute of Informatics as organizer of the NTCIR-4 WEB task project.
References
1. Cutrell, E., Guan, Z.: What Are You Looking For?: An Eye-tracking Study of Information Usage in Web Search. In: CHI 2007: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 407–416 (2007)
2. Guan, Z., Cutrell, E.: An Eye Tracking Study of the Effect of Target Rank on Web Search. In: CHI 2007: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 417–420 (2007)
3. Jansen, B.J., Spink, A., Saracevic, T.: Real life, real users, and real needs: a study and analysis of user queries on the web. Inf. Process. Manage. 36(2), 207–227 (2000)
4. Broder, A.: A taxonomy of web search. SIGIR Forum 36(2), 3–10 (2002)
5. Rose, D.E., Levinson, D.: Understanding User Goals in Web Search. In: WWW 2004: Proceedings of the 13th International Conference on World Wide Web, pp. 13–19 (2004)
6. Murata, T., Saito, K.: Extraction and Visualization of Web Users' Interests Using Site-Keyword Graphs. Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 18(5), 701–715 (2006) (in Japanese)
7. Clarke, C.L.A., Pan, B., Agichtein, E., Dumais, S., White, R.W.: The Influence of Caption Features on Clickthrough Patterns in Web Search. In: SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 135–142 (2007)
8. Otsuka, S., Toyoda, M., Kitsuregawa, M.: A Study for Analysis of Web Access Logs with Web Communities. Transactions of Information Processing Society of Japan 44(18), 32–44 (2003) (in Japanese)
9. Lorigo, L., Pan, B., Hembrooke, H., Joachims, T., Granka, L., Gay, G.: The influence of task and gender on search and evaluation behavior using Google. Inf. Process. Manage. 42(4), 1123–1131 (2006)
10. Sakai, M., Nakamichi, N., Shima, K., Nakamura, M., Matsumoto, K.: WebTracer: A New Web Usability Evaluation Environment Using Gazing Point Information. Transactions of Information Processing Society of Japan 44(11), 2575–2586 (2003) (in Japanese)
11. Eguchi, K., Oyama, K., Aizawa, A., Ishikawa, H.: Overview of the Informational Retrieval Task at NTCIR-4 WEB. In: Proceedings of the Fourth NTCIR Workshop on Research in Information Access Technologies: Information Retrieval, Question Answering and Summarization (2004)
12. Oyama, K., Eguchi, K., Ishikawa, H., Aizawa, A.: Overview of the NTCIR-4 WEB Navigational Retrieval Task 1. In: Proceedings of the Fourth NTCIR Workshop on Research in Information Access Technologies: Information Retrieval, Question Answering and Summarization (2004)
Development of Estimation System for Concentrate Situation Using Acceleration Sensor Masashi Okubo and Aya Fujimura Doshisha University, 1-3 Miyakodani, Tatara, Kyotanabe, Kyoto, 610-0321, Japan [email protected], [email protected]
Abstract. Recently, training to increase one's powers of concentration has become popular. One reason is that it is difficult to concentrate on anything these days because of the flood of information. However, even if we train our concentration using how-to books and portable games, we cannot evaluate the training effect in practical life. In this paper, we propose an evaluation system for a user's powers of concentration in which a method for estimating the user's sitting situation is utilized. The system is composed of two methods: one estimates the sitting situation, and the other evaluates the user's concentration situation. Both methods use the user's motion, obtained from an acceleration sensor fixed on the chair. We also prepare three kinds of Graphical User Interface (GUI) that present the concentration situation to the user. Keywords: Powers of concentration, GUI, Sensory evaluation, Self-management.
However, even if our thinking ability and powers of concentration can be improved by such games and training, we cannot evaluate the effects in practical life. Methods using biosignals, including breathing, heartbeat, and brain waves, have been proposed to measure a person's powers of concentration [1]. For example, it has been found that a brain wave called frontal midline theta is generated when solving a problem and concentrating on abstract thinking. It has also been reported that transitions in a learner's skin potential level reflect his or her interest in lectures. However, this sort of method puts a heavy burden on the measured person, and it is costly and time-consuming. Therefore, in this study, assuming situations in which users are working with a computer, studying, or doing paperwork while seated, we propose an evaluation system for a user's powers of concentration that processes information obtained from a motion sensor fixed on a chair and presents the concentration situation to the user. Along with the development of the actual system, we also prepare three kinds of Graphical User Interface (GUI) and perform sensory evaluation experiments on the proposed system and an interface evaluation using these GUIs.
2 Hardware Configuration of Estimation System in Sitting Situation

The hardware configuration of the proposed system is shown in Fig. 1. The motion sensor measures the movement of a chair and sends acceleration data to a Personal Computer (PC) via Bluetooth. The user's sitting or standing-up situation and his or her powers of concentration are estimated on the PC from the received data, and the estimation result is presented to the user on a monitor.

Fig. 1. System configuration (user, chair with acceleration sensor, Bluetooth link to PC, monitor)
In the proposed system, we estimate the user's sitting situation on a revolving chair of the kind generally used in households and offices, as shown in Fig. 2. Since a revolving chair reflects the user's various movements, we can estimate those movements by measuring the acceleration of the chair's motions and swings with the motion sensor.
We use the remote controller of the Nintendo Wii game console as the motion sensor. This remote controller connects to the main unit via Bluetooth, so we can easily connect it to a PC. Since the Wii is available at a relatively low cost, it is already popular in many households. As shown in Fig. 2, we fix the remote controller on the upper side of the backrest of the chair, because the inclination of the backrest and the revolution of the seating face appear prominently there, and yet the controller does not disturb the user. The three-dimensional motion sensor is installed near the A button, located around the center of the remote controller's surface. Fig. 2 also shows the coordinate axes with the controller placed lengthwise: the X axis is the horizontal direction, the Y axis is the vertical direction, and the Z axis is the depth direction [2].
Fig. 2. Coordinates of the 3-axis accelerometer (X: horizontal, Y: vertical, Z: depth)
3 Software Configuration of Estimation System in Sitting Posture

3.1 Estimation of Time of Sitting and Leaving

The proposed system estimates whether the user is sitting down or standing up, as well as the user's powers of concentration while sitting. First, we propose a method for estimating whether the user is sitting down or standing up. Two kinds of methods can be used: the first relies on the momentary change of acceleration at the moment of sitting down or standing up; the second relies on the change of acceleration over time. The proposed system uses both methods to ensure reliability.

Estimation by Momentary Change of Acceleration. We sample the acceleration at about 100 Hz from the Wii remote controller via Bluetooth. In a preparatory experiment, we instructed several subjects to sit down on a chair with the remote controller fixed to it, work, and stand up, and we measured the acceleration of these movements. We found that the acceleration changes significantly at the moments of sitting down and standing up in all axis directions. However, relatively large accelerations are also measured on the X and Z axes at the moment of sitting down, so we estimate sitting down and standing up from the Y-axis acceleration, which shows less incidental change. Moreover, we focus on the momentary change of acceleration at the moments of sitting down and standing up, respectively. Fig. 3 shows an example of the transition of the Y-axis acceleration at these moments. When sitting down, the acceleration shifts greatly in the positive direction and then in the negative direction; this occurs because the seat front sinks down once as the subject is seated and then rises as a reaction. Conversely, when standing up, the acceleration shifts greatly in the negative direction and then in the positive direction, because the seat front sinks down as a reaction and then rises as the subject stands up. In short, the Y-axis acceleration shifts from positive to negative when sitting down and from negative to positive when standing up, and these shifts occur within 50 ms. Using these shift patterns of the acceleration, we estimate the subjects' sitting-down and standing-up situations.

Fig. 3. Transition of the Y-axis acceleration at (a) sitting down and (b) standing up
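The shift-pattern detection just described can be sketched in a few lines of Python (a minimal sketch, not the authors' implementation; the ±0.3 G thresholds are the values derived in the preparatory experiment described next, and the 100 Hz sampling rate is as reported above):

```python
import numpy as np

FS = 100        # sampling rate (Hz), as reported above
WINDOW = 0.05   # 50 ms window for the paired positive/negative peaks
THRESH = 0.3    # threshold in G, from the preparatory experiment below

def detect_transition(y_acc, state):
    """Scan Y-axis acceleration (in G, NumPy array) for a sit-down or
    stand-up event. state is 'standing' or 'sitting'; returns the new
    state, or None if no transition pattern is found."""
    y_acc = np.asarray(y_acc)
    n = int(WINDOW * FS)
    for i, a in enumerate(y_acc):
        if state == "standing" and a > THRESH:
            # sitting down: positive peak followed by a negative one within 50 ms
            if np.any(y_acc[i + 1:i + 1 + n] < -THRESH):
                return "sitting"
        elif state == "sitting" and a < -THRESH:
            # standing up: negative peak followed by a positive one within 50 ms
            if np.any(y_acc[i + 1:i + 1 + n] > THRESH):
                return "standing"
    return None
```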
We next performed a preparatory experiment to set the threshold value of the acceleration for distinguishing sitting down and standing up from movements made while sitting and working. Among the nine subjects, a maximum value of 0.27 G was measured for two subjects and a minimum value of -0.23 G for another subject. We therefore set the thresholds for estimating sitting down and standing up at 0.3 G in the positive direction and -0.3 G in the negative direction. To summarize the estimation method using the momentary acceleration change: if an acceleration above 0.3 G is measured while the subject is standing, followed by an acceleration below -0.3 G within 50 ms, we estimate that the subject is sitting down; if an acceleration below -0.3 G is measured while the subject is sitting, followed by an acceleration above 0.3 G within 50 ms, we estimate that the subject is standing up.

Estimation Using the Acceleration Changes with Time Course. The acceleration measured at the moments of sitting down and standing up varies between individuals, and it is sometimes difficult to tell whether a given acceleration was caused by sitting down or standing up or by movements made while seated. For example, according to the maximum and minimum Y-axis accelerations, one subject showed only small changes when sitting down: 0.11 G and -0.19 G. If the acceleration when sitting down or standing up is between 0.1 G and 0.3 G, or between -0.3 G and -0.1 G, estimation from the momentary acceleration alone is inaccurate. Therefore, we also use an estimation method based on the power spectrum obtained by repeatedly sampling the acceleration over a certain period of time. First, we compute the power spectrum by applying a Fourier transform to 256 samples, covering about 2.5 s of acceleration on the three axes. The frequency component at 0.01 Hz, regarded as noise, is eliminated. We find that the power spectrum is barely present in any frequency domain when the user is absent but appears in many frequency domains while the user is sitting. We then add up the power spectrum values, excluding the 0.01 Hz component, and judge between absence and sitting by this sum. Fig. 4 shows the transition of the sum of the 3-axis acceleration power spectrum during absence, sitting, and working, using acceleration data obtained in the preparatory experiment.

Fig. 4. Transition of the sum of the 3-axis acceleration power spectrum (X-, Y-, and Z-axis traces over time; annotated events: sitting down, working, rotation of the seat, inclining the backrest, standing up, absence)
Comparing absence and working in Fig. 4, the sum of the Y-axis acceleration power spectrum changes little compared with that of the other axes, whereas the sums for the X and Z axes change greatly depending on the user's motion. For example, the sum of the X-axis acceleration power spectrum increases when the user rotates the seat, and the sum of the Z-axis acceleration power spectrum increases when the user inclines the backrest. To summarize the estimation method using the power spectrum: if an acceleration between -0.3 G and -0.1 G is measured during absence and the sum of the power spectrum subsequently exceeds 50, we estimate "sitting down"; otherwise, we estimate "absence". Conversely, if an acceleration between -0.3 G and -0.1 G, or between 0.1 G and 0.3 G, is measured and the sum of the power spectrum is between 5 s and 10 s, we estimate "standing up"; otherwise, we estimate "working".
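A minimal sketch of the spectrum computation and the absence/sitting judgment follows (not the authors' code; the 256-sample window, the discarded lowest-frequency component, and the threshold of 50 come from the text above, while treating the DC bin as the "0.01 Hz" noise component is our reading of that remark):

```python
import numpy as np

FS = 100  # sampling rate in Hz, as reported above
N = 256   # window length (~2.56 s at 100 Hz)

def spectrum_sum(acc_window):
    """Sum of the power spectrum of one axis over a 256-sample window,
    dropping the lowest-frequency (near-DC) bin as noise."""
    spec = np.abs(np.fft.rfft(acc_window, n=N)) ** 2
    return spec[1:].sum()

def absent_or_sitting(x, y, z):
    """Judge absence vs. sitting from the summed 3-axis power spectrum,
    using the threshold of 50 given in the text."""
    total = spectrum_sum(x) + spectrum_sum(y) + spectrum_sum(z)
    return "sitting" if total > 50 else "absent"
```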
3.2 Relationship between Sitting Situation and Power Spectrum

Experiment Objective. Generally, a person's motion tends to lessen markedly when he or she is concentrating. That is, when the user concentrates, the sum of the X-axis and Z-axis acceleration power spectrum obtained from the motion sensor fixed on the chair (hereinafter, the sum of the acceleration power spectrum) is likely to become smaller. We performed preparatory experiments to examine this assumption, in which the subjects were instructed to type on a computer so that concentration could be evaluated quantitatively.

Experiment and Evaluation Method. In this experiment we examined the relationship between the sum of the acceleration power spectrum and the typing speed per unit of time while three subjects typed on a computer for 30 minutes. The subjects were instructed to sit on the chair for 30 minutes and either type or take a rest. Excluding the first and last minute of each 30-minute recording, we divided the remaining 28 minutes of data into 1-minute segments and computed the cross-correlation functions.

Evaluation Result. The transitions of the sum of the power spectrum and the typing speed, together with examples of the cross-correlation function, are shown in Fig. 5. According to Fig. 5(a), the sum of the power spectrum increases when the typing speed becomes 0, that is, when the subject takes a rest; Fig. 5(b) shows that, correspondingly, a negative correlation of approximately -0.7 is obtained. Fig. 5(c) shows the decrease of the sum of the power spectrum when the typing speed increases, and Fig. 5(d) shows that a negative correlation of approximately -0.7 is likewise obtained. The average and standard deviation of the minimum of the cross-correlation function for the three subjects are shown in Fig. 6; an average negative correlation stronger than -0.4 can be seen in each case. We thus find that a strong negative cross-correlation exists between the sum of the power spectrum of the user's motion and the typing speed per unit of time, supporting the validity of estimating that subjects are concentrating when the sum of the power spectrum becomes small.

Fig. 5. Cross-correlation functions (b), (d) between the power spectrum of the user's motion and the typing speed (a), (c)
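The cross-correlation computation reported above can be sketched as follows (a minimal sketch assuming both series are sampled over the same 1-minute intervals; this is not the authors' code, and the normalization shown is one common choice):

```python
import numpy as np

def normalized_xcorr(power_sums, typing_speeds):
    """Normalized cross-correlation between per-interval sums of the
    motion power spectrum and typing speed (Section 3.2). Returns
    (lags, correlation); a minimum near -0.7 around lag 0 would match
    the strong negative relationship reported in the text."""
    x = (np.asarray(power_sums, float) - np.mean(power_sums)) / np.std(power_sums)
    y = (np.asarray(typing_speeds, float) - np.mean(typing_speeds)) / np.std(typing_speeds)
    corr = np.correlate(x, y, mode="full") / len(x)
    lags = np.arange(-len(x) + 1, len(x))
    return lags, corr
```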
Fig. 6. Average and standard deviation of the minimum of the cross-correlation function (Subjects A, B, and C)
3.3 Interface Set-Up

Recently, cars with instantaneous fuel-consumption meters installed have become popular on the market. Drivers are likely to ease off the gas pedal because the meter lets them know their situation at each moment and review their own behavior. The same idea underlies the proposed system: the situation of the sitting user is depicted with an avatar so that the user can monitor his or her own situation objectively. Four types of avatar are prepared ("NOT AVAILABLE", "AVAILABLE", "WORKING", "PLAYING"), each in two patterns, "male" (upper) and "female" (lower).

If the system is to be used as a self-management tool, it becomes important to present not only whether the user is concentrating, but also the degree of concentration. Therefore, in addition to the avatar, we created a line graph that presents the powers of concentration over the long term and a level meter that describes the momentary powers of concentration, yielding three patterns of GUI. Fig. 7(a) shows the GUI displaying the avatar only. In Fig. 7(b), a GUI that presents the transition of the powers of concentration is added to the avatar. Since we assume that the smaller the sum of the power spectrum, the higher the degree of concentration, we set an upper limit of 500 on the sum of the power spectrum and plot 500 minus the sum, so that the line goes up when the degree of concentration becomes high. The graph in the lower right of the GUI presents the powers of concentration over the previous 30 s, and the graph in the upper part of the GUI presents the powers of concentration over the last few minutes so that users can check the record. In Fig. 7(c), a level meter that displays the momentary change of the powers of concentration nonlinearly is added: even if the sum of the power spectrum is small because the user is concentrating, the up-and-down movement of the meter's scale can still be confirmed visually, because the sum per scale division is decreased.

Fig. 7. Graphical User Interface: (a) GUI (avatar only), (b) GUI (with line graph), (c) GUI (with level meter)
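A minimal sketch of this display mapping (the ceiling of 500 is from the text; clipping negative values to zero is our assumption):

```python
CEILING = 500  # upper limit set on the sum of the power spectrum (Section 3.3)

def concentration_value(power_sum):
    """Value plotted on the line graph: 500 minus the power-spectrum sum,
    clipped at 0, so that less motion (more concentration) plots higher."""
    return CEILING - min(power_sum, CEILING)
```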
4 System Sensory Evaluation Experiment

4.1 Experiment Objective

We examine the usefulness and usability of the proposed system and the understandability of the powers of concentration presented in the three types of GUI. Specifically, we confirm the operation of the estimation system for the powers of concentration and perform sensory evaluation experiments on the three types of GUI described above.

4.2 Experiment Conditions and Methods

In the experiment, ten subjects (five males and five females in their twenties) were instructed to sit on chairs with the Wii remote controller fixed to them and to type on computers running the proposed system (Fig. 8). They were asked to type letters printed on paper into a word-processing program for five minutes. The GUI was displayed in the upper left of the screen so that it would not disturb their work and the subjects could check it while working.

Fig. 8. Experimental scene
The experiments were conducted under four conditions: (1) without the proposed system, (2) with the GUI using only the avatar, (3) with the GUI using the avatar and line graph, and (4) with the GUI using the avatar and level meter. To account for order effects, the order of the conditions was changed for every subject. After the experiment, the subjects answered five-grade evaluation questionnaires.

4.3 Result of Experiment

The average and standard deviation obtained by scoring the answers to the questionnaire on the understandability of the powers of concentration and the interface are shown in
Fig. 9. These results show that, compared with having no system or the GUI with the avatar only, the subjects found it easier to understand their degree of concentration with the GUI with the line graph and with the level meter. We also conducted a sign test for all combinations in all questionnaires. For the understandability of the powers of concentration, significant differences at the 5% significance level were found between having no system and the line graph, and between having no system and the level meter. This result also shows that the GUIs using the line graph and the level meter make the powers of concentration easier to understand than having no system.

4.4 Discussion

As mentioned above, the GUI with the line graph and the one with the level meter were preferred about equally. The subjects who preferred the line graph valued the ability to check the record, while those who preferred the level meter valued the ability to understand the powers of concentration. However, some subjects said that they were preoccupied by the system and could not concentrate. A system in which users can choose their favorite GUI and save the record to look back over the day is therefore desirable.

Fig. 9. Result of the questionnaire on the understandability of the powers of concentration (scale: no good / neutral / good; conditions: no system, avatar only, line graph, level meter; P < 0.05)
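For reference, the sign test mentioned in Section 4.3 can be computed as follows (a minimal sketch; SciPy is assumed to be available, and the paper does not describe its exact computation):

```python
from scipy.stats import binomtest

def sign_test(scores_a, scores_b):
    """Two-sided sign test between paired five-grade questionnaire scores
    for two conditions, ignoring tied pairs."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    if not diffs:
        return 1.0  # all pairs tied: no evidence of a difference
    n_pos = sum(d > 0 for d in diffs)
    return binomtest(n_pos, n=len(diffs), p=0.5).pvalue
```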
5 Conclusion

This study aims to let users conduct self-management by presenting their situation to them while they use the system. We developed a system that estimates the user's powers of concentration and the sitting-down/standing-up situation, and presents the powers of concentration to the user through an interface. We then conducted sensory evaluation experiments to assess the operational effectiveness of the system and interface. As a result, the interface using the line graph, which presents the transition of the long-term powers of concentration, and the interface using the level meter, which presents the transition of the momentary powers of concentration, were preferred equally by the users.
The proposed system was developed so that users can review their own behavior by understanding their own situation. Although we consider that the system can facilitate concentration, long hours of concentration produce stress and can harm mental and physical health. We therefore believe it is possible to build on the system to encourage appropriate rests during long periods of concentration. We will validate its effectiveness in long-term experiments, along with improvements to the existing system and interface.
References
1. Tamura, H.: Human Interface. Ohmsha, 44–68 (1998) (in Japanese)
2. http://www.wiili.org/index.php/Wiimote
Psychophysiology as a Tool for HCI Research: Promises and Pitfalls

Byungho Park

Graduate School of Information and Media Management, Korea Advanced Institute of Science and Technology (KAIST), 207-43 Cheongryangri 2-dong, Dongdaemun-gu, Seoul, 130-722, Korea
[email protected]
Abstract. Psychophysiology, an area of psychology that measures an individual's physiological responses to infer his or her psychological state, can provide a set of useful measures that HCI researchers can take advantage of. However, the method has inherent limitations and leaves room for misinterpretation. This paper introduces psychophysiology, shows how the research methods it offers can be used for HCI research, and discusses the advantages and disadvantages of using research tools from psychophysiology.
2 Psychophysiology and HCI

2.1 Psychophysiology as an Academic Discipline

Psychophysiology is an academic discipline that studies the interrelationships between the physiological and psychological aspects of human behavior [15]. A typical study in this area observes a person's physiological responses to understand his or her psychological state and/or its changes. Due to its interdisciplinary nature, it incorporates research from various disciplines including, but not limited to, psychology, cognitive science, medicine, anatomy, and neuroscience. The types of physiological responses used in psychophysiology studies include blood flow patterns in the brain (functional magnetic resonance imaging, fMRI), heart rate variability, sweat production (skin conductance, SC), respiration, electroencephalography (EEG, commonly known as 'brain waves'), muscle contraction (electromyography, EMG), eye movement, and much more.
3 Useful Psychophysiological Research Techniques As mentioned above, a large variety of physiological responses are subject for study in psychophysiology. This section will introduce a few of them that HCI researchers are more familiar with, or may find more useful than others.
3.1 Sweat Production (SC) and Stress

It is known that people sweat under stress. This is also true when computer users are stressed by the difficulty of a given task. To find out how easy or difficult an interface is to use, HCI researchers may therefore measure the subject's sweat production, or skin conductance (SC) [11, 16]. Physiologically, skin conductance is directly related to the activation of the sympathetic branch of the central nervous system. This is convenient because it means skin conductance is independent of the activation of the parasympathetic system; the activity of organs under the influence of both systems is easy to misinterpret (for example, a human heart may beat faster because the sympathetic system is activated, because the parasympathetic system is deactivated, or both).

However, skin conductance has downsides. One is sensor (electrode) placement. Sweat glands in humans are concentrated in the palms and soles, and many computer-related tasks and interfaces require the users' hands. Collecting skin conductance data from the palm may restrict subjects from using both hands freely, and in tasks that require both hands (such as typing text with a keyboard), using the palm is out of the question. Alternatively, the sole of the foot may be used, but this is inconvenient, since subjects must remove their socks and keep the foot lifted throughout the whole session to prevent the electrodes from touching the floor. Another drawback of using skin conductance as an index of task (or interface) difficulty is its slow response speed. Rather than producing sweat constantly, human sweat glands 'spout out' sweat, and it takes about six seconds for them to respond to an arousing event, though the exact timing varies by individual. This makes it hard for researchers to pinpoint the exact event that causes stress in the subject. For this reason, comparing the total amount of sweat or the average skin conductance level within a subject across different tasks (each taking longer than 10 seconds to complete) is recommended, rather than attempting to identify the exact moment that causes stress [15].

3.2 Heart Rate (HR) Variability and Attention

The human heart is affected by both the sympathetic and parasympathetic branches of the central nervous system, which means its activity reflects both arousal and attention to external stimuli. Research shows that the heart beat slows down when one is paying attention to a presented stimulus [17]. This can be explained from an evolutionary psychology perspective: when an unknown change is noticed in the environment, the body automatically responds by calming down (or slowing) the activity of the internal organs until a proper assessment of the situation is made. If the change turns out to be life-threatening, the famous 'fight or flight' reaction follows, including intense sweat gland activity and quick acceleration of the heart rate. If the change poses no harm, the heart rate speeds up again, but to no more than its normal rate.
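A minimal sketch of how such a deceleration can be read off beat timestamps (illustrative only; beat detection from ECG or PPG is assumed already done, and the 5 bpm criterion is a placeholder of ours, not a value from the text):

```python
import numpy as np

def beat_to_beat_hr(beat_times):
    """Instantaneous heart rate (bpm) from beat timestamps in seconds."""
    ibi = np.diff(np.asarray(beat_times))  # inter-beat intervals (s)
    return 60.0 / ibi

def deceleration_indices(hr, baseline_bpm, drop_bpm=5.0):
    """Indices where HR falls at least drop_bpm below a resting baseline,
    a crude marker of the attention-related deceleration described above."""
    return np.where(hr < baseline_bpm - drop_bpm)[0]
```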
It is important to acknowledge that deceleration of the heart rate is an index of external attention. Measuring external attention is useful for HCI research, since it indicates whether the subject is paying attention to the menu on the computer screen, or whether that gentle chime has actually caught the subject's attention. The other type of attention is internal attention, which occurs when one puts mental effort into, say, solving math questions [9]; when this happens, heart rate acceleration is observed. Hence, it is important to consider the context in which the heart rate data were collected and to determine whether a rising heart rate is an acceleration produced by internal attention, or simply the end of a deceleration as attention lapses. The subject's arousal level must also be considered: excitement arouses the sympathetic system, which makes the heart beat faster. When analyzing heart rate data, context matters a great deal.

There are two ways to measure heart rate. One is to measure the electrical pulse produced by the heart every time it contracts and pumps blood out to the body; this measurement is the electrocardiogram (ECG). The other is to measure the blood flowing in and out of the tip of a finger or toe, which is called photoplethysmography (PPG). PPG is typically measured by emitting infrared light into the skin: the level of light absorption changes with the amount of blood flowing underneath, and this is used to derive the heart rate. ECG monitors the electrical activity of the heart, while PPG monitors its mechanical activity. Ideally, both would serve as equally good indices of heart activity, but neither is perfect. ECG requires three electrodes, attached to both arms, both legs, or the chest. Chest placement is rarely used outside a hospital setting, though it provides the clearest ECG. Attaching electrodes to the arms or legs is more practical for HCI research, but the distance from the heart makes the pulse weaker and the signal more vulnerable to noise (caused by the subject's body movement and internal organ activity). In my experience, arms tend to provide cleaner ECG than legs. PPG needs only one sensor, attached to a finger or toe; fingers are generally better than toes for collecting PPG data because they are closer to the heart. Nevertheless, if the subject's heart is relatively weak and the hand or foot is positioned so that blood flows in poorly, there may not be enough blood flow for the infrared sensor to register the PPG. Both methods are therefore somewhat vulnerable to errors, especially missed heart beats, but many of these issues can be prevented by preparing the best possible setting for the subject. As long as the study conditions are good, ECG is as reliable as PPG [13]. Sensor placement, again, can be a problem: ECG requires three locations on the arms or legs, while PPG requires only one, but it must be a finger or a toe, and it is best to keep the sensors from touching hard surfaces. As a result, the type of heart rate data to collect and the sensor placement depend on the type of task given to the subjects.

3.3 Eye Tracking and Attention

As the maxim "the eye is the window to the mind" suggests, it has long been thought that human gaze reflects the top priority of cognitive processes [6]. This is one of the reasons HCI researchers consider eye tracking to have great potential as a research tool. In the past, electrooculography (EOG), which measures the resting potential of the retina, was used to track eye movements; however, EOG can only show the general direction in which the eye has moved, not exactly where the gaze is fixed [6, 15]. With advances in real-time optical data processing, monitoring of the retina using infrared light has become more sophisticated. The data produced by eye tracking are useful not only because they are quantitative and can be subjected to statistical analysis, but also because they can be visualized in ways that are intuitive for an audience. One visualization is the heat map, which color-codes each area of the computer screen, usually coloring red the areas where eye fixation lasted longest and dark blue (or no color) the areas with little or no gaze. The other is an animation over time of which part of the screen the gaze has moved to (which some companies call "gaze replay"). When used properly, both visualization techniques are powerful enough to convince an audience with little or no knowledge of statistics [3, 4, 6].

In the past, eye tracking equipment required a head-mounted camera that monitored retina movement, and also a head mount to fix the subject's head, since the equipment could not adjust to head movements. Today, there is equipment on the market that does not require head-mounted cameras, some of it capable of adjusting to minor (and gentle) head movements. However, the eye tracking technology available today still has its limitations. The largest challenge is that it is hard to use with subjects who have seriously bad vision; not because of the vision itself, but because of the lenses (both glasses and contact lenses). In theory, the equipment should be adjustable for such lenses, but in practice virtually no machine on the market provides such calibration, although quite a few automatically adjust to subjects who wear glasses for relatively light myopia. The inability to track gaze through sudden and/or fast head movements is another limitation of contemporary eye tracking technology.
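As a rough illustration of how a heat map of the kind described above can be built from fixation data, here is a minimal sketch (fixations are assumed to be given as (x, y, duration) tuples in screen pixels; the screen and cell sizes are arbitrary choices, not values from the text):

```python
import numpy as np

def fixation_heatmap(fixations, screen_w=1024, screen_h=768, cell=32):
    """Accumulate fixation durations into a coarse grid over the screen.
    fixations: iterable of (x, y, duration_seconds) tuples.
    Returns a 2-D array that can be color-mapped (red = longest fixation)."""
    grid = np.zeros((screen_h // cell, screen_w // cell))
    for x, y, dur in fixations:
        row, col = int(y) // cell, int(x) // cell
        if 0 <= row < grid.shape[0] and 0 <= col < grid.shape[1]:
            grid[row, col] += dur
    return grid / grid.max() if grid.max() > 0 else grid
```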
activated when one is experiencing other types of non-positive emotion (e.g., One screaming “Eeeek!” out of disgust), so the data has to be analyzed with caution. The Orbicularis oculi is another muscle group that is getting attention as an alternative for the Zygomatic major muscle group as an index for positive emotion. The Orbicularis oculi muscle group is located right below the lower eye lids, and it is primarily responsible for eye blinks. It is also responsible for gathering of skin around the eyes when one is smiling, which is called the Duchenne smile. A Duchenne smile involves contraction of both the zygomatic major muscle group and the orbicularis oculi muscle group, and it is seen as an indicator of one experiencing true happiness (a Non-Duchenne smile only involves the zygomatic major muscle group and interpreted as an indicator of social smiling which does not involve happiness; for review, see Ekman, Davidson, and Friesen [2]). One of the challenges using facial EMG is that the muscle groups are small and the electrical activity is very weak. Especially, orbicularis oculi muscle group is so small that researchers are forced to place two mini electrodes (diameter of 4-millimeters) very close (about 5-millimeters), which makes it vulnerable to errors during data collection.
4 Conclusion: No One Measure Is Perfect

This paper has reviewed skin conductance, heart rate, eye tracking, and facial EMG as tools for HCI research. There are, of course, other tools, such as electroencephalography (EEG, also known as 'brain waves'), functional magnetic resonance imaging (fMRI, also known as 'brain imaging'), and many more. Each has its own unique advantages and disadvantages as a research tool. Though psychophysiology promises useful research tools for HCI, some limitations apply to all of them [8]. The largest one is external validity. As mentioned above, skin conductance, heart rate, and facial EMG all require some kind of sensor to be attached to parts of the subject's body, and this adds an awkward feeling to the already unnatural setting (e.g., being put into a usability lab with a one-way window and, possibly, a video camera rolling) the subject may experience during a session. Eye tracking is no less artificial than the other measurements: even the least intrusive technology still requires participants to keep their head movements minimal and, if needed, to move gently. The subject has to be reminded of this, which inevitably causes unnatural tension that, depending on the type of study, may have a fairly large negative impact on the data collected.

Another limitation is that the data collected are open to interpretation, though many people tend to believe that psychophysiology provides absolutely objective data and interpretations. For example, Hornbaek (2005) argues that physiological measures are objective, using "physiological measures of fun in playing computer games" (p. 92) as an example. Unfortunately, this view can be wrong, especially in the context of HCI research. Studies in psychology tend to use simple stimuli, which helps keep unexpected factors from interfering. But in many cases, HCI research has to use real-life equipment and/or interfaces as stimuli, which are much more complex than the stimuli used in a typical psychology experiment. So the data collected always have to be interpreted with the context in mind. Could the heart rate have picked up speed because the subject was not paying attention, or because the subject was puzzled by the menu system and had to think about it? Was the subject sweating more because the interface was hard to use, or because there was an image of a huge spider on the screen? When designing a session, it is best to eliminate all factors that might cause unwanted interference during data collection, and even after the data have been collected, it is best to go back and consider whether there is any room for alternative explanations of particular physiological responses. Researchers should also keep in mind that, though the research tools provided by psychophysiology may offer new insights, none of them is perfect by itself, and they are best used in combination with other research methods.

In his online column, Jakob Nielsen, a well-known HCI guru and consultant, advises design agencies seeking ways to convince clients to pay for usability testing that "using sound methodology is the true sign of professionalism" and that they should "point out usability's astounding return on investment" [10]. Including psychophysiology in the set of available tools may also be a good idea for design agencies aiming to improve the usability of the final product. Psychophysiology should be able to make good friends with both HCI researchers and HCI practitioners. It is just a matter of choosing the right tool and applying it the right way to get the right answer to the right question.
References
[1] Allanson, J., Wilson, G.M.: Physiological Computing. In: Proceedings of the Conference on Human Factors in Computing Systems (CHI 2002), pp. 912–913 (2002)
[2] Ekman, P., Davidson, R.J., Friesen, W.V.: The Duchenne Smile: Emotional Expression and Brain Physiology II. Journal of Personality and Social Psychology 58(2), 342–353 (1990)
[3] Hewig, J., Trippe, R.H., Hecht, H., Straube, T., Miltner, W.H.R.: Gender Differences for Specific Body Regions When Looking at Men and Women. Journal of Nonverbal Behavior 32, 67–78 (2008)
[4] Jacob, R.J.K., Karn, K.S.: Eye Tracking in Human-Computer Interaction and Usability Research: Ready to Deliver the Promises. In: Hyona, Radach, Deubel (eds.) The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research, Oxford, England (2003)
[5] Jones, C.M., Dlay, S.S.: The Face as an Interface: The New Paradigm for HCI. In: Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), vol. 1, pp. 774–779 (1999)
[6] Just, M.A., Carpenter, P.A.: Eye fixations and cognitive processes. Cognitive Psychology 8, 441–480 (1976)
[7] Kim, S., Godbole, A., Huang, R., Panchadhar, R., Smari, W.: Toward an Integrated Human-centered Knowledge-based Collaborative Decision Making System. In: Proceedings of the 2004 IEEE International Conference on Information Reuse and Integration, Las Vegas, NV, November 2004, pp. 394–401 (2004)
[8] Lin, T., Hu, W., Omata, M., Imamiya, A.: Do physiological data relate to traditional usability indexes? In: Proceedings of the 17th Australia Conference on Computer-Human Interaction: Citizens Online: Considerations for Today and the Future (OZCHI), vol. 122, pp. 1–10 (2005)
[9] Mansell, W., Clark, D.M., Ehlers, A.: Internal versus external attention in social anxiety: An investigation using a novel paradigm. Behaviour Research and Therapy 41(5), 555–572 (2003)
[10] Nielsen, J.: Convincing Clients to Pay for Usability (May 19, 2003), http://www.useit.com/alertbox/20030519.html
[11] Pecchinenda, A., Smith, C.A.: The affective significance of skin conductance activity during a difficult problem-solving task. Cognition and Emotion 10(5), 481–504 (1996)
[12] Prendinger, H., Mori, J., Ishizuka, M.: Using human physiology to evaluate subtle expressivity of a virtual quizmaster in a mathematical game. International Journal of Human-Computer Studies 62, 231–245 (2005)
[13] Selvaraj, N., Jaryal, A., Santhosh, J., Deepak, K.K., Anand, S.: Assessment of heart rate variability derived from finger-tip photoplethysmography as compared to electrocardiography. Journal of Medical Engineering & Technology 32(6), 479–484 (2008)
[14] Spiegel, D.: Human Computer Interaction (October 22, 1998), http://xenia.media.mit.edu/~spiegel/papers/HCI.pdf
[15] Stern, R., Ray, W.J., Quigley, K.S.: Psychophysiological Recording, 2nd edn. Oxford University Press, New York (2001)
[16] Svebak, S.: The effect of task difficulty and threat of aversive electric shock upon tonic physiological changes. Biological Psychology 14(1-2), 113–128 (1982)
[17] Ward, R.D., Marsden, P.H.: Physiological responses to different WEB page designs. International Journal of Human-Computer Studies 59, 199–212 (2003)
Assessing NeuroSky’s Usability to Detect Attention Levels in an Assessment Exercise Genaro Rebolledo-Mendez1,3, Ian Dunwell1, Erika A. Martínez-Mirón2, María Dolores Vargas-Cerdán3, Sara de Freitas1, Fotis Liarokapis4, and Alma R. García-Gaona3 1
Serious Games Institute, Coventry University, UK 2 CCADET, UNAM, Mexico 3 Facultad de Estadistica e Informatica, Universidad Veracruzana, Mexico 4 Interactive Worlds Applied Research Group, Coventry University, UK {GRebolledoMendez,IDunwell,Sfreitas, F.Liarokapis}@cad.coventry.ac.uk, [email protected], {dvargas,agarcia}@uv.mx
Abstract. This paper presents the results of a usability evaluation of NeuroSky's MindSet (MS). Until recently, most Brain Computer Interfaces (BCI) have been designed for clinical and research purposes, partly due to their size and complexity. However, a new generation of consumer-oriented BCI has appeared for the video game industry. The MS, a headset with a single electrode, is based on electroencephalogram (EEG) readings that capture faint electrical signals generated by neural activity. The electrical signal across the electrode is measured to determine levels of attention (based on alpha waveforms) and then translated into binary data. This paper presents the results of an evaluation assessing the usability of the MS, in which a model of attention is defined to fuse attention signals with user-generated data in a Second Life assessment exercise. The results suggest that the MS provides accurate readings of attention, since there is a positive correlation between measured and self-reported attention levels. They also suggest that there are some usability and technical problems with its operation. Future research is presented, consisting of the definition of a standardized reading methodology and of an algorithm to level out the natural fluctuation of users' attention levels if they are to be used as inputs.
person interacting with it. The MS1 also provides a measurement of the user's meditative state (derived from alpha wave activity). In this paper, however, only the levels of attention are used, given their role and importance in educational settings. The objective of this study is threefold: first, the general usability of the MS is examined; second, we analyze how well information generated as part of normal interactions can be fused with brain activity; third, we analyze the MS's adaptability to different able users. The significance of this work lies in presenting evidence of the usability of a commercially available BCI and of its suitability for incorporation into serious games. The paper is organized in five sections. Section 2 presents a literature review of Brain Computer Interfaces and their use for learning. Section 3 describes the assessment exercise used as a test bed and presents the materials, participants, and methodology followed during the evaluation. Section 4 presents the results of the evaluation, and section 5 provides the conclusions and future research.
2 Brain Computer Interfaces (BCI)

Brain Computer Interface (BCI) technology represents a rapidly emerging field of research, with applications ranging from prosthetics and control systems [6] through to medical diagnostics. This study only considers BCI technologies that use sensors measuring and interpreting brain activity (commonly termed neural bio-recorders [14]) as a source of input. The longest-established method of neural bio-recording, developed in 1927 by Berger [3], is the application of electrodes that measure the changes in field potential over time arising from synaptic currents; this forms the basis for EEG. In the last two decades, advances in medical imaging technology have presented a variety of alternative means of bio-recording, such as functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), and positron emission tomography (PET). A fundamental difference between bio-recording technologies used for diagnostic imaging and those used for BCI applications is the typical requirement for real or quasi-real-time performance in order to translate user input into interactive responses. In 2003, a taxonomy by Mason and Birch [8] identified MEG, PET, and fMRI as unsuitable for BCI applications, due to the equipment required to perform and analyze the scan in real time, but more recent attempts to use fMRI as a BCI input device have demonstrated significant future potential in this area [12]. Bio-recording BCIs have become a topic of research interest both as a means of obtaining user input and for studying responses to stimuli. Several studies have already demonstrated the ability of an EEG-based BCI to control a simple pointing device similar to a mouse [9, 12], and advancing these systems to give users more accurate and responsive control is a significant area for research. Of particular interest to this study is the use of BCI technologies in learning-related applications. The recent use of fMRI to decode mental [4] and cognitive [11] states illustrates a definite capability to measure affect through bio-recording, but the intrusiveness of the scanning equipment makes it difficult to utilize the information gained to provide feedback to a user performing typical real-world learning activities. In this study, the effectiveness of one of the first commercially available lightweight EEG devices, NeuroSky's MS, is considered. Via the application of a single
1 The MS is a developer-only headset. NeuroSky's newest headset has been designed to address comfort and fitting problems and is available to both developers and consumers.
Assessing NeuroSky’s Usability to Detect Attention Levels
151
electrode and signal-processing unit in a headband arrangement, the MS provides two 100-state outputs operating at 1 Hz. These outputs are described by the developers as separate measures of 'attention' and 'meditation', and it is thus assumed these readings are inferred from processing beta and alpha wave activity, respectively. Although the MS provides a much coarser picture of brain activity than multi-electrode EEG or the other aforementioned technologies, its principal advantage is its unobtrusive nature, which minimises the aforementioned difficulties in conducting accurate user studies caused by the stress or distraction induced by the scanning process. Research into EEG biofeedback as a tool to aid individuals with learning difficulties [5] represents an area of ongoing study, and the future widespread availability of devices similar to the MS to home users presents an interesting opportunity to utilize these technologies in broader applications.
3 An Assessment Exercise in Second Life

An assessment exercise was developed to examine the MS. The exercise works in combination with a model of attention [10] built around dynamic variables generated by the learner's brain (MS inputs) and the learner's actions in a computer-based learning situation. Combining physiological (attention) variables with data variables is not new [7, 1]. Our approach, however, fuses MS readings (providing a more accurate reading of the learner's attention based on neural activity) with user-generated data: attention readings are combined with information such as the number of questions answered correctly (or incorrectly) and the time taken to answer each question to model attention within the assessment exercise. The MS reports attention levels on an arbitrary scale ranging from 0 to 100. There is an initial delay of between 7 and 10 seconds before the first value reaches the computer, and new attention values are calculated at a rate of 1 Hz (one value per second; see Figure 1). A value of -3 indicates that no signal is being read, and values equal to or greater than 0 indicate increasing levels of attention, with a maximum of 100. Given the dynamic nature of the attention patterns and the potentially large data sets obtained, the model of attention underpinning the assessment exercise is associated with a particular learning episode lasting more than one second. The model of attention not only determines (detects) attention patterns but also provides (reacts with) feedback to the learner [10].

The assessment exercise presents a Second Life AI-driven avatar able to pose questions, use a pre-defined set of reactions, and hold limited conversations with learners in Second Life. The AI-driven avatar was programmed in C# in combination with the libsecondlife library, a project aimed at understanding and extending Second Life's client to allow the programming of features using the C# programming language. This tool enables the manipulation of avatars' behaviors so that they respond to other avatars. To do so, the AI-driven avatar collects user-generated data, including MS inputs, during the interaction. The current implementation of the AI-driven avatar asks questions in a multiple-choice
format, while dynamically collecting information (answers to questions, time taken to respond, and whether users fail to answer). The data generated by the MS are transmitted to the computer via a USB interface and organized via a C# class that communicates with the AI-driven avatar. In this way, the model of attention is updated dynamically, considering input from the MS as well as the learner's performance behavior, while underpinning the AI-driven avatar's behavior.

Fig. 1. Attention readings as read by the NeuroSky

For the purposes of assessing the MS's usability, the assessment exercise consisted of ten questions in the area of Informatics, specifically on algorithms. This area was targeted because first-year students in the Informatics department often struggle with the conception and definition of algorithms, a fundamental part of programming. The assessment exercise asked nine theoretical questions, each with three possible answers. For example, the avatar would ask 'What do you call a finite and ordered number of steps to solve a computational problem?' while offering 'a) Program, b) Algorithm, c) Programming language' as possible answers. The assessment exercise also included the resolution of one practical problem, answered by the learner by hand while still wearing the MS.

3.1 Materials

To evaluate the MS's reliability, two adaptations of the Attention Deficit and Hyperactivity Disorder (ADHD) test and a usability questionnaire were defined. The attention tests consisted of seven items based on the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders) criteria [2]. The items chosen for the attention test were: 1. difficulty staying in one position; 2. difficulty sustaining attention; 3. difficulty keeping quiet, often interrupting others; 4. difficulty following through on instructions; 5. difficulty organizing tasks and activities; 6. difficulty with, or avoidance of, tasks that require sustained mental effort; and 7. difficulty listening to what is being said by others. Each item was adapted to assess attention both in class and at interaction time. To answer individual questions, participants were asked to choose the degree which they believed reflected their behavior on a 5-point Likert-type scale. For example, question 1 of the attention questionnaire asked the participant: 'How often is it difficult for me to remain seated in one position whilst working with algorithms in class/during the interaction?' with the answers 1) all the time, 2) most of the time, 3) some times, 4) occasionally, and 5) never. Note that for both questionnaires the same seven questions were asked, rephrased to refer to the class for the pre-test
Diagnostic and Statistical Manual of Mental Disorders.
Assessing NeuroSky’s Usability to Detect Attention Levels
153
or the interaction for the post-test. The usability questionnaire consisted of three questions adapted from three principles of usability: (a) comfort of the device; (b) ease of wearing; and (c) degree of frustration. To answer the usability questionnaire, participants were asked to select the degree to which they felt the MS fared during the interaction on a 5-point Likert-type scale. For example, question 1 of the usability questionnaire asked the student: 'Was using the NeuroSky...' 1) Very uncomfortable, 2) Uncomfortable, 3) Neutral, 4) Comfortable, 5) Very comfortable. Note that to report the usability of the MS, other factors were also considered, such as battery life, light indicators, and data read/write times and intervals.

3.2 Participants and Methodology

An evaluation (N=40) to assess the usability of the MS was conducted among first-year undergraduate students in the Informatics Department at the University of Veracruz, Mexico. The population consisted of 28 males and 12 females; 38 were undertaking the first year of their studies and 2 the third year. 26 students (65%) were 18 years old, 12 students (30%) were 19 years old and 2 students (5%) were 20 years old. The participants interacted with the AI-driven avatar for an average of 9.48 minutes, answering ten questions posed by the avatar within the assessment exercise (see the previous section). During the experiment, the following procedure was followed: 1) students were asked to read the consent form specifying the objectives of the study and were prompted to either agree or disagree; 2) students were asked to solve an online pre-test consisting of the adapted ADHD questionnaire to assess their attention levels in class; 3) students were instructed on how to use the learning environment; and finally 4) students were asked to answer an online post-test consisting of the usability questionnaire and the adapted ADHD questionnaire to assess their attention levels during the interaction in the assessment exercise. Individual logs registering the students' answers and attention levels as read by the MS were kept for analysis. All students agreed to participate in the experiment, but in some cases (N=6) the data was discarded since the MS did not produce readings for these participants. See the Results section for a description of these problems. Cases with missing data were not considered in the analysis.
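As a rough illustration of how such logs might be cleaned before analysis, the sketch below drops the -3 no-signal markers and excludes participants with no usable readings. This is not the authors' implementation; the one-reading-per-line log format is an assumption.

```python
# Minimal sketch of cleaning MindSet-style attention logs.
# Assumption (not from the paper): one integer reading per line; the
# device emits one value per second and -3 whenever no signal is read.

def load_attention_log(path):
    """Read one participant's log into a list of integer readings."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]

def clean_readings(readings, no_signal=-3):
    """Keep only valid attention values (0-100); None if none remain."""
    valid = [r for r in readings if r != no_signal and 0 <= r <= 100]
    return valid or None

def mean_attention(valid):
    """Average attention level over the session (readings at 1 Hz)."""
    return sum(valid) / len(valid)
```

Participants whose logs yield no valid readings, as happened in 6 of the 40 cases here, would simply be excluded from further analysis.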
4 Results

The results of this evaluation are organized around the MS's usability, how well the model fuses user-generated data with attention readings, and the MS's adaptability.

4.1 Usability and Appropriateness of the MS for Assessment Exercises

The main aspect of interest was the MS's usability, considering the responses to three questions (see the Materials section). This questionnaire considered three aspects commonly used to assess the usability of new computer-based devices: comfort, ease of use, and degree of frustration. The answers to the questionnaire are organized around each aspect considered; there was one question associated with each usability aspect.
Comfort. The results showed that for 5% (N=2) the MS was uncomfortable, for 10% (N=4) somewhat uncomfortable, for 35% (N=14) neither comfortable nor uncomfortable, for 25% (N=10) somewhat comfortable and for 25% (N=10) comfortable.

Ease of Use. The results showed 15% (N=6) of students found the MS difficult to wear, 12.5% (N=5) found it somewhat difficult to wear, 37.5% (N=15) thought it was neither easy nor difficult to wear, 12.5% (N=5) found it somewhat easy to wear and 22.5% (N=9) thought it was easy to wear.

Degree of Frustration. The answers showed 2.5% (N=1) found the experience frustrating, 2.5% (N=1) thought it was somewhat frustrating, 22.5% (N=9) found the experiment neither frustrating nor satisfactory, 25% (N=10) thought it was somewhat satisfactory and 47.5% (N=19) had a satisfactory experience using the MS.

There were three aspects that only became apparent once the evaluation was over. The first concerns the pace at which readings were collected. The attention model [10] considered readings over the span of time used by learners to formulate an answer to each question. The model sampled the data at 10 Hz, which produced repeated measurements in some logs. This way of collecting data is inefficient, as it makes it difficult to plot attention fluctuations over fixed, regular intervals. People interested in programming the MS device should consider that, due to a hardware processing delay, the MS outputs values at 1 Hz, and should design their algorithms accordingly. The second aspect concerns difficulties wearing the device. When the connection is lost, there is a delay of 7-10 seconds before a new reading is provided. Designers should take this into account, as a constant input stream might not be possible. The third aspect refers to the MS's suitability as an input device for interface control. Developers need to consider that attention levels (and associated patterns) vary considerably between users (see Figure 2), as expected. If developers employ higher levels of attention as triggers for interface or system changes, they should consider that some users normally exhibit higher levels of attention without being prompted to attend more closely. This normal variability creates the need to research and develop an algorithm to level out initial differences in attention levels and patterns. On a related topic, the MS's readings vary on a scale from 0 to 100 (see Figure 1); however, it is not yet clear what relationship exists between wave activity and the processed output, whether the scale is linear, or whether the granularity of the 100-point scale is appropriate for all users. Finally, there were some usability problems that caused data loss, in particular: 1. In 3 cases the MS did not fit the participant's head properly, leading to mid-session adjustments by the participants and, in turn, intermittent and unreliable readings. Participants with longer hair also had problems wearing the device in a way that allowed the sensors to touch the skin behind the ears at all times; during the experiment, extra time was required to make sure these participants placed the device adequately. 2. In another 3 cases the MS ran out of battery. The battery was checked with NeuroSky's associated software before each participant interacted with the assessment exercise. However, despite the precautions taken, and after having checked the green light on one side of the device, battery life was very short. The
device does not alert the user when battery levels are low, so it was not clear when batteries needed to be replaced. This was a problem at the beginning of the experiment, but later on batteries were replaced on a daily basis.

4.2 Adaptability to Different Users

One of the characteristics of the MS reader is that it can be worn by different users, producing different outputs. This would allow for adaptation of the model [10] in the frame of the assessment exercise. It was expected that MS outputs would vary for different users, reflecting varying levels of attention, and that this adaptation would be fast and seamless, without the need to train the device for a new user. To shed light on the issue of adaptability, it was hypothesized that attention readings would differ between individuals. It was also hypothesized that there would be a positive correlation between the readings and the self-assessed attention test (see the Materials section). To assess variability among participants, a test of normality was performed to examine the distribution of the participants' average attention levels. Table 1 shows descriptive statistics of the readings for the population (N=34). The results of a test of normal distribution showed that the data is normally distributed (Shapiro-Wilk = .983, p = .852), suggesting there is no tendency to replicate particular readings. Figure 2 illustrates the Q-Q plot for this sample, suggesting a good distribution of average attention levels during the assessment exercise.

Table 1. Descriptive statistics for average attention readings and self-reported attention (N=34)
Another test, designed to see whether MS readings adapted to individual participants, was a correlation between the readings and self-reported attention from the post-test questionnaire. A positive correlation was expected between these two variables. Table 1 shows the descriptive statistics for the two variables. To calculate self-reported attention levels, the mean of the answers to the 7 items of the attention post-test was calculated per participant; lower values indicate lower attention levels. The results of a Pearson's correlation between the two variables indicated a significant correlation (Pearson's r = -.391, p = .022).

Fig. 2. Q-Q plot of students' average attention levels during the assessment exercise
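Both statistical checks in this section, the Shapiro-Wilk test of normality on average attention levels and the Pearson correlation with self-reported attention, are available in standard packages. A sketch using SciPy follows (the paper does not name its statistics software); the data below are synthetic placeholders standing in for the N=34 participants.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder data; the real values would come from the cleaned MS logs
# and from the mean of the 7 post-test attention items per participant.
avg_attention = rng.normal(55.0, 12.0, size=34)
self_reported = rng.normal(3.5, 0.8, size=34)

w, p_norm = stats.shapiro(avg_attention)                  # normality test
r, p_corr = stats.pearsonr(avg_attention, self_reported)  # correlation
print(f"Shapiro-Wilk W = {w:.3f} (p = {p_norm:.3f})")
print(f"Pearson r = {r:.3f} (p = {p_corr:.3f})")
```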
4.3 Fusing User-Generated Information with MS Readings

One way to analyze whether the data was fused correctly was to check the logs for missing or incorrect data. The results of this analysis showed that there were six participants (15% of the original sample, N=40) for which the MS did not produce accurate readings. An analysis of the logs for the remaining participants (N=34) showed the device produced readings throughout the length of the experiment (average time = 9.48 minutes) without a single erroneous datum (attention = -3). The lack of readings in the 6 cases was due to the usability problems described in Section 4.1. Another way of assessing how well the MS readings and user-generated data were fused consisted of analyzing the logs to see whether there was variation in the model's reactions across the sample. Since the reactions given by the AI-driven avatar could be of six types [10], the frequency of each reaction type was calculated for the entire population with correct NeuroSky readings (N=34); see Table 2.
Table 2. Frequencies associated with the model's reaction types for the population (N=34)

Reaction Type | 6 | 5 | 4 | 3 | 2 | 1
Frequency | 128 | 172 | 0 | 77 | 13 | 0
It was expected that the frequencies for Reaction Types 4 and 1 would be 0, given the averages of the four binary inputs. Reaction Type 5 was the most common, followed by Types 6, 3 and 2. Given the 8 possible results of averaging out the four binary inputs [10], it was expected that Reaction Type 3 would be the most frequent. However, this was not the case, suggesting the model did vary and that the reaction types provided were in accordance with the variations in attention, time, and whether answers were correct. Finally, the responses to two questions in the post-test questionnaire gave an indication of students' subjective perceptions of how adequate the reaction types were to their attention needs. The first question asked: 'How frequently did the reactions help you realize there was something wrong with the way you were answering the questions?' The answers showed 25% (N=10) of students felt the avatar helped them all the time, 20% (N=8) said most of the time, 35% (N=14) said sometimes, 12.5% (N=5) said rarely and 7.5% (N=3) said never. The second question asked how appropriate they thought the combined use of the MS and avatars was for computer-based educational purposes. Students answered with 65% (N=26) saying it was appropriate, 12.5% (N=5) saying it was appropriate most of the time, 15% (N=6) saying it was neither appropriate nor inappropriate, 2.5% (N=1) saying it was somewhat inappropriate and 5% (N=2) saying it was inappropriate.
5 Conclusions and Future Work

The reliability of MS readings for assessing attention levels and amalgamating them with user-generated data was evaluated in an assessment exercise in Second Life (N=34). The results showed there is variability in the readings and that they correlate with self-reported attention levels, suggesting the MS adapts to different users and provides accurate readings of attention. The results of analyzing the device's usability suggest some users
have problems wearing the device due to head size or hair interference, and that the device's signals indicating flat batteries are poor. By analyzing individual logs it was possible to determine that, when the device fits properly, the MS provides valid and constant data as expected. Log analyses also helped establish the frequency with which different reaction types were provided in the exercise in the light of attention variability. The frequencies suggested the model did not lean towards the most expected reaction (Type 3) but tended to be distributed amongst Reaction Types 5 and 6, providing an indication that user-generated data fused adequately with attention readings. When asked about their experience, 35% of the population said the avatar helped them realize there was something wrong with how they were answering the questions, and 65% indicated that using the MS in combination with avatars was appropriate in computer-based educational settings. When asked about comfort, 35% thought the device was neither comfortable nor uncomfortable, 37.5% thought it was neither easy nor difficult to wear, and 47.5% said they had a satisfactory experience with the device. There were other results that became apparent only after the evaluation. In particular, it was found that: 1) sampling rates need to be considered so as to organize data in fixed, regular intervals to determine attention; 2) developers need to be aware there is a delay when readings are lost due to usability issues; and 3) variability imposes new challenges for developers who wish to use levels of attention as input to control or alter interfaces. Future work includes the combination of MS readings with other technologies, such as the learner's gaze, body posture and facial expressions, to read visual attention. Future work will also be carried out to determine the degree of attention variability in order to program an algorithm capable of leveling out different patterns of attention. In addition, future work will explore how attention data can be used to develop learner models that help in understanding attention and engagement, informing game-based learning design and user modeling.
Acknowledgements The research team thank the NeuroSky Corporation for providing the device for testing purposes. We also thank the students, lecturers and support staff at the Faculty of Informatics, Universidad Veracruzana, Mexico.
References
1. Amershi, S., Conati, C., McLaren, H.: Using Feature Selection and Unsupervised Clustering to Identify Affective Expressions in Educational Games. In: Workshop on Motivational and Affective Issues in ITS, 8th International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan (2006)
2. American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Press (1994)
3. Berger, H.: On the electroencephalogram of man. In: Gloor, P. (ed.) The Fourteen Original Reports on the Human Electroencephalogram, Amsterdam (1969)
4. Haynes, J.D., Rees, G.: Decoding mental states from brain activity in humans. Nature Reviews Neuroscience 7(7) (2006)
5. Linden, M., Habib, T., Radojevic, V.: A controlled study of the effects of EEG biofeedback on cognition and behavior of children with attention deficit disorder and learning disabilities. Applied Psychophysiology and Biofeedback 21(1) (1996) 6. Loudin, J.D., et al.: Optoelectronic retinal prosthesis: system design and performance. Journal of Neural Engineering 4, 72–84 (2007) 7. Manske, M., Conati, C.: Modelling Learning in an Educational Game. In: 12th Conference on Artificial Intelligence in Education, IOS Press, Amsterdam (2005) 8. Mason, S.G., Birch, G.E.: A general framework for brain-computer interface design. IEEE Transactions on Neural Systems and Rehabilitation Engineering 11, 70–85 (2003) 9. Poli, R., Cinel, C., Citi, L., Sepulveda, F.: Evolutionary brain computer interfaces. In: Giacobini, M. (ed.) EvoWorkshops 2007. LNCS, vol. 4448, pp. 301–310. Springer, Heidelberg (2007) 10. Rebolledo-Mendez, G., De Freitas, S.: Attention modeling using inputs from a Brain Computer Interface and user-generated data in Second Life. In: The Tenth International Conference on Multimodal Interfaces (ICMI 2008), Crete, Greece (2008) 11. Sona, D., Veeramachaneni, S., Olivetti, E., Avesani, P.: Inferring cognition from fMRI brain images. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D.P. (eds.) ICANN 2007. LNCS, vol. 4669, pp. 869–878. Springer, Heidelberg (2007) 12. Sitaram, R., et al.: fMRI Brain-Computer Interfaces. IEEE Signal Processing Magazine 25(1), 95–106 (2008) 13. Trejo, L.J., Rosipal, R., Matthews, B.: Brain-computer interfaces for 1-D and 2-D cursor control: designs using volitional control of the EEG spectrum or steady-state visual evoked potentials. IEEE Transactions on Neural Systems and Rehabilitation Engineering 14(2), 225–229 (2006) 14. Vaughan, T., et al.: Brain-computer interface technology: a review of the second international meeting. IEEE Transactions on Neural Systems and Rehabilitation Engineering 11(2), 94–109 (2003)
Effect of Body Movement on Music Expressivity in Jazz Performances Mamiko Sakata1, Sayaka Wakamiya2, Naoki Odaka2, and Kozaburo Hachimura3 1
Faculty of Culture and Information Science, Doshisha University, 1-3 Tatara Miyakodani, Kyotanabe City, 610-0394 Japan [email protected] 2 Graduate School of Human Development and Environment, Kobe University [email protected], [email protected] 3 College of Information Science & Engineering, Ritsumeikan University [email protected]
Abstract. In this study, we tried to examine empirically how body motion contributes to music expressivity, both in terms of intensity and manner, during impromptu jazz performances. Psychological rating experiments showed that music expressivity in jazz performances is assessed in two aspects, namely power and aesthetic quality. In the assessment of musical performances, the music itself basically contributed to how observers evaluated its expressivity. However, it was also shown that body motion had a greater influence on assessing the quality of music in terms of 'hard or soft' and 'light or heavy.' As a result of the three-dimensional motion analysis using motion capture, we learned that the characteristics of the player's body motions changed with the playing style and the playing dynamics. The player, therefore, is making music not only by producing the 'sound,' but also by showing 'body motions' for creating that sound. Keywords: Jazz Performances, Music Expressivity, Body Movement, Motion Capture.
words, the visual expression of body movements. Recently, some studies (for example, those by Davidson [1], Okada [2], and Maruyama [3]) have made some reference to the role played by the body in musical performances, but such works are sporadic, and the discussion has really only just begun. This study aims to illuminate 'the visual roles of body movements during impromptu performances of jazz music' and to show empirically the modes and intensity of body movements that contribute to music expressivity. For this purpose, we employ 'Kansei' information processing techniques: motion capturing, feature extraction from motion data and statistical analyses. 'Kansei' is a Japanese word whose meaning is close to 'feeling' or 'sensibility' in English. Kansei information processing is a method of extracting features related to the Kansei conveyed by the media we receive. Conversely, it is also a method of adding or generating Kansei factors in media produced by computers [4]. We employ motion capturing techniques for obtaining images of human body motions. This technique is now commonly used in movie and CG animation production, and several systems are commercially available. This study uses motion capturing to analyze jazz performances and quantitatively analyze the roles played by body movements.
2 Study Subjects

For study subjects, we prepared materials with the help of professional jazz musicians in order to study the role of body movements in music expressivity. We asked a male alto-saxophone player (24 years old), a 10-year veteran, to play the jazz standard 'Summertime.' He played the front theme1, an ad-lib solo in the middle and the back theme for two choruses (with each chorus containing 16 bars). He was asked to play them in three different modes: 'ordinary,' 'expressionless' and 'over expressive.' In this study, these three modes of expression are defined as 'expression dynamics.' In order to retain the characteristic feature of freestyle jazz performance, in which players perceive each other's music expressivity in real time and respond to or follow one another, we asked other players to join in the performance with our subject. The backing band consisted of a drummer, a wood bass player and a guitarist. The drummer was asked to keep a BPM=120 tempo, and all the other players were asked to follow the alto-saxophonist's performance.
3 Motion Capture System

We used an optical motion capture system (Motion Analysis Corporation, EvaRT with Eagle cameras) to measure body movements during a jazz performance. Fig. 1 shows a scene from the motion capturing session in our studio. Reflective markers were attached to the joints of the player's body, and several high-precision, high-speed video cameras were used to track the motion. In our case, 33 + 2 (on the instrument) markers were put on the player's body (see Fig. 2), and the movement was
1 The pre-composed part of a jazz number is called the "theme". In common jazz performances, musicians play the theme first, then the solo, and then the theme again.
measured with 10 cameras. The acquired data can be viewed as time series of the three-dimensional coordinate values (x, y, z) of each marker in each frame (frame rate: 120 fps).
Fig. 1. Motion Capture
Fig. 2. Positions of markers
4 Psychological Rating Experiments

In order to determine what kinds of impressions are perceived from jazz performances, the object of our study, and how the different modalities, namely the 'sound' and the 'body,' contribute to musical expressivity, we conducted psychological rating experiments. Thirty-eight observers (20 men and 18 women) participated in this experiment. The mean and standard deviation of age among the 38 observers were 20.5 and 1.43, respectively. All observers had some training in jazz.

4.1 Stimuli for Experiments

The motion measured by the motion capture system described above was filmed by a digital camera (Sony) and then edited to produce experimental stimuli. The stimuli were obtained by editing the performances of the front theme and ad-lib solo played with the three different expression dynamics, and then further edited into the three modalities of 'sound only,' 'visual images only' and 'sound and visuals.' Table 1 shows the order in which the stimuli were presented (the modalities, dynamics and styles) and the length of each stimulus.

4.2 Procedure

We briefed the observers on the experiment, asked them to answer questions concerning their personal attributes and then presented the stimuli, one type at a time, in the order 'visual images only' -> 'sound only' -> 'sound and visuals.' We provided an interval after showing each type of stimulus twice. In the first showing of the stimuli, the subjects were asked to closely and carefully observe the stimuli. In the second showing, they were asked to fill in the Answer Sheet using the Assessment Words on
162
M. Sakata et al.
a scale from 1 to 7 for each word in the adjective pairs, which are shown in Table 2. The videotaped recording was temporarily stopped during the intervals between showing the different types of stimuli. The new stimulus was presented after making sure all subjects had finished filling in their Answer Sheet.

Table 1. Order of stimuli [duration values not recoverable]

Order | Modality | Expression dynamics | Style
1 | visual images only | ordinary | solo
2 | visual images only | expressionless | theme
3 | visual images only | over expressive | theme
4 | visual images only | over expressive | solo
5 | visual images only | expressionless | solo
6 | visual images only | ordinary | theme
7 | sound only | expressionless | solo
8 | sound only | over expressive | theme
9 | sound only | ordinary | theme
10 | sound only | over expressive | solo
11 | sound only | expressionless | theme
12 | sound only | ordinary | solo
13 | sound and visuals | over expressive | theme
14 | sound and visuals | expressionless | solo
15 | sound and visuals | over expressive | solo
16 | sound and visuals | expressionless | theme
17 | sound and visuals | over expressive | theme
18 | sound and visuals | ordinary | solo

Table 2. Assessment Words (20 adjective pairs)

loose-tight, soft-hard, powerful-weak, clear-unclear, impressive-unimpressive, have presence-no presence, neat-messy, plain-passionate, happy-sad, light-heavy, unique-ordinary, fantasy-like-realistic, rich-poor, beautiful-ugly, fast-slow, warm-cold, subdued-bright, inarticulate-articulate, favorable-unfavorable, good-bad
2 In defining the Assessment Words used in our experiment, we referred to Iwamiya [5] and added 10 pairs of adjectives of our own to create a group of Assessment Words, which we used in preliminary experiments. After running a factor analysis, we deleted the terms that did not load on any of the factors, removed one of each pair of highly correlated terms, and finally selected 20 pairs of adjectives.
4.3 Results of Kansei Assessment Experiment

The results of the assessments of each stimulus were converted into scores from 1 to 7 using the SD method. We also obtained the average for each adjective pair.

Extraction of KANSEI information from the stimuli. After conducting a principal component analysis based on the Kansei Assessment Scores obtained, we extracted two principal components with eigenvalues greater than 1. (The cumulative contribution rate was 0.879 up to the second principal component.) Table 3 shows the factor loading of each word pair on the two principal components. In the original table, loadings with a magnitude larger than 0.8 were shaded to mark the word pairs rated significant for each principal component.

Table 3. Results of PCA for the rating experiment

Assessment Words | PC1 | PC2
loose-tight | .882 | .122
soft-hard | .650 | .025
powerful-weak | .959 | -.141
clear-unclear | .948 | .252
impressive-unimpressive | .964 | .108
have presence-no presence | .974 | .158
neat-messy | -.649 | .737
plain-passionate | .932 | .102
happy-sad | .749 | -.524
light-heavy | .673 | -.385
unique-ordinary | .974 | -.076
fantasy-like-realistic | .859 | .036
rich-poor | .789 | .589
beautiful-ugly | .308 | .942
fast-slow | .743 | -.653
warm-cold | .618 | .640
subdued-bright | -.701 | .706
inarticulate-articulate | -.464 | .853
favorable-unfavorable | .490 | .846
good-bad | .468 | .865
Eigenvalue | 11.708 | 5.877
Variance (%, cumulative) | 58.539 | 87.925

Table 4. Result of multiple regression analysis (dashes mark adjective pairs for which no significant regression equation was obtained)

Assessment Words | Standardized coefficient, Sound | Standardized coefficient, Visuals | Adjusted R2
loose-tight | 0.851 | 0.204 | 0.931
soft-hard | 0.593 | 0.601 | 0.897
powerful-weak | 0.926 | 0.055 | 0.876
clear-unclear | 0.942 | 0.081 | 0.932
impressive-unimpressive | 0.999 | -0.031 | 0.928
have presence-no presence | 0.931 | 0.077 | 0.926
neat-messy | 0.773 | 0.222 | 0.914
plain-passionate | 0.841 | 0.193 | 0.959
happy-sad | 0.617 | 0.433 | 0.847
light-heavy | 0.587 | 0.656 | 0.959
unique-ordinary | - | - | -
fantasy-like-realistic | - | - | -
rich-poor | 1.028 | -0.047 | 0.991
beautiful-ugly | 0.943 | 0.051 | 0.945
fast-slow | 0.670 | 0.348 | 0.861
warm-cold | - | - | -
subdued-bright | 0.657 | 0.405 | 0.910
inarticulate-articulate | 0.687 | 0.341 | 0.969
favorable-unfavorable | 0.915 | 0.102 | 0.898
good-bad | 0.884 | 0.145 | 0.988
From Table 3, one can interpret PC1 as the variable concerned with the 'power' of a musical performance, and PC2 as the variable concerned with 'aesthetic quality.' We plotted the principal components PC1 and PC2 on the x- and y-axes and plotted the 18 types of stimuli on a graph (see Fig. 3). On this graph, the presence or power of a performance increases as you move toward the right, while the aesthetic quality increases as you move upward. From the PCA results, it became clear that musical expressivity in jazz performance is perceived from two aspects, namely 'power' and 'aesthetic quality.'
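The component extraction reported above can be reproduced along the following lines, assuming the ratings form an 18 stimuli x 20 adjective-pairs matrix of mean SD-method scores. The scikit-learn call is illustrative; the paper does not state which software was used, and the data here are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
ratings = rng.uniform(1, 7, size=(18, 20))   # placeholder rating matrix

pca = PCA().fit(ratings)
eigenvalues = pca.explained_variance_
kept = int(np.sum(eigenvalues > 1))          # eigenvalue-greater-than-1 rule
cumulative = np.cumsum(pca.explained_variance_ratio_)[kept - 1]

# Factor loadings as reported in Table 3: eigenvector * sqrt(eigenvalue).
loadings = pca.components_.T * np.sqrt(eigenvalues)
scores = pca.transform(ratings)[:, :kept]    # PC scores plotted in Fig. 3
print(f"components kept: {kept}, cumulative contribution: {cumulative:.3f}")
```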
Fig. 3. Plot of PCA score for each motion
The Body and the Music in Music Expressivity. To examine which of the two factors, 'body motions' and 'sound,' contributes more to music expressivity, we performed a multiple regression analysis for each of the adjective pairs listed in Table 2, using the Kansei Assessment Scores obtained by showing 'visual images only' (in other words, body motions only) and 'sound only' (in other words, music only) as independent variables. The Kansei Assessment Scores obtained by showing 'sound and visuals' were treated as dependent variables. As a result, we obtained a multiple regression equation (p < 0.05) with a high contribution rate for many of the adjective pairs, as shown in Table 4. Based on Table 4, we see that music contributes more than body motion (visual images) in the Kansei assessment as expressed by the words 'loose-tight,' 'powerful-weak,' 'clear-unclear,' 'impressive-unimpressive,' 'have presence-no presence,' 'neat-messy,' 'plain-passionate,' 'happy-sad,' 'rich-poor,' 'beautiful-ugly,' 'fast-slow,' 'subdued-bright,' 'inarticulate-articulate,' 'favorable-unfavorable' and 'good-bad.' On the other hand, body motion (visual images) contributes more than music in the Kansei assessment as expressed by words like 'soft-hard' and 'light-heavy.'
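For each adjective pair, this regression predicts the 'sound and visuals' ratings from the 'sound only' and 'visual images only' ratings. Below is a minimal NumPy sketch; standardizing all variables yields standardized (beta) coefficients of the kind reported in Table 4. The actual statistics package used is not stated in the paper.

```python
import numpy as np

def standardized_regression(sound, visuals, audiovisual):
    """Regress audiovisual ratings on sound-only and visuals-only ratings.
    Returns standardized (beta) coefficients and adjusted R^2."""
    z = lambda v: (v - v.mean()) / v.std(ddof=1)
    X = np.column_stack([z(sound), z(visuals)])
    y = z(audiovisual)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # no intercept: data centered
    resid = y - X @ beta
    n, k = len(y), X.shape[1]
    r2 = 1.0 - (resid @ resid) / (y @ y)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, adj_r2
```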
5 Feature Values for Body Motion In the current study, the angles of each body part (back, sides, and knee), velocity (finger tips, elbow, sacral, head, and toe) and the distance moved on the floor (heel movement per unit time) were adopted as “feature values for body motion.” 5.1 Extracting Physical Parameters Angle. This parameter shows how the various body parts change in a time-series manner during musical performances. In our study, we measured the angles of the back, the body’s side and the knee. In Fig. 2, the angle created by marker numbers 5, 18 and 31 shows the angle of the back. The angle created by marker numbers 7, 4 and
18 is the angle of the side of the body. The angle created by marker numbers 19, 21 and 31 is the angle of the knee. For example, in the case of the back, we set the origin at marker no. 18 (x2, y2, z2) and measured the angle θ formed between marker no. 5 (x1, y1, z1) and marker no. 31 (x3, y3, z3). We calculated cos θ using Equation (1), recovered the angle in radians with the arccosine, and then converted it to degrees:

\cos\theta = \frac{(x_1-x_2)(x_3-x_2)+(y_1-y_2)(y_3-y_2)+(z_1-z_2)(z_3-z_2)}{\sqrt{(x_1-x_2)^2+(y_1-y_2)^2+(z_1-z_2)^2}\;\sqrt{(x_3-x_2)^2+(y_3-y_2)^2+(z_3-z_2)^2}} \tag{1}

Velocity. This parameter shows the time-series change in the movement of the body parts during a musical performance. In the current study, we measured the speeds of the fingertips of the right hand (no. 11), elbow (no. 7), sacral (no. 18), head (no. 2) and toe (no. 27). For each marker, we obtained the Euclidean distance between consecutive frames from the data expressed in x, y and z coordinates, and multiplied this distance by the frame rate to obtain the time series of the velocity. When the x, y and z coordinates of a marker in frame i are written x_i, y_i and z_i, the distance d is

d = \sqrt{(x_{i+1}-x_i)^2+(y_{i+1}-y_i)^2+(z_{i+1}-z_i)^2} \tag{2}

Multiplying d by the frame rate of the motion data, 120 frames/sec, gives the velocity |v|:

|v| = 120\,d \tag{3}
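Equations (1)-(3) translate directly into code. The following is a minimal NumPy sketch, not the authors' implementation; the array layout of the marker data is an assumption, and the relative/absolute coordinate choices described next are left to the caller.

```python
import numpy as np

FPS = 120  # motion-capture frame rate

def joint_angle(p1, p2, p3):
    """Angle (degrees) at p2 formed by markers p1 and p3; Eq. (1).
    Each argument is an (n_frames, 3) array of x, y, z coordinates."""
    v1, v2 = p1 - p2, p3 - p2
    cos_t = np.sum(v1 * v2, axis=1) / (
        np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

def marker_speed(p):
    """Frame-to-frame speed |v| = d * 120; Eqs. (2)-(3)."""
    d = np.linalg.norm(np.diff(p, axis=0), axis=1)  # Euclidean distance
    return d * FPS
```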
For the elbow, we used relative coordinates with the shoulder as the origin. The sacral marker was set as the origin to obtain relative coordinates for the head and toe. This means that the velocities of the elbow, head and toe are expressed as relative velocities based on the shoulder and the sacral marker. We obtained the velocities of the fingertips of the right hand and of the sacral marker using absolute coordinates, with an origin determined in the capture area.

Floor Travel Distance. This parameter shows how much the player moved on the floor during the performance. We obtained the distance traveled by the left heel (no. 32) frame by frame.

5.2 Feature Values for Body Motion

By conducting a principal component analysis on the parameters (raw data) described in Section 5.1, we extracted three components with eigenvalues greater than 1. (The cumulative contribution rate up to the third principal component was 0.782.) From the factor loadings shown in Table 5, we can interpret PC1 as the component showing the velocity of the upper part of the body, PC2 as the component showing floor travel distance, and PC3 as the component showing the bending of the body.
Table 5. Results of PCA for the motion capture data

Physical parameters | PC1 | PC2 | PC3
Angles of the back | -.088 | .181 | .240
Angles of the body's side | .285 | .601 | .801
Angles of the knee | -.396 | .628 | -.107
Speed of the hand | .925 | .251 | -.053
Speed of the elbow | .922 | .278 | -.093
Speed of the head | .922 | .258 | -.155
Speed of the sacral | .913 | .065 | .000
Speed of the toe | .883 | -.312 | .091
Floor travel distance | .422 | -.744 | .190
Eigenvalue | 4.572 | 1.371 | 1.094
Variance (%, cumulative) | 50.798 | 66.029 | 78.186
Fig. 4. Plot of PCA score for each motion by motion capture data
In the left graph of Fig. 4, the centers of gravity for PC1 and PC2 are plotted on the x and y axes for the six types of stimuli. In this figure, the further right a stimulus lies, the greater the velocity of the upper part of the body; the lower it lies, the greater the floor travel distance. Looking at each stimulus, the player showed greater floor travel when playing solo than when playing the theme. In terms of expression dynamics, the velocity of the upper part of the body increased in the order 'ordinary' -> 'expressionless' -> 'over expressive.' Likewise, we plotted the centers of gravity for PC1 and PC3 on the x and y axes for each stimulus, shown in the right graph of Fig. 4. Here, the further right a stimulus lies, the greater the velocity of the upper part of the body, and the lower it lies, the greater the bending of the body. This graph shows that playing solo, rather than the theme, resulted in greater body bending, and that for both solo and theme the player's body bending was greatest in the 'ordinary' mode of playing.
5.3 Relationship between Kansei Assessment and Feature Values for Body Motion

We calculated the average and standard deviation of each of the nine parameters obtained in Section 5.1 for each stimulus. In order to examine the relationship between the Kansei Assessment and the feature values for body motion, we calculated the coefficient of correlation between the principal component scores of each performance obtained in Section 4 and the body-motion feature values of each performance (see Table 6). The shaded areas in the original table show the combinations with a significant correlation at the 5% level. For the Power component, we found a significant correlation with all body-motion parameters except 'average angle of the body's side,' 'average floor travel distance' and 'standard deviation of floor travel distance.' For the Aesthetics component, on the other hand, we could not find a correlation with any of the body-motion parameters.

Table 6. Correlation matrix between the principal component scores and the mean and SD of the nine body-motion parameters [correlation values not recoverable]
This means that in musical performance, body motion contributes in large measure to the Power component, but not to the Aesthetics component.
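The analysis behind Table 6 is a set of pairwise Pearson correlations with a 5% significance flag. A minimal SciPy sketch follows; the dictionary-based data layout is an assumption, not the authors' code.

```python
import numpy as np
from scipy import stats

def correlation_table(pc_scores, motion_features, alpha=0.05):
    """Correlate one component's scores with each body-motion feature.
    motion_features maps a feature name (e.g. 'mean back angle') to an
    array of per-performance values aligned with pc_scores."""
    table = {}
    for name, values in motion_features.items():
        r, p = stats.pearsonr(pc_scores, np.asarray(values))
        table[name] = (r, p, p < alpha)   # flag significance at 5%
    return table
```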
6 Discussion and Conclusion

In this study, we tried to examine empirically how body motion contributes to music expressivity, both in terms of intensity and manner, during impromptu jazz performances. Psychological rating experiments showed that music expressivity in jazz performances is assessed in two aspects, namely power and aesthetic quality. In the Kansei assessment of musical performances, the music itself basically contributed to how observers evaluated its expressivity. However, it was also shown that body motion
had a greater influence on assessing the quality of music in terms of 'hard or soft' and 'light or heavy.' As a result of the three-dimensional motion analysis using motion capture, we learned that the characteristics of the player's body motions changed with the playing mode and the playing dynamics. The player, therefore, is making music not only by producing the 'sound,' but also by showing 'body motions' for creating that sound. It was found that body motions play a great role in creating 'power,' but are not much related to 'aesthetic quality.' Naturally, the Kansei emanating from the sound itself is central to music expressivity. However, we have shown empirically that the body motions people make when making music also contribute greatly to music expressivity. This study offers a basic examination of the role of body motions in musical performances; however, many challenges and problems still remain to be explored.

Acknowledgments. This work was supported in part by a Grant-in-Aid for Scientific Research for Young Scientists (B), No. 19700493, from the Ministry of Education, Culture, Sports, Science and Technology, Japan. The authors would like to express their sincere gratitude to Mr. Junya Kondo for his cooperation with our research. Thanks are also due to Mr. Takahiro Yorino, Ms. Hitomi Toyoka and Mr. Naoyuki Okamoto for their kind help in the motion capture experiments.
References 1. Davidson, J.W.: Visual Perception of performance manner in the movements of Solo Musicians. Psychology of Music 21, 103–113 (1993) 2. Okada, A.: The body playing the piano. Shunjusha Publishing Company (2003) (in Japanese) 3. Maruyama, S.: The Embodied Sense of Music: Case Studies on the Rhetorical Function of Bodily Gestures by Highly Practiced Musicians. Cognitive studies: bulletin of the Japanese Cognitive Science Society 14(4), 471–493 (2007) (in Japanese) 4. Sakata, M., Hachimura, K.: KANSEI Information Processing of Human Body Movement. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4557, pp. 930–939. Springer, Heidelberg (2007) 5. Iwamiya, S.: Multimodal Communication of Music and Image. Kyushu University Press (2000) (in Japanese) 6. Iwamiya, S.: Design of Sounds. Kyushu University Press (2007) (in Japanese) 7. Nagashima, Y.: Drawing-in Effect on Perception of Beats in Multimedia. The Journal of the Society for Art and Science 3(1), 108–148 (2004)
A Method to Monitor Operator Overloading Dvijesh Shastri, Ioannis Pavlidis, and Avinash Wesley Computational Physiology Lab, Department of Computer Science, University of Houston, Houston, TX, 77204 {dshastri,ipavlidis,awesley}@uh.edu
Abstract. This paper describes research that aims to quantify stress levels of operators who perform multiple tasks. The proposed method is based on the thermal signature of the face. It measures physiological function from a standoff distance and therefore, it can unobtrusively monitor a machine operator. The method was tested on 11 participants. The results show that multi-tasking elevates metabolism in the supraorbital area, which is an indirect indication of increased mental load. This local metabolic change alters heat dissipation and thus, it can be measured through thermal imaging. The methodology could serve as a benchmarking tool in scenarios where an operator’s divided attention may cause harmful outcomes. A classic example is the case of a vehicle driver who talks on the cell phone. This stress measurement method when combined with user performance metrics can delineate optimal operational envelopes. Keywords: Human-Machine Interaction, divided attention, stress, thermal imaging.
2 Methodology

During dual tasking there is a considerable temperature increase in the supraorbital region of a participant. This locally elevated temperature is the result of increased metabolic activity due to activation of the forehead muscle group. The phenomenon is consistent with findings in prior experiments involving Stroop color conflict testing [1]. In that previous work, the stress signal was extracted from the evolution of the mean thermal footprint of the entire supraorbital region (see Fig. 1a). This approach, however, introduced noise into the extracted signal, partly due to the wide probing area and partly due to sub-optimal tracking performance [4]. In the present work, the tracking region was differentiated from the measurement region. An even bigger area that included sharp contrasts (e.g., skin versus hair) was selected for tracking, which improved tracking performance. However, only a small subset within the tracking region was selected for the thermal measurement. This subset was confined to the area where metabolic changes are most dramatic, to reduce the effect of probing noise (Fig. 1b).
Fig. 1. (a) Tracking and measurement regions coincide in the legacy method. (b) Measurement region (in pink) is a subset of the tracking region in the current method.
For every participant in the experiment, tracking and measurement regions of interest were selected as described above. The mean temperature of the measurement region of interest was computed for every frame in the thermal clip. Thus, a 1D supraorbital signal was produced from the 2D thermal data. Any residual noise in the supraorbital signal was suppressed by a noise cleaning algorithm based on Fast Fourier Transform (FFT) [5]. The supraorbital signal was split into segments corresponding to the phases of the experiment (resting, initial single task, dual task, latter single task, and cool-off). Each segment was approximated with a linear fit (Fig. 2), which described the local metabolic rate at the time.
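The pipeline in this section (per-frame ROI mean -> FFT-based noise suppression -> per-segment linear fit) can be sketched as follows. This is an illustration only: the cutoff frequency and segment boundaries are assumptions, not the authors' values.

```python
import numpy as np

def roi_signal(frames, mask):
    """Mean temperature of the measurement ROI for each thermal frame.
    frames: (n, H, W) temperatures; mask: boolean (H, W) ROI."""
    return frames[:, mask].mean(axis=1)

def fft_lowpass(signal, fps, cutoff_hz=0.1):
    """Suppress residual noise by zeroing FFT bins above a cutoff
    (cutoff chosen here for illustration)."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    spec[freqs > cutoff_hz] = 0
    return np.fft.irfft(spec, n=len(signal))

def segment_slopes(signal, fps, boundaries_sec):
    """Linear-fit slope (degrees/s) for each experimental segment,
    used as the local metabolic-rate indicator."""
    idx = [int(b * fps) for b in boundaries_sec]
    slopes = []
    for a, b in zip(idx[:-1], idx[1:]):
        t = np.arange(a, b) / fps
        slopes.append(np.polyfit(t, signal[a:b], 1)[0])
    return slopes
```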
3 Experimental Design

A high-quality Thermal Imaging (TI) system was used for data collection. The centerpiece of the TI system was a ThermoVision SC6000 Mid-Wave Infrared (MWIR) camera (FLIR Systems; sensitivity = 0.025 °C) [12]. The experimental protocol included thermal imaging of the participant's face while resting, engaging in a
driving simulation game (single task), engaging in a driving simulation game while talking on the cell phone (dual task), and relaxing. The dataset featured participants of both genders, different races, and varying physical characteristics. The participants were placed 6 feet away from the thermal camera (Fig. 3). We used an XBOX 360 game console and the game Test Drive: Unlimited to simulate driving. The participants were asked to follow all traffic signs, drive normally, and not race during the experiment. They were given an opportunity to test-drive before the experiment began to acquaint themselves with the driving simulation. In the first formal phase of the experiment, the participants were asked to rest for 5 min while being imaged. This helped to isolate the effects of other stress factors that participants may have brought with them. This was the baseline phase.
Fig. 2. Supraorbital raw thermal signal (marked in blue color), noise cleaned thermal signal (marked in pink color) and linear fitting (marked in yellow color). Slope values for the linear segments are shown in blue colored text.
Next, the participants were asked to play the driving simulation game. This phase of the experiment also lasted about 5 min. After around 1 min of driving simulation (the initial single task phase), the participant received a cell phone call that played a set of prerecorded questions in the following order:

Instruction: Please do not hang up until you are told so.
Q1: Are the lights ON in the room, yes or no?
Q2: Are you a male or female?
Q3: Who won the American civil war, the north or the south?
Q4: What is 11 + 21?
Q5: How many letters 'e' are in the word experiment?
Q6: I am the son of a mom whose mother in law's son hit. How am I related to the other son?
Q7: My grandma's son hit his son. How are the sons related?
Q8: A man is injured in 1958 and died in 1956. How is that possible?
Q9: What is 27 + 14?
Instruction: You may now hang up the phone and pay attention to the game.
The question set was a combination of basic, logical, simple math, and ambiguous questions. The order of the questions was designed to build up pressure on the participants. Additional pressure was created by repeating once every question that was answered incorrectly. The participants were supposed to keep driving while talking on the cell phone (the dual task phase). At the end of the phone conversation, participants put the phone down and continued driving until the end of the experiment (the latter single task phase). Finally, the participants relaxed for 5 minutes. The purpose of this so-called cool-off segment was to monitor physiological changes after the simulated driving experiment.
4 Experimental Results

The slopes of the linearly fitted segments, computed according to the method described in Section 2, were used as stress indicators. Fig. 4 shows the mean slope values of the various segments for the entire data set (a statistically constructed mean participant). The graph clearly indicates that the temperature increase during dual tasking is the highest among all segments (the sole exception was participant S-6). Since the temperature increase is correlated with metabolic rate, the results indicate elevated metabolism in the supraorbital region during dual tasking. This is presumably due to strong muscle activation associated with frowning, a facial expression autonomically linked to mental engagement. Stress during the latter single task phase was stronger than during the initial single task phase (Fig. 4). Apparently, this was due to a residual effect from the dual task that preceded the latter single task phase. Most of the participants admitted during debriefing that they were thinking about their dual task performance while performing the latter single task.
Fig. 4. Mean slope value of the experimental segments. This stress indicator is the highest during dual tasking.
Interestingly, baseline stress was a bit higher than the initial single task stress. This indicates either that the baseline was poorly designed (just sitting idle can be stressful) or that participants carried some residual stress from the informal test-driving phase that preceded it. Participant S-6 is an interesting case. It appears that his stress started decreasing in the middle of dual tasking (Fig. 5). On careful examination of the data, the investigators found that this is the only participant who started perspiring in the middle of the experiment, apparently due to overwhelming stress. Perspiration reduces the thermal signature, and the current method wrongly interprets this as a lower metabolic rate and thus lower stress, when it is exactly the opposite. A method that identifies the emergence of perspiration and switches measurement metrics is needed to overcome this issue. In all cases, the rate of thermal change of the cool-off segment had an opposite global trend to that of the dual task segment. In most cases, the rate of thermal change of the cool-off segment also had an opposite global trend to those of the initial and latter single task segments. This illustrates that the participants indeed felt relaxed after 5 minutes of intense mental activity, but to various degrees. Those participants who were thinking about their performance during the cool-off period exhibited slow recovery. This is an interesting finding, as it illustrates that not only an action but also thoughts about the action (a past action in this case) can affect stress levels. Performance of the drivers degraded during the dual task segment, as measured by the point system of the simulator, and was inversely proportional to the mean stress level measured through the supraorbital channel.
Fig. 5. During the dual task period, the supraorbital signal (marked in blue) of S-6 showed an ascending global trend in the first half and then a descending global trend in the second half (marked in green). The culprit is the onset of perspiration.
5 Conclusion This research brings to the fore a stress quantification method ideally suited to situations where the attention of the machine operator is divided. Unobtrusive quantification of stress and its correlation to operator performance and emotions are of singular importance in man-machine interaction. A feedback system can be developed that alerts the operator about his/her stress status based on the facial thermal signature. Results from a pilot experiment on the effect of cell phone communication during driving are more than encouraging. They open the way for a plethora of other multitasking experiments drawn from daily life. At the technical level, the issue of perspiration, which corresponds to the onset of extreme stress, cannot be handled with the current method. A method that identifies perspiratory patterns and handles thermal computation in a different manner from that point onward is needed in the future.
Acknowledgments This material is based upon work supported by the National Science Foundation under Grant No. #ISS-0812526, entitled “Do Nintendo Surgeons Defy Stress.” Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
References 1. Puri, C., Olson, L., Pavlidis, I., Levine, J., Starren, J.: StressCam: Non-contact measurement of users’ emotional states through thermal imaging. In: CHI 2005 Extended Abstracts on Human Factors in Computing Systems, pp. 1725–1728 (2005)
2. Pavlidis, I., Eberhardt, N.L., Levine, J.: Human behavior: Seeing through the face of deception. Nature 415, 35 (2002)
3. Pavlidis, I., Levine, J.: Thermal image analysis for polygraph testing. IEEE Engineering in Medicine and Biology Magazine 21(6), 56–64 (2002)
4. Dowdall, J., Pavlidis, I., Tsiamyrtzis, P.: Coalitional tracking. Computer Vision and Image Understanding 106, 205–219 (2007)
5. Tsiamyrtzis, P., Dowdall, J., Shastri, D., Pavlidis, I., Frank, M.G., Ekman, P.: Imaging facial physiology for the detection of deceit. International Journal of Computer Vision 71(2), 197–214 (2006)
6. Standage, D.I., Trappenberg, T.P., Klein, R.M.: A continuous attractor neural network model of divided visual attention. In: Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 5, pp. 2897–2902 (2005)
7. Yamakoshi, T., Yamakoshi, K., Tanaka, S., Nogawa, M., Shibata, M., Sawada, Y., Rolfe, P., Hirose, Y.: A Preliminary Study on Driver's Stress Index Using a New Method Based on Differential Skin Temperature Measurement. In: 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 722–725 (2007)
8. Yamaguchi, M., Wakasugi, J., Sakakima, J.: Evaluation of Driver Stress using Biomarker in Motor-vehicle Driving Simulator. In: 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 1834–1837 (2006)
9. Yili, L.: A queuing network model of human performance of concurrent spatial and verbal tasks. IEEE Transactions on Systems, Man and Cybernetics 27(2), 195–207 (1997)
10. Kenmochi, A., Takaki, Y., Fukuzumi, S.: Psychological tension estimation during the use of a driving simulator: A finger and ear pulse volume study. In: Proceedings of the 18th Annual International Conference of the IEEE Bridging Disciplines for Biomedicine, vol. 5, pp. 1804–1805 (1996)
11. Healey, J.A., Picard, R.W.: Detecting stress during real-world driving tasks using physiological sensors. IEEE Transactions on Intelligent Transportation Systems 6, 156–166 (2005)
12. FLIR Systems, 70 Castilian Dr., Goleta, California 93117, http://www.flir.com
Decoding Attentional Orientation from EEG Spectra Ramesh Srinivasan, Samuel Thorpe, Siyi Deng, Tom Lappas, and Michael D’Zmura Department of Cognitive Sciences, UC Irvine, SSPA 3151, Irvine, CA 92697-5100 {r.srinivasan,sthorpe,sdeng,tlappas,mdzmura}@uci.edu
Abstract. We have carried out preliminary experiments to determine if EEG spectra can be used to decode the attentional orientation of an observer in three-dimensional space. Our task cued the subject to direct attention to speech in one location and to ignore simultaneous speech originating from another location. We found that during the period when the subject directs attention to one location in anticipation of the speech signal, EEG spectral features can be used to predict the orientation of attention. We propose to refine this method by training subjects with feedback to improve classification performance. Keywords: EEG, attention, orienting, classification.
top-down instruction to orient attention in one direction (or to one location) rather than the orienting elicited by a salient stimulus (bottom-up). Recently, a number of EEG and fMRI studies have been directed at this question and have demonstrated preparatory neural activity when attention is directed by instruction (cued) to one location on a computer screen [12-15]. The fMRI studies examined retinotopically mapped areas of the visual system and demonstrated preparatory increases in neuronal activity as indexed by the BOLD signal. EEG studies showed an increased alpha rhythm in the parietal cortex ipsilateral to the attended visual field. Although these studies have elucidated some of the mechanisms of top-down attentional orienting, they are of limited usefulness for developing a BCI that can decode attentional orientation. In general, orienting attention takes place in a larger sensory field, not just a limited sector of visual space within 10-12 degrees of fixation on a computer monitor. In audition, perception of sources takes place in all directions, even behind the subject. Thus, in our experiment observers selectively attend to auditory rather than visual stimuli in order to investigate attentional orientation across a wider span of sensory space. We have carried out experiments directing attention to one of two directions and identified the attended direction by classification of the EEG spectra. Our results suggest that attentional orientation can potentially be decoded from the EEG, but that further work is needed to train the observers and improve classification methods.
2 Methods

Procedure. Six subjects participated in a speech perception experiment (see Figure 1). The subject was seated in a dimly lit room between two speakers (each at 1 m distance) and instructed to fixate on a point. There were two experimental conditions: attend left or attend right. The subject was given the instruction through both speakers. After a variable ISI (500, 700, 900, 1100, or 1300 ms), two different speech stimuli were presented, one through each speaker. The speech stimuli were synthesized (http://cepstral.com) in two distinct male voices, one played through each speaker; the assignment of voices to speakers was independently randomized on each trial. The stimuli were a simplified version of the Coordinate Response Measure corpus [16]. These sentences were structured as '(Arrow, Baron, Eagle or Tiger) go to (Blue, Green, Red, or White) now.' The subjects' task was to identify the two words played through the attended speaker with a response on a keypad. In the example shown in the figure, the correct response is 'Baron' and 'Blue.' The experiment was designed in this manner to demand that the subject direct attention to the correct speaker before the speech was played; otherwise the subject would miss the first code word. The variable ISI and randomized voices ensured that the observer quickly deployed and maintained attention to the appropriate speaker. In addition, an adaptive staircase procedure was used to control subject performance. When the subject responded correctly to the first word, the volume was reduced by 5% on the attended speaker and increased by 5% on the unattended speaker on the next trial. When the subject responded incorrectly to the first word, the volume was increased by 10% on the attended speaker and decreased by 10% on the unattended speaker. This procedure resulted in subjects performing the task correctly about 70% of the time.
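The adaptive staircase is easy to state in code. A minimal sketch follows, under the assumption that the speaker volumes are simple scalar gains; the update percentages come directly from the text above.

```python
def update_volumes(attended, unattended, first_word_correct):
    """Asymmetric staircase controlling task difficulty trial by trial."""
    if first_word_correct:
        attended *= 0.95     # attended stream 5% quieter on the next trial
        unattended *= 1.05   # distractor 5% louder
    else:
        attended *= 1.10     # attended stream 10% louder
        unattended *= 0.90   # distractor 10% quieter
    return attended, unattended
```

Because the penalty for an error (10%) is twice the reward for a correct response (5%), the procedure converges to a performance level above chance, here about 70% correct.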
Fig. 1. The experimental setup. A. The physical layout of the experiment. Note that each speaker was 45 degrees away from fixation and could not be seen by the observer without moving the eyes. B. The time course of each trial in the experiment. In the example shown, the instruction is to attend to the right and, after a variable ISI, two distinct sentences are played, one through each speaker. In the example shown, the subject responds "Baron" and "Blue" and receives feedback indicating a correct response.
At this threshold, the amplitude of the attended speaker was typically 30 dB below the unattended speaker. A single experimental session comprised 200 trials, presented in two 100-trial blocks with a break. Each subject participated in three such sessions, each lasting around an hour.

EEG recording. EEG was recorded using a 128-channel Geodesic Sensor Net (Electrical Geodesics, Inc., Eugene, OR, USA) in combination with an amplifier and acquisition software (Advanced Neuro Technology, Inc., Enschede, NL). The EEG was sampled at 1024 Hz and on-line average referenced. Artifact editing was performed through a combination of automatic editing using an amplitude threshold and manual editing to check the results. Trials with excessive bad channels (> 15%) were first discarded and then channels with bad trials were discarded. This typically yielded 80–100 usable EEG channels (out of 128) and 500–550 usable trials.

Data Analysis. We focused on three intervals within the ISI – 0-500, 200-700, and 400-900 ms. For the later intervals, we discarded trials where the target stimulus had already started, resulting in different numbers of trials for the three intervals – 600, 480, and 360. The data were Fourier transformed using an FFT (Matlab) for each 500 ms interval (Δf = 2 Hz) and the power spectrum calculated as the squared magnitude of the Fourier coefficients. We limited further analyses to the interval from 4-22 Hz.
This was motivated by the goal of this study to identify robust spectral features that can predict attentional orientation. Below 4 Hz the EEG is often contaminated with movement and eye blink artifacts. Above 20 Hz the EEG is often contaminated with EMG artifacts. EEG power at each frequency was log transformed and normalized against the total power from 4-22 Hz.

Classification. Classification was performed using a naïve Bayes classifier (Matlab), which assumes independent variables, i.e., a diagonal covariance matrix. The data were divided into three conditions based on instruction and performance: Correct Left, Correct Right, and Incorrect.
Fig. 2. Examples of classification performance versus the number of variables for the three classification intervals. Overall S0 had the best classification performance and S5 had the worst performance. The other subjects showed intermediate levels of classification performance.
There were roughly equal numbers of Correct Left and Correct Right trials, and 10-20% fewer Incorrect trials. To facilitate comparisons in classification performance between the analysis intervals, we used a fixed number (determined by the 400-900 ms interval) of randomly selected "training" trials (typically 150) to calculate a linear classifier, which was applied to classify another 150 "test" trials. The classification proceeded in two steps. First we evaluated the performance of each individual variable (80-100 channels x 10 frequencies) in classification of the "test" data, using 30 random samples of "training" and "test" trials. Then we evaluated the performance of the best 10, 20, 50, 75, and 100 variables using 50 random samples of "training" and "test" trials.
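(To make the analysis pipeline concrete (500 ms epochs, an FFT with Δf = 2 Hz, log power normalized over 4-22 Hz, and a diagonal-covariance naïve Bayes classifier), the following Python sketch reconstructs the main steps with NumPy and scikit-learn. It is an illustration under stated assumptions, not the authors' Matlab code; the random stand-in data, array shapes and train/test split are placeholders.)

import numpy as np
from sklearn.naive_bayes import GaussianNB

FS = 1024                                        # EEG sampling rate (Hz)

def spectral_features(epochs):
    # epochs: (n_trials, n_channels, 512), i.e. 500 ms of EEG per trial
    power = np.abs(np.fft.rfft(epochs, axis=-1)) ** 2         # 512 samples -> df = 2 Hz
    freqs = np.fft.rfftfreq(epochs.shape[-1], d=1.0 / FS)
    band = (freqs >= 4) & (freqs <= 22)                       # the 10 analysis frequencies
    p = power[..., band]
    p = np.log(p / p.sum(axis=-1, keepdims=True))             # log power, normalized to band total
    return p.reshape(len(epochs), -1)                         # trials x (channels * frequencies)

# Stand-in data: 300 trials, 90 usable channels; labels 0/1/2 = Correct Left/Right/Incorrect
rng = np.random.default_rng(0)
epochs = rng.standard_normal((300, 90, 512))
labels = rng.integers(0, 3, 300)
X = spectral_features(epochs)
clf = GaussianNB().fit(X[:150], labels[:150])    # naive Bayes = diagonal covariance
print("3-way accuracy on held-out trials:", clf.score(X[150:], labels[150:]))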
Fig. 3. Power spectra from the channel that provided the best three-way classification for each subject. For 3 subjects (S0, S3, and S7) the channel shown is located over parietal cortex. For the other subjects (S2, S4 and S5) the channel shown is located over the frontal lobes. Note that spectral differences are present at most frequencies.
Fig. 4. Topographic distribution of channel sensitivity to attentional orientation. Each channel's classification ability was scored and averaged across frequencies. The results were plotted as a topographic map, where white shows the highest predictive power and black indicates no predictive power.
3 Discussion

In a simple cued spatial attention experiment we were able to demonstrate preliminary results indicating that the EEG contains information that can be used to decode the orientation of attention in space. Our approach here has been very direct and overly
simplistic. We have made use of spectral features of the EEG and a naïve Bayes classification scheme. Both approaches can be significantly improved. Despite the limitations of these methods, we were able to achieve as high as 75% classification performance in 2-way classification and 60% classification performance in 3-way classification. This indicates that the EEG clearly contains information that can be used to decode attentional orientation for BCI applications. The data also suggest that the signatures of attentional orientation can be more robustly decoded at a longer interval following the cue. In our comparisons, 400-900 ms after the cue provided the best classification in each subject. This is consistent with theoretical and experimental studies of the episodic nature of attention, which suggested that following the cue the attention window takes at least 300 ms to open [17]. How long this window remains open and whether more robust classification can be obtained remains to be tested. Estimating the duration for which the neural signatures of attentional orientation are sustained will require a larger amount of data. Our results suggest that electrodes over frontal and parietal cortex have the greatest sensitivity to attentional orientation. This finding is consistent with a large body of experimental research with EEG and fMRI indicating that large-scale networks spanning parietal and frontal cortex mediate selective attention [18-20]. More surprising was the lack of frequency specificity in our classification results. Previous reports had suggested that occipital/parietal alpha rhythms would be sensitive to attentional orientation [15, 21]. However, those studies used visual displays where attention was directed to one of two regions within 4-6 degrees of fixation. Thus, those results were specifically related to attentional orienting within the narrowly defined retinotopically mapped visual space. Our results relate to orienting to two regions of auditory space separated by 90 degrees and out of the field of view while fixating. The results of this paper are preliminary and indicate the potential for an attentional orientation BCI. An important factor we have not yet considered is training subjects to optimize the BCI. Our current work extends this study through a consideration of the full 360 degrees of auditory space surrounding the subject.
Acknowledgements

This work was supported by ARO 54228-LS-MUR.
References

1. Desimone, R., Duncan, J.: Neural mechanisms of selective visual attention. Annu. Rev. Neurosci. 18, 193–222 (1995)
2. Egeth, H.E., Yantis, S.: Visual attention: control, representation, and time course. Annual Review of Psychology 48, 269–297 (1997)
3. Kastner, S., Ungerleider, L.G.: Mechanisms of visual attention in the human cortex. Annual Review of Neuroscience 23, 315–341 (2000)
4. Spence, C., Pavani, F., Driver, J.: Crossmodal links between vision and touch in covert endogenous spatial attention. Journal of Experimental Psychology: Human Perception and Performance 26, 1298–1319 (2000)
5. Spence, C., Driver, J.: Crossmodal attention. Current Opinion in Neurobiology 8, 245–253 (1998)
6. Moore, T., Armstrong, K.M., Fallah, M.: Visuomotor origins of covert spatial attention. Neuron 40, 671–683 (2003)
7. Sheliga, B.M., Riggio, L., Rizzolatti, G.: Orienting of attention and eye movements. Experimental Brain Research 98, 507–522 (1994)
8. Hillyard, S.A., Anllo-Vento, L.: Event-related brain potentials in the study of visual selective attention. PNAS 95, 781–787 (1998)
9. Kastner, S., Pinsk, M., De Weerd, P., Desimone, R., Ungerleider, L.: Increased activity in human visual cortex during directed attention in the absence of visual stimulation. Neuron 22, 751–761 (1999)
10. Corbetta, M., Shulman, G.L.: Control of goal- and stimulus-driven attention in the brain. Nature Reviews Neuroscience 3, 201–215 (2002)
11. Ding, J., Sperling, G., Srinivasan, R.: SSVEP power modulation by attention depends on the network tagged by the flicker frequency. Cerebral Cortex 16, 1016–1029 (2006)
12. Sylvester, C., Jack, A., Corbetta, M., Shulman, G.: Anticipatory Suppression of Nonattended Locations in Visual Cortex Marks Target Location and Predicts Perception. Journal of Neuroscience 28(26), 6549–6556 (2008)
13. Shulman, G., Ollinger, J., Akbudak, E., Conturo, T., Snyder, A., Petersen, S., Corbetta, M.: Areas Involved in Encoding and Applying Directional Expectations to Moving Objects. The Journal of Neuroscience 19(21), 9480–9496 (1999)
14. Fu, K., Foxe, J., Murray, M., Higgins, B., Javitt, D., Schroeder, C.: Attention-dependent suppression of distracter visual input can be cross-modally cued as indexed by anticipatory parieto-occipital alpha-band oscillations. Cognitive Brain Research 12, 145–152 (2001)
15. Worden, M., Foxe, J., Wang, N., Simpson, G.: Anticipatory Biasing of Visuospatial Attention Indexed by Retinotopically Specific Alpha-Band Electroencephalography Increases over Occipital Cortex. The Journal of Neuroscience 20, RC63 (2000)
16. Moore, T.J.: Voice communication jamming research. In: AGARD Conference Proceedings 331: Aural Communication in Aviation, vol. 2, pp. 1–6 (1981)
17. Weichselgartner, E., Sperling, G.: Dynamics of automatic and controlled visual attention. Science 238, 778–780 (1987)
18. Gitelman, D., Nobre, A., Parrish, T., LaBar, K., Kim, Y., Meyer, M., Mesulam, M.: A large-scale distributed network for covert spatial attention: Further anatomical delineation based on stringent behavioural and cognitive controls. Brain 122(6), 1093–1106 (1999)
19. Posner, M., Petersen, S.: The attention system of the human brain. Annual Review of Neuroscience 13, 25–42 (1990)
20. Coull, J.T.: Neural correlates of attention and arousal: insights from electrophysiology, functional neuroimaging and psychopharmacology. Progress in Neurobiology 55(4), 343–361 (1998)
21. Foxe, J., Simpson, G., Ahlfors, S.: Parieto-occipital ~10 Hz activity reflects anticipatory state of visual attention mechanisms. NeuroReport 9, 3929–3933 (1998)
On the Possibility about Performance Estimation Just before Beginning a Voluntary Motion Using Movement Related Cortical Potential

Satoshi Suzuki1, Takemi Matsui1, Yusuke Sakaguchi1, Kazuhiro Ando1, Nobuyuki Nishiuchi1, Toshimasa Yamazaki2, and Shin'ichi Fukuzumi3

1 Tokyo Metropolitan University, Asahigaoka 6-6, Hino, Tokyo 191-0065, Japan
2 Kyusyu Institute of Technology, Kawazu 680-4, Iizuka, Fukuoka 820-8502, Japan
3 NEC Common Platform Software Research Laboratories, Shibaura 2-11-5, Minato-ku, Tokyo 108-8557, Japan
[email protected]
Abstract. The present study aimed to investigate the tripartite relationship among MRCP as a physiological index, ballistic movement as an index of operation, and the accuracy of task performance. Experiments were conducted using a 'reaching' task, in which the subject touches with the forefinger a target that appears 300 pixels away from the start point in the vertical direction on a touch-sensitive screen. During the experiments, the EEG, the EMG (as a trigger), images from a high-speed camera and task performance were acquired. As a result, significant differences between the high- and low-performance groups were clear in the NS component of the MRCP acquired from Fz (p < 0.05), Cz (p < 0.05) and Pz (p < 0.05). Furthermore, a difference was confirmed in the duration of the ballistic movement. Based on our findings, we attempted to extract the MRCP rapidly and automatically without using signal averaging, and we discuss whether it is possible to estimate accuracy just before the motion is executed.
Keywords: Accuracy, ballistic movement, movement-related cortical potential (MRCP), reaching, voluntary motion.
to use as a method of medical diagnostics [6], although not all details have been fully clarified. Previous work has also shown that the thinking process and information processing within the brain are broken into several steps. Many useful models of this information processing in the head have been proposed [7, 8]. These models commonly have four steps: perception, cognition, motion planning and motion. Related to this model, the MRCP is generally assumed to reflect a preparation or planning stage for beginning the motion [9]. On the other hand, "reaching", a voluntary goal-directed movement, is known to be one of the most important components of motion by human arms. Mathematical models considering jerk, torque and the dynamics of the musculoskeletal system have been used to understand how the motion of reaching is planned and controlled in the brain [10, 11, 12]. Movement efficiency has also been studied, and Fitts' Law showed that movement time during reaching changes systematically with the difficulty of the task [13, 14]. During the motion process, reaching has two characteristics, the ballistic and corrective movements [15, 16], which influence both the accuracy and the duration of the motion. Observation of these characteristics enables us to comprehend the efficiency of the task performance. There is believed to be a relationship between the ballistic movement and the MRCP, as the former is a feed-forward movement and the MRCP reflects a planning stage for this motion. Building on this, a relationship has also been suggested between the MRCP, the ballistic movement and the accuracy of motion (Figure 1). Considering the tripartite relationship among MRCP as a physiological index, ballistic movement as an index of operation and accuracy of the task performance has meaning for the field of ergonomics.
[Figure 1 diagram: information processing from motion start through perception, cognition and controlled processing (MRCP components BP, IS and NS in the EEG) to action; feedforward ballistic movement and feedback corrective movement determine task performance (motion time and accuracy).]
Fig. 1. The concept of this research. The aim of this study is an investigation of the tripartite relationship, regarding MRCP as a physiological index, ballistic movement as an index of operation and accuracy of the task performance.
The present study aimed to investigate this tripartite relationship, regarding MRCP as a physiological index, ballistic movement as an index of operation and accuracy of the task performance, within the field of ergonomics. Based on our findings, we attempted to extract the MRCP rapidly and automatically without using signal averaging. We explain and consider the results of this extraction system and discuss whether it is possible to estimate accuracy just before the motion is executed.
2 Methods

2.1 Electroencephalogram

MRCP is generally observed at scalp positions Fz, Pz and Cz, on the midline of the frontal and parietal regions of the head. To achieve high spatial resolution around these regions, electroencephalogram (EEG) data were acquired using a 128-channel sensor net (Geodesic Sensor Net, Electrical Geodesics Inc., OR, USA) and an analysis system (Net Station 4.3, Electrical Geodesics Inc., OR, USA). Electrode impedance was set between 10 and 50 kΩ. EEG was recorded using a 0.1–50 Hz bandpass (3 dB attenuation). Signals were sampled at 1 kHz and digitized.

2.2 EMG and Trigger

The trajectory of voluntary motion was observed using a high-speed camera (Fastcam 512PCI, Photron Co., Tokyo, Japan) at 125 fps (Figure 2). The trigger signal was generated using the surface EMG on the common digital extensor muscle. Data from the EEG and the high-speed camera were clipped at each trial using triggers. EMG data were acquired using a bio-amp system (Biotop, NEC-Sanei Co., Tokyo, Japan) with a sampling rate of 10 kHz and were synchronized and analyzed in real time using a Field Programmable Gate Array (FPGA) module (PCI-7831, National Instruments Co., TX, USA).

2.3 Subjects and Task Procedure

Experiments were conducted on eight healthy male subjects ranging in age from 22 to 24 years old (average 22.88 +/- 0.83 years). All subjects were right-handed as confirmed by the Edinburgh handedness test [17]. Experiments were conducted in an electromagnetically shielded room using the following protocol: the subject touches the center of a 17-inch touch-sensitive screen (LCD-AD172F2-T, I/O Data Co., Tokyo, Japan) located 30 cm in front of them with their forefinger. A small cross-shaped target then appears 300 pixels away from the point previously touched, in the vertical direction (screen pixel pitch 0.264 × 0.264 mm). The subject moves their forefinger to the center of the displayed target and touches it. The trial is repeated a total of 50 times in two sets. Specific instructions regarding motion speed and accuracy during trial performance were not given to the subjects. The subject's head was placed on a jaw rest and the right forearm on an armrest to reduce artifacts other than motion of the forefinger.
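(The paper specifies only full-wave rectification of the EMG (PC1 in Fig. 2); how the trigger is derived from it is not described. A common implementation, sketched below in Python as an assumption rather than the authors' method, thresholds the rectified, smoothed EMG against a resting baseline; the window length, threshold factor and baseline period are arbitrary choices.)

import numpy as np

def emg_onset(emg, fs=10_000, win_ms=20, k=5.0, baseline_ms=500):
    # Return the first sample where the rectified, moving-average-smoothed
    # EMG exceeds k standard deviations above the resting baseline.
    # fs matches the 10 kHz EMG sampling rate; the rest are assumptions.
    rect = np.abs(emg - emg.mean())                           # full-wave rectification
    win = int(fs * win_ms / 1000)
    env = np.convolve(rect, np.ones(win) / win, mode="same")  # smoothed envelope
    base = env[: int(fs * baseline_ms / 1000)]                # assume the trial starts at rest
    thresh = base.mean() + k * base.std()
    above = np.flatnonzero(env > thresh)
    return int(above[0]) if above.size else None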
2.4 Analysis

MRCP and high-speed camera data were clipped in each trial by a trigger signal generated using the surface EMG on the common digital extensor muscle. EEG data were clipped from 1500 msec before the start of the movement to 500 msec afterward, in accordance with previous studies. The gap from the center of the target to the position actually touched was used as an evaluation index for the accuracy of the voluntary movement.
Fig. 2. Block diagram and data flow. Experiments were conducted on eight healthy male subjects using EEG, high-speed camera, EMG and task systems.
Fig. 3. Analytical process. MRCP and high-speed camera data were clipped in each trial by a trigger signal generated using the surface EMG. Based on the gap average of task performance, the clipped EEG and high-speed camera data from each trial were separated into two groups corresponding to the ‘A’ and ‘B’ groups. The typical waveforms in each group were calculated by averaging and compared with each other.
Based on the gap average of task performance, we divided the performance data from each trial into two groups: a high-performance group 'A' and a low-performance group 'B'. The clipped EEG and high-speed camera data from each trial were separated into the corresponding 'A' and 'B' groups. The typical waveforms in each group were calculated by averaging and compared with each other (Figure 3).
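(In outline, this epoching and grouping step reduces to clipping a window around each EMG trigger and splitting trials at the mean gap. The Python sketch below restates Fig. 3 in code; the array shapes and names are illustrative assumptions, not the authors' implementation.)

import numpy as np

def clip_and_group(eeg, triggers, gaps, fs=1000):
    # eeg: (n_channels, n_samples); triggers: EMG-onset sample indices;
    # gaps: touch error in pixels, one per trial. Clips -1500..+500 ms
    # around each trigger and splits trials at the mean gap.
    pre, post = int(1.5 * fs), int(0.5 * fs)
    trials = np.stack([eeg[:, t - pre: t + post] for t in triggers])
    gaps = np.asarray(gaps)
    group_a = trials[gaps < gaps.mean()]       # high performance: small gap
    group_b = trials[gaps >= gaps.mean()]      # low performance: large gap
    # typical waveforms per group, obtained by averaging across trials
    return group_a.mean(axis=0), group_b.mean(axis=0)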
3 Results

3.1 Task Performance

Figure 4 shows the distribution of gaps in each trial for one subject (S1). The distribution resembled a gamma distribution with an average gap of 5 pixels. The same trend in the distribution shape was seen in all subjects. In subject S1's case, EEG signals corresponding to trials where the gap was less than 5 pixels were placed in group A and those over 5 pixels were placed in group B. Typical waveforms corresponding to the two groups were calculated by averaging. High-speed camera data were also divided into two groups.

3.2 MRCP

Figures 5 a) and 5 b) show samples of typical MRCP waveforms corresponding to the two groups acquired from Fz and Cz for the same subject S1 shown in Figure 4. Differences between the BPs (observed from 2000 msec before movement) and ISs (observed from 900 to 500 msec before movement) of groups A and B were not obvious. However, differences in the NSs (observed from 500 msec before movement) were clearly observed between the two groups, with group A showing a steeper slope than group B. This trend was confirmed around Fz, Cz, and Pz in all subjects (Figure 6), and the difference in average values between the two groups was found to be significant (Fz: p < 0.05 (p = 0.019), Cz: p < 0.05 (p = 0.050), Pz: p < 0.05 (p = 0.017)). This suggests that there is a relationship between the performance of the arm and the NS slope.
Fig. 4. Sample performance data (Subject S1). The distribution resembled a gamma distribution with an average gap of 5 pixels in this subject's case.
Fig. 5. Sample data of MRCP (Subject 1). Typical MRCP waveforms corresponding to the two groups acquired from Fz (Fig. 5 a)) and Cz (Fig. 5 b)) for the same subject S1 shown in Figure 4.
Fig. 6. Comparison of NS slopes in 2 groups (All subjects). Differences in NS (observed from 500 msec before movement) were clearly observed between the two groups, with group A showing a steeper slope than group B.
3.3 Process of the Reaching Motion

Figure 7 shows the peak movement distance of the forefinger. A small difference in ballistic movement between groups A and B was confirmed, but we could not confirm a difference in corrective movement. This is probably because the experimental task was simple and primitive, so only a small part of the movement needed to be allocated to corrective adjustment.
Fig. 7. Comparison of movement processes in 2 groups (All subjects). A small difference in ballistic movement between groups A and B was confirmed.
4 Discussion and Conclusion

In the present study, we attempted to confirm the tripartite relationship between MRCP, ballistic movement and accuracy of task performance. We could not confirm any difference between the high-performance group 'A' and the low-performance group 'B' in the BP and IS components of the MRCP acquired from Cz and Fz. However, significant differences between the groups were clear in the NS component of the MRCP acquired from Cz and Fz. Furthermore, a difference was confirmed in the duration of the ballistic motion. As the NS is generally believed to represent a preparation stage for voluntary motion, it appears that the observed difference between groups influences the process of motion and the task performance. These results show the possibility of estimating the performance just before beginning the voluntary motion using the MRCP.

On the other hand, the motion of reaching is nearly optimized in terms of smoothness over the entire movement in studies using mathematical models. Various optimality criteria have been proposed for trajectory planning of this motion in terms of multi-joint arm movements, like human arms. The minimum-jerk criterion [10] plans smooth trajectories in the extrinsic task space, while the minimum-joint-angle-jerk criterion, the minimum-torque-change criterion [11], and the minimum-motor-command-change criterion [12] plan smooth trajectories in the intrinsic body space. These models and criteria are discussed under the assumption that humans plan the trajectory of arm movement before beginning the motion. However, Harris & Wolpert [18] pointed out that it is difficult to explain the biological relevance of factors such as jerk or torque change in previous models of arm trajectory. They showed that the movement can be achieved by reducing the variance of errors at the end of the movement. The final goal of the reaching movement is to minimize the gap at the end of the motion, as suggested by their minimum-variance theory. Although it is not clear how the cerebrum and cerebellum contribute to achieving this movement, this model is an effective way of considering it. If we place our own results in the context of this model, the concept is validated, as it is known that humans plan and learn the trajectory of reaching with optimal efficiency just before beginning the motion.

Finally, in the current study, we attempted to develop a prototype system to derive the NS slope in the MRCP from EEG data automatically and in real time. When event-related potentials (ERPs) such as the N200 and P300 are observed, signal averaging is generally used, not only to remove artifacts but also to extract more information, i.e., latency, amplitude, the shape of the waveform and so on. However, in the case of the MRCP, we only need to observe slopes during the 500 ms just before the motion is executed, namely, just before the trigger. We are therefore developing a prototype system that keeps a sequential memory of the EEG over 2000 ms and sequentially calculates the NS slope over the most recent 500 ms. This prototype system is currently being developed using LabVIEW (National Instruments Co., TX, USA), and we have already confirmed that the performance can be estimated correctly up to 75% of the time using an index based on NS values. Thus, in the future, our concept and method appear promising for use in safety-related device control, such as car driving.
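(A minimal sketch of this kind of sliding-window slope estimator, written in Python as our interpretation rather than the authors' LabVIEW implementation: keep the last 2000 ms of one EEG channel in a ring buffer and continuously fit a line to the most recent 500 ms.)

from collections import deque
import numpy as np

class NSSlopeEstimator:
    # Keeps a 2000 ms buffer of single-channel EEG (e.g. Cz) and reports the
    # least-squares slope over the most recent 500 ms, where the NS component
    # is expected just before movement onset.
    def __init__(self, fs=1000):
        self.fs = fs
        self.buf = deque(maxlen=2 * fs)          # 2000 ms of samples at fs Hz

    def push(self, sample):                      # call once per incoming sample
        self.buf.append(sample)

    def ns_slope(self):
        n = self.fs // 2                         # last 500 ms
        if len(self.buf) < n:
            return None
        y = np.asarray(self.buf, dtype=float)[-n:]
        t = np.arange(n) / self.fs               # time axis in seconds
        return float(np.polyfit(t, y, 1)[0])     # slope (e.g. uV/sec if y is in uV)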
References

1. Kornhuber, H.H., Deecke, L.: Hirnpotentialänderungen bei Willkürbewegungen und passiven Bewegungen des Menschen: Bereitschaftspotential und reafferente Potentiale. Pflügers Archiv für die gesamte Physiologie des Menschen und der Tiere 284, 1–17 (1965)
2. Barrett, G., Shibasaki, H., Neshige, R.: A computer-assisted method for averaging movement-related cortical potentials with respect to EMG onset. Electroencephalogr. Clin. Neurophysiol. 60(3), 276–281 (1985)
3. Shibasaki, H., Hallett, M.: What is the Bereitschaftspotential? Clin. Neurophysiol. 117(11), 2341–2356 (2006)
4. MacKay, D.M., MacKay, V.: Behind the eye. Basil Blackwell, Malden (1991)
5. Yamamoto, J., Ikeda, A., Satow, T., et al.: Human eye fields in the frontal lobe as studied by epicortical recording of movement-related cortical potentials. Brain 127, 873–887 (2004)
6. Barrett, G., et al.: Cortical potential shifts preceding voluntary movements are normal in parkinsonism. Electroencephalogr. Clin. Neurophysiol. 63, 340–348 (1986)
7. Card, S.K., Moran, T.P., Newell, A.: The psychology of human-computer interaction. Lawrence Erlbaum Associates, New Jersey (1983)
8. Gopher, D., Sanders, A.F.: S-Oh-R: Oh Stages! Oh Resources! In: Prinz, W., Sanders, A.F. (eds.) Cognition and motor processes. Springer, Heidelberg (1984)
9. Revelle, W.: Individual differences in personality and motivation: 'non-cognitive' determinants of cognitive performance. In: Baddeley, A., Weiskrantz, L. (eds.) Awareness Control. Clarendon Press, Oxford (1991)
10. Flash, T., Hogan, N.: The Coordination of Arm Movements: An Experimentally Confirmed Mathematical Model. Journal of Neuroscience 5, 1688–1703 (1985)
11. Uno, Y., Kawato, M., Suzuki, R.: Formation and control of optimal trajectory in human multijoint arm movement - Minimum torque-change model. Biological Cybernetics 61(2), 89–101 (1989)
12. Kawato, M.: Optimization and learning in neural networks for formation and control of coordinated movement. In: Meyer, D., Kornblum, S. (eds.) Attention and Performance XIV, pp. 821–849. MIT Press, Cambridge (1993)
13. Fitts, P.M., Peterson, J.R.: Information capacity of discrete motor responses. Journal of Experimental Psychology 67, 103–112 (1964)
14. Accot, J., Zhai, S.: Beyond Fitts' Law: Models for trajectory-based HCI tasks. In: CHI 1997, pp. 295–302 (1997)
15. Flowers, K.: Ballistic and corrective movement on an aiming task. Neurology 25, 413–421 (1975)
16. Pratt, J., Abrams, R.A.: Practice and component submovements: The roles of programming and feedback in rapid aimed limb movements. Journal of Motor Behavior 28(2), 149–156 (1996)
17. Oldfield, R.C.: The assessment and analysis of handedness: The Edinburgh Inventory. Neuropsychologia 9, 97–113 (1971)
18. Harris, C.M., Wolpert, D.M.: Signal-dependent noise determines motor planning. Nature 394, 780–784 (1998)
A Usability Evaluation Method Applying AHP and Treemap Techniques Toshiyuki Asahi, Teruya Ikegami, and Shin’ichi Fukuzumi Common Platform Software Research Laboratories, NEC Corporation, 8916-47, Takayama-Cho, Ikoma, Nara 630-0101, Japan [email protected]
Abstract. This report proposes a visualization technique for checklist-based usability quantification methods. By applying the Treemap method, the hierarchical structure of checklists, the weights of check items and the evaluation results for target systems can be viewed at a glance. Effective support for usability analysis and for the presentation of usability evaluation results is expected. A prototype tool was implemented on a PC and experimental studies assuming actual usability evaluation tasks were conducted. The results indicate that the proposed method improves the performance time of some typical tasks. Usability engineers gave higher subjective scores to the usefulness of the proposed method than to that of a printed table presentation.
it was still hard to read out which check items contribute to (or lower) the synthetic scores, and by how much, since the detailed check items below the second hierarchy (88 items) are not presented on the graph. In this paper, we propose to apply the well-known Treemap technique to visualize the evaluation results of checklist-based heuristics. One of the quantification techniques using checklists is explained first; then the Treemap visualization techniques for that method are demonstrated with examples from the PC tool prototype. Section 4 describes the procedure and the results of the experimental study validating the effectiveness of the proposed method. Finally, we discuss the result and its implications for future study.
2 Checklist-Based Usability Quantification

Treemap visualization was applied to a checklist-based heuristic method developed by Ikegami et al. [4]. The checklist consists of 126 check items categorized into five sections and 18 sub-sections, which were extracted and arranged from various user interface (UI) guidelines, ISO standards [5][6] and consultation know-how. Each check item is described from the viewpoint of UI components or system functions, such as "Are titles attached to each window?" or "Are substitutive operations provided for double-click operations?" However, the usability score is expected to be given from user viewpoints (e.g. learnability, memorability) when it is utilized in usability testing or benchmarking. Therefore, a weighting value was given to each checklist item for selected user viewpoints. Several techniques are known for weight calculation, such as expert opinion, task usage frequency or task importance, the KANO model, the entropy model, the geometric mean of pairwise comparison results in AHP (Analytic Hierarchy Process), and the number of problems collected [7]. Ikegami et al. adopted the AHP method considering reliability and execution cost: a few usability experts applied a pairwise comparison method to give weighting value sets for every viewpoint. Thus, four weighting value sets, which correspond to "learnability," "memorability," "efficiency" and "low error rate" (selected by referring to [8]), were given to the check items, their sections and sub-sections (Fig. 1).
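(For reference, the geometric-mean step of AHP mentioned above can be written compactly. The Python sketch assumes a reciprocal pairwise-comparison matrix as input; the example matrix is invented and is not taken from the actual checklist.)

import numpy as np

def ahp_weights(pairwise):
    # Approximate AHP priority weights as the normalized geometric means
    # of the rows of a reciprocal pairwise-comparison matrix.
    m = np.asarray(pairwise, dtype=float)
    gm = m.prod(axis=1) ** (1.0 / m.shape[1])
    return gm / gm.sum()

# Invented example: item 1 judged 3x as important as item 2, 5x as item 3
A = [[1, 3, 5],
     [1 / 3, 1, 2],
     [1 / 5, 1 / 2, 1]]
print(ahp_weights(A))        # approx. [0.65, 0.23, 0.12]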
3 Usability Visualization Using Treemap

In order to visualize the evaluation result of the checklist effectively, we think the major requirements are:
1. Giving the overall view of the checklist including its hierarchical structure, and
2. Displaying detailed information such as each check item's weight and checking result.
In other words, both structural and quantitative information, and both overall and detailed information, should be displayed in one view in a form easy to understand. We assumed the Treemap method [9] developed at the University of Maryland would fill these requirements and tried to apply it to the checklist heuristics. The following parts of this section describe detailed techniques of the Treemap implementation.
Fig. 1. Outline of checklist weighting method [4]
3.1 Visualize the Checklist

Treemap techniques were applied to the checklist introduced in Section 2. Figure 2 shows examples of checklist visualization, (a) without weights (assuming all sibling items or sections have equal weighting values), and (b) with the weight set given for the "efficiency" viewpoint. The "slice and dice" and "offset" techniques introduced in reference [9] were adapted to utilize the display area effectively and to present the check item structure and titles clearly (full titles of check items appear by a simple mouse-over operation). The structure and weight distribution can be viewed in a given display area even for a checklist consisting of more than 100 items. By providing four sets of weighting values and the corresponding maps, each of which represents one of the four viewpoints, the entire checklist can be visualized. Although it would be quite easy to merge them into one map by adding one more level of hierarchy, we chose to show them separately because the merged Treemap looked too busy and there seemed to be little need for viewing the four viewpoints simultaneously.
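(The "slice and dice" layout referenced above is easy to state in code: recursively split a rectangle along alternating axes in proportion to the weights. The Python sketch below is a generic reimplementation of the classic algorithm [9], not the code of our tool, and it omits the "offset" insets used for titles.)

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    # node: dict with 'name', 'weight' and optional 'children'.
    # Returns a list of (name, x, y, width, height) rectangles.
    out = [] if out is None else out
    out.append((node["name"], x, y, w, h))
    kids = node.get("children", [])
    total = sum(k["weight"] for k in kids)
    offset = 0.0
    for k in kids:
        frac = k["weight"] / total
        if depth % 2 == 0:   # "slice" along x at even depths
            slice_and_dice(k, x + offset * w, y, frac * w, h, depth + 1, out)
        else:                # "dice" along y at odd depths
            slice_and_dice(k, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out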
Fig. 2. Checklist visualization: (a) no weights; (b) with a set of weights for "Efficiency"
3.2 Visualize the Checking Result

Two large-scale Web application systems (tentatively named "A" and "B") were evaluated along the checklist, and the result was displayed as shown in Fig. 3. The preceding study [10] and Ikegami [4] proposed doing checklist evaluation with a yes (meaning the target user interface satisfies the corresponding check item) or no judgment for every check item in order to minimize the effect of individual differences. In the example of Fig. 3, colored areas (yellow for A, red for B) mean "yes" and black areas represent "no" judgments. The total scores, which are the summations of the colored areas, are displayed as the bar chart below the map. Intuitively, it is quite easy to read out which check items have a bigger influence on the total score, or why system B gets a higher score than A from the viewpoint of efficiency, for instance.
(Legend: black means "no" for A or B; yellow means A satisfied the check item; red means B satisfied the item.)
Fig. 3. Visualization of checking result with a bar chart
On the other hand, it is hard to explore the checking results for the low-weighted items. (However, this problem has been partly solved by implementing the zooming function, which enables zooming any node (rectangle) out to the base rectangle area with its subsidiary nodes.)
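The total score in the bar chart is, in effect, the sum of the weights of the check items judged "yes". A short Python sketch of this assumed scoring form follows; the actual tool may normalize differently, and the weights and judgments are invented.

def usability_score(weights, judgments):
    # weights: AHP weight per check item for one viewpoint;
    # judgments: True ("yes") / False ("no") per item for one system.
    return sum(w for w, ok in zip(weights, judgments) if ok)

w = [0.5, 0.3, 0.2]                              # invented weights
print(usability_score(w, [True, False, True]))   # system A -> 0.7
print(usability_score(w, [True, True, False]))   # system B -> 0.8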
4 Experimental Study

In the previous sections, Treemap visualization for a checklist-based quantification method was proposed. The simulated map seems to be useful for usability analysis tasks and for reporting their results. In this section, we try to validate its usefulness experimentally, assuming actual data analysis tasks in usability consultation.

4.1 Experimental Design

We assumed two kinds of situations in which Treemap visualization would be helpful: 1) usability engineers analyze the checking result, and roughly estimate the usability level of the target systems or detect where they should be changed for effective usability improvement, and 2) usability engineers or consultants explain the analysis result to developers of the targeted systems or clients of the consultation. The tasks in the experiment were designed assuming situation (1).

Outline. The Treemap on the PC screen (such as Fig. 2) and the table forms printed on paper sheets, both of which represent the checklist items, their weighting values, and the checking results for the two systems (A, B), were presented to subjects. Each subject executed tasks assuming situation (1). Task completion time was measured by an experimenter along with the correctness of the answers. Subjective questionnaires were administered after completing all tasks.

Table 1. Experimental tasks (task set TA)
Task | Task description | Meaning of task
a | From the "learnability" viewpoint, what is the most significant category? | Understand which category is supposed significant in each viewpoint.
b | Select all viewpoints in which system A's total score is higher than that of B. | Grasp target systems' usability features roughly.
c | From the "low error rate" viewpoint, what is the most important check item? | Understand which check item is most influential on the final score.
d | From the "low error rate" viewpoint, what is the least important check item? | Estimate which check item is not so critical.
e | From the "memorability" viewpoint, which system's score is higher? | Compare overall scores among target systems.
f | From the "memorability" viewpoint, which check item should be improved to raise system B's score most effectively? | Identify where to improve for raising the usability score most effectively.
g | Count up the number of check items in sub-category "data output," where only B is OK. | Compare system usability for a certain (restricted) usability aspect.
Participants. Nine usability engineers, aged from 20 to 40 years old and with usability testing or consulting experience, participated in the experiment. They were asked to do the tasks supposing they were in a consulting situation.

Tasks. Seven simple tasks supposed to be typical in actual usability analysis work were selected, as shown in Table 1. In addition to the seven tasks (task set "TA"), similar tasks (doing the same thing using another data viewpoint) a' - g' were also prepared as task set "TB." Each subject tried TA first, then tried TB. Half of them used table data for TA and Treemap data for TB, and the other half executed TA with the Treemap data and TB with the tables. This order is considered to minimize the training effect possibly appearing in the performance time. Just after completing TA and TB, all subjects were asked to answer the seven subjective questionnaire items. All subjects completed all tasks and questionnaire responses in 40-60 min without any serious problem.

4.2 Results

Figure 4 shows the mean and standard deviation of task completion time and the number of correctly executed tasks. In five of the seven tasks (all except d and e), completion times on the map appear shorter than those on the table data when simply comparing mean values. ANOVA shows a significant difference in task c (t=2.67, P<0.02) and in task f (t=4.94, P<0.01).
[Fig. 4 data: mean task completion time in seconds (0–80 scale) per task a–g for the printed table vs. the Treemap, with counts of correctly and incorrectly executed tasks per task and condition; correct counts ranged from 6 to 9 of the 9 participants.]
Fig. 4. Task completion times and numbers of tasks correctly executed or not
Figure 5 shows the results of the subjective ratings for the seven questionnaire items. As for the overall impression, participants (usability engineers) gave higher scores to the table form for "simplicity," to the map form for "comfortableness," and the same score for "easiness to understand." As for the readability of check item weights, higher scores were given to the Treemap. (As for the check item structure, the same scores were given.) Also, participants tended to feel the Treemap was more useful for both usability analysis and presentation tasks.

4.3 Discussion of Experimental Result

From this experiment alone, we could not reach a clear conclusion because the number of subjects was not sufficient for strict statistical analysis. One of the reasons was that participants were screened according to their experience as usability engineers. However, some implications or tendencies about the usefulness of the proposed method can be extracted. The experimental results indicate the Treemap presentation should:

…be useful in data analysis tasks for usability consultation. We are paying attention to the result that significant task performance improvement was seen in tasks c and f. These tasks are to "select the most important check item" (c) and "pinpoint the check item for improving the overall score most effectively" (f). In many cases, usability engineers need to examine heavily weighted check items prior to others for improving target system usability effectively and promptly. Tasks c and f were designed with the intention of checking the adaptability to this requirement. The effectiveness and usefulness of the Treemap method are highly expected since significant improvement was observed in both tasks.

…not be suitable for examining low-weight check items. Although it was not statistically significant, performance on the tables was better than that on the Treemap (approximately 10% shorter mean time) in task d (select the lowest-weight check item). When there are a lot of check items, weights are sometimes set smaller than 1/1000. In the Treemap representation, where the weights are displayed as areas of rectangles, it is often hard to read out these areas exactly. In real situations, there are not many cases that require elaborate examination of low-weight check items, and Treemap may not support such tasks sufficiently. (If engineers are accustomed to using the zooming function, though, the task completion time will be greatly improved.)

…not have clear advantages in "rough examination" tasks. As for tasks a, b, e and g, a significant difference in task completion time was not observed, contrary to the authors' expectations. These tasks include roughly comparing target system characteristics, such as selecting viewpoints in which system A's score is higher. In these tasks, participants did not have to search for items by comparing weighting values, and they could complete the task easily just by reading numerical values in the tables.

…be welcomed by usability engineers. Participants gave higher subjective scores to the Treemap presentation for the two questionnaire items about usefulness. Taking into consideration that every participant was a beginner in using Treemap, this indicates their expectations of Treemap are considerably high. More experience with Treemap and additional software functions that support usability analysis tasks will raise the subjective rating further.
[Fig. 5 rating scales: five-point scales per questionnaire item (overall impression: very easy to understand to very hard to understand, very simple to very confusing, very comfortable to very painful; structure of check items and weights of check items: very easy to understand to very hard to understand; usefulness for usability analysis and for presentation to developers or clients: very useful to quite useless), comparing the printed table and the Treemap.]
Fig. 5. Result of subjective questionnaires
5 Concluding Remarks and Future Work

In this report, a method applying Treemap visualization to checklist-based heuristics was proposed. In some tasks that were intended to simulate usability analysis, an improvement of task completion time was observed. Subjective ratings of usefulness for both data analysis and presentation tasks were higher than for the table form presentation. Users' (usability engineers') experience and additional software functions will raise this score further. We think the following three issues should be overcome to make the checklist-based quantification method practical and widespread:
1. Provide theoretical and reliable bases for the quantification method.
2. Enable objective evaluation by eliminating/minimizing the score difference caused by individual skills or impressions.
3. Provide practical and useful tools for evaluation and analysis tasks.
As for issue (2), Ikegami has claimed it can be achieved to some extent by designing check items and their terms elaborately and tuning them iteratively through experimental studies [4]. The proposed method was intended to contribute to resolving issue (3), and some effect was shown in the experimental studies. Of course, we will need many more functions and tools to support real usability analysis tasks. We are considering a tool for weight assignment that uses the Treemap as a data input tool [11]. In order to create breakthroughs on issue (1), we need to present scientific bases for quantification, but this is hard to accomplish with short-term research. Although preceding studies [2][3][4][10] have tried to add reliability by adopting well-used guidelines or regulations, they have not become widespread as established methods. We think we need to ensure the validity of the method by developing or refining user cognitive/behavioral models for checklist-based quantification, as with the GOMS and KLM models for performance prediction with CogTool. Both scientific research and practical field activities should be merged harmoniously to develop reliable and systematic methodologies.
References

1. Hirasawa, N.: Usability Engineering for Software Development. IPSJ Magazine 44(2), 136–144 (2003)
2. Smith, S.L., Mosier, J.N.: A Design Evaluation Checklist for User-System Interface Software. Technical Report ESD-TR-84-358, MITRE, MA (1984)
3. Yamada, S., Tsuchiya, K.: A Study of Usability Evaluation of PC – Discussion of Model on PCs. The Japanese Journal of Ergonomics 32, 350–351 (1996)
4. Ikegame, T., Okada, H.: Toward Quantification of Usability. NEC Technical Journal 3(2), 53–56 (2008)
5. ISO 9241-10: Ergonomic requirements for office work with VDTs – Dialog principles (1996)
6. ISO 9241-12: Ergonomic requirements for office work with VDTs – Presentation of information (1998)
7. Ham, D.-H., Heo, J., Fossick, P., Wong, W., Park, S.-H., Song, C., Bradley, M.: Model-based approaches to quantifying the usability of mobile phones. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4551, pp. 288–297. Springer, Heidelberg (2007)
8. Nielsen, J.: Usability Engineering. Academic Press, London (1993)
9. Johnson, B., Shneiderman, B.: Tree-Maps: A Space-Filling Approach to the Visualization of Hierarchical Information Structures. In: Proc. of the 2nd Conference on Visualization 1991, pp. 284–291 (1991)
10. Kato, S., Horie, K., Ogawa, K., Kimura, S.: A Human Interface Design Checklist and Its Effectiveness. Transaction of Information Processing Society of Japan 36(1), 61–69 (1995)
11. Asahi, T., Turo, D., Shneiderman, B.: Using Treemaps to Visualize the Analytic Hierarchy Process. Information Systems Research 6(4), 357–375 (1995)
Evaluation of User-Interfaces for Mobile Application Development Environments Florence Balagtas-Fernandez and Heinrich Hussmann Media Informatics Group, Department of Computer Science, University of Munich, Germany {florence.balagtas,heinrich.hussmann}@ifi.lmu.de
Abstract. This paper discusses the different user interfaces of mobile development and modeling environments in order to extract important details of how the user interfaces for such environments are designed. The goal of studying such environments is to come up with a simple interface that helps people with little or no experience in programming develop their own mobile applications through modeling. The aim of this research is to find ways to present the user interface in a clear manner such that the balance between ease-of-use and ease-of-learning is achieved.
1 Introduction

Nowadays, the development of software applications is no longer bounded within the confines of people with programming skills. People are no longer limited to just being end-users of an application, but are encouraged to be the creators of their own applications as well. An example of this is the growth of the World Wide Web and how the creation of web pages is no longer restricted to people who have skills in writing HTML code and scripts. The introduction of WYSIWYG HTML editors such as Microsoft FrontPage and Google Page Editor has made this possible. Hiding the HTML code in the background and allowing components to be dragged and dropped onto a page makes it easy for novices to create their own web pages. The same thing is happening now in the mobile industry. Mobile phone users are no longer limited to using pre-installed applications on their devices or buying ready-made mobile applications for their personal purposes. People now have the power to create their own applications given the right motivation, creativity, skills and tools. Mobile phone companies and organizations have now opened up their application programming interfaces (APIs), which allow anyone to develop applications for their mobile devices. Examples of these are the Java Platform Micro Edition (Java ME) API from Sun Microsystems, the Android API from the Open Handset Alliance and the iPhone API from Apple. However, even though many users may have ideas for novel applications for mobile phones, software development is simply too difficult for most people. It takes a large amount of skill and familiarity with how
the framework is used before a person can create a decent amount of code for even a simple application. Even setting up the programming environment is a complex task, let alone figuring out how to use the APIs and compiling, running and deploying the application on the actual device. Other things that make developing applications for mobile devices more difficult compared to desktop applications are factors such as device limitations (e.g. screen size, computing power, power consumption) [4], the different operating systems for mobile devices, different data representations and additional device capabilities (e.g. Bluetooth, WiFi, GPS, camera) which are not standard on all devices and therefore must be considered when developing a uniform application that can run on different mobile devices. In this research, we are investigating ways to make application development accessible to people with little or no programming skill. We propose applying model-driven development (MDD), which is an approach to creating complex software systems by first creating a high-level, platform-independent model of the system, and then generating specific code based on the model for the target platform [5]. In ordinary software development, models are just thought of as tools for capturing system requirements and for documentation purposes; in MDD, however, the models are actually part of the implementation of the system. The basic idea of our work is to come up with a modeling environment that is specific to modeling mobile applications and that targets non-experts as the main users. Non-expert users here are defined to be people who have little or no experience in programming for mobile platforms. We want to present to the user one application that they can use to model their mobile applications without having to worry about low-level coding. In order to do this, we are developing a tool called Mobile Applications Modeler (Mobia). The focus of discussion in this paper is the design of the user interface for the Mobia modeling environment. The aim here is to find out which user interface design concepts are most suitable for non-expert users to develop their own mobile applications with ease. The goal is to present the interface such that the balance between ease-of-use and ease-of-learning [10] is achieved. We have focused on non-expert users in this research and do not include expert users in general, since these two types of users often differ in their experiences and needs [6]. Unlike existing modeling tools such as MagicDraw and Eclipse with the Eclipse Modeling Framework (EMF) plugins, which are more general-purpose modeling tools, we want to present to the user a domain-specific modeling tool that specializes only in modeling mobile applications. The focus of this part of our research is on how to present the user interface of the Mobia modeler such that it is easy to use for non-expert users. The rest of the paper is organized as follows: Section 2 discusses research related to our work, particularly in the area of model-driven development. In Section 3 we discuss the different user interfaces of existing development and modeling tools that are the basis of some of our designs. The remainder of the paper discusses our approach, namely the design of our prototypes and the evaluation results.
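To make the MDD idea concrete, consider a toy transformation in Python: a platform-independent description of a screen is turned into platform-specific markup by a generator. Everything below, the model format and the Android-style output alike, is an invented miniature for illustration, not the Mobia tool.

# Platform-independent model of one screen (invented miniature "DSL")
model = {"screen": "Login",
         "widgets": [{"type": "label", "text": "Name"},
                     {"type": "input", "id": "name"},
                     {"type": "button", "text": "OK"}]}

# Per-platform templates; a real tool chain would have one set per target
ANDROID = {"label": '<TextView android:text="{text}"/>',
           "input": '<EditText android:id="@+id/{id}"/>',
           "button": '<Button android:text="{text}"/>'}

def generate_android_layout(model):
    # The transformation step of MDD: abstract model -> concrete artifact
    rows = [ANDROID[w["type"]].format(**w) for w in model["widgets"]]
    return "<LinearLayout>\n  " + "\n  ".join(rows) + "\n</LinearLayout>"

print(generate_android_layout(model))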
2 Related Work

Integrated development environments (IDEs) are tools that are made to ease the application development process. Most IDEs provide an environment that features a text editor, compiler, debugger and simulator, to name a few, all integrated into one application. They have evolved throughout the years, adding more features (e.g. GUI designer, version control, etc.) that help developers accomplish their tasks in the most efficient way. For mobile application development in particular, examples of IDEs that allow plugins for mobile application development are the Netbeans, Eclipse and Xcode development environments. A problem with IDEs for mobile application development, though, is that different mobile phones have different application programming interfaces and platforms. Thus, creating a common application that would run on different mobile phone platforms tends to get tedious and redundant, since developers have to write different code for each of them. One solution to this problem is to apply the model-driven approach, in which models are used to describe the application and, through transformation tools, these models are transformed into code that runs on specific platforms [5]. An example of research that applies MDD is the Multimedia Modeling Language (MML), a platform-independent language used for the model-driven development of multimedia applications [7]. MML models are transformed into Flash models which can then be loaded into the Flash authoring tool for further completion of the application [8]. This approach [7,8] is usually for teams wherein graphic designers and software designers need to work together on a certain project. Each group of users has its own expertise in terms of skills and tools. However, when non-expert users are involved in the development process, this approach can be quite complicated. Extensive knowledge of how to make UML models is necessary in order to create applications using this approach, and since the tools are not yet integrated, mastery in using these tools is a must [8]. Dunkel and Bruns [3] also present a model-driven way of producing business applications for mobile devices, with BAMOS (Base Architecture for Mobile Applications in Spontaneous networks) as the target platform. Their models are expressed in UML activity diagrams to specify control flow, together with a description of mobile services through a DSL they have defined using UML profiles. As with MML, the approach uses different tools which are not yet integrated [3]. Another research project that applies the MDD process and targets non-experts as the primary developers is the Simple Mobile Services (SMS) project. This project aims to create service authoring tools and mobile services that are simple to use, find and set up [9]. It focuses on non-expert users as the people assembling these mobile services on their own. SMS applies the MDA [5] approach in building its services [2]. Our approach is similar to SMS in that we target non-expert users as the main users of our tool for developing mobile applications. However, while SMS focuses on mobile web-based services, our research focuses on mobile-based applications. In the next section, we will discuss the different user-interface components present in various development and modeling environments. We want to find out which existing approaches in the UI, and some new ones, are most suitable for non-experts.
3 A Closer Look into the User Interfaces
In this section, we compare the user interfaces of existing IDEs that support mobile application development (Netbeans and Eclipse) and of a modeling tool (MetaEdit+) that supports domain-specific modeling of mobile applications. We want to explore what features these tools have, and which of these features are essential parts of an environment from which non-experts can benefit.
Fig. 1. General Parts of a Development/Modeling Environment
In studying these tools, we have identified five basic areas that are usually present in such environments. For the purpose of discussion in this paper, we attach a general name to each area, which may or may not match how it is labeled in a given environment. Fig. 1 shows the typical default location of the main areas and their names, and Table 1 lists the areas together with some of their possible contents.

Table 1. The different areas and their possible contents

Navigation/Browsing Area: the different components in a development project (e.g. files and folders, classes and packages)
Main/Central Area: the component on which the user is currently working (e.g. source code, a user-interface design, a data source)
Palette/Properties Area: components that can be dragged and dropped onto the main/central area (e.g. UI components, datasets)
Toolbar Area: button controls (e.g. run, debug) and editing controls (e.g. copy, paste)
Output Area: program output, compiler errors, debugging messages, etc.
Fig. 2 shows an overview of the Netbeans 6.5 environment. The components described in our general UI model for a development environment are all present in Netbeans. One additional feature of Netbeans is the ability to switch between different views in the main area, depending on what the user is focusing on. The source view allows the user to make changes to the source code; the screen view allows drag-and-drop design of the mobile application's user interface; the flow view allows adding logic to the program by dragging flow arrows between the different
screens; and the analyzer view shows unused resources and MIDP compliance. Switching between the views changes the contents of the palette area, depending on what components are needed in that view. Netbeans also has the ability to bind a screen component's data to information taken from a database.
Fig. 2. Overview of the Netbeans Environment
The next IDE interface we discuss is the Eclipse IDE. There are several projects that aim to develop plugins for Eclipse to allow mobile application development (e.g. EclipseME, the Eclipse plugin for Android). For the purpose of this paper, we focus on analyzing the interface of the Eclipse IDE as used for developing Android applications, since the basic components of the IDE are similar in any case. As shown in Fig. 3, the positioning of the components in the environment is similar to that of Netbeans. However, the features offered by the Eclipse environment are just a subset of those in Netbeans. At the time of writing, it does not feature a drag-and-drop GUI environment for developing Android applications; instead, the GUI is built by editing an XML file that specifies the placement of the GUI components on the screen, or by adding lines directly to the source code (a sketch of the latter approach follows below). As the platform matures, developers are expected to add more features for easy GUI development. DroidDraw is one example of a UI editor for the Android platform; it generates an XML file that can be copied into the main code.
The MetaEdit+ Modeler is a DSM tool that allows the modeling of different domain-specific applications (e.g. mobile, automotive, telecom, embedded). One supported domain is the modeling of smartphone applications. Fig. 4 shows an overview of the basic user interface of MetaEdit+ for modeling mobile applications. Unlike the first two IDEs described above, the MetaEdit+ modeler features a simpler interface, with several of its components positioned in different areas. The palette area contains fewer components than those of the first two tools, featuring specialized constructs specific to the mobile platform. The navigation area contains a list of components in the model, and below it is the properties area, which contains information about the component currently in focus.
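To illustrate the code-centric GUI approach mentioned above for Eclipse/Android, here is a minimal sketch of an Android activity that builds its user interface entirely in Java source code, with no XML layout. It uses only standard Android SDK classes of that era; the activity itself is an invented example, not from any project discussed here.

```java
// A minimal sketch of the code-centric GUI approach on Android;
// a real project would more commonly declare this layout in res/layout/main.xml.
import android.app.Activity;
import android.os.Bundle;
import android.view.View;
import android.widget.Button;
import android.widget.LinearLayout;
import android.widget.TextView;

public class HelloActivity extends Activity {
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        LinearLayout layout = new LinearLayout(this);
        layout.setOrientation(LinearLayout.VERTICAL);

        final TextView label = new TextView(this);
        label.setText("Not yet clicked");
        Button button = new Button(this);
        button.setText("Click me");
        button.setOnClickListener(new View.OnClickListener() {
            public void onClick(View v) {
                label.setText("Clicked");   // UI logic lives in the source code
            }
        });

        layout.addView(label);
        layout.addView(button);
        setContentView(layout);            // no XML layout involved
    }
}
```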
Fig. 3. Overview of the Eclipse Environment
From these three examples, we want to extract the most desirable features of each that can be applied to the design of Mobia. The Netbeans environment, for instance, features the ability to switch between different views, which allows the user to concentrate on one task at a time. However, it contains so many features that it can take a while before the user can actually take advantage of them. The MetaEdit tool, on the other hand, contains only a limited number of components, with specialized constructs that the user can easily identify, and all tasks, such as designing the screens and adding flow to the program, are modeled in one view. The disadvantage, though, is that because the tool is very specialized, users are restricted in the type of application they can create. The Eclipse environment also offers a very simple interface that does not show too many features, but it is clearly a tool for expert developers, who already know what source code to type in for the applications they are developing. In the next section, we discuss our approach to finding an ideal user interface for the Mobia Modeler, one that non-expert users will be able to use. We apply some of the design patterns seen in the tools described above and evaluate them in order to find out which features are most desirable for such an environment.
Fig. 4. Overview of the MetaEdit+ Modeler
4 The Mobia Modeler User Interface
The Mobia Modeler is a modeling tool designed specifically for modeling mobile applications. The target users for Mobia are non-expert users: people who have little or no experience in programming for mobile platforms. For this particular study, we feature a module of Mobia focused on modeling applications in the domain of mobile health monitoring. For the moment we concentrate on one domain, since different domains may require different modeling constructs. With this module, users model applications for health monitoring, using modeling constructs that represent data from different medical gadgets, or medgets (e.g. ECG meter, thermometer). To find the ideal interface for Mobia, we created two prototypes using Flash, offering two different UI designs. To clarify, these prototypes are focused solely on evaluating the different user interface designs and interactions; they do not yet have code transformation features.
4.1 Mobia with One View
Fig. 5 shows a screenshot of the first version of our Mobia prototype, which we call Mobia One View. This prototype offers a single view in which the user can design instances of the mobile screens and add data and application control flow. The user can concentrate on designing a single screen by zooming into that area, and see an overview of the whole system by zooming out. The palette on the right side of the screen contains screen components that can be dragged onto the mobile screen; for our prototypes, we only feature a subset of the possible screen components a mobile application can have. The right palette also contains data input constructs, which we call medget (short for medical gadget) input. The medget constructs contain abstract representations of information coming from health monitoring devices capable of sending their data to a mobile device (see the sketch after Fig. 5). The different representations of medget data are not discussed in this paper but in a separate paper [1].
Fig. 5. Mobia with One View. (In the foreground) The main area is zoomed-in to see the screen designs better.
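As a hedged illustration of the placeholder idea just described, the following Java sketch shows how a design-time medget construct might merely declare the kind of sensor data it stands for, leaving the binding to live data to the generated application. The class, enum and method names are all invented; Mobia's actual implementation (in Flash) is not shown in the paper.

```java
// Hypothetical sketch of a "medget" placeholder: at design time it only
// declares what kind of sensor data it stands for; the generated application
// would later bind it to a live data source.
public class MedgetPlaceholder {
    enum MedgetType { ECG, THERMOMETER, BLOOD_PRESSURE }

    private final MedgetType type;
    private final String label;

    MedgetPlaceholder(MedgetType type, String label) {
        this.type = type;
        this.label = label;
    }

    /** Design-time rendering: shows the label instead of real readings. */
    String designTimeText() {
        return "[" + type + "] " + label;
    }

    public static void main(String[] args) {
        MedgetPlaceholder temp =
            new MedgetPlaceholder(MedgetType.THERMOMETER, "Body temperature");
        System.out.println(temp.designTimeText()); // [THERMOMETER] Body temperature
    }
}
```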
4.2 Mobia with Multiple Views
The second design approach for Mobia, shown in Fig. 6, is what we call multiple views. This is similar to the Netbeans IDE, in which the main area features different views depending on the specific task the user is doing. The reason behind this design is that we want the user to focus on one task at a time.
Fig. 6. Mobia with Multiple Views (Design, Data and Navigation View)
The default view is the Design View, in which the user designs individual screens by dragging and dropping screen components from the palette onto the screen. The left panel contains all the mobile screens of the application; clicking on an individual screen in the left panel shows it in the main view, where it can be further edited and designed. Screens can be added and deleted by pressing the add and delete buttons, respectively. This design is borrowed from presentation programs such as Microsoft PowerPoint and OpenOffice Impress, in which each slide can be viewed in a panel and the user can switch between slides by clicking their miniature versions. The Data View is similar to the Design View, except that the palette contents on the right panel change to medget data. In this view, users can concentrate on how they want data taken from health monitoring devices to be displayed on the screen. These medget components act as placeholders where the real information from the devices will appear in the final application. The last view is the Flow View, which shows all the screens in the model and how the screens transition from one to the next. The user can add basic control logic to the application by dragging arrows that link the screens together. In this view, a small component palette contains buttons that the user can drag onto the screens. The logic behind this design is that, in the application, only pressing a control component such as a button can trigger the transition from one screen to the next.
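The flow model this view edits can be stated very compactly. Below is a hypothetical Java sketch of the underlying data structure: each arrow the user drags becomes a (screen, button) to screen transition. Names are illustrative, not Mobia's actual code.

```java
// Hypothetical model behind the Flow View: arrows are stored as
// (screen, button) -> screen transitions.
import java.util.HashMap;
import java.util.Map;

public class ScreenFlow {
    // key: "screenId/buttonId", value: id of the target screen
    private final Map<String, String> transitions = new HashMap<>();

    public void link(String fromScreen, String button, String toScreen) {
        transitions.put(fromScreen + "/" + button, toScreen);
    }

    /** Returns the next screen, or null if the button has no outgoing arrow. */
    public String next(String currentScreen, String pressedButton) {
        return transitions.get(currentScreen + "/" + pressedButton);
    }

    public static void main(String[] args) {
        ScreenFlow flow = new ScreenFlow();
        flow.link("welcome", "okButton", "measurement");
        flow.link("measurement", "backButton", "welcome");
        System.out.println(flow.next("welcome", "okButton")); // measurement
    }
}
```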
5 User Study Evaluation and Results
Given the prototypes described in the previous section, we want to find out which of them provides a simpler UI for the user and gets the task done
quickly. For a more subjective evaluation, we also want to find out which design is more fun and easier to use. To do this, we conducted a user study in which each user was given tasks to accomplish using both prototypes. To measure efficiency, we recorded the time the user took to accomplish each task. To eliminate order bias, we alternated which prototype each participant used first. The participants were instructed not to ask the evaluators any questions; the goal was to let participants explore the tool and learn how to use it on their own, without outside intervention. At the beginning of the study, each participant was asked to explore the prototypes and give comments. After studying the tool in whatever way they chose, they were given two tasks: first to design the contents of the screens, and then to add control flow to the screens. They were asked to create three screens containing some screen components. After designing the screens, they were asked to add control flow that switches to a different screen whenever a button is pressed.

Table 2. The average times for accomplishing the tasks using the two prototypes

Version | Screen Design Task | Adding Control Flow Task
Mobia One View | 4.036 minutes | 1.126 minutes
Mobia Multiple Views | 5.833 minutes | 2.223 minutes
There were 10 participants in our user study: 60% had backgrounds in Computer Science and the rest came from Educational Psychology, Archeology, Architecture and Social Welfare. Only 10% of the participants had any background in programming for mobile platforms, and only at a very basic level. Table 2 shows the average times in which the users accomplished the tasks, while Table 3 shows the results of the subjective evaluation in terms of which prototype is easier and more fun to use. Based on the results shown in the tables, Mobia One View allowed the users to do the tasks faster than the multiple-view version. A factor that might contribute to this is that in multiple views the user has to switch from one view to the next in order to add a different component or do another design task. Based on the subjective feedback of the participants, Mobia One View also provides an environment that is both easy and fun to use.

Table 3. Subjective evaluation for the Mobia prototypes

Version | Easier to Use | More Fun to Use
Mobia One View | 60% | 50%
Mobia Multiple Views | 40% | 40%
None | 0% | 10%
6 Summary and Future Work
In this paper, we have presented different design ideas for a mobile application modeling environment that targets non-experts as its main users. The design and results presented here are just the initial phase of our iterative approach to finding the ideal interface for a tool that helps users accomplish tasks with ease. Aside from continuing to polish Mobia's user interface design, our future work is to develop an underlying framework to support code transformation from the models. We also envision a user-adaptive tool that changes according to each user's existing skills and preferences, to enhance user experience and learning.
Acknowledgments
We would like to thank the German Academic Exchange Service (DAAD) for funding this research. We would also like to thank Ugur Örgün for helping with the prototypes, and the people who participated in our user study.
References
1. Balagtas-Fernandez, F., Hussmann, H.: Modeling Information from Wearable Sensors. In: MDDAUI 2009 – Model Driven Development of Advanced User Interfaces 2009. CEUR Proceedings (2009)
2. Bartolomeo, G., Casalicchio, E., Salsano, S., Melazzi, N.B.: Design and Development Tools for Next Generation Mobile Services. In: International Conference on Software Engineering Advances, ICSEA 2007, p. 16 (2007)
3. Dunkel, J., Bruns, R.: Model-Driven Architecture for Mobile Applications. In: Business Information Systems, pp. 464–477 (2007)
4. Gaedke, M., Beigl, M., Gellersen, H.-W., Segor, C.: Web Content Delivery to Heterogeneous Mobile Platforms. In: ER 1998: Proceedings of the Workshops on Data Warehousing and Data Mining, pp. 205–217. Springer, London (1998)
5. Kleppe, A., Warmer, J., Bast, W.: MDA Explained: The Model Driven Architecture: Practice and Promise. Pearson Education, Inc., Boston (2003)
6. Petre, M.: Why Looking Isn't Always Seeing: Readership Skills and Graphical Programming. Commun. ACM 38, 33–44 (1995)
7. Pleuss, A.: MML: A Language for Modeling Interactive Multimedia Applications. In: ISM 2005: Proceedings of the Seventh IEEE International Symposium on Multimedia, pp. 465–473. IEEE Computer Society, Washington, DC, USA (2005)
8. Pleuß, A., Vitzthum, A., Hussmann, H.: Integrating Heterogeneous Tools into Model-Centric Development of Interactive Applications. In: Engels, G., Opdyke, B., Schmidt, D.C., Weil, F. (eds.) MODELS 2007. LNCS, vol. 4735, pp. 241–255. Springer, Heidelberg (2007)
9. The SMS Project, http://www.ist-sms.org
10. Weiss, S.: Handheld Usability. John Wiley and Sons, Chichester (2002)
User-Centered Design and Evaluation – The Big Picture
Victoria Bellotti, Shin'ichi Fukuzumi, Toshiyuki Asahi, and Shunsuke Suzuki
Abstract. This paper provides a high-level overview of the field of usability evaluation as context for a panel “Systematization, Modeling and Quantitative Evaluation of Human Interface” in which several authors report on a collaborative effort to apply CogTool, an automated usability evaluation method, to mobile phone interfaces and to assess whether usability predictions made by CogTool correlate with user subjective impressions of usability. If the endeavor, which is still underway at the time of writing, is successful, then CogTool may be applied economically within the product development lifecycle to reduce the risk of usability problems. Keywords: Usability evaluation, methods, metrics, systematization.
is presented as context for a series of position papers within a panel that reflects on one particular effort to systematize usability evaluation in a large corporation where a number of fairly common constraints such as lack of availability of user-centered design experts and tight product deadlines and budgets apply.
2 Challenges for Usability Evaluation in Design
Despite the obvious importance of evaluation in user-centered design (obvious at least to our own HCI community), it is not always the case that applications built to be placed in the hands of hapless end-users enjoy the benefits of any kind of objective evaluation; i.e., a method relying on more powerful evidence than the intuitions of inexperienced engineers [see, for example, 6]. And HCI researchers have expressed concern that the evaluations that do take place are inadequate (for example, they may involve the wrong user representatives [19], or only quality control testers [41]). In this section I review three obstacles that may stand as explanations for this unfortunate phenomenon.
2.1 A Plethora of Methods to Choose From
One key obstacle to understanding which usability evaluation methods one should adopt is the abundant diversity of methods and tools, starting with "quick-and-dirty" methods such as expert reviews or guidelines walkthroughs [18] and simple tools such as surveys [e.g. 42], all the way through to extended in situ evaluation methods incorporating multiple data sources such as logging and interview-based data collection [e.g. 33], or sophisticated tools such as eye-tracking systems [e.g. 11]. Each method has its strengths and drawbacks and is appropriate in different circumstances; for example, discount usability methods [36] are appropriate when resources are constrained, even if they are not as sensitive at picking up problems as a full-scale evaluation. If there were a one-size-fits-all solution available for standardized usability evaluation, it would surely be easier to train designers and developers to apply it in all projects that are likely to impact end users. But instead, diversity opens the doorway to confusion and suboptimal choice.
2.2 A Diversity of Influential Design Circumstances
At the time of writing in 2009, based on my own experiences interacting with representatives of a variety of commercial and research application development organizations, it is still not uncommon for a design effort to take place without a serious usability evaluation. We are all familiar with the baffling results of such endeavors, which we encounter regularly in our interactions with hardware, software and web-based user interfaces. Many circumstances can exert influence over whether usability evaluation takes place at all and over what type of evaluation with what metrics is most appropriate. Consider the following variables (which are both contextual and inherent to the design) as examples:
• Application domain
• Standards and performance criteria that pertain to the application domain
• Target users and their particular characteristics
• Novelty of the design and its interaction elements
• User-centered design expertise within the design team
• Budget
• Time available
• Organizational culture around the design team and perceptions of the importance of usability
Let me briefly illustrate the kinds of impact these factors can have. In my own past work exploring novel solutions in the application domain of personal information management (PIM) [8, 9 and 10], it quickly became apparent that experimental evaluations made no sense, since the proof of the PIM pudding, so to speak, can only be in the extended use of a solution with one's own real personal information, leading to a need for in situ evaluation of real use over weeks rather than the hours that Nielsen [38] suggests can usefully be applied to web-site evaluation. As another example, Grudin [19] reported extensively on various organization-related constraints that can lead to suboptimal design results, and Bak et al. [6] more recently also highlight organizational obstacles as significant, both in the literature and in their own survey, together with developer mindset (a culture of greater focus on functionality and efficient code, and a lack of user-centered design expertise). Perhaps the key factor impacting usability evaluation is the overall culture of the host organization for the design effort (or even the culture within which that organization exists), which can in turn influence the other factors listed above. Specifically, if few members of the organization are aware of user-centered design as a discipline and the value of a good user experience, and fewer still have the relevant skills to apply the appropriate methods, then budget and time will not be allocated to a serious effort to evaluate the user experience, and people with the required expertise will of course not be available to engage in that effort. In many countries today usability experts (or engineers who also have user-centered design skills) are indeed still an extremely rare species and, even if a corporation wishes to hire them, it may find that they simply cannot be found. In such circumstances, how might a large corporation make the best of limited usability expertise? This is an issue to which I will return in a subsequent section.
2.3 Metrics
Usability is defined in the ISO 9241-11:1998 standard as the "extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use." Taking this standard as widely agreed upon, effectiveness, efficiency and satisfaction thus have to be measured somehow in order to know how usable a prototype or product is. Unfortunately, metrics also present something of a challenge to evaluators since, as Sauro & Kindlund [43] point out without overlooking the obvious irony, "Usability Metrics Need to be Easier to Use." Bak et al. [6] surveyed 2795 papers in the HCI literature, amongst which they found 28 with a focus on usability in organizations. Out of these, 11
mentioned poor understanding of usability as an obstacle (behind resource demands, 17/28; test participant issues such as identification of and access to users, 14/28; and organizational obstacles including an anti-usability culture, 14/28). In particular, according to Bak et al., usability is often confused with functionality, which, at least to this author's mind, may explain why so many applications have so many unused, often hard to discover, and near useless features. In fact there are five basic (although not cleanly independent) usability metrics that have been applied repeatedly and seem to be accepted as fairly standard within the HCI community [e.g. 32, 34 and 43]. These are:
• Time taken to learn to execute tasks
• Time taken to execute tasks
• Task completion rate (proportion of tasks in an evaluation that can be completed to some standard of correctness)
• Number of errors (deviations from viable task completion paths or production of a result or state that must be undone)
• User satisfaction (a composite of a variable host of subjective assessments)
Other, less common metrics, such as objectively measurable stress [e.g. 45] and analytically derived cognitive complexity [27], that can be correlated with at least one of these five have also been discussed in the literature. However, the five basic metrics can all be measured directly without special equipment (although perhaps not always as accurately as with special equipment) and cover the most significant possible consequences of bad design. The question then is, should all of these dimensions be measured, or do some matter more than others in different design circumstances? In fact, design circumstances can heavily weight the importance of one metric over another and may even require trade-offs to be made between metrics. For example, a UI optimized for novice users, with lots of easy-to-find-and-learn menus and buttons, will tend to be slower and less efficient for an expert, who will usually look for keyboard shortcuts that are faster to execute. So the evaluator must understand the importance of each metric to the design circumstances at hand and the extent to which any given tool or method is likely to provide reliable values for the metrics that matter. Different usability evaluation methods vary in the extent to which they are able to provide these metrics. For example, a cognitive walkthrough [40] will not allow the evaluator to measure task times very accurately, although it may be better at measuring the extent to which a system is likely to be error-prone. A laboratory experiment may allow an evaluator to measure time, task completion and errors quite accurately, but render no reliable measurement of user satisfaction. A user survey may measure satisfaction quite well, but provide only subjective (and thus unreliable) appraisals of time, task completion and error-proneness. Of course, it can make sense to combine multiple methods in one evaluation, such as an experiment and a survey, usually at a reduced cost for each method, since study participants need only be recruited, scheduled and paid once for a single session in which they perform more than one exercise. Whatever the case may be, it requires some expertise to know which aspects of the design situation to pay attention to in deciding what evaluation metrics are best.
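Since the first four of these metrics are directly countable, a usability logging tool can compute them mechanically. Below is a minimal, illustrative sketch (in Java) of such a computation; the record fields and sample values are assumptions for the example, not data from any study discussed here.

```java
// Illustrative computation of the basic metrics listed above from logged
// task attempts; the record layout is an assumed format, not a standard one.
import java.util.List;

record TaskAttempt(double seconds, int errors, boolean completed) {}

public class Metrics {
    public static void main(String[] args) {
        List<TaskAttempt> log = List.of(
            new TaskAttempt(42.0, 1, true),
            new TaskAttempt(67.5, 3, false),
            new TaskAttempt(38.2, 0, true));

        double meanTime = log.stream().mapToDouble(TaskAttempt::seconds).average().orElse(0);
        double completionRate = log.stream().filter(TaskAttempt::completed).count()
                                / (double) log.size();
        int totalErrors = log.stream().mapToInt(TaskAttempt::errors).sum();

        System.out.printf("time %.1fs, completion %.0f%%, errors %d%n",
                meanTime, completionRate * 100, totalErrors);
        // Learning time and satisfaction need separate instruments
        // (repeated sessions and questionnaires respectively).
    }
}
```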
3 Systematizing Usability Evaluation
Given the above challenges for usability evaluation in the design process, it is hardly surprising that some professionals have sought to develop systematic methods in an attempt to help those with less expertise assess usability without the added time and expense of bringing in real users or an expert who may be hard to find. Three approaches to systematization are:
• Guidelines
• Procedures
• Automation
Usability guidelines have been common for quite some time. For example, Jakob Nielsen describes participating in a US Air Force exercise to compile "existing usability knowledge into a single, well-organized set of guidelines for its user interface designers" between 1984 and 1986 [37]. Many corporate, governmental and quite a few international user interface design guidelines have been compiled and updated since then [e.g. 5, 24, and 28]. They seek to describe what the designer must aim to accomplish or the constraints she or he must work within. However, conforming to guidelines can be a tricky business for the unskilled, especially when they are not well articulated, as in the ISO guideline for "Suitability for learning," which reads: "A dialogue is suitable for learning when it supports and guides the user in learning to use the system."
Procedures is the term I use here for well-defined usability evaluation methods, a subset of the methods described in Section 2.1 of this paper. The Cognitive Walkthrough [40] is one such procedure that has been evaluated [31] to show that it can be followed by a knowledgeable person and, depending on the extent of that person's skill, produce consistent predictions without requiring a complex modeling effort or a real user evaluation. Another similar approach is the Heuristic Evaluation method [39], which, in evaluation, has been shown to be better performed by usability experts, and best of all by usability experts who are also application domain specialists [e.g. 34]. These general-purpose methods have been accepted for a long time and have stood the test of time, still being in use even over 15 years after their invention [22]. Earthy et al. [16] provide a review of the ISO 13407 human-centred design processes, which represent an attempt to set standards for interactive systems design in general. Other evaluation procedures have been developed more recently for specific platforms (e.g. the mobile phone [30]) and specific application domains (e.g. e-learning [50]).
Automation may be the Holy Grail of usability evaluation, since the possible cost savings in design endeavors are immense. Card, Moran & Newell developed the foundational example of an evaluative human information processing model and the GOMS (Goals, Operators, Methods and Selection rules) approach to computational modeling of human interaction with computers in the early 1980s [13]. Since then, many attempts have been made to achieve full automation [25]. Quite a number of early efforts focused only on specifying the rules that a user would need to learn to operate a system; these were never automated and took far too long to apply successfully [7]. More successful has been the work based on sustained development of working software models of a human information processor
such as ACT [1] and its descendants (ACT* [2], ACT-R [3 and 4] and ACT-R/PM [12]), a line of work still under development at Carnegie Mellon University (CMU). Another, albeit less well-known, example of such a system is the SOAR cognitive architecture developed by Laird, Rosenbloom and Newell [29]. Building upon the ACT-R computational cognitive architecture, Bonnie John at CMU and her colleagues and students have developed a GOMS-based system called CogTool [26] that has perhaps come closest to requiring minimal effort from the system developer. CogTool uses performance measurements taken from real user interactions and is able to generalize them to specifications of user interfaces that contain the same basic features (e.g. buttons, menus and other GUI elements). The user interface specification is provided to CogTool in the form of a storyboard (based on sketches or screenshots) that preserves the dimensions of the target GUI, upon which the evaluator demonstrates to CogTool the actions required to execute tasks. Using its models of user thinking times and actions (plus expected system response times), CogTool is able to output task completion time predictions for a skilled user, and, for novices, completion times and deviating actions together with the time they will take. CogTool uses an augmentation of ACT called SNIF-ACT [17], which assumes that novice users read text labels and click on items that are semantically close to their goal; sometimes this leads to mistakes, since interfaces often contain ambiguous or misleading elements [46 and 47].
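To give a feel for the family of models CogTool descends from, the following is a toy Java sketch of the original Keystroke-Level Model idea: a skilled user's task time is predicted as the sum of standard operator times. The operator values are the commonly cited Card, Moran & Newell estimates, and the whole sketch is a drastic simplification; CogTool itself relies on the much richer ACT-R machinery described above.

```java
// A back-of-the-envelope Keystroke-Level Model (the simplest member of the
// GOMS family). Operator times are commonly cited textbook estimates.
import java.util.Map;

public class Klm {
    static final Map<Character, Double> SECONDS = Map.of(
        'K', 0.28,  // keystroke or button press (average typist)
        'P', 1.10,  // point with the mouse
        'H', 0.40,  // home hands between keyboard and mouse
        'M', 1.35); // mental preparation

    static double predict(String ops) {
        return ops.chars().mapToDouble(c -> SECONDS.get((char) c)).sum();
    }

    public static void main(String[] args) {
        // e.g. think, point at a menu, click, think, type three characters
        System.out.printf("predicted skilled-user time: %.2f s%n",
                predict("MPKMKKK"));
    }
}
```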
4 Seeking Systematization in the Enterprise
The HCI International 2009 panel "Systematization, Modeling and Quantitative Evaluation of Human Interface," with which this paper is associated, includes a number of positions from collaborators who have participated in an effort to systematize usability evaluation in a large corporation, NEC, based in Japan, which frequently develops software and hardware products for both consumer and business use. The discussion in this paper has sought to provide some context for the approach adopted in the work reported, by addressing some of the key considerations that relate to its rationale. The chosen approach reflects a desire to simplify the choice of usability evaluation method in NEC, where usability expertise is not as pervasive as would be ideal and where tight deadlines and budgets always apply. A small team of usability specialists in the research division of NEC began a collaborative effort with researchers at the Palo Alto Research Center and at Carnegie Mellon University to validate the use of CogTool (introduced above) in assessing the usability of products under development. CogTool was chosen as an ideal method because it is easy to apply to a graphical UI specification (possibly early in the design process) without much expertise. This approach sidesteps the problems the corporation experiences of lack of expertise and limited time and budget for usability. By providing one general-purpose systematic approach, it also seeks to get around two further problems: the possible bewilderment of non-experts at the corporation faced with the number of methods available, and the reliance of many of the economical and fast discount usability methods that would otherwise be most suitable (such as Heuristic Evaluation) on experts to apply them well. Because of their importance to the corporation as a product category, and the discrete, and thus easy to model, nature of the tasks users perform on them, mobile
phones were chosen as having an ideal user interface upon which to first test the CogTool approach. However, CogTool at the time of writing mainly generates predictions of the occurrence of user behaviors such as looking, thinking and gesturing, and of the time they each take during task execution (which may include trial-and-error exploration for novice users). So it was necessary to determine whether these predictions correlate with the kind of usability that really matters to a product vendor developing applications where user performance demands (i.e. time to execute tasks and error rates) are not stringent: the subjective experiences of both experts and novices. Experts, of course, will form this opinion over extended periods of use, but novices can only form this opinion based on hearsay from existing users (probably experts) or on their own initial exploration of the product, perhaps in a store (also known as "shelf usability") or when given the opportunity to try the product for the first time by a friend or colleague. A literature search was undertaken to find the best possible method for assessing user subjective impressions of usability. Out of many possible candidates, the Mobile Phone Usability Questionnaire (MPUQ) developed by Ryu [42] was chosen because it built upon and systematically refined questions from a number of previously well-accepted subjective usability assessment questionnaires. At the time of writing, an intensive effort is underway to obtain naïve and expert user performance data on a set of tasks across three types of mobile phone interface (between-subjects measures) and to correlate that data with user subjective impressions of usability obtained using the MPUQ both before and after using the mobile phones. The anticipated outcomes of the research will be:
• A comparison of three mobile phone models in terms of time taken by expert and naïve users to complete a set of tasks on each of the three phones.
• CogTool predictions for both experts and naïve users on each of the phone interfaces, and an assessment of the accuracy of those predictions.
• A comparison of the real user data with the CogTool predictions and with user subjective impressions of usability.
If the team is able to demonstrate that predictions of CogTool do indeed correlate with user subjective impressions, then we have evidence that CogTool may be used by commercial mobile phone developers to improve the usability of their mobile phones by using its predictions as a means to identify usability problems and areas for improvement in new phone interface design efforts. Whilst this might not be the ideal method for formative usability evaluation in the product development lifecycle, it will be a practical solution that can be used by non-experts under non-ideal circumstances and should reduce instances of at least some types of usability problem going undetected before product release.
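As a purely illustrative sketch of the planned correlation check, the following computes Pearson's r between hypothetical CogTool time predictions and hypothetical MPUQ scores. The numbers are invented placeholders, since the study's data were not yet available at the time of writing.

```java
// Sketch of the correlation check described above: Pearson's r between
// predicted task times and subjective usability scores. Data are invented.
public class Correlation {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] predictedSeconds = { 12.1, 18.4, 25.0 }; // one value per phone model
        double[] mpuqScore        = { 4.2, 3.1, 2.6 };    // placeholder questionnaire scores
        System.out.printf("r = %.2f%n", pearson(predictedSeconds, mpuqScore));
    }
}
```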
References 1. Anderson, J.R.: Language, Memory and Thought. Erlbaum Associates, Hillsdale, NJ (1976) 2. Anderson, J.R.: The Architecture of Cognition. Harvard University Press, Cambridge, MA (1983)
3. Anderson, J.R.: Rules of the Mind. Lawrence Erlbaum Associates, Hillsdale (1993) 4. Anderson, J.R.: ACT: A Simple Theory of Complex Cognition. American Psychologist 51, 355–365 (1996) 5. Apple: Apple Human Interface Guidelines for Mac OS X (2008), http://developer.apple.com/documentation/UserExperience/ Conceptual/AppleHIGuidelines 6. Bak, J.O., Nguyen, K., Risgaard, P., Stage, J.: Obstacles to Usability Evaluation in Practice: A Survey of Software Development Organizations. In: Proceedings of the 5th Nordic Conference on Human-Computer interaction: Building Bridges, NordiCHI 2008, vol. 358, pp. 23–32. ACM, New York (2008) 7. Bellotti, V.: Implications of Current Design Practice for the Use of HCI Techniques. In: Jones, D.M., Winder, R. (eds.) People and Computers IV, pp. 13–34. Cambridge University Press, Cambridge (1988) 8. Bellotti, V., Dalal, B., Good, N., Bobrow, D.G., Ducheneaut, N.: What a To-do: Studies of Task Management Towards the Design of a Personal Task List Manager. In: ACM Conference on Human Factors in Computing Systems, CHI 2004, pp. 735–742. ACM, New York (2004) 9. Bellotti, V., Ducheneaut, N., Howard, M.A., Smith, I.E.: Taking Email to Task: The Design and Evaluation of a Task Management Centered Email Tool. In: CSCW 2002 Workshop: Redesigning Email for the 21st Century, New Orleans, LA, ACM, New York (2003) 10. Bellotti, V., Smith, I.: Informing the Design of an Information Management System with Iterative Fieldwork. In: Proceedings of the 3rd conference on Designing interactive systems: processes, practices, methods, and techniques, ACM, New York (2000) 11. Benel, D.C.R., Ottens, D., Horst, R.: Use of an Eye Tracking System in the Usability Laboratory. In: Proceedings of the Human Factors Society 35th Annual Meeting, Santa Monica, Human Factors and Ergonomics Society, pp. 461–465 (1991) 12. Byrne, M.D.: ACT-R/PM and Menu Selection: Applying a Cognitive Architecture to HCI. International Journal of Human-Computer Studies 55, 41–84 (1999) 13. Card, S.K., Newell, A., Moran, T.P.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates Inc., Hillsdale (1983) 14. Constantine, L.L.: Beyond User-Centered Design and User Experience: Designing for User Performance. Cutter IT Journal 17(2), 16–25 (2004) 15. Dix, A., Finlay, J., Abowd, G., Beale, R.: Human Computer Interaction, 2nd edn. PrenticeHall, Englewood Cliffs (1993) 16. Earthy, J., Sherwood Jones, B., Bevan, N.: The Improvement of Human-centred Processes – Facing the Challenge and Reaping the Benefit of ISO 13407. International Journal of Human Computer Studies 55(4), 553–585 (2001) 17. Fu, W.-T., Pirolli, P.: SNIF-ACT: A Cognitive Model of User Navigation on the World Wide Web. Human-Computer Interaction 22, 355–412 (2007) 18. Gray, W.D., Salzman, M.C.: Damaged Merchandise? A Review of Experiments that Compare Usability Evaluation Methods. Human Computer Interaction 13(3), 203–261 (1998) 19. Grudin, J.: Systematic Sources of Suboptimal Interface Design in Large Product Development Organizations. Human-Computer Interaction 6(2), 147–196 (1991) 20. Gunther, R., Janis, J., Butler, S.: The UCD Decision Matrix: How, When, and Where to Sell User-Centered Design into the Development Cycle (2001), http://www.ovostudios.com/upa2001/ (retrieved January 24, 2009) 21. Hartson, H.R., Andre, T.S., Williges, R.C.: Criteria for Evaluating Usability Evaluation Methods. International Journal of Human-Computer Interaction 15(1), 145–181 (2003)
22. Hollingsed, T., Novick, D.G.: Usability Inspection Methods After 15 Years of Research and Practice. In: Proceedings of the 25th Annual ACM international Conference on Design of Communication, SIGDOC 2007, pp. 249–255. ACM, New York (2007) 23. ISO 9241-11:1998. Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs) – Part 11: Guidance on Usability. International Organization for Standardization (1998) 24. ISO 9241-110:2006 Ergonomics of Human-System Interaction – Part 110: Dialogue Principles (2006) 25. Ivory, M.Y., Hearst, M.A.: The State of the Art in Automating Usability Evaluation of User Interfaces. ACM Computing Surveys 33(4), 470–516 (2001) 26. John, B.E., Prevas, K., Salvucci, D.D., Koedinger, K.: Predictive Human Performance Modeling Made Easy. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2004, pp. 455–462. ACM, New York (2004) 27. Kieras, D., Polson, P.: An Approach to the Formal Analysis of User Complexity. Int. Journ. of Man-Machine Studies 22, 365–394 (1985) 28. Koyani, S.J., Bailey, R.W., Nall, J.R.: Research-Based Web Design & Usability Guidelines. U.S. Department of Health and Human Services (HHS) and the U.S. General Services Administration (GSA), Washington DC (2004), http://www.usability.gov/pdfs/guidelines.html#1 (retrieved February 21, 2009) 29. Laird, J., Rosenbloom, P., Newell, A.: Universal Subgoaling and Chunking: the Automatic Generation and Learning of Goal Hierarchies. Kluwer Academic Publishers, Dordrecht (1986) 30. Lee, Y.S., Hong, S.W., Smith-Jackson, T.L., Nussbaum, M.A., Tomioka, K.: Systematic Evaluation Methodology for Cell Phone User Interfaces. Interacting with Computers 18(2), 304–325 (2006) 31. Lewis, C., Polson, P.G., Wharton, C., Rieman, J.: Testing a Walkthrough Methodology for Theory-Based Design of Walk-up-and-use Interfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems CHI 1990, pp. 235–242. ACM, New York (1990) 32. Macleod, M., Bowden, R., Bevan, N., Curson, I.: The MUSiC performance measurement method. Behaviour & Information Technology 16(4-5), 279–293 (1997) 33. Muller, M.J., Geyer, W., Brownholtz, B., Wilcox, E., Millen, D.R.: One-Hundred Days in an Activity-Centric Collaboration Environment Based on Shared Objects. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems CHI 2004, pp. 375– 382. ACM, New York (2004) 34. Nielsen, J.: The Usability Engineering Life Cycle. Computer 25(3), 12–22 (1992) 35. Nielsen, J.: Usability engineering. Academic Press, Boston (1993) 36. Nielsen, J.: Using Discount Usability Engineering to Penetrate the Intimidation Barrier. In: Bias, R.G., Mayhew, D.J. (eds.) Cost-Justifying Usability, Academic Press, London (1994) 37. Nielsen, J.: Durability of Usability Guidelines. Jakob Nielsen’s Alertbox (January 17, 2005), http://www.useit.com/alertbox/20050117.html (retrieved January 27, 2009) 38. Nielsen, J.: Cost of User Testing a Website. Alertbox (May 3, 1998), http://www.useit.com/alertbox/980503.html (retrieved January 24, 2009) 39. Nielsen, J., Molich, R.: Heuristic evaluation of user interfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 1990, pp. 249–256. ACM Press, New York (1990)
40. Polson, P.G., Lewis, C., Rieman, J., Wharton, C.: Cognitive Walkthroughs: A Method for Theory-Based Evaluation of User Interfaces. International Journal of Man-Machine Studies 36, 741–773 (1992) 41. Poltrock, S.E., Grudin, J.: Organizational Obstacles to Interface Design and Development: Two Participant-Observer Studies. ACM Trans. Comput.-Hum. Interact 1(1), 52–80 (1994) 42. Ryu, Y.S.: Development of Usability Questionnaires for Electronic Mobile Products and Decision Making Methods, Doctoral dissertation, State University, Blacksburg, VA, USA (2005) 43. Sauro, J., Kindlund, E.: A Method to Standardize Usability Metrics into a Single Score. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems CHI 2005, pp. 401–409. ACM, New York (2005) 44. Scholtz, J.: Usability Evaluation. National Institute of Standards and Technology (2006), http://www.itl.nist.gov/iad/IApapers/2004/ Usability%20Evaluation_rev1.pdf (retrieved January 24, 2009) 45. Stickel, C., Scerbakov, A., Kaufmann, T., Ebner, M.: Usability Metrics of Time and Stress - Biological Enhanced Performance Test of a University Wide Learning Management System. In: Holzinger, A. (ed.) Proceedings of the 4th Symposium of the Workgroup HumanComputer interaction and Usability Engineering of the Austrian Computer Society on HCI and Usability For Education and Work. Lecture Notes In Computer Science, vol. 5298, pp. 173–184. Springer, Heidelberg (2008) 46. Teo, L., John, B.E.: Towards Predicting User Interaction with CogTool-Explorer. In: Proceedings of the Human Factors and Ergonomics Society 52nd Annual Meeting, pp. 950– 954 (2008) 47. Teo, L., John, B.E., Pirolli, P.: Towards a Tool for Predicting User Exploration. In: CHI 2007 Extended Abstracts on Human Factors in Computing Systems, CHI 2007, pp. 2687– 2692. ACM, New York (2007) 48. Vredenburg, K., Mao, J., Smith, P.W., Carey, T.: A Survey of User-Centered Design Practice. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 471–478. ACM, New York (2002) 49. Wixon, D.: Evaluating Usability Methods: Why The Current Literature Fails the Practitioner. Interactions 10(4), 28–34 (2003) 50. Zaharias, P.A.: Usability Evaluation Method for E-Learning: Focus on Motivation to Learn. In: CHI 2006 Extended Abstracts on Human Factors in Computing Systems, pp. 1571–1576. ACM, New York (2006)
Web-Based System Development for Usability Evaluation of Ubiquitous Computing Device
Jong Kyu Choi, Han Joon Kim, Beom Suk Jin, and Yonggu Ji
Dept. of Information and Industrial Engineering, Yonsei University, Seoul, Korea
{jk.choi,khjoon,kbf2514jin,yongguji}@yonsei.ac.kr
Abstract. Recently, with the development of electronic technology, information technology (IT) devices that satisfy user requirements, such as PMPs (Portable Multimedia Players), PDAs (Personal Digital Assistants), UMPCs (Ultra Mobile Personal Computers) and mobile phones, have been developed. These devices are making wireless and network communication more accessible and, under the ubiquitous paradigm, provide access to information everywhere. With the appearance of these devices and the development of the underlying technology, IT devices are increasingly integrated and converged. As a result, there are significant changes in the purposes and environments of IT device use: devices are used not only in a stationary state but also in motion, which has a strong influence on usability. A new methodology is therefore required to evaluate the usability of these devices. In previous studies, by gathering and integrating usability factors and ubiquitous characteristics, Ubiquitous Evaluation Factors were obtained. The device was then deconstructed so that each part could be evaluated separately; through this process, the components of ubiquitous devices could be extracted. From the usability evaluation, an evaluation score for each ubiquitous device component and a score for each usability factor can be obtained. This evaluation framework was developed as a Web-based system so that users can perform the usability evaluation regardless of their location. The system was developed on the Windows Server 2003 Enterprise Edition platform, with IIS (Internet Information Server) 6.0 as the Web server and MS-SQL 2000 as the database server. ASP (Active Server Pages), which runs in IIS, was used as the development language. This study is meaningful in that, through a Web-based system, many different people can easily access the evaluation, and in that a portion of the device as well as the entire device can be evaluated.
Keywords: Ubiquitous computing device, usability, web-based system, system development.
more accessible and providing the possibility of access to information everywhere [1]. Therefore, the development of these IT devices and the improvement of technology together require integration and convergence [2]. The environments in which IT devices are used and the purposes for which they are used are changing; for instance, IT devices are used not only in a stationary state but also in motion, which influences the usability of the device. The distinguishing characteristic of ubiquitous computing is that it is a communication system [3] that allows users to obtain the required information in any place. Therefore, previous usability evaluation tools need to be improved to take into consideration new user environments and ubiquitous computing [1]. From previous studies, we selected and integrated usability factors and ubiquitous characteristic factors, developing new Ubiquitous Evaluation Factors. Each part of the ubiquitous device was deconstructed for usability evaluation, so that the separate components of the device could be obtained. Through the usability evaluation, an evaluation score and a score for each usability factor of a ubiquitous device component can be obtained. In this study, the evaluation framework was developed as a Web-based system, allowing users to perform a usability evaluation anywhere.
2 Background
2.1 Ubiquitous Computing Device
The development of ubiquitous computing technology and the convergence of mobile information devices provide information to the user everywhere, at every moment, with any device. The basis of ubiquitous computing is to provide service at the request of a user and to grasp the user's intention and situation, with one service system actively supporting another; that is to say, the ubiquitous computing service. A ubiquitous computing device is a device for the ubiquitous service that allows a user to interact with the service anywhere, at any time, and that grasps the user's intention and situation in order to support the user. Ubiquitous devices acquire information without people being aware of it, through embedded, pervasive, portable and mobile functions, thereby realizing the ubiquitous environment [5].
Table 1. Characteristics of ubiquitous devices
Pervasiveness, Ubiquity, Diversification, Portability, Interconnectivity
2.2 Previous Ubiquitous Computing Research
By understanding the ubiquitous computing user's intention and exploiting the characteristics of the user's environment, the interaction with the user can be reflected in the system [4]. In this respect the Context-Aware Computing model [6]
and the ubiquitous computing model are similar in many ways, especially in their focus on the context-of-use of mobile devices [8,9]. J. Scholtz and S. Consolvo [10] presented a framework (UEA, Ubiquitous computing Evaluation Area) to evaluate ubiquitous computing applications; the evaluation areas for ubiquitous computing were attention, conceptual model and appeal, each with conceptual measures and metrics. In relation to Context-Aware Computing, Nigel and Miles [11] presented the idea that to calculate usability with confidence, it is necessary to evaluate a representative environment, user and task; it is thus essential to have a deep understanding of the context of use of the product.
2.3 Limitation
J. Scholtz [10] defined the aspects important to ubiquitous computing as "areas," categorized them, and presented conceptual measurement variables through systematic analysis. However, this work focused mainly on ubiquitous services rather than on device usability evaluation, so the user's task was insufficiently considered. Moreover, as seen in Nigel's studies [12,13], most context-aware computing studies present only information on the diverse types of context, lacking a concrete connection with usability principles.
3 Framework
The ubiquitous device usability evaluation framework was established from previous studies [14]. New suggestions for usability evaluation were proposed after modifying the context deconstruction. Figure 1 shows how the main user and main task are selected from the user information of the device. In this way, the device context information is specified, which is later used in an evaluation checklist. Considering the characteristics of each device makes further ubiquitous device usability evaluations possible.
Fig. 1. Evaluation Framework
Fig. 2. Generating Evaluation Factors
3.1 Evaluation Factors
Figure 2 shows how the usability factors for the evaluation framework were extracted from the properties of the ubiquitous computing environment. In previous studies, the basic properties for usability evaluation were efficiency, effectiveness and satisfaction, which we call the 'General Evaluation Factors.' The General Evaluation Factors are considered suitable for evaluating devices in general, but they are not specific to ubiquitous computing devices. Proposing factors for ubiquitous devices requires drawing on work on pervasive computing quality and ubiquitous computing quality tools.

Table 2. Ubiquitous Device Evaluation Factors

Adaptability: adaptable or easily adjusted to changes in context
Controllability: able to control the device in any circumstances
Interconnectivity: interconnected network among devices, allowing sharing of information
Mobility: the device can move with the user, who carries it along
Predictability: from past experience, the result of a system execution can be predicted
Simplicity: the user interface and instructions are simple
Transparency: provides the current status of the system, including when it is running an execution
Table 2 shows the factors assorted and integrated from related studies on ubiquitous services and ubiquitous software: Adaptability, Controllability, Interconnectivity, Mobility, Predictability, Simplicity and Transparency. These are the ubiquitous-device-related factors. The usability evaluation factors for devices are organized as: (visual) Clarity, Accessibility, Affect, Compatibility, Consistency, Effectiveness, Efficiency, Error prevention, Feedback, Forgiveness, Helpfulness, Learnability, Memorability, Multi-threading, Responsiveness, Safety and User tailorability.
3.2 Evaluation Area
Figure 3 shows an elementary deconstruction of a device for evaluation. The usability evaluation is implemented on each device component so as to obtain the degree of usability
(high or low) for each factor. The developed factors can be applied to evaluate each device component: LUI (Logical User Interface), GUI (Graphical User Interface) and PUI (Physical User Interface), respectively. By making this separation, each device component can be evaluated individually.
Fig. 3. Evaluation area
The LUI is divided into application software, menu structure and contents, system awareness and system acceptance. The GUI is divided into indicator, icon and menu. In the H/W area, the device hardware is separated into body and screen, while the PUI is separated into control keys and touch screen. Touch screens must be further subdivided by input method, since it is risky to evaluate them with the same factors and checklists as standard controls. Consequently, when evaluating devices with a touch screen, the touch-screen PUI evaluation is performed; if there is no touch screen, that evaluation area is not taken into account.
3.3 Context of Use
Figure 1 shows the context information, solidified as user type, device type, task type and use type. Use type is information about the environment and condition (situation) in which the user is using the device. Each type of context information has significance for the evaluation target, such as information access and entertainment systems. User type is divided into novice and expert, while device type is divided into PMP, music player, PDA, UMPC, smartphone and game device. Through expert evaluation, each evaluation factor and checklist item was rated for different contexts, yielding relative importances for the usability evaluation. Each evaluation factor thus has its own weight, which changes the importance of each checklist item depending on the device and context information.
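As a hedged illustration of this weighting scheme (the paper does not give its actual formula), the sketch below combines per-factor checklist scores with context-dependent weights. All names and numbers are invented for the example.

```java
// Illustrative weighted scoring: mean checklist scores per factor are
// combined using weights chosen for a given device/context.
import java.util.Map;

public class WeightedScore {
    public static void main(String[] args) {
        // mean checklist score per factor (scale 0-100); invented values
        Map<String, Double> factorScore = Map.of(
            "Mobility", 80.0, "Simplicity", 65.0, "Transparency", 90.0);
        // context-dependent weights, normalized to sum to 1; invented values
        Map<String, Double> weight = Map.of(
            "Mobility", 0.5, "Simplicity", 0.3, "Transparency", 0.2);

        double total = factorScore.entrySet().stream()
            .mapToDouble(e -> e.getValue() * weight.get(e.getKey()))
            .sum();
        System.out.printf("overall usability score: %.1f / 100%n", total);
    }
}
```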
4 System Development
4.1 System Structure
In this study, the system was developed as a Web-based system so that users can perform an evaluation regardless of their location. The system is composed of a client, a Web server and a database server. The client sends requests to the Web server through a browser connected to the Internet. The Web server then sends a Web page to the client
and provides the data that the client requested from the database server. The database server receives queries from the Web server for the data the user wants, carries out the work, and finally returns the results to the Web server. As shown in Figure 4, the system was developed using Windows Server 2003 Enterprise Edition, IIS (Internet Information Server) 6.0 as the Web server and MS-SQL 2000 as the database server. ASP (Active Server Pages) was used as the development language.
Fig. 4. System structure
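The request flow can be illustrated with a minimal sketch. Python and sqlite3 stand in here for the ASP and MS-SQL stack actually used, and the table and column names are illustrative assumptions, not the system's real schema.

```python
# Illustrative three-tier flow: browser -> Web server -> database server.
# sqlite3 stands in for the MS-SQL 2000 server used in the actual system.
import sqlite3

def handle_request(db, device_type):
    """Web-server role: forward the client's request to the database tier
    and return the rows to be rendered into a Web page."""
    cur = db.execute(
        "SELECT checklist_item, score FROM evaluations WHERE device_type = ?",
        (device_type,),
    )
    return cur.fetchall()  # sent back to the client as part of the page

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE evaluations (device_type TEXT, checklist_item TEXT, score REAL)")
db.execute("INSERT INTO evaluations VALUES ('PMP', 'Menu navigation', 78.5)")
print(handle_request(db, "PMP"))  # [('Menu navigation', 78.5)]
```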
4.2 Evaluation Procedure
The first step shows a Web page where the information for each context type has to be entered. The context-type information saved in the database is recalled and displayed. In this step, the device to be evaluated is selected, and the user is identified as a novice or an expert with the device under User Type. Under Device Type, a selection is made between PMP, MP3 and PDA types. Task type is divided into video playing, music playing, information reading and games, and use type depends on whether the device is used in a wearable or portable form. The data are then entered in the form of a questionnaire, which describes each context type so the user can understand it easily, and which also offers a 'not considering' option for context types the user does not wish to evaluate.

In the second step, after the context information has been selected, it is stored in the server session and a Web page appears showing the evaluation areas that can be selected. This page recalls the information saved in the database and displays it. To allow more than one area to be selected, the user checks a box for each area he or she wishes to evaluate; to help the user understand what is being evaluated, a description of each area is provided.

In the third step, the information about the areas to be evaluated is saved in the server session, and the corresponding checklist for each area is retrieved from the database and shown. In the upper part of the checklist page, information about the area being evaluated is displayed. Each area is displayed on a separate page to avoid confusion and disorder.

In the last step, the user's checklist selections, the context-type information saved in the server session in previous steps, and the information about the
evaluation area are saved. From the saved data about a given device, it is possible to obtain its average evaluation score. After the data have been saved in the database, the result is shown on a page with an eight-column graph. The results of the evaluation of ubiquitous characteristics are shown in a graph indicating the score of each ubiquitous factor, and the results of the evaluation of general characteristics are shown in a similar graph. Moreover, by providing a graph for each factor on a 100-point scale, insufficient areas can be seen more clearly. The result for each device evaluation area (LUI, GUI, PUI, Device H/W) is likewise represented in a graph on a 100-point scale so as to show the areas that have to be improved.
Fig. 5. Evaluation system
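A minimal sketch of how the per-factor averages behind these graphs might be derived from the saved evaluations is shown below; the record layout and function name are hypothetical, not the system's actual schema.

```python
# Hypothetical aggregation of saved evaluation records into per-factor
# averages on a 100-point scale, as displayed in the result graphs.
from collections import defaultdict

def average_by_factor(records):
    """records: iterable of (factor, score) rows loaded from the database."""
    totals = defaultdict(lambda: [0.0, 0])
    for factor, score in records:
        totals[factor][0] += score
        totals[factor][1] += 1
    return {factor: s / n for factor, (s, n) in totals.items()}

rows = [("Mobility", 70), ("Mobility", 80), ("Simplicity", 95)]
print(average_by_factor(rows))  # {'Mobility': 75.0, 'Simplicity': 95.0}
```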
5 Conclusion and Further Study
This study developed a Web-based system implementing a framework to evaluate the usability of ubiquitous computing devices. Three aspects are important. First, because the system is Web-based, the user can evaluate anywhere the Internet is accessible; it is also more convenient, as the user can see the results immediately. Second, the user is able to select the areas he or she wishes to evaluate, so either a complete or a partial evaluation of the selected areas is possible. Third, as the system uses a database, the evaluated data can be saved; through these saved data, it is possible to see an average over previous and other users' evaluations. However, the system has so far been implemented for only a small number of devices, not for every type of device. In further studies it is therefore necessary to increase the validity of the system by evaluating a more diverse range of devices. Once its validity has been established, it will be possible to update the system.
Evaluating Mobile Usability: The Role of Fidelity in Full-Scale Laboratory Simulations with Mobile ICT for Hospitals Yngve Dahl1, Ole Andreas Alsos2, and Dag Svanæs2 1
Telenor Research & Innovation, Otto Nielsensvei 12, 7004 Trondheim, Norway [email protected] 2 Department of Computer and Information Science, Norwegian University of Science and Technology, Sem Sælandsvei 7-9, 7491 Trondheim, Norway {Ole.Andreas.Alsos,Dag.Svanes}@idi.ntnu.no
Abstract. We have applied full-scale simulations to evaluate the usability of mobile ICT for hospitals in a realistic but controllable research setting. Designing cost-effective and targeted simulations for such a purpose raises the issue of simulation fidelity. Evaluators need to identify which aspects of the research setting should appear realistic to simulation participants, and which aspects can be removed or represented more abstractly. Drawing on research on training simulations, this paper discusses three interrelated fidelity components: equipment/prototype fidelity, environmental fidelity, and psychological fidelity. These components need to be adjusted according to the design aspects on which evaluators want to gather feedback. We present examples of how we have configured the components in various simulation-based usability assessments of mobile ICT for hospitals. The paper concludes by providing a set of guiding principles concerning the role of fidelity in simulation-based usability evaluations.

Keywords: Clinical information systems, fidelity, evaluation, human factors, mobility, simulation, training simulation, usability, user-centered design.
1 Introduction
As human-computer interaction moves "beyond the desktop" and into highly dynamic work settings, such as hospitals, the old standards for usability testing arguably no longer hold water. Work situations in hospitals often involve mobility and bodily work [2, 3]. This makes the usability of mobile ICT more subject to external factors that are not related to the GUI and the software being evaluated as such. These factors fall beyond what evaluations conducted in conventional usability laboratories can reveal.

We have attempted to meet these challenges by means of full-scale laboratory simulations: simulated, natural-like hospital environments in which nurses and physicians act out clinical scenarios using mock-ups or prototype systems. Such an approach raises the need for evaluators to think about the fidelity, or level of detail, of the research setting. A critical issue for applying simulations as a cost-effective usability evaluation methodology is deciding on the right level of fidelity.

This paper aims first to show that fidelity in usability evaluations of mobile ICT is a concept that extends beyond the software prototype being evaluated. Particularly when addressing hospital settings, where the technology is likely to be used as part of work activities requiring manual labor with hands and feet in addition to high situational awareness, the physical environment and the work tasks become vital components of the total system being simulated. The second objective of this paper is to demonstrate how the test environment, the prototype, and the test scenario can be adjusted to achieve targeted evaluations and to induce behavior among participants that is desirable in terms of informing specific design aspects relevant to the usability of mobile systems in hospitals.
2 Methodological Motivation
The work conditions under which ICT supporting clinical care is used are very different from those of office settings and desktop-based computer interaction. The call for mobile ICT in hospitals essentially stems from the distributed nature of clinical work and from rapidly changing work situations. In order to conduct valid usability assessments of mobile interactive technology for such environments, the design solutions must be evaluated in a relevant use context [4]. Consequently, aspects that are characteristic of clinical work, such as mobility, clinician-patient interaction, and frequent context shifts, must be reflected in the setting in which the evaluation takes place.

Conducting usability evaluations in actual clinical situations is challenging. Firstly, the hospital is a high-risk environment, in which it can be critical to avoid affecting ongoing work. Secondly, patient information confidentiality is likely to prevent video and audio recording. At the same time, conventional laboratories intended for controlled desktop-based usability evaluations are unsuited for reconstructing the rapidly changing conditions of hospital work. The combined need for a realistic yet controllable research setting has motivated us to build a customized full-scale model of a hospital ward section, with advanced video and audio recording facilities.
3 Simulation Fidelity Principles
HCI literature in general provides little practical guidance on how to compose full-scale simulations with the aim of evaluating the usability of mobile ICT. This has motivated us to study the literature on training simulations in search of guiding principles. Given the relatively long history of training simulators, we will first highlight some of the central principles described in the related research literature. Next, we will provide a brief overview of studies within the field of mobile HCI where simulations have been employed to evaluate prototypes and early concepts.

3.1 Simulations Applied for Training Purposes
Within high-risk industries such as aviation, naval shipping, health care, and nuclear power production, there has been a long tradition of using simulations for training purposes in risk-free environments. Obviously, the objectives of training simulations and of simulations for usability purposes differ. Simulations applied in the context of usability assessment aim at gathering data about the effectiveness, efficiency, and user satisfaction of a product used by specific users in a realistic situation; the focus is on product performance. Training simulations, on the other hand, are typically used for educational purposes and for maintaining or enhancing human work-related skills. In this sense, they are human-centric rather than product-centric. Despite these differences in focus, many of the concepts developed from research on training simulations are highly relevant when designing simulation-based usability evaluations.

Among the most central concepts described in the research literature on training simulations are equipment fidelity, environment fidelity, and psychological fidelity [5]. Equipment fidelity refers to the extent to which the appearance and feel of the real tools, devices, or systems that simulation participants operate are duplicated. For example, aircraft cockpit procedures have been trained both with hi-fi representations of aircraft instruments and with lo-fi mock-ups [6]. Environment fidelity concerns the extent to which physical characteristics of the real-world environment (beyond the training devices) are realistically represented in the simulation. High-fidelity aircraft simulators are full-size replicas of cockpits that duplicate the operational aircraft environment and its motions in great detail [7]. In flight training environments of lower fidelity (e.g., desktop environments), visual and motion cues are typically reduced or lacking. Lastly, psychological fidelity relates to the realism of the simulation as perceived by its participants; in other words, it is the extent to which participants are able to engage in the simulated situation as they would have done in the natural setting. This is intimately dependent on the psychological demands the simulated tasks place on the participants. Human perception, attention, decision-making, memory, and action are factors that may influence psychological fidelity [8, p. 420]. Developing scenarios that replicate the task demands of the real-world system is a common technique for enhancing psychological fidelity [5].

In training simulations, psychological fidelity is often considered the most important, because it is the attribute most relevant for learning. Prophet and Boyd [6] found that the transfer of training was equal for students practicing ground cockpit procedures in real airplanes and students practicing the same routines on lo-fi representations of the relevant devices.
Each of the three components described above can be set along a continuum ranging from low to high fidelity (Fig. 1). As pointed out by Beaubien and Baker [5], the level of fidelity for each component typically depends on the purpose of the training simulation. For example, low-fidelity role-plays have been used to train teamwork-related attitudes and skills, while simulations of higher fidelity are required if the goal is to learn the specific consequences of actions. Taken together, equipment, environment, and psychological fidelity form the overall simulation fidelity (Fig. 1).
Fig. 1. Three interrelated simulation fidelity components. Each component can be set along a continuum ranging from low to high fidelity.
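One way to make this three-component view concrete is the small configuration sketch below. The three components and their low-high continua come from the text; the numeric encoding and the class and field names are our illustrative assumptions.

```python
# Sketch: a simulation configuration with three fidelity components, each
# set along a continuum encoded here as 0.0 (low) to 1.0 (high).
from dataclasses import dataclass

@dataclass
class SimulationFidelity:
    prototype: float      # equipment/prototype fidelity
    environment: float    # realism of the physical use setting
    psychological: float  # perceived realism of tasks and situation

    def validate(self) -> None:
        for name, value in vars(self).items():
            assert 0.0 <= value <= 1.0, f"{name} must lie on the low-high continuum"

# E.g., the mixed-fidelity study in Sect. 4.3: hi-fi sensors and a realistic
# ward model, but a deliberately lo-fi GUI (the values are illustrative).
config = SimulationFidelity(prototype=0.5, environment=0.9, psychological=0.8)
config.validate()
```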
3.2 Mapping the Simulation Fidelity Concepts to HCI Terminology
In the context of HCI and usability assessment, we consider equipment fidelity to be the equivalent of computer system or application fidelity (we will refer to this as prototype fidelity). As previously noted, this component encompasses physical appearance, interaction style, and functionality. Environmental fidelity is the realism of the physical use setting (i.e., the point of interaction), while psychological fidelity corresponds to the user-perceived realism of the tasks participants are given as part of a test.

3.3 Simulations Applied in Mobile HCI
The use of simulations in mobile HCI is in many ways a result of the recognition that conventional usability laboratories and testing do not duplicate the factors affecting mobile usability [9, 10]. It can also be considered an efficient approach to overcoming many of the challenges related to studying mobile systems in the field [11]. Examples of usability studies in which contextual features have been simulated in laboratory settings are described in work by Bohnenberger et al. [12], Pirhonen et al. [13], and Kjeldskov and Skov [14]. Some simulation-based usability studies of relevance to mobile ICT in hospital settings can be found in Refs. [4, 15, 16].
4 Applying the Fidelity Principles in Practice In the current section we will provide examples from conducted simulations, showing how we have attempted to carefully adjust simulation fidelity to promote reflection
among participants regarding specific design aspects. Our examples also highlight the close interrelationship between the various simulation fidelity components presented earlier.

4.1 Case Study: Point-of-Care Scenarios
There are many hospital scenarios in which mobile ICT may prove helpful [2]. We have concentrated on a limited set of situations that have been found appropriate for usability testing in our full-scale hospital ward model. In particular, we have focused on situations where nurses and physicians are located at the patient bedside, i.e., at the point of care. Examples of hospital routines where such situations occur include ward rounds, administration of medicine, and responses to patient calls. These situations form suitable test candidates for mobile ICT because they occur frequently in hospital wards, require mobility, and involve personnel who need quick and effortless access to patient-related information.

4.2 Environment Fidelity
As previously pointed out, the circumstances under which mobile ICT supporting hospital work is used are radically different from those of office work and conventional desktop computer interaction. Human-computer interaction in clinical settings is more physical in nature, both in the sense that hospital workers use devices while on the move and in situations requiring physical interaction with patients (e.g., assistance and examination). We have observed that to capture the physical and bodily usability aspects of mobile ICT used at the point of care, prototype solutions must be evaluated in research environments that closely mimic the physical environment of real patient rooms. For example, realistically proportioned rooms furnished with patient beds make it possible for participants to move naturally in the model, both around and between patient beds. For simulations addressing the usability of point-of-care systems this is essential, because it can help give participants a realistic impression of how well different solutions are physically adapted to the care situations. Examples of physical design factors which, according to simulation participants, can increase the usability of mobile ICT at the point of care include the possibility of easily sharing screen content with patients (Fig. 2, left) and the opportunity to have digital media ready at hand, to be used and put aside depending on what the immediate care situation calls for (Fig. 2, right). Both examples illustrate the intimate relationship between the physical environment and the physical placement and form factors of interactive devices.
Fig. 2. Examples showing the close relationship between the physical environment and the physical placement (left) and form factors (right) of digital media
There are often subtle details that need to be in place to capture physical and bodily usability factors. For example, to form a realistic impression of how well an interaction design solution accommodates the dialogue between clinicians and in-bed patients, patients need to be represented by human actors in real hospital beds (Fig. 2, left). Likewise, simulation participants need to wear their daily work uniforms so that they can bring forth or temporarily put aside digital media (Fig. 2, right).

4.3 Prototype Fidelity
The Spectrum of Prototype Fidelity. Prototypes can generally be divided into low, medium, or high fidelity. In low-fidelity prototyping, props (e.g., foam and cardboard models, paper and post-it notes, etc.) are often used as physical representations of interactive devices, with rough sketches of envisioned graphical interfaces. Medium-fidelity prototypes are functional (computer-based) models of systems; they generally have simplified GUIs, but little or no functionality behind the GUI elements. High-fidelity prototypes are sophisticated and functional versions of envisioned designs, and may show sample information content.

Prototype Fidelity for Point-of-Care Scenarios. In our studies of mobile ICT applied in hospital settings we have used prototypes of different fidelities, depending on the design phase. As part of specifying user requirements for mobile ICT for point-of-care usage, we have conducted simulations using lo-fi props in realistic models of patient rooms [17]. The main rationale for this is to put the focus on the context in which the technology will be used, rather than on details concerning GUIs and software functionality. By applying mock-ups one can avoid restricting reflection among participants by committing to particular hardware and interfaces. In simulations with functional prototypes, we have attempted to be more particular about the design aspects we wanted to address. For these simulations we have applied mixed-fidelity prototypes [18], i.e., models that combine different fidelities with regard to GUIs, functionality, interaction styles, and information content. For example, in simulations addressing the usability of location-based access to medical information at the point of care (the full study is described in Ref. [16]), we found it useful
Fig. 3. Mixed-fidelity prototype providing location-based access to a patient's medical record. High-fidelity sensors and radio tags detect the physical position of a test participant and nearby patients. The graphical representation of the medical record, which is automatically retrieved and presented on the bedside terminal, is of low fidelity.
to implement the interaction techniques using high-fidelity hardware and sensors, to give participants a realistic impression of actual use. At the same time, we deliberately kept the detail of the GUI at a low level and avoided linking the prototype to realistic medical sample data. The main rationale for this delimitation is that we primarily wanted to help participants focus on, and give feedback regarding, the usability of the interaction styles rather than GUI-related aspects of the design. Fig. 3 shows the mixed-fidelity prototype that we applied in the example described above.

4.4 Psychological Fidelity
This section gives a brief summary of different techniques we have employed to increase the psychological fidelity of our simulations.

Domain Expertise. As pointed out above, developing scenarios that mimic the task demands of the real-world system can increase psychological fidelity. To facilitate this, and to make sure that the simulations reflected a sufficient degree of realism, the scenarios were designed with assistance from domain experts (physicians and nurses).

Baseline Scenarios. To help participants relate the prototypes and evaluated concepts to their everyday work, we have learned that running an introductory baseline scenario reflecting current paper-based practices is useful (Fig. 4). These "as-is" scenarios can effectively act as a reference or benchmark for the participants when they later act out the same scenario using functional prototypes. This helps participants "make sense" of scenarios involving information media they are not familiar with.
Fig. 4. Baseline scenarios reflecting current paper-based practices (left) can act as references for test subjects when they later try out digital solutions for the same purposes (right)
Targeted Simulations. Because the conducted simulations have mainly focused on evaluating early concepts and partial prototypes, we have tried to tailor the scenarios to promote feedback on particular design aspects. In some cases, this has resulted in delimitations that, depending on the purpose of the simulation, might reduce psychological fidelity. Using only mock-up representations of electronic medical records in some of the simulations, as explained above, is an example of such a limitation of scope. These, however, we consider necessary compromises to achieve targeted evaluations. Feedback from the participants indicated that they found the simulations realistic in spite of such simplifications.

Scripted Simulations. We have also experimented with ways to increase the participants' perceived realism of the simulations by trying to integrate their professional experience. One technique we have applied is to use patient actors instructed to reveal
certain pieces of information during the scenario, but to leave it to the participants (i.e., clinicians) to decide how to act on that information [19]. For other types of simulations we have found it sufficient to give the patient actors a more passive role, e.g., acting as physical markers in the simulations.
5 Setting the Scene Right
A fundamental dilemma related to simulations, whether applied for usability assessment or training purposes, is specifying a sufficient fidelity level a priori. Typically, approximating mobile use contexts involves trade-offs between control over the research setting, realism, and available resources [4]. In the current section we present and discuss some guiding principles for "setting the scene right" in simulation-based usability evaluations of mobile ICT for hospitals.

5.1 Psychological Fidelity First
In Sect. 3.1 we pointed out that psychological fidelity is often considered the most significant fidelity component in training simulations, because it governs the transfer of skills learnt in the simulated setting back into the real world. Based on our experiments, we argue that the same component is also the most critical when designing full-scale simulations for usability assessment purposes, but for different reasons. We see psychological fidelity first and foremost as a key premise for provoking reflection among simulation participants. This is especially valuable in early phases of the design process, before critical design choices are made, and when end-user feedback is most likely to inform the actual design. In order to motivate reflection among simulation participants, it is essential that they see the on-the-job relevance of the simulation. As discussed in Sect. 4, a realistic test environment and prototypes with certain functional features can help evoke the central psychological mechanisms triggered in everyday clinical work.

5.2 How Much Fidelity Is Enough?
In our full-scale simulations we have attempted to follow a "just enough" principle with regard to the amount of realism reflected: duplicating the aspects that we, along with domain experts, consider most likely to affect the perceived usability of the prototypes to be tested. This, as pointed out earlier, depends closely on the objective of the simulation. Because the usability of mobile ICT for hospitals is highly contextual, there is no "one size fits all" approach to the fidelity of simulation-based usability evaluations.
6 Summary and Conclusions
In this paper we have investigated the role of fidelity in full-scale laboratory simulations used for usability assessment of mobile ICT for hospitals. Drawing on training simulation research, we identified three components of relevance: environment fidelity, equipment or prototype fidelity, and psychological fidelity. We have shown
by examples from practical simulations how the different components can be modified to match the focus of different evaluations. The key principles the current paper has suggested regarding the fidelity of simulation-based usability assessments of mobile ICT for hospitals are as follows:

• Simulations need to be specific about the design aspects that are being evaluated. Cost-effective and targeted simulations should replicate the features of the mobile ICT solution, physical environment, and work tasks that are considered relevant for the design aspects one wants to gather feedback on.
• There is no direct correlation between the overall fidelity of experimental simulations and their effectiveness in terms of informing design. Realistic prototypes and work environments may enhance the perceived realism of the simulation, but this does not guarantee that feedback from participants is valuable in terms of informing design.
• Simulations for usability assessment purposes should prioritize psychological fidelity. This component is the most relevant for provoking reflection on design among simulation participants. To provoke such reflections it is essential that participants are able to relate the design concept to their everyday work.
• The requirements on simulation fidelity will typically increase as the mobile ICT solution is developed.

We expect that future simulations in our full-scale ward model will enable us to further explore the role fidelity plays in simulation-based usability evaluations of mobile ICT for clinical work.
Acknowledgements
The current work has been supported in part by the Norwegian Research Council through grant 176761 (POCMAP) of the VerdIKT program, DIPS ASA, The Industrial Research Fund for NTNU, St. Olav University Hospital, Akershus University Hospital, NTNU, and Telenor R&I.
References 1. Rudd, J., Stern, K., Isensee, S.: Low vs. high-fidelity prototyping debate. Interactions 3, 76–85 (1996) 2. Sørby, I.D., Melby, L., Nytrø, Ø.: Characterising cooperation in the ward: framework for producing requirements to mobile electronic healthcare records. International Journal of Healthcare Technology and Management 7, 506–521 (2006) 3. Bardram, J.E., Bossen, C.: Mobility Work: The Spatial Dimension of Collaboration at a Hospital. Computer Supported Cooperative Work 14, 131–160 (2005) 4. Kjeldskov, J., Skov, M.B., Als, B.S., Høegh, R.T.: Is It Worth the Hassle? Exploring the Added Value of Evaluating the Usability of Context-Aware Mobile Systems in the Field. In: Mobile HCI 2004, pp. 61–73 (2004) 5. Beaubien, J.M., Baker, D.P.: The use of simulation for training teamwork skills in health care: how low can you go? Qual Saf Health Care 13 (2004)
6. Prophet, W.W., Boyd, H.A.: Device-Task Fidelity and Transfer of Training: Aircraft Cockpit Procedures Training. Tech. Report. Human Resources Research Organization, Alexandria, VA (1970)
7. Rehmann, A., Mitman, R., Reynolds, M.: A handbook of flight simulation fidelity requirements for human factors research. Tech. Report No. DOT/FAA/CT-TN95/46. Wright-Patterson AFB, OH: Crew Systems Ergonomics Information Analysis Center (1995)
8. Patrick, J.: Training. In: Tsang, P.S., Vidulich, M.A. (eds.) Principles and Practice of Aviation Psychology. CRC Press, Boca Raton (2002)
9. Johnson, P.: Usability and mobility; Interactions on the move. In: Proceedings of the First Workshop on Human-Computer Interaction with Mobile Devices (1998)
10. Graham, R., Carter, C.: Comparison of speech input and manual control of in-car devices while on-the-move. In: Mobile HCI 1999 (1999)
11. Pascoe, J., Ryan, N., Morse, D.: Using while moving: HCI issues in fieldwork environments. Transactions on Computer-Human Interaction 7, 417–437 (2000)
12. Bohnenberger, T., Jameson, A., Krüger, A., Butz, A.: Location-aware shopping assistance: Evaluation of a decision-theoretic approach. In: Paternò, F. (ed.) Mobile HCI 2002. LNCS, vol. 2411, pp. 155–169. Springer, Heidelberg (2002)
13. Pirhonen, A., Brewster, S., Holguin, C.: Gestural and Audio Metaphors as a Means of Control for Mobile Devices. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2002)
14. Kjeldskov, J., Skov, M.B.: Creating Realistic Laboratory Settings: Comparative Studies of Three Think-Aloud Usability Evaluations of a Mobile System. In: Interact 2003, pp. 663–670. IOS Press, Amsterdam (2003)
15. Alsos, O.A., Svanæs, D.: Interaction techniques for using handhelds and PCs together in a clinical setting. In: NordiCHI 2006, Oslo, Norway, pp. 125–134. ACM, New York (2006)
16. Dahl, Y., Svanæs, D.: A comparison of location and token-based interaction techniques for point-of-care access to medical information. Personal and Ubiquitous Computing 12, 459–478 (2008)
17. Svanæs, D., Seland, G.: Putting the users center stage: role playing and low-fi prototyping enable end users to design mobile systems. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Vienna, Austria. ACM, New York (2004)
18. Petrie, J.N., Schneider, K.A.: Mixed-fidelity prototyping of user interfaces. In: Doherty, G., Blandford, A. (eds.) DSVIS 2006. LNCS, vol. 4323, pp. 199–212. Springer, Heidelberg (2007)
19. Alsos, O.A., Dabelow, B.: Stylus, Finger, or Buttons? A Comparative Evaluation Study of Interaction Techniques for PDAs in Point-of-Care Situations (submitted manuscript)
A Multidimensional Approach for the Evaluation of Mobile Application User Interfaces José Eustáquio Rangel de Queiroz and Danilo de Sousa Ferreira Federal University of Campina Grande, Electrical Engineering and Computer Science Center – Computer Science Department, Av. Aprígio Veloso, s/n – Bodocongó, Campina Grande, CEP 58109-970, Paraíba, Brazil {rangel,danilo}@dsc.ufcg.edu.br, {rangeldequeiroz,danilo.sousa}@gmail.com
Abstract. This paper focuses on a hybrid approach for the evaluation of mobile application UIs, based upon a set of well-known techniques for usability evaluation. Two perspectives on the problem are considered: (i) the user's perspective, expressed by the user's perception of the application; and (ii) the specialist's perspective, expressed by his/her considerations from the point of view of the user-application interaction and of the HCI community as well. Comparisons between lab and field evaluation approaches are also given for a case study involving an Internet tablet. Conclusions are drawn concerning how to apply the experience acquired in evaluating conventional UIs to the mobile technology domain.

Keywords: Usability evaluation, mobile devices, multidimensional approach.
levels of choice and for a successful choice, it seems equally essential to know about the effectiveness of the chosen approach. Practitioners need to know which methods are more effective and in what ways and for what purposes. Otherwise the evaluation process may result in a big effort with a small payoff.
2 Usability Evaluation for Mobile Devices
Usability data typically consist of any kind of information that can be used as measures or identification keys for factors affecting the usability of a system. Such data are collected by usability evaluation methods and techniques that can assign values to usability dimensions for evaluating different kinds of UI [1] and/or indicate usability problems or other design deficiencies in a UI [2]. Usability data are usually gathered via either analytic or empirical methods [2][3]. Analytic or expert-based methods are often conducted by HCI experts and do not involve human participants performing the tasks, i.e., they rely on the specialists' judgment. Empirical or user-based methods, in spite of also being conducted by HCI experts, involve the collection of human usage data.

Usability diagnosis basically begins with raw observational data, often categorized into models/frameworks emphasizing either (i) the nature/fidelity of the artifact being evaluated; (ii) the context of use (involving user and social relations, tasks and psychological factors, and environmental aspects); (iii) the approach adopted for capturing the data (including the expended resources, the involved degree of formality and rigor, and the number of designers/evaluators and users); or (iv) the goal of the collection effort [4][5][6]. Some of those models and frameworks are aligned with the ISO usability dimensions [7], which are commonly taken to include efficiency, effectiveness, and subjective satisfaction in an HCI process.

It is undeniable that the usability evaluation effort for desktop systems has grown, especially in the last decade. In spite of debates still taking place within the HCI area, they are often based on a tacit understanding of basic concepts. Extensive guidelines have been written describing how usability evaluation in controlled environments should be conducted (e.g., [3][8]), and experimental results highlighting the pros and cons of different techniques are available (e.g., [9]).

Especially in the past decade, technological advances and methodological approaches in HCI have been challenged by the growing focus on applications for mobile computing devices. Several authors (e.g., [10][11]) argue that mobile computing demands not only real users but also a real or simulated context, with device interaction tasks as well as real tasks or realistic task simulations. The question of whether mobile device evaluation should be carried out in a lab or field context has also been discussed (e.g., [9][12]), the effectiveness of the approach depending on the relevance of the results presented and on the quality of the data analysis process. However, despite presenting data analysis results, the reports usually omit important details of the data gathering and analysis process; such details could guide choices and give a comprehensive view of the approach. While a strong effort of HCI research has been devoted to alternatives for data collection, data analysis/validation is presented only in rare cases (e.g., [3][12]). In consequence, the evaluator is unable to replicate the reported findings appropriately and successfully in other contexts. As for empirical data analysis, many methods and techniques have been employed for field testing data, video data, expert data, or
head-mounted video and cued recall [3][13][14]. In essence, the usual method triangulation seems to be field testing, with or without video analysis, and transcriptions of usability test sessions. The absence of in-depth usage data analysis seems to be due to the fact that it is often not applicable for industrial purposes because of several constraints. Nonetheless, for research purposes it is strongly recommended to provide sufficient detail to allow for replication.
3 The Multidimensional Evaluation Approach
The present approach was originally proposed for evaluating desktop application UIs [15], and further adapted to evaluate the usability of mobile application UIs [16]. It is based upon a hybrid strategy which encompasses the best features of: (i) standards inspection; (ii) user performance measurement; and (iii) user inquiry. It rests on the premises that (i) each evaluation technique provides a different level of information, which helps the evaluator identify usability problems from a specific point of view; and (ii) triangulation can be used to compare the data collected from the various techniques with the aim of producing complementary and more robust results.

3.1 Product Standard Conformity Assessment, User Performance Measurement and User Subjective Satisfaction Measurement
According to [7], conformity assessment means checking whether products, services, materials, processes, systems, and personnel measure up to the requirements of standards. For conformity assessment, the desktop version of the multidimensional approach adopts the standard ISO 9241 (Ergonomic Requirements for Office Work with Visual Display Terminals). In its mobile application UI evaluation version, and more specifically for the Internet tablet case study presented in this paper, it was found that only some parts of ISO 9241 could be applied: Part 14 [17], Part 16 [18], and Part 17 [19]. Some other standards applicable to this kind of device were also used, such as ISO/IEC 14754 [20] and ISO/IEC 24755 [21].

In general, user performance measurement aims to enable real-time monitoring of user activities, providing data on the effectiveness and efficiency of the user's interaction with a product. It also enables comparisons with similar products, or with previous versions of the same product along its development lifecycle, highlighting areas where the product usability can be improved. When combined with the other methods, it can provide a more comprehensive view of the usability of a system. The major change introduced into the original evaluation approach concerns the introduction of field tests as a complement to the original lab tests.

The measurement of user subjective satisfaction has been widely adopted as a measure of IS success, and has been the subject of a number of studies since the 1980s (e.g., [22][23]). User satisfaction diagnosis provides insight into the level of user satisfaction with the product, highlighting the relevance of the problems found and their impact on product acceptance. In this approach, user subjective satisfaction data are gathered by three methods: (i) automated questionnaires administered before and after test sessions; (ii) informal think-aloud trials performed during test sessions; and (iii) unstructured interviews conducted at the end of test sessions.
In essence, ISO defines usability as the extent to which a product can be used by specified users, in a specified context of use, to achieve specified goals with effectiveness, efficiency and satisfaction [7]. It also states that at least one indicator for each of these aspects should be measured to determine the level of usability achieved. As briefly exposed in this section, the multidimensional approach presented here meets the requirements set by ISO 9241-11 because it uses: (i) the task execution time as an efficiency indicator; (ii) the number of incorrect actions, the number of incorrect choices, the number of repeated errors, and the number of accesses to the online/printed help as effectiveness indicators; and (iii) the think-aloud comments, the unstructured interview responses, and the questionnaire scores as subjective satisfaction indicators.
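This grouping of indicators can be summarized in a small sketch; the dictionary form and the completeness check are our illustrative assumptions, while the indicator names come directly from the text.

```python
# The approach's indicators grouped by the three ISO 9241-11 usability aspects.
iso_9241_11_indicators = {
    "efficiency": ["task execution time"],
    "effectiveness": [
        "incorrect actions", "incorrect choices",
        "repeated errors", "accesses to online/printed help",
    ],
    "satisfaction": [
        "think-aloud comments", "unstructured interview responses",
        "questionnaire scores",
    ],
}

# ISO 9241-11 asks for at least one measured indicator per aspect:
assert all(len(v) >= 1 for v in iso_9241_11_indicators.values())
```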
4 Comparative Study of Lab versus Field Use of an Internet Tablet
The main objective of this study was to investigate the need for adapting the original evaluation approach to the context of mobile UI applications, based on an analysis of the influence of the context: lab versus field, mobile versus stationary interaction.

4.1 Experiment Design
The experiment was designed to investigate the influence of the context (field and lab, and related aspects, e.g., mobility and settings) and of the user experience on the evaluation results. Consequently, independent and dependent variables were chosen, and objective and subjective usability indicators were defined.

The independent variables chosen were: (i) Task context, which comprised external factors (e.g., noise level and light intensity) and internal factors (e.g., stress or other health conditions) that could affect the user behavior and performance during the usability test; (ii) User mobility, which referred to the conditions under which the task was being performed (e.g., moving between places or standing still while working with the device); and (iii) User experience level, which referred to the user's knowledge of mobile devices and desktop computers in general.

The dependent variables chosen were: (i) Task execution time (time taken by a device user to perform a task); (ii) Number of incorrect choices (number of times the user made incorrect choices while selecting menu options in the interface); (iii) Number of incorrect actions (number of times the user performed incorrect actions while selecting menu options in the interface, excluding incorrect menu choices); (iv) Number of repeated errors (number of times the same error was made by the user while performing a task, excluding the number of incorrect choices); (v) Number of accesses to the online/printed help (number of times the user accessed the online and/or printed help while performing a task); (vi) Perceived usefulness (user opinion about the usefulness of the mobile application for the prescribed task); and (vii) Perceived ease of use (user subjective satisfaction when using the mobile device).

The chosen objective usability indicators were: (i) Task execution time; (ii) Number of incorrect actions; (iii) Number of incorrect choices; (iv) Number of repeated errors; and (v) Number of accesses to the online help and/or printed manuals. Additionally, the chosen subjective usability indicators were: (i) Product ease of use; (ii)
Task completion easiness; (iii) Input mechanism ease of use; (iv) Text input modes ease of use; (v) Ease of understanding terms and labels; (vi) Ease of understanding messages; and (vii) Ease of use of help instructions.

4.2 Test Environment, Materials and Participants
In both realistic test environments, all the elements (e.g., tasks, informal think-aloud, unstructured interviews) were identical; only the test environment itself was different. The lab test was conducted in a typical usability lab, while the field test was conducted in an environment in which users could walk, stand still, sit or do whatever they would normally do while performing their tasks. To minimize moderator bias, the tests were conducted by three experienced usability practitioners with 3 to 12 years of experience in usability testing, and the instructions given to participants were predefined. All moderators participated in data gathering and data analysis. The statistical analysis was performed by one moderator and revised by another.

The mobile device chosen for this experiment was the Nokia 770 Internet Tablet with some of its native applications. Tests were performed both in a controlled environment (a usability lab) and in the field. In the lab, the interaction was recorded using three cameras installed in the room: one focused on the user's facial expressions, and a second, wider one focused on the table where the user performed the test tasks with the device either fixed to the table or free in his/her hand. In the field experiment, a micro-camera connected to a transmitter was coupled to the device to remotely record and transmit user-device interaction data to the lab through a wireless connection (see Fig. 1).
Fig. 1. Apparatus to support the video micro-camera
Additionally, remote screen capture software (VNC) was used to take screenshots of test sessions, and a Web tool named WebQuest [24] was used to support the user subjective satisfaction measurement. WebQuest supports the specialist during data collection, computes scores automatically, performs statistical analyses, and generates graphical results. Currently WebQuest supports two questionnaires: (i) a pre-test questionnaire, USer (User Sketcher), conceived to draw the profile of the system users; and (ii) a post-test questionnaire, USE (User Satisfaction Enquirer), conceived to assess the user's degree of satisfaction with the system.

The participants were divided into two groups of 20 for the field and lab tests. According to their experience levels, both groups were then subdivided into three subgroups, adopting a ratio of 8 beginners to 8 intermediates to 4 experts. For the lab
tests only, the 20 participants were subdivided again into two subgroups of 10, to perform the task script with the device fixed on the table or free in their hands. A ratio of 4 beginners to 4 intermediates to 2 experts was adopted for each subgroup.

4.3 Experimental Procedure
Observation and retrospective audio/video analysis were employed for quantitative and qualitative data. Participants were required to provide written consent to be filmed prior to, during and immediately after the test sessions, and to permit the use of their images/sound for research purposes without limitation or additional compensation. In turn, the evaluation team committed itself not to disclose user performance or other personal information.

Following the approach, the first step consisted in defining the evaluation scope for the product and designing a test task scenario, in which the target problems addressed were related to: (i) the shape/dimensions of the device; (ii) the mechanisms for information input/output; (iii) the processing power; (iv) the navigation between functions; and (v) the legibility of information. Since the test objectives focused on (i) investigating the target problems and (ii) detecting other problems affecting usability, a basic but representative set of test tasks was selected and implemented. The test tasks consisted of (i) initializing the device; (ii) searching for books in an online store; (iii) visualizing a PDF file; (iv) entering textual information; (v) using the e-mail client; and (vi) using the audio player.

After planning, two pilot tests (lab and field) were conducted to verify the adequacy of the experimental procedure, materials, and environment. To prevent user tiredness, the session time was limited to 60 minutes, and the test scenario was re-dimensioned to six tasks. Thus, each test session consisted of (i) introducing the user to the test environment by explaining the test purpose and procedure; (ii) applying the pre-test questionnaire (USer); (iii) performing the six-task script; (iv) applying the post-test questionnaire (USE); and (v) conducting an unstructured interview. For the participants who declared not having had any previous contact with the Internet tablet, an introductory explanation of the device's I/O modes and main resources was given, considering that, at the time of the experiment, the device was not yet widespread in Brazil.
5 Results
Conformity assessment results can be summarized by computing an Adherence Rating (AR), which is the percentage of the Applicable recommendations (Ar) that were Successfully adhered to (Sar) [17]. The results of the conformity assessment are summarized in Table 1. As can be observed, all the ARs, except the one related to ISO 14754, are higher than 75%, which means successful results. As for ISO 14754, the result indicates the need to improve text input via handwriting recognition. These results corroborate the idea that the efficacy of standards inspection can be considerably improved if it is based upon standards conceived specifically for mobile devices, which could reveal more usability problems.
Table 1. Nokia 770 conformity assessment with standards

STANDARD            #Sar   #Ar   AR (%)
ISO 9241 Part 14    45     53    84.9
ISO 9241 Part 16    26     33    78.8
ISO 9241 Part 17    47     52    90.4
ISO 14754           4      11    36.4
ISO 24755           6      7     85.7
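As a sanity check on Table 1, the AR definition above can be computed directly; the two-line helper below is ours, not part of the study's tooling.

```python
# Adherence Rating: percentage of applicable recommendations (Ar)
# that were successfully adhered to (Sar).
def adherence_rating(sar: int, ar: int) -> float:
    return 100.0 * sar / ar

print(round(adherence_rating(45, 53), 1))  # ISO 9241 Part 14 -> 84.9
print(round(adherence_rating(4, 11), 1))   # ISO 14754 -> 36.4
```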
As for the user subjective satisfaction measurement, both the questions and the answers of the post-test questionnaire (USE) were configured in advance. The questionnaire was applied soon after the usability test and answered on the mobile device itself, with the purpose of collecting information on the user's degree of satisfaction with the device by means of 38 questions about menu items, navigation cues, understandability of messages, ease of use of functions, I/O mechanisms, online help and printed manuals, the user's impression, and the product acceptance level.

With the support of the pre-test questionnaire (USer), the user sample profile was drawn. The sample was composed of 16 male and 24 female users, of whom 16 were undergraduate students, 17 were post-graduate students, 5 had a graduate degree and 2 had a post-graduate degree. Ages varied between 18 and 29 years. The users were mainly right-handed and used some sort of reading aid (glasses or lenses). All of them had at least one year of previous experience with computer systems, were currently using computers on a daily basis, and had previous experience with mobile devices.

The ranges for the USE normalized user satisfaction are 0.67 to 1.00 (Extremely satisfied), 0.33 to 0.66 (Very satisfied), 0.01 to 0.32 (Fairly satisfied), 0.00 (Neither satisfied nor unsatisfied), -0.01 to -0.32 (Fairly dissatisfied), -0.33 to -0.66 (Very dissatisfied), and -0.67 to -1.00 (Extremely dissatisfied). The normalized user satisfaction achieved was 0.330 (Very satisfied) for the lab experiment, and 0.237 (Fairly satisfied) for the field experiment.

During the test sessions, 23 usability problems were identified: 21 problems (91.3%) were detected in the lab experiment, while 14 (60.8%) were found in the field experiment. On the other hand, 12 problems (60.0%) were found with the device fixed on the table, while 15 (75.0%) were identified with the device free, in the user's hands. Since the multidimensional approach is based upon the triangulation of results, Table 2 summarizes the usability problem categories identified during the evaluation process. Some of the usability problem categories were more associated with the performance measurement (e.g., hardware aspects, help mechanisms), whereas others (e.g., menu navigation, presentation of menu options) were identified by the conformity assessment. Combining the results of the post-test questionnaire with the comments made during the test sessions and the unstructured interviews at the end of each session showed that the user opinion was sometimes in agreement (e.g., location and sequence of menu options) and sometimes in disagreement (e.g., menu navigation) with the results obtained from the other two evaluation techniques. This discrepancy can originate from the users' perception of product quality, and from the perception of their own skills to perform the task.
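The mapping from a normalized USE score to the satisfaction labels listed above is mechanical; the sketch below reproduces it, with the exact boundary handling being our assumption.

```python
# Map a normalized USE satisfaction score (-1.0 .. 1.0) to the labels above.
def satisfaction_label(score: float) -> str:
    bands = [(0.67, "Extremely"), (0.33, "Very"), (0.01, "Fairly")]
    for threshold, word in bands:
        if abs(score) >= threshold:
            return f"{word} {'satisfied' if score > 0 else 'dissatisfied'}"
    return "Neither satisfied nor unsatisfied"

print(satisfaction_label(0.330))  # lab experiment   -> Very satisfied
print(satisfaction_label(0.237))  # field experiment -> Fairly satisfied
```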
Table 2. Overlay of results obtained from the different techniques described above

Problem categories: Location and sequence of menu options; Menu navigation; Presentation of menu options; Information feedback; Object manipulation; Symbols and icons; Text entry via stylus (writing recognition); Text entry via virtual keyboard; Processing power; Hardware issues; Fluent task execution; Online and offline help; Form manipulation
The statistical analysis consisted of: (1) building a report with univariate statistics; (2) generating the covariance matrices for the predefined objective and subjective indicators; (3) applying one-way F ANOVA tests to the data from the previous step in order to investigate possible differences; and (4) applying the Tukey-Kramer procedure to the one-way F ANOVA results to investigate whether the differences found were statistically significant enough to support inferences from the selected sample.

According to the results (see Table 3), the series of two-factor ANOVAs involving Time, Errors (Incorrect actions, Incorrect choices, and Repeated errors), and Help accesses showed that the user experience level had a more significant effect on the number of incorrect choices in the field experiment than in the lab experiment. The pre- and post-test questionnaire analyses and the informal interview results reinforced that domain knowledge and computer literacy have a significant influence on user performance concerning the incidence of errors, both in the lab and in the field.

Table 3. Lab x Field and Fixed x Free experiment results
p-Values (α = 0.05)

Variable Pair                      Lab      Field    Fixed    Free
Experience x Task Time             0.019    0.056    0.025    0.026
Experience x Incorrect Actions     0.003    0.003    0.001    0.043
Experience x Incorrect Choices     0.049    0.0006   0.164    0.270
Experience x Repeated Errors       0.017    0.127    0.194    0.133
Experience x Help Accesses         0.164    0.563    0.148    -
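For readers who want to reproduce this kind of analysis, a minimal sketch of steps (3) and (4) is given below; the grouping variable and the example task times are hypothetical, and only the general pipeline follows the text:

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical task times (seconds) grouped by user experience level.
novice       = [62.1, 58.4, 71.0, 66.3]
intermediate = [49.7, 53.2, 47.8, 51.1]
expert       = [38.5, 41.2, 36.9, 40.0]

# Step (3): one-way ANOVA F test across experience levels.
f_stat, p_value = f_oneway(novice, intermediate, expert)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

# Step (4): Tukey-Kramer post-hoc test to locate the significant pairs.
times = np.concatenate([novice, intermediate, expert])
levels = (["novice"] * len(novice)
          + ["intermediate"] * len(intermediate)
          + ["expert"] * len(expert))
print(pairwise_tukeyhsd(times, levels, alpha=0.05))
```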
6 Final Considerations

Studies in the literature fall basically into two categories: (i) user mobility while using the device, inside a lab or outdoors; and (ii) user distraction in pervasive computing. This study considered both aspects as part of the task context. In the field test, subjects were free to choose between moving or remaining still as they performed the task with the mobile device. The movement registered was limited to situations in which the user waited for some device processing (e.g., web page downloads). During the field tests, while the user was moving, there was a clear interference of the environment on the user's attention. Outdoors, in ambient light, the legibility of the device's screen was reduced by glare and reflections. Although the users' opinion was that the camera apparatus did not interfere with task execution, the vast majority decided to lay the device down during task execution.

Confirming previous findings, the experiments demonstrated that applications that require a lot of interaction and user attention are inappropriate for use while walking, due to attention distraction. This reinforces that, in spite of the mobility of the device targeted in this study, the evaluation settings did not need to differ substantially from those employed in the evaluation of stationary devices, since users tend not to wander while performing tasks that demand their attention.

Recently, studies have been published which deal with new paradigms and evaluation techniques for mobile devices. Few of the proposed new techniques are really innovative compared to those traditionally employed. The data gathered and analyzed support the initial assumption that minor adaptations to the traditional evaluation techniques and their respective settings are adequate to accommodate the evaluation of the category of mobile devices targeted by this study.

The above comments corroborate the views of the authors and of [15] that laboratory and field evaluations do not diverge but are complementary. As shown in this study, they both add to the evaluation process, producing data that is significant to it and reinforcing the relevance of a multidimensional approach to mobile device usability evaluation.
References

1. Rosson, M.B., Carroll, J.M.: Usability Engineering: Scenario-Based Development of Human-Computer Interaction. Academic Press, San Diego, CA (2002)
2. Hartson, H.R., Andre, T.S., Williges, R.C.: Criteria for evaluating usability evaluation methods. IJHCI 15(1), 145–181 (2003)
3. Nielsen, J.: Usability Engineering. Academic Press, Boston (1993)
4. Wixon, D., Wilson, C.: The usability engineering framework for product design and evaluation. In: Helander, M., Landauer, T.K., Prabhu, P. (eds.) Handbook of Human-Computer Interaction, 2nd edn., pp. 653–688. John Wiley and Sons, Chichester (1997)
5. Jones, M., Marsden, G.: Mobile Interaction Design. John Wiley and Sons, Inc., Chichester, West Sussex (2006)
6. Danielson, D.R.: Usability data quality. In: Ghaoui, C. (ed.) Encyclopedia of Human-Computer Interaction, pp. 661–667. Idea Group Reference (2006)
7. ISO 9241-11: Ergonomic requirements for office work with visual display terminals (VDTs) - Part 11: Guidance on usability. International Organization for Standardization, Geneva, Switzerland (1998)
8. Dumas, J.S., Loring, B.A.: Moderating Usability Tests: Principles and Practices for Interacting, illustrated edn. Morgan Kaufmann, San Francisco (2008)
9. Kjeldskov, J., Stage, J.: New techniques for usability evaluation of mobile systems. IJHCI 60(5-6), 599–620 (2004)
10. Ballard, B.: Designing the Mobile User Experience. John Wiley and Sons, Chichester (2007)
11. Goren-Bar, D., Graziola, I., Pianesi, F., Zancanaro, M., Rocchi, C.: Innovative Approaches for Evaluating Adaptive Mobile Museum Guides. In: Stock, O., Zancanaro, M. (eds.) PEACH - Intelligent Interfaces for Museum Visits, pp. 245–265. Springer, Heidelberg (2007)
12. Po, S., Howard, S., Vetere, F., Skov, M.B.: Heuristic evaluation and mobile usability: Bridging the realism gap. In: Proceedings of Mobile HCI, pp. 49–60 (2003)
13. Sanderson, P., Fisher, C.: Usability testing of mobile applications: A comparison between lab and field testing. Human-Computer Interaction 9, 251–317 (1994)
14. Omodei, M.A., Wearing, J., McLennan, J.P.: Head-mounted video and cued recall: A minimally reactive methodology for understanding, detecting and preventing error in the control of complex systems. In: Proceedings of the 21st European Annual Conference of Human Decision Making and Control (2002)
15. de Queiroz, J.E.R.: Abordagem Híbrida para avaliação da usabilidade de interfaces com o usuário. Tese de Doutorado, UFPB, Brazil, p. 410 (2001) (in Portuguese)
16. Turnell, M.F.Q.V., de Queiroz, J.E.R., Ferreira, D.S.: Multilayered Approach to Evaluate Mobile User Interfaces. In: Lumsden, J. (ed.) Handbook of Research on User Interface Design and Evaluation for Mobile Technology, vol. 1, pp. 847–862. IGI Global (2008)
17. ISO 9241-14: Ergonomic requirements for office work with visual display terminals (VDTs) - Part 14: Menu dialogues. ISO, Geneva, Switzerland (1997)
18. ISO 9241-16: Ergonomic requirements for office work with visual display terminals (VDTs) - Part 16: Direct manipulation dialogues. ISO, Geneva, Switzerland (1999)
19. ISO 9241-17: Ergonomic requirements for office work with visual display terminals (VDTs) - Part 17: Form filling dialogues. ISO, Geneva, Switzerland (1998)
20. ISO/IEC 14754: Information technology - Pen-based interfaces - Common gestures for text editing with pen-based systems. ISO, Geneva, Switzerland (1999)
21. ISO/IEC 24755: Information technology - Screen icons and symbols for personal mobile communication devices. ISO, Geneva, Switzerland (2007)
22. Bailey, J.E., Pearson, S.W.: Development of a Tool for Measuring and Analyzing Computer User Satisfaction. Management Science 29(5), 530–545 (1983)
23. Aladwani, A.M., Palvia, P.C.: Developing and validating an instrument for measuring user-perceived Web quality. Information & Management 39, 467–476 (2002)
24. De Oliveira, R.C.L., de Queiroz, J.E.R., Vieira Turnell, M.F.Q.: WebQuest: A Configurable Web Tool to Prospect the User Profile and User Subjective Satisfaction. In: Salvendy, G. (ed.) Proceedings of the 2005 Human-Computer Interaction Conference, vol. 2. Lawrence Erlbaum Associates, Nevada (2005) (CD-ROM, multi-platform)
Development of Quantitative Usability Evaluation Method

Shin'ichi Fukuzumi 1, Teruya Ikegami 1, and Hidehiko Okada 2

1 NEC Corporation, Common Platform Software Research Laboratories, 7-1, Shiba 5-chome, Minato-ku, Tokyo 108-8001, Japan
{s-fukuzumi@aj,t-ikegami@ct}.jp.nec.com
2 Kyoto Sangyo University, Kamigamo Motoyama, Kita-ku, Kyoto 603-8555, Japan
[email protected]
Abstract. A variety of evaluation methods are practiced in order to improve the usability of computer systems and make them more appealing. The authors have developed a quantitative usability evaluation method that uses a checklist outlining an evaluation procedure and clarifying judging standards. This paper describes this quantitative usability evaluation method, which is not influenced by an evaluator's subjective impression. Such clear and precise definitions make checklist-based evaluations more repeatable (and thus more reliable) and less affected by differences among evaluators. The effectiveness of our checklist has been evaluated in experiments with novice and experienced evaluators. This article reports the method and results of the experiments.

Keywords: Usability, evaluation, checklist.
1 Introduction

A usability evaluation method using a checklist [1], which is typical in usability evaluations [2], can be applied in the later stages of a development process [3]. However, obtaining justifiable evaluation results is difficult because the results depend on the evaluators' skill, experience, and subjectivity. To solve this problem, we have developed a usability checklist that minimizes the deviation of evaluation results in order to realize usability quantification [4-5]. In this paper, we introduce this checklist, validate it, and apply it to system operation products.
The authors made sure that each item could be judged as "Has a problem", "No problems", or "Irrelevant" by clarifying its procedure, its evaluation target, and its gauge. Moreover, to prevent evaluators' differing understandings and interpretations from blurring the results, samples in the checklist and a collection of terminology definitions were also prepared.

Visualization of the effect on the user. An evaluation axis often consists of elements directly connected with design and development, such as a layout or a button, because it is assumed that UI design specialists and developers generally use a checklist; the effect on the user is therefore hard to determine. To measure the degree to which each item of the checklist satisfies the user, it is important to weight the qualities of each item. The authors weighted the items using the analytic hierarchy process (AHP) method [7], and evaluation results were expressed on the basis of four qualities: "efficiency", "ease to learn", "errors", and "ease to memorize".

2.2 Maintenance of a Checklist

Selection of the items. The authors referred to various standards and guidelines, made a rough draft of the checklist, selected items, improved them through verification, and then evaluated them. Moreover, the AHP method of weighting the items was contemplated, and the chapter structure of the items was decided so that no part of the hierarchy became too deep (Table 1).

Table 1. Chapters of the checklist
No.  Section name                               Number of items
1    Consistency of indication/operation        17
2    Legibility of information                  8
3    Presentation of the present state          22
4    Conformability to the user/environment     18
5    Conformability to the work                 19

Fig. 1. Items of the checklist (each item comprises its content, procedure, target, and weight)
Procedure of an evaluation. This checklist consists of five sections and 84 items and is equipped with an "evaluation procedure", an "evaluation target", and "the weight (4 axes)" for every item (Figure 1). The flow of the evaluation was turned into an explicit procedure, and the gauge and the result of each step were described clearly to make sure that evaluation results could be judged correctly for each designated evaluation target. When a button or pull-down menu was right-clicked and the appropriate operation was performed, the judgment result was "No problems"; when the appropriate operation was not performed, the result was "Has a problem". Additionally, when an item does not really apply, for instance because a precondition is not satisfied or because the behavior depends on customization, it is judged "Irrelevant". Figure 2 shows an item of the checklist and an example case.
Fig. 2. Item of the checklist (details and example): evaluation item, evaluation target, evaluation procedure, and an example
Weight of items. To decide the weight of each item, the AHP method was applied. The feature of the AHP method is that it applies paired comparisons to evaluation targets according to given gauges. This method yields weights of higher validity than deciding the weight of each element holistically. Of the five usability attributes that Nielsen advocates [1], the authors chose four, "ease to learn", "errors", "ease to memorize" and "efficiency", as gauges, and decided the weight of each of these qualities for every item. The values of the paired comparisons of the items were decided in a conference of three user interface specialists.
3 Effectiveness Evaluation of Checklist

3.1 Experiment Method

Evaluation targets. Three or five GUI windows of an e-mail application were selected as evaluation targets.

Participants (Evaluators). In total, 50 people participated in this experiment. Of these, 30 were novices: college students without much experience or understanding of software usability. The remaining 20 participants were expert evaluators who had experience and understanding of software usability; they were researchers employed by a company.

Procedure. In accordance with the gauges described in Section 2, participants evaluated some of the GUI windows prepared as evaluation targets, as explained above. By comparing the results reported by novices and experts, it can be verified whether novice participants can obtain the same results as experts.

3.2 Experimental Result

Each evaluation result was judged as "Has a problem", "No problems", or "Irrelevant". By comparing results, the possibility of both experts and novices obtaining the same results was tested. As an index of the degree of agreement between expert and novice results, the concordance rate is defined as follows [4]:

Concordance rate (%) = 100 x (number of novices whose results agreed with those of an expert) / (number of novices)

The average concordance rate obtained in this experiment was 73.75%. The average concordance rates of "Has a problem", "No problems", and "Irrelevant" are shown in Table 2.

Table 2. Average concordance rate
Evaluation result    Concordance rate
Has a problem        50.6%
No problems          78.7%
Irrelevant           80.6%
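As a minimal illustration of the concordance-rate formula above (the judgment labels follow the paper; the per-item novice results are hypothetical):

```python
def concordance_rate(novice_results, expert_result):
    """Percentage of novices whose judgment agreed with the expert's."""
    agreed = sum(1 for r in novice_results if r == expert_result)
    return 100.0 * agreed / len(novice_results)

# Hypothetical judgments of one checklist item by ten novices,
# compared against the expert judgment "No problems".
novices = ["No problems"] * 8 + ["Has a problem", "Irrelevant"]
print(concordance_rate(novices, "No problems"))  # 80.0
```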
3.3 Discussion

As shown in Table 2, the concordance rates of "No problems" and "Irrelevant" are relatively high, while that of "Has a problem" is relatively low. That is, when a problem existed, novice evaluators often overlooked it. However, a concordance rate of 50.6% means that the probability that at least one out of n evaluators agrees with the expert is 1 - (1 - 0.506)^n, which is sufficiently large: 94.0% at n = 4 and 87.9% at n = 3. From this, a correct result can be expected when three or more people evaluate an item.
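The probability argument above can be checked directly; the following one-line calculation uses only the figures quoted in the text (the function name is ours):

```python
def prob_at_least_one_agrees(rate: float, n: int) -> float:
    """Probability that at least one of n independent novice evaluators
    agrees with the expert, given a per-evaluator concordance rate."""
    return 1.0 - (1.0 - rate) ** n

print(f"{prob_at_least_one_agrees(0.506, 3):.1%}")  # 87.9%
print(f"{prob_at_least_one_agrees(0.506, 4):.1%}")  # 94.0%
```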
4 Practical Use of the Checklist

This section describes the operation procedure of this checklist.

4.1 Application of the Evaluation Items

An evaluator judges whether an evaluation target is described as "Has a problem", "No problems", or "Irrelevant" in accordance with the evaluation procedure. The goodness of fit of the evaluation result is calculated from the judgment result and the weight of the item. Next, the methods of judging and of calculating the goodness of fit of the evaluation result are described.

Judgment of a result. The items concerning the consistency of represented information and of operation are applied to the whole screen. When evaluators found any part of a screen with a problem regarding such an item, they judged the item as "Has a problem". If there was a problem in only part of a screen within a screen group, even when the other screens were consistent, evaluators judged it as "Has a problem" (Figure 3). Each screen and part that was an evaluation target was evaluated against the items regardless of consistency.
Fig. 3. Evaluation of consistency (the arrangement location of the button): examples of button layouts and table/list layout orders judged as "without problem" or "there is a problem"
Calculation of the goodness of fit of the evaluation results. Even within the same item, there may be several evaluation targets with different judgment results, so the goodness of fit of the evaluation results needs to be calculated by integrating the results. An
evaluator makes a basic overall judgment by prioritizing "Has a problem" over "No problems", and then "Irrelevant". For example, if one "Has a problem" appears among the results, the overall judgment is also "Has a problem". Items judged as "No problems" are weighted to compute the goodness of fit of the evaluation results along the respective evaluation axes.

4.2 Calculation of an Evaluation Result

The sum of the concurring results, weighted by each quality, gives the overall score on each evaluation axis. Evaluation result examples for three similar products are shown in Figure 4. Since product A clearly has the highest efficiency and product B is obviously the easiest to learn, it is possible to grasp the distinctive quality of each product. Thus, by using this checklist, it becomes possible to examine the usability evaluation result from four angles.

Fig. 4. Example of an evaluation result: scores of products A, B, and C on the four axes (efficiency, little error, easy to learn, easy to memorize)
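A minimal sketch of the scoring scheme described in Sections 4.1-4.2 is given below. The items, judgments, and weight values are hypothetical; only the prioritized overall judgment and the per-axis weighted sum follow the text:

```python
AXES = ("efficiency", "little error", "easy to learn", "easy to memorize")
PRIORITY = ("Has a problem", "No problems", "Irrelevant")

def overall_judgment(results):
    """'Has a problem' dominates 'No problems', which dominates 'Irrelevant'."""
    return min(results, key=PRIORITY.index)

def axis_scores(items):
    """Sum per-axis weights of items whose overall judgment is 'No problems'."""
    scores = dict.fromkeys(AXES, 0.0)
    for judgments, weights in items:
        if overall_judgment(judgments) == "No problems":
            for axis in AXES:
                scores[axis] += weights[axis]
    return scores

# Two hypothetical checklist items, each with judgments from several
# evaluation targets and AHP-derived weights per axis.
items = [
    (["No problems", "No problems"],
     {"efficiency": 0.4, "little error": 0.2,
      "easy to learn": 0.3, "easy to memorize": 0.1}),
    (["No problems", "Has a problem"],
     {"efficiency": 0.1, "little error": 0.5,
      "easy to learn": 0.2, "easy to memorize": 0.2}),
]
# The second item is excluded: its overall judgment is "Has a problem".
print(axis_scores(items))
```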
5 Summary

The authors have developed a usability quantification method using a checklist that excludes the blurring of results by detailing the evaluation target, the evaluation procedure, and the acceptance standard for each item. Even an evaluator who knows little about system usability can obtain objective results by using this evaluation method.
References

1. Nielsen, J.: Usability Engineering. Academic Press, London (1993)
2. Ravden, S., Johnson, G.: Evaluating Usability of Human-Computer Interfaces: A Practical Method. Prentice-Hall, Englewood Cliffs (1989)
3. ISO 13407: Human-centred design processes for interactive systems (1999)
4. Ikegami, T., Okada, H., Yoshizaka, S., Fukuzumi, S.: Proposal of usability quantification method (1) - checklist for excluding blurring among evaluators. In: Annual Conference of Information Processing Society Japan (2008) (in Japanese)
5. Okada, H., Ikegami, T., Yoshizaka, S., Fukuzumi, S.: Proposal of usability quantification method (2) - experiment for validation of a checklist. In: Annual Conference of Information Processing Society Japan (2008) (in Japanese)
6. Kato, S., Horie, K., Ogawa, K., Kimura, S.: A Human Interface Design Checklist and Its Effectiveness. Transactions of Information Processing Society of Japan 36(1), 61–69 (1995) (in Japanese)
7. Ham, D.-H., Heo, J., Fossick, P., Wong, W., Park, S.-H., Song, C., Bradley, M.: Model-based approaches to quantifying the usability of mobile phones. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4551, pp. 288–297. Springer, Heidelberg (2007)
Reference Model for Quality Assurance of Speech Applications

Cornelia Hipp and Matthias Peissner

Fraunhofer Institute for Industrial Engineering (IAO), Nobelstr. 12, 70569 Stuttgart, Germany
{cornelia.hipp,matthias.peissner}@iao.fraunhofer.de
Abstract. The acceptance of speech applications is still very low in Germany. The German speech industry has identified this problem and is making an effort to improve the quality of speech applications, which should lead to higher user acceptance. To ensure higher quality standards, a reference model has been developed with special regard to the needs of interactive voice response (IVR) systems. This model includes instructions for improving the process quality of the development process as well as methods, measurements and quality criteria for evaluating the product quality. Furthermore, the presented reference model differentiates between eight application types of IVR and describes which methods, measurements and quality criteria are especially important for each application type.

Keywords: Quality, speech, interactive voice response, automatic speech recognition, measurement, method, voice, speech interaction, reference model, application type.
assurance within the areas of software management [3][4], classical engineering, and usability engineering [5][6]. Additionally, there are first solutions for evaluating speech applications [7][8] and guidelines for improving the usability of speech applications [9]. But there is a lack of knowledge regarding the systematic improvement of the quality of speech applications within an adequate reference model. There are ambitions of the German Initiative Voice Business with their yearly congress Voice Days and their Voice Award, which is awarded to the best German-based speech application [10]. Although the testing method used there is a solid basis for quality measurement, it is not usable within development projects as a model for continuously controlling and optimising the quality of speech applications.
2 Intention

For the reasons mentioned above, we aimed to find a suitable approach for measuring and improving the quality of speech applications. We quickly realized that a holistic approach to improving the dialogue and the user experience of interactive voice response systems is needed: different factors within the development of speech applications affect the quality of the IVR, and they are dependent on each other.

In this paper we describe a new holistic reference model for quality assurance of speech applications. The model has been developed at the Fraunhofer Institute for Industrial Engineering (IAO), Germany, in cooperation with the Initiative Voice Business (IVB) and in exchange with 36 experts from the German speech application market. Additionally, the efforts made were intended to encourage the voice industry on the German-speaking market and to find solutions which are industry-orientated and easy to transfer to concrete projects. The work is intended to sharpen the public awareness of the subject of quality in speech applications and to show possibilities for improving it.
3 Description of Reference Model

First, we adopted an approach common in software development: the differentiation between product quality and process quality [4]. Product quality refers to the product itself, in this case the IVR. Process quality, in contrast, refers to the rules, strategies and requirements of the development process. Securing a high-quality process does not guarantee a high-quality end product, but it makes one very likely. Therefore, in this holistic model, actions are described both to monitor the development process and to continuously evaluate the product quality.

For the product quality, we identified ten quality criteria explicitly for speech applications [11]. These ten criteria describe a good speech application in a holistic way, so that criteria are defined within the areas of voice user interface and usability, strategy and business logic, dialogue platform and integration, and speech technology and linguistics. However, the quality criteria are not assigned precisely to one specific area and can affect several.
Fig. 1. Overview of the Presented Reference Model with Components and their Dependencies
The ten quality criteria can be evaluated with the help of measurements. In total, 34 measurements are defined, tailored to the special needs of IVRs, e.g., caller frequency or no-match rates. As with the quality criteria, the measurements are defined for the four different areas and do not have to be assigned to only one area. The measurements are used to supply concrete values, which can be used for comparison, either between different speech applications or between different versions/development stages of one application.

With the help of methods, the described measurements can be performed to obtain concrete values. In total, 23 methods have been identified with special regard to speech applications, such as the load test, the Wizard-of-Oz test, and expert evaluation. The methods are important elements within the reference model for the process quality. They are assigned to the different steps of the process, which are differentiated as project preparation and analysis, concept and design, implementation, integration and bringing into service, and operation. Methods and measurements are supposed to be used in an iterative process to control and achieve good performance.
Furthermore, quality criteria, measurements and methods carry different weight for different application types. Therefore, eight different application types have been identified [12], with partly very different priorities for criteria, measurements and methods. When using the reference model, the first step is to identify which of the defined application types is going to be implemented. Subsequently, the important quality criteria can be looked up, then the measurements that are meaningful for those criteria, and finally the methods that can be used to obtain results for the measurements.

3.1 Quality Criteria

Within the introduced reference model, ten quality criteria are defined to show what an application has to achieve to be considered a good IVR. They are described with regard to the holistic approach of the reference model and cover the four predefined thematic areas: voice user interface and usability, strategy and business logic, dialogue platform and integration, and speech technology and linguistics. With the aid of measurements, data can be collected to check whether the quality criteria are achieved. The ten quality criteria are listed below:

Appropriate Functionality Coverage and Content Offering: A speech application is good if an added value is created for the customer by means of an attractive and complete offer of functionality.

Faultless Operability and Capability: A speech application is good if secure and faultless functioning with high performance is assured, at peak loads as well.

Administrability and Efficient Operations: A speech application is good if the technical effort after launching the IVR can be kept at a minimum.

Expandability and Scalability: A speech application is good if the system architecture easily allows future enhancements and changes.

Profitability: A speech application is good if the service is economically profitable.

Reliable Recognition of User Utterances: A speech application is good if speech recognition reliably recognizes an appropriate amount of prospective user utterances.

Effective Management of Errors: A speech application is good if recognition errors and errors of usage do not cause major damage.

Effective and Flexible Dialogue Flow: A speech application is good if the navigation structure supports users in reaching their aim fast and securely.

Comprehensible and Goal-Orientated System Output: A speech application is good if the acoustic system output supports the user in orientation and in formulating goal-orientated utterances.

Impression and Emotional Addressing: A speech application is good if a positive and appropriate attitude of the user towards the speech application, its use and its operator can be reached.
3.2 Measurements

Within the presented reference model, 32 measurements have been worked out. With their aid, data can be collected in order to identify whether a quality criterion is fulfilled or not. The measurements are performed by means of the defined methods. Measurements are defined based on the following characteristics: name, synonyms, brief description, reference to quality criteria, reference to application components, actual use in practice, usable methods for collecting data for this measurement, and appraisal of profitability. As an example, the measurement routing rate is displayed in the following:

Name: routing rate
Synonyms: correct routing rate
Brief Description: percentage of calls which can be successfully transferred (according to the wishes of the customer) in proportion to the total amount of callers
Reference to Quality Criteria: faultless operability and capability; profitability; reliable recognition of user utterances; effective management of errors; impression and emotional addressing
Reference to Application Components: model, view, control and access; an optimal co-operation between all components is necessary (referring to the architectural pattern model-view-controller, enhanced with the component access)
Actual Use in Practice: frequently. Comment: the measurement is only relevant for applications where routing has high importance
Usable Methods for Collecting Data for this Measurement: logfile analysis and reporting
Appraisal of Profitability: very high

3.3 Methods

23 different methods are defined within this reference model for quality assurance of speech applications. They are attached to specific process steps, but do not necessarily have to be attached to only one process step. By means of the methods, data can be collected for different measurements. Subsequently, with the aid of the measurements, quality criteria can be evaluated. Within the reference model, a differentiation has been made between methods which should necessarily be carried out and methods which should be applied to achieve an excellent process. This classification differs between the eight application types listed later on. Methods are defined based on the following characteristics: name, synonyms, brief description, which measurements can be covered or optimized, reference to quality criteria, reference to application components, relevance to thematic areas, reference to process steps, maturity of method, actual use in practice, potential for use in practice, requirements for this method, and appraisal of profitability. As an example, the method in-service test is displayed subsequently:

Name: in-service test
Synonyms: watchdog test, keep-alive test, availability test
Brief Description: The in-service test is a permanent test of the availability and functionality of a speech portal in operation. The system generates external controlling calls cyclically over a long timeframe. Time intervals, types of calls, and test scripts are freely selectable.
Which Measurements can be Covered or Optimised: service availability, service accessibility, answering time for the customer, correctness of system output
Reference to Quality Criteria: faultless operability and capability; profitability
Reference to Application Components: access. Comment: the telephone functionalities are affected (referring to the architectural pattern model-view-controller, enhanced with the component access)
Relevance to Thematic Areas: dialogue platforms and integration
Reference to Process Steps: in operation
Maturity of Method: high
Actual Use in Practice: occasional/seldom
Potential for Use in Practice: high
Requirements for this Method: test equipment, test services
Appraisal of Profitability: good; minor costs compared to high benefit

3.4 Product Quality

The final aim of the reference model is to reach a high quality of the final product, the IVR. This can be achieved with the help of high process quality and checked against the ten defined quality criteria.

3.5 Process Quality

The described reference model differentiates between process quality and product quality. While product quality focuses on the evaluation and optimisation of the final result (the IVR), process quality focuses on the process of developing the product. The notion that improving the process quality will subsequently lead to an improvement of the product quality is the reason why process quality plays an important part in the reference model.

The process steps differentiated within the development of a speech application are defined as project preparation and analysis, concept and design, implementation, integration and bringing into service, and operation. These process steps are not strictly separated and should be seen as an iterative process. In addition, evaluation should be done throughout the whole process and not be confined to a specific time-frame of the development.

Within the reference model, methods are defined which should be carried out at specific process steps to ensure a high quality of the process. Furthermore, the model discriminates between ensuring a minimum of process quality and achieving an excellent process. Therefore, it defines methods which are necessary for the former and lists additional methods for the latter. For instance, it is strongly recommended to do functional tests during the implementation phase of self-service portals. To achieve an excellent process, it is recommended to additionally carry out a friendly-user test during this phase.
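To make the routing-rate measurement from Section 3.2 concrete, a minimal sketch is given below; the call-record fields are hypothetical, since the model defines the measurement but not a data format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CallRecord:
    routed_to: Optional[str]        # where the IVR transferred the call, if anywhere
    intended_target: Optional[str]  # where the caller actually wanted to go

def routing_rate(calls):
    """Percentage of calls successfully transferred according to the caller's
    wishes, in proportion to the total number of callers (cf. Sect. 3.2)."""
    correct = sum(1 for c in calls
                  if c.routed_to is not None and c.routed_to == c.intended_target)
    return 100.0 * correct / len(calls)

calls = [
    CallRecord("billing", "billing"),
    CallRecord("sales", "support"),  # mis-routed call
    CallRecord(None, "support"),     # caller dropped out of the IVR
]
print(f"Routing rate: {routing_rate(calls):.1f}%")  # 33.3%
```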
3.6 Application Types

Measurements, methods and quality criteria are differently useful and meaningful for different application types. For example, the quality criterion impression and emotional addressing is very important for marketing applications, but much less so for authentication services. Therefore, the reference model does not allocate data to specific measurements for all speech applications alike, but differentiates between unequal application types. With the help of this discrimination, it is possible to compare results between applications of the same application type and to identify their potentials and weaknesses. Within the reference model, the following eight application types are defined, with different benchmarks for quality criteria, measurements, methods and process steps:

Call Routing: Incoming calls are sorted thematically and transferred to the correct person in charge within the call center.

Information Service: Customers can receive information inexpensively, swiftly and up-to-date via a speech-based information service.

Reminding and Alerting Service: Alerting services trigger automatic calls in the event of an emergency (e.g., catastrophes like earthquakes or hurricanes). Reminding services call in case of predefined important appointments (e.g., delivery dates or taking medication).

Authentication Service: The speech application verifies whether the caller is in fact the person he declares to be, based on the unique characteristics of the human voice.

Automated Telephone Switchboard: In case of absence of the callee, the automated telephone switchboard can, e.g., transfer the call to a colleague or start an answering machine.

Track & Trace System: With the aid of track & trace systems, companies can permanently provide information on the actual state of their services.

Marketing Application: Companies can use IVR for marketing purposes, like lotteries, advertisements or the voices of prominent people.

Self Service Portal: Customers can execute different transactions and employ information services by themselves using self-service portals.
4 Concluding Remarks

In Germany, the need to improve the quality of speech applications has been recognized, because the German-speaking voice industry has an issue with user acceptance of IVR. The potential of speech interaction is not adequately exploited yet, and several German speech companies are working together to find consolidated solutions. Quality criteria, methods and measurements have already been defined with special regard to eight application types. But there are still open questions, e.g., how to compare applications of the same type but with different degrees of complexity. Furthermore, the ongoing work should conclude in an acknowledged standard to sensitize customers to quality differences in speech applications.
Acknowledgements. Ongoing work to find solutions for quality of speech applications is done with great support of 36 experts in Germany. We would like to thank the following companies and persons: Cirquent (Dr. Bettina Attallah), D+S solutions (Kerstin Sehnert), E.ON Hanse (Frank Oldorf), Genesys Telecommunications Laboratories (Giancarlo Boi), HFN Medien (Dr. Frank Wanning), IBM Germany Research and Development (Ludovica De Sio, Dr. Carsten Günther, Dr. Marion Mast), mind Business Consultants (Sebastian Paulke, Bernhard Steimel), NEXT ID (Ralf Poplawski), Nortel (Dr. Oliver Huber), SemanticEdge (Jörn Kreutel, Dr. Lupo Pape), Sikom Software (Jürgen Hoffmeister, Dietmar Kneidl), Sparda-Bank Hamburg eG (Jürgen Mehring), SpeechConcept (Dr. Uwe Lay), Strateco (Mark Gutmann), Sympalog Voice Solutions (Dr. Jürgen Haas), tech2biz (Dr. Christian Dugast), Deutsche Telekom Laboratories (Caroline Clemens, Dr. Florian Metze, Prof. Dr. Sebastian Möller, Wiebke Johannsen), Telenet Communication Systems (Dr. Florian Hilger, Markus Kesting), T-Mobile Deutschland (Dr. Guntbert Markefka), T-Systems Enterprise Services (Frank Oberle), Unisys Deutschland (Andreas Schaub), VMA (Dr. Guntbert Markefka, Andreas Schaub, Dr. Frank Wanning), voiceandvision (Tom Houwing), Voice & Visual Design (Paul Hubert Vossen), 4Com (Dennis Jehne).
References

1. Peissner, M., Sell, D., Steimel, B.: Acceptance of Speech Applications (German orig. Akzeptanz von Sprachapplikationen). Fraunhofer Institute for Industrial Engineering (IAO), Stuttgart (2006)
2. Wu, E.: Bill Gates predicts software revolution. MIS ASIA (August 14, 2008), http://mis-asia.com/news/articles/bill-gates-predicts-software-revolution
3. Sommerville, I.: Software Engineering. Pearson Education Germany GmbH, Munich (2001)
4. Ludewig, J., Lichter, H.: Software Engineering. dpunkt.verlag, Heidelberg (2007)
5. Nielsen, J.: Usability Engineering. Academic Press, San Diego (1993)
6. Mayhew, D.: The Usability Engineering Lifecycle: A Practitioner's Handbook for User Interface Design. Academic Press, San Diego (1999)
7. Dybkjaer, L., Hemsen, H., Minker, W.: Evaluation of Text and Speech Systems. Springer, Dordrecht (2007)
8. Möller, S.: Quality of Telephone-Based Spoken Dialogue Systems. Springer Science+Business Media, New York (2005)
9. Hempel, T. (ed.): Usability of Speech Dialog Systems. Springer, Berlin (2008)
10. Initiative Voice Business, http://www.voice-award.de
11. Peissner, M., Hipp, C., Steimel, B.: Quality Criteria, Measurements and Methods for Speech Applications (German orig. Qualitätskriterien, Maße und Verfahren für Sprachapplikationen). Fraunhofer Institute for Industrial Engineering (IAO), Stuttgart (2007)
12. Hipp, C., Paulke, S., Peissner, M., Steimel, B.: Quality Guideline – Cookbook for Good Speech Applications (German orig. Qualitätsleitfaden – Kochbuch für gute Sprachapplikationen). Fraunhofer Institute for Industrial Engineering (IAO), Stuttgart (2008)
Toward Cognitive Modeling for Predicting Usability

Bonnie E. John 1 and Shunsuke Suzuki 2

1 Human-Computer Interaction Institute, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
[email protected]
2 NEC Corporation, 8916-47, Takayama-cho, Ikoma, Nara 630-0101, Japan
[email protected]
Abstract. Historically, predictive human performance modeling has been successful at predicting the task execution time of skilled users on a desktop computer. More recent work has predicted novice behavior in web searches. This paper reports on a collaborative effort between industry and academia to expand the scope of predictive modeling to the mobile phone domain, both skilled and novice behavior, and how human performance relates to the perception of usability. Since, at this writing, only preliminary results to validate models of mobile phone use are in, we describe the process we will use to progress towards our modeling goals. Keywords: Cognitive modeling, GOMS, KLM, CogTool, Information Foraging.
Despite research progress in creating and validating theory, UI developers have not adopted predictive human performance modeling as a frequently used tool for design. Recent work has embodied these theories in tools that allow practicing developers to achieve the benefits of modeling without investing considerable time in learning to model and in constructing each new model (e.g., [7, 8, 9]). However, it is difficult to make a trustworthy tool for practical design problems. This paper explains the process of doing so in the context of collaborative research between NEC, PARC and Carnegie Mellon University. Our project is aimed at producing a tool for predicting the task execution time of skilled users, novice exploration to accomplish a goal, and the subjective perception of the usability of mobile phones.
2 The Process of Making a Trustworthy, Practical Tool for Design The process of making a trustworthy, practical tool for design is shown in Figure 1. Each time a new domain is entered, or a new metric is added, the theory, tool and models must be validated with data from appropriate users to produce a trustworthy tool for prediction. If the models’ predictions do not match the human data sufficiently, either the theory or the tool, or both, must be revised until valid predictions are produced. The next question is whether the tool is learnable and usable by UI designers in their work process. User-centered design techniques should be used to design, evaluate, and redesign, until the tool is practical for design.
Fig. 1. General process of human performance modeling research that leads to a practical tool for design
Our project started with CogTool, a tool that allows UI designers to create valid Keystroke-Level Models in one tenth the time of doing them by hand as originally demonstrated by Card, Moran and Newell [2]. It has been shown to be easily learnable by users with no background in psychology or cognitive modeling ([8] and
tutorials at professional conferences like HFES, BRIMS, and HCII). Recent research with CogTool has extended it beyond KLM and predictions of skilled task execution time to information foraging theory and predictions of novice exploration behavior [10, 11]. From this starting point, we set out to expand CogTool’s ability to predict human behavior to a new domain, mobile phones, and to a new metric, subjective impressions of usability as measured by the Mobile Phone Usability Questionnaire (MPUQ) developed by Ryu [12, 13]. Thus, this project will touch all the points in Figure 1. We start by using CogTool as it exists to make predictions of skilled execution time and novice exploration behavior and test those predictions against human data on mobile phones, fully expecting that adjustments to the underlying theory and tool will need to be made. After making changes to the theory and tool to produce valid predictions of these metrics, we intend to correlate various aspects of the predictions with people’s perceptions of usability. After verifying that we can make trustworthy predictions, we will determine whether CogTool can be used by mobile phone designers and adjust CogTool’s UI until it becomes a practical tool. At this writing, we are at the first part of the process, making predictions with CogTool as it exists and comparing those predictions to human data. The remainder of this paper will describe the current state of the research.
3 The New Domain – Mobile Phones Mobile phones were chosen as the domain in which to pursue this approach. This product category is important to the corporation and the discrete nature of the tasks users perform on mobile phones makes them relatively easy for collecting human data and to model. In addition, CogTool had previously been shows to make good predictions of skilled use of a similar hand-held device (PDAs, [14, 15]). Although the project will evaluate several different mobile phones, this paper will use the N905i, shown in Figure 2, as an example of our research process. The tasks we are examining are varied, as follows. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.
Call a number from a phone book Store a number into a phone book Put an event into a Schedule Change a Security Setting View a previously sent mail message Set a previously stored picture to be the wallpaper. Delete a previously stored picture Add a function into a shortcut Check memory info Shoot a movie, check it, and save it
Data was collected from skilled users who had owned their phones for at least two months and from novice users who had never used this model of phone. The phone screen was captured on video, which was later transcribed to identify which buttons
Fig. 2. N905i mobile phone shown at the screen that was the start for each task
were pressed, when each button was pressed, and how long it took the phone to respond to each button press (system response time).
4 CogTool and Initial Models CogTool is a prototyping and cognitive modeling tool created to allow UI designers to rapidly evaluate their design ideas. A design idea is represented as a storyboard (inspired by the DENIM Project [16]), with each state of the interface represented as a node and each action on the interface (e.g., button presses) represented as a transition between the nodes. Figure 3 shows the start state of the storyboard, where buttons are placed on top of an image of the phone. Figure 4 shows a storyboard for six instances of the first task, calling a person who is already listed in the phone’s contact list. The first action at the start state is to press the down button called out in Figure 3. Because different contacts are located at different points of the phone book, the task takes different paths from the start screen to completion of the task. We will use Calling Person4 as the example in the remainder of this paper. After creating the storyboard, the next step is to demonstrate the correct actions to do the task. CogTool automatically builds a valid Keystroke Level Model from this demonstration. It creates ACT-R code that implements the Keystroke-Level Model and runs that code, producing a quantitative estimate of skilled execution time and a visualization of what ACT-R was doing at each moment to produce that estimate (Figure 5). Since mobile phones are a new domain for CogTool, we do not expect that the predictions it makes “out of the box” will be very accurate when compared to human data. We expect to have several iterations of comparing the predictions to human data and fixing the underlying theory and CogTool’s implementation of that theory, before we can make trustworthy predictions to help design. The next section presents preliminary analysis of one such iteration.
Fig. 3. Start screen of the CogTool prototype. Each button in the picture of the phone has a button "widget" drawn on top; actions on widgets follow transitions as defined in the storyboard, so tapping the down button transitions to the next frame of the storyboard (Fig. 4).
Fig. 4. Storyboard of the screens a person would pass through to accomplish Task 1 (making a call from the phone book) for six different instances of the task, i.e., calling six different people (Person1 through Person6) in the phone book. We will use the instance of calling Person4 as an example throughout this paper.
4.1 Comparing Initial Models to Human Performance Data

The first step in comparing human performance data to the predictions of models is to make sure the same metrics are used in both the data and the models. For example, CogTool models predict not only when a button will be pressed, but also the thinking time and visual perception that precede pressing the button. Only button presses were recorded in the empirical study, so we cannot directly compare the "total" time predicted by the model against the "total" time observed in the experiment. Adjusting for this difference, and comparing the time from the first key press to the appearance of "Calling Person4" on the screen, the CogTool model predicted 11.049 seconds. The mean of five skilled participants was 9.770 seconds, an over-prediction of the average by 13% and an average absolute percent error of 15% between the predicted time and each observed time. This level of prediction is within the 20% error typically claimed by KLM and is an excellent prediction for an initial foray into a new domain and device.

The next step is to go to a deeper level of comparison and look at the predictions for each individual action. We expect the quantitative comparisons to get worse, as explained by Card, Moran and Newell [1], but we are looking for patterns in behavior at this point, not an absolute quantitative match. The first types of patterns we hope to see are those predicted by the model. Consider Figure 5, a timeline of the model's predictions for the Calling Person4 task, provided by CogTool as a visualization of its behavior. The rows in the timeline represent different types of actions in the model. The changes in the phone's screens are on the top gray line, with the estimates of system wait time (between button press and when the screen can be read) in the second row (light gray). The three purple rows show activity associated with vision: Vision-Encoding, Eye Move–Execute and Eye Move–Preparation. The central gray row represents the cognition that controls behavior, both the long "Mental operators" empirically established by Card, Moran and Newell, and the short ACT-R cognitive acts that control vision and hand motions. The bottom red row shows the button presses, in this case with the right hand (the thumb). The model predicts a pattern: 1 press (at time=0), pause, 6 presses, pause, 8 presses, pause, 1 press.
Fig. 5. Timeline of a CogTool model prediction
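The accuracy figures quoted above can be reproduced with the usual error metrics; in this minimal sketch, only the two mean times come from the text, and the per-participant error function is shown for reference:

```python
def percent_error(predicted: float, observed: float) -> float:
    return 100.0 * (predicted - observed) / observed

def avg_abs_percent_error(predicted: float, observations) -> float:
    """Mean of |prediction error| over the individual observed times."""
    return sum(abs(percent_error(predicted, o)) for o in observations) / len(observations)

# Over-prediction of the mean: 11.049 s predicted vs. 9.770 s observed.
print(f"{percent_error(11.049, 9.770):.0f}%")  # 13%
```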
Consider Figure 6, where the data from five participants are placed below the model's timeline, aligned so that their first key presses all start at 0.0 sec. The top four participants display a pattern in keeping with the model's prediction (1 press, pause, 6 presses, pause, 8 presses, pause, 1 press), except for P9, who does not pause for long
before the last key. However, the bottom participant, P1, does not show this pattern at all. When we went back to the video of this participant, we found that although P1 used the same number of keys to complete the task, he did not use the same keys as the other participants or the model. Further investigation is needed to understand whether this was due to an error or whether it represents an alternative correct method for this task. Either way, in the majority of cases of this small sample, CogTool automatically predicted a pattern of behavior that was observed in human performance, even without modifying CogTool for the mobile phone domain.
Fig. 6. Timeline of a CogTool model prediction with keypress data from five participants aligned below it
Looking more closely at the data of the people who used the same keystrokes as the model (the top four), another pattern can be seen, one not predicted by the unmodified CogTool. Each participant shows two groupings of keys pressed close together in time, one of six keys at the beginning of the task and one of eight keys at the end of the task. Of these eight groupings, six show a distinct pause before the last keystroke in the group (P6, 1st group; P9, both groups; P12, 2nd group; P15, both groups). The groupings of six are repeated presses of the S3 key to move across a set of icons at the top of the screen, some of which drop down a list of items that can be selected. When the desired icon is reached and its list drops down, the user then hits the Down key eight times to move down to the desired contact and hits the Call button to complete the task. The pauses come before the last S3 key press and the last Down
key press. In both cases, the user is watching a highlight move across (or down) the screen and can anticipate when the next key press will bring the highlight to the desired item. The pause before the last key press might represent a strategy to avoid over-shooting. This monitoring activity was not included in the original systems tested by Card, Moran and Newell and therefore is not represented in the original Keystroke-Level Model. Thus, we have identified a case where we may need to develop new theory about monitoring and anticipatory keystrokes (i.e., iterate on the theory) and build it into the tool (i.e., iterate on the tool) before we can produce trustworthy predictions in this domain.

Another case where the predictions do not match the data is in the inter-keystroke times. All four users who did the task in the same way as the model pressed the same key far faster than CogTool did, as seen in the denser grouping of keystrokes in the participants' timelines than in the model timeline. In this case, we will have to iterate on the underlying theory of motor movement to allow it to produce faster keystrokes. The timeline shows us that CogTool inserts visual perception of a key between each keystroke, which is likely to be wrong for repeated keystrokes, especially given the monitoring activity described above, where the user's eyes are presumably on the screen, not the buttons.

With a model of just one instance of one task and data from five participants, the timeline visualization has suggested that the model is making reasonable predictions of the grouping of actions but is missing some important patterns of human behavior. More tasks and more data will have to be analyzed to be sure it is necessary to change the underlying theory and build it into CogTool to get trustworthy predictions. However, this small example illustrates the process of model validation this project has undertaken.
5 Future Work

In addition to following the process in Figure 1 for skilled task execution time predictions on mobile phones, this project will also examine the prediction of novice exploration behavior with CogTool-Explorer [10, 11], a version of CogTool that predicts novice behavior. As with skilled behavior, we do not expect CogTool-Explorer to be able to predict a new domain (mobile phones instead of web searches) in a new language (Japanese instead of English) without iteration on the theory and tool. We have already identified improvements to the tool required for mobile phones; for example, mobile phones have "soft keys" whose labels are displayed on the screen instead of being printed on the key, and CogTool-Explorer was not originally designed to represent that relationship.

Perhaps more interestingly, when we have succeeded in producing trustworthy predictions of behavior, we intend to correlate this behavior with subjective impressions of usability as measured by the Mobile Phone Usability Questionnaire (MPUQ) developed by Ryu [12, 13]. Unlike empirical methods, which can only correlate observed behavior, like time on task or number of errors, with questionnaire results, we can extract much more varied metrics from the models against which to correlate
subjective impressions. For example, total time on task may not correlate with subjective impressions, but time spent in cognition may. Or more complex measures may be needed, like time spent in cognition that is not in parallel with motor movements for skilled users. Or number of keys looked at by CogTool-Explorer before making a choice, for the subjective impressions of novice users. Or amount of system response time not in parallel with cognition (i.e., making the user wait). Because CogTool produces a process model of perception, cognition and motor actions necessary to do a task, many combinations of actions can be explored to see if any can explain a significant part of the variance in subjective impressions. If a significant correlation can be found, then the predictive human performance models will be extended to a subjective metric, moving the field closer to the holy grail.
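The kind of correlation analysis envisioned here is straightforward once model metrics and questionnaire scores are available. The sketch below is a minimal illustration; the metric values and MPUQ scores are hypothetical:

```python
from scipy.stats import pearsonr

# Hypothetical per-task model metrics and mean MPUQ ratings for six tasks.
cognition_time = [3.2, 4.1, 2.8, 5.0, 3.6, 4.4]  # predicted cognition time (s)
mpuq_score     = [5.8, 4.9, 6.1, 4.2, 5.5, 4.7]  # higher = better perceived usability

# Does predicted cognition time track perceived usability?
r, p = pearsonr(cognition_time, mpuq_score)
print(f"r = {r:.2f}, p = {p:.3f}")
```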
References

1. Card, S.K., Moran, T.P., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale (1983)
2. Card, S.K., Moran, T.P., Newell, A.: The Keystroke-Level Model for User Performance Time with Interactive Systems. Commun. ACM 23(7), 396–410 (1980)
3. Card, S.K., Moran, T.P., Newell, A.: Computer Text-Editing: An Information-Processing Analysis of a Routine Cognitive Skill. Cognitive Psychology 12, 32–74 (1980)
4. Pirolli, P., Card, S.: Information Foraging in Information Access Environments. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 1995), pp. 51–58. ACM Press/Addison-Wesley Publishing Co., New York (1995)
5. Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., Qin, Y.: An Integrated Theory of the Mind. Psychological Review 111(4), 1036–1060 (2004)
6. Fu, W.-T., Pirolli, P.: SNIF-ACT: A Cognitive Model of User Navigation on the World Wide Web. Human-Computer Interaction 22, 355–412 (2007)
7. Blackmon, M.H., Kitajima, M., Polson, P.G.: Tool for Accurately Predicting Website Navigation Problems, Non-Problems, Problem Severity, and Effectiveness of Repairs. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2005), pp. 31–40. ACM, New York (2005)
8. John, B.E., Prevas, K., Salvucci, D.D., Koedinger, K.: Predictive Human Performance Modeling Made Easy. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2004), pp. 455–462. ACM, New York (2004)
9. Wu, C., Liu, Y.: Usability Makeover of a Cognitive Modeling Tool. Ergonomics in Design 15(2), 8–14 (2007)
10. Teo, L., John, B.E.: Towards Predicting User Interaction with CogTool-Explorer. In: Proceedings of the Human Factors and Ergonomics Society 52nd Annual Meeting, pp. 950–954. HFES, Santa Monica (2008)
11. Teo, L., John, B.E., Pirolli, P.: Towards a Tool for Predicting User Exploration. In: CHI 2007 Extended Abstracts on Human Factors in Computing Systems (CHI 2007), pp. 2687–2692. ACM, New York (2007)
12. Ryu, Y.S.: Development of Usability Questionnaires for Electronic Mobile Products and Decision Making Methods. Doctoral dissertation, State University, Blacksburg, VA, USA (2005)
13. Ryu, Y.S., Smith-Jackson, T.L.: Reliability and Validity of the Mobile Phone Usability Questionnaire (MPUQ). Journal of Usability Studies 2(1), 39–53 (2006)
14. Luo, L., John, B.E.: Predicting Task Execution Time on Handheld Devices Using the Keystroke-Level Model. In: Proceedings of the International Conference on Human Factors in Computing Systems (CHI 2005), pp. 1605–1608. ACM Press/Addison-Wesley Publishing Co., New York (2005)
15. Luo, L., Siewiorek, D.P.: KLEM: A Method for Predicting User Interaction Time and System Energy Consumption during Application Design. In: Proceedings of the 11th International Symposium on Wearable Computers (ISWC 2007), pp. 69–76. IEEE Press, New York (2007)
16. Lin, J., Newman, M.W., Hong, J., Landay, J.A.: DENIM: An Informal Tool for Early Stage Web Site Design. In: CHI 2001 Extended Abstracts on Human Factors in Computing Systems (CHI 2001), pp. 205–206. ACM, New York (2001)
Webjig: An Automated User Data Collection System for Website Usability Evaluation

Mikio Kiura, Masao Ohira, and Ken-ichi Matsumoto

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, Japan
{mikio-k,masao,matumoto}@is.naist.jp
Abstract. In order to improve website usability, it is important for developers to understand how users access websites. In this paper, we present Webjig, a support system for website usability evaluation designed to resolve the problems associated with existing systems. Webjig can collect users' interaction data from static and dynamic websites. Moreover, by using Webjig, developers can precisely identify users' activities on websites. Through an experiment evaluating the usefulness of Webjig, we have confirmed that developers could effectively improve website usability.

Keywords: Web usability, usability evaluation, analysis of user interactions, dynamic websites.
However, these systems are designed to collect data only from static websites. Developers cannot figure out users' interactions on a dynamic website (e.g., webpages created automatically by CGI or server-side scripts, and webpages whose contents are manipulated by JavaScript in a Web browser). By using JavaScript, developers can implement an interface that switches the displayed contents by tabs, drop-down menus, or drag-and-drop methods without URL transitions. On a website using such interfaces, the existing systems cannot obtain the previously displayed contents accessed by users, because these contents change. In this paper, we propose Webjig, a support system for website usability evaluation for both dynamic and static websites; this system records users' interactions together with the contents that are displayed in users' Web browsers. Developers can understand exactly how users interact with a website by using Webjig. Thus, they can efficiently improve website usability.
2 Related Work

The traditional approach to resolving problems of website usability is to use the Web server access logs [4]. Developers can learn various kinds of information from the Web server access log, including users' IP addresses, access times, request data, and the Web server's responses. The advantage of using the Web server access log is that it is saved automatically on the Web server and can be used by developers at low cost. While developers can easily use the Web access log to improve website usability, however, they cannot know users' interactions such as mouse motions, mouse-click positions, and mouse-click timings on a website [5]. Several systems have been proposed to automatically collect data on users' interactions with a website (e.g., MouseTrack [6], UsaProxy [7]). These systems solved the problem above by identifying users' mouse motions, mouse-click positions, and mouse-click timings through JavaScript code embedded in a webpage. They helped developers understand users' interactions on a website at a considerably low cost. Previous studies have suggested that there is a correlation between the point of gaze and the position of the mouse cursor. Chen et al. have reported a strong correlation between the point of gaze and the position of the mouse cursor; further, developers can predict the points on a website in which the user is interested and may chart a pattern of the user from users' interactions [8]. In addition, Mueller et al. reported that 35% of users traced a sentence with the mouse cursor when they read the sentence on a website [9]. These results show that developers can detect problems of website usability by studying users' interactions.
3 Webjig

In this paper, we introduce Webjig, a new system that solves the problems of the existing systems. Webjig can handle data from both static and dynamic websites. By analyzing the DOM (Document Object Model) of the HTML, Webjig can collect data on the contents clicked by users, including timings, positions, and motions. This mechanism
allows usability engineers and developers to solve the problem associated with the existing systems, i.e., that they could not precisely identify users' interactions on a dynamic website. We present the system architecture of Webjig in Fig. 1. Webjig is a client/server system. The client is implemented in JavaScript, which executes in a Web browser. The server is implemented in PHP. The system consists of Webjig::Fetch, Webjig::Analysis, and Webjig::DB. Webjig::Fetch is a subsystem that automatically collects the data of users' interactions on a website. Webjig::Analysis is a subsystem that shows the information of users' interactions to developers. Webjig::DB is a subsystem that holds the data of users' interactions and provides an API to access the data.
Fig. 1. System architecture of Webjig
3.1 Webjig::Fetch

Webjig::Fetch is a subsystem that automatically collects the data of users' interactions on the website. Table 1 shows the data collected and stored by Webjig. During the time in which a user stays on a webpage, the data may change, except for the name and version of the Web browser. The system monitors changes in the data at intervals of dozens of milliseconds, and sends the data to Webjig::DB at intervals of a few seconds and at the time when the user exits the webpage.

Table 1. Collected data using Webjig

Data type                              Timing of data collection   Timing of data transmission
Name and version of Web browser        Loaded                      Loaded
Inner size of Web browser              Changed                     Intervals and exit
Position of scroll bar                 Changed                     Intervals and exit
Position of mouse cursor               Changed                     Intervals and exit
Timing and type of mouse click         Pressed                     Intervals and exit
Timing and type of key pressed         Pressed                     Intervals and exit
Contents displayed in a Web browser    Changed                     Intervals and exit
For collecting users’ interactions data, developers have to install Webjig::Fetch in a webpage. what developers have to do is only to insert a line <script src=”URL of Webjig::Fetch”> in the HTML source code of the webpage that targets the usability evaluation using Webjig. Fig.2 is an example of Webjig installed in an HTML source code. Webjig works even if the developer may insert the script tag at the any place in the HTML source code. However, a mainstream Web browser interprets the HTML source code from the top and displays the contents. Therefore, we recommend inserting the script tag at the bottom of the HTML source code so that Webjig does not disturb the original contents. Sample Page
Sample Content
<script src=”http://example.com/webjig.js” > Fig. 2. An example of HTML source code
3.2 Webjig::Analysis

Webjig::Analysis has various features for supporting website usability evaluation. For instance, Webjig::Analysis can replay users' interactions, such as mouse motions, mouse clicks, and keyboard input, together with the displayed contents in a movie format by using the collected data. In Fig. 3, we show a screenshot of Webjig::Analysis when it replays users' interactions. The system consists of the contents displayed in a Web browser and some floating windows that control the system and show various kinds of information. Developers can control the replay of users' interactions, with play, stop, forward, and rewind available anytime through the control buttons, seek bar, or slider on the control window. In addition, the system can also generate a heat map, which shows where users often click (a sketch of this kind of click aggregation is given after the list below), and estimate the portions of a webpage that users read and do not read. By using these features, developers can examine the following questions.
• Are there any confusing graphics in links?
• Do users pay attention to the content that developers want them to read?
• Where do users look or not look?
• How do users access the website?
• What wrong operations do users perform on the way to the goal?
• How do users use a dynamic interface?
• Where do users pause when they input into forms?
• What did users view before exiting the website?
• and so forth.
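As an illustration of the heat map feature mentioned above, the following Python sketch aggregates collected mouse-click positions into a coarse grid. The cell size and the click coordinates are hypothetical, and the actual Webjig implementation (JavaScript client, PHP server) is not described by the paper at this level of detail.

    from collections import Counter

    def click_heatmap(clicks, cell=50):
        # Bucket (x, y) click positions into cells of `cell` x `cell` pixels
        # and count how many clicks fall into each cell.
        grid = Counter()
        for x, y in clicks:
            grid[(x // cell, y // cell)] += 1
        return grid

    # Hypothetical click positions collected by Webjig::Fetch.
    clicks = [(120, 340), (130, 355), (125, 348), (400, 80)]
    print(click_heatmap(clicks).most_common(2))  # hottest cells first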
Fig. 3. Screenshot of Webjig::Analysis
4 System Evaluation

4.1 Overview

We performed an experiment to evaluate the usefulness of Webjig. 54 graduate students (39 males and 15 females, average age 20) participated in the experiment as subjects. The 54 subjects were divided into three groups. Each group worked on different tasks, described in the next subsection.

4.2 Experiment Procedure and Task

We executed the experiment according to the following procedure.

Step 1. We gave 24 users (subjects of Group A) five tasks. Each task required the subjects to find a specified product from a dynamic menu implemented using JavaScript. Webjig recorded users' interactions during task execution.

Step 2. Based on the data collected in Step 1, three subjects who played the role of developers (Group B) analyzed the users' interactions during task execution using
Webjig::Analysis. The developers planned an improved structure of the menu.

Step 3. We gave 27 different users (subjects of Group C) tasks similar to those in Step 1. The difference between Step 1 and Step 3 is that the subjects of Group C used the improved menu. Webjig recorded users' interactions during task execution.

Step 4. Finally, by comparing the task execution times of Step 1 and Step 3, we checked the validity of the change to the structure of the menu.

Fig. 4 shows the dummy website for the experiment. Table 2 shows the target products and the categories where the products exist.
Fig. 4. Screenshot of the dummy website for the experiment

Table 2. Target products and category for each task

Task     Category
Task 1   Audio & visual
Task 2   Cameras
Task 3   Health
Task 4   Office
Task 5   House & appliance
4.3 Experiment Results

Developers can know where users look on the webpage by using Webjig. Table 3 shows what percentage of the subjects of Group A first clicked on each category. The grayed rectangle in Table 3 marks the correct category where the specified product
exists for each task. For example, 54% of the subjects first clicked on the category of house & appliance, though the dry cell belonged to the category of audio & visual. When using existing systems, developers cannot know such information. Table 4 shows the changed structure of the menu, which was planned by the developers based on the results in Table 3. The plan is based on the idea that if there was a category clicked by more users than the current category, the target product should be moved to the more proper category. In the case of task 1, where subjects searched for a dry cell, the dry cell belonged to the audio & visual category, but many subjects first paid attention to the house & appliance category. Therefore, the developers moved the dry cell to the category of house & appliance. Further, in the case of task 4, where subjects searched for an electronic dictionary, the electronic dictionary belonged to the office category, and the majority of the subjects first paid attention to that category. Therefore, the developers did not move it to any other category.

Table 4. Change plan for the menu of the categories

Task     Original category    Destination category
Task 1   Audio & visual       House & appliance
Task 2   Cameras              Computers
Task 3   Health               House & appliance
Task 4   Office               Office
Task 5   House & appliance    Office
We performed the experiment again after changing the website as shown in Table 4. We show the experiment result in Fig. 5. From Fig. 5, the task execution time was reduced in tasks 1, 2, and 3 by applying the change plan. Fig. 5 shows the results of the execution time for each task in Step 1 and Step 3.¹ We can confirm that the execution time in Step 3 is shorter than that in Step 1, that is, the improved menu structure based on the developers' analysis using Webjig was effective.

¹ Since the structure of the menu was not changed for Task 4, we could not confirm a significant difference between the results in Step 1 and Step 3.
Fig. 5. Result of the task execution time in Step 1 and Step 3
5 Discussion

By using Webjig, developers can obtain information that they could not have obtained with the existing systems. For this reason, developers can detect problems in website usability and create a plan for improving website usability by collecting data on users' interactions, as performed in this experiment. In the experiment where users chose items from the menu, the developers could determine the execution time for each task by using existing systems. Thus, they could detect usability problems by comparing the execution times of the tasks and pinpointing a task whose execution time is longer than that of another task. In Fig. 5, the execution times of tasks 1, 2, 3, and 5 are longer than that of task 4. For this reason, a developer can hypothesize that there remain problems of website usability. However, it is difficult to eliminate a problem if they cannot understand its cause. By using Webjig, a developer can efficiently detect the problems of website usability. In the case of task 1 (subjects find a dry cell), we show the experiment result in Table 3: the dry cell belonged to audio & visual, but many subjects paid attention to house & appliance. The developers hypothesized that "many users think that a dry cell belongs to a household appliance" and moved the dry cell from audio & visual to house & appliance. As a result, the execution time was reduced compared with that before changing the category. According to Fig. 5, the task execution time on the changed website is less than that on the original website. In tasks 1, 2, and 3, we can observe significant improvement in the execution time. However, in task 5, we did not observe any significant improvement in the execution time.
Table 5. Priority for the improvement

Task     Correct category (A)   Changed category (B)   B/A
Task 1   4%                     54%                    13.5
Task 2   13%                    46%                    3.5
Task 3   25%                    71%                    2.8
Task 5   29%                    46%                    1.6
We explain the reason for this. In Table 5, we compare the rate of users who paid attention to the correct category with the rate of users who paid attention to the changed category. In the case of task 1, 4% of users paid attention to the correct category (audio & visual) when searching for the dry cell, and 54% of users paid attention to the wrong category (house & appliance). This is a difference of 13.5 times. Similarly, task 2 has a difference of 3.5 times, task 3 a difference of 2.8 times, and task 5 a difference of 1.6 times. As a result, we can say that if there is not a big difference between the rate of users who pay attention to the original category and the rate of users who pay attention to the changed category, we cannot confirm an effect of the change. Therefore, developers have to examine whether usability will be improved by understanding users' interactions, and not merely from the fact that a task's execution time was longer than the others. By using Webjig, a developer can exactly understand users' interactions and examine whether usability is improved. With existing systems, in contrast, it is difficult to examine the improvement of website usability because exact users' interactions cannot be obtained. Webjig cannot, however, replace user testing, because in user testing developers can know the point of gaze by using an eye-tracking system and can learn the intention of the user by interviewing him/her. Still, we saw that there were points where website usability could be improved by using Webjig. Therefore, developers may efficiently improve website usability by combining user testing and Webjig.
6 Conclusion and Future Work

In this paper, we proposed Webjig, a usability evaluation support system for static and dynamic websites. As a result of the experiment, we showed that developers can improve website usability effectively by using Webjig. In the future, we are going to compare the cost of website usability evaluation between existing systems and Webjig, and to compare usability testing with Webjig, to determine the efficiency of website usability evaluation.
Acknowledgements

This study was supported by the Information-technology Promotion Agency, Japan (IPA), Exploratory IT Human Resources Project (MITOU Program) in the fiscal year 2008.
References

1. Nielsen, J., Landauer, T.K.: A Mathematical Model of the Finding of Usability Problems. In: The INTERACT 1993 and CHI 1993 Conference on Human Factors in Computing Systems, pp. 206–213 (1993)
2. Dumas, J.S., Redish, J.C.: A Practical Guide to Usability Testing. Ablex Publishing, Norwood (1993)
3. Barnum, C.M.: Usability Testing and Research. Longman, London (2001)
4. Hong, J.I., Landay, J.A.: WebQuilt: A Framework for Capturing and Visualizing the Web Experience. In: The 10th International Conference on World Wide Web (WWW 2001), pp. 717–724 (2001)
5. Etgen, M., Cantor, J.: What Does Getting WET (Web Event-logging Tool) Mean for Web Usability? In: 5th Conference on Human Factors and the Web, HFWEB 1999 (1999), http://zing.ncsl.nist.gov/hfweb/proceedings/etgen-cantor/index.html (accessed February 27, 2009)
6. Arroyo, E., Selker, T., Wei, W.: Usability Tool for Analysis of Web Designs Using Mouse Tracks. In: CHI 2006 Extended Abstracts on Human Factors in Computing Systems, pp. 484–489 (2006)
7. Atterer, R., Schmidt, A.: Tracking the Interaction of Users with AJAX Applications for Usability Testing. In: The SIGCHI Conference on Human Factors in Computing Systems (CHI 2007), pp. 1347–1350 (2007)
8. Chen, M.C., Anderson, J.R., Sohn, M.H.: What Can a Mouse Cursor Tell Us More? Correlation of Eye/Mouse Movements on Web Browsing. In: CHI 2001 Extended Abstracts on Human Factors in Computing Systems, pp. 281–282 (2001)
9. Mueller, F., Lockerd, A.: Cheese: Tracking Mouse Movement Activity on Websites, a Tool for User Modeling. In: CHI 2001 Extended Abstracts on Human Factors in Computing Systems, pp. 279–280 (2001)
ADiEU: Toward Domain-Based Evaluation of Spoken Dialog Systems

Jan Kleindienst, Jan Cuřín, and Martin Labský

IBM Research, Prague, Czech Republic
{jankle,jan_curin,martin.labsky}@cz.ibm.com
Abstract. We propose a new approach toward the evaluation of spoken dialog systems. The novelty of our method lies in the utilization of domain-specific knowledge combined with the deterministic measurement of dialog system performance on a set of individual tasks within the domain. The proposed methodology thus attempts to answer questions such as: "How well is my dialog system performing on a specific domain?", "How much has my dialog system improved since the previous version?", "How much is my dialog system better/worse than other dialog systems performing on that domain?"

Keywords: Dialog, evaluation, scoring, multimodal, speech recognition.
What is particularly missing in this area is (1) a measurement of performance for a particular domain, (2) the possibility to compare one dialog system with others, and (3) the evaluation of progress during the development of a dialog system. By the ADiEU¹ scoring presented herein we attempt to address these three cases.

1.2 The Elements of the ADiEU Metric

The ADiEU score consists of two ingredients, both of which range from 0 to 1: A) the Domain Coverage (DC) score, and B) the Dialog Efficiency (DE) score. We describe both scores in the following sections. Note that the results of domain coverage and dialog efficiency may be combined into a single compound score to attain a single overall characteristic (the eigenvalue) of the assessed dialog system. The ADiEU score relies on a good understanding of the dialog domain, which is described in the form of a domain task ontology. The more expert knowledge is projected into the domain ontology, the more reliable results we expect from the ADiEU score.
2 Capturing Domain Ontology

The cornerstone of our approach is to evaluate spoken and multi-modal dialog systems within a predefined, well-known (and typically narrow) domain. In our labs we have developed many speech and multimodal applications for various domains, such as music selection, TV remote control, in-car navigation and phone control, using grammars, language models and natural language understanding techniques. In order to compare two spoken dialog systems that deal with the same domain, we first describe the domain diligently using the task ontology. This restricted ontology represents the human expert knowledge of the domain and is encoded as a set of tasks with two kinds of relations between the tasks: task generalization and aggregation. Individual tasks are defined as sequences of parameterized actions. Actions are separable units of domain functionality, such as volume control, song browsing or playback. Parameters are categories of named entities, such as album or track title, artist name or genre. Tasks are labeled by weights, which express the relative importance of a particular task with respect to the other tasks. The ontology may also define task aggregations, which explicitly state that a complex task can be realized by sequencing several simpler tasks. Table 1 shows a sample task ontology for the music control domain. For example, the task volume control/relative with a weight of 2 (e.g. "louder, please") is considered more important in evaluation than its absolute sibling (e.g. "set volume to 5"). This may be highly subjective if scored by a single human judge, and thus a consensus of domain experts may be required to converge to a generally acceptable ontology for the domain. Once acknowledged by the community, this ontology could be used as the common etalon for scoring third-party dialog systems.
¹ We call our measurement the Automatic Dialog Evaluation Understudy, ADiEU.
Table 1. Speech-enabled reference tasks for the jukebox domain. Tasks are divided into groups. Both the groups as well as the tasks within each group are assigned relative importance points by an expert. These points are normalized to obtain the per-task contribution to the domain's functionality. ITC shows the ideal turn count range for each task.

Group (points, share)        Tasks
Volume (2, 15.50%)           relative; absolute; mute
Playback (4, 31.01%)         play; stop; pause; resume; next, previous track; next, previous album; media selection
Play mode (0.5, 3.88%)       shuffle; repeat
Media library (6, 46.51%)    browse by criteria; play by criteria; search by genre; search by artist name (up to 100 artists, more than 100 artists); search by album name (up to 200 albums, more than 200 albums); search by song title (up to 250 songs, more than 2000 songs); search by partial names (words, spelled letters); ambiguous entries; query item counts; favorites (browse and play, add items); media management (refresh from media, add or remove media); access online content
Menu (0.4, 3.10%)            quit; switch among other apps
Total                        100%
3 The Proposed Method of ADiEU Evaluation

The actual dialog system evaluation metric that is at the heart of our method consists of two indicators: Domain Coverage (DC), computed over the task ontology, and Dialog Efficiency (DE), which quantifies the outcome of user test sessions. The DC expresses how well the evaluated system covers the set of tasks in the ontology for a particular domain, while the DE indicates the performance of the evaluated system on those tasks supported by the system.
3.1 Scoring of Domain Coverage

The domain coverage (DC) is the sum of the weights of the tasks supported by the system (S) over the sum of the weights of all tasks from the ontology (O):

DC(S, O) = \frac{\sum_{t \in \text{supported tasks}(O)} w_t}{\sum_{t^* \in \text{all tasks}(O)} w_{t^*}}    (1)
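To make Equation (1) concrete, here is a minimal Python sketch of the domain coverage computation; the task names and weights are hypothetical stand-ins, not the full jukebox ontology of Table 1.

    def domain_coverage(ontology_weights, supported_tasks):
        # DC(S, O): weight of the tasks supported by the system over the
        # weight of all tasks in the ontology.
        total = sum(ontology_weights.values())
        covered = sum(w for task, w in ontology_weights.items()
                      if task in supported_tasks)
        return covered / total

    # Hypothetical ontology fragment with per-task weights.
    weights = {"volume relative": 2.0, "volume absolute": 1.0,
               "shuffle": 0.5, "search by artist": 3.0}
    print(domain_coverage(weights, {"volume relative", "search by artist"}))
    # -> 0.769..., i.e. roughly 77% of the ontology weight is covered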
Table 1 shows a sample domain task ontology for the music management domain, giving the raw points assigned by a domain expert and their normalized versions, which are used to assess the relative importance of individual tasks. The expert may control the weights of whole task groups (such as Playback control) as well as the weights of the individual tasks that comprise these groups. Generally, the ontology can have more than the two levels of sub-categorization shown in the example.

3.2 Scoring of Dialog Efficiency

The actual efficiency of a dialog is measured using the number of dialog turns [9, 10] needed to accomplish a chosen task. In spoken dialog systems, a dialog turn corresponds to a pattern of user speech input followed by the system's response. We introduce a generalized penalty turn count (PTC) that measures overall dialog efficiency by incorporating other considered factors: the number of help requests, the number of rejections, and the user and system reaction times.
PTC = TC + \lambda_{NHR} \cdot NHR + \lambda_{NRP} \cdot NRP + \lambda_{URT} \cdot URT + \lambda_{SRT} \cdot SRT    (2)

where TC is the actual dialog turn count, NHR is the number of help requests, NRP is the number of rejections, URT is the user response time, SRT is the system response time, and the lambdas represent the weights of each contributor to the final penalty turn count (PTC).²

The obtained penalty turn count is then compared to an ideal number of turns for a particular task. We define a key property, the ideal number of turns (INT), as being determined by at least the following factors. The INT is (F1) directly proportional to the number of information slots to be filled and (F2) indirectly proportional to the size of the block of information slots commonly accepted as coherent:

INT(t) = \frac{\text{number of information slots to be filled}}{\text{size of a block of information slots commonly accepted as coherent}}    (3)
For example, the concept of "date" consists of three information slots (day, month, and year) that need to be filled. Here, the number of information slots (F1) is three, which is in this case the same as the size of the coherent block expected by users. The INT for the "date" concept is thus 1 (= 3/3). In the current state of the art, the INT property is determined manually by human judgment.

² In our experiments, we set λ_NHR = 0.5, λ_NRP = 1, and λ_URT = λ_SRT = 0, since for the music domain the user reaction time was not indicative of dialog quality and both applications responded instantly.
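The following Python sketch illustrates the PTC and INT computations; the additive weighted-sum form of PTC follows the reconstruction of Equation (2) above, and the lambda defaults mirror the values reported in the footnote.

    def penalty_turn_count(tc, nhr=0, nrp=0, urt=0.0, srt=0.0,
                           l_nhr=0.5, l_nrp=1.0, l_urt=0.0, l_srt=0.0):
        # PTC = TC + l_NHR*NHR + l_NRP*NRP + l_URT*URT + l_SRT*SRT
        return tc + l_nhr * nhr + l_nrp * nrp + l_urt * urt + l_srt * srt

    def ideal_number_of_turns(slots_to_fill, coherent_block_size):
        # INT(t), Equation (3): slots to fill over the size of a block of
        # slots commonly accepted as coherent.
        return slots_to_fill / coherent_block_size

    # The "date" example: three slots (day, month, year), accepted as one
    # coherent block of size three, so INT = 1.
    print(ideal_number_of_turns(3, 3))      # -> 1.0
    print(penalty_turn_count(tc=2, nhr=1))  # -> 2.5 penalty turns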
The actual score of the dialog efficiency (DE score) for an individual task is then computed from the difference between INT and PTC as a fraction of the current PTC, i.e.:

DE(t) = 1 - \max\left(\frac{PTC(t) - INT(t)}{PTC(t)},\ 0\right)    (4)
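A small Python sketch of Equation (4), extended with the averaging over testers and trials described below; the trial data are hypothetical.

    def dialog_efficiency(ptc, int_turns):
        # DE(t) = 1 - max((PTC(t) - INT(t)) / PTC(t), 0), Equation (4).
        return 1.0 - max((ptc - int_turns) / ptc, 0.0)

    def task_dialog_efficiency(trials):
        # Average DE over all human testers and all trials for one task;
        # each trial is a (PTC, INT) pair.
        return sum(dialog_efficiency(p, i) for p, i in trials) / len(trials)

    # Hypothetical "play by artist" trials from three testers.
    trials = [(2.0, 2), (3.5, 2), (2.5, 2)]
    print(task_dialog_efficiency(trials))  # -> about 0.79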
To avoid subjective scoring, we typically use several human testers as well as several trials per task. For example, for the task "play by artist" the following set of trials can be used: "Play something by Patsy Cline", "Play some song from your favorite interpreter", or "Play some rock album, make the final selection by the artist name". Each of these trials has its own ideal number of turns assigned (this is why the INT values for tasks in the ontology are given as ranges in Table 1). The task dialog efficiency score is then computed as an average over all human testers and over the dialog efficiency of each trial. Samples of the trials used in the evaluation of the music management domain are given in Table 2.

3.3 The ADiEU Score

The ADiEU score is then computed as a sum of products of task weight and dialog efficiency over the supported tasks in the domain ontology, normalized by the total weight of the supported tasks, i.e.:
ADiEU(S, O) = \frac{\sum_{t \in \text{supported tasks}(O)} w_t \cdot DE(t)}{\sum_{t \in \text{supported tasks}(O)} w_t}    (5)
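Putting the pieces together, here is a minimal sketch of Equation (5); the task names, weights and DE values are hypothetical.

    def adieu_score(weights, de_scores):
        # Equation (5): weight-normalized sum of w_t * DE(t) over the
        # tasks supported by the system (the keys of de_scores).
        supported = de_scores.keys()
        num = sum(weights[t] * de_scores[t] for t in supported)
        den = sum(weights[t] for t in supported)
        return num / den

    weights = {"volume relative": 2.0, "shuffle": 0.5, "search by artist": 3.0}
    de = {"volume relative": 0.9, "search by artist": 0.7}  # supported tasks only
    print(adieu_score(weights, de))  # -> 0.78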
4 Case Study: ADiEU Scores for the Music Management Domain

We applied the ADiEU scoring to two of our dialog systems, developed at different times and both partially covering the music management dialog domain. Both allow their users to play music by dynamically generating grammars based on meta tags found in the users' mp3 files. The first one, named A-player, is simpler and covers a limited part of the music management domain. The second, named Jukebox, covers a larger part of the domain and also allows free-form input using a combination of statistical language models and maximum entropy based action classifiers. For both applications, we collected input from a group of 10 speakers who were asked to accomplish the tasks listed in Table 2. Each of these user tasks corresponded to a task in the domain task ontology, and there was at least one user task per ontology task that was supported by either A-player or Jukebox. The subjects were given general guidance, but no sample English phrases that could be used to control the system were suggested to them. In order not to guide users even by the wording of the user tasks, the tasks were described to them in their native language. All ten subjects were non-native but fluent English speakers.
Table 2. Specific tasks to be accomplished by speakers using A-player and Jukebox

Task                                          A-player   Jukebox   ITC
Start playback of arbitrary music             x          x         1
Increase the volume                                      x         1
Set volume to level 10                                   x         1
Mute on                                                  x         1
Mute off                                                 x         1
Pause                                                    x         1
Resume                                                   x         1
Next track                                    x          x         1
Previous track                                x          x         1
Shuffle                                       x          x         1
Play some jazz song                                      x         1
Play a song from Patsy Cline                  x          x         1
Play Iron Man from Black Sabbath              x          x         1
Play the album The Best of Beethoven          x          x         1
Play a song Where the Streets Have No Name    x          x         1
Play a song Sonata no. 11 (ambiguous)         x          x         2
Play a rock song by your favorite artist      x          x         3
Reload songs from media                                  x         1
Table 3. Computation of coverage, task completion score and ADiEU for A-player and Jukebox, over the following ontology tasks: volume relative; volume absolute; mute; play; stop; pause; resume; next, prev. track; next, prev. album; shuffle; browse by criteria; play by criteria; search by genre; search by artist (<= 100 artists, > 100 artists); search by album (<= 200 albums, > 200 albums); search by song (<= 250 songs, > 2000 songs); word part. search; ambiguous entries; media refresh
Table 3 shows the computation of the ADiEU score and its components: domain coverage (DC) and dialog efficiency (DE). For A-player, which is limited in functionality, the weighted domain coverage only reached 43.99%, whereas for Jukebox
this was 83.17%. On the other hand, A-player allowed its users to accomplish the tasks it supported more quickly than Jukebox; this is documented by the weighted dialog efficiency score reaching 82.6% for A-player and 66.7% for Jukebox. This was mainly due to Jukebox being more interactive (e.g. asking questions, presenting choices) and due to a slightly higher error rate of a dictation-based system as opposed to a grammar-based one. The overall ADiEU score was higher for Jukebox (55.4%) than it was for A-player (36.3%). This was in accord with the feedback we received from users from ongoing evaluations who claimed they had better experience with the Jukebox application. The two major reasons were the support of free-form commands by the Jukebox and its broader functionality.
5 Human Evaluation in Progress

The HCI methodology [10] advocates several factors that human judges collect in the process of dialog system evaluation. These key indicators include accuracy, intuitiveness, reaction time, and efficiency. When designing the evaluation method, we attempted to incorporate the core of these indicators into the scoring method to ensure a good correlation of the ADiEU metric with human judgment. We are currently collecting data from an evaluation test where the human judges act as personas [11]. The results of the evaluation will either confirm or reject the assumption that the ADiEU scoring correlates with human judgment.
6 Practical Considerations of the ADiEU Scoring

The application of the ADiEU scoring to an arbitrary dialog system has several practical considerations. Generally, there are two possibilities for evaluating a third-party dialog system with our metric: 1) an agreed API contract supported by the external system, or 2) rich enough tracing and logging information. Both approaches will typically require cooperation with the supplier of the measured system. The API approach assumes there exists a runtime API that supports, e.g., simulating input to the system, changing the dialog state, obtaining notifications about dialog state changes with sufficient introspection, and the possibility to read the output of the system. The logging approach demands that the application write all the required information to a log file, ideally in a format compliant with the ADiEU score measuring tool. This usually means tight cooperation with the dialog system engineers, but it is easier and more straightforward than changing the application API in case it does not provide access to all the information needed by the ADiEU metric. Having the test run in the form of a log has the advantage of the possibility to send the logs to a scoring tool hosted as a web service, and the possibility to evaluate the system against multiple domain ontologies or against multiple ontology versions of the same domain. We have experimented with both approaches while evaluating our systems.
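As an illustration of the logging approach, the sketch below parses a hypothetical line-oriented session log into the counts needed for the PTC of Equation (2). The log format and event names are invented for this example; the paper does not specify the format expected by the ADiEU measuring tool.

    def parse_session_log(lines):
        # Count dialog turns, help requests and rejections from a
        # hypothetical log with one event per line: TURN, HELP, REJECT.
        counts = {"TURN": 0, "HELP": 0, "REJECT": 0}
        for line in lines:
            event = line.strip().split()[0]
            if event in counts:
                counts[event] += 1
        return counts

    log = ["TURN user: play something by Patsy Cline",
           "REJECT low-confidence recognition",
           "TURN user: play Patsy Cline",
           "HELP user asked what can be said",
           "TURN user: play Patsy Cline greatest hits"]
    print(parse_session_log(log))  # -> {'TURN': 3, 'HELP': 1, 'REJECT': 1}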
7 Conclusion

We introduced a method for the quantitative evaluation of spoken dialog systems that utilizes the domain knowledge encoded by a human expert. The evaluation results are
described in the form of a comparison metric consisting of domain coverage and dialog efficiency scores, allowing comparison of the relative as well as absolute performance of a system within a given domain. This approach has the advantage of comparing incremental improvements on an individual dialog system, which the dialog designer may want to verify along the way. In addition, the method allows cross-checking the performance of third-party dialog systems operating on the same domain, and immediately reveals the strong and weak points in the dialog design. Human evaluations are currently being conducted to estimate the correlation between the ADiEU score and human judgment. The subjectivity of human scoring and the consensus on the ontology coverage are subjects of further investigation.
References

1. Weizenbaum, J.: ELIZA – A Computer Program for the Study of Natural Language Communication between Man and Machine. Communications of the Association for Computing Machinery 9, 36–45 (1966)
2. Allen, J., Chambers, N., Ferguson, G., Galescu, L., Jung, H., Swift, M., Taysom, W.: PLOW: A Collaborative Task Learning Agent. In: Twenty-Second Conference on Artificial Intelligence, AAAI 2007 (2007)
3. Cassell, J., Stocky, T., Bickmore, T., Gao, Y., Nakano, Y., Ryokai, K.: MACK: Media Lab Autonomous Conversational Kiosk. In: Imagina 2002 (2002)
4. Graesser, A.C., VanLehn, K., Rosé, C.P., Jordan, P.W., Harter, D.: Intelligent Tutoring Systems with Conversational Dialogue. AI Magazine 22(4), 39–51 (2001)
5. Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition (International edn.). Prentice-Hall, Englewood Cliffs (2000)
6. Gandhe, S., Traum, D.: Evaluation Understudy for Dialogue Coherence Models. In: Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, Columbus, Ohio, June 2008, pp. 172–181. Association for Computational Linguistics (2008)
7. Walker, M., Kamm, C., Litman, D.: Towards Developing General Models of Usability with PARADISE. Natural Language Engineering 6(3-4), 363–377 (2000)
8. Hajdinjak, M., Mihelič, F.: The PARADISE Evaluation Framework: Issues and Findings. Computational Linguistics 32(2), 263–272 (2006)
9. Le Bigot, L., Bretier, P., Terrier, P.: Detecting and Exploiting User Familiarity in Natural Language Human-Computer Dialogue. In: Asai, K. (ed.) Human Computer Interaction: New Developments, pp. 269–382. InTech Education and Publishing (2008); ISBN: 978-953-7619-14-5
10. Nielsen, J.: Heuristic Evaluation. In: Nielsen, J., Mack, R.L. (eds.) Usability Inspection Methods, pp. 25–64. John Wiley & Sons, New York (1994); ISBN: 0-471-01877-5
11. Carroll, J.: Human Computer Interaction in the New Millennium. ACM Press, New York (2001)
Interpretation of User Evaluation for Emotional Speech Synthesis System

Ho-Joon Lee and Jong C. Park

Computer Science Department, KAIST, 335 Gwahangno, Yuseong-gu, Daejeon 305-701, Republic of Korea
hojoon@nlp.kaist.ac.kr, park@cs.kaist.ac.kr
Abstract. Whether it is for human-robot interaction or for human-computer interaction, there is a growing need for an emotional speech synthesis system that can provide the required information in a more natural and effective manner. In order to identify and understand the characteristics of basic emotions and their effects, we propose a series of user evaluation experiments on an emotional prosody modification system that can express either perceivable or slightly exaggerated emotions classified into anger, joy, and sadness, as an independent module for a general-purpose speech synthesis system. In this paper, we propose two experiments to evaluate the emotional prosody modification module according to different types of initial input speech. We also provide a supplementary experiment to understand the apparently prosody-independent emotion, joy, by replacing the re-synthesized joy speech with original human voice recorded in the emotional state of joy.

Keywords: Emotional Speech Synthesis, User Evaluation, Emotional Prosody Modification, Affective Interaction.
In order to identify and understand the characteristics of these emotions and their effects, we propose in this paper a series of user evaluation experiments on an emotional prosody modification system that can express either perceivable or slightly exaggerated emotions as an independent module for general purpose speech synthesis systems.
2 Emotional Speech Synthesis System
For the analysis of the prosody structure at a more precise level of units, we annotated the Korean emotional speech corpus, distributed by the Speech Information Technology & Industry Promotion Center [4], with the K-ToBI labeling system. This speech corpus was recorded by six professional actors and actresses in a sound-proof room, and is composed of ten emotionally neutral sentences spoken with six different emotions (joy, anger, sadness, fear, boredom, and neutral). An AKG C414-B ULS microphone was used with a 16 kHz sample rate, and each utterance was stored in 16-bit Windows wave format. We used eight sentences spoken by six speakers, as described in Table 1, considering four emotions (joy, anger, sadness, and neutral). The number of Ejeols (words separated by a space) was evenly distributed from 1 to 6.

Table 1. Eight sentences used for prosody structure analysis
Ejeol   Sentence
1       예. (Yes.)
1       아니요. (No.)
2       나도 몰라. (I don't know either.)
3       야, 이제 그만하자. (See, let's end it now.)
3       정말 그렇단 말이야. (It really is.)
4       지금 어디 가는 거야? (Where are you going now?)
5       이건 내가 원하던 게 아니야. (This is not what I wanted.)
6       난 가지 말라고 하면서 문을 닫았어. (I shut the door closed asking her not to leave.)
The Korean emotional speech corpus had passed the manufacturer's perception test, performed by twenty subjects (eighteen males, two females); Table 2 below shows the results. Among the emotions, anger turned out to be the most perceivable emotion (94.3%), and fear the most confusing one (80.3%). However, the overall acceptance rate is more than 80%. For the analysis of dominant emotional prosody patterns, we annotated eight sentences spoken by six speakers with four emotions, or 192 pieces of speech in total, with the K-ToBI labeling system [5]. For the statistical verification of the K-ToBI labeled data, we performed Pearson's Chi-square tests. As shown in Fig. 1, the results support the hypothesis that each emotion has distinct Intonational Phrase (IP) boundary patterns that can distinguish one emotional state from the rest. Then we calculated adjusted residuals to find the distinct pitch contour pattern or patterns. If the calculated value of the adjusted residual is bigger than 2, that feature can be statistically
interpreted as the dominant pattern of a certain emotion. Pearson's Chi-square tests and the adjusted residuals were computed with the SPSS software. From the statistical analyses of the pitch contour patterns, we were able to find very strong tendencies between anger and HL%, joy and LH%, sadness and H%, and neutral and L%.
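For readers without SPSS, the same chi-square test and adjusted residuals can be computed in Python as sketched below. The contingency counts here are hypothetical, since the paper reports only the resulting tendencies; the |residual| > 2 criterion follows the text above.

    import numpy as np
    from scipy.stats import chi2_contingency

    def adjusted_residuals(table):
        # Haberman's adjusted residuals for an r x c contingency table.
        table = np.asarray(table, dtype=float)
        _, _, _, expected = chi2_contingency(table)
        n = table.sum()
        rows = table.sum(axis=1, keepdims=True)
        cols = table.sum(axis=0, keepdims=True)
        return (table - expected) / np.sqrt(
            expected * (1 - rows / n) * (1 - cols / n))

    # Hypothetical counts: 4 emotions (rows) x 4 IP boundary tones (columns
    # HL%, LH%, H%, L%), shaped to echo the reported tendencies.
    counts = [[35, 4, 5, 4],   # anger   -> HL%
              [3, 36, 5, 4],   # joy     -> LH%
              [4, 5, 34, 5],   # sadness -> H%
              [4, 5, 4, 35]]   # neutral -> L%
    chi2, p, dof, _ = chi2_contingency(counts)
    print(chi2, p, dof)
    print(adjusted_residuals(counts).round(1))  # |value| > 2 marks dominance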
Fig. 1. Chi-square test and adjusted residual calculation results

Table 2. Perception test result done by twenty subjects

Speaker   Neutral   Joy    Anger   Sadness   Fear   Boredom
CWJ       89.5      93.5   88.5    85.5      59.0   93.0
KKS       62.5      90.5   92.0    80.5      85.5   82.0
LHJ       83.5      67.5   98.0    84.5      88.5   84.0
MYS       84.5      91.5   90.0    89.5      93.5   81.0
PYH       85.0      95.0   99.0    94.0      61.5   94.5
YSW       95.4      89.5   98.5    89.5      93.5   81.0
Average   83.3      87.9   94.3    87.3      80.3   85.9
To incorporate these distinct Intonational Phrase boundary patterns for the different emotional states, we propose a prosody-unit-level emotional prosody modifier that produces distinct pitch contours, intensity contours, and speech durations according to the three different emotional states: anger, joy, and sadness. The emotional prosody modifier is a simple, coarse-grained prosody re-synthesis module that consists of a pitch contour mapping function, a pitch exaggeration function, an intensity variation
function, and a duration variation function. We set the empirical value of each prosodic parameter based on previous findings in the literature [1, 2], also taking into account language-specific phenomena for Korean, including the speaker's gender information, short and long vowel sound disambiguation [6, 7], and the prosodic structure of discourse markers [8], captured from various Korean speech corpora. Equation 1 below shows the algorithm of our pitch contour modification function. This pitch contour modification function generates the base emotional pitch contour of speech, including the synthesized results of Text-to-Speech (TTS) systems and recorded human voice, for each emotion.

(1)

where t ∈ [t1, t2];
y   is the original pitch value as a function of time t;
y′  is the modified pitch value;
a   is the maximum/minimum pitch range;
b   is the initial position of the pitch contour;
c   is the final position of the pitch contour (rising tone: 0.5, rising-falling: 1); and
d   is the declination/ascent level.
After the modification of the base emotional pitch contour, we apply a pitch exaggeration function to characterize the difference in pitch variation according to the difference in emotion types. First, this module detects eight pitch points per unit. Then we exaggerate the difference in each pitch pair by adding 6 Hz for joy and anger, and 40 Hz for fear and sadness. Next, we adjust the intensity with the intensity contour modification function, which is similar to the pitch contour modification function in Equation 1, but much simpler. Then we control the duration of each unit while preserving the intrinsic value of f0. All four of these modules are implemented in a PRAAT [9] script, supporting not only commercial TTS systems but also recorded human voice. We used the Python language for the interface between the PRAAT software and the TTS output or human voice, and therefore this module supports both Linux and Windows environments.
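To make the pitch exaggeration step concrete, here is a minimal Python sketch operating on a list of eight detected pitch points. The interpretation that the 6 Hz / 40 Hz offsets widen each consecutive pitch difference in its existing direction is our reading of the description above, and the input values are hypothetical.

    # Emotion-specific offsets from the text: 6 Hz for joy and anger,
    # 40 Hz for fear and sadness.
    OFFSETS = {"joy": 6.0, "anger": 6.0, "fear": 40.0, "sadness": 40.0}

    def exaggerate_pitch(points, emotion):
        # Shift each point away from its original predecessor by the
        # emotion-specific offset, keeping the first point fixed.
        offset = OFFSETS[emotion]
        out = [points[0]]
        for prev, cur in zip(points, points[1:]):
            direction = 1.0 if cur >= prev else -1.0
            out.append(cur + direction * offset)
        return out

    # Eight hypothetical pitch points (Hz) detected for one prosody unit.
    unit = [180.0, 195.0, 210.0, 205.0, 190.0, 185.0, 200.0, 175.0]
    print(exaggerate_pitch(unit, "joy"))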
Fig. 2. Pitch and intensity traces of original speech, spoken in a neutral emotional state
Fig. 3. Pitch and intensity traces of prosody-modified speech in a sad emotional state
Fig. 2 shows the prosody trace of a recorded Korean utterance "이건 내가 원하던 게 아니야.", which means in English "This is not what I wanted.", spoken neutrally by a professional actress, and Fig. 3 shows its prosody trace modified to a sad emotional state, produced by our emotional prosody modifier. The blue line (upper line) indicates the pitch contour, and the green line (lower line) the intensity. In Fig. 3, the entire duration is lengthened from 1.753 seconds to 2.805 seconds without any side effect such as f0 contour lowering. The pitch contour is spread more widely, and the intensity is weakened.
3 Evaluation of Emotional Speech Synthesis System
For the identification and understanding of the characteristics of the three basic emotions and their effects, we prepared three stages of experiments. The first and second experiments are designed to evaluate the emotional prosody modifier according to different types of initial input speech, namely monotonous-prosody speech and excited-prosody speech. The supplementary experiment is performed to identify the apparently prosody-independent emotion. The subjects of these three experiments were fourteen kindergarten teachers, twelve of them female and two male, 29.6 years old on average. We did not carry out any prior training of the fourteen subjects, and the answers were not disclosed to the subjects after the experiments. At the beginning of the experiments, subjects were asked to choose the one most likely emotion among anger, joy, sadness, and neutral. We used five semantically neutral sentences, as shown in Table 3. For the first experiment, five neutrally recorded speech files were used as monotonous input speech, and the emotional prosody modifier produced fifteen results with the three emotional states. The test sequences of the first and second experiments were randomly organized.

Table 3. Input sentences for the evaluation of the emotional prosody modifier

야, 이제 그만하자. (See, let's end it now.)
정말 그렇단 말이야. (It really is.)
지금 어디 가는 거야? (Where are you going now?)
이건 내가 원하던 게 아니야. (This is not what I wanted.)
난 가지 말라고 하면서 문을 닫았어. (I shut the door closed asking her not to leave.)
Table 4 shows the evaluation results of the emotional prosody modification with monotonous input speech. From the analysis of the results of the first experiment, we find that anger is very sensitive to the emotional prosody structure (80% perception rate). Sadness also shows a strong relationship with the prosody structure. It is rather surprising to note that none of the subjects perceived joy from the monotonous input speech, even though we modified the prosody structure for joy based on the analyses of real speech, exactly as we did for anger and sadness.

Table 4. Evaluation result for monotonous input speech

Intended \ Perceived   Anger        Joy          Neutral      Sadness      Total
Anger                  56 (80.0%)   3 (4.3%)     6 (8.6%)     5 (7.1%)     70
Joy                    12 (17.1%)   0 (0%)       16 (22.9%)   42 (60.0%)   70
Sadness                4 (5.7%)     2 (2.9%)     23 (32.9%)   41 (58.6%)   70
For the second experiment, we used five pieces of excited voice as the input for the emotional prosody modifier, and generated fifteen randomly organized test sets. Table 5 shows the results of the second perception experiment.

Table 5. Evaluation result for excited input speech

Intended \ Perceived   Anger        Joy          Neutral      Sadness      Total
Anger                  56 (80.0%)   7 (10.0%)    6 (8.6%)     1 (1.4%)     70
Joy                    18 (25.7%)   15 (21.4%)   15 (21.4%)   22 (31.4%)   70
Sadness                3 (4.3%)     38 (54.3%)   18 (25.7%)   11 (15.7%)   70
Interestingly, anger preserved its prosody sensitivity when the type of input was changed from monotonous-prosody speech to excited-prosody speech. From the second experiment, two major changes were observed: an increase in the perception rate of joy, and a decrease in the perception rate of sadness. The decrease in the perception rate of sadness may have been caused by the sudden change of the test environment. In order to identify the cause of this sudden change, we proposed the third experiment. Moreover, the observed increase in the perception rate of joy was still very weak. To identify the characteristics of the emotional prosody structure of joy, and to validate the above hypothesis on the sudden change of sadness, we performed the third experiment with the same subjects and in the same sequence as the second experiment. The only difference between the second and third experiments was the replacement of the modified joy speech with the original human voice recordings in the emotional state of joy, which had passed the manufacturer's perception test at the rate of 91.5%.
Table 6. Evaluation result for repeated test with human voice recordings

Intended \ Perceived   Anger        Joy          Neutral      Sadness      Total
Anger                  58 (82.9%)   7 (10.0%)    4 (5.7%)     1 (1.4%)     70
Joy                    32 (45.7%)   12 (17.1%)   15 (21.4%)   11 (15.7%)   70
Sadness                10 (14.3%)   18 (25.7%)   19 (27.1%)   23 (32.9%)   70

After the third perception test, we made three interesting interpretations from the results shown in Table 6. First, the same sequence in the repeated experiment did not seem to influence the perception rate of anger. There was only a slight movement from neutral to anger. This allows us to define anger as a primarily prosody-sensitive emotion. Second, we found that some part of the decreased perception rate was due to the sudden change of the test environment. So it is a possible interpretation that there was a confusion of sadness in the second experiment. Despite the result of the second experiment, it appears that sadness is also a prosody-sensitive emotion. Third, and most important, we could not find any meaningful relationship between the prosody structure and the emotion of joy, even though we used real voice which had passed the manufacturer's perception test at the rate of 91.5%. This leads us to conclude that joy is not a prosody-sensitive emotion, which forces us to find other, effective approaches to express the emotion of joy through an emotional spoken language generation system.
4 Discussion
For the accurate understanding of each evaluation result, a quantitative comparison method that can also describe the influence of wrong answers is called for. For example, the perception rate related to anger in the first experiment is just equal to that of the second experiment, but for the same category it is very hard to figure out the influence of errors such as joy and sadness. For this kind of interpretation, including error analysis, we suggest a Euclidean distance based quantitative comparison method. Fig. 4 describes a Euclidean distance model of a tetrahedron designed for the analysis of four types of category.
Fig. 4. Euclidean distance model for tetrahedron
From this point of view, we can calculate and compare each distance for the results in Table 4, Table 5, and Table 6. When the size of n is 70, the maximum distance for each category is approximately 98.99, and the minimum distance is 0.

Table 7. Euclidean distance of Table 4

Intended \ Vertex   Anger   Joy     Neutral   Sadness
Anger               16.31   87.67   85.24     86.06
Joy                 73.38   84.05   69.46     34.41
Sadness             81.06   82.76   62.53     37.28
Table 8. Euclidean distance of Table 5

Intended \ Vertex   Anger   Joy     Neutral   Sadness
Anger               16.79   84.51   85.33     89.34
Joy                 60.32   63.70   63.70     55.48
Sadness             79.86   38.44   65.41     72.51
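The tetrahedron distances in Tables 7 and 8 can be reproduced from the perception counts, as the following Python sketch shows; the input row below uses the anger counts from Table 4 and reproduces the first row of Table 7.

    import math

    def distance_to_vertex(counts, target, n=70):
        # Euclidean distance from a response vector to the ideal vertex
        # where all n subjects chose `target` (cf. the tetrahedron model).
        ideal = {c: (n if c == target else 0) for c in counts}
        return math.sqrt(sum((counts[c] - ideal[c]) ** 2 for c in counts))

    anger_row = {"anger": 56, "joy": 3, "neutral": 6, "sadness": 5}  # Table 4
    for target in ("anger", "joy", "neutral", "sadness"):
        print(target, round(distance_to_vertex(anger_row, target), 2))
    # -> anger 16.31, joy 87.67, neutral 85.24, sadness 86.06 (Table 7, row 1)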
Considering both correct answers and errors, we conclude that synthesized anger based on the monotonous input speech is slightly closer to the position of anger than that based on the excited speech, even though they have the same perception rate. And for the synthesis of anger, the change of the initial input speech from monotonous to excited decreases the distance to joy by 3.16, but increases the distance to neutral by 0.09 and the distance to sadness by 3.28.
5 Conclusion
In this paper, we proposed an emotional prosody modification system, and evaluated the performance of the system in order to find a relationship between prosody structures and emotions. First, we proposed a prosody-unit-level emotional prosody modification system that produces distinct pitch contours, intensity contours, and speech durations according to three different emotional states: anger, joy, and sadness. During the evaluation process, anger and sadness were identified as prosody-sensitive emotions, whereas joy was not. Consequently, this difference led us to systematically discover the possibilities and limitations of prosody modification for the generation of emotional spoken language expressions. Further analyses of emotional speech data are necessary, taking into account various speakers, speaking environments, and speaking styles. More organized evaluation and interpretation strategies are also needed for further work.

Acknowledgments. This research was performed for the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs, funded by the Korea Ministry of Knowledge Economy.
References

1. Schröder, M.: Emotional Speech Synthesis: A Review. In: Eurospeech 2001, vol. 1, pp. 561–564 (2001)
2. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G.: Emotion Recognition in Human-Computer Interaction. IEEE Signal Processing Magazine 18(1), 32–80 (2001)
3. Lee, H.-J., Park, J.C.: Customized Message Generation and Speech Synthesis in Response to Characteristic Behavioral Patterns of Children. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4552, pp. 114–123. Springer, Heidelberg (2007)
4. SiTEC Emotional Speech Corpus, http://www.sitec.or.kr/English/index.asp
5. Jun, S.-A.: K-ToBI (Korean ToBI) Labeling Convention. Korean Journal of Speech Science 7 (2000)
6. Lee, H.-J., Park, J.C.: Lexical Disambiguation for Intonation Synthesis: A CCG Approach. In: Korean Society for Language and Information, pp. 103–118 (2005)
7. Lee, H.-J., Park, J.C.: Vowel Sound Disambiguation for Proper Intonation Synthesis. In: 19th Pacific Asia Conference on Language, Information and Computation, pp. 131–142 (2005)
8. Lee, H.-J., Park, J.C.: Characteristics of Spoken Discourse Markers and their Application to Speech Synthesis Systems. In: 19th Annual Conference on Human and Cognitive Language Technology, pp. 254–260 (2007)
9. PRAAT, http://www.praat.org
Multi-level Validation of the ISOmetrics Questionnaire Based on Qualitative and Quantitative Data Obtained from a Conventional Usability Test

Jan-Paul Leuteritz¹, Harald Widlroither¹, and Michael Klüh²
Abstract. Qualitative and quantitative data, collected during a usability evaluation of two innovative prototypes of a small-display touch-screen device, have been used to perform a multi-level assessment of the questionnaires used within the trial. The use of different validation methods is depicted and discussed concerning their advantages and disadvantages. The conclusions from the validation study are presented, revealing that using the ISOmetrics for testing uncommon prototypes may result in insufficient validity of the instrument.

Keywords: Validity, questionnaire, ISOmetrics, AttrakDiff, small display devices, shower control.
under repeating conditions; they work, for example, with the same user group or a similar test pattern, or they usually evaluate prototypes from a certain line of products. Hence, they could use data from their own tests to cross-validate their survey instruments and see what kind of information these yield. This solution is fine, as long as the cross-validation procedure does not consume too much effort. In order to find out whether such an approach could be recommendable, the Fraunhofer Institute of Industrial Engineering (Fraunhofer IAO) conducted the study described in this article. An evaluation project commissioned by the German shower technology manufacturer Hansgrohe AG served as the basis of the multi-level validation approach. A usability test design was developed that would not just answer the respective evaluation questions but would also provide data for multi-level validation procedures of the questionnaires used. Attention was paid to keeping the additional effort that served only the validation task as low as possible. This article presents the outline of the evaluation study and the detailed results of the multi-level validation approach. It aims at inviting other usability professionals to use and/or refine this method.
1.2 The Evaluation Project

The devices to be tested were two prototypes of a wall-mounted device for controlling the different functions of a modern comfort shower: hand showers, overhead-mounted shower plates offering various combinations of water rays, wall-mounted shower heads, steam-bath functions, coloured lighting, and a music player. The designs, including the interaction concept, had been created by Phoenix Design GmbH & Co. KG, Stuttgart. Prototype A (Fig. 1) was a touch-screen device that featured two additional buttons and a pusher-and-rotator switch. Prototype B (Fig. 2) had a smaller screen that did not respond to touch input. It was instead controlled by a number of buttons, including a
306
J.-P. Leuteritz, H. Widlroither, and M. Klüh
set of four arrow-buttons, an “OK”-button, a “menu”-button, a back-button in form of a u-turn-arrow. Prototype B also featured the pusher-rotator switch. The usability test was meant to identify the prototype with the better usability, which would then be finalised, while the other prototype would be discarded. Furthermore, the test had to provide information on how to improve the better prototype in the next design development phase.
2 Theoretical Background
2.1 Definition of Usability
The definition of usability on which this validation study is based was taken from ISO 9241-11. The main advantage of ISO 9241 is that it is an international standard and therefore widely accepted. Furthermore, other definitions of usability (such as Nielsen's, see Nielsen 1993) seemed less adequate for a validation study, as it was suspected that their subordinate constructs might not be independent factors and would hence increase the preparatory effort to be undertaken. ISO 9241-11 defines usability as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.”
2.2 Measuring Usability
According to ISO 9241, effectiveness and efficiency are best measured by so-called objective data, i.e., behavioural data such as error rates or the time needed to complete a task. Such data can be collected during a standardised experiment. The measurement of satisfaction is more difficult, because satisfaction is the user's subjective reaction to the interaction with the product (ISO 9241). Hassenzahl (2004) states that user satisfaction is an emotion resulting from the user comparing his expectations of the system to his actual experiences with it. Satisfaction can therefore only be measured by asking the user about his feelings towards the system. Based on this argumentation, it was assumed that
• the most valid measure or criterion for effectiveness of use would be the number of tasks people were not able to finish by themselves;
• the most valid measure or criterion for efficiency of use would be either the number of mistakes people made during the trial or the time they needed to complete all tasks;
• the most valid measure or criterion for the user's satisfaction with the interface would be either the result of a questionnaire, most probably a semantic differential, or a quantified item on their preference or choice of prototype after the test.
2.3 Selection and Purpose of the Questionnaires
After collecting information about the available psychometric instruments, it was decided to use two questionnaires within the study:
1. The ISOmetrics (Gediga & Hamborg, 1999), which is intended to measure usability using the set of seven dimensions for the design of dialogue systems defined in ISO 9241-10. It is a five-point Likert-scale questionnaire. As the experiment focused on the dialogues of the shower system, the ISOmetrics seemed adequate, and as it is based on the ISO standard, it was expected to fit well into the theoretical approach chosen. Given the definitions of criteria above, the ISOmetrics was in this study not the main source of usability measures but rather an additional instrument whose validity was to be examined. It was planned to compare the questionnaire results with the criteria for effectiveness and efficiency and with the qualitative data collected during the test.
2. The second questionnaire, the AttrakDiff (Hassenzahl et al., 2003), is a seven-point semantic differential questionnaire intended to measure the attractiveness of a system to a user. Although Hassenzahl et al. (2003) do not directly state that the AttrakDiff measures satisfaction, the construct of attractiveness seems to reflect quite well the whole range of expectations a user can have. Hence, this was the instrument selected for the measurement of satisfaction. Validating the AttrakDiff in this context was more difficult, because there is hardly a better criterion for users' emotions towards a technical system than their responses to an emotion-focused questionnaire. The only other criterion is the subsequent behaviour towards the system after the test – the motivation to carry on interacting with the system. This is reflected in a quantitative preference judgement, which was therefore selected as the criterion for the AttrakDiff.
3 Method
3.1 Sample
22 users (12 women, 10 men) participated in the study, each providing both quantitative and qualitative data for the validation project. Their mean age was 39.1 years (SD = 14.5 years). The sample consisted of 10 potential customers, 4 elderly users (60+, selected for their lack of experience with information technology) and 8 additional users from the Fraunhofer Institute.
3.2 Experimental Setting
The prototypes were simulated on a touch-screen monitor mounted in the wall of a trade-fair mock-up of a shower cabin. The test was done without water pouring from the showers, and the users wore normal clothing. A video of the shower's functions was therefore shown at the beginning of the test. Each participant tested both prototypes; the sequence was matched according to person characteristics. Each prototype test consisted of a set of tasks the participants had to complete and a questionnaire given after completion of the task set. The experiment ended with final questions asking for a comparison between the tested devices. It was ensured that every participant completed all tasks. Whenever a participant was unable to complete a task by himself, the experimenter provided the information for the next step and placed a marker in the log file, indicating that help had been given. If the participant was able to continue by himself after receiving a hint, no further advice was given. Otherwise, all assistance needed to complete the task was rendered. Participants were instructed to complete each task as fast as possible, without thinking aloud or giving comments; this was to guarantee the reliability of the time measures. The test was conducted in German, including instructions and questionnaires. Each test lasted between 90 and 120 minutes. All tests were conducted by the same instructor, using written instructions. The first trials were supervised.
3.3 Variables Collected
For each participant's interaction with each prototype, the number of tasks was counted that he/she could not complete without the help of the test instructor (number of hints). For every task of each participant, the number of errors1 they committed was counted and the time to complete the task was measured, using an automatic logging technique. The questionnaire given to the participants after each of their two trials contained:
1. The ISOmetrics in a shortened version. Items that did not apply to shower controls had been deleted. The subscale “suitability for individualization” had been removed entirely, as none of its items fit. This shortened version is referred to as ISOmetricsSDD (ISOmetrics for small display devices) in the text below.
2. The AttrakDiff in its full version.
3. Additional items, including
• one item to determine which of the two prototypes the user would prefer in the end, and
• one item asking to quantify the superiority of the preferred prototype on a five-point scale.
Qualitative data were taken from the participants' statements and comments during and after each task. All test sessions were videotaped in order to allow a thorough analysis of all the statements the users gave and all their actions, including errors that did not appear in the log files (e.g., touching the screen of prototype B).
4 The Validation Procedure and Its Results
4.1 Reliability
As the instruments were not new but commonly used ones, no attention was paid to the factorial structure of the answers. The reliability of the results was calculated mainly to exclude a reliability problem that would render all validation attempts useless. Cronbach's α was chosen because a correct factorial structure of the instruments had been assumed.
1 “Errors” were all intended button pushes that did not contribute to the solution of the task. Due to the specifications of the log file, special exceptions were phrased to exclude, for example, unnecessary rotating of the pusher-and-rotator switch from the error count.
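For reference, Cronbach's α for a scale of k items follows from the item variances σi² and the variance σX² of the summed scale score:

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right) \]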
Table 1. Reliability Estimation of the ISOmetricsSDD subscales, using Cronbach's α

Scale | No. of items | Prototype A | Prototype B
Suitability for the task | 7 | .70 | .90
Self-descriptiveness | 4 | .75 | .87
Controllability | 4 | .68 | .81
Conformity with user expectations | 5 | .76 | .78
Error tolerance | 3 | .48 | .74
Suitability for learning | 4 | .79 | .90
4.2 Content Validity
A survey of three usability experts from Fraunhofer IAO, conducted before the usability evaluation of the shower prototypes, did not yield any majority vote calling for the deletion of a specific item from, or the addition of a specific item or aspect to, the ISOmetricsSDD. The lowest mean estimation of a subscale's validity was 82% (see Table 2). Additionally, it has to be noted that the interviewed specialists did not know the shower control prototypes and hence demanded the inclusion of items that would generally be useful but had no application in this study.
Table 2. Consolidated ratings of the content validity of the ISOmetricsSDD
Scale | No. of evaluators requesting a change | Mean estimation of validity | Number of items to eliminate | Number of aspects missing
Suitability for the task | 1 | 83% | 0 | 2
Self-descriptiveness | 2 | 82% | 0 | 1
Controllability | 1 | 90% | 2 | 2
Conformity with user expectations | 0 | 88% | 0 | 1
Error tolerance | 0 | 95% | 0 | 0
Suitability for learning | 2 | 85% | 1 | 2
4.3 Criterion-Based Validity
Extreme-group validation
The ISOmetricsSDD questionnaire was clearly able to identify the “better” prototype, preferred by 20 of 22 participants. Prototype A yielded a significantly higher sum score (4.12, SD = 0.50) than Prototype B (3.29, SD = 0.32); t(21) = 5.90, p < .001. Hence, using the ISOmetricsSDD would have led to the correct decision about which prototype to discard.
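Since each participant rated both prototypes, this is a paired comparison. A minimal sketch of the test in Python follows; the score vectors are illustrative placeholders, not the study's data:

import numpy as np
from scipy import stats

# ISOmetricsSDD sum scores per participant (N = 22), one value per
# prototype; the numbers here are placeholders, not the study data.
scores_a = np.array([4.1, 4.5, 3.9, 4.3, 4.0, 4.6, 3.8, 4.2, 4.4, 4.1, 3.7,
                     4.5, 4.0, 4.3, 4.2, 3.9, 4.6, 4.1, 4.0, 4.4, 3.8, 4.2])
scores_b = np.array([3.3, 3.1, 3.5, 3.2, 3.4, 3.0, 3.6, 3.3, 3.2, 3.5, 3.1,
                     3.4, 3.3, 3.0, 3.6, 3.2, 3.4, 3.3, 3.1, 3.5, 3.2, 3.4])

# Paired (dependent) samples: each participant contributes one score
# per prototype, giving df = N - 1 = 21, as in the reported t(21).
t, p = stats.ttest_rel(scores_a, scores_b)
print(f"t(21) = {t:.2f}, p = {p:.4f}")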
Correlation of problem counts and subscale means
Another validation method applied here repeated a procedure that had already been used in a study reporting satisfactory validity of the ISOmetrics questionnaire (Ollermann, 2004). A category system was created for all the usability problems encountered. Sources were the statements of the participants, the notes of the test instructor, and the log files. For each problem category, the number of occurrences was counted. Each problem category (40 for prototype A and 31 for prototype B) was then assigned to one ISOmetricsSDD subscale. Afterwards, for each subscale the numbers of appearances of each assigned problem category were summed. This way, the whole sum of all usability problems encountered was split between the questionnaire subscales. Finally, the Pearson correlation between the number of problems and the mean score of the subscale was calculated for both prototypes (a computational sketch of this step is given at the end of this subsection). The application of Ollermann's method yielded less promising results: for prototype A the correlation between the usability problems encountered and the arithmetic mean scores of the ISOmetricsSDD subscales was r = -0.259 (N = 6, p = .310). For prototype B this correlation was r = 0.020 (N = 6, p = .485).2
2 N in this case is not the number of participants but the number of subscales used.
Correlation of ISOmetricsSDD and metric criteria
The Pearson correlation between the score differences of the ISOmetricsSDD and the differences in errors committed was statistically not significant, with r = -.11 (N = 22, p = .66). The Pearson correlation between the score differences of the ISOmetricsSDD and the differences in the time needed to complete all tasks was statistically not significant, with r = -.29 (N = 22, p = .19). The Pearson correlation between the (A-B) difference in the number of hints (the number of tasks that could only be completed with the instructor's help) and the score differences of the ISOmetricsSDD was r = .386 (N = 22, p = .076).
Correlation of AttrakDiff and the preference item
A single item was intended to provide a criterion for the validity of the AttrakDiff questionnaire. The item asked the participant to describe the degree of superiority of the better prototype over the weaker one using a five-point Likert scale. The score was Pearson-correlated with the difference of the AttrakDiff sum scores (not-preferred prototype minus preferred prototype). The result was r = -.44, statistically significant with p = .04 (N = 22), which, due to the value coding, shows that those participants who perceived their favourite to be superior to a great extent also yielded a higher difference in the AttrakDiff sum scores, pointing in the same direction.
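The subscale-level step in Ollermann's procedure reduces to a small computation. A sketch in Python follows; the problem counts and subscale means are hypothetical placeholders, not the study's data:

from scipy import stats

# Hypothetical problem counts mapped to the six ISOmetricsSDD
# subscales and the corresponding mean subscale scores; replace
# with the real category counts and questionnaire means.
problem_counts = [11, 6, 9, 5, 4, 5]
subscale_means = [4.1, 4.3, 3.9, 4.2, 4.0, 4.4]

# If the questionnaire were valid, r should be clearly negative:
# more problems on a subscale should go with lower mean scores.
r, p = stats.pearsonr(problem_counts, subscale_means)
print(f"r = {r:.3f} (N = {len(problem_counts)}), p = {p:.3f}")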
5 Discussion of the Results
The results of the survey among usability experts show that there are no severe problems concerning the content of the ISOmetricsSDD items. They apparently represent quite well the ISO definition of the different constructs describing the usability of dialogue systems. However, the correlation between the numbers of problems assigned to each subscale and the mean scores of the subscales does not support these validity assumptions. Correlations based on N = 6 should not be over-interpreted, and significance cannot be expected in every case. However, looking at the whole correlation matrix, one finds that the Pearson correlation between the ISOmetricsSDD scores of prototype A and prototype B was r = 0.416 (N = 6, p = .206) and that the problem counts of A and B correlated at r = 0.627 (N = 6, p = .091). This indicates that the data are not totally random: there are coherences between the ISOmetrics scores and between the problems found for the two prototypes. So the question is: why do the sum scores of the subscales not correlate with the problem counts, and why do the overall sum scores not correlate with the most objective measures of usability – user mistakes and time to complete? There is simply no match between the usability problems and the questionnaire results. In Ollermann's study, the first correlation coefficient found was r = 0.277. As one subscale seemed to be responsible for this low result, it was eliminated, causing the correlation to jump to r = 0.756 (p = .019) (Ollermann, 2004). This procedure did not seem acceptable in the present study, because for the two prototypes different subscales depressed the correlation. Even more disappointing were the correlations of the ISOmetricsSDD scores with the number of errors and with the time to complete.
The AttrakDiff questionnaire yielded promising results. Given that it was validated using just one item, resulting in a possibly low reliability of the criterion, a correlation of r = -.44 can be considered sufficiently high to indicate that the results of the questionnaire do more or less reflect the constructs named in the respective theory (see Hassenzahl et al., 2003).
As a consequence of these findings, it was assumed that the ISOmetricsSDD instrument had in this case not been measuring the system's usability. What did it measure instead? It was presumed that the ISOmetricsSDD had failed because it tried to make usability experts out of the users. Even for the authors of the study, assigning the encountered usability problems to the questionnaire's subscales was a difficult task. Expecting a user to remember all the problems he encountered and to correctly map them to questionnaire items seems impossible, especially if the user is asked to do so after testing an unknown system for 90 minutes. Most probably, the test participants rather rely on their general perception of the system, on the emotional substrate of their recent experiences. Two findings support this presumption:
1. The mean scores of the different subscales were quite similar. For prototype A the standard deviation of the subscale means is SD = 0.15, for prototype B it is SD = 0.32, which seems small for a five-point Likert scale. Ives, Olson and Baroudi (1983, as cited in Hartson et al., 2000) report that participants tend to fill in satisfaction questionnaires quite homogeneously. This might also apply to questionnaires like the ISOmetricsSDD.
2. The correlation between the differences of the AttrakDiff scores (A-B) and the differences of the ISOmetricsSDD scores (A-B) was r = 0.81 (N = 22, p < .001). This suggests that the ISOmetricsSDD primarily measured the emotional value that the participants assigned to the system, closely linked to what is called “satisfaction”.
6 Conclusions
6.1 Concerning the Findings of the Study
When confronted with a system for the first time, users are probably unable to remember the usability problems they encountered and to cluster them correctly, producing a valid score on all the subscales of a questionnaire like the ISOmetrics. Participants rather seem to use the instrument to convey their overall satisfaction with the system to the test instructor. Therefore, the use of questionnaires focusing on different categories of usability problems is not advisable in certain test designs. According to the findings of this study, questionnaires like SUMI, QUIS and ISOmetrics need to be used carefully.
6.2 Concerning Multi-level Validations
The aim of this article and the work presented here is to encourage usability experts to evaluate their measurement instruments with a method similar to this multi-level approach. This approach of course has a downside, which is the small number of participants. In the case described above, only three usability experts were interviewed, only two prototypes were used, only 22 participants went through the evaluation process, and only six subscales of the ISOmetrics were taken into account. Furthermore, aspects like the assignment of the encountered usability problems to certain scales can always be questioned. Finally, it could be argued that the changes made to the questionnaire (e.g., the deletion of items) had a detrimental effect on the validity of the whole instrument. The results of such a study may hence seem less apt for publication than the results of large validation studies carried out with hundreds of participants. The advantage of this method is that, without incurring unreasonable costs in money or time, it combines different forms of validation and collects information that is usually simply lost. Ultimately, the question is whether a usability practitioner's primary interest is to win a scientific argument and publish results, or just to get a hint on whether a certain tool is recommendable for the planned task. In the latter case, the common perception of usability evaluation itself would also apply to the evaluation of the assessment tools: little and possibly unreliable information is better than none (see Nielsen, 1993). So if it is true that not just the validity of a questionnaire in the strict sense but, more generally, the value gained from its results depends on the product tested, the users, and other context parameters, then the method promoted here becomes recommendable.
References
1. ISO 9241, Ergonomics of human-system interaction. International Organization for Standardisation (1998)
2. Gediga, G., Hamborg, K.-C.: IsoMetrics: Ein Verfahren zur Evaluation von Software nach ISO 9241/10. In: Holling, H., Gediga, G. (eds.) Evaluationsforschung, pp. 195–234. Hogrefe, Göttingen (1999)
3. Hamborg, K.-C.: Gestaltungsunterstützende Evaluation von Software: Zur Effektivität und Effizienz des IsoMetricsL Verfahrens. In: Herczeg, M., Prinz, W., Oberquelle, H. (eds.) Mensch & Computer 2002, pp. 303–312. B.G. Teubner, Stuttgart (2002)
4. Hartson, H.R., Andre, T.S., Williges, R.C.: Criteria for Evaluating Usability Evaluation Methods. International Journal of Human-Computer Interaction 13(4), 343–349 (2001)
5. Hassenzahl, M., Burmester, M., Koller, F.: AttrakDiff: Ein Fragebogen zur Messung wahrgenommener hedonischer und pragmatischer Qualität. In: Ziegler, J., Szwillus, G. (eds.) Mensch & Computer 2003, pp. 187–196. B.G. Teubner, Stuttgart (2003)
6. Hassenzahl, M.: Interaktive Produkte wahrnehmen, erleben, bewerten und gestalten. In: Thissen, F., Stephan, P.F. (eds.) Knowledge Media Design – Grundlagen und Perspektiven einer neuen Gestaltungsdisziplin. Oldenbourg Verlag, München (2004)
7. Nielsen, J.: Usability Engineering. Morgan Kaufmann, San Francisco (1993)
8. Ollermann, F.: Verhaltensbasierte Validierung von Usability-Fragebögen. In: Keil-Slawik, R., Selke, H., Szwillus, G. (eds.) Mensch & Computer 2004: Allgemeine Interaktion, pp. 55–64. Oldenbourg Verlag, München (2004)
What Do Users Really Do? Experience Sampling in the 21st Century
Gavin S. Lew
User Centric, Inc.
2 Trans Am Plaza Dr, Ste 100, Oakbrook Terrace, IL 60181, USA
glew@usercentric.com
Abstract. As practitioners we spend a great deal of effort designing and testing products within the confines of usability testing labs when we know that a rich user experience lies outside. What is needed is more research in “the wild” where people use the very interfaces we take so much time to design, test, iterate, and develop. Through innovative advancements in mobile technology, we can expand upon the tried and true “experience sampling” research techniques, such as diary or pager studies, to effectively solicit, monitor and receive data on users’ interactions at given points in time. This paper describes various research methodologies and recent advancements in mobile technology that can provide practitioners with improved research techniques to better assess the user experience of a product. The conference presentation will also include results from a pilot experience sampling method study focused on collecting data on usage and satisfaction of a product. Keywords: Experience sampling, in-situ research, mobile device research, pager study, diary study, mobile research, SMS studies.
2 Common Research Techniques
There are a number of common research techniques employed to understand the user experience of a product. These methods range in difficulty from easy to challenging, but each provides insight into different aspects of the user experience.
2.1 Usability Testing
Usability testing with users is a critical component of any user-centered design process. Traditional usability testing involves task-based research in the lab where designs can be tested, iterated and validated. Within the confines of this controlled environment, this methodology is ideally suited to assess usability in a highly tactical and specific manner. Outcomes include answers to specific design questions. Usability testing is critical to product success because we must ensure that the core features are usable. However, the focus of usability testing on tasks is also a limitation, because the lens tends to target the “walk-up-and-use” user experience of the product. Session time is often limited, and the user experience typically does not involve a user interacting with a device that he/she actually owns. As practitioners and designers, we accept the lack of external validity because of the benefits of usability testing to formative and iterative design. We apply the insights uncovered in the lab to the design and hope that they generalize to how the product is actually used in the real world. However, we understand that the usability, usage, and usefulness of a product are determined over time and not necessarily in the first hour of use in the lab setting.
2.2 Surveys and Focus Groups
Often the data provided to describe the “real world” user experience are obtained through survey or focus group methodologies. While these research methods are quite useful for early-stage questions of feature importance, pricing, or intent to purchase, using this information for design is challenging. Results tend to be at a high level, and we often need more tactical direction to meaningfully influence some of our design decisions. Even when these methods are directed toward answering design questions, the obtained data are largely retrospective in nature. We know that asking users to reflect on tasks done in the past is not as robust or credible as asking the same question during or immediately following the completion of the task. Satisfaction metrics can be obtained in surveys, but they would be much more useful when captured as close to the actual usage instance as possible (e.g., gathering satisfaction data after completing a task rather than asking in a focus group or survey months after the experience occurred). The benefit of a short latency between the action and the satisfaction request is more than simply measurement integrity: specific feature and functionality questions can be asked immediately after use to acquire more insightful and relevant feedback with direct impact on design.
2.3 Ethnographic Research
One method that avoids retrospection, and any associated confabulation due to the long latency between action and question, is ethnography. It involves observing user behaviors in a natural environment. However, there are obvious challenges that prevent its widespread use as a research technique. The setup and logistics necessary to observe natural behaviors are difficult (consider, for example, trying to observe mobile devices, where screens are small and interactions are very rapid). Fieldwork and analysis can be time-consuming. Sample sizes are often small. And most importantly, the likelihood that the output of the study will be actionable is low relative to more direct and tactical techniques such as usability testing. Because ethnography is best suited to uncovering insight that drives ideation rather than answering direct design questions, securing authorization and budget to conduct ethnographic research can be difficult. What cannot be refuted, however, is that ethnographic research collects data in the environment where interactions occur and with products used by the users.
2.4 Longitudinal Research
Longitudinal research captures data from users over time. With its foundations in developmental psychology, this methodology has been largely observational in nature, using correlational analysis to assess phenomena. However, the longitudinal approach has applicability to user experience research. While usability testing can be seen as tapping the user experience just once, a study could be extended to make multiple, repeated assessments of the same set of users over time. The study could have users perform tasks and provide feedback; thus, learning can be an area of interest. Moreover, the methodology can assess how the user adapts to and uses the product during critical periods of its lifecycle. Longitudinal research is compelling, as it often involves fieldwork in a naturalistic environment with the benefit of a more structured data collection technique. Questions, tasks, and observations can also be very design-focused and tactical. Moreover, it fills the post-walk-up-and-use gap left open by a usability testing methodology. In short, longitudinal research offers access to the daily user experience of a product. Consider a mobile phone. Usability testing can assess the usability of core functions, such as the ability to add a contact, or determine whether or not there is sufficient affordance to use a specific keypad button to complete tasks. The problem is that when usability issues are uncovered, it is impossible to know whether a feature that was difficult in usability testing can be learned and become second nature over time, or will be left unused because users could not learn it. Information about how users interact with products over time is thus extremely valuable. Longitudinal methods can provide information about a product in the hands of users. Because assessments are made over time, the technique can capture how the user learns to use the product. Given the potential of longitudinal research, why is it NOT widely used? At the 2007 CHI (ACM-SIGCHI) conference in San Jose, a new special interest group (SIG) on longitudinal research was formed. Most interestingly, only 25% of the attendees of this SIG had actually conducted a longitudinal research study in the last couple of years. Possible reasons why longitudinal research is rare include:
A. Long timelines: The business challenge of a research project where data collection is stretched over time makes longitudinal research compete with “just in time” or “we need the data last week” research alternatives.
B. Cost: Building a user panel where users are tapped for an extended period has a high cost and high panel attrition. Since timelines can extend across multiple product releases, with benefits to different business groups, it is unclear which group should be charged for the study. Securing funding is inherently more difficult.
C. Complex logistics: Study design and execution have a high initial setup cost, because every aspect of the study must be coordinated. Any repeated measures technique requires allocating resources to manage the study activities for an extended period of time.
D. High effort: Data collection requires high effort from both researchers and users, who must participate across multiple data sessions. Alternatively, data come in the form of written diaries, where the coding process is non-trivial.
E. Difficult analysis: Analyzing the large amount of data collected can be time consuming, as data are essentially multiplied by the number of repeated measures.
2.5 Need for an Alternative Method
If usability testing captures walk-up-and-use usability, ethnographic research gets us into the field, and longitudinal research can reveal how users learn, what still seems to be lacking is usage and motivation. Consider the mobile phone example again. Manufacturers and mobile service providers know that a call was made and how long it lasted. What is unknown, however, is whether the user called “John” from their contacts or dialed the number directly. In terms of designing features, researchers and designers are blind as to whether the user ever entered John into their contact list or what motivates the user to even use the feature. All too often, once launched, the product becomes a mysterious “black box,” and we do not know how users use the product or feature that took so much effort to design.
3 Experience Sampling Method
Experience sampling method (ESM) refers to in-situ (Latin for “in place”) research, where the phenomenon is examined in the place where it occurs. The methodology was developed in 1977 at the University of Chicago by Csikszentmihalyi, Larson and Prescott [1] to understand the experience of adolescent activities, but its applicability to other areas of user experience is clear. ESM is more commonly referred to as a “pager study,” where users are asked to provide information via a diary. Users are prompted to enter information by a “page” sent to a device (e.g., “What are you doing now?”), and participants enter data into a paper diary. Prompting can be controlled by a researcher or scheduled at specific intervals. The data can be analyzed to understand user activity, motivation, and other cognitive and social dimensions. This methodology can be used to assess how users use products.
3.1 ESM Coupled with Advanced Mobile Technologies
It would be great if the product could tell us how it is being used, but that is not necessarily practical, nor does it provide the rich user experience as interpreted and provided by users. Imagine if a technology could retain the tactical and rigorous elements of “in-lab” research while capturing the richness and environmental cues associated with more natural settings. What if the satisfaction data were not retrospective, but closely tied to user behavior and actions? Through innovative advancements in mobile technology, researchers can now expand upon longitudinal and experience sampling research techniques to effectively solicit, monitor and receive data on users' interactions at given points in time. These advancements tap directly into both the application and the operating system to provide the building blocks to take user experience research to new levels.
3.2 Using Mobile Technology to Capture Data
Mobile device technology has advanced to a level where research can be more complex than simply paging users to ask them to write passages in a diary. The mobile device itself can be the conduit between the user and the researcher. Imagine what research areas would open up if practitioners could conduct studies on a robust platform that prompts the user, collects data both from the user and from the device itself, and handles logistics (e.g., compensation). Moreover, what if the device is the participant's own personal mobile device? With full QWERTY keyboards on mobile phones, one can readily imagine feedback in the form of free-form text responses. Considering the abilities of the youth of today, who can type 40 words per minute on a 12-key numeric keypad, the tremendous data collection benefit of a phone over diary input is easy to envision. In addition, the device can be leveraged as a powerful remote data collection tool, where the areas under investigation can be anywhere a user could go with a mobile device at their side. This opens up novel forms of research never before possible without specialized equipment designed specifically for the study. Using mobile devices, user input and feedback extend beyond making a simple selection or answering a series of questions. Users could speak their response and have it recorded. They could also respond by taking a picture or recording a video of their experience. The remote capabilities of a mobile device as a research tool create a wealth of research opportunities. LEOtrace Mobile™ is a mobile technology that uses ESM to obtain data [2]. It runs on Windows Mobile 6, Symbian Series 60, and RIM BlackBerry devices. The user input and device information that can be collected are shown in Table 1.
Table 1. Types of data that can be collected from ESM using LEOtrace Mobile™
User-provided data:
A. Open-ended feedback
B. Scaled feedback (binary, Likert-scale, slider ratings)
C. Image selection
D. Voice recording
E. Camera image
F. Video clip
Device-provided data:
A. Task completion (success/fail)
B. Event (app start/end, SMS sent, picture taken, etc.)
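As a sketch of how such records might be modeled on the client side, the following Python data class mirrors the two categories in Table 1; all class and field names are our own illustration, not LEOtrace Mobile's actual schema:

from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class Source(Enum):
    USER = "user"      # feedback entered by the participant
    DEVICE = "device"  # events logged by the handset itself

@dataclass
class ESMRecord:
    participant_id: str
    timestamp: datetime
    source: Source
    prompt: Optional[str] = None         # question shown, if any
    text_response: Optional[str] = None  # open-ended feedback
    rating: Optional[int] = None         # scaled feedback value
    media_path: Optional[str] = None     # voice, image, or video file
    event: Optional[str] = None          # e.g., "sms_sent", "app_start"
    task_success: Optional[bool] = None  # task completion outcome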
3.3 Event or Behavior Triggers
Research using new mobile technologies can be further enhanced by analyzing user behaviors and feature usage to trigger prompts for user feedback. In this case, the user's own actions are of interest, and the behavior itself prompts the device to ask specific questions about the behavior captured. This differs from contrived tasks set up by a researcher for the user to complete. Algorithms can be designed to watch for specific situations that trigger research questions, so feedback can be obtained very close to when the behavior happened (a sketch follows at the end of this section).
3.4 Other Mobile Technologies
This paper describes various research methodologies and recent advancements in mobile technology that can provide practitioners with improved research techniques to better assess the user experience of a product. Besides LEOtrace Mobile™, several other technologies are available – from those that sit on old Palm Pilots to those that run on the latest mobile devices, and from techniques involving simple SMS text messages asking for feedback to web surveys solicited via phone-based email or messaging – that can be used to solicit data from users. As practitioners, the potential of remotely capturing user interactions in an ecologically valid manner, while extending beyond walk-up-and-use usability, is compelling. Experience sampling techniques can further our design practice by yielding more insight into user motivation, usage, and learning. The implications for future research are vast, given the capability to more efficiently and remotely monitor user behavior and perception “as it happens.”
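A minimal sketch in Python of the event-triggered prompting described in section 3.3 is given below; the event names, trigger rules, and function signatures are purely illustrative assumptions and do not reflect LEOtrace Mobile's actual API:

import time
from typing import Callable, Iterable

# Map device events of interest to the question they should trigger.
TRIGGERS = {
    "sms_sent": "You just sent a text message. How easy was that?",
    "picture_taken": "You just took a picture. What prompted it?",
    "call_ended": "Did you dial directly or pick the number from your contacts?",
}

def sample_on_events(events: Iterable[dict],
                     prompt_user: Callable[[str], str]) -> list:
    """Watch a stream of device events and prompt on each trigger."""
    responses = []
    for event in events:
        question = TRIGGERS.get(event.get("name"))
        if question is None:
            continue  # not a behavior of interest
        # Ask immediately, so the answer is tied to the behavior
        # rather than being retrospective.
        answer = prompt_user(question)
        responses.append({"event": event, "question": question,
                          "answer": answer, "asked_at": time.time()})
    return responses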
4 ESM Study Findings
The conference presentation will include findings from an ESM study. Device usage and satisfaction data will be presented from a four-week study with a participant sample size of 100. Participants will use mobile devices they presently own. Software will be loaded on the devices to passively monitor usage. Users will also be asked to perform specific tasks. Success and failure will be reported, with user feedback on their experience and satisfaction using device features.
Acknowledgments. Many thanks to the development team at Nurago (www.nurago.com) for developing the LEOtrace Mobile™ software used in this research. The user experience research teams at both User Centric, Inc. (www.usercentric.com) and SirValUse Consulting GmbH (www.sirvaluse.com) deserve credit, as their insight was essential to the research approach and execution of this study.
References
1. Csikszentmihalyi, M., Larson, R., Prescott, S.: The ecology of adolescent activity and experience. Journal of Youth and Adolescence 6, 281–294 (1977)
2. Lew, G.S.: The truth is out there: Using mobile technology for experience sampling. User Experience 7(3), 8–10 (2008)
Evaluating Usability-Supporting Architecture Patterns: Reactions from Usability Professionals
Edgardo Luzcando, Davide Bolchini, and Anthony Faiola
Indiana University Purdue University Indianapolis, School of Informatics - HCI
{eluzcand,dbolchin,afaiola}@iupui.edu
Abstract. Usability professionals and software engineers approach software design differently, which creates a communication gap that hinders effective usability design discussions. An online survey was conducted to evaluate how usability professionals react to Usability-Supporting Architecture Patterns (USAPs) as a potential way to bridge this gap. Members of the Usability Professionals Association (UPA) participated in a pretest-posttest control group design experiment in which they answered questions about USAPs and software design. Results suggest that participants perceived USAPs as useful to account for usability in software architectures, recognizing the importance of the USAPs' stated usability benefits. Additionally, results showed a difference in perception of the USAPs' stated usability benefits between US and European participants. A better understanding of what the usability community thinks about USAPs can lead to their improvement as well as increased adoption by software engineers, which can lead to better integration of usability and HCI principles into software design. Keywords: Architecture Patterns, HCI, Usability, Usability Professionals, Software Design, USAP.
“covering up ill-suited infrastructure features with interface veneer, but there are limits to how far this can take us.” [6] He argues that infrastructure and interaction features need to be designed jointly, not ad hoc. To address this challenge, Usability-Supporting Architectural Patterns (USAPs) have recently been proposed as a strategy to systematically embed usability requirements in the early design of software architectures [5]. USAPs are a blend of Human-Computer Interaction (HCI) and Software Engineering (SE) principles that provide a framework for designing recurrent software and user requirements (e.g., providing the user a way to undo or cancel operations; see the sketch at the end of this section). USAPs are enriched with indications of how these requirements may impact the components of the system architecture, and with examples of how to deal with them at this level. The foreseen benefits of leveraging USAPs in software design are many, including: (a) the opportunity for software engineers to consider and take into account the needs of the user experience when making strategic architectural decisions; (b) a shared language for usability professionals and software engineers to discuss design decisions in the light of both system and user requirements; and (c) reusable solutions (patterns) that capitalize on previous design expertise. Initial studies suggest that USAPs are effective when applied by software engineers [7]. However, little is known about the understanding and acceptance of USAPs by usability professionals [8, 9]. Acknowledging the proposal of USAPs as an important step towards bridging the communication gap between software engineers and usability professionals, we conducted a study aimed at assessing the perceived value of USAPs among the community of usability professionals. There is a risk, in fact, that the original value of USAPs (improving mutual understanding) may be weakened amongst usability professionals by the way USAPs are proposed and described: still using concepts, terminology and notation familiar only to software engineers. The study consisted of a focused online survey administered to usability professionals and was based on the following multi-part hypothesis:
H.1 - Usability professionals can perceive Usability-Supporting Architecture Patterns as relevant in their everyday work.
H.2 - Usability professionals consider the usability benefits of Usability-Supporting Architecture Patterns important for their everyday work.
H.3 - If Usability-Supporting Architecture Patterns are communicated in more natural HCI terminology to usability professionals, they can better appreciate the value of Usability-Supporting Architecture Patterns in their everyday work.
The remainder of the paper is organized as follows. Section 2 describes the methods and instrument used to conduct the experiment. Section 3 presents the qualitative and quantitative results. Section 4 covers the discussion of the findings, and Section 5 summarizes the paper with concluding statements.
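As a purely illustrative sketch (not taken from the USAP specification itself), the following Python fragment shows the kind of architectural responsibility a "cancel a command" requirement makes explicit: long-running work must be interruptible from the UI and must restore the prior state.

from threading import Event

class CancellableCommand:
    def __init__(self):
        self._cancel = Event()

    def run(self):
        for step in range(100):
            if self._cancel.is_set():
                self.rollback(step)   # restore the pre-command state
                return
            self.do_step(step)        # one unit of interruptible work

    def do_step(self, step):
        pass  # application-specific work

    def rollback(self, completed_steps):
        pass  # application-specific undo of completed work

    def cancel(self):
        # Callable from the UI thread, so the user always has a
        # responsive way out - the usability requirement at stake.
        self._cancel.set()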
2 Methods and Instruments
2.1 Participants
This study surveyed a convenience sample of usability professionals from the Indianapolis Usability Professionals Association (UPA) and the Swiss UPA. The sample included approximately 80 participants who have academic training in HCI, HCI professional experience, or both. The study did not differentiate between professionals and students, but it was expected that most participants would have some degree of professional experience in HCI or related fields, given their involvement with the UPA.
2.2 Survey Design
The study is based on a mixed-methods research design to analyze an area where little research has been conducted, following a Concurrent Triangulation Strategy [10], with the quantitative data given higher priority during the analysis. The quantitative portion of the experiment used a Pretest-Posttest Control Group Design [11] with a classic between-subjects design in which participants are randomly assigned to one of two groups during the data collection phase. Participants in the experiment group received a treatment in the form of specific USAP materials consisting of a software design scenario and a USAP example. Participants in the control group did not receive the treatment. A questionnaire format was used for the pretest as well as the posttest, including both quantitative and qualitative questions. Demographic information was solicited after the questionnaire, in addition to the opportunity to provide additional comments. The survey questions were created leveraging survey design techniques from Dillman [12], and several questions were constructed based on previous questions from Schuman and Presser [13] used to survey attitudes. The online survey was built from scratch with a combination of PHP and MySQL technologies available at IUPUI. All data were collected and stored on university infrastructure.
2.3 Procedure
The survey introduction provided a brief history of the desire to improve usability in software products. Participants were then given pretest questions to record their existing knowledge and experience. Following the pretest, the treatment introduced USAPs (to the experiment group) and explained how leveraging USAPs could facilitate the communication between usability professionals and software engineers. The treatment provided a software design scenario (canceling a command) describing the communication challenges regarding usability in software design and presented a USAP example. During the posttest, participants were asked to rate the importance of USAP usability benefits from an HCI perspective using a Likert scale. This was done with the nine original USAP usability benefits as well as with a newly worded set of usability benefits, meant to find out whether different terminology would improve acceptance. Although all nine USAP usability benefits were rated, the study focused on two USAP usability benefits, Accelerates error-free portion and Reduces the impact of slips, as shown in Table 1.
Table 1. USAP Usability Benefit Comparison
Original Wording | New Wording
Accelerates error-free portion | Increases efficiency
Reduces the impact of slips | Reduces the impact of errors
An initial pilot study and conversations with HCI peers had suggested that these two benefits used terminology that was confusing to a usability professional. Additional posttest questions explored further perceptions about USAPs and software design, asking participants to state their opinions about USAPs and their potential applications in practice. The survey was designed to flow as one continuous questionnaire, with participants unaware of the distinction between the pretest questions and the posttest questions.
3 Results
From the convenience sample of 80 usability professionals, 67 participants began the survey, 49 completed the pretest, and 45 completed the posttest. Of the 45 participants who completed both the pretest and the posttest, only the results of 35 participants were complete; these are summarized in this section. There were 17 participants in the experiment group and 18 in the control group; 15 were from the Swiss UPA (Region 1) and 20 from the Indianapolis UPA (Region 2). Of the 34 participants who provided demographic information, 20 had a master's, doctorate or post-graduate degree, 12 had a bachelor's degree, and 2 did not have any degree. Of these, 25 reported six or more years of experience. When asked to what extent they agreed that usability is an important aspect of software design, all 35 participants agreed, and when asked if they had worked in close contact with software engineers, 28 of 35 participants agreed. When asked to what extent they agreed that USAPs would assist usability professionals in identifying usability concerns that impact the architecture of a software system, 23 of 35 participants agreed (66%). When asked if they found it challenging to apply usability principles in software design projects, 30 of 35 participants answered yes, and when asked if there is a communication gap between usability professionals and software engineers, 33 of 35 participants answered yes. Additionally, participants volunteered comments about the existence of a communication gap between usability professionals and software engineers, as summarized in Table 2. When participants were asked if they were familiar with any methodologies that would improve communication between usability professionals and software engineers, 21 of 35 answered yes (60%). Those participants who answered yes were asked to list the known methodologies to substantiate their quantitative answer; their responses are summarized in Table 3. Participants were asked to rate the importance of the original USAP usability benefits as well as the newly worded versions using the following scale: Very Important = 1, Important = 2, Somewhat Important = 3, Not Important = 4, and Don't Know = 5. The Don't Know answers were filtered out. The results are summarized in Fig. 1 using the following weighted average: Very Important = 16, Important = 12, Somewhat Important = 8, and Not Important = 4.
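One plausible reading of this weighted average, in which each response category is recoded to the stated weight and the recoded values are averaged over the valid answers, can be sketched as follows; the response counts are placeholders, not the study data:

# Recode each rating category to its stated weight and average
# over the valid (non-"Don't Know") responses.
WEIGHTS = {"very_important": 16, "important": 12,
           "somewhat_important": 8, "not_important": 4}

# Placeholder response counts for one usability benefit.
counts = {"very_important": 14, "important": 12,
          "somewhat_important": 6, "not_important": 3}

total = sum(counts.values())
weighted_avg = sum(WEIGHTS[k] * n for k, n in counts.items()) / total
print(f"weighted average = {weighted_avg:.1f}")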
Table 2. Identified Reasons for the Communication Gap between Groups1

Answer | Identified Issue | Participants
Yes | Knowledge: software engineers only know software development and usability professionals only know usability; they don't know each other's disciplines. | 5
Yes | Core focus in project: software engineers focus on getting all system parts to work, and usability professionals focus only on system parts that impact the user interface. | 7
Yes | Mutual understanding: both groups struggle to understand each other's needs. | 4
Yes | Awareness: software engineers have not been exposed to usability and usability professionals have not been exposed to software engineering. | 2
Yes | Process: the software design process may or may not include usability. | 1
Yes | Availability of usability people: not all projects benefit from the participation of usability professionals. | 2
Yes | Stated there is a gap, but did not elaborate on the reason. | 2
No | No gap | 1
Table 3. Reported Methods to Improve Communication2

Listed Methods | Participants
MILE+ | 2
Open communications (e.g. meetings, workshops) | 10
AWARE | 1
HCI-driven methodologies | 1
Using prototypes and mockups | 3
Software development methodologies | 6
Conceptual Comics | 1
An independent groups t test was used to test the difference in the mean rated importance of the target USAP usability benefits Accelerates error-free portion and Reduces impact of slips. Respondents from Region 2 (M = 1.76) showed a lower mean response than those from Region 1 (M = 2.29); t(30) = 2.09, p < .05, r = .36. The rating of USAP usability benefits also collected qualitative data by asking participants to provide comments if any of the USAP usability benefits were not clear to them. The targeted USAP usability benefits Accelerates error-free portion and Reduces Impacts of Slips received the most comments, mostly about ambiguous meaning and unfamiliar language. The other (non-targeted) USAP usability benefits did not receive similar comments.
1 Included five additional responses from the pretest that were not part of the 35 clean data sets.
2 Included three additional responses from the pretest that were not part of the 35 clean data sets.
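For reference, the reported effect size follows directly from the t statistic: for an independent groups t test,

\[ r = \sqrt{\frac{t^2}{t^2 + df}} = \sqrt{\frac{2.09^2}{2.09^2 + 30}} \approx .36 \]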
When asked whether they found that leveraging USAPs would be useful for their software design activities, 24 of 35 participants agreed. However, there was a directional difference between the control group and the experiment group: of the 24 that agreed, 15 were from the control group and 9 were from the experiment group. Compared to the control group, the experiment group showed an increase from 0 to 6 participants selecting the no-opinion choice. When asked if there is a communication gap between usability professionals and software engineers, 29 of 35 participants agreed. When participants were asked how likely it would be for them to go and learn more about USAPs after completing the survey, 25 of 35 agreed.
4 Discussion
H.1 predicted that usability professionals expect to benefit from Usability-Supporting Architecture Patterns in their everyday work. During the pretest, 66% of the participants agreed that USAPs could enable usability professionals to identify usability concerns that impact the architecture of a software system. However, it is unclear why 66% agreed, because no participants reported a priori knowledge of USAPs, and of the 60% that reported knowing methodologies to improve this gap, none reported USAPs. One possible explanation for this result could be that the term “usability-supporting” along with “architecture patterns” could lead to an implicit belief that USAPs are beneficial. In the posttest, 68% of the participants reported USAPs as useful for software design activities based on what they had learned in the survey. However, agreement was directionally different between the control group (62%) and the experiment group (38%). This difference could stem from the participants' comfort in selecting the no-opinion choice, which could itself be an effect of receiving the treatment. It is possible that after participants received the treatment and were exposed to the USAP scenario, they did not understand its purpose or were perhaps confused by the presentation of the materials. For example, it could be that the USAP scenario of canceling a command did not easily apply to their experience and therefore did not add clarity about the usefulness of USAPs. Conversely, it is possible that participants who did not receive the treatment and did not see the USAP materials were able to imagine (or construct) their own idea of what USAPs are, which in their view might be more effective than the actual USAPs. However, no effect of the treatment was found in the difference between pretest and posttest (p > .10).
H.2 predicted that usability professionals can perceive the importance of using Usability-Supporting Architecture Patterns for their everyday work. During the pretest, 100% of the participants acknowledged that usability is an important aspect of software design, and 86% of participants acknowledged that they had previously found it challenging to apply usability principles in a software design project. This suggests that participants understood the importance of usability in software design and the challenges of applying usability principles therein. Hence, the fact that 71% of participants responded that they would likely investigate USAPs further and learn more about them is a potential indication of their usefulness. However, it is possible that the perceived importance of USAPs results from recognizing that any technique to improve usability is innately important to usability professionals. This study did not analyze this further.
H.3 predicted that if Usability-Supporting Architecture Patterns are communicated to usability professionals in more natural HCI terminology, they can better appreciate the value of Usability-Supporting Architecture Patterns in their everyday work. We predicted that participants who received the treatment would rate USAP usability benefits as more important, since they had (in the treatment) been exposed to a positive introduction of USAP usability benefits and their potential use in software design. The effect of the treatment on the ratings was non-significant (p > .10). When contrasting the control group with the experiment group, the targeted USAP usability benefits Accelerates error-free portion and Reduces impact of slips exhibited an 18% reduction in rated importance when compared to their newly worded counterparts, Increases efficiency and Reduces the impact of errors. However, no significant effect was found for the treatment (p = 0.63). An unexpected yet interesting result of the experiment was that participants in Region 1 (Europe) responded differently than those in Region 2 (US) when rating the importance of the target USAP usability benefits Accelerates error-free portion and Reduces impact of slips.
US usability professionals rated the target USAP usability benefits as more important than European usability professionals did, a potential indication that USAPs are more difficult for European usability professionals to understand than for their US counterparts.
5 Conclusion
This study suggests that usability professionals' initial perception of USAPs is positive. Participants agreed that USAPs are relevant for considering usability concerns in software design, and that usability professionals recognize there is a communication gap with software engineers. However, exposure to USAP materials did not conclusively affect their perception of USAPs. The study suggests that usability professionals generally accept the notion of USAPs without understanding USAP details. This effect was more prominent for US participants in the study, in contrast with their European counterparts. More studies would need to be performed to evaluate additional characteristics of USAPs and their potential acceptance by usability professionals.
Acknowledgments Thanks to Dr. Mark Pfaff for his guidance in conducting the statistical analysis for several parts of this study.
Heuristic Evaluations of Bioinformatics Tools: A Development Case
Barbara Mirel and Zach Wright
University of Michigan
{bmirel, zwright}@umich.edu
Abstract. Heuristic evaluations are an efficient, low-cost method for identifying usability problems in a biomedical research tool. Combining the results of these evaluations with findings from user models based on biomedical scientists' research methods guided and prioritized the design and development of these tools and resulted in improved usability. Incorporating heuristic evaluations and user models into the larger organizational practice led to increased awareness of usability across disciplines.

Keywords: Usability, heuristic evaluation, biomedical research, organizational learning, user models.
problems in the results, set priorities for fixes, and raise developers’ awareness of user needs beyond surface fixes to better build for users’ cognition in scientific analysis. Our outcomes have been positive. We argue that for our bioinformatics tools, positive results hinge on combining domain-based, user-informed heuristic evaluations with organizational processes that break down boundaries isolating usability from development, modification request decisions, and UI design.
2 Relevant Research
Heuristic evaluations involve "evaluators inspect[ing] a user interface against a guideline, be it composed of usability heuristics or cognitive engineering principles, in order to identify usability problems that violate any items in the guideline" [8]. This method is known to produce many false positives and to likely omit problems related to users' cognitive tasks. It is nonetheless one of the most popular usability assessment methods due to its low cost and efficiency [2]. Thus it is important to improve the effectiveness of HEs without diminishing their benefits. Researchers have found several ways to achieve these improvements: conducting heuristic evaluations with many evaluators, and combining them with evaluator training and reliability testing, increases the effectiveness of HEs [10,12]. Heuristic evaluation results also improve when evaluators have prior knowledge of usability and the tool; when heuristics are adapted to domain tasks and knowledge; and when HE findings are compared with results from user performance studies [3]. Finally, improvements come from using sets of heuristics that are "minimal" (not overlapping) yet inclusive [10]. For example, some researchers have evaluators jointly consider heuristics and problem areas, thereby assessing against a "usability problem profile" [2]. Establishing an optimal set of heuristics, however, is still a black box. To compensate for elusive "ideal heuristics," many usability researchers advocate integrating findings from user performance studies with HE. Demonstrably, heuristic and user performance evaluations combined uncover more problems than either method does alone. Yet the quality, not just the quantity, of problems is critical. For better quality, some researchers argue that what is missing in Nielsen's standard set of heuristics is that they are not "related to cognitive models of users when they interact with system interfaces" [8]. Cognitively oriented heuristics are especially important when tools support complex tasks. Recent attempts to construct heuristics that address cognition include Gerhardt-Powals' [5] cognitive engineering principles and Frokjaer and Hornbaek's [4] principles based on metaphors of thinking. So far, findings about the superiority of such heuristics have been mixed [4,8]. Running in parallel with these academic efforts, some studies by specialists in production contexts aim to improve the effectiveness of HEs by advantageously combining them with organizational processes. Hollinger [7], for example, reports on positive efforts at Oracle, against great organizational resistance at first, to combine bug reporting processes with HE findings, thereby "mainstreaming" reviews of outcomes. This mainstreaming increased usability awareness across different teams and functional specialties, incited interactive team discussions about usability, initiated tracking of the costs and benefits of usability improvements, and resulted in fixing more usability defects. Moreover, results included "significant improvements in the quality of the user interface" [7].
Exploiting organizational processes is promising but, to the best of our knowledge, few production-context studies report on combining HE with even more organizational processes than Hollinger [7] describes, or on combining organizational processes with the established methods of improving HE outcomes: comparing them with user performance findings, assuring evaluator familiarity with the tools, and adapting heuristics to the task domain.
3 Methods
Our methods aim to achieve the same effectiveness with HE that other researchers seek by combining it with other factors. Due to resource constraints we could not conduct extensive training of evaluators or involve numerous evaluators. We could, however, get several evaluators familiar with the tools, adapt and pilot-test heuristics for our domain and tools, and introduce several new organizational processes. We also introduced the novel process of reframing surface problems found by HEs into more substantial problems based on user models.

3.1 Tools
We report on heuristic evaluations of one open source, web-based query and analysis tool. The tool is the front end for querying our center's innovatively integrated protein interaction database. The query and analysis tool lets users query by gene(s), keyword(s), or gene list and provides tabular query results of relevant genes, attributes, and interactions. The tool is non-visual but links to visualization tools.

3.2 User Task Models
User models were derived from longitudinal field studies of 15 biomedical researchers using our tools and others to conduct their systems biology analyses [9]. These models directed both our adaptations and our interpretations of the heuristic evaluations. The user models are unique in bioinformatics because they capture scientists' higher-order cognitive and analytical flow for research aimed at hypothesizing, not only the lower-level tasks typically studied in usability tests, cognitive walkthroughs, or cognitive task analysis. Specifically, the user models capture moves and strategies for verifying accuracy, relevance, and completeness and for uncovering previously unknown relationships of interest. These tasks involve manipulating data to turn it into knowledge through task-specific combinations of sorting, selecting, filtering, drilling down to detail, and navigating through links to external knowledge bases and literature. Additionally, to judge whether genes and interactions are interesting and credible, scientists analyze high-dimensional relationships and seek contextual cues from which to draw explanatory inferences. Ultimately, they examine conditions and causes in interactive visualizations, tools outside the scope of this article. This empirically derived model of higher-order cognition was critical to adapting standard Nielsen heuristics to our domain and tool.
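Purely as an illustration of the narrowing-down moves these models capture, the following sketch filters and sorts a small, invented result table; the column names and data are not from our tools:

```python
# Illustrative sketch (not from the paper's tools) of the narrowing-down
# moves the user model describes: filter a tabular gene result set by a
# meaningful attribute, then sort to surface interesting relationships.
import pandas as pd

results = pd.DataFrame({
    "gene": ["APP", "MAPT", "SNCA", "GRN"],
    "interactions": [42, 17, 35, 8],
    "pubmed_citations": [120, 45, 98, 12],
})

# Keep well-supported genes, then rank by interaction count.
interesting = (results[results.pubmed_citations > 40]
               .sort_values("interactions", ascending=False))
print(interesting)
```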
3.3 Adapted Heuristics
We adapted Nielsen's standard set of 10 usability heuristics to our domain and to the uses of our tools, accounting for: the presence of external links to multiple data sources and internal links to details; the large amounts of heterogeneous data in result sets; the core need for statistics and surrogates for confidence; and the variety of interactions needed for validating, sensemaking, and judging results.

3.4 Heuristic Evaluations and Evaluators
Three evaluators pilot-tested the adapted heuristics with other query and analysis tools developed by our center to refine their applicability to the domain and users' tasks. One evaluator is trained in usability and visualizations; the other two specialize, respectively, in portal architecture and systems engineering, and in web administration and marketing communications. All were knowledgeable about the tools and moderately aware of users' tasks and actual practices through discussions with the usability director about field study findings. No reliability testing was done due to time constraints. Instead, inter-evaluator differences were analyzed by examining comments entered in the comments field of the instrument. After the heuristic evaluations were conducted, outcomes and comments were summarized and grouped by agreement and severity, and relevant design changes were suggested.

3.5 Integration of Additional Processes
Concurrent with the heuristic evaluations, the following organizational and software development life cycle processes were instituted with enhanced usability in mind:
• Usability categories and severity levels were built into the modification request (MR) system. The levels were minor, serious, major, critical, and failure, and they were coordinated with a newly instituted Technical Difficulty ranking (see the sketch after this list).
• Operational processes were put into place for turning MRs into long-term development priorities and for raising awareness of user models and their requirements. These included forming a new priority-setting committee composed of the directors of computer science, life sciences, and usability, along with the lead developer and project manager.
• Informal and highly collaborative processes between developers, web designers, usability evaluators, and scientists were implemented to assure rapid prototyping and feedback.
• A research project was initiated into design requirements based on heuristic evaluation findings and user models.
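As a minimal sketch of how MR records carrying usability severity and Technical Difficulty rankings could drive triage, assuming invented field names and rankings (this is not the actual MR system):

```python
# Hypothetical sketch of a modification request (MR) record carrying the
# usability severity and Technical Difficulty rankings described above.
# All field and function names are invented for illustration.
from dataclasses import dataclass

SEVERITY_RANK = {"failure": 5, "critical": 4, "major": 3, "serious": 2, "minor": 1}

@dataclass
class ModificationRequest:
    title: str
    usability_severity: str    # one of SEVERITY_RANK
    technical_difficulty: int  # e.g., 1 (trivial) .. 5 (hard)

def triage(mrs):
    """Order MRs so severe, low-effort problems (e.g., broken links) surface first."""
    return sorted(mrs, key=lambda mr: (-SEVERITY_RANK[mr.usability_severity],
                                       mr.technical_difficulty))

mrs = [ModificationRequest("Broken 'Top of page' link", "major", 1),
       ModificationRequest("No query reformulation hints", "serious", 3)]
for mr in triage(mrs):
    print(mr.title)
```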
4 Results
4.1 Evaluation Outcomes
Conducting the heuristic evaluations took on average two hours per evaluator. Summarizing added another few hours to the effort. Sample summary outcomes are shown in Table 1; those with agreed-upon high severity are highlighted.
Table 1. Sample of results summarized from heuristic evaluations

| Heuristic | Problem severity / agreement | Problem(s) | Design change |
| 1. Currency of the tool web pages | High / agreement | No date present | Indicate last update to web pages |
| 2. Readable text | High / agreement | Small font | 12 point font |
| 3. Hints for formulating a query for better results | High / agreement | No hints available | Need query hints when the query fails |
| 4. Able to undo, redo, go back | High / agreement | No history tracking | Provide history tracking |
| 5. Broken links | High / agreement | "Top of page" is broken | Fix [list of broken links] |
| 6. Examples included and prominent | Range / no agreement (high to low) | Could use more examples and better emphasis | Add 1-2 (bolded) examples under the search box |
| 7. Currency of the data; data sources cited | Range / no agreement (high to low) | Versions of dbs are listed but no dates of latest updates | Add a date for last update to our database |
| 8. Clearly shows if no results occur | Range / no agreement (high to low) | Shows, but the message isn't clear | Change message to: [suggestion] |
| 9. Able to change result organization | Range / no agreement (high to low) | Sort is available but not apparent | Need note that columns are sortable |
| 10. Vital statistics are available | Range / no agreement (high to low) | What would those stats be? | No agreement |
| 11. Information density is reasonable | Range / no agreement (high to low) | A lot of whitespace; too many sections | Get rid of the 5 nested boxes; no agreement |
| 12. Clear what produced the query results | Range / no agreement (high to low) | Not clear where the search term is "hitting" | Should redisplay search term so user ties it to results |
| 13. Clear why results seem high or low | Middle / agreement | No explanations; I assume informed user knows why | No agreement |
| 15. Can access necessary data for validating | Low / agreement | Not sure what the data would be | |
As Table 1 shows, highly ranked problems involved broken or missing features and web-page omissions that could be remedied without programming. Middle-ranked problems were tied more to user task needs and to subjective issues such as what constitutes "enough" support or the criteria scientists use for judging reliability and validity. Problems with little agreement about severity level were tied even more to evaluators having to project and evaluate the importance of scientists' task needs in this domain. For example, evaluators varied widely in judging the importance of validation in scientists' ways of reasoning and knowing. Some actual problems were not caught by the heuristic evaluations, especially those involving novel and unexpected ways users might interact with the interface; these findings were provided by the field studies. Additionally, evaluators' comments and the summarized design changes ranged from precise to vague. Typically, design changes for familiar problems in
generic UI design were precise; those tied to user task models for systems biology and complex exploratory analysis were not.

4.2 Integrating Organizational Processes
Interpretations of and actions taken on the outcomes of the heuristic evaluations took the following organizational course. As noted in Methods, design changes were entered into the MR system and ranked for severity and degree of development effort. Low-cost problems at the levels of failure, critical, major, and serious (e.g., broken links) were delegated and fixed immediately. Concurrently, we examined areas where the heuristic evaluation outcomes combined with problems pertinent to scientists' demonstrated practices in the field (as captured in the user models). From these analyses, important combinations of problems found by the HE surfaced, combinations that implied problems related to higher-order cognitive task needs. For example, problems 3, 6, 8, 9, 12, and 13 in Table 1 were observed as a recurrent cluster in the field observations as part of scientists' higher-order task of locating interesting genes and relationships expediently. For this task, scientists progressively narrow down result sets based on several meaningful attributes and on validity standards, such as genes/gene products enriched for neurodegenerative processes. Once combined, this set of HE problems revealed scientists' difficulty manipulating queries and output sufficiently to uncover potentially interesting relationships. Thus, beyond easy fixes (e.g., column cues for sorting), deeper implications for a tool's actual usefulness were uncovered by the combined HE problems and user model. Shaped by the user models developed at our center and by ongoing research in our center into design requirements, issues like the example above were presented to the usability and development teams and then brought to the priority-setting committee. For example, problems related to users being able to narrow down to interesting results led to realizations that the tool needed to provide a more powerful search mechanism, extensive indexing, and interfaces that allowed users to construct and revise queries across multiple dimensions. Another priority-setting issue suggested by the HE outcomes and better understood through the user models was the need for specific types of additional content for users' validation purposes. Both needs received high priority. Additionally, as the software developers became more aware of the value of these usability techniques, we started to get requests for the heuristic evaluation instrument itself, so that programmers could keep the criteria in mind while developing their software.
5 Discussion
Developing the heuristic evaluation instrument was an iterative process, as the evaluators discovered its weaknesses and strengths during the course of the evaluations. Many of the heuristics turned out to be redundant and were either combined or discarded. Close inspection of the tools also engendered new heuristics as evaluators noticed additional usability problems. Accompanying comments proved to be crucial and were made mandatory for any problems found in later evaluations. The severity
numbering system also proved to be too abstract and will be replaced by ratings that mirror the ones used in the MR system. Finally, some heuristics in the instrument proved too theoretical or complex to be useful (e.g., "salient patterns stand out") and had to be removed or refined. Some of these difficult heuristics were less concrete and were often better suited to incorporation and analysis within the user model. In tool assessments, heuristics alone identified isolated problems and a few inaccuracies. Combined with the user model, the heuristic evaluations enabled us to uncover problems related to the integrated tasks associated with scientists' higher-order analysis and reasoning. Evaluators' written comments, omissions, imprecision in some proposed design changes, and lack of agreement about certain items were vital in cuing us to further examine particular problems, or combinations of problems, in light of the user models. Had time and resources permitted, reliability testing would have diminished disagreements. A positive unintended consequence of these disagreements, however, was that they revealed where developers' awareness of user tasks was incomplete. For example, in the heuristic evaluations, comments about "the ability to change the organization of results" indicated that the tool did not make it obvious that columns could be sorted. The user model revealed, however, that the non-obvious sorting was only one shortcoming related to this specific heuristic. In actual practice, scientists' analysis and judgments required tools to provide a combined set of sorting-and-filtering interactions to rearrange results into multidimensional groupings, i.e., interesting relationships. Reframed to account for this need, this problem led to high-priority, enhanced functionality. Unlike in Hollinger's study, many usability problems, framed in ways that join heuristic evaluation outcomes and user models, were given high-priority status. For such achievements, collaborations across specialties were critical, both formally and informally. Developers, web specialists, project managers, scientifically expert bioinformatics specialists, and the usability, scientific, and computer science directors all played distinct roles in shaping the perspectives needed for strategically determining, and then implementing, a better match between tools and systems biology tasks. In the process, people across specialties grew increasingly aware of each other's perspectives and began slowly evolving a shared language for articulating them. This process is often termed "double-loop learning" and is essential for innovation [1]. One example of this cross-organizational learning is the software developers' requests for the heuristics to help guide software development. Vital to this learning, and to the common grounding on which it rests, is the perennial challenge of assuring that heuristics are expressed in the right grain size and language. As with other research focused on this goal, our center's efforts have highlighted places to make heuristics more concrete and ways to join outcomes with user models.
6 Conclusions
In our center's case, collaborative communication, shared language, and greater awareness, i.e., double-loop organizational learning, were integrated into and developed from heuristic evaluations. We found a way to use this discount usability inspection method, combined with user models and newly implemented organizational processes,
to reframe problems and to gain buy-in for short- and long-term usability improvements aimed at scientists' cognitive task behaviors. Heuristic evaluations coupled with user modeling revealed problems related to the higher-order cognitive flow of analysis. Combined with organizational and software development processes that encouraged attention to usability, heuristic evaluations produced results and recommended changes that received high priority. Moreover, developers and directors who previously had not considered usability in choices they made about knowledge representations or functionality grew increasingly sensitive to the implications of their choices from a user perspective. Our center continues to refine the instrument and apply it to other tools, and is simultaneously creating a complementary instrument for heuristic evaluation of interactive visualizations in bioinformatics tools.
References
1. Argyris, C., Schön, D.: Organizational Learning II: Theory, Method and Practice. Addison-Wesley, Reading (1996)
2. Chattratichart, J., Lindgaard, G.: A comparative evaluation of heuristic-based usability inspection methods. In: Proceedings of ACM CHI 2008 Conference, pp. 2213–2220. ACM Press, New York (2008)
3. Cockton, G., Woolrych, A.: Understanding inspection methods: lessons from an assessment of heuristic evaluation. In: Blandford, A., Vanderdonckt, J. (eds.) People & Computers XV, pp. 171–192. Springer, Berlin (2001)
4. Frokjaer, E., Hornbaek, K.: Metaphors of human thinking for usability inspection and design. ACM Transactions on Computer-Human Interaction 14, 1–33 (2008)
5. Gerhardt-Powals, J.: Cognitive engineering principles for enhancing human-computer performance. International Journal of Human-Computer Interaction 8, 189–211 (1996)
6. Hartson, H., Andre, T.S., Williges, R.: Criteria for evaluating usability evaluation methods. International Journal of Human-Computer Interaction 13, 373–410 (2001)
7. Hollinger, M.: A process for incorporating heuristic evaluation into a software release. In: Proceedings of AIGA 2005 Conference, pp. 2–17. ACM Press, New York (2005)
8. Law, E.L.-C., Hvannberg, E.T.: Analysis of strategies for improving and estimating the effectiveness of heuristic evaluation. In: Proceedings of ACM NordiCHI 2004, pp. 241–250. ACM Press, New York (2004)
9. Mirel, B.: Supporting cognition in systems biology analysis: findings on users' processes and design implications. Journal of Biomedical Discovery and Collaboration (forthcoming)
10. Nielsen, J.: Heuristic evaluation. In: Nielsen, J., Mack, R.L. (eds.) Usability Inspection Methods. John Wiley, Chichester (1994)
11. Nielsen, J.: Enhancing the explanatory power of usability heuristics. In: Proceedings of ACM CHI 1994 Conference, pp. 152–158. ACM Press, New York (1994)
12. Schmettow, M., Vietze, W.: Introducing item response theory for measuring usability inspection processes. In: Proceedings of CHI 2008, pp. 893–902. ACM Press, New York (2008)
First Impression
• Does the tool fit the overall NCIBI look and feel? Does it look professional?
• Is the tool appropriately branded with funding source and NCIBI, CCMB, and UM logos? Does the tool link back to UM, CCMB, and NCIBI?
• Is it clear what to do and what to enter? (Limitations are clear, how to format the query is clear, what options the user has; if a user needs to enter terms from some taxonomy/ontology, access to those terms is available for the user to choose from.)
• Are there examples shown, and are they prominent?
• Is the display consistent with user conventions for web pages/apps?
• Is it clear why to use the tool and to what purpose?
• Does it require minimal steps to get started quickly? Is the cursor positioned in the first field that requires entry?
• Is help readily available?
• Is it clear how current the data are? Is it clear how current the website is?
• Are data sources cited and identified? Are appropriate publications cited?
• Are there any broken links?
• Are the page titles (displayed at the top of the browser) meaningful, and do they change for different pages?
• Are page elements aligned (e.g., in a grid) for readability?
• Is the site readable at 1024x768 resolution? Is the text readable (e.g., size, font, contrast)?
• Does the page have appropriate metadata tags for search engines?

Search / Results
• Is the length of processing time acceptable? Do adequate indicators show system status and how long it may take?
• Does it clearly show if there are no query results? Does it clearly show how many results the query produces?
• Is it clear what produced the query results?
• Is it easy to reformulate the query if necessary? Are there hints/tips for reformulating the query for better results?
• If the query results seem high or low, is it clear why?
• Are the results transparent as to what is being shown and how to interpret it?
• Are the results displayed clearly and not confusingly?
• Is there an ability to detect and resolve errors?

Interaction with Results
• Is there an ability to filter or group large quantities of data?
• Is there an ability to change the organization of results?
• Is there an ability to undo, redo, or go back to previous results?
• Are the mechanisms for interactivity clear? Is the logic of the organization clear?
• Are different data items (e.g., rows) kept clearly separate or delineated?
• If there are links, is it clear where they go? If there are icons, is it clear what they do? Do the link-outs provide reliable return?
• Are the vital statistics/counts of information available?
• Do the names/labels adequately convey the meaning of items/features?
• Are data items kept short? Is there too much/little information? Is the density of information reasonable?
• Can you access the necessary data to assure validity (e.g., sources)?
• Can results be saved? Are the results available for download in other formats? Can the pages be easily printed?
• Is vertical scrolling kept to a minimum? Is there horizontal scrolling?

Comments
Additional comments go here.
A Prototype to Validate ErgoCoIn: A Web Site Ergonomic Inspection Technique
Marcelo Morandini 1, Walter de Abreu Cybis 2, and Dominique L. Scapin 3
1 School of Arts, Science and Humanities, University of Sao Paulo, Sao Paulo, Brazil
m.morandini@usp.br
2 Ecole Polytechnique Montreal, Canada
walter.cybis@polymtl.ca
3 Institut National de Recherche en Informatique et en Automatique, Rocquencourt, France
dominique.scapin@inria.fr
Abstract. This paper presents current actions, results, and perspectives concerning the development of the ErgoCoIn approach, which allows non-expert inspectors to conduct ergonomic inspections of e-commerce web sites. An environment supporting inspections based on this approach was designed, and a tool is being developed in order to carry out its validation plan. Beyond this validation, the planned actions will allow us to analyze the task of applying checklists and to specify an inspection support environment especially fitted to it. This is of great importance, as the environment is intended to be an open web service supporting ergonomic inspections of web sites from different domains. A wiki environment for the tool's development is also proposed.

Keywords: Usability, Evaluation, Web Sites, Inspection, Web 2.0.
usability it can afford to its users [6]. Considering the software product quality model proposed by ISO 9126¹, ergonomics may be understood as an external quality of the software, while usability is the quality of its use [8]. Methods aimed at measuring usability (usability tests) are known to be usually expensive and complex [13]. Alternatively, the ergonomics of user interfaces can be evaluated or inspected faster and at lower cost. A simple differentiation between evaluations and inspections can be established based on the type of knowledge applied to the judgments involved in each technique: evaluators apply mainly the implicit knowledge they have accumulated from study and experience, while inspectors apply primarily explicit knowledge supported by documents such as checklists. Inspectors cannot produce fully elaborated or conclusive diagnoses, but their diagnoses are comparatively coherent and generally obtained at low cost. ErgoCoIn [5] is an approach designed to support inspectors in performing objective ergonomic inspections of web sites. With the goal of improving the quality of the diagnoses, this approach postulates several considerations about the web site's context of use, including user, task, and environment attributes; among these must be considered the attributes of the interface of the web site under evaluation [9]. The content of interviews and questionnaires, as well as of the other contextual data-gathering activities, is based on the information demands presupposed by the approach's knowledge base. This strategy allows specific, objective ergonomic inspections: only pertinent information gathering is proposed to the inspectors in the context-of-use analysis, and only applicable questions are presented to them while inspecting the web site. The ErgoCoIn checklists can support inspectors by providing more homogeneous results than those produced by ergonomic experts. This is an obvious consequence of having inspectors apply the same set of checklist questions and share decisions about their relative importance. This approach is interesting to web site designers and evaluators because the questionnaires and checklists can be applied by the design staff, who are not necessarily experts in usability evaluation. Thus, inspections can usually be performed quickly and at low cost. It can also be seen as a way to introduce ergonomic concepts to designers and to stimulate them, in their daily work, to question human factors specialists when facing potentially serious ergonomics problems. In this paper we present details of both the ErgoCoIn logical architecture and the tool built to validate the approach against its main requirements: (i) low cost, (ii) objectivity, and (iii) homogeneity of inspection diagnoses. The other requirements identified include the variety and novelty of the knowledge base. In order to fulfill these requirements, we propose a collaborative effort aimed at ensuring that the ErgoCoIn knowledge base can be enriched continuously. We believe that inspections supported by an environment incorporating these features can be more efficient and reliable.
¹ In fact, ISO 9241:11 and ISO 9126:1 do not agree completely about the terminology concerning the "a priori" and the "a posteriori" perspectives of usability. While the first standard employs "ergonomics" and "usability," the second employs "usability" and "quality in use" to denote these perspectives.
This paper contains five sections. Section 2 presents an overview of the ErgoCoIn approach. Section 3 presents the logical architecture of the support environment and introduces the tool being developed to validate the approach. Section 4 presents the motivation and proposal for developing a cooperative Wiki-ErgoCoIn. Finally, Section 5 presents conclusions regarding the environment's future development and use.
2 The ErgoCoIn Approach
The development of the ErgoCoIn approach has been motivated by four considerations: (1) web site development has become achievable for a large spectrum of designers (through easily available design tools), not necessarily skilled in computer science or in ergonomics; (2) web sites are often designed in a fast, low-cost design process supported by inexpensive tools, which may lead designers to include numerous and sometimes obvious ergonomic flaws; (3) usability evaluations using "traditional" methods can be expensive; and (4) their results may lack homogeneity [5]. The approach is divided into two main phases: web site Contextual Analysis and Ergonomic Inspection of the components and their attributes (see Figure 1). The Co-Description Phase is based mainly on surveys. Before conducting questionnaires and interviews, inspectors must identify the components of the user interface that will be inspected. The reason is to guarantee that, during the surveys, the inspectors will collect only the contextual data appropriate to inspections of the actual user interface components. Surveys are conducted with both users and designers. From users, inspectors gather data concerning their profile, work environment, and the strategies they apply to accomplish tasks using the web site; task strategies are described simply as the sequence of pages that users may access when accomplishing their goals. Satisfaction issues should also be gathered in the user surveys. From designers, inspectors gather information about the expected context of use, including data concerning the user profile and task strategies. Results from the surveys are examined in order to compare context-of-use elements and particular task strategies as prescribed by users and by designers. The second phase of the approach consists of ergonomic inspections based on checklists. These checklists are distinguished by their organization and content; specifically, they are defined as a set of checklist items organized according to the Ergonomic Criteria [13] and basically related to the ergonomics of web sites supporting e-commerce initiatives. This question-based approach was built from the examination of a large collection of ergonomic recommendations compiled by INRIA researchers [1,14]: each selected recommendation was reformulated as a question and associated with one ergonomic criterion. Like any other inspection dynamic, the application of each ErgoCoIn inspection question follows three decision phases: applicability, weighting, and adherence. For objectiveness, the checklists should propose only questions that are applicable to the actual web site context of use and interface components. This is ensured by having all questions in the ErgoCoIn knowledge base properly indexed to the context-of-use aspects (user, task, environment, and interface) gathered from both users and designers.
Fig. 1. The ErgoCoIn Approach Framework
Further, each applicable question has to be weighted in order to allow the production of properly ranked results. Particular decisions about what is most important to consider when inspecting e-commerce web sites were taken by the ErgoCoIn designers, but they can be modified by inspectors when inspecting web sites from different application domains. For simplicity, the level of importance of an ergonomic criterion may define the level of importance of each individual question associated with it. Finally, the user interface's adherence to a question (or requirement) must be judged by the inspectors. They do so based on the information concerning the ergonomic requirements or questions (explanations, examples, and counter-examples) and on the data describing the web site's context of use (concerning users, tasks, and environment). The ErgoCoIn approach also presupposes that information about the context of use is collected directly from users and designers with the support of questionnaires and/or interviews. As a consequence, the approach can only be applied to web sites that are being used regularly; furthermore, some designers and users must be available for interviews or, at least, able to answer questionnaires. The ErgoCoIn approach was designed to allow extensions and instantiations. The question base can be extended to cover other perspectives beyond e-commerce, such as e-learning. Ergonomic Criteria and their associated questions can be ranked differently in order to weight the questions in accordance with the context of use of the web site under inspection. Another kind of extension being considered concerns the integration of results from the analysis of usage log data. Such data can be collected using specific software tools for this purpose. In fact, a usability-oriented
web analyzer called UseMonitor is being developed and associated with the ErgoCoIn approach [4]. This tool can present warnings about the "a posteriori" perspective on usability problems, i.e., interaction perturbations occurring while users interact with the web site to accomplish their goals. Basically, UseMonitor can indicate when the observed efficiency rate is particularly low. Detailed efficiency indications concern the rates of, and time spent on, unproductive user behaviors such as error recovery, help seeking, hesitation, deviation, and repetition. Further, UseMonitor can indicate the web pages related to these perturbations. A logical architecture based on the integration of (i) a typology of usability problems, (ii) the ergonomic criteria/recommendations, and (iii) a model of interface components is also being defined. This will allow UseMonitor to warn inspectors about the detailed interface aspect causing an actual usability perturbation (an a posteriori result), while ErgoCoIn helps inspectors identify the user interface component responsible for the perturbation and indicates how to fix it (an a priori result). The integration of ErgoCoIn and UseMonitor defines the ErgoManager environment [4]. As a tool for usability evaluation, this environment will automate both processes: failure identification (by log analysis) and failure analysis (by guidelines processing) [1]. Details of this architecture are being defined and will be presented in future publications.
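As a minimal sketch of the kind of efficiency indicator described above, assuming an invented log format and behavior labels (UseMonitor's actual implementation is not described here):

```python
# Hypothetical sketch of an efficiency indicator of the kind UseMonitor is
# described as computing: time on unproductive behaviors vs. total task time.
# Event names and the log format are invented for illustration.
UNPRODUCTIVE = {"error_recovery", "help", "hesitation", "deviation", "repetition"}

def efficiency_rate(events):
    """events: list of (behavior, seconds) tuples parsed from a usage log."""
    total = sum(t for _, t in events)
    wasted = sum(t for b, t in events if b in UNPRODUCTIVE)
    return 1 - wasted / total if total else None

log = [("navigation", 40), ("hesitation", 15), ("error_recovery", 25), ("form_fill", 20)]
print(f"efficiency = {efficiency_rate(log):.0%}")  # low values flag pages to inspect
```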
3 The ErgoCoIn Environment and Validation Tool
A computerized environment was designed to support mainly the data capture involved in the inspection and inquiry techniques proposed by the current configuration of the ErgoCoIn approach [10]. Contextual analysis will be supported by two collectors, each consisting basically of a series of forms. The Contextual information collector is aimed at guiding inspectors while gathering information from designers and users. The Web site description collector will collect data describing the web site's functions and interface components. The description questions used by these collectors are extracted from the environment's Knowledge base. The data gathered in this phase (contextual data and site description) is stored in a Context of use database. The support for Ergonomic Inspections starts with an Analytic evaluator, a system component that compares users' and designers' information concerning the intended and actual context-of-use features. This component verifies the existence of designers' misconceptions about users' features and, if necessary, sends warnings to the Checklist builder. The main function of this builder is to create checklists covering the overall web site and its pages according to the task strategies described by users and designers. It can highlight questions that could reveal ergonomic flaws due to a lack of correspondence between users' and designers' views of the context of use. These checklists will propose only applicable questions, arranged according to their level of importance. A default order of importance is suggested, but it can be modified by the inspectors in light of the characteristics of the current web
site site's context of use. The inspectors' judgments will also be supported by the Ergonomic judgment support tool, which will supply them with data about the context of use as well as information about the questions. In order to validate the ErgoCoIn approach, we are developing a tool that follows the general architecture presented in Figure 2. The validation strategy consists of employing this tool to support different inspectors in accomplishing inspections of different web sites, and of analyzing measures of the effectiveness and efficiency of their actions as well as the homogeneity of their results.
Fig. 2. Overview of the Logical Architecture of the ErgoCoIn Environment
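To make the Checklist builder's applicability-and-weighting logic concrete, here is a minimal sketch; the question records, tags, and criterion weights are invented for illustration and do not reflect ErgoCoIn's actual knowledge base:

```python
# Hypothetical sketch of the Checklist builder logic described above:
# keep only questions whose context-of-use tags match the site at hand,
# then rank them by the importance assigned to their ergonomic criterion.
questions = [
    {"text": "Are data sources cited?", "criterion": "Guidance",
     "tags": {"e-commerce", "results-page"}},
    {"text": "Can the user undo a purchase step?", "criterion": "User control",
     "tags": {"e-commerce", "checkout"}},
]
criterion_weight = {"User control": 3, "Guidance": 2}  # inspector-adjustable ranking

def build_checklist(site_tags):
    applicable = [q for q in questions if q["tags"] & site_tags]
    return sorted(applicable, key=lambda q: -criterion_weight[q["criterion"]])

for q in build_checklist({"e-commerce", "results-page", "checkout"}):
    print(q["criterion"], "-", q["text"])
```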
Based on the ErgoCoIn logical architecture, we have modeled data entities and created Entity-Relationship Models. We have also designed a Use-Case Map as well as Sequence Diagrams for the main tasks. Figure 3 presents the Use Case Diagram for several registering tasks. Interactions for registering almost all kinds of data defined in the Entity-Relationship Model were designed according to the CDU (Create, Update & Delete) model. They include the registering of inspectors, users, designers, web sites, tasks, web pages, interface components, ergonomic criteria, and questions, among other entities (see Figure 4). In doing so, we ensure that interactions are quite homogeneous across the tool's interface. An exception is the interaction aimed at changing the relative importance of the ergonomic criteria (see Figure 5). The first cycle of ErgoCoIn's implementation took place immediately after the conclusion of the design activities mentioned above. The first prototype is mainly concerned with ergonomic inspections; this version features a total of 182 registered questions linked to the 18 properly ranked Ergonomic Criteria.
Fig. 3. Use Case Diagram for the ErgoCoIn Validation Tool
Fig. 4. ErgoCoIn’s Users Storing Screen
The next step of development will focus on the functions supporting the activities of the other phases: Co-Description (screens concerned with the user and designer questionnaires) and Inspection Reports (see Figure 1). The Ergonomic judgment support tool will be developed in the future as well. Once the tool is completed, we will begin cycles of validation studies focusing not only on the tool but also on the underlying approach. These cycles will consist of phases of (i) planning, (ii) inspection execution, (iii) results analysis, and (iv) revision proposals. In each cycle, a number of inspectors will be invited to use the tool to perform inspections of a given e-commerce web site. The results from all inspectors, as well as the logs of their actions, will be gathered and analyzed from the homogeneity and objectiveness points of view [3]. The goal behind the revision proposals is to make inspections more objective and reports more coherent. Validation cycles will be repeated until the expected objectiveness and homogeneity criteria have been reached.
The inspection cycles will allow us to better understand how ergonomic inspections of web sites are accomplished, and to specify a tool especially fitted to those tasks. Indeed, we intend to specify an ErgoCoIn user interface able to support inspectors all over the world in performing ergonomic inspections of web sites from different domains, not only e-commerce. The idea is to offer the tool to those who want to perform inspections and who want to contribute to the enrichment of the ErgoCoIn knowledge base and programming code.
4 The Wiki-ErgoCoIn
We propose to change the scope of the ErgoCoIn development in order to support a collaborative initiative. Indeed, this kind of initiative is among the most interesting phenomena observed in the recent history of the web. Collaboration is enabled by special functions offered by web sites that allow users to create, share, and organize content by themselves. The best examples of socially constructed web sites are Facebook, YouTube, Flickr, Digg, del.icio.us, and Wikipedia. In particular, Wikipedia is the most successful example of collaboration concerning scientific content on the web. This socially constructed encyclopedia features remarkable internet traffic, being the 9th most visited web site on the whole Web. From 2001 to now, 7.2 million articles have been posted on Wikipedia, produced by 7.04 million editors following style and ethics rules [16]. Wilkinson and Huberman [17] performed a study of 52.2 million edits to 1.5 million articles in the English-language Wikipedia posted by 4.79 million contributors between 2001 and 2006. They split out a group of 1,211 "featured articles," whose accuracy, neutrality, completeness, and style are assured by Wikipedia editors. Comparisons between featured and normal articles showed a strong correlation among article quality, the number of edits, and the number of distinct editors. In the same study, the authors could associate the attractiveness of the articles (number of visits) with the novelty of edits.
The goal of making ErgoCoIn a collaborative web initiative is to increase the generality and attractiveness of its contents as well as the quality of the results the approach can afford. Indeed, the Wiki-ErgoCoIn is being designed to allow ergonomic inspectors all over the world to share efforts and responsibilities concerning the extension and generalization of the ErgoCoIn knowledge base. In doing so, we can expect that the Wiki-ErgoCoIn will always feature newly proposed questions concerning the ergonomics of web sites from different application domains, interface styles, and components. Contributions should fulfill a basic requirement: follow free-content collaboration rules like those developed by Wikipedia. We believe that the results obtained by such a cooperative approach can be much more efficient and reliable than those that would be obtained solely by individual initiatives.
5 Conclusions
ErgoCoIn is an inspection approach strongly based on knowledge about the ergonomics of web site user interfaces. This knowledge is intended to guide inspectors in undertaking contextual data gathering and analysis, checklist-based inspections, and reporting. In this paper we described the details of this approach and the environment designed to support it. We also introduced the tool under development to validate its structure and contents. We will perform the validation activities in cycles of application, analysis, and revision until the approach reaches the expected objectiveness and homogeneity goals. The success of the ErgoCoIn initiative, however, depends basically on the variety and the novelty of its knowledge. At present, the approach is tied to the ergonomics of current e-commerce web applications and interface technologies, styles, and components. All these aspects evolve continuously, and e-commerce alone may be a very limited scope. Consequently, actions must be undertaken to face the challenge of continuously keeping ErgoCoIn's contents up to date and varied enough to support the production of inspection reports in different web site domains. An open initiative is being proposed by which anybody knowledgeable will be authorized to contribute to the enrichment of the Wiki-ErgoCoIn knowledge base. Consultative and executive boards will be created to define strategies and policies concerning the implementation of this ergonomics inspection wiki. Requests for participation are planned to be addressed directly to the authors.
References
1. Brajnik, G.: Automatic Web Usability Evaluation: What Needs to be Done? In: 6th Conference on Human Factors and the Web, Austin, Texas, USA (2000)
2. Cybis, W.A., Scapin, D., Andres, D.P.: Especificação de Método de Avaliação Ergonômica de Usabilidade para Sites/Web de Comércio Eletrônico. In: Proceedings of the 3rd Workshop on Human Factors in Computer Systems, Gramado, vol. I, pp. 54–63. Sociedade Brasileira de Computação, Porto Alegre (2000)
3. Cybis, W.A., Tambascia, C.A., Dyck, A.F., Villas Boas, A.L.C., Pagliuso, P.B.B., Freitas, M., Oliveira, R.: Abordagem para o desenvolvimento de listas de verificação de usabilidade sistemáticas e produtivas. In: Proceedings of the Latin American Congress on Human-Computer Interaction, Rio de Janeiro, vol. I, pp. 29–40 (2003)
4. Cybis, W.A.: UseMonitor: suivre l'évolution de l'utilisabilité des sites web à partir de l'analyse des fichiers de journalisation. In: Actes de la 18eme Conférence Francophone sur l'Interaction Humain-Machine, Montréal, vol. 1, pp. 295–296. ACM, New York (2006)
5. Cybis, W.A.: ErgoManager: a UIMS for monitoring and revising user interfaces for Web sites. Institut National de Recherche en Informatique et en Automatique, Rocquencourt, Research report (2005)
6. Cybis, W.A., Betiol, A., Faust, R.: Ergonomia e usabilidade: conhecimentos, métodos e aplicações. Novatec Editora, São Paulo (2007)
7. Farenc, C., Bastide, R.: Towards Automated Testing of Web Usability Guidelines. In: Tools for Working with Guidelines, pp. 293–304. Springer, London (2001)
8. ISO/DIS 9126: Software Engineering – Product Quality – Part 1: Quality Model. International Organization for Standardization (1997)
9. ISO/DIS 9241: Ergonomic Requirements for Office Work with Visual Display Terminals – Part 11: Guidance on Usability. International Organization for Standardization (1997)
10. Ivory, M.Y., Hearst, M.A.: The State of the Art in Automating Usability Evaluation of User Interfaces. ACM Computing Surveys 33(4) (December 2001)
11. Leulier, C., Bastien, J.M.C., Scapin, D.L.: Compilation of Ergonomic Guidelines for the Design and Evaluation of Web Sites. Commerce & Interaction (EP 22287), INRIA Report (1998)
12. Molich, R., Bevan, N., Curson, I., Butler, S., Kindlund, E., Miller, D., Kirakowski, J.: Comparative Evaluation of Usability Tests. In: Proceedings of the Usability Professionals' Association Conference (1998)
13. Scapin, D.L., Bastien, J.M.C.: Ergonomic Criteria for Evaluating the Ergonomic Quality of Interactive Systems. Behaviour and Information Technology 16(4/5) (1997)
14. Scapin, D.L., Leulier, C., Vanderdonckt, J., Mariage, C., Bastien, C., Palanque, P., Farenc, C., Bastide, R.: Towards Automated Testing of Web Usability Guidelines. In: Tools for Working with Guidelines, pp. 293–304. Springer, London (2001)
15. WAMMI: Website Analysis and MeasureMent Inventory (Web Usability Questionnaire) (2005), http://www.ucc.ie/hfrg/questionnaires/wammi (accessed 2009)
16. Wikipedia, http://www.wikipedia.org (accessed February 2009)
17. Wilkinson, D., Huberman, B.: Assessing the value of cooperation in Wikipedia. First Monday 12(4) (2007), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1763/1643
Mobile Phone Usability Questionnaire (MPUQ) and Automated Usability Evaluation
Young Sam Ryu
Ingram School of Engineering, Texas State University-San Marcos,
601 University Drive, San Marcos, TX 78666, USA
yryu@txstate.edu
Abstract. The mobile phone has become one of the most popular products amongst today’s consumers. The Mobile Phone Usability Questionnaire (MPUQ) was developed to provide an effective subjective usability measurement tool, tailored specifically to the mobile phone. Progress is being made in the HCI research community towards automating some aspects of the usability evaluation process. Given that this effort is gaining traction, a tool for measurement of subjective usability, such as MPUQ, may serve as a complement to automated evaluation methods by providing user-centered values and emotional aspects of the product. Furthermore, experimental comparison of MPUQ assessments and automated usability analysis may enable researchers to determine whether automated usability tools generate metrics that correlate with user impressions of usability. Keywords: Usability, mobile user interface, subjective measurement, questionnaire, automating usability.
and 115 applicable to PDA/handheld PCs) were retained from an original pool of 512 items. To increase the reliability and validity of this draft questionnaire, follow-up studies employing psychometric theory and scaling procedures were performed. To evaluate the items, the draft questionnaire was administered to a representative sample of approximately 300 participants. The findings revealed a six-factor structure: (1) Ease of learning and use, (2) Assistance with operation and problem solving, (3) Emotional aspect and multimedia capabilities, (4) Commands and minimal memory load, (5) Efficiency and control, and (6) Typical tasks for mobile phones. The 72 items with the greatest discriminative power relating to these factors were chosen for inclusion in the Mobile Phone Usability Questionnaire (MPUQ), which evaluates mobile phones for the purpose of making decisions among competing variations in the end-user market, among alternative prototypes during the development process, or among evolving versions during an iterative design process.

Table 1. Development procedure of MPUQ
| Phase | Goal | Approach |
| I | Generate and judge measurement items for the usability questionnaire for electronic mobile products | Consider construct definition and content domain to develop the questionnaire for the evaluation of electronic mobile products, based on an extensive literature review: • Generate potential questionnaire items based on essential usability attributes and dimensions for mobile phones • Judge items by consulting a group of experts and users, focusing on the content and face validity of the items |
| II | Design and conduct studies to develop and refine the questionnaire | Administer the questionnaire to collect data in order to refine the items by: • Conducting item analysis via factor analysis • Testing reliability using the alpha coefficient • Testing construct validity using known-group validity |
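Phase II's reliability check uses the alpha coefficient. As a minimal sketch, assuming made-up Likert responses (rows are respondents, columns are items), Cronbach's alpha can be computed as follows; this is an illustration, not the study's actual analysis:

```python
# Minimal sketch of a Phase II reliability check (Cronbach's alpha),
# using invented Likert responses: rows = respondents, columns = items.
import numpy as np

def cronbach_alpha(scores):
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                       # number of items
    item_vars = scores.var(axis=0, ddof=1)    # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

responses = [[5, 4, 4, 5], [3, 3, 2, 3], [4, 4, 5, 4], [2, 3, 2, 2]]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```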
2 Automated Usability Evaluation and MPUQ
Subjective usability measurements focus on an individual's personal experience with a product or system. According to Ivory and Hearst [5], automation of usability evaluation does not capture important qualitative and subjective information. However, it is not yet known whether subjective impressions of usability are in fact correlated with the metrics that automated usability approaches can capture. By conducting a subjective usability evaluation, using a questionnaire, of the same interface that has been modeled with an automated usability prediction tool such as CogTool [6], we can perhaps determine whether a metric such as time taken to complete tasks correlates with subjective impressions of usability. One of the greatest advantages of using questionnaires in usability research is that they can quickly and economically provide evaluators with feedback from the users' point of view [7-9]. Since user-centered and participatory design is one
of the most important aspects of the usability engineering process [10], questionnaires, applied with or without other more ambitious methods, can be a valuable tool, provided that the respondents are validated as representative of the whole user population. There are many usability aspects or dimensions for which no established objective measurements exist; these may only be measured by subjective assessment. New usability concepts suggested for the evaluation of consumer electronic products, such as attractiveness [11], emotional usability [12], sensuality [13], and pleasure and displeasure in product use [14], seem to be quantified effectively only by subjective assessment, and these concepts are increasingly important. The MPUQ incorporates these dimensions; most of them fall under factor (3), Emotional aspect and multimedia capabilities. While items in the other factor groups can be covered by other usability evaluation methods, the emotional aspects cannot presently be captured by any practical approach other than subjective measurement.
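As a rough sketch of the comparison proposed in this section, the snippet below correlates per-phone MPUQ scores with task times predicted by an automated tool such as CogTool; all values and variable names are invented for illustration:

    # Hypothetical sketch of the proposed comparison: correlate subjective
    # MPUQ scores with automated task-time predictions. A strong negative
    # correlation would suggest that faster predicted tasks go with better
    # subjective usability ratings. All numbers are invented.
    from scipy import stats

    mpuq_scores = [4.2, 3.1, 5.5, 2.8, 4.9, 3.7]            # mean MPUQ rating per phone
    predicted_times = [41.0, 58.5, 33.2, 63.1, 38.7, 50.4]  # modeled task time (s)

    r, p = stats.pearsonr(mpuq_scores, predicted_times)
    print(f"r = {r:.2f}, p = {p:.3f}")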
References

1. Väänänen-Vainio-Mattila, K., Ruuska, S.: Designing Mobile Phones and Communicators for Consumers' Needs at Nokia. In: Bergman, E. (ed.) Information Appliances and Beyond: Interaction Design for Consumer Products, pp. 169–204. Morgan Kaufmann, San Francisco (2000)
2. Sacher, H., Loudon, G.: Uncovering the new wireless interaction paradigm. ACM Interactions Magazine 9(1), 17–23 (2002)
3. Ketola, P.: Integrating Usability with Concurrent Engineering in Mobile Phone Development. Tampereen yliopisto (2002)
4. PrintOnDemand: Popularity of Mobile Devices Growing (2003), http://www.printondemand.com/MT/archives/002021.html (cited February 5, 2003)
5. Ivory, M.Y., Hearst, M.A.: The state of the art in automating usability evaluation of user interfaces. ACM Comput. Surv. 33(4), 470–516 (2001)
6. John, B.E., et al.: Predictive human performance modeling made easy. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2004. ACM, New York (2004)
7. Kirakowski, J.: Questionnaires in Usability Engineering: A List of Frequently Asked Questions [HTML] (2003) (cited November 26, 2003)
8. Annett, J.: Target paper. Subjective rating scales: science or art? Ergonomics 45(14), 966–987 (2002)
9. Baber, C.: Subjective evaluation of usability. Ergonomics 45(14), 1021–1025 (2002)
10. Keinonen, T.: One-dimensional usability – Influence of usability on consumers' product preference. University of Art and Design Helsinki, UIAH A21 (1998)
11. Caplan, S.H.: Making Usability a Kodak Product Differentiator. In: Wiklund, M. (ed.) Usability in Practice: How Companies Develop User-Friendly Products, pp. 21–58. Academic Press, Boston (1994)
12. Logan, R.J.: Behavioral and emotional usability: Thomson Consumer Electronics. In: Wiklund, M. (ed.) Usability in Practice: How Companies Develop User-Friendly Products, pp. 59–82. Academic Press, Boston (1994)
13. Hofmeester, G.H., Kemp, J.A.M., Blankendaal, A.C.M.: Sensuality in product design: a structured approach. In: CHI 1996 Conference (1996)
14. Jordan, P.W.: Human factors for pleasure in product use. Applied Ergonomics 29(1), 25–33 (1998)
Estimating Productivity: Composite Operators for Keystroke Level Modeling Jeff Sauro Oracle, 1 Technology Way, Denver, CO 80237 jeff@measuringusability.com
Abstract. Task time is a measure of productivity in an interface. Keystroke Level Modeling (KLM) can predict experienced user task time to within 10 to 30% of actual times. One of the biggest constraints to implementing KLM is the tedious aspect of estimating the low-level motor and cognitive actions of the users. The method proposed here combines common actions in applications into high-level operators (composite operators) that represent the average error-free time (e.g. to click on a button, select from a drop-down, type into a text-box). The combined operators dramatically reduce the amount of time and error in building an estimate of productivity. An empirical test of 26 users across two enterprise web-applications found this method to estimate the mean observed time to within 10%. The composite operators lend themselves to use by designers and product developers early in development without the need for different prototyping environments or tedious calculations.
way to estimate time-on-task benchmarks and to inform designers about the productivity of their designs as early as possible during product development.

1.2 Cognitive Modeling

Rather than observing and measuring actual users completing tasks, another approach for estimating productivity is cognitive modeling. Cognitive modeling is an analytic technique (as opposed to the empirical technique of usability testing). It estimates task completion time from generalized estimates of the low-level motor operations. Breaking up the task that a user performs into millisecond-level operations permits the estimation of task completion times for experienced users completing error-free trials. The most familiar of these cognitive modeling techniques is GOMS (Goals, Operators, Methods and Selection rules), first described in the 1970s in research conducted at Xerox PARC and Carnegie Mellon and documented in the still highly referenced text The Psychology of Human Computer Interaction by Card, Moran and Newell (1983) [1]. GOMS itself represents a family of techniques, the most familiar of which is Keystroke Level Modeling (KLM). In its simplest form, a usability analyst can estimate user actions using KLM with only a few operators (pointing, clicking, typing and thinking); see [2], p. 72, for a simple introduction.

KLM, probably because of its simplicity, has enjoyed the most usage by practitioners. It has been shown to estimate error-free task completion time to within 10 to 30% of actual times. These estimates can be made from either live working products or prototypes. It has been tested on many applications and domains such as maps, PDAs, and database applications [3][4][5][6][7][8][9]. One major disadvantage of KLM is the tedious nature of estimating time at the millisecond level. Even tasks which take a user only two to three minutes to complete are composed of several hundred operators. One must remain vigilant in making these estimates: changes are inevitable, and errors arise from forgetting operations (Bonnie John, personal communication, October 12, 2008). In our experience, two- to three-minute tasks took one to two hours to create the initial model in Excel, then an additional hour for making changes.

1.3 Software to Model KLM Operators: CogTool

A better way of building the estimates comes from a software tool called CogTool, built and maintained at Carnegie Mellon [10]. CogTool itself is the result of dissatisfaction with manual GOMS estimating [7]. CogTool is free to download and, after some familiarity, can be a powerful and certainly more accurate cognitive modeling tool than hand-tracked estimates. CogTool builds the task time estimates by having the analyst provide screenshots or graphics from the application and then define each object the users interact with (e.g., a button, a drop-down list, etc.). There is a bit of overhead in defining all the objects and defining the sequence of steps the users take during a task. Once completed, however, CogTool provides an easy way to get updated estimates of the productivity of a task. User-interface designers can actually do the prototyping within CogTool, and this in fact exploits the tool's functionality, since changes made within the prototyping environment will immediately lead to a new task-time estimate. If prototyping is done in another environment (which it is in our
organization), then the analyst will need to import, define, and update the objects and task flows for each change made.

1.4 Consolidating the Operators

Our organization has a rather complicated infrastructure of prototyping tools for designers, so shifting our prototyping efforts into CogTool, while possible, would be a large undertaking surely met with resistance. We wanted a method to create estimates using KLM that, like CogTool, automated the tedious estimation process, while allowing designers to generate prototypes in whatever environment they preferred. Many requests for productivity data come from the Marketing and Strategy teams, who can use this information to support sales. We also wanted a method by which we could allow product managers and product strategists to generate their own estimates with little involvement from the usability team.

1.5 Looking to Industrial Engineering

Some of the inspiration for GOMS (see [1], p. 274) came from work-measurement systems in Industrial Engineering, which began in the early 1900s (e.g., Frederick Taylor) and evolved into systems like MTM (Methods-Time Measurement; see [11]). Just like GOMS, these systems decompose work into smaller units and use standardized times based on detailed studies. These estimating systems evolved (MTM-2, MTM-C, MTM-V, etc.) to reflect different domains of work and more sophisticated estimates. Generating task times with these systems, while accurate, is often time consuming. A modification was proposed by Zandin [12], called the Maynard Operation Sequence Technique (MOST). MOST, also based on the MTM system, uses larger blocks of fundamental motions. Using MOST, analysts can create estimates five times faster than with MTM without loss of accuracy [13]. Similar to the MOST technique, we wanted to describe user actions at a higher level of work. Instead of building estimates at the level of hand motions and mouse clicks, we wanted to estimate at the level of drop-down selections and button clicks. Each of these operations is still composed of the granular Card, Moran, and Newell operators, but the low-level details, which caused the errors and were time consuming, could be concealed from analysts.
2 Method

To refine the KLM technique to a higher level of abstraction, we first wanted to see if these higher-level composite operators could predict task times as well as the low-level operators. We used the following approach:

1. KLM Estimation: Estimate task times using the KLM technique with low-level operators for a sequence of tasks.
2. Generate Composite Operators: Generate an estimate of the task times for the same tasks using the composite operators by identifying larger operational functions.
3. Empirically Validate: Validate the new composite operators by testing users completing the same tasks repeatedly.
4. Refine Estimates: Use empirical data to refine composite estimates (such as updating the system response time) and modify the mental operators to account for concurrent processing.

2.1 KLM Estimation

Using the method defined in [1] and [5], we estimated the times. For example, the operators for the initial operations of the task "Create an Expense Report" are:

1. M: Mental Operation: User decides where to click (1.350 s)
2. H: Home: User moves hand to mouse (0.350 s)
3. P: Point: User locates the Create Expense Report link target (1.100 s)
4. K: Key: User clicks on the link (0.250 s)
5. R: System response time as new page loads (0.750 s)
The system response time was updated based on taking some samples from the applications.

2.2 Generate Composite Operators

Using the granular steps from above, the logical composite operator is clicking on a link, so the five steps above are replaced with: Click on Link/Button. The time to complete this operation is modeled as 1.350 + 0.350 + 1.100 + 0.250 + 0.750 = approximately 3.8 seconds. This process was repeated for all steps in the 10 tasks. While not a complete list, we found that a small number of composite operators was able to account for almost all user actions in the 10 tasks across the two web applications. The most commonly used actions are listed below (a small sketch of the underlying summation follows the list):

1. Click a Link/Button
2. Typing Text in a Text Field
3. Pull-Down List (No Page Load)
4. Pull-Down List (Page Load)
5. Date-Picker
6. Cut & Paste (Keyboard)
7. Scrolling
8. Select a Radio Button
9. Select a Check-Box
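A minimal sketch of this aggregation, using the operator times quoted in Section 2.1 (the dictionary and function names are illustrative, not from the paper):

    # Build a composite operator by summing the granular Card, Moran, and
    # Newell operator times quoted above. Names are illustrative only.
    GRANULAR = {
        "M": 1.350,  # Mental operation: decide where to click
        "H": 0.350,  # Home: move hand to the mouse
        "P": 1.100,  # Point: locate the target
        "K": 0.250,  # Key: click
        "R": 0.750,  # System response time (sampled from the application)
    }

    def composite_time(sequence):
        """Sum the granular operator times for one composite operator."""
        return sum(GRANULAR[op] for op in sequence)

    # "Click a Link/Button" = M + H + P + K + R, approximately 3.8 seconds
    print(round(composite_time("MHPKR"), 2))  # -> 3.8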
2.3 Empirical Validation

We tested 26 users on two enterprise web-based applications (hereafter Product O and Product P). The products were two released versions of a similar travel and expense reporting application, allowing users to perform the same five tasks. The participants regularly submitted reports for travel and expenses and were experienced computer users. Ten of the participants had never used either of the applications, while 16 of them had used both. To reduce the learning time and to provide a more stable estimate of each operator, each participant was shown a slide show demonstration of how to perform each task. This also dictated the path the user should take through the software. They then attempted the task.
The participants were not asked to think aloud. They were told that we would be recording their task times, but that they should not hurry; rather, they should work at a steady pace, as they would when creating reports at work. If they made an error on a task, we asked them to repeat the task immediately. To minimize carry-over effects, we counterbalanced the application and task order. We had each participant attempt the five tasks three times on both systems. The training was shown to them only prior to their first attempt. From the 30 task attempts (5 tasks × 2 systems × 3 trials = 30), we had hundreds of opportunities to measure the time users took to click the dozens of buttons and links, make drop-down selections, and type into text boxes. These applications were selected because they appeared to provide a range of usable and unusable tasks and exposed the user to most of the interface objects they would likely encounter in a web application.

The goal of this test setup was to mimic the verification methods Card, Moran, and Newell used in generating their granular estimates. They, however, had users perform actions hundreds of times. Comparatively, our estimates were more crudely defined. We intended to test the feasibility of this concept and were most interested in the final estimate of the task time as a metric for the accuracy of the model.

2.4 Concurrent Validation

When estimating with KLM, one typically does not have access to user data on the tasks being estimated. It is necessary to make assumptions about the system response time and the amount of parallel processing a user does while executing a sequence of actions. System response time understandably will vary by system and is affected by many factors; substituting a reasonable estimate is usually sufficient for estimating productivity. In estimating parallel processing, there are some general heuristics ([2], p. 77), but these will also vary with the system. For example, as users become more proficient with a task, they are able to decide where to click and move the mouse simultaneously. The result is that the time spent on mental operators is reduced or removed entirely from the estimate. In the absence of data, one uses the best estimate or the heuristics.

Because our goal was to match the time of users and we had access to the system, we needed to refine the operators with better estimates of actual system response time and of the parallel processing. To do so, we measured to the hundredth of a second the time it took users to complete the composite operations (e.g., clicking a button, selecting from a pull-down list) as well as waiting for the system to respond. We adjusted the composite operators' total time by reducing the time spent on mental operations, in some cases eliminating them entirely (see also [14] for a discussion of this approach). The final empirically refined estimates appear in Table 1 below.

Table 1. Composite Operators and the refined time from user times
Composite Operator               Refined Time (seconds)
Click a Link/Button              3.73
Pull-Down List (No Page Load)    3.04
Pull-Down List (Page Load)       3.96
Date-Picker                      6.81
Cut & Paste (Keyboard)           4.51
Typing Text in a Text Field      2.32
Scrolling                        3.96
Some of the operators need explanation. The Date-Picker operator will vary depending on the way the dates are presented. The Cut & Paste (Keyboard) option includes the time for a user to highlight the text, select CTRL-C, home in on the new location, and paste (CTRL-V); the estimate would be different if using context menus or the web-browser menu. Typing Text in a Text Field only represents the overhead of homing in on a text field, placing the cursor in the text field, and moving the hands to the keyboard; the total time depends on the length and type of characters entered (230 ms each). Finally, the refined times above contain a system response time which will vary with each system. That is, it is unlikely that clicking a button and waiting for the next page to display will always take 3.73 seconds. Future research will address the universality of these estimates across more applications.
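Putting the refined values to work, the following sketch estimates a whole task by summing its composite operators; the task steps shown are hypothetical, not one of the paper's ten tasks, and the per-character typing cost is the 230 ms noted above:

    # Estimate a task with the empirically refined composite operators of
    # Table 1. The example steps are hypothetical.
    COMPOSITE = {                       # refined times in seconds (Table 1)
        "click_link_button": 3.73,
        "pulldown_no_load": 3.04,
        "pulldown_page_load": 3.96,
        "date_picker": 6.81,
        "cut_paste_keyboard": 4.51,
        "text_field_overhead": 2.32,    # homing/cursor placement only
        "scrolling": 3.96,
    }
    PER_CHAR = 0.230                    # typing: 230 ms per character entered

    def estimate_task(steps):
        """steps: list of (operator_name, characters_typed) tuples."""
        return sum(COMPOSITE[op] + chars * PER_CHAR for op, chars in steps)

    steps = [("click_link_button", 0),
             ("text_field_overhead", 12),   # type a 12-character description
             ("pulldown_page_load", 0),
             ("date_picker", 0)]
    print(f"{estimate_task(steps):.1f} s")  # -> 19.6 s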
3 Results and Discussion

Table 2 below shows the results of the KLM estimates using the "classic" Card, Moran, and Newell operators and the new composite operators for all 10 tasks. Both the number of operators used and the total task times are shown. (A short sketch reproducing the comparisons discussed below follows the table.)

Table 2. Comparison between Classic and Composite KLM Time & Operators

                                         Classic KLM                 Composite KLM
Product  Task                        # of Operators  Time (sec)  # of Operators  Time (sec)
O        Create Meeting Rprt         81              62          23              98
O        Update a Saved Rprt         51              52          21              46
O        Edit User Preference        43              26          15              35
O        Find an Approved Rprt       32              18          6               26
O        Create Customer Visit Rprt  149             88          32              55
P        Create Meeting Rprt         169             134         36              156
P        Update a Saved Rprt         93              74          21              82
P        Edit User Preference        65              46          13              60
P        Find an Approved Rprt       48              31          11              43
P        Create Customer Visit Rprt  131             118         23              111
         Mean                        86.2            64.9        20.1            71.2
         SD                          48.1            38.9        9.3             40.5
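The comparisons discussed below can be sketched with standard routines. The sketch copies the task-time and operator-count columns from Table 2 and uses SciPy's paired t-test and Pearson correlation; the paper does not specify its exact test variants, so the printed statistics approximate rather than reproduce the reported values:

    # Sketch of the classic-vs-composite comparisons using the Table 2
    # columns. SciPy routines stand in for the original analysis.
    from scipy import stats

    classic_time   = [62, 52, 26, 18, 88, 134, 74, 46, 31, 118]   # seconds
    composite_time = [98, 46, 35, 26, 55, 156, 82, 60, 43, 111]
    classic_ops    = [81, 51, 43, 32, 149, 169, 93, 65, 48, 131]
    composite_ops  = [23, 21, 15, 6, 32, 36, 21, 13, 11, 23]

    t, p = stats.ttest_rel(classic_time, composite_time)        # mean-time difference
    r, _ = stats.pearsonr(classic_time, composite_time)         # agreement of estimates
    t_ops, p_ops = stats.ttest_rel(classic_ops, composite_ops)  # operator counts

    print(f"time: t = {t:.2f}, p = {p:.2f}; r = {r:.2f}")       # r is about .89
    print(f"operators: t = {t_ops:.2f}, p = {p_ops:.4f}")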
The data in Table 2 show a difference of six seconds between the composite and classic KLM estimates of the mean task completion time, but this difference is not significant [t(17) = .727, p > .7]. The correlation in task time estimates between the two methods is strong and significant (r = .891, p < .01). The average number of operators used
per task differed substantially: 66 (86.2 vs. 20.1), representing a 75% reduction. This difference was significant [t(9) = 4.27, p < .01]. This reduction in the number of operators per task suggests estimates can be made four times faster using composite operators.

3.1 Do the Applications Differ in Their Composite KLM Times?

Next we used the composite operators to estimate which product had better productivity (allowed users to complete the tasks faster), as this would be one of the primary aims of estimating productivity. Table 3 shows the average of the KLM times for the sum of the operations for the five tasks for both applications.

Table 3. KLM Composite Estimates between applications

Task                          Product P (sec)  Product O (sec)  Diff. (sec)  % Diff.
Create Meeting Report         156              98               58           37
Update a Saved Report         82               46               36           44
Edit User Preference          60               35               25           42
Find an Approved Report       43               26               17           40
Create Customer Visit Report  111              55               56           50
Average                       90               52               38           42
Table 3 above shows the KLM composite estimates predict Product O to be approximately 42% more productive ((90 − 52)/90) than Product P. To validate these estimates, we used the 3rd error-free completed task from each user for the empirical estimates. Table 4 below shows the means and standard deviations for both products.

Table 4. Mean Task Times in Seconds for All Participants for Their Last Trial (Completed & Error-Free Attempts Only)

#   Task                    Prod. P (SD)  Prod. O (SD)  Diff.  n   % Diff.  t     p-value
1   Create Meeting Rpt      157 (24)      105 (14)      52     16  33       9.5   <.001
2   Update a Saved Rpt      81 (19)       54 (9)        26     13  32       6.1   <.001
3   Edit User Preference    52 (13)       34 (6)        18     15  35       5.0   <.001
4   Find an Approved Rpt    38 (10)       33 (11)       5      18  13       1.6   >.12
5   Create Cust. Visit Rpt  123 (19)      61 (14)       62     15  50       11.9  <.001
    Ave                     89 (49)       57 (29)       32     15  36
The third error-free trial data show the Product O application to be approximately 36% more productive ((89 − 57)/89) than Product P. This difference represents an error of 14% ((.42 − .36)/.42). The estimates of 89 seconds and 57 seconds represent errors of 1% and 9%, respectively. In assessing the accuracy of these estimates, we are using the mean time from a set of users, which is itself an estimate of the unknown population mean time of all users. There is therefore error around our estimate, which varies with the standard deviation and sample size of the task (just as in any usability test that uses a sample of users to estimate the unknown mean time). Some tasks have fewer users since not all of the 26 users were able to complete the third trial on both systems without error. The means and 95% confidence intervals around the empirical estimates are shown in Figure 1 below. Also on the graph are the predicted KLM estimates using both the classic and composite methods.
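Before turning to the figure, the percentage comparisons above are simple enough to verify directly; this worked check copies the averages from Tables 3 (predicted) and 4 (observed):

    # Worked check of the productivity comparison, using the average task
    # times from Tables 3 and 4.
    predicted_p, predicted_o = 90, 52    # composite KLM estimates (s)
    observed_p, observed_o = 89, 57      # 3rd error-free trial means (s)

    predicted_gain = (predicted_p - predicted_o) / predicted_p      # ~0.42
    observed_gain = (observed_p - observed_o) / observed_p          # ~0.36
    model_error = (predicted_gain - observed_gain) / predicted_gain # ~0.14

    print(f"predicted {predicted_gain:.0%}, observed {observed_gain:.0%}, "
          f"error {model_error:.0%}")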
[Figure omitted: plot of 3rd-trial error-free times by task (Tasks 1-5, Products O and P), showing 95% confidence intervals for the mean against task time in seconds (20-180 s).]

Fig. 1. Means and 95% Confidence Intervals for 3rd Error-Free Trial by Task and Product (blue circles and error bars). The black triangles are the Composite estimates and the black squares are the granular "classic" estimates.
Figure 1 shows visually the variability in the users' mean times. When a KLM estimate is within the range of the blue error bars, there is not a significant difference between the KLM estimate and the likely population mean time. For example, both KLM estimates are less accurate on Task 1 (especially the classic KLM estimate), as both fall outside the range of the error bars. On Task 2 the estimates are more accurate, as three of the four KLM estimates are within the likely range of the actual user time.
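The "within the error bars" judgment can be sketched as a t-based 95% confidence interval for a mean task time; the sample values and the KLM estimate below are hypothetical:

    # Sketch of the confidence-interval logic used to judge a KLM estimate
    # against observed mean times. All sample values are hypothetical.
    import numpy as np
    from scipy import stats

    times = np.array([150, 162, 149, 171, 158, 144, 165, 155])  # one task (s)
    mean = times.mean()
    half_width = stats.t.ppf(0.975, len(times) - 1) * stats.sem(times)
    lo, hi = mean - half_width, mean + half_width

    klm_estimate = 148.0  # a hypothetical composite estimate for this task
    verdict = "inside" if lo <= klm_estimate <= hi else "outside"
    print(f"95% CI: [{lo:.1f}, {hi:.1f}] s; estimate {verdict} the interval")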
3.2 Limitations While the refined times of the operators displayed in Table 1 above estimated our total task time well, actual composite times will vary with each system. A major factor in each composite operator is the system response time. For desktop applications there might be little if any latency compared to the typical network delays one gets with a web-application. For each system, an analyst should define the composite operators, which would likely include many of the ones defined here.
4 Conclusion

The data from this initial exploration into combining the granular operators into composite operators show that KLM estimates can be made four times faster with no loss in accuracy. The estimates made with the composite KLM operators are within 10% of the observed mean time of error-free tasks. Composite task times were not significantly different from those of classic KLM estimates (p > .7), and task-level times correlated strongly (r = .89). While the composite operators and their times will vary with the interface, the method of combining low-level operators into a higher grain of analysis shows promise. When productivity measures need to be taken and cognitive modeling is used as a more efficient alternative, composite operators similar to those defined here promise estimates that are faster to build and more approachable than millisecond-level KLM estimates.
References

1. Card, S., Moran, T., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale (1983)
2. Raskin, J.: The Humane Interface. Addison-Wesley, Reading (2000)
3. Baskin, J.D., John, B.E.: Comparison of GOMS analysis methods. In: CHI 1998 Conference Summary on Human Factors in Computing Systems, Los Angeles, California, April 18-23, 1998, pp. 261–262. ACM, New York (1998)
4. John, B.: Why GOMS? Interactions 2(4), 80–89 (1995)
5. Olson, J.R., Olson, G.M.: The growth of cognitive modeling in human-computer interaction since GOMS. Hum.-Comput. Interact. 5(2), 221–265 (1990)
6. Gray, W.D., John, B.E., Atwood, M.E.: Project Ernestine: A validation of GOMS for prediction and explanation of real-world task performance. Human–Computer Interaction 8(3), 207–209 (1993)
7. John, B., Prevas, K., Salvucci, D., Koedinger, K.: Predictive Human Performance Modeling Made Easy. In: Proceedings of CHI 2004, Vienna, Austria, April 24-29, 2004. ACM, New York (2004)
8. Gong, R., Kieras, D.: A validation of the GOMS model methodology in the development of a specialized, commercial software application. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 1994, Boston, Massachusetts, pp. 351–357. ACM, New York (1994)
9. Haunold, P., Kuhn, W.: A keystroke level analysis of a graphics application: manual map digitizing. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 1994, Boston, Massachusetts, pp. 337–343. ACM, New York (1994)
10. John, B.: The CogTool Project (2009), http://www.cs.cmu.edu/~bej/cogtool/ (accessed January 2009)
11. Maynard, H., Stegemerten, G., Schwab, J.: Methods-Time Measurement. McGraw-Hill, New York (1948)
12. Zandin, K.: MOST Work Measurement Systems. Marcel Dekker, New York (1980)
13. Niebel, B., Freivalds, A.: Methods, Standards, and Work Design. McGraw-Hill, New York (2004)
14. Mayhew, D.: Keystroke Level Modeling as a Cost-Justification Tool. In: Bias, R., Mayhew, D. (eds.) Cost-Justifying Usability, 2nd edn., pp. 465–488 (2004)
Paper to Electronic Questionnaires: Effects on Structured Questionnaire Forms Anna Trujillo* NASA Langley Research Center, MS 152, Hampton, VA 23681 USA anna.c.trujillo@nasa.gov
Abstract. With the use of computers, paper questionnaires are being replaced by electronic questionnaires. The formats of traditional paper questionnaires have been found to affect subjects' ratings; consequently, the transition from paper to electronic format can subtly change results. The research presented here begins to determine how electronic questionnaire formats change subjective ratings. For formats where subjects used a flow chart to arrive at their rating, starting at the worst and middle ratings of the flow chart was the most accurate, but subjects took slightly more time to arrive at their answers. Except for the electronic paper format, starting at the worst rating was the most preferred. The paper and electronic paper versions had the worst accuracy. Therefore, for flowchart-type questionnaires, flowcharts should start at the worst rating and work their way up to better ratings.

Keywords: Electronic questionnaires, Cooper-Harper controllability rating, questionnaire formats.
For this experiment, subjects used the Cooper-Harper (CH) Controllability Rating Scale [6, 7] on a control task that required them to keep a randomly moving target centered. Subjects were told that desired performance was maintaining the target in the inner portion of the screen, while adequate performance was maintaining the target in the middle portion of the screen (Fig. 1). Each rating was also described to the subjects with respect to the control task.

[Figure omitted: target-tracking display showing the moving target with the inner (desired performance) and middle (adequate performance) regions.]

Fig. 1. Target Tracking Task with Indicated Desired and Adequate Performance
1.2 Objective

The objective of this research was to determine whether electronic formats of paper questionnaires change subjects' ratings, and in particular how electronic formats may affect responses to a structured, flowchart-type questionnaire.
2 Experimental Variables

2.1 Subjects' Piloting Experience

Twenty people participated as subjects. Ten were certificated pilots holding at least a current Private Pilot license [8]; the rest were non-pilots. The average age of the pilots was 48 years, and the average age of the non-pilots was 40 years. The pilots averaged 22 years of piloting experience and an average of 7,314 hours of total piloting time.

2.2 Cooper-Harper (CH) Controllability Rating Scale Formats

Each subject saw five CH controllability rating scale formats: the standard paper format and four electronic formats. The electronic formats were: (1) electronic paper, (2) forced choice bottom, (3) forced choice middle, and (4) forced choice top.

Paper CH Format. The Paper CH format was the standard CH format [6, 7].
Electronic Paper CH Format. The Electronic Paper CH format mimicked the paper version but on a touch screen (Fig. 2). In order to choose a rating, subjects had to touch the appropriate rectangle (e.g., Major deficiencies … 8).
[Figure omitted: touch-screen rendering of the complete Cooper-Harper decision tree, showing aircraft characteristics, demands on the pilot in selected tasks, the decision questions (Is it controllable? Is adequate performance attainable with a tolerable pilot workload? Is it satisfactory without improvement?), and the pilot ratings 1-10.]

Fig. 2. Electronic Paper CH Format
Forced Choice Bottom CH Format. The Forced Choice Bottom CH format expanded depending on the choices selected by the subject. The flow chart started from the bottom (Is it controllable?) and worked its way up in ratings (Fig. 3). When the subject reached the ratings, only the ratings of the path taken were available, and the path the subject had taken to get to those ratings remained visible.

Forced Choice Middle CH Format. The Forced Choice Middle CH format also expanded depending on the choices selected by the subject. The flow chart started from the middle (Is adequate performance attainable with a tolerable pilot workload?) and worked its way up or down in ratings. As before, when the subject reached the ratings, only the ratings and their associated path were visible.

Forced Choice Top CH Format. The Forced Choice Top CH format expanded depending on the choices selected by the subject, but the flow chart started from the top (Is it satisfactory without improvement?) and worked its way down in ratings. As with the other two forced choice CH formats, when the subject reached the ratings, only the ratings of that path and the path itself were visible.
[Figure omitted: expanding flowchart that begins at the bottom question (Is it controllable?) and works upward through the decision questions to the applicable pilot ratings.]

Fig. 3. Forced Choice Bottom CH Format
2.3 Control Task Difficulty

Each subject attempted to keep a moving target centered for 1 minute using a right-handed side stick. The control task difficulty levels ranged from a CH rating of 1 to a CH rating of 10. Each scenario had a preset control task difficulty level, accomplished by linearly changing the speed of the target and the inceptor gain. A pretest was conducted to verify that the control task difficulty levels matched an operator's CH rating. The average difference between the control task difficulty level and the three subjects' CH ratings was -0.07 ± 1.4, with a median of 0. A linear regression of the data was significant (F(1,59) = 1161.58; p ≤ 0.01); the slope was 0.94 with an R² = 0.95.

2.4 Dependent Variables

The primary dependent variable was the subjects' CH ratings compared to the control task difficulty. The time taken to complete the CH ratings and the workload incurred in completing the CH ratings were also analyzed. At the end of the experiment, subjects completed a final questionnaire. This questionnaire asked subjects to rate on a continuous scale how easy the CH formats were
for rating the control task difficulty and the associated workload to complete the various CH formats. The questionnaire also asked for subject preferences, and likes and dislikes by display type.
3 Procedure

When subjects first arrived, they signed a consent form before being given a verbal briefing on the experiment tasks. Subjects then moved to the simulator, where they completed two practice runs with the first CH format. After the practice runs, subjects completed 10 data runs. During each run, subjects had to keep a randomly moving target centered for 1 minute using a right-handed side stick. They also had to indicate when a frequency changed and answer a question that required basic multiplication skills. At the end of each run, subjects completed the CH controllability rating scale and rated the workload of determining a CH controllability rating. At the end of the 10 data runs with the first CH format, subjects completed at least one practice run with the next CH format and then the 10 data runs with that format. This was repeated until subjects had seen all five CH formats. At the end of the simulation runs and questions, subjects completed the final questionnaire.

3.1 Apparatus

The simulations ran on two PCs running Windows™ XP Professional¹. These had a redraw refresh rate of 60 Hz and a graphics update rate of 30 Hz. The target tracking task was displayed on a 30-inch LCD screen in front of and slightly above the subject's eye level. The information indicating the frequency change and for answering the multiplication question was on a screen to the right of the subject. The questions were answered using a touch screen to the subject's left; the CH questionnaire was also presented on this left screen at the end of the run. These two touch screens were 19-inch LCD screens with an Elo Touchsystems IntelliTouch overlay for touch-screen capability. The side stick used was a Saitek Cyborg evo joystick; subjects used their right hand to manipulate it.

3.2 Data Analysis

Data were analyzed using SPSS® for Windows v16. In most cases, the data were analyzed using a 3-way ANOVA with CH format, control task difficulty, and pilot status (pilot vs. non-pilot) as the independent variables. To determine the accuracy of the CH formats, the control task difficulty level was subtracted from the subject's CH rating; a subject was therefore most accurate when this difference was 0 and least accurate when the absolute value of this difference was 9. Furthermore, the CH ratings were on an integer scale; in the ANOVA analysis, the CH rating was treated as a continuous scale even though it is ordinal [9]. The final questionnaire responses were on continuous 100-point scales.
¹ The use of trademarks or names of manufacturers in this report is for accurate reporting and does not constitute an official endorsement, either expressed or implied, of such products or manufacturers by the National Aeronautics and Space Administration.
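A minimal sketch of the accuracy measure defined in Section 3.2 (hypothetical ratings and difficulty levels, not the study's SPSS analysis):

    # Accuracy = subject's CH rating minus preset control task difficulty;
    # 0 is perfectly accurate, negative values indicate underestimation.
    import numpy as np

    ch_ratings = np.array([3, 5, 7, 9, 2])    # subject CH ratings (1-10)
    difficulty = np.array([4, 5, 8, 10, 3])   # preset difficulty levels

    accuracy = ch_ratings - difficulty
    print(accuracy.mean())                    # -> -0.8 (underestimation)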
4 Results

4.1 Accuracy of Subjects' CH Ratings
When subtracting the control task difficulty from subjects' CH ratings, pilot status by CH format was significant (F(4, 900) = 3.21; p ≤ 0.02) (Fig. 4). In general, both pilots and non-pilots underestimated the control task difficulty, with non-pilots underestimating it somewhat more than pilots, especially for the Forced Choice Middle and Forced Choice Top CH formats. Subjects using these two formats typically underestimated the control task difficulty by a full rating.

[Figure omitted: bar chart of mean subject CH rating minus control task difficulty (0.00 to -2.00) by CH format, split by pilot status, with SE of the mean.]

Fig. 4. Mean Subject CH Rating – Control Task Difficulty by Pilot Status and CH Format
A linear regression estimating the subjects' CH rating from the control task difficulty was performed in order to compare the effects of pilot status and CH format. As can be seen in Figure 4 and Table 1, subjects typically underestimated the control task difficulty by 15%. For pilots, the most accurate CH formats were the flowcharts, while the Forced Choice Bottom CH format was the most accurate format for non-pilots.

Table 1. Linear Regression Statistics of Estimating Subject CH Rating with Control Task Difficulty by Pilot Status and CH Format

Pilot Status  CH Format             Slope  R²
Non-Pilot     Paper                 0.80   0.86
              Electronic Paper      0.80   0.89
              Forced Choice Bottom  0.87   0.89
              Forced Choice Middle  0.84   0.87
              Forced Choice Top     0.68   0.86
Pilot         Paper                 0.82   0.91
              Electronic Paper      0.82   0.88
              Forced Choice Bottom  0.84   0.91
              Forced Choice Middle  0.85   0.89
              Forced Choice Top     0.85   0.89

4.2 Time to Complete CH Ratings

The CH format was significant for the time it took subjects to complete the CH ratings (F(4, 900) = 31.98; p ≤ 0.01) (Table 2). Not surprisingly, the Paper CH format took the longest to complete, with the Forced Choice Bottom CH format taking the second longest, probably because this format typically requires a greater number of button pushes. The other formats were not significantly different from one another.

Table 2. Time to Complete CH Rating by CH Format

CH Format             Mean (sec)  SE of the Mean
Paper                 18.27       0.58
Electronic Paper      10.34       0.71
Forced Choice Bottom  13.16       0.62
Forced Choice Middle  10.99       0.43
Forced Choice Top     10.80       0.46
4.3 Subjective Data

Subjects' preference for the CH formats depended on CH format (F(4, 87) = 2.95; p ≤ 0.03) and on pilot status by CH format (F(4, 87) = 4.36; p ≤ 0.01) (Fig. 5). In general, subjects preferred the Electronic Paper and Forced Choice Bottom CH formats.

[Figure omitted: bar chart of preference (0 = low, 100 = high) by CH format, split by pilot status, with SE of the mean.]

Fig. 5. CH Format Preference by Pilot Status and CH Format
Pilot status by CH format was also significant for subjects' reported workload in completing the CH ratings (F(4, 90) = 2.51; p ≤ 0.05) (Fig. 6). Workload for the Electronic Paper CH format was the same for both pilots and non-pilots. For pilots, the Forced Choice Bottom CH format had a slightly higher workload than the Electronic Paper CH format, but the workload was on par with the Paper version; the other two flow chart methods had even higher workloads for pilots. For non-pilots, the electronic versions of the CH format did not really affect the workload, but their workloads were lower than that of the Paper CH format. Subjects indicated that the CH format affected their ability to arrive at their desired rating (F(4, 83) = 4.26; p ≤ 0.01) (Table 3). In general, subjects felt that the Paper, Electronic Paper, and Forced Choice Bottom CH formats allowed them to arrive at an accurate CH rating.

[Figure omitted: bar chart of workload to enter the CH rating (0 = low, 100 = high), compared to the Paper CH format, by CH format and pilot status, with SE of the mean.]

Fig. 6. Workload to Enter CH Rating by Pilot Status and CH Format

Table 3. Ability to Arrive at Desired Rating by CH Format

CH Format             Mean (0 = low, 100 = high)  SE of the Mean
Paper                 65.32                       6.89
Electronic Paper      77.37                       6.42
Forced Choice Bottom  65.94                       5.09
Forced Choice Middle  48.17                       5.21
Forced Choice Top     52.41                       4.55
Additionally, subjects indicated that on the Paper version, they specifically went step by step through the flow chart only about half of the time even though they were instructed to arrive at their ratings via sequentially answering the questions in the flow
chart: specifically, 45% of the time for non-pilots and 64% of the time for pilots. This may be because the Paper and Electronic Paper CH formats allow subjects to "cut to the chase" and choose a number without going through the flow chart (Table 4).

Table 4. Subject Comments on the CH Formats

Subject Comment Categories and Example Comments                       Number
All choices are available on Paper and Electronic Paper CH formats   18
  ("like to see all options"; "easier to compare measures")
Too much information on Paper and Electronic Paper CH formats        8
  ("hard to sort all information"; "information overload")
Like the mechanics of flowcharts                                     8
  ("like flowchart with its logical sequence")
Do not like the mechanics of flowcharts                              5
  ("takes longer")
Do not like the mechanics of Paper CH formats                        9
  ("more cumbersome"; "required most time to answer")
Specific comments on where to start in flow chart                    16
  ("flow chart pulls you in the direction of where you started";
   "liked starting at the bottom because it was the worst case")
Many subjects commented that they liked having all the information available to see at once, though some said the Paper and Electronic Paper CH formats induced "information overload" because "there was too much information." Subjects who liked flowcharts said it was because they had a "logical sequence" which helped "produce a more reasoned rating." As for where to start on the flowchart, most subjects commented that they liked to start at the bottom because it was the "most intuitive" and "ask[ed] the most important question first." Comments relating to other starting points in the flowcharts indicated that the "flow logic was counter intuitive." Generally, subjects liked having all the information available to them at once, but they also felt that the flow chart formats produced a logical thought process. Of the flow chart sequences, the Forced Choice Bottom CH format had the most preferred logic sequence.
5 Discussion

Electronic questionnaires are replacing paper formats. The formats of traditional paper questionnaires have been found to affect a subject's rating; consequently, the transition from paper to electronic format can subtly change results. This research had subjects use five different formats of the CH Controllability Rating Scale, which requires respondents to give their ratings by answering questions posed in a flowchart. Results indicated that while all formats were reasonably accurate, the Electronic Paper and Forced Choice Bottom CH formats produced the most accurate ratings
while being the most preferred. In general, subjects underestimated the difficulty of the control task with all CH formats. Workload in inputting answers was somewhat higher for pilots when using the Forced Choice Bottom CH format, but it was on par with the workload of the Paper version. Subjects indicated that they went through the Paper flow chart questions only about half the time, even though they were instructed to arrive at their ratings only after answering the flow chart questions. Therefore, moving questionnaires from paper to electronic media could change respondents' answers. Specifically, the above results suggest that when using a flow chart type of questionnaire, it is best to have subjects directly answer each decision point while starting at the worst rating. Although this inflicts a slight penalty in time and workload, it ensures that subjects make decisions at each point while minimizing underestimation of the difficulty of the task.
References

1. Trujillo, A.C., Bruneau, D., Press, H.N.: Predictive Information: Status or Alert Information? In: 27th Digital Avionics Systems Conference, St. Paul, MN (2008)
2. Trujillo, A.C., Pope, A.T.: Using Simulation Speeds to Differentiate Controller Interface Concepts. In: 52nd Annual Meeting of the Human Factors and Ergonomics Society. HFES, New York (2008)
3. Noyes, J.M., Bruneau, D.P.J.: A Self-Analysis of the NASA-TLX Workload Measure. Ergonomics 50(4), 514–519 (2007)
4. Riley, D.R., Wilson, D.J.: More on Cooper-Harper Pilot Rating Variability. In: 8th Atmospheric Flight Mechanics Conference, Portland, OR (1990)
5. Wilson, D.J., Riley, D.R.: Cooper-Harper Pilot Rating Variability. In: AIAA Atmospheric Flight Mechanics Conference, Boston, MA (1989)
6. Cooper, G.E., Harper, R.P.: The Use of Pilot Rating in the Evaluation of Aircraft Handling Qualities. Technical Report 567, AGARD, p. 52 (1969)
7. Harper, R.P., Cooper, G.E.: Handling Qualities and Pilot Evaluation (Wright Brothers Lecture in Aeronautics). Journal of Guidance, Control, and Dynamics 9(6), 515–529 (1986)
8. Federal Aviation Administration: Electronic Code of Federal Regulations – Title 14: Aeronautics and Space, Subpart E – Private Pilots, Section 61.103 (August 28, 2008), http://ecfr.gpoaccess.gov/cgi/t/text/text-idx?c=ecfr&tpl=%2Findex.tpl (cited September 2, 2008)
9. Bailey, R.E.: The Application of Pilot Rating and Evaluation Data for Fly-by-Wire Flight Control System Design. In: AIAA Atmospheric Flight Mechanics Conference, p. 13. AIAA, Portland, OR (1990)
Website Designer as an Evaluator: A Formative Evaluation Method for Website Interface Development Chao-Yang Yang College of Management, Industrial Design Department, Chang Gung University, 259 Wen-Hwa 1st Road, Kwei-Shan, Tao-Yuan, Taiwan, R.O.C. dillon.yang@mail.cgu.edu.tw
Abstract. Commerce plays a fundamental part in many websites, so their goals may differ from those of conventional computer system design, e.g., to increase the user base or encourage repeat visits. With limited budgets, website designers are unlikely to involve their users during the design process, and not all website designers have access to an evaluator, appropriate testing facilities, or the evaluation knowledge to support their design. This research develops a low-cost, tailorable, formative evaluation method for web designers. The method addresses both HCI goals and commercial website goals such as the encouragement of repeat visits. The research first investigates contemporary evaluation methods and the users' and designers' needs from websites and from website evaluation methods. Finally, the method was developed as a set of guidelines and verified in the evaluation of a website. The potential usefulness, practicality, and necessity of the method were then confirmed by website designers.

Keywords: Website usability, Engagement, Formative Evaluation.
Attracting new users and retaining them through good design and usability are of greater importance as competition increases on the Internet. However, because of the inherent differences between website usability and conventional computer systems usability, it may be inappropriate to directly apply standard HCI methods to evaluate websites. Spool et al. [13] and Nielsen [9] suggest that website designers should pay more attention to enhancing functions and information to make users like a web site. Furthermore, Nielsen [9] has established that users have a low tolerance for complex designs or slow sites; people don't want to wait, and they don't want to learn how to use a home page. As there are no training sessions or manuals for a web site, people have to be able to grasp the function of the site immediately upon scanning the home page. On the other hand, this research has identified a need for designers to also consider the site's ability to retain users and attract regular repeat visits. Hence, the users' needs from the website and the designers' needs from website evaluation should be clarified.
They need to be presented with well specified, detailed problems from which they can
generate effective solutions. Such methods would enhance the efficiency of the redesign process;
• Decomposing the website into visual, informational, navigational, and functional sections enables the evaluator to systematically test for and determine problems.
2 Methods

The research method adopted was that of problem-solving: given that a problem has been identified, requirements for the solution generation are collected; a solution is proposed and finally tested. In order to achieve our aims and objectives, a number of methods were used as appropriate to an "understand-propose-realise-evaluate" lifecycle [17] (as shown in Fig. 1). An understanding of the problem was achieved through a literature review, an analysis of current methods, and an attitudinal survey. From the requirements identified in the survey, the literature review, and the analysis of the applicability of existing evaluation methods, a web evaluation framework was developed that would meet both users' and designers' needs. An action learning approach was taken to the development of the method, whereby the researcher iteratively designed, tested, and selected methods based on their usefulness. Following evaluation (see below), the method was formalized for use by practicing designers. The method was subjected to three forms of evaluation: iterative interface design and evaluation; formalisation of the method; and evaluation by web designers.
[Figure omitted: four-stage cycle (Understand, Propose, Realise, Evaluate) with feedback loops.]

Fig. 1. Design Research Model with Feedback Loops [17]
3 Literature Review

A website consists of navigation, information, and visual elements. The purpose of a site may be viewed as a conjunction of satisfying what users are trying to accomplish (e.g., doing research, buying products, or downloading software) and the designer's and client's goals [13]. These goals are related to issues such as HCI, pleasure, security, technical issues, and accessibility. Typically, the website designer is in charge of the look, ease of use, and content of a website [3], with the intention of providing a clear marketing message, trust, frequently updated information, aesthetics, and functionality, so as to achieve the client company's goals. The following elements have been identified as important in terms of the user's and designer's needs.

1. Navigation. HCI plays an important role in website navigation design. This has been addressed in website design in terms of effective and efficient information structure, user interface, page design, content authoring, cognitive process, linking strategy, and task design.
2. Information. A commercial website should provide useful, helpful (e.g., to help make purchase decisions), updated, and individualized information. In addition, it should include clear marketing messages and effective privacy statements that show it is following privacy and consumer protection guidelines, making the security of customer data a priority, and using independent certification bodies.

3. Visual design. The likeability and attractiveness of the visual design rely on the aesthetics of the layout, colour scheme, animation, etc. Accessibility for colour-blind users also needs to be considered.

4. Other issues. In particular, technical and accessibility issues should be considered.

This section has introduced website design, considered the role of HCI, and identified a new set of web design considerations. Evaluation has been identified as crucial in supporting the design process; hence it follows that website evaluation methods will need to accommodate the identified design goals and the issues associated with website design. The diversity of website users, their purposes, individual characteristics, information-seeking behaviour, and cultural issues are complicated and affect the design of different types of websites. Having reached this point, the question is, "To what extent do website design and evaluation methods handle the factors discussed above?" With this in mind, website design methodologies will be reviewed next, prior to an examination of website evaluation methods. The goal of this analysis will be to assess the extent to which current website design and evaluation is fit for purpose.

Effective usability evaluation methods for website redesign need to be fast, low-cost, and easy to learn, and to provide high confidence and high impact. This section reviewed conventional usability methods and those that have been employed in website design. It can be concluded that most website usability evaluation in the late development stages requires real user testing, i.e., observing users completing website tasks. These methods provide reliable and useful information for redesign. However, given that existing methods have been adapted from those used in HCI, there are several shortcomings in the extent to which website features and issues can be evaluated using such methods, and in the extent to which such methods can be applied to rapid website development with short product life cycles, small development teams, and limited budgets. In such an environment, methods that require specialists or specific equipment may not be used. Further, the evaluation methods have also been discussed from a marketing perspective, in which it has been shown that marketing goals have not been properly addressed. In the next section, the issues identified from this review will be investigated through a study of selected user testing and data analysis tools.
4 Study of Current Usability Testing Methods

To gain more insight into the usefulness of current evaluation methods, a representative set of different types of methods was used to evaluate a website prior to launch. These included observation, Meaning in Mediated Action (MIMA) [15], the Website Analysis and Measurement Inventory (WAMMI) [5], and Breakdown Analysis [14, 18]. The UNITE (Ubiquitous and Integrated Teamwork Environment) website, which had not
been launched, was selected as the test object. This site was designed to promote an EU-funded project which aimed to develop an environment for virtual teamwork. Overall, the evaluation was time consuming (especially in the task completion section), and each method provided both useful and not so useful information for redesign. By observing the process and the usefulness of the results, WAMMI was identified as being useful in assessing the participants' preferences through its rating system; however, some of its questions were irrelevant, and its unclearly defined problem statements were not useful. The designer indicated that the MIMA and Task Completion sections were helpful for redesign of the navigational elements, as they provided details of specific navigation problems. Further, the designer indicated that WAMMI provided information about the site's engagement and time-based issues, which was lacking in MIMA and the user testing. To summarize, each method had strengths and weaknesses, and using more than one technique can help ensure that the findings are reliable [14]. The comprehensive information needed for redesigning the site can be generated through multiple methods, although the process could be shortened by discarding less useful elements, simplifying them, and concentrating on the elements and information needed by the designer for the task in hand. In summary, the results from the existing methods support each other; the results from user testing and MIMA are useful for redesign; marketing issues need to be considered in the evaluation; and marketing goals can be addressed in a questionnaire, but this should provide more specific information. The next stage of the research will employ a questionnaire to gather further opinions about website usability from users and designers. This will help to establish the requirements for website evaluation, enabling us to construct a new method geared to the needs of designers.
5 Internet Surveys of Designer and User Needs

Through testing existing usability methods on the UNITE website it was found that different methods favour different aspects of web usability. Through triangulation and the selective use and adaptation of different methods, a more complete picture of usability issues can be established. However, to be useful outside the experimental situation, such a combination of methods has to provide sufficiently detailed information for designers to concentrate on the elements that are important from the user's perspective. Therefore the design of an effective method should recognize, on the one hand, the needs of the designers (to produce usable sites quickly) and the requirements of the design task, and on the other hand, the needs of the user – to find the information they need efficiently. Taking the findings of the previous studies into account, the important elements for website design may be summarized as: adherence to best practice in HCI, usefulness, pleasure in using the site, user retention, and the ability to attract new users. This part of the research details a study undertaken to establish whether there are any differences in the way in which website designers and users perceive usability, and what type of information designers would like for redesign (i.e., formative as opposed to summative evaluation). The method chosen was an online Internet questionnaire, which would help reach the large key target populations – web designers and site visitors. This
questionnaire focused on the users' and web designers' opinions of web usability. Web designers whose sites were included on www.coolhomepages.com were invited. The user participants were recruited by posting an invitation on professional message boards and discussion areas (for experienced users) such as www.coolhomepages.com and www.msn.com, as it was believed these users would be more web-savvy. The results confirmed that all five general goals should be given equal prominence in website design. In addition, several participants felt that a website should invite its users' input (e.g., e-mail addresses or product purchases) as this can bring benefits to the website. These goals can be attained by improving design requirements such as ease of navigation, helpful information, and good visual design. Helpful, updated, or interesting information is the user's primary need from a website. These features also affect the likeability of, and degree of user engagement with, a site. Clear and attractive visual design mainly affects likeability. Ease of navigation is a primary requirement, a feature emphasized in conventional HCI. In addition, as described previously, ease of navigation has been indicated as an important predictor of recommendability. Functionalities such as a message board and search engine have been indicated as key ingredients of a website. Therefore, it is reasonable to propose that providing useful functionality that meets users' needs could improve the degree of engagement with, and likeability of, a website. The designers generally pay attention to a site's usability before launching the site. Although the designers stated that they understood typical usability statements and could act on them appropriately, when their answers were considered in detail, different solutions were proposed by different designers to the same problem, indicating that the statements might be ambiguous and lack sufficient detail for reliable decisions to be made. Designers prefer user feedback in the form of a clearly stated problem report relating to the site's information structure, image, colour, compatibility, font, symbols, logo, etc. Further, they were concerned about the cost and complexity of evaluation. However, given the differences between designers and users, it is still necessary to evaluate websites with real users. To summarize, a good site is designed with consideration of adherence to best practice in HCI, usefulness, user retention, likeability, and the ability to attract new users. Designers and users were shown to have slightly different views on website usability. In terms of evaluation, the existing problem statements were not detailed enough. Following Newman and Landay's [7] categorization, websites can be broken down into navigation, information, and visual elements to provide a clear view of the site. In addition to these elements, functionality has also been identified as a fundamental element that may affect website goals and usability. Applying this categorization to the development of an evaluation method may provide a clearer view of the website and lead to the development of more designer-friendly methods. Collecting data is necessary but not sufficient for a usability test [4]. A clearly defined problem report is also necessary, and the need for a usable output should be remembered; for example, detailed problem identification is required to avoid ambiguity, and questions should relate to the actual website rather than overall features.
Such problems may arise especially in areas of overlap between knowledge domains. For example, a problem such as "this web site is a waste of time in every respect" could relate either to poor-quality information or to difficulty of navigation. To avoid
such ambiguity, the method will relate statements more clearly to the domain they refer to, such as aesthetics or navigation. The results of this study have also shown that, depending on the site, the target users may differ and may have different perceptions when using the site. For example, a human resources site may aim to provide an easy-to-use interface, whereas a Disney site may aim to achieve high likeability. Without meeting the target users' needs, a site may fail to retain its users. Therefore, a more tailorable approach to evaluation is needed, one which is based around the site, the expectations of the site's owners, the designers and the users.
6 Composing the Website Evaluation Method

Previous research has shown that increasing the user base, the likeability and efficiency of the website, engaging the users, and identifying and meeting user needs are important for website development. For users, as shown in Fig. 2, helpfulness of information, ease of use, attractiveness of the visual design, and functionality all play a part in determining whether a site achieves these goals. An evaluation method is therefore proposed that can be used by designers. The method is composed of four evaluation techniques: MIMA interview, card sorting, user testing and structured interview. It takes into account time, cost, learning time, degree of confidence in the method, and the potential for impact on redesign. The evaluation method was designed to provide an effective and efficient formative evaluation that designers could use to obtain information for redesign.

MIMA interview. The elements to be tested are shown to the participant, first in isolation and then in the context of the web page they appear on. The participant is required to interpret the representation. This interpretation is recorded in the format of Table 1 against that of the designer. Where necessary, the evaluator should ask for clarification of the interpretation, so that its nature is fully understood.

Table 1. An example of function key assessment

Function | Intended action | Participant's interpretation | Assessment of IM
Search | Start searching the given keyword in the database | Search information related to the keyword |
Card sorting. The participant is asked to associate cards with the most relevant main navigational links (which may be textual or graphic). The cards containing the navigational elements to be tested are placed separately on the desk in front of the participant, with the main navigation cards set out at the top. The participant needs to assign the sub-navigational elements or contents to these by placing them underneath the element to which they appear most relevant. If the site contains sub-sub-navigation/contents, the evaluator should then ask the participant to assign these under the sub-navigations determined by the designer.
Fig. 2. Relationship among the design elements

Table 2. The card sorting result format
[Columns: Main navigation; Designer's categorization; Participant's categorization; Assessment of participant's categorization. The example rows cover a "Superb-cards" main navigation with cards such as the "Saver" and "Bubble" telephone cards, 30% off Swifty telephone card, Advantage buying from Superb Call, download mobile ring tones and pictures (yahoo.com.tw, 2005), and Web-telephone, each assessed as ○, + or -.]
The participant's categorization is recorded and assessed as shown in Table 2. Those instances where the participant categorizes information in a similar way to the designer are of no interest and are marked "○". A "-" is given when the participant fails to sort a card into the correct place under a navigational link, and a "+" is given when a card is placed on an incorrect link. This information may assist the designer in re-organizing the navigational structure.
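As an illustration of this marking scheme, the sketch below scores a participant's card sort against the designer's categorization. The card names and placements are hypothetical, and treating "-" as a card the participant could not place at all is an interpretive assumption, not a detail given in the paper.

```python
# A minimal sketch of the card-sorting assessment described above.
# Cards placed under the designer's intended link are marked "O",
# unplaced cards "-", and misplaced cards "+".

designer = {                      # hypothetical designer categorization
    '"Saver" telephone card': "Superb-cards",
    '"Bubble" telephone card': "Superb-cards",
    "Web-telephone": "Services",
}

participant = {                   # hypothetical participant placements
    '"Saver" telephone card': "Superb-cards",   # matches designer
    '"Bubble" telephone card': "Services",      # misplaced
    # "Web-telephone" was never placed
}

def assess(designer, participant):
    marks = {}
    for card, intended in designer.items():
        placed = participant.get(card)
        if placed is None:
            marks[card] = "-"   # participant failed to sort the card
        elif placed == intended:
            marks[card] = "O"   # agrees with designer; no redesign interest
        else:
            marks[card] = "+"   # placed on an incorrect link
    return marks

for card, mark in assess(designer, participant).items():
    print(f"{mark}  {card}")
```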
User testing. Direct observation provides more objective information than surveys [6]. The user should now have some familiarity with the site, and is provided with a set of tasks to assess how efficiently the website lets them perform common tasks, where problems occur, and the reasons for these. The tasks are fully explained to the users, but no other assistance is provided. The participants are required to verbalize their thoughts and feelings during the task, as this can generate valuable usability information [8]. The time, path, actions, and verbalizations are recorded as in Table 3.
Table 3. The task completion data analysis format

Time (s) | Path | Actions | Think-aloud protocol
5 | Home | Moving the cursor around all links in this page | Still looking, still looking. Haven't seen anything saying "subscribe" at the moment.
22 | News | | I am going to the News area as it looks most related.
The path and completion times are later compared to those provided by the designer. Where errors occur, the verbalizations and video are used to provide a rationale for them.

Structured interview. The structured interview is conducted to assess information, visual, and function design. The interview is structured around a questionnaire, with the participant being required to provide a rationale for their ratings. As the questions are closely aligned to the contents of the website, this necessitates the participants using the website in some detail. The ratings for each question are combined across all the participants, and average scores are used to determine the severity of the problems. Examples of the rationale are also presented so the designer can achieve a greater understanding of the design problem.
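A minimal sketch of this rating aggregation, assuming a hypothetical 1-5 rating scale, question texts, and severity cut-off, none of which are specified in the paper:

```python
from statistics import mean

# Hypothetical ratings per question (1 = very poor, 5 = very good),
# one entry per participant.
ratings = {
    "The colour scheme is attractive":        [4, 5, 3, 4],
    "The product information is up to date":  [2, 1, 2, 3],
    "The site search returns relevant pages": [3, 2, 2, 2],
}

SEVERITY_THRESHOLD = 3.0  # illustrative cut-off, not from the paper

# Average each question's ratings and flag low-scoring design areas.
for question, scores in ratings.items():
    avg = mean(scores)
    flag = "PROBLEM" if avg < SEVERITY_THRESHOLD else "ok"
    print(f"{avg:.2f}  {flag:7s}  {question}")
```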
7 Conclusions

This research has considered the requirements for supporting commercial website design based on users' and designers' needs. Typically, HCI plays an important role in this domain, yet recent research shows that websites require more aspects to be taken into account. Without addressing specific issues such as marketing and pleasure, current usability methods support the design poorly. Hence, the appropriateness of applying standard usability measures to website design was investigated. By incorporating the users' and designers' opinions, it was confirmed that websites not only have to meet usability criteria, they also have to increase the user base, likeability, etc. It was also shown that these goals can be achieved through improvements to the design components in the navigation, information, visual, and functional aspects. Each aspect can be assessed efficiently and precisely by different evaluation techniques. Therefore, a multi-technique method has been produced which is tailorable to different websites and advances the use of existing usability evaluation in commercial website design. In addition, the research has formalized the method into one which a designer can use. The studies undertaken have shown the validity, practicability and usefulness of this approach for website designers. In conclusion, the research has contributed to knowledge by identifying and filling a gap in the current use of evaluation methods, providing a method that practicing web designers can use with representative end users.
References

1. Benyon, D., Davies, G., Keller, L., Preece, J., Rogers, Y.: A Guide to Usability. The Open University, UK
2. Berkun, S.: The role of flow in web design. Microsoft Corporation (1990), http://msdn.microsoft.com/library/en-us/dnhfact.html/hfactor10_1.asp?frame=true (accessed 24/11/2001)
3. Brinck, T., Gergle, D., Wood, S.D.: Designing Web Sites that Work – Usability for the Web. Academic Press, USA (2002)
4. Dumas, J.S., Redish, J.C.: A Practical Guide to Usability Testing. Intellect (1999)
5. Kirakowski, J.C.N., Whitehand, R.: Human Centered Measures of Success in Web Site Design. In: 4th Conference on Human Factors & the Web, New Jersey, USA (1998), http://www.research.att.com/conf/hfweb/proceedings/kirakowski/ (accessed 06/04/2000)
6. Moseley, B.: Test the Usability of Your Web Site. Folio:PLUS 30(Part 4), 9–10 (2001)
7. Newman, M., Landay, J.A.: Sitemaps, Storyboards, and Specifications: A Sketch of Web Site Design Practice. In: DIS 2000, New York (2000)
8. Nielsen, J.: Usability Evaluation and Inspection Methods. Addison-Wesley, Reading (1993)
9. Nielsen, J.: Designing Web Usability: The Practice of Simplicity. New Riders Publishing, USA (2000)
10. Nielsen, J., Mack, R.L. (eds.): Usability Inspection Methods. John Wiley & Sons, New York (1994)
11. Norman, D.A.: The Design of Everyday Things. Doubleday, New York (1990)
12. Rubin, J.: Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. John Wiley & Sons, USA (1994)
13. Spool, J.M., Scanlon, T., Snyder, C., Schroeder, W., DeAngelo, T., et al.: Web Site Usability: A Designer's Guide. Academic Press, San Diego (1999)
14. Urquijo, S.P., Scrivener, S.A.R., Palmen, H.: The Use of Breakdown Analysis in Synchronous CSCW System Design. In: Proc. of the Third European Conference on Computer-Supported Cooperative Work, Milan, Italy (1993)
15. Waldegg, P.B.: Handling Cultural Factors in Human Computer Interaction. Doctoral thesis, Derby, UK (unpublished, 1998)
16. Winckler, M., Pimenta, M., Palanque, P., Farenc, C.: Usability Evaluation Methods: What is Still Missing for the Web? In: 8th International Conference on HCI International, New Orleans, USA, August 5-10 (2001)
17. Woodcock, A.: Supporting Ergonomics in Concept Design. Doctoral thesis, Loughborough University, Loughborough (unpublished, 2001)
18. Woodcock, A., Scrivener, S.A.R.: Breakdown Analysis. In: McCabe, P. (ed.) Contemporary Ergonomics, pp. 271–276. Taylor and Francis, Edinburgh, UK (2003)
Building on the Usability Study: Two Explorations on How to Better Understand an Interface Anshu Agarwal and Madhu Prabaker salesforce.com, The Landmark @ One Market St. Suite 300, San Francisco, CA 94105 {aagarwal,mprabaker}@salesforce.com
Abstract. In this paper, we describe two separate studies that improved our ability to understand our users’ experience of our products at salesforce.com. The first study explored a methodology of combining expert and novice performance data to yield a measure of intuitiveness. The second study created a methodology that combines both verbal and nonverbal emotion scales to better understand the emotional effect our products have on our users. We present both these methods as expansions on the standard usability study and examples of ways to better understand your users within an industry environment.
1.2 Study Two: Defining Emotional Response

The topic of emotion has recently attracted increased research attention in HCI studies [1]. Numerous authors have proposed that emotion may play an important role in user performance and user experience. However, very few "real world" case studies have been conducted on the role of emotion in an HCI context. It is important to first define the often vague term "emotion". However, coming up with a precise and scientifically respectable definition of the term is notoriously difficult. As one might imagine, there are many definitions of "emotion" in the relevant literature [4]. Nevertheless, there are two generally agreed-on aspects of what actually constitutes human emotion [1]. First, emotion is a psychological reaction to events relevant to the needs, goals, or concerns of an individual. Second, emotion comprises physiological, affective, behavioral, and cognitive components [1].
2 Study One: Measuring Intuitiveness

Although it is often advisable to ensure that designs work equally well with both novice and expert users, not all systems need to be evaluated with this range of expertise. For most "walk up and use" systems, like movie kiosks, or one-time-use systems like installation programs, it may not be necessary to test expert users. However, for most consumer and enterprise software systems, the system must allow experienced users to perform their tasks efficiently and novice users to complete tasks effectively without requiring extensive training or practice.

2.1 Measuring Novice Performance

By performing an empirical usability study and measuring the average task completion time across a group of novice users, we can begin to understand how well a particular design performs. Although task completion times allow us to say, "it took a user x seconds to complete a task", they fail to help us understand whether this time is too long or acceptable. To provide a more comparative understanding, we often visually compare time across all tasks (Fig. 1). From this we can say, "it took an average novice user x seconds more to complete Task 3 than Task 4". Although this is a more meaningful statement, it is still difficult to understand how long a task should take.
Fig. 1. An example of a visualization of the average task times for novice users on a system
2.2 Measuring Expert Performance

When designing interfaces, we are often concerned with making tasks as efficient as possible. Although we can gauge expert performance by conducting usability studies with expert users and recording task completion time, this is often challenging in practice. It is difficult to find users who are experts in all aspects of an interface – experts in one functional area are often novices in others. Additionally, an expert may not yet exist for a new design. For these reasons, practitioners have utilized human performance modeling methods to create reliable estimates of task performance time for skilled users. A particular model for expert user performance that has proven to produce highly useful and scientifically valid results is Keystroke-Level Modeling (KLM) [2]. When provided with a description of a task being performed, the model applies human performance estimates to produce a predicted task completion time. For the purposes of this paper it is not essential to understand exactly how KLM is derived, but rather that KLM is a relatively quick and low-cost way to get expert performance task time data. Plotting these values results in a chart that shows expected task completion times for expert users (Fig. 2). Another way to look at this is that these values represent the efficiency limit of a particular design. Using this we can make statements about the minimum task times imposed by the design; for example, "we expect that Task 5 will take at minimum x seconds".
Fig. 2. An example of a visualization of the average task times for expert users on a system
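To make the modelling step concrete, here is a minimal sketch of a keystroke-level estimate. The operator durations are the commonly cited KLM estimates from Card, Moran and Newell [2]; the task script (pointing and clicking through a menu, then typing a file name) is a hypothetical example, not one of this study's tasks.

```python
# Minimal KLM sketch: predicted expert time = sum of operator durations.
# Operator times (seconds) are the commonly cited KLM estimates [2].
KLM_OPERATORS = {
    "K": 0.28,  # keystroke (average skilled typist)
    "P": 1.10,  # point with the mouse to a target
    "B": 0.10,  # press or release a mouse button
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def predict_time(script: str) -> float:
    """Sum operator times for a task encoded as a string of operators."""
    return sum(KLM_OPERATORS[op] for op in script)

# Hypothetical task: think, point at the File menu, click; think, point
# at Save As, click; home to keyboard; type an 8-letter name plus Enter.
task = "MPBB" + "MPBB" + "H" + "K" * 9
print(f"Predicted expert completion time: {predict_time(task):.2f} s")
```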
2.3 Deriving a Measure of Intuitiveness

Revisiting Our Definition. Earlier we defined intuitive as being able to "interact effectively, not consciously using previous knowledge". We also showed how we could get measurements of novice and expert performance across a set of tasks using a particular design. Mapping these two concepts onto each other yields a more measurable definition – an intuitive interface can be thought of as one that "minimizes the difference between expert and novice task performance". When an expert is using the system, they are not consciously thinking about how to use the system, but rather about how to solve the task at hand. In this way, the closer the novice user's performance resembles expert performance, the more intuitive the interface can be regarded.
Combining Expert and Novice Performance. In Figure 3, shown below, we have plotted the task completion time for both novice and expert users. The intuitiveness line shows the difference between the novice user performance and the efficiency limit of the design. This is a more meaningful metric than the novice or expert measures alone because it enables us to make statements like, "novice users took x seconds longer on Task 8 than our design called for." It is important not to underestimate how much more powerful this statement is in driving design changes; it allows us to explicitly recognize and dissociate the limits imposed by the design (the efficiency limit) from the observed performance data.
Fig. 3. The expert visualization (fig. 2) has been superimposed on the novice visualization (fig. 1). The difference between the times is shown as a line.
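The derivation itself reduces to a per-task subtraction. A minimal sketch, using illustrative times rather than the study's actual data (which appear in Table 1 below):

```python
# Intuitiveness sketch: gap between observed novice time and the KLM
# "efficiency limit" of the design, per task (seconds). Values are
# illustrative placeholders.
novice = {"Task 1": 153.9, "Task 2": 187.8, "Task 3": 118.1}
expert = {"Task 1": 19.8,  "Task 2": 64.8,  "Task 3": 19.5}  # KLM estimates

intuitiveness = {t: novice[t] - expert[t] for t in novice}

# Rank tasks by the gap: the largest gaps demand redesign attention,
# regardless of how long the task takes in absolute terms.
for task, gap in sorted(intuitiveness.items(), key=lambda kv: -kv[1]):
    print(f"{task}: novices exceeded the design's limit by {gap:.1f} s")
```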
Additionally, because this visualization quantitatively takes into account the inherent difficulty differences between tasks, it enables us to notice phenomena that are hidden in the novice performance data. For example, although it initially appeared that Tasks 5, 2, 8, and 9 might be problematic because of their relatively long task performance times, Tasks 8, 2, and 9 are the ones that really demand attention – the design actually performed well for Task 5. While it is quite common in industry to invest the time and money to gather quantitative metrics for novice users, expert efficiency analysis is not always done. By using accurate expert task prediction models, we can achieve deeper insight in our analysis without requiring significant additional resources in the testing phase.

2.4 Empirical Validation of the Method

In order to validate that this method of deriving "Intuitiveness" yielded valuable insight into the performance of a design that was not achieved through the standard usability study, we employed the technique in a comparative study between two versions of a Customer Relationship Management (CRM) application. To understand the novice user experience, we employed a between-subjects study design in which we recruited 18 experienced salespeople to perform a set of 10 common, representative sales tasks (e.g., adding tasks, converting a lead, sharing an opportunity,
etc.) as quickly as possible without committing any errors on one of two CRM applications (Application A and Application B). All participants reported familiarity with each of the sales tasks, but none of them had prior experience with the application they were assigned to. For each session, participants were presented with the tasks in a randomized order, and among the dependent metrics collected were Time on Task and Number of Assists. Because we were focused on a more natural assessment of the time it took novice users to complete a task, we chose to provide assists instead of capturing the number of errors committed.¹ This methodology ensured that all participants completed each task and that our Time on Task metric captured the inherent difficulty novice users had. To understand the expert performance times for each of the 10 tasks, we performed KLM analysis using the software application CogTool [7].

Table 1. The 10 task times (seconds) for Novice Users (Empirical), Expert Users (KLM), and the difference between the two (Intuitiveness), for Applications A and B; lower times indicate the better-performing design.

Task | Novice A | Novice B | Expert A | Expert B | Intuitiveness A | Intuitiveness B
1. Complete a Task | 153.91 | 91.22 | 19.82 | 21.74 | 134.09 | 69.48
2. Add a few tasks | 187.75 | 197.19 | 64.81 | 55.46 | 122.94 | 141.73
3. Edit a contact | 118.06 | 118.71 | 19.47 | 21.15 | 98.59 | 97.57
4. Convert a lead | 114.26 | 184.51 | 17.26 | 23.75 | 97.00 | 160.77
5. View reports on leads | 82.93 | 105.57 | 6.93 | 8.82 | 76.00 | 96.75
6. Share an opportunity | 150.35 | 152.65 | 25.63 | 25.27 | 124.73 | 127.38
7. Manipulate a calendar entry | 231.57 | 203.36 | 30.10 | 33.65 | 201.46 | 169.71
8. Manipulate a forecast | 77.68 | 93.05 | 6.82 | 6.48 | 70.87 | 86.58
9. Create a campaign with leads | 292.28 | 237.77 | 27.16 | 54.72 | 265.12 | 183.05
10. Search using help | 58.215 | 86.33 | 13.65 | 9.07 | 44.56 | 77.25
Analysis and Results. Although no statistical difference was found across the overall task performance of novice users, users were statistically faster on Task 1 using Application B (p = 0.008).² Application A had a lower expert performance time for six out of ten tasks.³
¹ An assist was provided when the participant ceased making progress towards the completion of the task. The assist was given such that it only provided the user with enough direction to make it to the next step in the task, and only when it became clear that the user was unable to advance to the next step.
² We performed a two-sample t-test on the novice (empirical) performance data.
³ Since the KLM values are not empirically derived, we can consider any difference between the designs as significant.
The value of this method can be seen in how the conclusions might differ based on the data at hand. Armed with only the traditional, empirical usability study data, we might conclude that both applications performed equally well with novice participants, though Application B had a more efficient interface for Task 1. Therefore, if we are redesigning Application A, we should focus our effort on improving our design for Task 1; since the other nine tasks performed statistically similarly, it is unclear whether the designs of both are equally good or equally poor. However, once we add the expert performance metric and derive the Intuitiveness metric, we start to see a more interesting and insightful picture. Application B's faster time for Task 1 cannot be attributed to an overall more efficient design – in fact, Application A's design allowed expert users to complete the task faster than Application B's design. Therefore, for Task 1, although Application A was more efficient than Application B, it was less intuitive. In this way we have changed the focus of our redesign efforts from efficiency to making the task easier for the novice user to accomplish. With this insight, if our task is to redesign Application A, we cannot help but notice that, in addition to Task 1, Tasks 7 and 9 should be the focus of our efforts.
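For readers who want to reproduce this style of analysis, here is a minimal sketch of the two-sample t-test mentioned in footnote 2, using scipy. The per-participant times are fabricated placeholders, since the paper reports only per-task means.

```python
# Two-sample t-test comparing novice Time on Task between applications,
# as in footnote 2. The arrays below are fabricated placeholders; the
# paper does not publish per-participant times.
from scipy import stats

times_app_a = [148.2, 160.5, 139.9, 171.0, 151.3, 144.7, 158.8, 162.1]
times_app_b = [88.4, 95.1, 101.2, 79.8, 92.6, 85.3, 99.0, 90.7]

t_stat, p_value = stats.ttest_ind(times_app_a, times_app_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Novice times differ significantly between the applications.")
```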
3 Study Two: Measuring Emotional Response

Emotion is an inherently complex construct to study. As such, researchers have created many different emotion measurement tools, including verbal, nonverbal, and physiological measurement tools, in an effort to meet this challenge. In this study, our research challenge was to develop an emotion measure that would be quick to utilize, easy to understand, deployable remotely, and easy to incorporate into an empirical usability study. Given the nature of emotion, it would seem that "fuzzy" nonverbal measures would be most apt to assess emotion. However, most of the nonverbal measures in the HCI literature are either impractical in a "real world" setting or of unknown validity. We therefore decided to combine an extensively used and validated verbal scale with a more experimental non-verbal emotion measure to improve the strength of our methodology.

3.1 Verbal and Non-verbal Emotion Measurement

For the verbal component, we chose to utilize the PAD (Pleasure, Arousal, and Dominance) Semantic Differential Scale developed by Mehrabian and Russell [5]. By rating a set of bipolar adjective pairs along a nine-point range, this scale was shown to measure three important aspects of emotion: Pleasure, Arousal, and Dominance. Pleasure may be defined as a positive affective state, which is separate from feelings such as preference and reinforcement. Arousal refers to an emotional state ranging from sleepy to very excited. The final dimension, Dominance, refers to the extent to which a person feels unrestricted or free from outside control. We reviewed Mehrabian and Russell's original adjective sets to ensure that the pairs were relevant to interface emotional responses (Table 2).
Table 2. Although we maintained most of the original adjective word pairings of the PAD scale, we revised some pairings to ensure that the scale was concise and relevant to software interface assessment
[Table body: the adjective pairs for each PAD dimension (Pleasure, Arousal, Dominance); only the "PAD Dimension" column heading and the Pleasure row label are recoverable from the source.]
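Scoring such a semantic differential scale is straightforward: the item ratings within each dimension are averaged. The adjective-pair-to-dimension assignment in the sketch below is a hypothetical stand-in for the revised pairings of Table 2, which the extraction did not preserve.

```python
from statistics import mean

# Hypothetical assignment of bipolar adjective pairs to PAD dimensions;
# the study's actual revised pairings (Table 2) are not reproduced here.
PAD_ITEMS = {
    "Pleasure":  ["unsatisfied-satisfied", "annoyed-pleased"],
    "Arousal":   ["calm-excited", "sluggish-frenzied"],
    "Dominance": ["controlled-controlling", "influenced-influential"],
}

# One participant's ratings on a 1-9 scale (9 = right-hand adjective).
ratings = {
    "unsatisfied-satisfied": 7, "annoyed-pleased": 8,
    "calm-excited": 4, "sluggish-frenzied": 3,
    "controlled-controlling": 6, "influenced-influential": 5,
}

# Average the item ratings within each dimension to get the PAD scores.
pad_scores = {dim: mean(ratings[item] for item in items)
              for dim, items in PAD_ITEMS.items()}
print(pad_scores)  # e.g. {'Pleasure': 7.5, 'Arousal': 3.5, 'Dominance': 5.5}
```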
We selected the Emocard tool by Desmet for the non-verbal component of our measure (Fig. 4) [3]. The Emocard tool consists of sixteen cartoon-like faces, half male and half female, in which each face represents a combination of Pleasure and Arousal. We interpreted results in the Calm-Pleasant and Excited-Pleasant quadrants as positive feedback.
Fig. 4. The Emocard tool was an effective nonverbal measurement of emotional response which used human-like representations of emotion
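Interpreting Emocard selections amounts to mapping each face onto the Pleasure x Arousal plane and classifying its quadrant. The face identifiers and their quadrant coding below are a hypothetical encoding for illustration; Desmet's actual sixteen-face layout [3] is not reproduced here.

```python
# Hypothetical Emocard coding: each selected face maps to a signed
# (pleasure, arousal) pair. Desmet's actual face layout is not shown here.
EMOCARD_CODES = {
    "face_01": (+1, -1),  # calm-pleasant
    "face_02": (+1, +1),  # excited-pleasant
    "face_03": (-1, +1),  # excited-unpleasant
    "face_04": (-1, -1),  # calm-unpleasant
}

def is_positive(face: str) -> bool:
    """Calm-pleasant and excited-pleasant quadrants count as positive."""
    pleasure, _arousal = EMOCARD_CODES[face]
    return pleasure > 0

selections = ["face_01", "face_02", "face_02", "face_03", "face_01"]
positive_share = sum(is_positive(f) for f in selections) / len(selections)
print(f"Positive immediate reactions: {positive_share:.0%}")
```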
3.2 Empirical Validation of the Method

In order to validate this methodology, we performed a comparative study between two versions of a CRM application interface. We collected traditional usability measures (time on task and number of errors), as well as the new dual emotion measure we constructed. This measure utilized both the non-verbal Emocard and the verbal PAD scale methods in a linear fashion.
Twenty-two participants, thirteen male and nine female, were assigned to assess one of the two versions of the interface. Although participants had experience with CRM, they had no prior experience with the interface they were evaluating. Seven comparable CRM tasks were created for the two interfaces (e.g., manipulate a calendar entry, view a report of leads by source, create a new marketing campaign, etc.). These tasks were representative of typical sales use of CRM interfaces. Tasks were randomized and participants were assigned to one of three task list versions. As in a standard usability study, the traditional measures of time on task and number of errors were collected during each task. This was followed by an online survey in which participants selected the Emocard that best represented their initial emotional reaction to each task. Participants then continued on to the PAD scale, and were asked for their qualitative feedback. This procedure was repeated for each task.

Analysis and Results. No significant differences were found between the interfaces using the usability measures collected in the study.⁴ Neither time on task nor number of errors was significantly different between the interfaces, whether analyzed overall across all tasks or by individual task (p > .05). Analysis of the PAD scale, however, did show significant differences in participants' emotional responses between the interfaces. Overall, Interface A was rated by participants as significantly more Satisfying and Friendly (p < .05). When analyzed by task, users rated Interface A as more Pleasing and Relaxing for three out of seven tasks (p < .05). Participants therefore found that Interface A elicited a more positive emotional experience than Interface B, even though users' performance levels in the usability studies were almost identical. Emocard responses were then compared between the two interfaces for each of the seven tasks (Fig. 5). As can be seen in the figure, clear differences and patterns in how users immediately reacted to the interfaces can be identified. Interface A elicited a more consistently positive response than Interface B, whose responses included the selection of a few Emocards representing negative emotions.
Fig. 5. Emocard selections for a sample task between Interface A (left) and Interface B (right) show clear differences in users’ immediate emotional responses
⁴ An independent-sample t-test was used for the analysis of the data to compare the two interfaces.
Qualitative feedback was also collected for each task. Two sample participant quotes are provided below: “It took me a while to find the [content]… I chose the slightly perplexed face… after exploring I found the [content] but initially it was a bit frustrating.” “I absolutely hate when I see something red that pops up and doesn't tell me anything... It makes me feel stupid. It drives me up the wall. I put a sad face, because it makes me kind of sad… I had a strong negative reaction to that. It was kind of unexpected, [Interface B] had a nice clean interface then this red blinking error popped up out of nowhere. It made me kind of tense.”
As indicated in these quotes, the qualitative data we collected was both rich in content and often emotionally charged.

3.3 Studying Emotional Response: Considerations

Practitioners might assume that positive emotional response is adequately indicated through usability metrics. However, the results of this study suggest that this may not be the case. If we had utilized only the usability metrics of time on task and number of errors as measures of user experience – and believed these measures to be comprehensive indicators of user experience – we would have concluded that the quality of the user experience for the two interfaces was nearly identical. This conclusion, however, would have been incorrect, or at the very least incomplete. The differing emotional responses to the two interfaces demonstrated that there were significant distinctions between them beyond usability alone. Additionally, these emotions may not only be central to how a user judges the overall product experience, but may also affect how a user perceives its usability. The goal of this study was to demonstrate the value of studying emotion and to test metrics for this purpose. Utilization of these metrics may help open up opportunities for HCI practitioners to incorporate fruitful and insightful emotional study into their process. Moreover, interaction designers of software interfaces may be best placed to utilize the results of emotion studies to enhance their interface designs.
4 Conclusion

The two studies outlined in this paper demonstrate how studying emotion and measuring intuitiveness can add value to traditional user experience research. Both studies utilize new methods that practitioners can use to build upon the traditional usability study. Both explorations also yielded significant insight into our understanding of our users' experience with marginal additional effort. The research efforts discussed here were only initial exploratory studies that merit further research. The intuitiveness measure still demands more empirical testing to validate its ongoing value and accuracy. Although emotional response has been shown to be a valuable aspect to study, further exploration of how interfaces might be improved based upon the results should be conducted. In the end, we hope these methodologies benefit the user experience community by encouraging practitioners to extend their everyday usability research in search of greater insights.
Acknowledgments. We thank the User Experience team at salesforce.com for all their help, support, and interest in this research.
References

1. Brave, S., Nass, C.: Emotion in human-computer interaction. In: Jacko, J., Sears, A. (eds.) Handbook of Human-Computer Interaction, pp. 251–271. Lawrence Erlbaum Associates, Mahwah (2002)
2. Card, S.K., Moran, T.P., Newell, A.: The keystroke-level model for user performance time with interactive systems. Communications of the ACM 23(7), 396–410 (1980)
3. Desmet, P.M.A.: Emotion through expression; designing mobile telephones with an emotional fit. Report of Modeling the Evaluation Structure of KANSEI 3, 103–110 (2000)
4. Kleinginna Jr., P.R., Kleinginna, A.M.: A categorized list of emotion definitions, with suggestions for a consensual definition. Motivation and Emotion 5(4), 345–379 (1981)
5. Mehrabian, A., Russell, J.A.: An Approach to Environmental Psychology. MIT Press, Cambridge (1974)
6. Naumann, A., Hurtienne, J., Israel, J.H., Mohs, C., Kindsmüller, M.C., Meyer, H.A., Husslein, S.: Intuitive Use of User Interfaces: Defining a Vague Concept. In: Harris, D. (ed.) Engineering Psychology and Cognitive Ergonomics, HCII 2007, vol. 13, pp. 128–136. Springer, Heidelberg (2007)
7. The CogTool Project: Tools for Cognitive Performance Modeling for Interactive Devices. Carnegie Mellon University (April 16, 2006), http://www.cs.cmu.edu/~bej/cogtool/index.html
Measuring User Performance for Different Interfaces Using a Word Processor Prototype Tanya R. Beelders, Pieter J. Blignaut, Theo McDonald, and Engela H. Dednam Department of Computer Science and Informatics, University of the Free State, South Africa {beelderstr,pieterb,theo,dednameh}.sci@ufs.ac.za
Abstract. Usability tests were conducted in order to establish the effect on user performance of different icon sets in a word processor. Both a set of alternative pictorial icons and text buttons were developed for a subset of word processor functions for comparison with the standard icons. In order to accommodate users in their home language the interface was available in English, Afrikaans and Sotho to determine whether usability of a product is increased when the users are allowed to interact with the product in their mother tongue rather than having to use the commonly available English interface. The scores obtained for completed tests as well as the time taken to complete tasks successfully were evaluated. Results indicate that neither icons nor language play a significant part in the usability of a product. In fact, the only significant contributor to user performance was the word processor expertise of the user. Keywords: Usability, word processor, icons, text buttons, localization.
to the user, when used under specified conditions". This definition is expanded upon in ISO 9241-11, where usability is defined as "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use" [5]. In terms of these definitions, four distinct components of usability can be identified, namely effectiveness, efficiency, satisfaction and learnability. Shneiderman [6] lists five measurable objectives that can be used to determine the usability of a product, and several usability models are also available that provide a number of measurements which developers can use to comprehensively test the usability of a product [7].

1.2 Icons

Any attention that is devoted to the interface detracts from the concentration of the user and constitutes interference with the primary task [8]. Since processing of text is considered to be a cognitive task, and the user is typically focused on some cognitive task when using a computer, more interference will be caused when using a text-based interface, as it draws on the same cognitive resources as those required during completion of the task [8]. Icons are common interface components that employ images to represent an object or an action that can be carried out by the user [8]. The continued use of icons has been attributed to the fact that they are easier for users to learn and to use [1]. Their use also increases the productivity of the user, since recognition is generally faster for a picture than for text [1], [8]. One disadvantage of icons is that they may be misinterpreted by users if the chosen image invokes unintended associations [8] – the picture that "speaks a thousand words may say a thousand different words to different viewers" [9]. No visible advantages have been detected when using pictorial icons rather than a text-based interface [8], while it has also been found that neither pictorial nor text icons were always immediately recognizable to users [9]. Furthermore, the tooltips that appear below the icons as an expanded explanation did not always assist the user in determining what an icon represents [9].

1.3 Language

The issue of translation into the home language of the user has proven to be a fairly contentious one, with many researchers determining that translation increases the usability of a product [10], whilst others advocate caution when considering translation, as not all users show a preference for carrying out tasks in their mother tongue [11], [12] and performance is often hampered by translation [11]. These studies did, however, focus on translation of web content, which typically contains large amounts of text to be read by the user, much more so than the single commands found in a word processing environment. Users of an interface that is not in their first language do, however, encounter a number of inherent problems, one of which is verbal context – where surrounding words serve to place a word in context, allowing users to identify the actual meaning of the word rather than the potential meaning thereof [13]. Many interfaces do not include verbal context in menus, toolbars or buttons, which is clearly disadvantageous
to second-language users [13] and could also lead to difficulty for novice or first-time users who are unfamiliar with the domain terminology and concepts.
2 Methodology

A small word processor application was developed which possessed minimal capabilities, while still being representative of a fully-fledged word processor or advanced text editor. Functions incorporated into the word processor prototype included document handling (e.g., open and close), text formatting (e.g., font size and style) and text manipulation (e.g., copy, cut and paste). Users were required to complete a number of simple tasks representative of common word processor tasks, such as font formatting. The tasks were displayed sequentially and individually at the bottom of the word processor window (Fig. 1) and could be completed solely by making use of either a toolbar shortcut (icon) or a menu option. The prototype allowed for real-time evaluation of the tasks; that is, once the user had completed a task, the application immediately determined whether or not the task had been completed successfully, and captured certain measurements such as the time required to complete the task. Each task was assigned a difficulty index based on the number of actions or inferences which had to be carried out by the user in order to complete the task successfully. Tasks had difficulty indices ranging from 3 to 8.
Fig. 1. Word processor prototype with alternative icons and English menu
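As a concrete illustration of this real-time evaluation, the sketch below pairs each task with a difficulty index and a success predicate. The task text, the predicate, and the timing value are hypothetical; the paper does not describe the prototype's internal checks.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    description: str
    difficulty: int                  # difficulty index, 3-8 in the study
    check: Callable[[str], bool]     # hypothetical success predicate

# Hypothetical task and check; the actual task texts and internal success
# detection are not given in the paper.
task = Task(description="Make the word 'usability' bold",
            difficulty=3,
            check=lambda doc: "<b>usability</b>" in doc)

# State of the user's document when they signal task completion.
document_state = "... the <b>usability</b> of a product ..."
completion_time = 12.4  # seconds, as captured by the prototype's timer

# Real-time evaluation: success is determined immediately on completion;
# the measurements later feed the weighted score and 1/time analyses.
success = task.check(document_state)
print(f"success={success}, time={completion_time}s, "
      f"difficulty={task.difficulty}")
```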
2.1 Subjects

The test subjects consisted of first-year university students who were taking a basic computer literacy course. The test was conducted during the first practical session of the course, before the subjects had received any instruction in word processor packages. Test subjects spoke a variety of languages, including English, Afrikaans, Sotho, Tswana, Xhosa and Zulu. All subjects were conversant in either English or Afrikaans, as these are the tuition languages of the university. The participants provided for different levels of word processor expertise. Of the participants, 403 were female and 283 male.

2.2 Languages

As mentioned above, a wide range of languages was spoken amongst the test subjects. Since the interface was only available in English, Afrikaans and Sotho, the participants were divided into one of these three groups according to their first language (L1). Afrikaans users completed the test on either an Afrikaans (L1) or an English (their second language, L2) interface. Sotho and Tswana users completed the test either in Sotho (L1) or in English (L2). The remainder of the users completed the test in English, where English was either their L1 or L2.

2.3 Icon Sets

Three sets of icons were used in the different interfaces, namely (i) the standard icons found in the Microsoft Office package, (ii) an alternative set of icons obtained from previous studies [14] and via two brainstorming sessions (see Fig. 1), and (iii) text-based icons. The set of icons obtained during the first brainstorming session was distributed amongst potential word processor users. Respondents were required to indicate which icon they would choose for each of a number of listed word processor functions. Alternative icons for Open, Close, Save, Cut, Copy and Paste were determined in this way. The remainder of the icons were developed during a second brainstorming session, and these were included in the design without confirmation by non-computer-literate users. The icons were developed to provide more context for novice users, in the hope that these users would easily be able to relate to the concepts depicted by the icons. For example, the icons used for Bold, Italic and Underline consisted of a bold, italic or underlined capital letter "F" respectively. This was done in an effort to convey to the user the font changes that would occur if the function were invoked. By using the same letter throughout and by placing the icons adjacent to one another on the toolbar, easier visualization of the font styling (Fig. 1) was ensured. The textual word icons had no images; instead they displayed the name of the function they represented, and were available in English, Afrikaans and Sotho.

2.4 Menus and Tooltips

The menu structure, when available in the interface, was the same as the standard menu found in the Microsoft Office 2003 package, and the toolbar situated at the top of the screen was divided into the standard and formatting toolbars.
To enable the effect of the icons to be tested without interference from other interface components, each pictorial icon set was included in an interface with neither menus nor tooltips. This ensured that the user had to rely entirely on interpretation of the icon when using this interface. The next group of test interfaces used the same pictorial icon sets, but with tooltips added in English, Afrikaans or Sotho. To complete the set of interfaces for testing, the afore-mentioned interfaces were used as a base to which a menu was added in the same language as the tooltips for that particular interface. The interface using the text-based icons had no menu, although the tooltips were still used. This was to compensate for the fact that the entire function name often did not fit on the button, in particular for the Sotho translations. To ensure legibility of the icons, a shortened version of the function name was placed on the button and the full-length version was displayed in the tooltip to provide verbal context.
3 Analysis

Taking all of the above-mentioned considerations into account, there were seven possible interface configurations (Table 1). Interfaces 3 and 6 (Table 1) have no language component to speak of, since they have neither a menu nor tooltips, but the remainder of the interfaces were available in either the users' L1 or L2, resulting in a total of 12 different interfaces. The two interfaces without a language component (3 and 6) were removed from the initial analysis to be evaluated separately from the remainder of the interfaces. The subjects who completed the test on the interfaces that contained a language component are designated as group A, and the rest of the subjects are categorized as group B.

Table 1. User distribution

Group   | Interface                                 | Language | Novice | Expert
Group A | 1 Standard icons, menu, tooltips          | L1       | 24     | 20
        |                                           | L2       | 26     | 23
        | 2 Standard icons, no menu, tooltips       | L1       | 22     | 26
        |                                           | L2       | 24     | 13
        | 4 Alternative icons, menu, tooltips       | L1       | 13     | 17
        |                                           | L2       | 26     | 23
        | 5 Alternative icons, no menu, tooltips    | L1       | 21     | 23
        |                                           | L2       | 25     | 26
        | 7 Text icons, no menu, tooltips           | L1       | 15     | 20
        |                                           | L2       | 33     | 25
Group B | 3 Standard icons, no menu, no tooltips    | –        | 19     | 15
        | 6 Alternative icons, no menu, no tooltips | –        | 17     | 21
Total   |                                           |          | 265    | 252

(Group A total: 445 users; Group B total: 72 users.)
Each user was classified as a novice, intermediate or expert word processor user based on their level of experience with a word processor application and the frequency with which they had made use of such an application prior to the test. The frequency and experience levels were rated on scales of 0 to 4 and 0 to 5 respectively. These individual ratings were then cross-multiplied to obtain a scale consisting of fourteen distinct expertise ratings. In order to eliminate the effects of an individual's uncertainty regarding expertise, the intermediate group was not included in the analysis of the results. The final distribution of users is shown in Table 1.

As an effectiveness measurement, each user was assigned a weighted score, calculated as the sum of the cognitive loads of all the tasks completed successfully by that user. The time taken to complete each task, which measures efficiency, was measured in seconds and then converted to 1/time for further analysis. A factorial ANOVA was used to test the following hypotheses:

1. H0,1: The word processor expertise of the user has no effect on the test score.
2. H0,2: The interface used has no effect on the test score.
3. H0,3: An interface in the user's L1 or L2 has no effect on the test score.
4. H0,4: The word processor expertise of the user has no effect on the time taken to complete the task.
5. H0,5: The interface used has no effect on the time taken to complete the task.
6. H0,6: An interface in the user's L1 or L2 has no effect on the time taken to complete the task.

3.1 Analysis of Group A

Group A consisted of those users who used any one of the interfaces 1, 2, 4, 5 or 7. These interfaces could be in either the L1 or the L2 of the user.

Table 2. 1/Time ANOVA results for Group A and the consolidated group
Language (Group A) | 0.577 | 0.940 | 0.743 | 0.886 | 0.867 | 0.468 | 0.697 | 0.802 | 0.624 | 0.567 | 0.579
[Only the Language row of Table 2 is recoverable; the per-task p-values for the Expertise and Interface factors, and the consolidated-group rows, were lost in extraction.]
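To make the analysis step concrete, here is a minimal sketch of such a factorial ANOVA using the statsmodels library. The data frame is a fabricated placeholder shaped like the study's design (expertise x interface x language, with the weighted score as response); the effect sizes and cell counts are illustrative, not the study's.

```python
# Factorial ANOVA sketch for hypotheses H0,1-H0,3:
# score ~ expertise * interface * language. All data are fabricated.
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
rows = []
for expertise, interface, language in itertools.product(
        ["novice", "expert"], ["1", "2", "4", "5", "7"], ["L1", "L2"]):
    base = 40.0 if expertise == "expert" else 25.0  # fabricated effect
    for _ in range(3):  # three participants per cell, for residual df
        rows.append({"expertise": expertise, "interface": interface,
                     "language": language,
                     "score": base + rng.normal(0, 4)})
data = pd.DataFrame(rows)

model = ols("score ~ C(expertise) * C(interface) * C(language)", data).fit()
print(sm.stats.anova_lm(model, typ=2))  # F and p per factor and interaction
```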
H0,1 was rejected (FExpertise(1, 425) = 27.73, p < 0.001), indicating that the word processor expertise of the user did indeed have an effect on the score achieved. Neither H0,2 (FInterface(4, 425) = 1.10, p = 0.356 > 0.05) nor H0,3 (FLanguage(1, 425) = 0, p = 0.947 > 0.05) could be rejected, leading to the conclusion that neither the interface nor the language had an effect on the score achieved by the user.

The results of the ANOVA for the time analysis, which included only correctly completed tasks, are summarized in Table 2 (italicized font). An α level of 0.05 was used throughout to distinguish between significant and non-significant differences. For the sake of brevity, the results of the interactions between the variables have been excluded, as they all had p-values above 0.05. As would be expected, expert users performed significantly better than novice users, with H0,4 being rejected for all of the tasks. H0,5 could be rejected for the tasks that required a single word to be made bold (p = 0.002, task 2) and a phrase to be italicized (p = 0.041, task 9). H0,6 could not be rejected for any of the tasks at an α level of 0.05; it can therefore be concluded that the interface language had no effect on the time needed by the user to complete a task successfully.

As discussed previously, the group B users were removed from the initial analysis since their interfaces had neither a menu nor tooltips, and thus contained no language component. Users of this group had to rely entirely on interpretation of the icons. Since it was shown that language had no effect on either the score achieved by the user or the time taken to complete the tasks, the need to separate the groups no longer existed, and the two groups were consolidated into a single group for the remaining analysis.

3.2 Analysis of the Consolidated Group

Groups A and B were amalgamated into a single user group in which language no longer played a role. The analysis of this group included users of all seven interfaces listed previously. Hypotheses H0,3 and H0,6 were no longer applicable. H0,1 was rejected (FExpertise(1, 503) = 26.47, p < 0.001), again indicating that the word processor expertise of the user has an effect on the score achieved. The interface used had no effect on the score achieved by the users, so H0,2 (FInterface(6, 503) = 2.01, p = 0.063 > 0.05) could not be rejected.

The results of the ANOVA for the time analysis are summarized in Table 2 in non-italicized font. Once again, only tasks that were completed successfully were included. H0,4 could be rejected for all of the completed tasks, since expert users performed significantly better than novice users in all of them. H0,5 could be rejected for three of the eleven tasks. Possible reasons for these observations are discussed below.

• Task 2: Change font style of a single word to bold (p = 0.001). A significant difference was found between the users of interface 2 and interface 7, as well as between users of interface 2 and interface 6. This indicates that the icons contribute significantly to the performance of the user. In both cases, users of the standard icon had a shorter completion time than users of the other two interfaces. This indicates that the standard icon for Bold is extremely intuitive and succeeds in
conveying the concept of bold to the user, even more so than the word "Bold" on a button. Since only those users of the alternative icons with no tooltips showed a significantly longer completion time than users of the standard icons, it would seem that the tooltips assisted the remaining alternative-icon users in deciphering the functions linked to the icons.

• Task 6: Close a document (p = 0.001). With an average completion time of 53 seconds, users of the alternative icons with no menu and no tooltips (interface 6) took longer to close a document than all other users, who completed the task in times ranging from 18 seconds to just marginally longer than 20 seconds. The number of correct answers to this task was also the second lowest of all the tasks. Although the alternative icon for the Close function was chosen by questionnaire respondents, the results of this task show that it does not successfully communicate the concept of Close when used in an interface without any tooltips or menus to assist the user. In fact, the icon chosen by the respondents was actually designed as an alternative for an electronic mail interface. The icon appears to be acceptable when used in conjunction with a tooltip.

• Task 9: Italicize a phrase (p = 0.006). Post-hoc tests indicated that the most significant differences occurred between users of interface 6 and interface 2, as well as between users of interface 6 and interface 4. Alternative-icon users with no tooltips and no menu (interface 6) had a significantly longer average completion time than the users of the other two interfaces. These results indicate that the alternative italic icon does not succeed in conveying the function to the user. However, once again, the inclusion of a tooltip indicating the icon's purpose assisted the users in determining the functionality linked to the icon.
4 Discussion

Overall, the most significant contributing factor to the performance of the users was word processor expertise. The interface used appears to have minimal effect on user performance. The only difference between the pictorial and text icons occurred in the task which required users to make a single word bold. Thus, there is very little performance difference between users of pictorial icons and those using text icons, a finding which supports those of [8]. The majority of the performance differences detected existed between users of the alternatively designed icons on interfaces without tooltips and users of one of the other interfaces. The attempt to place the set of styling icons in a concrete context, by using the same lettering and simply changing the styling effect, had mixed results. Users of these icons with no tooltips showed a remarkably slower completion rate than users of other interfaces. The icons did, however, not seem to impede user performance when used in combination with tooltips or a menu. The only other icon that was unsuccessfully implemented was that of Close. Even though this icon was chosen as the preferred icon by non-computer-literate users, it did not succeed in conveying the function concept to the user. This finding motivates the need for usability testing of interfaces even where the interface is designed with the assistance of end users, since preferred interface choices do not always increase the proficiency of the users.
Whether users work in their home language or not also has no effect on their productivity. These findings show that although users may not prefer to work on an interface in their L1 [11], [12], a translated interface does not hamper their performance, failing to corroborate the assertions that translation does increase user productivity [10] and that translation may adversely affect user performance [11]. The failure to confirm the results of previous studies may be attributed to the fact that the mentioned studies tested user performance on a translated website which contained large amounts of text [11], as opposed to single words or short phrases such as those used in this study. Also, where appropriate, word processing commands were placed in context, for example, the Sotho for Close and Open were translated as “close document” and “open document” respectively.
5 Conclusion

All indications are that user performance is not adversely affected by different interfaces, be they textual or pictorial, or by different languages. Rather, it is the experience of the user which dictates the effectiveness and efficiency of user performance. Differences between users of the standard and alternative interfaces were minimal, indicating that there is no need for the development of an alternative interface. From these results it appears evident that once users have been provided with enough training and have gained enough experience to be confident within the task and the application domain, they will easily adapt to changes in the interface.
References

1. Abran, A., Khelifi, A., Suryn, W., Seffah, A.: Usability meanings and interpretations in ISO standards. Software Quality Journal 11, 325–338 (2003)
2. Benbasat, I., Todd, P.: An experimental investigation of interface design alternatives: icon vs. text and direct manipulation vs. menus. International Journal of Man-Machine Studies 38, 369–402 (1993)
3. Blignaut, P.J., McDonald, T.: The implications of reading and writing language preference for Internet access in a multilingual South Africa. S.A. Tydskrif vir Natuurwetenskap en Tegnologie (2006) (in Afrikaans)
4. Bodley, G.J.H.: Design of computer user interfaces for Third World users. M.Com. Dissertation, University of Port Elizabeth, South Africa (1993)
5. Cyr, D., Trevor-Smith, H.: Localization of Web Design: An Empirical Comparison of German, Japanese, and U.S. Website Characteristics. Journal of the American Society for Information Science and Technology 55(13), 1–10 (2004)
6. De Wet, L., Blignaut, P., Burger, A.: Comprehension and usability variances among multicultural web users in South Africa. In: Proceedings of CHI 2002, Minneapolis (2002)
7. ISO 9241-11: Ergonomic requirements for office work with visual display terminals. Beuth, Berlin (1997)
8. Johns, S.M.: Colors, buttons, words and culture: Designing software for the global community. In: CODI Conference, April 9-11, Mesa, AZ (1997)
9. Kacmar, C.J., Carey, J.M.: Assessing the usability of icons in user interfaces. Behaviour and Information Technology 10(6), 443–457 (1991)
10. Kukulska-Hulme, A.: Communication with users: insights from second language acquisition. Interacting with Computers 12, 587–599 (2000)
11. Nielsen, J.: International Web Usability. Alertbox (August 1996)
12. Shneiderman, B.: Designing the user interface: Strategies for effective human-computer interaction, 3rd edn. Addison-Wesley, Reading (1998)
13. Teklebrhan, R., Blignaut, P.: A study on the effect of Western designed metaphors in some culture groups in South Africa. Technical Report, 2005/02, University of the Free State, South Africa (2005)
14. Zammit, K.: Computer icons: a picture says a thousand words or does it? Journal of Educational Computing Research 23(2), 217–231 (2000)
Evaluating User Effectiveness in Exploratory Search with TouchGraph Google Interface

Kemal Efe and Sabriye Ozerturk

Center for Advanced Computer Studies, University of Louisiana, Lafayette, LA 70504
{efe,sxo7344}@cacs.louisiana.edu
Abstract. TouchGraph Google Browser displays connectivity of similar pages around search results returned by Google. A major research question is: to what extent does this graph help improve user effectiveness during exploratory search? This paper reports on our user study with TouchGraph visualization. This study has interesting implications for designing user interfaces of search applications.
1 Introduction

Search engines generally do a good job of finding information that is easy to label, like "calories in apples." However, information needs are not always easy to articulate. In some cases users may not even know what the correct answer looks like until they have seen it. Learning, navigation and exploration play important parts in information seeking. A user generally starts with an initial query and successively refines it until the desired information is found. Each new query in this progression reflects something learned along the way. Exploration by successive query refinement is difficult, and users frequently give up searches in frustration. A key question that has become a hot research topic in recent literature [6,7] is determining the right set of tools to help users during exploratory search. It is not known what interface tools would best support exploration. User-interface research [1] suggests that the use of symbols and graphics improves cognition and perception. Visualizations can highlight aspects of information not comprehensible with plain text. For example, the TouchGraph Google Browser displays connectivity of similar pages around search results. An interesting research question is to what extent this graph helps improve user effectiveness during exploratory search. This paper reports on our user study with TouchGraph visualization.
2 Related Work

Earlier work has focused on the display and navigation of search results. Koshman [5] studied the TouchGraph interface for Amazon.com and examined users' ability to select similar items. The user test with 17 participants showed that there was a high overlap between system-suggested similar items and user-discovered similar items.
Efe, Asutay, and Lakhotia [2] evaluated a visualization interface that allows following links forward and backward. Additional tools allowed changing the scope of the displayed graph, orientation support and backtracking. User tests with 50 participants showed that, given equal time, users were able to successfully complete twice as many search tasks as users of a traditional interface. Heo and Hirtle [3] investigated different methods for visualizing category information with 80 participants, using distortion, zoom, and expanding outline. The study showed that performance did not improve with visualization tools; however, the expanding outline was shown to be more useful to users than the other visualizations. Hightower et al. [4] studied visualizations of visited paths displayed in a tree structure. User tests based on 37 participants showed that users with the visualization tool completed the given set of search tasks nearly twice as fast as the control group without the visualization tool.
3 Research Question

In this research we hypothesized that visualization can support exploratory search by displaying relationships among documents and by enhancing user interaction with the system. The major research question in this context is how well users can navigate a graphical presentation of related documents to reach desired information. Two aspects of this question are: a) how well can users navigate visualizations to reach multiple sources of related information on a subject, and b) to what extent does this ability translate to reaching desired information.
4 Study Design

4.1 System

TouchGraph is well suited to the research questions we consider. It supports exploratory search by displaying connections between related web sites. Web sites are displayed as nodes of a graph related to the search results. Clicking on a node retrieves other related nodes and displays them around the selected node. Additional information about a node is displayed by moving the mouse over the node. The display area also contains a list of search results in a vertical box on the left side of the screen. Selecting an item from the list highlights the corresponding graph node. It also displays the site description in a special text area. An example screen generated in response to the query "book sellers" is shown in Figure 1.

4.2 Participants

Thirty-five computer science graduate students participated in the experiments. All of them said that they used Google on a daily basis, but none of them had knowledge of TouchGraph before the experiment. We randomly divided participants into two groups based on the parity of their student ID numbers: students with even ID numbers were told to use Google and students with odd ID numbers were told to use TouchGraph. It turned out that there were 16 Google users and 19 TouchGraph users. Before the experiment, participants were given five minutes of training on TouchGraph to demonstrate the different functionalities available.
Fig. 1. TouchGraph visualization
4.3 Experiments

Two experiments were designed. In both experiments, participants were free to use any queries they wished. User effectiveness is measured by the number of successful searches completed within a fixed period of time. The first experiment was designed to measure the ability of users to reach a multiplicity of related documents. Participants were provided the URLs of 20 web sites on used books (sales, exchange, collector clubs, etc.) and were asked to find as many of these sites as possible by entering queries to the system and by exploring related pages. Table 1 shows the list of URLs the participants were required to find.

Table 1. The list of related pages used in the user test [table content largely lost in extraction; first entry: Addall.com]
The second experiment contained 10 search tasks specifically designed to require exploration before reaching a document with the correct answer. Topics of search were selected to be specialized enough that a layperson is not likely to have independent knowledge. Participants were asked to record the URL of the required
document when they found it. For most search tasks, we tried to make sure that the required web source described in the search question was unique in its information content. When in doubt, we provided page-specific information as part of the search task specification. An example question in this test is as follows:

Find a web site that offers thematic online maps relating to various topics. The upper-right corner of the page contains links for interesting maps available there. Examples of maps include population maps, economic maps, airport locations, travel maps, and others. This site also has detailed information about various topics of interest, such as top-10 countries in the world (in the sense of various criteria), a map of the top 100 hotels in the world, a map of the top 100 wonders of the world, etc.

The expected answer for this question was http://www.mapsofworld.com. When queried on Google with "thematic world maps," this particular URL comes up as the top item. (It should be noted that Google rankings of pages may vary over time; generally, this variation is less for well-established web sites. Page rankings mentioned in this paper were true as of the date of the user experiments.) The specification about the upper-right corner of the page was intended to distinguish the "correct" document from others. Among the 16 Google users, all but one found this URL as the correct answer. Of the 19 TouchGraph users, only four missed the correct answer.

Another question, which had close to a fifty percent success rate for both groups, was the following:

Find a web site that provides a URL-based search over the billions of archived pages on the web. Here a user can enter the URL of a page and retrieve different versions of the same page as it was on different dates. The site doesn't support keyword searching like search engines; the user must enter the URL of a page as input.

The expected answer was http://www.archive.org. The "correct" answer is unique since there is no other web site providing this service. It shows up as the top result when searched on Google with the query "internet digital library." Five Google users and six TouchGraph users gave the correct answer.

Another question with a somewhat lower success rate was:

Find a web site that sells prehistoric monuments like fossils, meteorites, and other items that are related with dinosaurs. The monuments are displayed in a gallery. There is a diversity of prehistoric monuments in this exhibition, like a two hundred million year old petrified wood slice, a dinosaur claw, eggs, toys, etc.

The expected answer for this search task was http://www.dinosaurstore.com/. This URL shows up as the top item in the list of search results when searched with "dinosaur teeth fossils." Notice that the word "teeth" does not appear in the page description we provided. This may have made it harder to find the page, and the search may have required a good deal of exploration. We had found this web site by searching with
"fossils meteorites dinosaurs" and clicking on the "similar pages" link of the second item on the list (which was www.arizonaskiesmeteorites.com/Dinosaur_Fossils_For_Sale). The page appears as the sixth item on the similar pages list. The "correct" answer was unique as of the date of the tests because of the two hundred million year old petrified wood slice that none of the similar pages had. While six Google participants found this URL as the correct answer, among TouchGraph users only one participant found it. This was surprising because we had expected TouchGraph users to be more successful on this question, given the way we had reached it.

The only question with no correct answer from any participant in either group was the following:

Find a web site whose primary purpose is to maintain a comprehensive listing of African-Diaspora-related Web pages. The site provides a directory and a full text search of the indexed pages at a central site.

Here, any one of several possibilities could be considered the correct answer, such as http://www.ubp.com/, www.blackpages.com/, www.blackpgs.com/, and possibly others. All three of these pages are returned on the first page of Google when queried with "black pages." Our original expectation was that most participants would suggest http://www.ubp.com/ as the correct answer, because it is the only site that explicitly states its mission, as "The primary purpose of the UBP is to maintain a comprehensive listing of African-diaspora-related Web pages at a central site." Moreover, it comes up as the second item when searched with "African Diaspora pages," or as the fifth item in response to "diaspora related pages." This was one of the few search tasks with multiple acceptable answers in our experiments. Yet, it turned out to be the only one with no correct answer from any participant. After close inspection, we found that in the Google search engine, "diaspora" seemed to retrieve (mostly academic) pages that study the concept and history of diaspora. Using the keywords "black pages" was essential for Google to retrieve sites with directory or search facilities, but participants did not derive these keywords from the provided description of the search task.
5 Statistical Evaluation of Results

5.1 Related Items Test

In finding related items, TouchGraph users were more successful than Google users. This result concurs with Koshman's findings [5], where users of TouchGraph were highly effective in finding related items on Amazon.com. This is expected, since the TouchGraph user interface graphically displays relationships between pages. Figure 2 shows the user scores obtained, and Table 2 shows the group statistics. We performed a Mann-Whitney significance test on these results using the SPSS statistical package. Significance measures are reported in Table 3 below. As can be seen, the difference between user performances was highly significant.
Fig. 2. User scores in searching for related pages (chart: number of related pages found by each participant, TouchGraph vs. Google)

Table 2. Group Statistics in finding related pages

Interface     N    Mean     Std. Deviation   Std. Error Mean
Google        16   3.5000   2.03306          .50827
TouchGraph    19   5.5263   1.38918          .31870

Table 3. Significance Statistics (URL found count)

Mann-Whitney U                    54.500
Wilcoxon W                        190.500
Z                                 -3.274
Asymp. Sig. (2-tailed)            .001
Exact Sig. [2*(1-tailed Sig.)]    .001
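The same comparison can be checked outside SPSS; the sketch below uses SciPy's Mann-Whitney implementation. The score lists are invented for illustration (only the group sizes, 16 and 19, follow the paper), so the output will not reproduce Table 3 exactly.

from scipy.stats import mannwhitneyu

# Hypothetical per-participant counts of related pages found;
# only the group sizes (16 Google, 19 TouchGraph) match the study.
google = [2, 3, 1, 5, 4, 3, 6, 2, 3, 4, 5, 1, 7, 3, 4, 3]
touchgraph = [5, 6, 4, 7, 5, 6, 8, 5, 4, 6, 7, 5, 6, 4, 5, 7, 6, 5, 4]

# Two-sided test, mirroring the "Asymp. Sig. (2-tailed)" row above.
u, p = mannwhitneyu(google, touchgraph, alternative="two-sided")
print(f"Mann-Whitney U = {u:.1f}, two-tailed p = {p:.3f}")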
5.2 Exploratory Search

User scores in exploratory search are plotted in Figure 3 below. Table 4 shows the corresponding group statistics.
Contrary to the first test, Google users were more effective in exploratory search than TouchGraph users. A Mann-Whitney test on these results showed the performance difference to be significant, rejecting the null hypothesis with 95 percent confidence. Mann-Whitney significance measures are reported in Table 5 below.
Fig. 3. User scores in exploratory search (chart: number of successful searches by each participant, Google vs. TouchGraph)

Table 4. Group Statistics

Interface     N    Mean     Std. Deviation   Std. Error Mean
Google        16   3.5000   1.59164          .39791
TouchGraph    19   2.4211   1.26121          .28934

Table 5. Significance Statistics (correct count)

Mann-Whitney U                    89.000
Wilcoxon W                        279.000
Z                                 -2.138
Asymp. Sig. (2-tailed)            .033
Exact Sig. [2*(1-tailed Sig.)]    .037
6 Discussion

The purpose of our experiments was to find answers to two key questions: a) how well can users navigate visualizations to reach multiple sources of related information, and b) to what extent does this ability translate to reaching desired information in exploratory search. The first test showed that the TouchGraph interface positively helps users reach a multiplicity of related pages. However, the second test showed that this ability did not necessarily translate to more effective searches.
The results appear counter-intuitive at first, because we would expect that an enhanced ability to reach a multiplicity of related documents would translate to an enhanced ability to reach desired information. However, our interviews with users showed that they had difficulty in comprehending the graphical display. When the question only asked about a page URL, they performed well because they could readily see the URL associated with each graph node displayed on the screen. However, there was a semantic gap between the URL of a page and its information content. They couldn't see enough content description to meaningfully navigate their way toward the required information. Consequently, they made very little use of related-page information. Google users also admitted that they didn't make much use of the "similar pages" facility. Both groups tried to use query refinement as the primary mechanism for exploration. One month after the test, we asked participants: "Now that you are aware of its existence, do you use TouchGraph instead of Google for search?" None of the participants acknowledged using it even occasionally.
Effective tools that help users explore related pages are essential in web search. In real life, learning and reasoning about related information is a primary method of discovering the unknown. There is no reason why this should not be the case for exploratory search on the web. We only know that TouchGraph is not the right tool for this purpose.
References

1. Aspillaga, M.: Perceptual foundations in the design of visual displays. Computers in Human Behavior 12(4), 587–600 (1996)
2. Efe, K., Asutay, A.V., Lakhotia, A.: A User Interface for Exploiting Web Communities in Searching the Web. In: WEBIST 2008, Proceedings of the Fourth International Conference on Web Information Systems and Technologies, Funchal, Madeira, Portugal, May 4-7 (2008)
3. Heo, M., Hirtle, S.: An empirical comparison of visualization tools to assist information retrieval on the Web. Journal of the American Society for Information Science and Technology 52(8), 666–675 (2001)
4. Hightower, R.R., Ring, L., Helfman, J.I., Bederson, B.B., Hollan, J.D.: Graphical Multiscale Web Histories: A Study of Padprints. In: Proceedings of the 9th ACM Conference on Hypertext and Hypermedia (HYPERTEXT 1998), Pittsburgh, PA, USA, June 20-24, 1998, pp. 58–65 (1998)
5. Koshman, S.: Web-based visualization interface testing: similarity judgments. Journal of Web Engineering 3(3/4), 281–296 (2004)
6. White, R.W., Kules, B., Drucker, S., Schraefel, M.C.: Supporting exploratory search: Introduction. Special issue of the Communications of the ACM 49(4) (2006)
7. White, R.W., Muresan, M., Marchionini, G.: Proceedings of the ACM SIGIR 2006 Workshop on Evaluating Exploratory Search Systems (EESS 2006), Seattle, USA, August 10 (2006)
What Do Users Want to See? A Content Preparation Study for Consumer Electronics

Yinni Guo (1), Robert W. Proctor (2), and Gavriel Salvendy (1, 3)

(1) School of Industrial Engineering, Purdue University, W. Lafayette, IN, USA 47907
(2) Department of Psychological Science, Purdue University, W. Lafayette, IN, USA 47907
(3) Department of Industrial Engineering, Tsinghua University, Beijing, China 100084
{guo2,salvendy}@purdue.edu, proctor@psych.purdue.edu
Abstract. To investigate what users want to see from consumer electronic devices, a content preparation study was conducted. A questionnaire was constructed based on the results from web site content research and traditional usability studies on consumer electronics, and was completed by 401 Chinese participants. The statistical results reveal nine major factors of cell phone content. Users of different ages and genders also have different requirements for cell phone content, especially concerning accessory and multimedia functions. This study suggests guidelines for cell phone designers targeting the Chinese market, as well as a basis for content studies of other consumer electronics.

Keywords: Content preparation, factor structure, consumer electronics.
culture-related content preparation study, Savoy and Salvendy [4] found that seven factors captured the survey structure: general product description, member transaction, shipping, secure customer service, company, durability and price. Comparison of the three studies reveals similar factors of web site content and provides evidence that the quality of content plays an important role in usability.
Given the importance of content preparation for web sites, the study of content preparation should be extended to non-web-based products like consumer electronics. Compared to web-based products, consumer electronics are similar in being used to display a large amount of information to the users. However, unlike web-based products, consumer electronics like mobile phones and PDAs impose the limitations of small screen size and cumbersome input mechanisms [5]. Therefore, indications about ways to control the devices may be essential in the content structure. We chose cell phones as representative of consumer electronics because cell phones are nowadays being developed as multi-functional devices; therefore, the factor structure of cell-phone content may be applicable to other appliances.
A related study of content for consumer electronics was performed by Caus et al. [6]. They pointed out that reasons for the low market penetration of mobile applications included a lack of standardization concerning the handling of information and high technical complexity. Caus et al. proposed that one possible way to reduce the problem of representing and selecting content in mobile Internet use was to offer users only content relevant to their particular situation, through context-aware information processing.
2 Methodology

When users use a certain function of a cell phone, they face a series of tasks. Therefore, the essential issues for cell phone content are what content should be provided so that users can operate the functions easily, and what types of functions are necessary. One way to figure out what content is needed is to ask the customers themselves; the three previous studies [2, 3, 4] validate the efficiency of surveys. Therefore, a questionnaire was developed based on previous content preparation studies, cell phone usability studies [7, 8, 9, 10], multimedia studies on cell phones [6, 11] and observation of current advanced cell phones. Content questions covered seven major categories: function (18 questions), menu (8 questions), instruction and status (15 questions), file (6 questions), input and search (8 questions), service (4 questions), and phone call features (5 questions). The questionnaire included these 64 questions and 4 questions asking about participants' feelings concerning how much cell phone content would influence their satisfaction, operation efficiency, and effectiveness, as well as whether current cell phone content is enough. There were also seven demographic questions to investigate users' backgrounds. Of the 68 questions, four were repeated questions to test internal consistency.
3 Procedure and Participants

The survey was conducted in Xiamen, China, in May 2008. A paper-based questionnaire was used due to distribution convenience. A total of 401 participants filled out
the survey, of which 375 yielded usable results. Twelve participants did not finish the whole questionnaire, and 14 participants answered the questionnaire with low internal consistency. 42% of the participants were female. The ages of the subjects ranged from 18 to 60 years, with 95% of them being under 40 years of age. About 96% of the subjects had education higher than an associate college degree. The participants had a diverse range of occupations. Most had experience of using cell phones for 2 years or longer and with 2 or more models. A detailed description of the subjects' demographic information can be found in Table 1.

Table 1. Demographic characteristics of survey participants [table flattened in extraction; recoverable fragments: Female: 158; High school: 6; Under 20: 33; Manager: 28; Technician: 22; 0 models: 7; 0-1 years: 10]
4 Results and Discussion

4.1 General Results and Factor Analysis

We used a 7-point Likert scale to record users' attitudes. The mean answers for each question ranged from 4.05 to 6.22, with standard deviations ranging from 0.93 to 1.73. Some questions yielded extremely high means with low standard deviations, which indicates that these items are considered very important across all participants. These questions concern the main or basic cell phone features, like the calendar, message status, the search-by-name function, the time of a missed call, and the number of missed calls. On the other hand, some items, like a sequential shooting camera, mobile television, a dual time zone function, and animation of power on/off, had low means and high standard deviations. This result shows that participants' preferences concerning accessory functions differ considerably, probably due to different backgrounds. The survey also reveals that participants agree that the quality of cell-phone content would influence their satisfaction (mean rating, M = 5.56), as well as their operation efficiency (M = 5.23). There was no agreement on "current content is enough" (M = 4.46; SD = 1.73), which indicates that for many current cell-phone models, but not all, the necessary contents are not all included.
The survey showed an acceptable overall internal consistency of 0.82. To uncover the hidden structure of the information content, maximum likelihood factor analysis with varimax and promax rotations was conducted. By examining the scree plot and eigenvalues, we found that 9 factors would explain 85.54% of the total variance. Under each factor, items with loadings lower than 0.50 were considered insignificant and eliminated. The factors were named according to the loading questions.
Factor 1 includes the content items "current input method", "the input 'pinyin' letters", "what content has been input", "search by name" and "search by initial", and is therefore named "Input and search". Factor 2 covers questions about "number of each function", "name of each function", "all options of each function on any menu", "scroll bar" and "cursor", which are all related to assistance with functions; Factor 2 is therefore named "Functions". Items under Factor 3 are all related to the indication of keys or functions, like indication of "back to previous menu", "confirm key" and "which keys are in use"; Factor 3 is therefore named "Operation". Factor 4 includes the three most widely used multimedia functions (digital camera, sequential shooting camera, video camera), and is named "Multimedia functions". Factor 5 covers the items "file size", "photo size", "file properties" and "storage"; it is named "Stored files" since all four items are related to cell phone storage space and stored file attributes. Questions loaded under Factor 6 are all about phone calls, like "missed call times", "time of a missed call" and "length of each call"; it is named "Phone calls". Factor 7 is named "Help and service" because the loaded questions are about how to get more information about the signal carrier and manufacturer, as well as help information for cell phone functions. Factor 8 covers a large range of questions, from reminder icons to an emergency key, and is named "Accessorial functions". Factor 9 is named "Messages" since it contains the two items "icon of message box status" and "icon of voice mail status".
Of the original 64 questions on cell-phone content, 27 items did not load on a factor; the questionnaire could therefore be simplified for future use. Of these nine factors, four (Factors 4, 6, 8 and 9) concern specific cell-phone functions. These factors and the items they cover can be applied to the design of cell-phone content. The other five factors are related to general functions and operation. These factors are universal and can be applied to the content design of most consumer electronics. For instance, Factor 5 can be used for devices that store files, like music players, digital cameras, PDAs and GPS devices; Factors 2, 3 and 7 need to be applied to every information appliance.
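The extraction pipeline just described (internal consistency, maximum likelihood factoring with rotation, and the 0.50 loading cut-off) can be sketched in Python. The factor_analyzer package is one option; the file and column layout below are assumptions for illustration, not details from the study.

import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical: 'items' holds the 64 content questions as numeric
# 7-point Likert columns, one row per usable respondent.
items = pd.read_csv("survey_items.csv")

# Cronbach's alpha computed directly (the paper reports 0.82 overall).
k = items.shape[1]
alpha = k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))
print(f"Cronbach's alpha = {alpha:.2f}")

# Maximum likelihood extraction with varimax rotation; nine factors,
# matching the scree plot and eigenvalue criteria described above.
fa = FactorAnalyzer(n_factors=9, rotation="varimax", method="ml")
fa.fit(items)

# Keep only items whose absolute loading reaches 0.50 on some factor.
loadings = pd.DataFrame(fa.loadings_, index=items.columns)
retained = loadings[loadings.abs().ge(0.50).any(axis=1)]
print(retained.round(2))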
4.2 Analysis of Users of Different Backgrounds

The survey included seven demographic questions to classify participants with different backgrounds. By checking the differences, we can give guidelines on whether different designs should be considered for different user groups. Duncan's multiple range test was used to compare all pairs of means, with an alpha level of 0.05 set for statistical significance. A difference in means of over 10% was considered a practically significant difference; items revealed to be practically significantly different are listed in Table 2.
For the comparison between females and males, 23 items show statistically significant differences. However, only one item, "Message of break out incident", shows a practical difference. Females tend to agree more than males on getting a
message when there is any break-out incident like an explosion, hurricane, or earthquake. This is probably because women are more averse to risk taking [12] and perceive greater danger than men [13].
The comparison of the three age groups showed many differences. Although most of the subjects were no more than 40 years of age, differences still exist among participants of different age ranges. Thirty questions show statistically significant differences between the age groups "under 23 years old", "23 to 29 years old" and "above 29 years old". Twelve questions show practically significant differences, and 11 of them show a decrease in mean as age increases (Fig. 1). These questions are all about whether a certain accessorial function is necessary, like an mp3 player, instant messenger, memo, etc. It can be concluded that older users do not want accessorial functions as much as younger users do.
Fig. 1. Mean response for different product features as a function of age (chart: mean 7-point responses for the age groups Under 23, 23-29 and Above 29 across features such as mp3 player, digital camera, sequential shooting, digital video, e-book, customization, online surfing, instant messenger, memo and marked-days reminders; most means decrease with age)
The education level of the participants varies from associate degree to Ph.D. degree. However, there are many more subjects at the undergraduate degree level than at the associate degree level. Therefore, we decided to compare only two groups, undergraduate degree level and graduate degree level. By checking the results we can see that there are 8 items showing statistical significance, 6 of which show practical significance. Similar to what we found in the age comparison, these 6 items are questions about whether a certain multimedia or accessorial function is necessary. The results suggest that participants with higher degrees pay less attention to these features. However, this tendency might also interact with the age factor, since people with graduate degrees tend to be older than those with undergraduate degrees.
The demographic table shows that the numbers of sales personnel, technicians and managers are not comparable to the number of students. Therefore, we decided to combine the participants with jobs and compare them with the students.
Table 2. Practically significant differences between subjects of different backgrounds

Gender:            Message of break out incident
Age:               Mp3 function; Digital camera function; Sequential shooting; Video camera; E-book function; Customization function; Mp3 function; Online surfing; Instant messenger; Indication of "back to previous menu"; Icon of "memo" status; Icon to remind marked days
Education:         Digital camera function; Video camera; Cell phone game; Customization function; Mp3 function; Instant messenger
Occupation:        Video camera; E-book function; Customization function; Online surfing; Instant messenger; Icon of "memo" status; Icon to remind marked days
Model experience:  Animation of power on/off
Years of use:      Video camera; Mp3 function; Indication of "confirm" key; Animation of power on/off
Usability opinion: Digital camera function; Number of each function
The results reveal that there are 9 items that show statistically significant differences, 7 of which are practically significant. As with the difference between undergraduate and graduate degree holders, the 7 items are all about accessorial functions. Non-student participants do not pay as much attention to these functions as students. This result might also interact with the age factor, since people with jobs tend to be older than students.
There is only one item showing a significant difference between participants with different cell phone model experience: participants who have used more cell phone models prefer less animation of power on/off than less experienced users. This trend does not, however, apply to participants with no experience at all. Compared to other effects like age, education level and job category, the effect of model experience is much weaker. In contrast, the number of years a participant has used cell phones shows more significant differences: of 11 statistically significant items, 4 are practically significantly different. Experienced users were prone to pay less attention to accessorial functions. This might also interact with the age factor, since people who have used cell phones longer tend to be older.
The comparison of participants who hold different opinions about cell phone usability shows six items as significantly different, two of which are slightly practically significant. Participants who consider usability "very important" or "median" do not think that the digital camera function is as important as do participants who consider usability "not important". But for "number of each function", the result is the opposite. This might be because "number of each function" is a way to support cell phone usability.
After finalizing the factor structure, we compared participants with different backgrounds on the nine factors. Results from a MANOVA showed that only cell phone model experience has no influence on any factor. The age effect and the usability effect cause the most differences. For all effects, the differences always involve the need for information about accessorial or multimedia functions. By checking the main effects of the demographic characteristics, we found that Age and Gender are the two major characteristics, showing significance on 27 and 20 items, respectively (p < 0.05). All the other characteristics have fewer than 7 significant items. Therefore, we can conclude that designers should make different models for users of different target age groups and target genders. In the current market, designs for different genders are more common than designs for different age groups. Older users complain that they cannot find cell phones that they can use [14].
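The MANOVA referred to above is also straightforward to reproduce in outline. A minimal sketch with statsmodels follows, assuming hypothetical column names (factor scores f1 through f9 and coded demographic variables); none of these names come from the paper.

import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical file: nine factor scores per respondent plus demographics.
df = pd.read_csv("factor_scores.csv")

mv = MANOVA.from_formula(
    "f1 + f2 + f3 + f4 + f5 + f6 + f7 + f8 + f9 ~ C(age_group) + C(gender)",
    data=df,
)
# Prints Wilks' lambda, Pillai's trace, etc. per demographic effect.
print(mv.mv_test())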
5 Conclusions and Guidelines

There are basically four conclusions from the above analysis. First, content with higher quality will benefit customer satisfaction, and there is room for current cell phones to improve their content. Second, there are different content needs for users with different backgrounds, especially of different ages. Younger users, especially students, rely a lot on multimedia functions and content, while older users and working populations do not consider them important. Designers need to take this into consideration when designing cell phone content. For example, offering elderly and business users cell phones with stable, high-quality phone call functions, an easily accessed phone book and an easy-to-use input keypad is more important than providing the most advanced multimedia functions. Third, 27 questions in the original questionnaire did not load on any of the factors; the questionnaire can therefore be simplified for future use. Fourth, information about input and search shows its importance in how much variance it explains and in its factor mean. This factor is essential for the Chinese
population because it is more difficult to input Chinese using the small cell phone panel, even though text messaging is more widely used in China than in the U.S. Compared to existing studies of cell phone interfaces and cell phone usability, this study addresses the lack of consideration given to content and content structure [7, 8, 9, 10]. Compared to the study of Caus et al. [6], which discussed context-adaptive information for cell phones, this study provides a straightforward structure of the necessary information.
References

1. Proctor, R.W., Vu, K., Salvendy, G.: Content Preparation and Management for Web Design: Eliciting, Structuring, Searching, and Displaying Information. International Journal of Human-Computer Interaction 14, 25–92 (2002)
2. Liao, H., Proctor, R.W., Salvendy, G.: Chinese and U.S. Online Consumers' Preferences for Content of E-commerce Web Sites: a survey. Theoretical Issues in Ergonomics Science 10, 19–42 (2009)
3. Guo, Y., Salvendy, G.: Factor Structure of Content Preparation for E-business Web Sites. Behaviour and Information Technology (in press)
4. Savoy, A., Salvendy, G.: Foundations of Content Preparation for the Web. Theoretical Issues in Ergonomics Science 9, 501–521 (2008)
5. Venkatesh, V., Ramesh, V., Massey, A.P.: Understanding Usability in Mobile Commerce. Commun. ACM 46, 53–56 (2003)
6. Caus, T., Christmann, S., Hagenhoff, S.: Hydra – An Application Framework for the Development of Context-Aware Mobile Services. In: Business Information Systems, vol. 7, part 14, pp. 471–481. Springer, Heidelberg (2008)
7. Smith-Jackson, T.L., Nussbaum, M.A., Mooney, A.M.: Accessible Cell Phone Design: Development and Application of a Needs Analysis Framework. Disability and Rehabilitation 25, 549–560 (2003)
8. Kaikkonen, A., Kekäläinen, A., Cankar, M., Kallio, T., Kankainen, A.: Usability Testing of Mobile Applications: a Comparison between Laboratory and Field Testing. Journal of Usability Studies 1, 4–17 (2005)
9. Zhang, D., Adipat, B.: Challenges, Methodologies, and Issues in the Usability Testing of Mobile Applications. International Journal of Human-Computer Interaction 18, 293–308 (2005)
10. Ji, Y.G., Park, J.H., Lee, C., Yun, M.H.: A Usability Checklist for the Usability Evaluation of Mobile Phone User Interface. International Journal of Human-Computer Interaction 20, 207–231 (2006)
11. Miyauchi, K., Sugahara, T., Oda, H.: Relax or Study? A Qualitative User Study on the Usage of Mobile TV and Video. In: Changing Television Environments, pp. 128–132. Springer, Heidelberg (2008)
12. Byrnes, J., Miller, D., Schafer, W.: Gender Differences in Risk Taking: A Meta-Analysis. Psychological Bulletin 125, 367–383 (1999)
13. LaGrange, R., Ferraro, K.: Assessing Age and Gender Differences in Perceived Risk and Fear of Crime. Criminology 27, 697–720 (1989)
14. Guo, Y., Proctor, R.W., Salvendy, G.: Development and Validation of Axiomatic Evaluation Method (working paper)
"I Love My iPhone … But There Are Certain Things That 'Niggle' Me"

Anna Haywood and Gemma Boguslawski

Serco Usability Services, London, United Kingdom
anna.haywood@serco.com, gemma.boguslawski@serco.com
Abstract. Touchscreen technology is gaining sophistication, and the freedom offered by finger-based interaction has heralded a new phase in mobile phone evolution. The list of touchscreen mobiles is ever increasing as the appeal of ‘touch’ moves beyond the realms of the early adopter or fanboy, into the imagination of the general consumer. However, despite this increasing popularity, touchscreen cannot be considered a panacea. It is important to look beyond the promise of a more direct and intuitive interface, towards the day-to-day reality. Based on our independent research, this paper explores aspects of the touchscreen user experience, offering iPhone insights as examples, before presenting key best practice guidelines to help design and evaluate finger-activated touchscreen solutions for small screen devices.
solutions. Although focused rather than exhaustive, the guidelines aim to optimise the user experience by bringing qualities such as simplicity and ease of use, as well as consistency and responsiveness to the fore.
2 Perhaps 'Cool' But Not a Panacea

Within the touchscreen arena, Apple's iPhone is often the first device that springs to mind when people are asked to name a touchscreen mobile phone, despite manufacturers such as Samsung, Motorola and LG also being very strong contenders in the touch marketplace. Especially since the advent of Apple's 3G iPhone, interest in touchscreen mobiles has received a boost. A wealth of competitor products are hitting the market in order to ride the iPhone wave, each aiming towards the large screen size and aesthetic appeal of the iPhone, while looking to distinguish themselves sufficiently so as not to attract a 'me-too wannabe' label or be so iPhone-esque as to risk a costly legal battle.
Despite its increasing popularity and the promise of a more intuitive interface, touchscreen is not a panacea. While finger-activated touchscreens can arguably be considered a progression over stylus manipulation, views promoting touchscreen as a natural progression for mobile phones in general are on 'shaky ground'. All things considered, it cannot be seen as the 'cool solution' that waves goodbye to the usability issues typically associated with traditional, non-touch handsets. In addition, touchscreen devices can bring their own usability problems. At least for now, even Apple's iPhone, which is often heralded as a touchscreen success story, is by no means perfect. Indeed, tales of 'iPhone love' often have user experience issues in the subtext, upon further investigation.
3 Exploring the 'iPhone Experience'

3.1 The Transition to 'Touch'

The iPhone is frequently touted as being more intuitive than other mobile devices, not merely by virtue of its touchscreen interface, but also because it relies on one top-level menu and a single physical button. Additionally, the device is often considered to offer a good balance between flashy design and practical functionality. This 'balance' is often cited as adding to its emotional appeal, especially in the consumer rather than the business market.
While there is a degree of truth in the ability of novice users to adapt relatively quickly to its use, this cannot be held true across the interface. There are aspects of the iPhone's interface that still have a 'learning curve', requiring familiarisation and patience before an acceptable degree of performance is attained. For example, users sometimes struggle to discover and perform gesture-style interactions such as zooming.
Typically, 'mastery' and the overall user experience are measured in comparison to previous mobile phone use, including non-touchscreen devices. Although the transition to touch isn't always 'rosy', the iPhone is often heralded as revolutionary in instances where prior mobile use was constantly fraught with difficulties and building the mental model necessary to use the device was not easy. As an example from our research, after persistently struggling with non-touch devices over several
years (and multiple handsets), one 73-year-old respondent reported being a total iPhone devotee who regularly texts, downloads applications, and is addicted to playing games on her beloved iPhone.
The interaction paradigm offered by the iPhone is often considered to add to its intuitive nature. Here, rather than adopting a computer-based model of scrolling (like some competitors) where a scroll bar sits to the right of the screen, the iPhone's physical interaction model (i.e. scroll up to access content further down the page and vice versa) encourages users to freely scroll anywhere on the screen. This model allows users to focus on page content and maximises screen real estate. In our studies, many users indicated that dragging a list or page up or down felt very smooth and very much like interacting with a real, physical object. In particular, the ability to flick a list in order to scroll it with momentum was appreciated once mastered.

3.2 Touchscreen Responsiveness

Working to minimise the touch-response lag is imperative to the usability of touchscreen interfaces, as delays will frustrate and confuse users, encouraging repeated selection of target elements. Optimising responsiveness will dissuade users from pounding the screen and/or attempting to use their fingernail or a pen like a stylus.
When users' reactions to the responsiveness of the iPhone were probed, responses typically signalled a high degree of satisfaction. Responsiveness was thought to be extremely good, with a negligible amount of lag between selection and launch. Indeed, with its underlying capacitive technology, the iPhone was generally considered more responsive than competitor devices that relied on direct pressure (resistive): only the lightest touch was required. However, where novice users were concerned, the iPhone's high degree of sensitivity sometimes fostered niggling concerns about accidental interactions, for example, where overall finger size exceeded the target's dimensions or when hitting the target off-centre. Also, due to its responsiveness, an ongoing problem with the iPhone was that it sometimes confused navigation with selection if users scrolled too slowly across a webpage full of links. This issue was then compounded by the inability to stop a new page opening.

3.3 Screen Size Matters

In addition to the inherent novelty appeal, the 'no-button' design of touchscreen phones lends itself to a large screen size and the potential for a more sleek aesthetic design not 'burdened' by the need to accommodate physical buttons. When it comes to touchscreens, screen clarity and size matter. Large, good quality screens are considered essential to provide space for key elements, as well as affording comprehension of the elements presented. Users need to feel that icons and other screen elements are large enough to select without accidentally selecting adjacent items.
The hardware design of the iPhone is seen to bolster its emotional appeal. As noted by our respondents, the large 3.5-inch screen size, the clarity of the touchscreen, and its 'unfussy' single-button design positively combine, contributing to perceptions of the iPhone as a high-end phone. Indeed, such factors were cited as reasons why current iPhone users had chosen the iPhone over the competition in the first place. For some novice users, however, positive reactions to the large screen were sometimes pitted against concerns that the screen may be vulnerable to damage, which may
render the device inoperable. With its reliance on finger activation, the large iPhone screen was also viewed as a 'fingerprint trap', and novice users sometimes questioned whether the screen would depreciate in sensitivity, especially for scrolling activations, due to a build-up of dirt and grime. Also considered an issue for the iPhone's capacitive touchscreen was the requirement for users' fingers to be bare, since activation relies on electro-connectivity in the user's fingers. Where discussed, this was seen as a potential burden during winter months, as gloved users would find their fingers rendered useless unless fingerless or specialist gloves were worn.

3.4 Form Factor

Referring to issues such as size, weight and shape, the ergonomic aspects of both touch and non-touch handsets have a notable impact on the user experience. For practical reasons, the ideal mobile phone should not impose constraints on the user's clothing or accessories. The desired size and shape need to fit comfortably, not only in the hand, but also in pockets and/or the user's choice of bag. Accordingly, during our studies some participants wanted to place the iPhone in a pocket, to try it out for size. Typically, reactions highlighted that the handset's physical design achieved a good balance between being a suitable size and weight to be accommodated in bags or clothing with relative comfort, while still offering a screen size optimised for touchscreen interaction, especially when it comes to web browsing.
In terms of handling the device, there was a modicum of concern that the iPhone's overall form factor may be uncomfortable and potentially awkward to use for voice calls, particularly lengthy ones, especially if protected by an attached casing. Also mixed with positive reviews of the iPhone's 'sleek' hardware design was a degree of mourning that the days of wedging one's handset between shoulder and ear, in order to free up the hands, would be at an end with this handset. However, even where considered a little heavier than traditional handsets, the iPhone was largely considered 'weighty' in a positive way, with this being perceived as a mark of quality.

3.5 Navigation – The Importance of Simplicity and Consistency

Like their non-touchscreen siblings, touchscreen interfaces must aim for simplicity and consistency throughout, in order to minimise potential frustration and allow user expectations to be appropriately managed. If users have problems finding, selecting and using the most basic functionality, they will feel negative about the product. With a mobile phone, it is vitally important to support key functions such as answering or ending a call, creating and accessing text functionality (and email, if available), listening to music (and altering the volume), and accessing the internet. In terms of accessing functionality, the steps involved should be minimised by keeping access points at a high level. To support users' navigation, there also needs to be a clear and direct path to the Main Menu or 'Home' area. In this respect, the iPhone was typically praised. All applications are accessible from the home screen, creating a shallow menu structure that is practically impossible to get lost in, and the single hard key provides a constantly visible route home.
On the negative side, secondary functions of the Home key, such as the ability to double-press it to access Favourite Contacts and its role in exiting the menu customisation mode, were generally only discovered by accident or word of mouth.
The iPhone interface follows the Apple philosophy of achieving ease of use through simplicity, limiting the number of options and functions available to make menus as simple as possible. One notable example of a key function where performance was marred was instigating a call. Here, fuelled by anticipation of a dedicated call 'button', new users often overlooked the need to press the actual phone number once on the contact details page. Perhaps surprisingly, there were some key functions that even presented difficulties for existing iPhone users – e.g., setting an alarm and discovering the 'pinch' gesture to zoom.
In the pursuit of simplicity, it is noteworthy that several functions cited as important by mobile phone users are omitted from the iPhone. Here, our respondents commonly complained about the lack of an MMS facility, the inability to forward received messages, no communication concerning the number of characters remaining in an SMS (resulting in recipient frustration over multiple texts), the inability to cut and paste, and the lack of flexibility in displaying SMS messages (we have observed a love-hate relationship with the chat-style view). Functionality increasingly provided on mobile phones, such as a radio, a camera facility complete with flash and zoom abilities, and an (official) way to record video clips (using the built-in camera), was also missed. Accompanying this last point, several comments highlighted reluctance to 'tinker' with the handset in order to explore 'unofficial' solutions for core functionality, given the perceived high cost of the device.
Consistency is, largely, a key attribute of the iPhone interface. For example, once users learn to tap in a text entry field to access the virtual keyboard, this works in the same way across applications. Overall, users reported that elements for onward navigation could be distinguished with relative ease, despite some inconsistency in the interface being noted. Also, the consistent positioning of back buttons throughout the interface was welcomed. However, consistency does not guarantee good usability. Noteworthy here is that the meaning of the '+' button, which is widely used throughout the iPhone interface to, for example, add configuration set-ups or new notes pages, was not immediately visible or understood by all. For example, users often stumbled when setting an alarm: the 'edit' and/or '+' button was often overlooked, with users expecting to select the field of an existing alarm in order to edit the time. In addition, in some places users need to save their changes explicitly, whereas in other places alterations are saved automatically – e.g. while ringtone settings are automatically saved, setting the alarm requires users to select 'Save' to commit their settings. Where encountered, there was occasional uncertainty and confusion about whether or not the performed action had been accepted and confirmed. Even existing iPhone users were sometimes surprised that they needed to save their changes within certain areas of the interface and not others. There seems to be a movement towards automatic saving in mobile interfaces; however, to reinforce this model, it needs to be applied consistently across the interface.

3.6 Visual Design

As with non-touchscreen devices, it is important for users to readily understand, at a glance, any iconography presented, especially if it is not supplemented with a label descriptor. Where icons are relatively abstract or their visibility is reduced (through
either their visual design and/or a cluttered display), users will become frustrated if they continually struggle to locate target features. Considering the iconography on the iPhone's Home screen, the colourful array of default items, as well as downloaded applications, tended to attract positive reviews amongst our respondents, with icons largely regarded as depictive rather than abstract. Here, the size and relative spacing of the application icons, and the provision of supplementary labels (under each item), were considered to support both selection and an understanding of the functionality presented. In terms of its graphical look and feel, despite adopting a rather limited colour palette, the iPhone tended to attract praise, especially amongst Apple consumers, with a more 'jazzy, colourful look' only being requested by a minority.

3.7 The Virtual Keyboard

If devices exclusively rely on an on-screen keyboard, the aim should be to mirror the levels of speed and accuracy offered by traditional handsets as far as possible. Also, without a permanently presented physical keypad, clear access to the virtual keyboard is vitally important: users must not be left wondering how to enter text using the touchscreen. Additionally, it is important to ensure that users can readily change between different text input modes, to support the creation of messages that involve punctuation, numbers and special characters.
Writing and sending text messages and, increasingly, emails represents a common task for mobile users (for some, even more important than making or receiving calls), and it is one aspect where touchscreen mobile devices often come under fire, typically amid concerns that virtual keys are not adequately sized for accurate finger selection. In this regard, the iPhone is no exception. As our findings suggest, those who use their mobile phone extensively, especially for text entry (e.g. heavy texters or business users), may have a less smooth transition to touchscreen devices than more ad hoc or light text users. For this latter group, especially in instances where multi-tap text entry was considered a chore, there were indications that performance may even be enhanced, at least in the users' perception.
During our research, the iPhone's on-screen QWERTY keyboard was largely appreciated, as the layout (if not the experience) was familiar from using a computer keyboard. However, although the iPhone's keyboard fills approximately half the screen, the size of the keys tended to attract mixed reactions, and there were concerns over selection accuracy. Especially for novice users, the keys were often considered too small and there were worries that fingers would span more than one key, increasing input errors. Aiming to negate such concerns, the 'magnification bubble' of the selected key was popular, both aesthetically and in terms of supplying feedback, as users' fingers occluded their selection when using the keyboard. Although users often reported improvements in keyboard comfort as their familiarity with the on-screen keyboard increased, there was not widespread confidence that performance could ever match that exhibited on a physical keypad. Reports of being able to type more efficiently and with more accuracy on conventional non-touch devices abounded. In particular, those who used their mobile device heavily for email or text messaging perceived a deficit in their performance when using the iPhone's on-screen keyboard.
Indeed, both users of conventional numeric 12-key keyboards (multi-tap and predictive users) and users of hard-key QWERTY
keyboards reported frustration at a higher perceived level of entry errors, as well as a reduction in perceived speed, when using the iPhone as compared to prior experience with physical keypads. Also, for respondents who indicated proficiency at multitasking when texting on a traditional keypad (e.g. texting whilst watching TV or even while driving!), the need to always attend more closely to the virtual keypad was anticipated and considered a bind. Also, existing touchscreen users (iPhone and competitor) were seen to lament the loss of the ability to enter text one-handed without looking at the screen. Similarly, single-handedly balancing the device and taking a photo using the on-screen button was also found to be tricky if the desired composition was to be maintained. As frequently found with touch interfaces, the iPhone’s keyboard also came under fire for not being ‘thumb-friendly’, and those with long fingernails often experienced difficulties, especially when inputting text. In this latter case, users’ attempts to initiate their selections using their fingernail, much like a stylus, were thwarted. At least initially, many users attempted to carry over text entry techniques from physical keypads. For example, numeric keypad users often attempted a one-thumb approach to typing, while some hard-QWERTY users tried a two-thumb technique. However, after frequently mis-keying due to miscalculating which bit of their digit(s) hit the keyboard first and which character was being selected as a result, it was often reasoned that, until familiarity had grown, single index finger interaction was probably the most efficient method, given the width and spacing of the keys. Due to the above concerns, the ability to change the orientation of the device from portrait to landscape, where available, was very welcome. Some anticipated being able to access the landscape keyboard universally across the interface and expressed surprise when they realised that horizontal text entry was only available in the Safari browser. With the perception that an increase in horizontal width would improve both single-digit and thumb performance, this facility was often requested across applications. Interestingly, even where respondents were iPhone users themselves, none made use of, or reported awareness of, the fact that letters are only registered once the selected ‘key’ has been released. Additionally, when this strategy was prompted, participants commented that it didn’t feel natural to remove one’s finger from the screen in order to make a selection, and it was doubted whether adopting this strategy would improve performance. The technique was still seen to rely strongly on feedback via the ‘magnification bubble’, and it didn’t remove the need to divide one’s attention between the keying area and the characters accruing in the text field. When considering the iPhone’s text correction facilities, despite potential benefits being acknowledged, the auto-correct (and auto-complete) facility was initially considered to be part of the problem. Optimal use of it can take a little getting used to, as it requires users to divide their attention between what they are doing with their fingers (i.e. typing), what is being registered on the screen, and what is being suggested in the ‘pop-up fields’. Indeed, reports of sending text messages with ‘odd sentences’ in them, because a suggested word had surreptitiously entered the message, pepper our research.
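The release-to-commit behaviour described above can be made concrete with a small event model. The following sketch is our illustration of the technique, not Apple's implementation; the class and callback names are hypothetical. It shows why sliding off a key before lifting cancels a mis-aimed press:

```python
class VirtualKey:
    """One on-screen key that commits its character on touch-up, not touch-down."""

    def __init__(self, char, show_bubble, commit):
        self.char = char
        self.show_bubble = show_bubble   # e.g. draws the 'magnification bubble'
        self.commit = commit             # appends the character to the text field
        self.pressed = False

    def on_touch_down(self):
        # Finger lands on the key: only preview feedback is given, nothing is typed.
        self.pressed = True
        self.show_bubble(self.char)

    def on_touch_moved_away(self):
        # Sliding off the key cancels it -- the character was never committed.
        self.pressed = False

    def on_touch_up(self):
        # The character registers only when the finger is lifted while still
        # on the key, which lets users correct a mis-aimed press before it lands.
        if self.pressed:
            self.commit(self.char)
        self.pressed = False


# Minimal usage: type 'a', then start 'b' but slide away before lifting.
typed = []
key_a = VirtualKey('a', show_bubble=print, commit=typed.append)
key_a.on_touch_down(); key_a.on_touch_up()
key_b = VirtualKey('b', show_bubble=print, commit=typed.append)
key_b.on_touch_down(); key_b.on_touch_moved_away(); key_b.on_touch_up()
assert typed == ['a']   # the mis-aimed 'b' press was cancelled
```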
The iPhone’s ‘magnifying glass’ editing feature, which supports corrections earlier in a word or line, was not considered intuitive, even amongst some current iPhone users. Accordingly, when faced with the task of correcting input errors, unless users were already aware of this feature, many expressed frustration that, without an
obvious alternative, they needed to delete strings of correct characters in order to reach the place where editing was needed. Once this feature was acknowledged, however, it tended to be widely praised for both its actual function and the aesthetic appeal of the magnifier.

3.8 Missing the ‘Tactility’ of the Keyboard

Novice users and iPhone users alike lamented the loss of the natural haptic response of a physical keypad. In particular, comments highlighted concerns that locating specific ‘buttons’ on the hard, uniform touchscreen required users to stare at the on-screen keys while they typed, which took attention away from the text in the message field. This loss of tactility was cited as a factor contributing to deficits in both speed and accuracy, especially amongst novice users. Comments revealed that the undulating feel of a physical keypad, and the ‘click’ offered upon selection, supported selection without the need to focus undue amounts of attention on the keypad. Especially for numeric keypad users who regularly use predictive text, the need to make selections from a virtual QWERTY keyboard, without any haptic support to signal the relative positioning of the keys, was considered daunting by some. Whilst this discussion has focused on the keyboard, problems associated with a lack of tactile feedback extend to other direct selections. In the absence of a tactile response, the careful design and placement of visual feedback become more important. The problem with visual feedback on small screens is that fingers occlude parts of the screen (including the elements under selection). With this in mind, users welcomed that, in Safari, feedback appears at the top of the screen when a page is loading. Similarly, the magnification bubble that provides feedback as users explore the keyboard was often thought to be both ‘funky’ and informative. With touchscreen interfaces, there is obviously scope for attempting to replicate the haptic feedback offered by a physical keypad by introducing tactile sensations (potentially synchronized with sound). However, despite the argument that providing vibro-tactile effects corresponding to the user's exploration and selection will provide a more satisfying user experience, at this stage much seems to depend on the sophistication of the haptic technology. Indeed, our research suggests that offering a range of discrete sensations, such as the realistic feel of buttons depressing and releasing, in addition to the feel of a screen populated with icons and the sensation of an undulating keypad, may hold more appeal than a coarse ‘buzz sensation’ whenever a selection is made. Furthermore, regardless of the treatment offered, it is important that users are given the freedom to turn this facility on or off, as preferred.
4 Best Practice Guidelines

Based on our research in this arena, this paper concludes with a range of best practice guidelines for finger-operated touchscreen interfaces. Being focused rather than exhaustive in scope, the guidelines indicate factors that are important to consider when evaluating or designing touchscreen solutions for small-screen devices.
Table 1. Best practice guidelines for finger-touch interfaces

Screen size matters
• When it comes to touchscreens, screen clarity and size matter: large, good-quality screens are essential to provide space for key elements.
• As larger screens can foster concerns over vulnerability, the hardware design needs to support notions of robustness and quash any concerns over screen fragility.
Touchscreen responsiveness
• Aim towards high system responsiveness, as delays will frustrate and confuse users. Minimising response lag will dissuade users from pounding the keys to repeatedly select target elements, and/or using their fingernail or a pen like a stylus.
• To minimise keying errors, ensure that sensitivity and screen alignment (calibration) are optimised. Maximise sensitivity levels, uniformly, across all areas of the screen. Particularly where a scroll bar draws the users’ focus, sensitivity at the perimeter needs to be optimised.
Towards a tactile experience
• As the tactile experience offered by conventional keypads may have a positive effect on efficiency, error rates, and user satisfaction, consider options to support a more tactile user experience – e.g. tactile output for the identification of controls and/or vibro-tactile sensations in response to selections.
• Aim towards an array of discrete sensations, rather than just a coarse ‘buzz sensation’ upon selection.
• If provided, tactile feedback should be an optional rather than a default feature, with a means to easily switch between the two modes that is understood and clearly visible.
Navigation & efficiency of use
• If users have problems with the most basic functionality, then they will feel negative about the product. Support key functions such as answering or ending a call, instant messaging, listening to music, viewing messages, accessing the internet, etc.
• Minimise the steps needed to access or perform core functions by keeping access points at a high level.
• Allow clear and direct navigation to return Home and to the Main Menu. This is especially important where the device doesn’t offer a physical button dedicated to this.
• To reassure users and allow ease of navigation, ensure consistency throughout the interface.
• As users’ fingers may occlude parts of the screen (including selected items), carefully consider the design and placement of visual feedback. Ideally, feedback should appear above the item selected.
• Consider ways to ensure navigation and selection are easily discernible, so that users don’t accidentally make selections when they scroll.
• Allow actions to be readily reversible, so that if an error is made, it can be easily rectified.
• As appropriate, consider providing on-screen buttons that can be readily selected and hidden when not required, and ensure that the existence of (and access to) these buttons is understood.
• Although a help facility mustn’t be seen as the solution to a poor user interface, if feasible, consider options to provide an on-device Help system that is both easy to find and easy to use.
The virtual keypad
• Aim towards mirroring the levels of speed and accuracy offered by traditional handsets as far as possible.
• Without a permanently presented keypad, clear access to the virtual keyboard is vitally important. Consider using a consistent convention across the interface (e.g. tapping the field).
• Consider presenting a virtual QWERTY keyboard instead of a multi-tap configuration (where characters are shared on individual keys). Without the tactile cues familiar on a conventional keypad, a QWERTY layout may be easier to use than a multi-tap design – with the latter, lag and precision issues may come to the fore.
• Consider the option to offer the keyboard in a horizontal view, and allow this to be consistently available across the interface.
• Ensure users can change between different text input modes with ease. Options to enable and disable predictive text and switch between letter, number or symbol inputs must be clearly presented and quick to use, with any shortcuts being clearly understood.
• As users need to feel confident that selections will have the desired effect, ensure the selectable area (icon/button) is larger than the target or of an acceptable size. Remember people will want to reach for a stylus if things go wrong or if they don’t feel confident that their selection will be accurate.
• As well as being sized to accommodate finger input, keys and other screen elements need to be perceived by users as adequately sized for accurate selection. Explore ways to minimise concerns about finger size relative to key size. Maximise the perceived size of elements through visual design, ensuring that a good delineation of keypad elements is presented.
• Also, to minimise mis-selection, ensure that there is sufficient space between entries in a vertical list.
Icons & labeling
• Carefully consider iconography. Make use of familiar icons (and colour conventions) so users can associate with them.
• Consider colour icons that have detail to them, to make the most of graphical capabilities.
• Aim towards high contrast between discrete touch elements, text, and background colours. Also, to enhance visibility, controls and text should not be placed over an image or patterned background.
• Where icons are relatively abstract, users will become frustrated if they continually struggle to locate and use target features (e.g. without a physical key, ensure that the means to end a call is highly visible). While preserving a non-cluttered display, consider supplementing graphical symbols (such as icons) with labeling or other textual cues.
• To aid legibility on small screens, especially across lighting conditions, consider adopting a sans serif font for all text and labels.
• Labels and instructions should be short and simple, with abbreviations avoided if possible.
• Allow icons to be suitably sized and spaced, so they can be readily selected without worrying about accidental selection of nearby icons/screen elements.
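One way to operationalize the sizing guidelines above is to translate a physical target size into pixels for a given display density. The sketch below is illustrative only; the 9 mm target is an assumed value in the commonly cited 7-10 mm fingertip range, not a figure taken from this paper:

```python
import math

MM_PER_INCH = 25.4

def min_target_px(target_mm: float, screen_ppi: float) -> int:
    """Pixels needed so an on-screen element spans target_mm on this display."""
    return math.ceil(target_mm / MM_PER_INCH * screen_ppi)

# Example (assumed values): a 9 mm square target on a 163 ppi display
# needs roughly 58 px per side.
print(min_target_px(9.0, 163))   # -> 58
```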
Acceptance of Future Technologies Using Personal Data: A Focus Group with Young Internet Users
Fabian Hermann, Doris Janssen, Daniel Schipke, and Andreas Schuller
Fraunhofer Institute of Industrial Engineering, Nobelstr. 12, D-70569 Stuttgart, Germany
{fabian.hermann,doris.janssen,daniel.schipke,andreas.schuller}@iao.fraunhofer.de
Abstract. Future technologies in smart and social environments are expected to use personal data extensively. As young users of today’s social web platforms already take risks of privacy loss, the question of the acceptance of technology using personal data, and of its influencing factors, appears to be of strong relevance. We present results from a focus group with ten young internet users which indicate different attitudes towards privacy and different aspects of social influence on use decisions. Implications for technology acceptance theories are discussed. Keywords: Technology acceptance, smart environments, social web, privacy.
1 Introduction

Ubiquitous computing systems are described as complex systems that use situational and personal data, derive conclusions from them, and adapt the system UI and behavior partly autonomously (see e.g. IST Advisory Group, 2001, 2003). These functionalities rely on highly integrated data on the physical environment and situation, but also on the user’s location, preferences, interaction behavior, etc. In this respect, future ambient and mobile social systems bear similar or even higher risks than the currently discussed social web platforms: users risk a loss of privacy because of permanent storage of personal data, profiling, address trading by hosts, etc. (Hildebrandt, 2008). Nevertheless, social web media are broadly accepted in the markets (Universal McCann, 2008), and a hitherto unexpected frankness is spreading, in particular among younger users (The National Campaign, 2008). Against this background, factors influencing technology acceptance appear to be of high relevance.
2 Models of Technology Acceptance

A classical model of technology acceptance, the Technology Acceptance Model TAM (Davis, 1989), traces the intention to use a system back to two central variables:
• usefulness: the perceived or expected practical advantages of the system
• effort: the expected effort to use the system
Other variables were not included in this original model, as it was assumed that the influence of other important factors like individual abilities, tasks, system type, situational constraints etc. was mediated by the perceived usefulness and effort. In order to model these external variables explicitly, a new version of TAM, the Unified Theory of Acceptance and Use of Technology (UTAUT), was proposed (Venkatesh, Morris, Davis, & Davis, 2003). It assumes that system use is influenced by “facilitating conditions” like system accessibility, training support etc. According to UTAUT, the intention to use a system depends not only on the expected system “performance” and effort expectancy but also on social influence. Social influence measures the perception of social pressure to use a system. These models were mainly intended to predict system acceptance in organizational contexts and professional use. They were also applied and partly adapted to describe consumer decisions for private technology purchase and use (e.g. Carlsson et al., 2006; Kwon, 2000; van Biljon et al., 2007). While these studies worked on more classical technologies, the acceptance of emerging technologies using personal data was investigated by acceptance models of ubiquitous computing services. Beier, Rothensee, and Spiekermann (2006) used the following predictors for the acceptance of such technologies (together with the already introduced “usefulness”):
• Risks: e.g. loss of time or financial risks that a user perceives may result from system use
• Control: perceived controllability of system behavior by the user
Both variables were expected to influence the usage intention via the emotional attitude towards a system as a mediating variable. Spiekermann (2008) added another variable:
• Privacy: the necessity to provide private data and the user’s concerns about them being given away.
This variable was expected to have a negative impact on usage intention via the mediating variable “affective attitude”, i.e. the general emotional attitude towards the system. Taken together, acceptance research has found stable effects for usefulness as well as for practical issues like effort or expected risks. Newer results on future systems stress the influence of perceived control on usage intention. Interestingly, concerns about private data were hypothesized to have an impact, but this could not be shown to be significant (Spiekermann, 2008).
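The predictors reviewed above are often operationalized as a weighted linear model of usage intention. The sketch below only illustrates that structure; the weights and the inclusion of privacy as a direct (negative) predictor are our own assumptions, not estimates from any of the cited studies:

```python
def usage_intention(performance, effort, social_influence, privacy_concern,
                    weights=(0.4, -0.2, 0.3, -0.1)):
    """Toy linear operationalization of a UTAUT-style acceptance model.
    All predictors are assumed to be scaled to [0, 1]; the weights are
    illustrative, not empirical estimates."""
    w_p, w_e, w_s, w_pr = weights
    return (w_p * performance + w_e * effort
            + w_s * social_influence + w_pr * privacy_concern)

# A useful, low-effort system under strong peer pressure, despite privacy worries:
print(round(usage_intention(0.9, 0.2, 0.8, 0.7), 2))   # -> 0.49
```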
3 A Focus Group with Adolescents

To get a picture of young internet users’ privacy-related behavior, their use of internet platforms and applications, and their acceptance of technology, we conducted a focus group with young internet users. Further discussion topics, such as the participants’ ideas and wishes for future technology trends, yielded no relevant results and are therefore ignored here.
3.1 Procedure

Sample. Two sessions were carried out with altogether 10 participants. In one session, 6 participants aged 14 to 17 were invited. In a further session, 4 adolescents aged 17 to 19 participated. Participants were recruited through a chain email advertisement initially sent to employees of a university institute. The sample can be characterized as follows: 6 male, 4 female, aged 14-19, internet use on average since the age of 10, mobile phone use on average since the age of 8, average online time 3.7 hours per day. Participants used instant messengers like ICQ, MSN or Skype very frequently as their main online communication medium. They stated that they send 25 mobile short messages (SMS) and 16 emails per week on average. Stationary computers are mainly used for instant messaging, gaming, and music. School tasks also play an important role on the PC. Mobile phone games do not play an important role for any participant. This pattern of frequent use of online communication was quite homogeneous amongst participants. No participant used online media rarely.

Open Discussion. After an initial questionnaire on internet behavior and general communication patterns (like mobile phone use and instant messengers), a creativity method (the 6-3-5 method) was used to initiate the discussion. The following discussion of the generated ideas was moderated by one session leader with the goal of fostering a vivid, open exchange of thoughts. The topics included current use of technology, acceptance of new technologies and privacy behavior, and ideas and expectations about future life and its support through computers, the internet, artificial intelligence etc. In some cases, the moderator directly posed open questions on the topics of interest to direct the discussion and to encourage statements on issues like privacy or social pressure. Many issues were addressed repeatedly, while others were discussed only once, depending on the argument line of the open discussion. The analysis of the discussion was done by transcribing parts of a session video. A rater clustered discussion statements related to the issues of acceptance, privacy, and social influence. Statements are qualitatively interpreted in the following section.

Questionnaire with Open Items. The last step of the session was a questionnaire with several questions on particular issues of communication technology use. The following open items directly addressed the participants’ attitude towards data privacy and related behavior:
• How do you safeguard your personal data using the internet (in general, when using blogs, communities, chats)?
• Are you using pseudonyms?
• Are you feeling watched when surfing the net?
• Do you think it’s good if companies use your data (e.g. which sites you’re visiting) to give you personalized offers and advertisements?
• How important is the fact to you that no one knows which sites you visited? Why?
• If you are chatting or surfing the internet, is it important to you that no one can watch your monitor and see what you’re doing?
On the basis of the individual answers to these questions, a rater classified the participants into types of attitudes towards privacy. It was possible to synthesize categories that describe the main direction of the answers of each participant. For most participants, the answers to the different questions appeared to be quite
homogeneous. In many cases, participants referred to their own answers to previous questions. However, some answers of two users were inconsistent. The rater then decided to assign these users to a category based on their most prevalent answers.

3.2 Results: Different Attitudes towards Protection of Private Data

The following categories were derived to characterize the users’ privacy-related behavior and attitudes:
• Naive users aren’t aware of any problems regarding data protection. These users don’t think that anybody would be interested in their particular actions or personal data, so nobody would try to find out about them. A characteristic statement of one participant (14, male) was: “I don’t feel observed and nobody can see what I have made, because nobody knows my password”.
• Frank users don’t worry about privacy and are willing to let anyone know things about themselves. Typical statements here expressed that one has nothing to hide. For example, one participant (male, 18 years old) answered: “Many people can see what I’m doing [in the internet]. But I don’t care about it. I don’t have anything to hide.“
• Sensitized users are aware of the risks and potential problems arising from publishing private data, and are willing to live with them as best they can. They adopt strategies to protect private data or identity, for example by using different personas, trying to act anonymously, or avoiding tools they don’t trust. A typical statement here was given by a participant (15, male): “I usually use different nick names and change my identity.”
Figure 1 shows the distribution of the different user characteristics amongst the ten participants of our focus group.

3.3 Results: Social Influence on Technology Acceptance

During the open discussions, participants stressed the fact that not participating in communication technologies, in particular social internet platforms, would result in alienation from the peer group. One of the participants said that one would feel like a loner if one were the only one not using a certain technology. Another participant said that if you don’t share information in a social network, then no one “would like to chat with you via IM”. Another set of arguments addressed social facilitation: participants said that if everyone got used to a technology or interaction mode, public behavior would become acceptable even if it had appeared awkward before (like using speech commands or gesture interaction on a mobile phone in public). Participants also mentioned they would use communication media like instant messengers or social community platforms when interacting with younger people, whereas e-mail was used to send job applications or to communicate with older people. A variant of this argument appeared when the issue of embarrassing content, like party photos posted in social communities, was raised. One of the older participants said that a prospective employer searching for an applicant’s web information would wonder if there were only well-behaved pictures in the profile. Participants
discussed that people might reason about why someone has no party pictures: that they are being held back on purpose, or that he has no social contacts whatsoever.
Fig. 1. Frequency of users’ attitudes (absolute numbers)
4 Conclusions

The results of the focus group show two aspects that may have an impact on acceptance research:

4.1 User Attitudes on Privacy

We found indications that users have different attitudes towards privacy. According to the statements users made during the session, they have adopted different behavior styles, from naive and unaware use of profiles to the strategies sensitized users follow to protect private and identity information. The age distribution in our small sample supports the assumption that privacy issues become more important the older users get and the more knowledge and media competence they acquire. This suggests that privacy and other factors influencing acceptance may not only depend on features of the system and the user’s evaluation of them, but also on person characteristics such as knowledge about information use and risks, experiences with concrete consequences of publishing private data, etc. However, it does not seem plausible that interindividual differences in privacy concerns are stable. In particular for the participants in our focus group, we assume some of the different attitudes to represent different stages of knowledge about risks and possible consequences of risky behavior. More stable differences might be found when looking at a broader age range. The general idea of different attitudes towards technology and resulting user types of acceptance can also be found in the field of innovation adoption (Rogers, 2003), where different typical styles of adopting new technologies are characterized. This may have implications for the modeling of privacy perception in acceptance models: even if no significant influence of privacy concerns on acceptance as a subjective measure could be found (Spiekermann, 2008), there might be
an influence on use behavior and strategies. Also, subgroups of differently sensitized users who do care about technology risks might be found.

4.2 Peers Influencing Usage Decisions

Several statements of our focus group highlight the social influence on decisions to use or avoid technologies. Statements imply direct peer pressure from the adolescents’ friends and peers, as well as informal comparisons with the cohort of comparable age and social group, that seem to have an impact on personal decisions to use a technology. Communication media used in the age group of adolescents seem to be much more attractive than “old-fashioned” means of communication like e-mail. Generalized expectations by others and norms of technology use are taken into consideration when deciding on usage. They also seem to influence the acceptance of possible (known or unknown) risks, like giving up parts of one’s privacy. Although it did not occur in our focus group, we expect a further interesting aspect of social influence to have an impact on voluntary usage decisions: peers may serve as trusted behavioral models that facilitate purchase decisions by reducing the complexity of research on advantages and risks. An integrated model of user acceptance should cover the known influences like usefulness, risks and privacy, but also different types of social influence. The construct of social influence already investigated in UTAUT (Venkatesh et al., 2003) was found to be moderated by other variables, in particular usage experience (Li, Kishore, 2006). The concept of self-identity and related internalized norms (a well-established variable from the theory of reasoned action; Fishbein & Ajzen, 1975) was shown to have an impact on technology acceptance also for voluntary decisions, but its relation to other constructs remains open (Lee, Lee, & Lee, 2006). Considering the privacy risks of current social media and future technologies, the clarification of social influence on technology use and risky behavior is still of high importance, as is the investigation of related beliefs and attitudes among users and of the underlying social mechanisms.
References
1. Beier, G., Rothensee, M., Spiekermann, S.: Die Akzeptanz zukünftiger Ubiquitous Computing Anwendungen. In: Heinecke, A.M., Paul, H. (eds.) Mensch & Computer 2006, pp. 145–154. Oldenbourg Verlag, München (2006)
2. Carlsson, C., Carlsson, J., Hyvonen, K., Puhakainen, J., Walden, P.: Adoption of Mobile Devices/Services – Searching for Answers with the UTAUT. In: Proceedings of the 39th Annual Hawaii International Conference on System Sciences, vol. 6 (2006)
3. Davis, F.: Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly 13(3), 319–334 (1989)
4. Davis, F., Bagozzi, R., Warshaw, P.: User Acceptance of Computer Technology: A Comparison of Two Theoretical Models. Management Science 35(8), 982–1003 (1989)
5. IST Advisory Group: Scenarios for ambient intelligence in 2010. Final Report. European Commission (2001), ftp://ftp.cordis.lu/pub/ist/docs/istagscenarios2010.pdf (21.2.2009)
6. IST Advisory Group: Ambient Intelligence: from vision to reality. Draft Report (2003), ftp://ftp.cordis.lu/pub/ist/docs/istag-ist2003_draft_consolidated_report.pdf (21.2.2009)
7. Fishbein, M., Ajzen, I.: Belief, attitude, intention, and behavior: An introduction to theory and research. Addison-Wesley, Reading, MA (1975)
8. Hildebrandt, M.: Profiling and the Rule of Law. Identity in the Information Society Journal (2008)
9. Kwon, H.S.: A Test of the Technology Acceptance Model: The Case of Cellular Telephone Adoption. In: Proceedings of the 33rd Hawaii International Conference on System Sciences, vol. 1 (2000)
10. Lee, Y., Lee, J., Lee, Z.: Social influence on technology acceptance behavior: Self-identity theory perspective. SIGMIS Database 37(2-3), 60–75 (2006)
11. Rogers, E.M.: Diffusion of Innovations, 5th edn. Free Press, New York (2003)
12. Spiekermann, S.: User Control in Ubiquitous Computing: Design Alternatives and User Acceptance. Shaker Verlag, Aachen (2008)
13. The National Campaign: Sex and Tech: Results from a Survey of Teens and Young Adults (2008), http://www.thenationalcampaign.org/ (21.2.2009)
14. Universal McCann: Power to the People. Social Media Tracker Wave 3 (2008), http://www.universalmccann.com/Assets/wave_3_20080403093750.pdf (21.2.2009)
15. van Biljon, J., Kotzé, P.: Modelling the factors that influence mobile phone adoption. In: Proceedings of the 2007 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries (2007)
16. Venkatesh, V., Davis, F.: A Theoretical Extension of the Technology Acceptance Model: Four Longitudinal Field Studies. Management Science 46(2), 186–204 (2000)
17. Venkatesh, V., Morris, M.G., Davis, G., Davis, F.: User Acceptance of Information Technology: Toward a Unified View. MIS Quarterly 27(3), 425–478 (2003)
Analysis of Breakdowns in Menu-Based Interaction Based on Information Scent Model Yukio Horiguchi, Hiroaki Nakanishi, Tetsuo Sawaragi, and Yuji Kuroda Graduate School of Engineering, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto 606-8501, Japan {horiguchi,nakanishi,sawaragi}@me.kyoto-u.ac.jp
Abstract. High communicability of a menu-based system rests on a consistent vision and clear policy in designing the system of menus, which should then be perceivable to the users. In this light, failures in menu-based interactions can be explained as emerging from a lack of information in the users’ available cues for identifying the design vision. This study focuses on communicative breakdowns in menu-based human-computer interactions from this perspective, and investigates their causes in ill-organized structures of the menu hierarchy in terms of the user’s interpretation of the menu items. Pirolli’s information scent model is extended and utilized as an analytical tool for describing the meaning system of menus from the users’ point of view, and their decision making in search of particular menu items is analyzed by use of information scent. Keywords: Menu-based interaction, information scent model, communicative breakdowns, human-computer interaction.
1 Introduction

The usability of a hierarchical menu system is characterized by the structure in which the menu items are organized as well as by the terminology with which they are written, and both of these characteristics should be designed in sensible, comprehensible and convenient forms relevant to the user’s task [3]. On the other hand, high communicability of a menu-based system rests, by definition, on a consistent vision and clear policy in designing such systems of menus, which should then be perceivable to the users. In this light, failures in menu-based interactions can be explained as emerging from a lack (or inconsistency) of information in the users’ available cues for identifying the designer’s vision. In this study, we focus on breakdowns in menu-based interactions from this perspective, and investigate their causes in ill-organized structures of the menu hierarchy in terms of the user’s interpretation of the menu items. Pirolli’s information scent model [4-6] is extended and utilized as an analytical tool for describing the meaning system of menus from the users’ point of view, and their decision making in search of particular menu items is analyzed by use of information scent. The scent measure, which can estimate the strength of each option to attract the user’s attention relevant to a particular goal, is applied to specify possible discrepancies between the designer’s intended usage and the user’s actual decisions.
2 Information Scent of Menu Relevant to User’s Goal

Two different activation patterns derived from one common spreading activation network are compared to measure the scent value of a menu item. One of the patterns represents the activities of concepts (to be precise, indexing words) induced by the user’s goal, whereas the other simulates the activities induced by the menu texts the user has encountered on the UI. The network of words was built from a text corpus, i.e., a large collection of documents, whose subject is to provide descriptions of the usage of the product’s functions. In this network, every directed arc has a weight derived from the conditional probability at which its source word would appear in a document containing its destination word, and each node has a base-level activation derived from the probability at which the corresponding word would appear in a document. To calculate these probabilities, we utilize the product’s instruction manual as the corpus, decomposed into documents in accordance with its functional units, because it provides sufficient statements about all the functions of the product in terms of both quality and quantity. A detailed description of this calculation is given in [7]. As shown in Fig. 1, the scent value of a menu item is calculated according to the following procedure:
1. Each word’s activation level induced by the user’s goal, i.e., L = (L_1, L_2, ...), is derived after all words’ activities in the user’s task description Q have spread in the network.
2. Each word’s activation level induced by the menu texts, i.e., R = (R_1, R_2, ...), is derived after all words’ activities in the target menu description C have spread in the network.
3. The scent value of the menu item C in relation to the task description Q is given by the inverse Euclidean distance between the two activity patterns, i.e., L and R.
As is clear from its definition, the more similar the two activity patterns in response to the different activation sources, the larger the menu’s information scent.
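The network construction just described (arc weights from conditional word probabilities, base-level activations from document probabilities) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; it assumes each functional unit of the manual has already been reduced to a set of indexing words, and the toy documents are invented:

```python
from collections import defaultdict

def build_network(documents):
    """documents: list of sets of indexing words, one set per functional
    unit of the instruction manual (the corpus used in the paper).
    Returns base-level activations and directed arc weights."""
    n_docs = len(documents)
    df = defaultdict(int)     # document frequency of each word
    co = defaultdict(int)     # co-occurrence counts for (src, dst) pairs
    for doc in documents:
        for w in doc:
            df[w] += 1
        for src in doc:
            for dst in doc:
                if src != dst:
                    co[(src, dst)] += 1
    # Base-level activation: probability that the word appears in a document.
    base = {w: df[w] / n_docs for w in df}
    # Arc weight src -> dst: P(src appears | document contains dst).
    weight = {(s, d): c / df[d] for (s, d), c in co.items()}
    return base, weight

# Invented toy corpus of three functional units:
docs = [{"timer", "recording", "program"},
        {"caption", "display", "digital"},
        {"timer", "program", "extension"}]
base, weight = build_network(docs)
print(base["timer"])                    # 2/3: 'timer' appears in 2 of 3 documents
print(weight[("recording", "timer")])   # 1/2: P(recording | timer)
```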
[Fig. 1 (diagram): the task description Q and the menu description C each trigger spreading activation over the common network of indexing terms trm1 ... trmN, yielding the activation patterns L and R, which are compared to give the information scent:]

g(C, Q) = \ln\left( \frac{1}{\sqrt{\sum_i (L_i - R_i)^2}} \right)

Fig. 1. Diagrammatic illustration of how the information scent of a menu item is calculated. Two different activation patterns derived from one common spreading activation network are compared for measuring the scent value of each menu item.
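Given the two activation patterns L and R, the scent measure itself reduces to a few lines. A minimal sketch of the formula above, with invented activation values; producing L and R by spreading activation over the word network is assumed to have happened already:

```python
import math

def information_scent(L, R):
    """g(C, Q) = ln(1 / Euclidean distance between the two activation
    patterns): the more similar the patterns, the larger the scent.
    (Identical patterns would need a zero-distance guard, omitted here.)"""
    dist = math.sqrt(sum((l - r) ** 2 for l, r in zip(L, R)))
    return math.log(1.0 / dist)

# Activation induced by the task description vs. by two candidate menus.
L = [0.8, 0.1, 0.4]          # goal-induced pattern (illustrative values)
R_close = [0.7, 0.2, 0.4]    # menu text semantically close to the goal
R_far = [0.1, 0.9, 0.0]      # menu text far from the goal
print(information_scent(L, R_close) > information_scent(L, R_far))  # True
```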
3 Breakdowns in Menu-Based Interaction

3.1 Experiment

A DVD recorder, one of the typical multifunctional electric appliances that have hierarchical menus, was employed as the target application system. Twelve female users from three different age groups (30’s, 40’s and 50’s; each age group contained four people) participated in the experiment. None of them had any experience with DVD recorders, whereas all had some with VCRs. Four different tasks, listed below, were prepared for this experiment; they are related to programming or configuring the recorder:
• Task 1: “Program the recorder for timer recording of a television show on which a particular on-screen talent will appear.”
• Task 2: “This recorder has a capability to display closed captions on the television screen for terrestrial and BS digital broadcasts. Configure the recorder to display the captions.”
• Task 3: “Configure the recorder for recording the second audio programs provided by multichannel broadcasting services.”
• Task 4: “This recorder has a capability to adjust timer recordings automatically to any airtime changes of the scheduled programs when some extension or delay of the prior programs has occurred. Configure the recorder to enable this function.”
The participants performed these tasks in order, from Task1 to Task4. The later tasks were expected to be more difficult for the users, because their goals are peripheral functions that are rarely used and are thus located in ‘out-of-the-way’ corners of the menu hierarchy. Each task was specified on a sheet of paper (the descriptions of all the tasks were given in Japanese), which was presented to the users immediately before a measurement session. After the experimenter had confirmed the user’s sufficient understanding of the task without the sheet, he gave her a cue to start operation. During each session, the users were not allowed to refer to the sheets. A base time limit of four minutes was set and used to judge the exit state of the participants’ performances.
[Figure 2: stacked bar chart (0-100%) of session outcomes per task (Task1-Task4), classified as Success, Give-up, Time-out and Mistake.]
Fig. 2. Summary of the participants’ performances. The results of individual sessions are classified into four different classes: “success”, “give-up”, “time-out” and “mistake”.
3.2 Results

Fig. 2 presents the summary of the participants’ performances, where the results of their individual sessions are classified into four different classes: “success” represents the state that the user successfully completed her task in time; “give-up” represents the state that the user gave up her task; “time-out” represents the state that the user was interrupted by the experimenter and made to abandon her operations because there seemed to be no chance for her to complete the task; and “mistake” represents the state that the user could not find the correct menus although she declared she had finished the task for herself. The result indicates that Task4 is the most difficult while Task1 is the easiest of all.

3.3 Analysis of Failures during Menu-Based Interaction

Low communicability of an interactive system can be evaluated by the numerous patterns of slips, mistakes and failures spotted during interaction between the user and the system.
The concept of communicative breakdown is used to capture instances of such problematic interactions [2]. A communicative breakdown appears during interaction between the user and the computerized system when the effects on the state of affairs induced by his/her operations do not coincide with what was meant to be the case. From this perspective, failures during menu-based interactions are analyzed here. After all measurement sessions, the experimenter interviewed every participant about the reasons for her menu selections while watching playback videos together. Using their answers and comments as reference, failures of interactions between the users and the menu system were associated with the categories of communicative breakdowns. In accordance with de Souza’s method [2], problematic portions of user-artifact interaction were tagged with one or more virtual “utterances” of the users corresponding to the categories of communicative breakdowns, such as
• “What's this?” — the user is unable to interpret what a certain interface element means,
• “Where is it?” — the user is not finding where his/her expected element is,
• “I can't do it this way.” — the user is abandoning a path of interaction composed of many steps,
and so on. Fig. 3 illustrates an example of the analyzed discourse between a participant user and the DVD recorder where the user was performing Task1. In this figure, tags of communicative breakdowns are represented in the dialogue balloons.
[Figure 3 reproduces a session transcript with columns for elapsed time, the user’s operation (selection of menu item), the behavior of the menu system (display of menu screen or icon), the communicative breakdown tags (e.g. “Where is it?”, “I can't do it this way.”, “What's this?”, “What now?”), and the participant’s comments, from session start through selections such as [BANGUMIHYO], [SAISEI NAVI], [KINOU-SENTAKU] and [SUBMENU] to reaching the goal via the [JINMEI-KENSAKU] screen.]
Fig. 3. An example of an analyzed discourse where the user was performing Task1 with the use of the DVD recorder. The dialogue balloons represent the tags of communicative breakdowns.
As clarified in Fig. 2, Task1 and Task4 differ considerably in their success rates. Fig. 4 compares these two tasks in terms of the frequencies of the communicative breakdowns. The bar chart indicates that breakdowns tagged with “I can’t do it this way.” occurred in Task4 more than twice as frequently as in Task1. This type of breakdown involves the user becoming aware of a need to reform her search strategy, since a series of her operations seemed incompatible with what the designer intended. Before it comes up in the user-system interaction, repetitions of “Where is
it?” were observed when the user did not find a certain expected element (i.e., “it”) among her selectable options. We can see significantly more utterances of “Where is it?” in Task4 than in Task1, which suggests that the design intent of the menu hierarchy was distant from the users’ assumptions for interpreting the interface signs. This hypothesis is also supported by the high frequency of “What's this?”, because it corresponds to the breakdown where the user is looking for any other cue about what a particular interface sign means. In addition, both of these breakdowns should induce another type of utterance, “What now?”. The latter indicates the situation where the user could not make sense of the interaction the designer intended and thus was temporarily clueless about what to do next.
[Figure 4: bar chart of breakdown frequencies (0-35) for Task1 vs. Task4, over the breakdown categories Ia-IIIb (tags such as “I give up.”, “Looks fine to me.”, “Where is it?”, “What happened?”, “What now?”, “Where am I?”, “Oops!”, “I can't do it this way.”, “What's this?”, “Help!”, “Why doesn't it?”, “I can do otherwise.”, “Thanks, but no, thanks.”), grouped into complete, temporary and partial failures.]
Fig. 4. Frequency distribution of communicative breakdowns. Task1 and Task4 are compared in terms of frequencies of breakdowns because they differ considerably in their success rates.
[Figure 5: histogram of selection frequency (0-35) against the rank order (1-7) of information scent.]
Fig. 5. Frequency distribution of the participant users’ menu selections with respect to the rank order of information scent. The users selected menu items of higher rank order more frequently.
Both the designer and the user have their own distinctive assumptions for generating or interpreting the interface signs. The above result from the communication analysis suggests that there is a large difference between them, especially for peripheral functions
like the goal of Task4. In order to visualize this difference, the information scent analysis is applied to the user’s interaction with the menu system in the next section.
4 Analysis of Breakdowns Based on Information Scent

The decision strategy of the participant users can be explained from the perspective of information scent. Fig. 5 shows the frequency distribution of the menu items that the users actually selected with respect to the rank order of information scent. The histogram illustrates that the higher the rank order of a menu item, the more frequently the users selected it. The scent distribution thus has the power to explain and predict the users’ menu-selection behaviors. On the basis of this finding, the organization of menus was analyzed through the scent distribution. Table 1 shows the scent of each menu in relation to the four different tasks, where an asterisk marks the highest value in each menu list; in the original table, the correct options the users should select to reach the goals were additionally underlined. Table 1(a) presents the scent distribution in the portal menu screen while Table 1(b) presents the distribution in the menu screen after MENU 7 is selected in this portal. These two tables show a significant tendency: the more successful tasks, like Task1, have more manifest scent in their correct paths. Conversely, in the less successful tasks, like Task4, menu options competing with the correct one have stronger scent toward the goals. Such menus are not compatible with the users’ decision strategy explained above: the analysis indicates they can easily misdirect and confuse the users’ search for the goal items by attracting more of their attention. There is a lack of information in the users’ available cues, and the users have difficulty identifying how they may or must interact with the system, i.e., the design vision. This menu hierarchy can be said not to have a well-organized structure, especially for the peripheral functions of the product.

Table 1. Scent distribution among menu items in two different menu screens. Scent values of individual menu items are listed with respect to each task.
(a) Portal menu screen

         MENU 1   MENU 2    MENU 3   MENU 4   MENU 5   MENU 6   MENU 7
Task1    1.168    2.704*    2.135    1.331    1.373    1.139    0.976
Task2    0.898    1.539*    1.530    1.260    1.304    0.839    1.292
Task3    1.100    1.535*    1.511    1.218    1.283    1.032    1.305
Task4    1.052    2.137*    2.106    1.478    1.564    1.062    1.218
(b) ‘MENU 7’ screen

         MENU 7-1   MENU 7-2   MENU 7-3   MENU 7-4   MENU 7-5   MENU 7-6
Task1    1.048      2.513*     0.983      1.179      1.078      1.175
Task2    0.825      1.066      1.099      1.671      2.036*     0.873
Task3    0.958      1.284      1.136      1.860*     1.695      1.069
Task4    0.951      1.528      1.074      1.797*     1.659      1.059
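Under the selection strategy suggested by Fig. 5 (users tend to pick the option with the highest scent), Table 1 predicts the first choice on each screen. The sketch below illustrates that prediction using values transcribed from Table 1(a); passing in a "correct" menu number is hypothetical, since the correct options were marked only typographically in the original table:

```python
PORTAL_SCENTS = {  # Table 1(a): scent of MENU 1..7 per task
    "Task1": [1.168, 2.704, 2.135, 1.331, 1.373, 1.139, 0.976],
    "Task4": [1.052, 2.137, 2.106, 1.478, 1.564, 1.062, 1.218],
}

def predicted_choice(scents):
    """1-based menu number a scent-following user is predicted to select first."""
    return max(range(len(scents)), key=scents.__getitem__) + 1

def misdirects(scents, correct_menu):
    """True if a competing option outranks the correct one, i.e. the menu
    is predicted to misdirect the user's search."""
    return predicted_choice(scents) != correct_menu

for task, scents in PORTAL_SCENTS.items():
    print(task, "-> predicted first choice: MENU", predicted_choice(scents))

# Hypothetical check: if Task4's correct portal option were MENU 7,
# the stronger-scented competitors would misdirect the search.
print(misdirects(PORTAL_SCENTS["Task4"], correct_menu=7))   # -> True
```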
5 Conclusion

This paper discussed breakdowns in menu-based interactions between users and computerized systems from the perspective of perceivable structures of menu systems. The information scent model was utilized to compare the meanings of menus from the users’ point of view and then to analyze the users’ decision making in search of particular menu items. The communicative breakdown analysis confirmed that there is a large difference between the designer and the users in their assumptions for signifying or interpreting the menus (i.e., menu items and their organization for listing), especially in the case of the product’s peripheral functions. On the other hand, the information scent analysis confirmed that the distribution of information scent in a menu list provides a powerful clue for predicting the user’s menu selection. This result supports the finding that the success rate of the users’ search will decrease if menu options competing with the correct one are designed to have stronger scents toward the goal. Menu designs not compatible with the users’ naturalistic decision making can easily misdirect their search. The latter analysis specified the discrepancy between the designer and the users which was suggested by the former analysis.
Acknowledgments This work has been partially supported by the Grant-in-Aid for Creative Scientific Research No.19GS0208 of the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan. We are grateful for their support.
References
1. Norman, D.A.: The Psychology of Everyday Things. Basic Books (1988)
2. de Souza, C.S.: The Semiotic Engineering of Human-Computer Interaction. MIT Press, Cambridge (2005)
3. Shneiderman, B.: Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd edn. Addison-Wesley Longman, Amsterdam (1998)
4. Pirolli, P., Card, S.K.: Information Foraging. Psychological Review 106, 643–675 (1999)
5. Pirolli, P.: The Use of Proximal Information Scent to Forage for Distal Content on the World Wide Web. In: Kirlik, A. (ed.) Adaptive Perspectives on Human-Technology Interaction: Methods and Models for Cognitive Engineering and Human-Computer Interaction, pp. 247–266. Oxford University Press, Oxford (2006)
6. Pirolli, P.: Information Foraging Theory: Adaptive Interaction with Information. Oxford University Press, Oxford (2007)
7. Horiguchi, Y., et al.: Analysis and Proposal of Hierarchical Menu Design from the Perspective of Communicative Breakdown. The Transactions of Human Interface Society 10(3), 21–34 (2008) (in Japanese)
E-Shopping Behavior and User-Web Interaction for Developing a Useful Green Website
Fei-Hui Huang 1, Ying-Lien Lee 2, and Sheue-Ling Hwang 3
1 Oriental Institute of Technology, Department of Marketing and Distribution Management, Pan-Chiao, Taipei County, Taiwan, R.O.C., 22061, Fn009@mail.oit.edu.tw
2 Chaoyang University of Technology, Department of Industrial Engineering and Management, Wufong Township, Taichung County, Taiwan, R.O.C., 41349, yinglienlee@gmail.com
3 National Tsing Hua University, Institute of Industrial Engineering and Engineering Management, Hsinchu, Taiwan, R.O.C., 30013, slhwang@ie.nthu.edu.tw
Abstract. In recent years there has been increasing attention to green issues, which have been addressed in various products and services as well. However, there is still no website to support green customers’ decision process in electronic commerce (EC). The aim of this study is to understand users’ EC needs and expectations in order to elicit the design requirements of a useful interface. A questionnaire and an experiment were conducted to assess users’ green knowledge and to observe users’ external behaviors when interacting with a computer while e-shopping. The study is centered on electronic green products, including computers, communication devices, and consumer electronics. The results are used to produce an online-shopping process flowchart and several suggestions for improving e-shopping. Suggestions concerning information search, information display, and web site features are addressed. Building on this, further research will focus on the design of web sites supplying consumers with green product information. Keywords: User-centered design, User-Web Interaction, Green product, E-commerce.
establishing an international image and reinforcing people's environmental values, e.g. conservation, preservation, a world of beauty. However, there has been a tremendous increase in the number of web sites, and most of them have been designed without respect to the user’s cognitive thinking, leading to a frustrating and disappointing experience when searching for information. In addition, rarely can one find a web site catering to the needs of consumers searching for green products. This study is designed to observe how users search for specified product information and then make a purchase decision online. This project is an initial phase focusing on obtaining consumers’ needs for green information to guide their buying behavior towards being more environmentally friendly. Therefore, consumers’ information seeking processes and needs have to be considered during the interface design. An experiment was developed to understand users’ needs and interaction with Web resources in order to elicit the design requirements of a useful interface for dealing with information overload and usability problems. Here, the study is centered on electronic products, including computers, communication devices and consumer electronics. The aims of this study are: (1) to acquire target consumers’ preferences, purchase intention, and acceptance level of electronic green products (EGP) by using a questionnaire; (2) to develop an experiment for studying user-web interaction and the information searching process based on e-purchase behavior; and (3) to provide several suggestions for developing an EGP Web site with user-centered interfaces in an EC environment, based on the results of the questionnaire and the e-purchase behavioral experiment.
2 Relevant Literature

Green information is very important for people in protecting our environment. The speed at which electronic technology is improving has shortened the life cycle of products, contributing to increased pollution. The internet has already become a powerful tool for people to search for information, evaluate alternatives, and make decisions before making purchases. Developing a web site for green products is not easy, but it is important. Effective website design is necessary for improved customer satisfaction and an enhanced consumer experience. Electronic commerce (EC) may exchange large amounts of product information between users and sites. Given the large amounts of information available at a site, user interaction with web sites requires considerable effort. Improving users’ operations in EC requires understanding their behavior online. The user’s external behavior is important to understand because it may correlate with their cognitive needs. What the user considers success can be seen from their behavior when interacting with the interface design. A user-centered interface design for web-based systems may assist the user in receiving the right information in the right way and in an acceptable time before making any purchases. It may also support user knowledge in a specific domain, minimize the cost of interaction, and reduce information load. An increasing number of consumers make purchases online; however, there is currently no model for EC purchasing decision-making. For traditional offline shopping there exist multiple proposed models; one of the most popular is the EBM model for the purchasing decision-making process. It was abstracted
from the EKB model. The consumer purchasing decision-making process can be divided into five stages: need recognition, information search, alternative evaluation, purchase, and after-purchase evaluation [3]. This study aims to develop a flowchart of the e-shopping process for users in the web environment based on the real shopping process, including the information seeking and decision making observed in e-shopping behavior. The focus is on the information search and alternative evaluation stages of the EBM model. Information is an important tool for growing public support for environmental issues and for developing environmentally responsible behavior in many ways. With the right kind of information it is possible to influence consumers’ value priorities, and to persuade them to change their priorities [4]. Information is accessible in various forms, and nowadays people find it easier to search for information online. The internet is searched both when a consumer’s objective is specific product or service information in anticipation of a purchase and when the objective is to obtain general information about a brand or a product or service category [1]. Two types of internet-based consumer information search behavior have been characterized along six dimensions [5]. Specific information search was characterized as being extrinsically motivated, having an instrumental orientation, reflecting situational involvement, seeking utilitarian benefits, consisting of directed search, and focusing on goal-directed choices. General information search was characterized as being intrinsically motivated, having a ritualized orientation, reflecting enduring involvement, seeking hedonic benefits, consisting of non-directed search, and focusing on navigational choices. The complexity of consumer information search behavior is inherent. In order to develop a user-centered web-based system, anticipated needs, requirements, and expectations from EC need to be identified. To build effective and efficient human-centered electronic information systems, developers need to ground systems in a comprehensive understanding of the information-foraging process in context [6], [7]. Collecting quantitative data on thoughts and feelings from user-web interactions, in addition to physical movements, is important for developing a user-centered web-based system. User-web interaction can be seen as (1) communication consisting of a series of transactions between the user and the web, and (2) information processing and problem-solving in which the user makes decisions based on the interpretation of information presented to him/her via an interface [8]. The store front for an EC transaction is the web site, and online retailers invest in its design improvements. According to the human-computer interaction (HCI) literature, the usability of a website has been a focus in determining its success or failure. In the standard ISO 9241 Part 11, usability is defined as ‘the extent to which a system can be used by specified users to achieve a specified goal with effectiveness, efficiency and satisfaction in a specified context of use’. Usability and HCI criteria are important in making the customer’s interaction with the website a satisfying one through the web site interface. An interface is the layer between the user and the system that facilitates human-computer communication [8]. This study researches e-shopping behaviors for improving user-web interaction via web site interface design.
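The five EBM stages can be written down as an explicit sequence, which also makes this study's scope easy to state: only the two middle stages are instrumented here. A minimal sketch of that framing (stage names from the text; the 'observed' set encodes this study's focus, not the EBM model itself):

```python
from enum import Enum

class EBMStage(Enum):
    NEED_RECOGNITION = 1
    INFORMATION_SEARCH = 2
    ALTERNATIVE_EVALUATION = 3
    PURCHASE = 4
    AFTER_PURCHASE_EVALUATION = 5

# Stages this study observes directly during the e-shopping sessions.
OBSERVED = {EBMStage.INFORMATION_SEARCH, EBMStage.ALTERNATIVE_EVALUATION}

for stage in EBMStage:
    marker = "observed" if stage in OBSERVED else "out of scope"
    print(f"{stage.value}. {stage.name.title().replace('_', ' ')} ({marker})")
```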
3 Method

According to the latest report from the Taiwan Network Information Center (TWNIC, http://www.twnic.net.tw/), as of January 31, 2008, people aged 12 to 35 accounted for about 90% of internet usage; in particular, the percentage of internet users in the 16-20 age group rises to 96.95%. The major internet user group in Taiwan is thus young adults, especially those aged 16 to 20. This age group has the highest online presence and will be the next generation of online consumers. Here, an initial questionnaire and an experiment were conducted to investigate user needs, captured through consumers' external and mental patterns, for application to interface design.

3.1 Collection and Analysis of Questionnaire Data

To elicit responses from consumers in Taiwan, a questionnaire was designed with survey questions concerning the preference for, and purchase intention toward, green products. Most questions provided multiple-choice items and allowed multiple answers. A total of 291 questionnaires (a 97% response rate) were retrieved from active web-using students, male (n = 197) and female (n = 94), aged 18 to 21 at the Oriental Institute of Technology (OIT) in Taiwan, and analyzed using descriptive statistics. The results indicated that 73.5% of students had some form of knowledge about electric green products, but only 1.3% had bought related goods. Among students willing to buy green products, 60.9% would choose computers, 45.7% electric appliances, and 43.6% communication devices. The reasons for buying green products were protecting the environment (78%), marketing (40.1%), and friends' recommendations (31.4%). The reasons for not purchasing green products were having no idea about the products (49.3%), price (49%), and the limited selection of products (38.6%).

3.2 Experiment

The subsequent experiment was designed based on the results of the initial questionnaire and was conducted to detect users' external behaviors when interacting with web sites during e-shopping and to collect users' mental patterns through experimental questionnaires.

Participants. Forty undergraduate students at OIT in Taiwan, 20 males and 20 females aged 18 to 21 years, were paid to participate in the experiment. All had online shopping experience, averaging 30.83 (SD 20.4) months.

Apparatus. A computer with internet access was provided for online information search and shopping. Input devices were a keyboard and a mouse; the output device was a 15-inch liquid crystal display (LCD) screen.

Procedure. The experimenter introduced the task: searching for laptop computer product information on the Web and deciding which one to buy. All participants used the same computer and were given the same task. After filling out a pre-experiment questionnaire, each participant was instructed to find the item that they would most
likely purchase online, in any way that they preferred. During the experiment, the participant was videotaped so that the user-web interaction, the e-shopping process, and the time spent shopping could be analyzed from the online behaviors. After completing the task of making the purchase decision, each participant filled out a post-experiment questionnaire.

Measurements. The measurements comprise the pre-experiment questionnaire, user-web interaction, online shopping process, and post-experiment questionnaire.

Pre-experiment questionnaire. Nine questions, completed before the start of the experiment, captured participants' background, online experience, and the features they anticipated when buying a laptop.

User-web interaction. This was analyzed by objective measures. A quantitative measure based on simple frequency counts was developed to describe the nature of the interaction between the user and the World Wide Web. The interactions were captured by observers at an intermediate level of detail that incorporates both behavioral and quantitative aspects of the interaction. During each experimental run, the observer watched the interactions in real time and used a specially designed form to record the source, the recipient, and the type of each interaction between the user and the web sites. After the experiment, the recorded interactions were double-checked against the video tapes.

Online shopping process. This is presented as a simple flowchart constructed from the participants' e-shopping behavior. The flowchart is intended to help researchers understand users' cognitive styles and habitual behavior in dealing with the current Web environment.

Post-experiment questionnaire. Fifteen open-ended questions gathered information such as what kind of laptop the participant wanted, where they would buy it, and why, and elicited their intentions, experiences, decision making, information load, and suggestions for the online shopping process.
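For illustration, the (source, recipient, type) coding scheme described above reduces to simple frequency counts. A minimal Python sketch follows; the event labels and log entries are hypothetical, not the study's actual codes:

from collections import Counter

# Hypothetical event log in the (source, recipient, type) scheme described
# above; the labels and entries are illustrative, not the study's actual codes.
events = [
    ("user", "web", "keyword_search"),
    ("web", "user", "results_page"),
    ("user", "web", "link_click"),
    ("user", "web", "keyword_search"),
]

# Simple frequency counts per interaction type, as tallied on the observer form.
counts = Counter(etype for _source, _recipient, etype in events)
for etype, n in counts.most_common():
    print(f"{etype}: {n}")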
4 Results

4.1 Pre-experiment Questionnaire

In this experiment, the participants' average experience with World Wide Web resources was 102.45 (SD 19.85) months, and their experience with online purchasing was 30.83 (SD 20.35) months. About 55% of the participants had previously bought clothes online, 43% accessories, and 40% books. Before searching for information online, the anticipated laptop features were as follows: 63% of the participants valued tech specs, 58% design, 55% usefulness, and 53% size and weight.

4.2 User-Web Interaction

The results are shown in Table 1. The participants visited a mean of 34.63 web pages and 9.13 (SD 8.05) web sites, using keyword, hierarchical, or other search functions a mean of 20.85 (SD 13.59) times
in 71.81 (SD 22.32) minutes. This demonstrates that the users visited many web sites and web pages before having sufficient information to make their decisions. The user-web interaction data were analyzed using interaction ratios. The search anticipation ratios for females (1.38), for males (1.67), and for all participants (1.5) are larger than 1.0, indicating that users employing the search functions were able to locate more relevant than non-relevant information; a ratio below 1 would mean that users employing the search functions were less likely to find relevant information. The information ratio is 1:0.75, meaning that text and image information are both important to users.

Table 1. Summary of results for user-web interaction
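The interaction ratios above are simple quotients of the frequency counts. A minimal Python sketch, assuming the search anticipation ratio is operationalized as relevant versus non-relevant information located through the search functions (the raw counts below are illustrative, not the study's data):

# Hypothetical tallies from the observation forms; the paper reports only the
# resulting ratios, so the raw counts below are illustrative.
search_outcomes = {
    "female": {"relevant": 11, "irrelevant": 8},   # ratio ~ 1.38
    "male":   {"relevant": 15, "irrelevant": 9},   # ratio ~ 1.67
}

def anticipation_ratio(counts):
    # Ratio > 1: the search functions located more relevant than
    # non-relevant information; ratio < 1 means the opposite.
    return counts["relevant"] / counts["irrelevant"]

for group, counts in search_outcomes.items():
    print(f"{group}: search anticipation ratio = {anticipation_ratio(counts):.2f}")

# Information ratio of text to image information encountered (paper: 1 : 0.75).
text_items, image_items = 100, 75
print(f"information ratio = 1 : {image_items / text_items:.2f}")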
4.3 Online Shopping Process

The online shopping process observed in the experiment is summarized as a flowchart (Figure 1). About 85% of participants used Yahoo.com as their web portal and used the keyword, hierarchical, or other search functions to obtain the desired information based on their preferences for a laptop computer, including tech specs, design, usefulness, size and weight, and/or price. They then narrowed down the choices of which to buy, at which point they needed comparative information on tech specs, price, consumers' ratings, and/or user reviews to aid their decision making. Here one can see that the consumer has to decide whether or not to make a purchase. If the consumer does not decide to purchase, they repeat the search process, across the mean of 9.13 (SD 8.05) web sites, until a suitable product is chosen for purchase. A few participants indicated that they would want to see the real product in an offline store before deciding where to make their purchase.

4.4 Post-experiment Questionnaire

The results revealed that about 70% of participants were satisfied with the online shopping process, owing to the large amount of information (40%), the ease of obtaining and understanding information (40%), and convenience (12%). The remaining 30% of participants were unsatisfied with the online shopping process because of the difficulty of obtaining more specific information (69%), insufficient information (15%), and information overload (15%). During the experiment, every
participant occasionally came across unnecessary information or visited the wrong web pages or web sites: 21% of users reached unrelated information, 29% of users using the hierarchical search did not get the right information because the classification model did not match their mental model, and 29% of users could not find the wanted information online with the resources available to them. Users would buy the laptop computer online for the following reasons: 53% valued a high seller reputation, 50% valued the delivery service, and 25% valued a better price. Keeping track of information that has been retrieved can be difficult. The two main methods are searching the browser's history and keeping multiple browser windows or tabs open; however, 78% of users still had difficulty relocating information that they had previously found. With so much information to process, users may face information overload: 95% of participants felt overloaded with information. After the experiment, the participants offered the following suggestions to improve the online shopping experience: 30% preferred more online security, 30% a well-designed interface, 28% real and relevant information, 18% concise information, 15% improved web site speed, and 13% product placement advertising to introduce new products. In addition, most of the users agreed that the ability to do a side-by-side comparison of products from different vendors on a single website would make their purchase decisions easier and quicker.
Fig. 1. Online shopping process flowchart: start at www.oit.edu.tw → search information (e.g., keyword, hierarchical, other functions) → learn more laptop product information from websites/web pages → decide possible candidates for purchase → compare information (e.g., tech specs, price, consumer ratings, user reviews) → select one to purchase? If no, return to searching; if yes, select a store to make the purchase.
5 Discussion

The following discussion topics follow from the results: information search, information display, and web site features.

• Information search: Users search for information according to product preferences or requirements drawn from their online experience and product knowledge (see Figure 1), via keyword or hierarchical search, which plays an important role as the first step in finding the right information. The way the search behaves has to match the user's mental model or cognitive style in order to avoid unnecessary information overload from visiting many irrelevant web pages; interaction performance then improves because fewer search operations and fewer web pages are needed to reach the right information.

• Information display: This also plays an important role in giving people the right knowledge about green products (see Figure 1). The availability of green information may shift their shopping decisions from traditional to green products. Given green information, 78% of those surveyed in the questionnaire would buy green products to protect the environment; however, 49.3% of respondents had no prior exposure to real green information. Green information from marketing (40%) and recommendations from others (31%) are very important in this case, and the availability of green information on websites would strengthen people's environmental values. The laptop information that online users request most is: tech specs, design, reviews (negative and positive), computer accessories, comparison information (price, consumers' ratings, etc.), new product information, and clear and definite information. Information should be displayed using text and images in a ratio of 1:0.75, with comparison information integrated into a single table for easier reading and comprehension.

• Web site features: These play an important role in attracting visitors, including (1) site reputation and the services provided; (2) real information via social networks that allow users to share their reviews; (3) an automatic record tool that records important information for the user, including past searches, and provides recommendations; and (4) a comparison tool that compares specific criteria, e.g., price or tech specs, for specific items.
6 Conclusion

Provided with more green information, consumers would be more willing to purchase green products. Consumers have grown accustomed to using the internet to meet their information needs. This study investigated consumers' online shopping behavior to identify users' needs and expectations of a web-based system, with the aim of improving users' ability to gather information during online shopping. For user-centered design of web-based systems, user e-shopping behavior was investigated, producing the online shopping process flowchart and several suggestions for improving e-shopping concerning information search, information display, and web site features. The results of this study will be applied to the design of future web sites that focus on supplying consumers with green product information.
Acknowledgments. The authors would like to express their gratitude to the National Science Council of Taiwan for funding under grant number NSC-97-2221-E-324-018-MY3.
References
1. Peterson, R.A., Merino, M.C.: Consumer information search behavior and the internet. Psychology & Marketing 20(2), 99–121 (2003)
2. Peterson, R.A., Balasubramanian, S., Bronnenberg, B.J.: Exploring the implications of the internet for consumer marketing. Journal of the Academy of Marketing Science 25, 329–346 (1997)
3. Engel, J.F., Blackwell, R.D., Miniard, P.W.: Consumer Behaviour, 8th edn. Dryden Press, Fort Worth (1995)
4. Ball-Rokeach, S.J., Rokeach, M., Grube, J.W.: The great American values test: influencing behaviour and belief through television. Free Press, New York (1984)
5. Hoffman, D.L., Novak, T.P.: Marketing in hypermedia computer-mediated environments: Conceptual foundations. Journal of Marketing 60, 50–68 (1996)
6. Garg-Janardan, C., Salvendy, G.: The contribution of cognitive engineering to the effective design and use of information systems. Inform. Services Use 6(5/6), 235–252 (1986)
7. Levy, D.M., Marshall, C.C.: Going digital: A look at assumptions underlying digital libraries. Commun. ACM, 77–84 (1995)
8. Wang, P., Hawk, W.B., Tenopir, C.: Users' interaction with World Wide Web resources: an exploratory study using a holistic approach. Information Processing and Management 36, 229–251 (2000)
Interaction Comparison among Media Internet Genre
Sang Hee Kweon, Eun Joung Cho, and Ae Jin Cho
The Department of Mass Communication and Journalism, 53 Myeongnyun-dong 3-ga, Jongno-gu, Seoul, 110-745, Korea
skweon@skku.edu, putyourhope@gmail.com, holymars@nate.com
Abstract. This research explores the interactivity dimensions of portal media (such as Yahoo, Naver, Daum, Paran, and Nate)1. It is designed to measure users' perception of interactivity in portal sites at three levels: 1) media, 2) contents, and 3) perception of HCI and CMC. The research also examines the relationships among these variables through SEM (structural equation modeling). Data from 587 respondents were collected and analyzed to test the hypotheses. The results show that media-side interactivity affected content-side interactivity, and content-side interactivity in turn affected users' perception of the portal as either HCI or CMC media.

Keywords: HCI, CMC, Interactivity, Communication, Community, Hypertext, Interface.
1 Introduction

The future of the media is changing from fixed forms to open forms. Nicholas Negroponte asserted that "media forms of the future won't be much like the ones in existence today." Portals are constantly evolving from an initial stage to multiple stages. Users' participation increases as they pull the services they want, while portal services increasingly push their content as media competition intensifies. Interaction functions are thus a crucial factor in portals, for both the user and the portal company. How, then, do users perceive the 'interactivity' dimension? First of all, the definition of interactivity varies from scholar to scholar. The basic interactivity dimensions are communication-based interaction, technical interaction, psychological liminal interaction, and mechanical transitive interaction.
2 The Concept and Measurement of 'Interactivity'

There are several different types of interactivity, from functional aspects to communication aspects. Most interactivity studies focus on a single dimension rather than multiple dimensions such as social interactivity, psychological interactivity, and the message, media, and user levels.
1 In Korea, there are two types of portals: foreign portals such as Yahoo and Google, and Korea-based portals such as Naver, Daum, Paran, and Nate.
Moreover, there is difficulty in the blurring of lines between communicator and audience, message and media, and genres. Digital media are converging, from technology convergence to media convergence to user convergence; at the same time, interactivity is converging, from technological interactivity to content interactivity. Therefore, this research defines the concept of interactivity along three dimensions: 1) the technological dimension (hypertext and interface), 2) the content or message dimension (personal communication, community work, news or information), and 3) the perception of media characteristics (HCI and CMC).

2.1 Media (Technical) Interactivity

There are two factors in a portal's technical interactivity: one is hypertext and the other is the interface. Hypertext most often refers to text on a computer that leads the user to other, related information on demand. Hypertext represents a relatively recent innovation in user interfaces, which overcomes some of the limitations of written text. Rather than remaining static like traditional text, hypertext makes possible a dynamic organization of information through links and connections. The interface (or human-machine interface) is the aggregate of means by which people (the users) interact with the system (a particular machine, device, computer program, or other complex tool). The user interface provides the means for the various interactivities. Deuze [13] argues that hypertextuality, interactivity, and multimediality determine the 'added value' of online journalism, which he names the 'fourth' kind of journalism, differing in its characteristics from traditional types of journalism.

Content aspect of interactivity. Many studies confirm that interactivity arises from content selection among various menus. Pavlik (1997) described the evolution of online content in three stages. The first stage involves 'repurposing print content' for the online edition. In stage two, content is rearranged with interactive features, such as hyperlinks, new interfaces, and search engines. In stage three, the news providers create original news content for their own media sites, designed specifically for the new medium. This type of content involves both new forms of storytelling and increased levels of interactivity, from simple to complex.

Communication. There are several types of communication interactivity in the portal, such as e-mail, instant messaging, and chatting, covering not only synchronous but also asynchronous communication. According to David Fortin (1997), interactivity is 'the degree to which a communication system can allow one or more end users to communicate alternatively as senders or receivers with one or many other users or communication devices, either in real time (as in video teleconferencing) or on a store-and-forward basis (as with electronic mail), or to seek and gain access to information on an on-demand basis where the content, timing and sequence of the communication is under control of the end user, as opposed to a broadcast basis.' Jens Jensen (1998) defined interactivity as "a measure of a medium's potential ability to let the user exert an amount of influence on the content
and/or form of the mediated communication", while Ha and James described interactivity as "the extent to which the communicator and the audience respond to each other's communication need."

News and information seeking interactivity. A portal's main functions are information provision, news services, and various content services. There are many ways to perceive this interactivity, including online journalism, searching, learning, and various purchase-oriented information seeking activities. These portal navigation activities constitute content-related interactivity. Previous research shows that different media usage determines message selection activities: Table 1 indicates that newspaper users select more political, international, and economic news, while portal news users interact more with soft news such as health or lifestyle, information or communication, sports, and entertainment.

Community and group activity interaction. Portals host various cyber community activities, such as Cyworld, MySpace, Second Life, and UCC communities. Users conduct many types of social interaction there and perceive corresponding patterns of interactivity. Stromer-Galley [44] argues that the term refers to two distinct phenomena: interactivity between people, and interactivity between people and computers or networks.
Table 1. Interactivity Classifications

Dimension | Type of Interaction | Content | Prospect in Digital Era
Technical interaction | Playfulness / functional | Remote control, channel flip, control screen, computer control panel, technical procedure | Traditional mass media; digital media evolve toward more interactivity, such as IPTV, DMB, and DTV
Communication with content | Information collection | Text selection, news, information seeking | Both mass media and digital media; portal media is the most
Communication with content | Reciprocal | Reply to the mass media contents | ARS, reply, text participation, two-way communication
Inter-personal communication | Producer-user, user-user communication | Interpersonal communication: e-mail, IM, SMS, blog | From face-to-face to chatting, IM, etc.
Inter-personal connectedness / communication network | Social association, community | Cyber-community | Digital cyber-community

Fig. 1. Perception of Interactivities

Fig. 2. The research model
HCI vs. CMC. The portal has two media-characteristic aspects: one is human-computer interaction and the other is computer-mediated communication. HCI is related to the computer-as-source model, whereas CMC is related to the computer-as-media model. CMC (computer-mediated communication) is communication interactivity, while HCI is information and news selection interactivity.
3 Research Questions

3.1 Research Question

The research questions are constructed to measure perception of interactivity in the portal sites.

[Research Question 1] Do the media characteristics (technological elements of interactivity), including hypertext and interface, positively affect users' perception of portal interactivity?

[Research Question 2] Are the characteristics of content or message interactivity positively related to users' perception of interactivity in the portal sites?

[Research Question 3] How does users' perception of interactivity in the portal define its media characteristics, in terms of the HCI or CMC aspect?

3.2 Research Method

Research method and subjects. This research is designed to gauge the level of interactivity in portal sites using questionnaires constructed from previous studies. The literature review yielded the research variables (media variables, contents variables, and perception variables). Using the survey method, the research measured users' perception of interactivity at the three levels. The survey subjects were mostly college students, because they use portal media on a daily basis; they therefore perceive the various portal sites' interactivities, from technical features to messages, and can report their recognition of portal interactivity.

Operational definition of variables. To measure the interactivities, the variables were given measurable definitions. Hypertext is webpage linkage; the interface is the contact point and its usability. The contents variables are communication, community, and information. Users' perception of interactivity is classified into two sides, HCI and CMC: HCI treats the portal as a news or information source model, whereas CMC treats the portal as a media model.

Questionnaires. The questionnaires cover several dimensions: 1) media usage time and years, 2) media factors, 3) contents factors, and 4) user perception factors. This study adapted previously used questionnaires to measure portal interactivity (McMillan & Hwang, 2002), [52]. To measure the dimensions of portal interactivity, the researchers collected data with the constructed questionnaires using the survey method. The total sample was 587 respondents, with the following demographics. Gender: female (50.5%) and male (49.1%). The majority were aged between 20 and 24 years old (46.1%); more than 25 and under 29 (00%); not applicable (0.3%). Concerning education, 46.1% of the respondents were university students or had graduated from
high school; 30.6% answered that they did not attend high school or had only graduated from middle school; and 23.4% had graduated from university. Regarding household income, 41% of the sample earns less than 40,000,000 won ($40,000), and 24.3% earns between 40,000,000 won ($40,000) and 50,000,000 won ($50,000).
4 Results

4.1 Reliability and Factor Analysis

After the reliability test, a confirmatory factor analysis was conducted based on the constructed concepts. The goodness-of-fit indices were evaluated conservatively, using the following standards: GFI (Goodness-of-Fit Index) ≥ 0.90, AGFI (Adjusted Goodness-of-Fit Index) ≥ 0.90, RMR (Root Mean Square Residual) ≤ 0.05, NFI (Normed Fit Index) ≥ 0.90, and p value (α = 0.05).
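For reference, two of these indices have compact standard definitions (our reading; the paper itself only states the cut-offs). With \(\chi^2_{\text{null}}\) and \(\chi^2_{\text{model}}\) the chi-square statistics of the null and fitted models, and \(s_{ij}\), \(\hat{\sigma}_{ij}\) the sample and model-implied covariances over \(p\) observed variables:

\[
\mathrm{NFI} = \frac{\chi^2_{\text{null}} - \chi^2_{\text{model}}}{\chi^2_{\text{null}}},
\qquad
\mathrm{RMR} = \sqrt{\frac{2\sum_{i \le j}\left(s_{ij} - \hat{\sigma}_{ij}\right)^2}{p(p+1)}}.
\]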
Table 3. Confirmatory factor analysis by variables: the media factors (Hypertext, Interface), contents factors (Communication, Community, Information), and perception factors (HCI, CMC), with their first-order items.
Table 4. Media Aspect Interactivity Factor Analysis

Item | Hypertext | Interface
6) Link connections immediately pop up the related information window. | 0.782 | 0.091
7) The user easily navigates from one page to another. | 0.780 | 0.133
8) The site has optimal information and usability. | 0.747 | 0.121
9) There are many selectable information menus. | 0.739 | 0.167
4) Linked information is highly related to the appropriate information. | 0.683 | 0.334
5) The site provides a visual window construction as map orientation. | 0.663 | 0.217
10) The site has a communication menu section with many bulletin boards and menus. | 0.312 | 0.177
2) The user knows where to go next. | 0.173 | 0.910
1) The site provides an interface showing where I am while using or navigating. | 0.134 | 0.884
3) Users can navigate to the pages or sites they want. | 0.302 | 0.781
Eigenvalues | 4.388 | 1.555
% of Variance | 43.88 | 15.55
Cumulative (%) | 43.88 | 59.433
Extraction Method: Principal Component Analysis; Rotation Method: Varimax
The factor analysis and reliability of the questionnaires were examined; the reliability scores are .85 for the hypertext and interface factors. Table 3 shows the confirmatory factor analysis; all variables reach statistical satisfaction on the indices from GFI to RMR. Table 4 shows the factor analysis of media-aspect interactivity. There are two factors, hypertext and interface, which together explain 59% of the media interactivities. Table 5 covers the content-aspect interactivities. There are three factors: communication, community, and information. These three factors explain 64.05% of the variance in the contents dimension.

Table 5. Factor analysis of contents aspects. Items: 3) writing comments on the article read; 4) reading comments; 5) writing articles / updating photos; 6) sending mail / chatting; 7) updating information; 11) making a community about the subject; 12) operating a community; 10) debating a specific subject; 9) removing information to a blog; 1) searching information; 2) reading articles; 8) downloading information; 13) joining a community. Extraction Method: Principal Component Analysis; Rotation Method: Varimax.
Table 6 is user’s perception of interactivity in the two sides: one is HCI and the other is CMC. HCI is 41.479 % and CMC is 21.013% of the user perception variable. Table 6. HCI and CMC Questionnaires HCI 3) It is technical sensitive ./HCI 0.809 5) It is activity. /HCI 0.682 2) It is non-personal ./HCI 0.649 4) It is humanity. /CMC 0.097 1) It is personal communication. /CMC 0.197 Eigenvalues 2.074 % of Variable (%) 41.479 Cumulative(%) 41.479 Extraction Method: Principal Component Analysis Rotation Method: Varimax
4.2 Correlation Analyses

After the factor analysis of the interactivity dimensions, a correlation analysis was conducted among the variables (hypertext, interface, communication, community, information, and the perception variables HCI and CMC). The correlation coefficients are shown in Table 7.

Table 7. Correlation among variables in the interactivity

 | Hypertext | Interface | Communication | Community | Information | HCI | CMC
Hypertext | 1 | | | | | |
Interface | -.002 | 1 | | | | |
Communication | .140(**) | -.039 | 1 | | | |
Community | .069 | .131(**) | -.007 | 1 | | |
Information | .378(**) | .251(**) | .005 | -.005 | 1 | |
HCI | .314(**) | .205(**) | .066 | .145(**) | .265(**) | 1 |
CMC | .018 | .051 | .194(**) | .276(**) | -.023 | .001 | 1
** Correlation significant at the 0.01 level.
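A correlation matrix of this form follows directly from the composite factor scores. A minimal Python sketch; the file and column names are assumptions:

import pandas as pd

# Composite scores per respondent; column names are assumptions.
df = pd.read_csv("portal_scores.csv")
cols = ["Hypertext", "Interface", "Communication",
        "Community", "Information", "HCI", "CMC"]

# Pearson correlation matrix corresponding to Table 7.
print(df[cols].corr().round(3))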
The results show that the hypertext factor correlates with the communication and information interactivities, whereas the interface factor correlates with the community and information factors. In addition, the communication variable correlates with the CMC perception, whereas the community factor correlates with both the HCI and CMC perception variables. The information factor correlates with the HCI perception of interactivity.

4.3 Structural Equation Model

The structural equation model (SEM) was empirically confirmed by testing the variables; Table 8 and Figure 3 present the results.

Table 8. The result of path scores by variables

H1: Hypertext → Communication
H2: Hypertext → Group/Community
H3: Hypertext → Information/News
H4: Interface → Communication
H5: Interface → Group/Community
H6: Interface → Information/News
H7: Communication → HCI
H8: Communication → CMC
H9: Group/Community → HCI
H10: Group/Community → CMC
H11: Information/News → HCI
H12: Information/News → CMC
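The twelve hypothesized paths form a straightforward path model. A sketch of how such a model could be specified and fitted with the semopy package; the variable and file names are assumptions, and composite observed scores stand in for the latent constructs:

import pandas as pd
import semopy  # pip install semopy

# Path-model version of H1-H12 using composite scores per construct.
desc = """
Communication ~ Hypertext + Interface
Community     ~ Hypertext + Interface
Information   ~ Hypertext + Interface
HCI ~ Communication + Community + Information
CMC ~ Communication + Community + Information
"""

data = pd.read_csv("portal_scores.csv")  # hypothetical file name
model = semopy.Model(desc)
model.fit(data)

print(model.inspect())           # path estimates (the "path scores" of Table 8)
print(semopy.calc_stats(model))  # fit indices such as GFI, AGFI, NFI, RMSEA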
5 Discussion

This research shows the dimensions of portal interactivity. There are three groups of factors in portal sites' interactivity: technical or media factors, contents factors, and user perception factors. Hypertext affects communication interactivity, while the interface factor is correlated with the information and community interactivities. The portal presents two distinct images with respect to interactivity, HCI and CMC. Web 2.0 is characterized by sharing and participation; HCI and CMC interactivities will co-evolve and co-exist as digital media continue to converge.

Acknowledgement. This research is supported by the 2009 SBS Foundation Inc.
References
1. Barker, C., Gronne, P.: Advertising on the WWW, p. 27. Master's Thesis, Copenhagen Business School (unpublished, 1996)
2. Baker, M.J., Churchill Jr., G.A.: The impact of physically attractive models on advertising evaluations. Journal of Marketing Research 14(4), 538–555 (1977)
3. Barnes, S.: Computer-mediated communication: Human-to-human communication across the Internet. Allyn & Bacon, Boston, MA (2003)
4. Barnes, S.: Online Connection: Internet interpersonal relationships (2001)
5. Bell, D.: An Introduction to Cyberculture. Routledge, NY (2001)
6. Bolter, J.: Writing Space: Computers, Hypertext, and the Remediation of Print. Lawrence Erlbaum Associates, Mahwah, NJ (2001)
7. Bolter, J., Grusin, R.: Remediation: Understanding new media. The MIT Press, Cambridge, MA (1999)
8. Bonime, A., Pohlmann, K.: Writing for New Media: The Essential Guide to Writing for Interactive Media, CD-ROMs, and the Web. John Wiley & Sons, Chichester (1998)
9. Bretz, R.: Media for interactive communications. Sage, Beverly Hills, CA (1983)
10. Miller, C.R., Shepherd, D.: Blogging as Social Action: A Genre Analysis of the Weblog (2004), http://hochan.net/archives/2004/04/0201:41AM.html
11. Coyle, J.R., Thorson, E.: The effects of progressive levels of interactivity and vividness in Web marketing sites. Journal of Advertising 30(3), 13–28 (2001)
12. Dahlgren, P.: Television and the public sphere. Sage Publications, London (1995)
13. Deuze, M.: Online journalism: Modelling the first generation of news media on the World Wide Web. Online Journalism Review 6(10) (2001), http://www.firstmonday.dk/issues/issue6_10/deuze/index.html
14. Deuze, M.: The Web and its journalisms: Considering the consequences of different types of newsmedia online. New Media & Society 5(2), 203–230 (2003)
15. Dillon, A., Gushrowski, B.: Genres and the Web: Is the Personal Home Page the First Unique Digital Genre? Journal of the American Society for Information Science 51(2), 202–205 (2000)
16. Fidler, R.: Mediamorphosis: Understanding new media. Pine Forge Press, Thousand Oaks, CA (1997)
17. Ghose, S., Dou, W.: Interactive functions and impacts on the appeal of Internet presence sites. Journal of Advertising Research 38(2), 29–43 (1998)
18. Gumbrecht, M.: Blogs as Protected Space (2004), http://www.blogpulse.com/papers/www2004gumbrecht.pdf
19. Hall, J.: Online journalism: A critical primer. Pluto Press, London (2001)
20. Kawamoto, K.: Digital journalism: Emerging media and the changing horizons of journalism. Rowman & Littlefield Publishers, NY (2003)
21. Kelleher, T., Miller, B.M.: Organizational blogs and the human voice: Relational strategies and relational outcomes. Journal of Computer-Mediated Communication 11(2) (2006), http://jcmc.indiana.edu/vol11/issue2/kelleher.html
22. Kilian, C.: Writing for the web: Writer's edition. Self-Counsel Press, Bellingham, WA (1999)
23. Kiousis, S.: Interactivity: A concept explication. New Media & Society 4(3) (2002)
24. Macias, W.: A preliminary structural equation model of comprehension and persuasion of interactive advertising brand Web sites. Journal of Interactive Advertising 3 (2003), http://www.jiad.org/
25. Manovich, L.: The language of new media. The MIT Press, Cambridge, MA (2002)
26. McMillan, S.J., Hwang, J.S.: Measure of perceived interactivity: an exploration of the role of direction of communication, user control, and time in shaping perception of interactivity. Journal of Advertising 31(3), 29–42 (2000)
27. Miller, C.H.: Digital storytelling: a creator's guide to interactive entertainment. Focal Press, Boston (2004)
28. Miller, C.: Genre as social action. Quarterly Journal of Speech 70, 151–167 (1984)
29. Miller, C., Shepherd, D.: Blogging as Social Action: A Genre Analysis of the Weblog (2003), http://blog.lib.umn.edu/blogosphere/blogging_as_social_action.html
30. Murray, J.H.: Hamlet on the holodeck: The future of narrative in cyberspace. MIT Press, Cambridge, MA (1997)
31. Newhagen, J.E., Bucy, E.P.: Routes to media access. In: Bucy, E.P., Newhagen, J.E. (eds.) Media access: Social and psychological dimensions of news technology use, pp. 3–23. Lawrence Erlbaum Associates, Mahwah, NJ (2004)
32. Newhagen, J.E., Cordes, J.W., Levy, M.R.: Nightly@nbc.com: Audience scope and the perception of interactivity in viewer mail on the internet. Journal of Communication 45(3) (1995)
33. Newhagen, J., Reeves, B.: Negative video as structure: Emotion, attention, capacity, and memory. Journal of Broadcasting and Electronic Media 40, 460–477 (1996)
34. Pavlik, J.V.: Journalism and new media. Columbia University Press, NY (2001)
35. Poor, N.: Mechanisms of an online public sphere: The website Slashdot. Journal of Computer-Mediated Communication (2005), http://jcme.indiana.edu/vol10/issue2/poor.html
36. Pryor, L.: The third wave of online journalism. Online Journalism Review (2002), http://www.ojr.org/ojr/future/1019174689.php (April 1, 2003)
37. Rafaeli, S.: Interactivity: From new media to communication. In: Hawkins, R.P., Wiemann, J.M., Pingree, S. (eds.) Advancing communication science: Merging mass and interpersonal processes, pp. 110–134. Sage, Newbury Park, CA (1988)
38. Rheingold, H.: The Virtual Community: Homesteading on the electronic frontier. MIT Press, Cambridge, MA (2000)
39. Rheingold, H.: The smart mobs: The next social revolution (2002)
40. Samsel, J., Wimberley, D.: Writing for the interactive media: The Complete Guide. Allworth Press, NY (1998)
41. Sohn, D., Lee, B.: Dimensions of interactivity: Differential effects of social and psychological factors. Journal of Computer-Mediated Communication 10(3), article 6 (2005), http://jcmc.indiana.edu/vol10/issue3/sohn.html
42. Stansberry, D.: Labyrinths: The art of interactive writing and design. Wadsworth Publishing, Belmont, CA (1998)
43. Steuer, J.: Defining virtual reality: Dimensions determining telepresence. Journal of Communication 42(3), 73–93 (1992)
44. Stromer-Galley, J.: Interactivity-as-product and interactivity-as-process. The Information Society 20, 391–394 (2004)
45. Sundar, S., Kalyanaraman, S., Brown, J.: Explicating website interactivity. Communication Research 30(1) (February 2003)
46. Turkle, S.: Life on the screen: Identity in the age of the Internet (1997)
47. Wallace, P.: The Psychology of the Internet (1999)
48. Walther, J.B.: Interpersonal effects in computer-mediated interaction: A relational perspective. Communication Research 19(1), 52–90 (1992)
49. Williams, F., Rice, R.E., Rogers, E.M.: Research methods and the new media. Free Press, New York (1988)
50. Wolf, M.: Genre and the video game. In: Wolf, M. (ed.) The Medium of the Video Game, pp. 113–134. University of Texas Press, Austin (2001)
51. Wood, F., Smith, M.: Online communication: Linking technology, identity, & culture. Lawrence Erlbaum Associates, Mahwah, NJ (2005)
52. Wu, H.D., Bechtel, A.: Web site use and news topic and type. Journalism and Mass Communication Quarterly 79(1), 73–86 (2002)
53. Yates, J., Orlikowski, W.J.: Genres of organizational communication: A structurational approach to studying communication and media. Academy of Management Review 17(2), 299–326 (1992)
Comparing the Usability of the Icons and Functions between IE6.0 and IE7.0
Chiuhsiang Joe Lin, Min-Chih Hsieh, Hui-Chi Yu, Ping-Jung Tsai, and Wei-Jung Shiang
Department of Industrial Engineering, Chung Yuan Christian University, 200, Chung Pei Rd., Chung Li, Taiwan 32023, R.O.C
{hsiang,g9674019,s921909,g9674021,wjs001}@cycu.edu.tw
Abstract. Microsoft presented its newest web browser, Internet Explorer 7 (IE7), in 2007. The purpose of this study was to compare the icon and function designs of IE 7.0 and IE 6.0 with respect to their effect on operating performance. We designed two experiments around a program constructed in Builder C++ 6.0: participants were given tasks, and task completion time was recorded as the measure of operating performance. The results show that the differences in icon design and functions between IE 7.0 and IE 6.0 do affect operating performance.
Keywords: Interface Design, Usability, Browser.
1 Introduction

With the rapid development of the Internet, its usage has been increasing ever since the World Wide Web became popular. According to data published by comScore Media Metrix and collected by FIND (Foreseeing Innovative New Digiservices), about 750 million people used the Internet globally in January 2007, a growth rate of 10% compared to the previous year [1]. Among the twenty-three million people in Taiwan, over sixty percent use the Internet [1]. The Internet has therefore become part of daily life. Internet users navigate pages with a browser and interact with pages, documents, images, and other information. Several browsers are currently available, including Internet Explorer (IE), Firefox, Mozilla, and Opera; among them, Internet Explorer (IE) has the highest usage share, as shown in Table 1.

Table 1. Market share of Internet browsers [8]
Microsoft published a new browser (Internet Explorer 7) whose user interface differs from IE6 in many respects. Among these, the buttons are divided into several groups and arranged in different places (Fig. 1), a marked departure from IE6. In addition, at the same display resolution, the buttons in IE7 are noticeably smaller, and IE7 adds the new functions of tabbed browsing and Quick Tabs (Fig. 2). Therefore, the purpose of this research is, first, to examine how icon design and location influence users when they surf the net with IE6 and IE7 and, second, to inspect how tabbed browsing influences users when switching between sites.
Fig. 1. Tool bar in Internet Explorer 7
Fig. 2. Quick tabs (left) and Tabbed browsing (right) in IE7
2 Literature Review

Psychologists consider that people do not think only in simple words, but sometimes recall memories through images or spatial locations [3]. In human-computer interfaces, the meaning that icons convey is more extensive than what words convey. Wickens [6] and Weidenbeck [7] hold that icons are easier to comprehend fully and easier to memorize than words, which is why icons are applied so extensively. Among currently developed application programs, functional icons have become basic elements; however, the icons on the toolbars of most application programs may be too small and their spacing too narrow. Lindberg and Näsänen [3] found that the size and spacing of icons have a significant influence on user performance. They pointed out that the preferred spacing between icons (interface elements) is 1/2 to 1 icon (interface element) width, and that icon width ought to exceed 0.7 degrees of visual angle. In other words, efficiency improves when the icon width is about 0.5 cm at a viewing distance of 40 cm, or about 0.9 cm at a viewing distance of 70 cm.
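These icon widths follow from elementary visual-angle geometry. As a check, with viewing distance \(d\) and visual angle \(\theta = 0.7^\circ\):

\[
w = 2d\tan\!\left(\frac{\theta}{2}\right), \qquad
w(40\,\text{cm}) = 80\tan(0.35^\circ) \approx 0.49\,\text{cm}, \qquad
w(70\,\text{cm}) = 140\tan(0.35^\circ) \approx 0.86\,\text{cm},
\]

matching the approximate values of 0.5 cm and 0.9 cm quoted above.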
The ultimate purpose of icon design is to allow the user to exercise more direct control, to reduce the user's memory burden, and to reduce the complexity and errors of operation. To bridge the gap between designers and users, Norman [4] proposes four design principles: visibility, a good mental model, good mapping, and feedback. Norman hopes that user-centered design following these four principles can reduce designers' blind spots. The eight golden rules of interface design proposed by Shneiderman and Plaisant [5] contain concepts similar to Norman's principles, but put more emphasis on the design of interactive systems. They are: 1. Strive for consistency. 2. Cater to universal usability. 3. Offer informative feedback. 4. Design dialogs to yield closure. 5. Prevent errors. 6. Permit easy reversal of actions. 7. Support an internal locus of control. 8. Reduce short-term memory load.

Icon design involves human psychology and physiology, and icons have to express their meaning correctly. Horton [2] gives the following principles for icon design:

1. Understandable: The icon should be easy to comprehend and its content directly connected to its meaning. When designing an icon, it is better to use visual symbols that are familiar and easily recognized by users, to reduce the burden of learning and memorizing.
2. Importance: Icons of the same character should be grouped in the same block.
3. Distinct: Frames and shadows should set icons apart, and an icon should differ clearly from similar icons so that users do not confuse them.
4. Memorable: Icon usage should be consistent throughout the documents, and icons should be attractive.
5. Size: The size of an icon affects users' recognition and operation. An icon that is too big occupies too much space; one that is too small is hard to notice and demands extra concentration to click.
6. Attractive: Designers should pay attention to visual balance and to the coordination of size, color, and interface.

This study contains two experiments. The first experiment draws on Horton's principles to evaluate how icon design and location influence users when they surf the net with IE6 and IE7. The second experiment inspects how tabbed browsing influences users when switching sites.
3 Method

3.1 Experiment 1

Ten subjects participated, 7 males and 3 females, aged 24 to 35, all right-handed and free of eye or hand disorders. All had more than one year of Internet experience; 7 had used IE7 for over one year, while the rest still used IE6. The hypothesis was that, because the buttons of IE6 are bigger and centrally grouped, performance with IE6 would be better than with IE7. The major measure is the time spent operating the buttons of the different interfaces. The independent variable is the browser interface (two levels), covering five buttons: previous/next, stop, refresh, my
favorites, and home. The dependent variable is the time the subjects needed to finish the tasks.

Experiment Environment and Procedures. A personal computer with a 19-inch LCD monitor and an interface written in Builder C++ 6.0 were prepared for the study. Before the tests, the subjects had thirty minutes to practice on the experimental interface. The experiment adopted a within-subject design: each subject completed ten trials, five on each experimental interface simulating either IE6 or IE7. To avoid fatigue, subjects took a 5-minute break between trials until the experiment finished. When the subject pressed the "Enter" key, timing began and the page turned to the IE interface, instructing the subject to carry out the task. In every trial there was an instruction for the task, such as "press previous button, please". When the subject clicked on the target, the elapsed time was recorded.

3.2 Experiment 2

The same subjects participated as in Experiment 1; 7 had used IE7 for over one year and the rest still used IE6. The hypothesis was that switching between pages using the Quick Tabs function in IE7 is faster than in IE6. The independent variable is the presentation mode: the Quick Tabs function (Fig. 3) versus traditional windows (Fig. 4). The dependent variable is the time the subjects needed to finish the task.
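Both experiments rely on the same instruction-to-click timing logic. The original program was written in Builder C++ 6.0; the following Python/tkinter sketch only mirrors the idea, and the widget names and layout are invented for illustration:

import time
import tkinter as tk

class TrialTimer:
    def __init__(self, root, instruction, target_name):
        self.start = None
        root.title(instruction)
        # Pressing Enter reveals the simulated toolbar and starts the clock.
        root.bind("<Return>", self.begin)
        self.button = tk.Button(root, text=target_name,
                                command=self.finish, state=tk.DISABLED)
        self.button.pack(padx=40, pady=40)

    def begin(self, _event):
        self.start = time.perf_counter()     # clock starts on Enter
        self.button.config(state=tk.NORMAL)  # target becomes clickable

    def finish(self):
        elapsed = time.perf_counter() - self.start
        print(f"task time: {elapsed:.3f} s")  # recorded as operating performance

root = tk.Tk()
TrialTimer(root, "press previous button, please", "previous")
root.mainloop()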
Fig. 3. Quick Tabs

Experiment Environment and Procedures. The study prepared two computers of the same specification, each with a 19-inch LCD monitor at a resolution of 1280*1024; one ran IE7 and the other IE6. Nine different pages were opened at the same time. Once the subjects were familiar with the experimental interface, the experiment started. Each subject performed two trials. In each trial, the subject was randomly assigned an interface and, upon instruction, clicked on the target page; the time between receiving the instruction and selecting the target was recorded. After finishing one trial, there was a ten-minute break before the next. Before starting the traditional-interface
trial, all the pages were minimized to the lower task bar and the mouse cursor was placed at the center of the monitor; before the new index-tabs trial started, all the pages were likewise minimized to the upper tab row and the subjects were asked to open the Quick Tabs function.
Fig. 4. Traditional window types
4 Results

4.1 Result of Experiment 1

An ANOVA on the data from Experiment 1 shows that the interface has a significant influence on operating performance (F=8.49, p=0.004), with shorter operating times in IE6 than in IE7 (Fig. 5). The function key also has a significant influence on operating performance (F=5.06, p=0.001): the time to click "Page Up"/"Page Down" is shorter than for the other function keys, and the time to click "Homepage" is longer than for the other function keys (Fig. 6). Trial order has a significant influence on operating performance (F=13.98, p=0.000); subjects spent more time on the first trial than on later trials (Fig. 7). There is a significant interaction between interface and function key: the difference between IE6 and IE7 is greatest for the "Homepage" and "My Favorites" keys (Fig. 8).
Fig. 5. Mean time for IE6 and IE7 in experiment 1
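The reported F tests could be reproduced with a repeated-measures ANOVA, consistent with the within-subject design. A sketch using statsmodels; the file and column names are assumptions, as the raw data are not published:

import pandas as pd
from statsmodels.stats.anova import AnovaRM  # pip install statsmodels

# Long-format data: one row per subject x interface x icon trial,
# with completion time in seconds. Column names are assumptions.
df = pd.read_csv("experiment1.csv")  # columns: subject, interface, icon, time

# Within-subject (repeated-measures) ANOVA; each subject must contribute
# one observation per interface x icon cell (aggregate over order if needed).
res = AnovaRM(df, depvar="time", subject="subject",
              within=["interface", "icon"]).fit()
print(res)  # F and p values comparable to F=8.49, p=0.004 for interface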
4.2 Result of Experiment 2

An ANOVA on the data from Experiment 2 shows that the interface has a significant influence on operating performance (F=25.65, p=0.001), with shorter operating times in IE7 than in IE6 (Fig. 9).
Fig. 6. Mean time for different icons
Fig. 7. Interaction plot for time between interface and order
Fig. 8. Interaction plot for time between interface and icon
Fig. 9. Mean time for IE6 and IE7 in experiment 2
5 Discussion

5.1 Experiment 1

Total time. The time subjects needed to finish the tasks in IE7 is clearly longer than in IE6. The reason might be the distribution of the keys: IE6 adopts a centralized layout, making it easier for subjects to find the key they need. However, the time difference is not significant for every key (Fig. 8), so the results are discussed separately below.

"Page Up"/"Page Down". Although the color of this key differs between the two browsers and the key is smaller in IE7, the times do not differ significantly. We speculate that both interfaces use bright colors, and that the position and arrow shape are almost identical, which increases the memorability and understandability of the icon. This may also explain why this key performed best among the five keys.

"Stop" and "Refresh". There is no difference between the browsers for the "Stop" and "Refresh" keys. We speculate that the color and the icon are identical in IE6 and IE7; although the keys are smaller in IE7, subjects can locate them by their bright colors. In addition, these two keys stand side by side in both interfaces, which increases the importance and memorability of the message.

"My Favorites". Search times differ significantly between the two interfaces. We speculate that the size difference is largest for this key, so it takes subjects more time to find it.

"Homepage". Search times differ significantly between the two interfaces. We speculate that in IE7 the icon's color is similar to the background color and the icon is smaller, so subjects cannot find this key at first glance. The size appears to violate the attractiveness principle, making subjects spend more time searching in IE7.
5.2 Order and Interface

Analyzing trial order against interface time, we find that interface time decreases with trial order; in other words, there are learning effects. However, even with these learning effects, operating time in IE7 remains longer than in IE6. Interestingly, the four subjects who were previously accustomed to IE7 still finished the tasks more slowly in IE7 than in IE6. Subjects therefore performed better with IE6 than with IE7.

5.3 Experiment 2

This experiment focused on the new function in IE7, and we find that the search time for switching pages with Quick Tabs is clearly shorter, because Quick Tabs presents all open pages in one window without requiring each window to be clicked individually. Quick Tabs therefore genuinely helps browsing.
6 Conclusions

New products should be designed to be more convenient and faster for users. Visibility is a very important design principle: the controls to be operated should be obvious, and the system should provide users with visible information (Norman, 2000). In this research we find that some icon designs in IE7 do not follow the principle of visibility, making it hard for users to browse pages quickly and conveniently; performance was worse than with the old browser for most of the keys tested. However, the Quick Tabs function in IE7 genuinely improves page-browsing performance. This research finds that the size, color, and placement of icons really do influence user performance, and designers should take these factors into account. Future user-interface designers should be aware of these findings and apply them when designing new products, so as to enhance users' efficiency on the Internet.
Acknowledgement. This study is financially supported by a project from the National Science Council of Taiwan under contract No. NSC-97-2629-E-033-001.
References
1. FIND: Global network popular rate grows 10% in 2006 (2007), http://www.find.org.tw/find/home.aspx?page=news&id=4726
2. Horton, W.: The icon book: visual symbols for computer systems and documentation. John Wiley & Sons, Chichester (1994)
3. Lindberg, T., Näsänen, R.: The effect of icon spacing and size on the speed of icon processing in the human visual system. Displays 24, 111–120 (2003)
4. Norman, D.A.: The psychology of everyday things. Yuan-Liou, TW (2000)
5. Shneiderman, B., Plaisant, C.: Designing the user interface: strategies for effective human-computer interface, 4th edn. Addison Wesley, Reading (2005)
6. Wickens, C.D., Hollands, J.G.: Engineering Psychology and Human Performance, 3rd edn. Prentice-Hall, Englewood Cliffs (1992)
7. Weidenbeck, S.: The use of icons and labels in an end user application program: an empirical study of learning and retention. Behavior & Information Technology 18(2), 68–82 (1999)
8. WIKIPEDIA: Usage share of web browsers (2006), http://zh.wikipedia.org/w/index.php?title=%E7%B6%B2%E9%A0%81%E7%80%8F%E8%A6%BD%E5%99%A8%E7%9A%84%E4%BD%BF%E7%94%A8%E5%88%86%E4%BD%88&variant=zh-hant
Goods-Finding and Orientation in the Elderly on 3D Virtual Store Interface: The Impact of Classification and Landmarks
Cheng-Li Liu1, Shiaw-Tsyr Uang2, and Chen-Hao Chang3
1 Department of Industrial Management, Vanung University, Taoyuan, Taiwan
2 Department of Industrial Engineering and Management, Minghsin University of Science & Technology, Hsinchu, Taiwan
3 Graduate School of Business and Management, Vanung University, Taoyuan County, Taiwan
johnny@vnu.edu.tw

Abstract. The internet 3D virtual store has received wide attention from researchers and practitioners because it is one of the killer applications that let customers feel they are in a real shopping environment, potentially increasing satisfaction. Though numerous studies have addressed various issues of the internet store, research issues relating to the spatial cognition of the elderly when immersed in a 3D virtual store still await further empirical investigation. The objective of this study was to examine how elderly users acquire spatial cognition in an on-screen virtual store. Specifically, the impact of the presence and absence of goods-classification on the acquisition of route and survey knowledge was examined. Since landmarks are associated with both route and survey knowledge, we also examined the impact of different types of landmarks under both the presence and the absence of goods-classification. The experimental results indicated that goods-classification contributed more to constructing route knowledge when present than when absent, and that the duration of goods-finding was then shorter. However, the survey-knowledge scores with goods-classification present were not significantly larger than with it absent. In addition, route-knowledge scores were largest and goods-finding duration shorter when goods-classification was combined with landmarks of the alphanumeric + 2D picture + 3D object type; the same held in the absence of goods-classification. Therefore, when goods-classification is absent, landmarks can serve as redundant codes for goods-finding in a 3D virtual store.

Keywords: 3D virtual store, Goods-finding, Goods-classification, Landmarks, Route knowledge, Survey knowledge.
things on the Internet. It lets us buy what we want, when we want, at our convenience, and helps us imagine ourselves buying, owning, and enjoying positive outcomes from the goods available out there on the web [6]. Shopping has become a way of identifying oneself in today's culture by what we purchase and how we use our purchases. Online shopping has been quite popular since its arrival on the Internet. Although the percentage of older adults (i.e., the silver tsunami) using the web is smaller than the percentage of younger individuals, surveys indicate that this may not be the case for long. The World Health Organization estimates that by the year 2020, 24% of Europeans, 17% of Asians, and 23% of North Americans will be over the age of 60 [32]. By 2020, the world will have more than 1 billion people aged 60 and over. The older population is growing rapidly worldwide and is becoming an increasingly important demographic to understand. According to Coyne and Nielsen [4], there are an estimated 4.2 million Internet users over the age of 65 in the United States, and this number will continue to grow internationally at a rate that reflects the overall population trends discussed above. However, most Internet stores use text, two-dimensional (2D) pictures, voice and cartoons to display goods; these displays lack an intuitive sense of the goods, and the systems do not support interaction with them [8]. This detracts from customers' sense of a real shopping experience and, moreover, diminishes their desire to shop. These problems can be addressed with the technology of virtual environments (VEs). In Web-based virtual environments, scenes and physical images can be compressed and transferred over networks with limited bandwidth, and 3D scenes of goods can be built on the web using recent techniques. This technology allows the simulation of 3D VEs on a computer: humans can experience those environments through active exploration [11]. A 3D virtual store provides a computer-synthesized world in which users can interact with goods and perform various activities. Although a 3D virtual store improves the sense of reality and interaction with goods, some issues remain to be addressed. Generally, in large-scale VEs the user's viewpoint cannot encompass the entire environment [26]. In such VEs, navigation is a fundamental activity, and successful use of the VE requires that the user be able to navigate easily and efficiently from one location to another [5]. However, previous research has shown that users of VEs are often disoriented, feel lost in hyperspace, and therefore have extreme difficulty completing navigational tasks [3][7]. The same problems arise in a 3D virtual store. Although the 3D virtual store is an on-screen, small-scale virtual environment, the user has an egocentric viewpoint within the environment, and it visually affords the experience of movement, rotation, and changes in viewing elevation [29]. A conventional grocery store is often divided into several areas or rooms, each built around a particular shopping theme [27]; in a 3D virtual store, the layout is likewise designed to help the customer browse more easily. Sometimes the destination is not immediately visible. This can also be the case in the typical on-screen virtual environment, which may include, for example, walls that make up internal rooms and corridors, visually obscuring the destination [21].
Therefore, the human spatial abilities that influence the acquisition and use of such environments need to be explored. In the 3D virtual store, those abilities consist primarily of orientation and goods-finding. Additionally, previous studies on cognitive aging have found that certain aspects of human information-processing
abilities are negatively correlated with age [20]. Specifically, Park summarized four basic mechanisms accounting for age-related decline in cognitive function: processing speed, working memory, sensory function and inhibition. Lin found that older people were more likely to become disoriented in hypertext perusal and also failed to browse documents as broadly as young adults could [16]. Nor did older people manage to retain browsed information as accurately as their young counterparts, even when assisted with multimedia presentations. In view of this trend, there is an increasing need to investigate and better understand the abilities of orientation and goods-finding in the elderly, particularly when interacting with a 3D virtual store. In a virtual environment, the ability to orient influences the efficiency and effectiveness of wayfinding. Wayfinding can refer to a rather narrow concern: how well people are able to find their way to a particular destination without delay or undue anxiety [23]. It means knowing where you are in a building or an environment, knowing where your desired location is, and knowing how to get there from your present location; in architecture, the term refers to the user experience of orientation and path choice within the built environment. Travelers find their way using landmark knowledge, route knowledge, and survey knowledge [18]. The first two types are more specific knowledge representing sensory experience [14]. Landmark knowledge records the visual features of landmarks, including their shape, size, texture, and so forth [10]. For a structure to be a landmark, it must have high imageability; that is, it must be distinctive and memorable [17]. Most often, landmark knowledge is acquired first when encountering and learning a new, unfamiliar environment. The recognition of landmarks then becomes part of the construction of route knowledge, where landmarks are the points that make up routes. Later, landmarks become the objects and elements of survey knowledge, helping to construct the layout and relational configuration of the elements in the environment [2][30]. Survey knowledge provides a map-like, bird's-eye view of a region and contains spatial information including the locations, orientations, and sizes of regional features [9]. Each type of knowledge helps the traveler construct a cognitive map of a region and thereafter find their way using that map [1][22]. In the study of orientation, the notion of a "cognitive map" has received much research attention [13][19]. At its most general, a cognitive map is a mental construct that we use to understand and know the environment [12]. The term assumes that people store information about their environment which they use to make spatial decisions [13]. Because of the importance of landmark knowledge for constructing a cognitive map, much research has been devoted to the impact of landmarks on orientation. Several researchers have studied the value of 2D landmarks (i.e., textual and 2D images) for wayfinding. Witmer et al. used verbal directions and photographs to study route learning [31]. Waller et al. applied cardboard numerals, images of stuffed animals and arrows to study real-world task training and found that long exposure fostered good spatial knowledge across several VEs [28]. Additionally, several studies have examined the effects of 3D landmarks on wayfinding.
Elvins found that 3D thumbnails make better guidebooks for 3D worlds than do text and 2D thumbnail images [9]. Parush and Berman discussed the impact of navigation aids and 3D landmarks on the acquisition of route and survey knowledge and found that the combined impact of the navigation aids used during learning and the presence of 3D landmarks was primarily evident in the orientation task [21].
Although each of these studies recognized the value of landmark knowledge for wayfinding, none studied the value of combining 2D and 3D landmarks in familiarizing travelers with a virtual environment. In addition, the environment of a 3D virtual store is a small geographical area, and we are interested in the efficiency and effectiveness of goods-finding rather than wayfinding. The showrooms in the store can be considered blocks on a map, and the cabinets in the showrooms the buildings in those blocks. Landmarks should therefore also be important for users constructing a "cognitive map" for goods-finding in a 3D virtual store, and several types of landmarks could be applied, such as 3D objects, 2D images and alphanumeric icons. In addition to landmarks, goods-classification should be another cue for goods-finding. Goods-classification divides goods into classes by nature or trademark and allows people to seek a product within the appropriate class. Whether the combined effect of goods-classification and different types of landmarks (i.e., alphanumeric, 2D image and 3D object) is significant for goods-finding by the elderly within the 3D virtual store is a primary question of our study. Therefore, the objective of this study was to examine how elderly users acquire spatial cognition in an on-screen virtual store. Specifically, we examined the impact of different types of landmarks on the acquisition of route and survey knowledge, together with their combined effect with the presence or absence of goods-classification. If goods-classification is associated more with the acquisition of spatial knowledge, we expected to observe its greatest impact during goods-finding. Since landmarks are associated with both route and survey knowledge, we expected to observe the impact of different types of landmarks under both the presence and absence of goods-classification.
2 Method
2.1 Participants
Thirty-two people (average age 69.5 years) were selected to participate in the experiment. They were paid a nominal NTD 500 as compensation for their time. All participants were fully informed and signed a consent form. Researchers have found that repeated exposure to the same virtual environment with a separation of less than seven days can significantly affect levels of cybersickness, inducing disorientation and nausea [15][24][25]. Therefore, the participants had not been exposed to the experimental VE in the previous two weeks.
2.2 Apparatus and the VE
The VE was constructed using virtual environment development software (MAYA and Virtool) and presented on a 19" TFT-LCD display. The scene was designed as a retail store containing four showrooms. Two conditions were designed: in one, the store was divided into four themed showrooms (stationeries, hand tools, cleaning articles and toiletries), as shown in Figure 1; in the other, the store was simply divided into four showrooms without classification. The landmarks in the 3D environment were highly visible and salient, with a unique
shape and volume in contrast to the goods. The landmarks were typical home and office objects such as a painting, a metaphorical picture, a flowerpot and others.
2.3 Experimental Design and Procedures
The study used a 2 (goods-classification: absence or presence) × 8 (type of landmark: none, alphanumeric, 2D, 3D, alphanumeric + 2D, alphanumeric + 3D, 2D + 3D, and alphanumeric + 2D + 3D) between-subjects, full-factorial design with 16 treatment conditions. Each participant was randomly assigned to one of the eight landmark conditions under either the presence or the absence of goods-classification to perform the goods-finding task, so two participants were assigned to each of the sixteen conditions; a sketch of this assignment logic is shown below.
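The following is an illustrative sketch of the 2 × 8 assignment logic, not the authors' procedure; the participant IDs and condition labels are hypothetical:

```python
import itertools
import random

# Illustrative sketch: 16 between-subjects cells (2 classification levels x
# 8 landmark types), two participants per cell, 32 participants in total.
landmark_types = [
    "none", "alphanumeric", "2D", "3D",
    "alphanumeric+2D", "alphanumeric+3D", "2D+3D", "alphanumeric+2D+3D",
]
classification = ["absent", "present"]

conditions = list(itertools.product(classification, landmark_types))  # 16 cells
participants = list(range(1, 33))  # hypothetical participant IDs 1..32
random.shuffle(participants)

assignment = {cond: [participants.pop(), participants.pop()] for cond in conditions}
for cond, pair in sorted(assignment.items()):
    print(cond, "->", pair)
```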
Fig. 1. The 3D virtual environment of the retail store: (a) a top view of the experimental 3D store, with showrooms for hand tools, cleaning articles, stationeries and toiletries around the entrance; (b) a view of the stationery room; (c) the three types of landmarks (alphanumeric, 2D and 3D) presented in the hand tools room
During the exposure period, participants were asked to search for and confirm goods in the store. There were eight target goods, two in each showroom, for participants to find. On finding a target, the participant moved the cursor over the object and double-clicked the left mouse button to register a "hit." The experiment had four stages. In each stage, the participant was randomly assigned two target goods to find and, once both had been found, was asked to recall their locations and complete an oral route-knowledge questionnaire. After all eight target goods had been found, the participant completed a survey-knowledge questionnaire.
2.4 Measurements
Spatial knowledge and performance measurements were recorded for all trials. The measurements included the following:
1. Measurement of route knowledge: after finding the two goods in a stage and returning to the entrance, participants were asked in an oral questionnaire to describe the position of one of the two target objects. They received 9 points for identifying the correct showroom, 6 points for the adjacent showroom, and 3 points for a showroom two away; 7 points for the correct cabinet, 5 points for the adjacent cabinet, and 3 points for a cabinet two away; and 5 points for the correct shelf, 4 points for the adjacent shelf, and 3 points for a shelf two away.
2. Measurement of survey knowledge: after finding all eight goods, participants were asked to mark the position of all eight goods on a map. Scores were calculated as for route knowledge.
3. Goods-finding duration: the time from the beginning of the trial until the Entrance key was pressed, indicating the end of the trial.
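To make the scoring rule concrete, here is a minimal sketch of our reading of it (not the authors' code). The paper does not state how answers more than two positions away were scored, so the table covers only the cases described:

```python
# 'distance' is how far the answer was from the true location:
# 0 = correct, 1 = adjacent ("next door"), 2 = two positions away.
SCORE_TABLE = {
    "showroom": {0: 9, 1: 6, 2: 3},
    "cabinet":  {0: 7, 1: 5, 2: 3},
    "shelf":    {0: 5, 1: 4, 2: 3},
}

def route_knowledge_score(distances):
    """distances: e.g. {'showroom': 0, 'cabinet': 1, 'shelf': 0}."""
    return sum(SCORE_TABLE[level][d] for level, d in distances.items())

# Correct showroom, adjacent cabinet, correct shelf -> 9 + 5 + 5 = 19
print(route_knowledge_score({"showroom": 0, "cabinet": 1, "shelf": 0}))
```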
3 Results and Discussion
This experiment was designed to investigate the impact of the presence or absence of goods-classification on the acquisition of route and survey knowledge, and the effects of different types of landmarks under both the presence and absence of goods-classification. Score analysis for route knowledge revealed that the presence of goods-classification was significantly better than its absence (F(1,16) = 25.20, P < 0.001). However, the F value for survey knowledge was not statistically significant (F(1,16) = 2.99, P = 0.103). Goods-finding time in the presence of goods-classification was also significantly shorter than in its absence (F(1,16) = 30.98, P < 0.001). In addition, route-knowledge scores with landmarks were significantly larger in the presence of goods-classification than in its absence (t(26) = 4.69, P < 0.001), and goods-finding time was significantly shorter (t(26) = 4.85, P < 0.001). These results show that goods-classification was positive and important for the acquisition of route knowledge and for goods-finding. The more route knowledge a participant had, the shorter the goods-finding time. As route knowledge was constructed, survey knowledge was also built up, but more slowly. Goods-classification and landmarks were of great help to spatial knowledge at the beginning, especially route knowledge. After the learning phase, survey knowledge formed clearly even when goods-classification and landmarks were absent; accordingly, goods-finding time in the fourth stage was shorter than in the first stage. Because survey knowledge was measured when the experiment finished, its mean scores did not differ significantly across conditions. Regarding the effects of landmarks, the different types of landmarks did not differ significantly in the acquisition of route knowledge (F(7,16) = 1.74, P = 0.169) or survey knowledge (F(7,16) = 0.61, P = 0.741). However, comparing the presence and absence of landmarks, route-knowledge scores were significantly larger with landmarks than without, both when goods-classification was present (t(26) = 4.78, P < 0.001) and when it was absent (t(26) = 2.59,
P = 0.008), and goods-finding time was also significantly shorter for participants who navigated with landmarks than for those without, in both the presence (t(26) = 22.14, P < 0.001) and absence (t(26) = 13.65, P < 0.001) of goods-classification. Survey-knowledge scores differed significantly between the presence and absence of landmarks when goods-classification was present (t(26) = 2.27, P = 0.016), but not when it was absent (t(26) = 1.19, P = 0.123). Landmarks were thus also important for goods-finding in the 3D virtual store, whatever combination of types was used. This finding agrees with many previous studies on the impact of landmarks on spatial cognition, and also reflects the added advantage of having landmarks in the 3D virtual store environment, particularly when goods are classified. According to these results, goods-classification is associated more with the acquisition of spatial knowledge. Finally, an additional analysis was performed to evaluate which type of landmark had the greatest impact on goods-finding when goods-classification was present. Figure 2 displays the mean scores of route and survey knowledge and the duration times for the eight types of landmarks under the presence and absence of goods-classification. Route-knowledge scores were highest and goods-finding time shortest when goods-classification was combined with landmarks of the type alphanumeric + 2D picture + 3D object; the same held in the absence of goods-classification. Taken together, the findings show that goods-classification was very important for goods-finding when participants navigated the 3D virtual store, and landmarks can be regarded as redundant codes. The more information landmarks provide for goods-finding in the 3D virtual store, the better the participant's acquisition of route knowledge. Additionally, if goods-finding durations are short, survey knowledge is not easily built up and has little impact on goods-finding.
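As a hedged illustration of this style of analysis, a 2 × 8 between-subjects ANOVA could be run as follows; the DataFrame df and its column names are hypothetical stand-ins for the study's data, not the authors' code:

```python
# Sketch of a two-way between-subjects ANOVA, assuming a tidy DataFrame 'df'
# with one row per participant and columns: classification (present/absent),
# landmark (one of the eight types), route_score, survey_score, duration.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def two_way_anova(df: pd.DataFrame, dv: str):
    model = ols(f"{dv} ~ C(classification) * C(landmark)", data=df).fit()
    return sm.stats.anova_lm(model, typ=2)

# e.g. print(two_way_anova(df, "route_score"))
```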
Fig. 2. Mean scores of route and survey knowledge and goods-finding duration (min.) for the eight types of landmarks, in the presence (left panel) and absence (right panel) of goods-classification
4 Conclusion
1. Analysis of the goods-finding task with goods-classification in the 3D virtual store indicates that classification had a significant impact on the acquisition of spatial knowledge and on goods-finding. When goods are classified into different showrooms according to their nature, the showrooms can be regarded as a regular graphic representation of the entire geographic area, including the layout of elements (i.e., showcases) and the spatial relationships among them. Therefore, a clear cognitive map of the environment, including position and direction information, is easily constructed.
2. Landmarks are also important for goods-finding in the 3D virtual store, whatever combination of types is used. This finding also reflects the added advantage of having landmarks in the 3D virtual store, particularly when goods are classified. Moreover, if goods-classification is absent in such an environment, one has additional information from landmarks to process; such information enables the determination of one's current position and provides a basis for goods-finding.
3. For correct goods-finding responses, we found that whether or not goods are classified, the landmark type alphanumeric + 2D picture + 3D object has a good effect on acquired spatial cognition.
4. Landmarks are not the only resource for building spatial knowledge in a 3D virtual store; goods-classification is another, and may be more important than landmarks.
Acknowledgement The authors would like to thank the National Science Council of the Republic of China for financially supporting this work under Contract No. 97-2221-E-238-013.
References 1. Appleyard, D.A.: Planning a Pluralistic City: Conflicting Realities in Ciudad Guayana. MIT Press, Cambridge (1976) 2. Chen, C.: Information Visualization and Virtual Environments. Springer, London (1999) 3. Conroy, R.: Spatial Navigation in Immersive Virtual Environments. Thesis, University College London (2001) 4. Coyne, K., Nielsen, J.: Web Usability for Senior Citizens: 46 Design Guidelines Based on Usability Studies with People Age 65 and Older. Nielsen Norman Group Report (2002) 5. Darken, R., Peterson, B.: Spatial Orientation, Wayfinding, and Representation. In: Stanney, K. (ed.) Handbook of Virtual Environments: Design, Implementation, and Applications. Lawrence Erlbaum Associates, Mahwah (2002) 6. Davis, S.G.: Culture Works: The Political Economy of Culture. University of Minnesota Press, Minneapolis (2001) 7. van Dijk, B., op den Akker, R., Nijholt, A., Zwiers, J.: Navigation Assistance in Virtual Worlds. Informing Science Journal 6, 115–125 (2003)
8. Ding, J., Yu, L., Wang, Y., Pan, Z.: EasyHouse-I: A Virtual House Presentation System Based on Internet. In: 11th International Conference on Human-Computer Interaction, Las Vegas (2005) 9. Elvins, T.T., Nadeau, D.R., Schul, R., Kirsh, D.: Worldlets: 3-D Thumbnails for Wayfinding in Large Virtual Worlds. Presence 10, 565–582 (2001) 10. Goldin, S.E., Thorndyke, P.W.: Simulating Navigation for Spatial Knowledge Acquisition. Human Factors 24, 457–471 (1982) 11. Jansen-Osmann, P.: Using Desktop Virtual Environments to Investigate the Role of Landmarks. Computers in Human Behavior 18, 427–436 (2002) 12. Kaplan, S.: Cognitive Maps in Perception and Thought. In: Downs, R.M., Stea, D. (eds.) Image and Environment. Aldine, Chicago (1973) 13. Kitchin, R.M.: Cognitive Maps: What are They and Why Study Them? Journal of Environmental Psychology 14, 1–19 (1994) 14. Kitchin, R., Blades, M.: The Cognition of Geographic Space. I.B. Tauris Publishers (2002) 15. Lathan, R.: Tutorial: a Brief Introduction to Simulation Sickness and Motion Programming. Real Time Graphics 9, 3–5 (2001) 16. Lin, D.-Y.M.: Evaluating Older Adults' Retention in Hypertext Perusal: Impacts of Presentation Media as a Function of Text Topology. Computers in Human Behavior 20, 491–503 (2003) 17. Lynch, K.: The Image of the City. MIT Press, Cambridge (1960) 18. Montello, D.: A New Framework for Understanding the Acquisition of Spatial Knowledge in Large-Scale Environments. In: Egenhofer, M., Golledge, R. (eds.) Spatial and Temporal Reasoning in Geographic Information Systems. Spatial Information Systems, pp. 143–154. Oxford University Press, New York (1998) 19. Omer, I., Goldblatt, R.: The Implications of Inter-Visibility Between Landmarks on Wayfinding Performance: an Investigation Using a Virtual Urban Environment. Computers, Environment and Urban Systems 31, 520–534 (2007) 20. Park, D.C.: The Basic Mechanisms Accounting for Age-related Decline in Cognitive Function. In: Park, D.C., Schwarz, N. (eds.) Cognitive Aging: A Primer, pp. 3–21. Psychology Press, Philadelphia (2000) 21. Parush, A., Berman, D.: Navigation and Orientation in 3D User Interfaces: the Impact of Navigation Aids and Landmarks. International Journal of Human-Computer Studies 61, 375–395 (2004) 22. Passini, R.: Wayfinding in Architecture, 2nd edn. Van Nostrand Reinhold, New York (1992) 23. Peponis, J., Zimring, C., Choi, Y.K.: Finding the Building in Wayfinding. Environment and Behavior 22, 555–590 (1990) 24. Regan, E.C., Price, K.R.: The Frequency and Severity of Side-Effects of Immersion Virtual Reality. Aviat. Space Environ. Med. 65, 527–530 (1994) 25. Stanney, K.M.: Handbook of Virtual Environments. Erlbaum, New York (2002) 26. Vinson, N.: Design Guidelines for Landmarks to Support Navigation in Virtual Environments. In: ACM Conference on Human Factors in Computing Systems, pp. 278–285, Pittsburgh, Pennsylvania (1999) 27. Vrechopoulos, A.P., O'Keefe, R.M., Doukidis, G.I., Siomkos, G.J.: Virtual Store Layout: an Experimental Comparison in the Context of Grocery Retail. Journal of Retailing 80, 13–22 (2004) 28. Waller, D., Hunt, E., Knapp, D.: The Transfer of Spatial Knowledge in VE Training. Presence: Teleoperators and Virtual Environments 7, 129–143 (1998)
29. Wickens, C.D.: Frames of Reference for Navigation. In: Gopher, D., Koriat, A. (eds.) Attention and Performance 16, pp. 130–144. Academic Press, Orlando (1999) 30. Wickens, C.D., Hollands, J.G.: Engineering Psychology and Human Performance, 3rd edn. Prentice-Hall, Englewood Cliffs (1999) 31. Witmer, B.G., Bailey, J.H., Knerr, B.W., Parsons, K.C.: Virtual Spaces and Real World Places: Transfer of Route Knowledge. Int. J. Human-Computer Studies 45, 413–428 (1996) 32. World Health Organization, Press Release WHO/69, http://www.who.ch/press/1997/pr97-69.htm
Effects of Gender Difference on Emergency Operation Interface Design in Semiconductor Industry
Hunszu Liu
Department of Industrial Engineering and Management, Minghsin University of Science and Technology, No. 1, Xinxing Rd., Xinfeng, Hsinchu 30401, Taiwan, R.O.C.
hliu@must.edu.tw
Abstract. This research investigates the effects of gender difference on emergency operation interface design by studying monitoring operations performed at an emergency response center. An experiment was designed to test the performance differences between fifteen male and fifteen female college engineering students. Signal detection time, incident processing time, number of errors, and duration of the experiment were the dependent variables measuring the participants' performance. Statistical analysis indicates that no significant differences can be found between males' and females' performance except in the number of errors: female participants made more errors than male participants. A training program is suggested to help female workers become familiar with the emergency operations. The research results provide evidence for adjusting the current disaster-prevention personnel recruitment policy and suggestions for further improvement of emergency operation interface design in the semiconductor industry. Keywords: User interface design, emergency management, human performance, gender differences.
of various highly toxic chemicals in tightly enclosed buildings or areas requires continuous surveillance of the factory activities and facilities to ensure safe operations. The facility management and control system (FMCS), widely installed in the emergency response centers (ERCs) of Taiwan's semiconductor plants, is designed to serve this purpose. The FMCS monitors the status of the power supply, gas supply, water processing, clean-room air control, accident control, chemical control, process cooling water, air-conditioning, gas detection, very-early smoke detection alarm and other subsystems through the integration of communication and information technologies. Each subsystem, automatic or semi-automatic, collects system operation data through sensors, video cameras, alarms, and communication devices located around the shop floors. The typical task performed by operators in the ERC is to monitor the information display devices, such as computer screens, TV screens, telephones, intercoms, and broadcasting devices, and to perform information-processing tasks. Whenever an abnormal signal appears, operators are required to detect it and initiate appropriate actions in time, based on the standard operating procedures. It is generally believed by local industry that male workers are more capable of performing these tasks than female workers. This current approach may put emergency response operations at risk of poor human performance and may not use company resources effectively. This research investigates the effects of gender difference on emergency operation interface design by studying monitoring operations performed at an emergency response center.
2 Background Information
The concept of establishing an ERC emerged after several severe accidents, which have cost the Taiwan semiconductor industry more than 20 billion in losses since 1996. The inherently risky manufacturing characteristics of the semiconductor industry, such as the use of various highly toxic chemicals in a continuous production process and long working hours (usually 12 hours per shift), require continuous surveillance of factory activities to ensure safe production. In Taiwan, the responsibility for monitoring semiconductor plant activities often falls upon working teams in the Emergency Response Center (ERC), through the integration of data from automatic sensor devices, video cameras and communication channels. The duty supervisor examines all these data and gives appropriate instructions to the relevant personnel to take corrective actions. Computer programs assist the data-integration process, and some decision-support systems are installed to reduce the mental workload of the monitors. As a result, various types of data are presented on several computer screens, which require experienced workers to interpret and transform them into useful information. Interface design is thus one of the crucial factors in ERC supervisors' performance. Studies of gender differences in safety present mixed results. Crowe (1995) argued that although earlier results show females to be more safety-conscious than males, gender is not a good predictor of safe practices [1]. Mardis and Pratt (2003), Mayhew and Quinlan (2002), and Breslin et al. (2007) found no
gender differences in injury rate [2][3][4]. On the other hand, Islam and Mannering (2006) concluded that there are statistically significant differences between male and female drivers [5]. Monarrez-Espino (2006) presented a similar result: men's fatality rate was five times higher than that of women for single-car crashes [6]. These studies show that the common impressions of gender differences in safety held by industry may not be correct; further study of this issue is needed.
3 Experiment
Historical FMCS data from a local semiconductor company show that the gas detection system records the most frequent events. Therefore, the gas detector alarm system was chosen to test gender differences in interface design. The structure of the gas detection system is depicted in Fig. 1. The major components of this system are the gas detector, a programmable logic controller, a supervisory control and data acquisition (SCADA) server, and the FMCS display screen. Gas detectors are installed around the shop floor. When a gas detector is activated, the alarm sounds and an alarm signal appears on the FMCS display screen (Fig. 2). The ERC operator is required to click the icon on the screen to identify the location of the detector and inform the relevant site operator to take necessary precautions through a communication system, such as a phone line, radio system or broadcasting devices (Fig. 3). The FMCS is an online monitoring system that must be fully functional twenty-four hours a day, so it is not practical to use it directly as the experimental facility. Instead, a computer program that simulates the gas detector alarm informing procedure was used as the testing tool to investigate the performance of participants (Fig. 4). If an incident message appears on screen, the participant must perform the informing procedure immediately (Fig. 5). Fifteen male and fifteen female students with the potential to work as supervisors voluntarily participated in this experiment. All have engineering backgrounds and are interested in safety and health careers. Participants were randomly mixed and trained until familiar with the gas alarm informing procedure. Once confident enough to take the test, each participant ran the testing program at three different incident rates. Performance measures included signal detection time, incident processing time, number of errors, and duration of the experiment.
Fig. 1. The structure of gas detector alarm system
The environment was arranged according to the layout of the ERC (Fig. 6). The FMCS is installed on Displays No. 1 and No. 2, which are the sources of incident information. Each participant was required to monitor these two screens during the experiment. Cellular phones were used as communication devices to inform the relevant personnel for further instructions.
Fig. 2. Display screen of the FMCS display screen
Fig. 3. Display screen of the gas detector alarm system
Fig. 4. Simulation of FMCS
Fig. 5. Display screen of informing procedure
Fig. 6. Layout of experiment
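Schematically, the simulated alarm-informing task can be thought of as the following loop (a hypothetical sketch; the function, parameters and timings are invented for illustration and are not taken from the study's program):

```python
# Hypothetical sketch: incidents appear at a chosen rate, and the program
# records signal-detection time and incident-processing time for each event.
import random
import time

def run_session(n_incidents=10, mean_interval_s=30.0):
    log = []
    for i in range(1, n_incidents + 1):
        time.sleep(random.expovariate(1.0 / mean_interval_s))  # wait for incident
        shown = time.monotonic()
        input(f"ALARM {i} shown - press Enter when detected ")
        detected = time.monotonic()
        input("Perform the informing procedure, then press Enter ")
        done = time.monotonic()
        log.append({"detect_s": detected - shown, "process_s": done - detected})
    return log

# Errors (e.g. wrong location or gas type reported) would be scored separately
# from the daily activity log.
```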
Four hypothesis tests were conducted to explore gender differences in interface design.
Hypothesis 1: Female students take longer to detect the signal than male students.
Hypothesis 2: Female students take longer to process the incident.
Hypothesis 3: Female students make more errors than male students.
Hypothesis 4: Female students take more time to finish the experiment than male students.
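As an illustration, Hypothesis 3 could be tested with a one-sided two-sample t-test along the following lines (the error counts are dummy data, not the study's measurements; SciPy 1.6 or later is assumed for the alternative argument):

```python
from scipy import stats

# Illustrative dummy data: number of errors per participant, 15 per group.
errors_female = [3, 2, 4, 3, 2, 3, 4, 2, 3, 5, 2, 3, 4, 3, 2]
errors_male   = [2, 1, 3, 2, 0, 1, 2, 1, 3, 2, 1, 0, 2, 1, 2]

# One-sided test of Hypothesis 3: females make more errors than males.
t, p = stats.ttest_ind(errors_female, errors_male, alternative="greater")
print(f"t = {t:.2f}, one-sided p = {p:.4f}")
```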
4 Results and Discussions
The interface design of emergency operations may greatly affect the effectiveness and efficiency of ERC operations and system safety. This research investigates the effects of gender difference on emergency operation interface design by studying monitoring operations performed at an emergency response center. In this study, the gas detection subsystem is classified as the most active subsystem and the one requiring the greatest attention from workers. A simulation program and working environment were developed in the lab to mock up the scenario of the gas detection subsystem. An experiment was designed to test the performance differences between fifteen male and fifteen female engineering college students. The participants were required to identify the time, node, tag-name, description, value and status shown at the lower part of the screen when the alarm signal appeared. Based on this information, participants had to locate the corresponding area by maneuvering the FMCS and
obtain the identification of the machine and gas type, and then record the incident in the daily activity log. Signal detection time, incident processing time, number of errors, and duration of the experiment were the dependent variables measuring the participants' performance. Statistical analysis indicates that no significant differences can be found between males' and females' performance except in the number of errors: female participants made more errors than male participants.
5 Conclusions
In the past, female workers could easily apply for safety- and health-related jobs; indeed, many safety and health personnel in Taiwanese manufacturing are female. The establishment of ERCs has limited the job opportunities of female safety and health practitioners. This study shows that females made more errors than males. Further discussions with female participants indicate that lack of familiarity with the system is the main reason for this problem. Feasible solutions include interface design improvements and more training for female workers. Capable ERC supervisors are hard to find, and most new recruits need to be trained not only in the unique FMCS operations but also in other safety and health operations. With modest improvements to the interface design, female safety and health personnel can perform ERC tasks as well as males do. The research results provide evidence for assessing the appropriateness of the current disaster-prevention personnel recruitment policy and suggestions for further improvement of emergency operation interface design in the semiconductor industry.
Acknowledgements This research is funded by the National Science Council in Taiwan (NSC 97-2221-E-159-011).
References 1. Crowe, J.W.: Safety Values and Safe Practices Among College Students. Journal of Safety Research 26(3), 187–195 (1995) 2. Mardis, A.L., Pratt, S.G.: Nonfatal injuries to young workers in the retail trades and services industries in 1998. Journal of Occupational and Environmental Medicine 43, 316–323 (2003) 3. Mayhew, C., Quinlan, M.: Fordism in the fast food industry: Pervasive management control and occupational health and safety risks for young temporary workers. Sociology of Health and Illness 24, 261–284 (2002) 4. Breslin, F.C., Polzer, J., MacEachen, E., Morrongiello, B., Shannon, H.: Workplace injury or part of the job?: Towards a gendered understanding of injuries and complaints among young workers. Social Science & Medicine 64, 782–793 (2007) 5. Islam, S., Mannering, F.: Driver aging and its effect on male and female single-vehicle accident injuries: Some additional evidence. Journal of Safety Research 37, 267–276 (2006) 6. Monarrez-Espino, J., Hasselberg, M., Laflamme, L.: First year as a licensed car driver: Gender differences in car crash experience. Safety Science 44, 75–85 (2006)
Evaluating a Personal Communication Tool: Sidebar Malena Mesarina, Jhilmil Jain, Craig Sayers, Tyler Close, and John Recker HP Labs, Palo Alto firstName.lastName@hp.com
Abstract. By more closely integrating email with the web, we aim to bring organization to email and more collaboration to the web. To this end we developed Sidebar, a web-browser plug-in that displays email messages linking to the currently displayed URL. We conducted longitudinal studies of two versions of Sidebar to observe its usage and determine whether it improves communication productivity. We found that providing an email summary in Sidebar raised awareness of email collaborations, increased serendipitous discovery of information, and led to higher reported communication productivity. This paper summarizes Sidebar's operation, describes the user studies, and presents conclusions. Keywords: Personal communication, browser plug-in, longitudinal user study, interviews, diary study, surveys, usability evaluation, conversation visualization, information visualization, email visualization, conversational thumbnail, email, related links, related web-pages.
email editor to display a pre-populated message whose To and CC fields are filled with all those people who have previously been involved in email discussions about the relevant web page and where the body of the email includes the URL and title for the web page.
Fig. 1. Sidebar at work for the W3C Web Security Context Working Group. The plugin on the left-hand side summarizes email that links to the currently displayed document. It is sorted by date and URL fragment identifier.
While the basic operation is simple, Sidebar leverages both the existing email infrastructure for content storage/delivery and the existing web infrastructure for navigation. As a result it serves several roles: 1. It allows serendipitous email discovery. Users may just browse the web as usual and Sidebar will discover emails that refer to the visited web page. One common means of navigating the web is by clicking on a link inside an email message and in this case the Sidebar conveniently shows not only that email message, but also any other messages the user may have received about the same page.
2. It provides organization for email discourse. When starting a new conversation, just include a link to a related web page. For example, if discussing a trip you might link to an appropriate map. Now, whenever the user revisits that page he or she will see all the related discussions even though those emails may have had different recipients, subjects and creation dates. Think of the web as providing a navigational structure on which you can hang conversations. Note that end-users also have the option to generate pages themselves (using a wiki for example) just to serve as handles for later conversations and that in corporate environments domain experts can mine existing databases to automatically generate a set of web pages which can then serve a navigational role. 3. It collates private and public information by showing your personal email alongside relevant web pages. For example, when visiting a wiki page you can choose to publish content in a relatively public way by editing the wiki page or in a relatively private way by authoring an email to a more limited set of individuals. Sidebar shows both communication paths side-by-side and allows email to serve as a replacement for access-controlled collaborative systems. This is particularly valuable in corporate environments where security and document retention policies for email are already well-defined and understood. 1.1 Visualization Thumbnails While showing summary text of every relevant message is adequate for short conversations it does not scale well. This is particularly true for our application since the email conversations are peripheral to the browsing task and so are given only limited display area. Accordingly, we explored the design of small visual images to represent email conversations. An example of these conversational thumbnails is shown in Figure 2.
Fig. 2. A visual thumbnail is shown in the sidebar at left. A closeup of the visualization is shown on the right. Along the bottom are conversational participants, the arcs show message paths, and the small planet-like objects are links to related web pages.
The thumbnail depicts conversational participants, the flow of messages from sender to receiver, and other URL links found in the email conversations that refer to the current URL. Size and relative positions were used to visually relate the importance of these objects to the user.
People are represented by icons whose proximity to the current user (the blue person icon) reveals how important that person is to the user in conversations about the current web page. More recent messages that are either directly to (not CC'd or BCC'd) or from the current user are considered of higher importance. The color assigned to each person is a function of their email address. The flow of messages between participants is depicted by the curved links between the dots on the white circle. The white circle is a symbolic representation of the current conversation, where the user is represented by the blue dot at the center and the dots at the border are the other people involved in the conversations about the web page. Each curved link represents a message exchange whose color depends on the sender. Note that some links do not pass through the center (as in the figure above); these indicate messages on which the current user was CC'd. Related web sites are depicted by the grey circles, which we call "planets," orbiting the current conversation. The size and vertical location of the planets depend on their relevance to the current user: was he or she the sender or recipient, how frequently was the link mentioned in email, and how long ago did the communication happen. Further details of the visualization are available in [9].
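As a hedged guess at how such cues might be combined, the sketch below computes a planet's size from sender role, mention frequency and recency; the weights and decay constant are invented for illustration, and the paper's actual weighted function is multi-dimensional and not specified here:

```python
import math

def planet_size(sent_by_me: bool, mentions: int, age_days: float,
                base=10.0, cap=40.0):
    # Links the user sends carry more weight than those received (per Sec. 3.10).
    role_w = 2.0 if sent_by_me else 1.0
    # Roughly monthly exponential decay of recency (invented constant).
    recency = math.exp(-age_days / 30.0)
    raw = base * role_w * (1.0 + math.log1p(mentions)) * (0.5 + recency)
    return min(cap, raw)

print(planet_size(True, mentions=5, age_days=3))    # recent, frequently sent
print(planet_size(False, mentions=1, age_days=90))  # old, rarely mentioned
```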
2 The Study
We conducted two studies, one for each version of Sidebar. The first study, on the "text" summary version, was designed to determine whether Sidebar improved collaborative communication in job-related tasks. The second study, on the "visual" version, was conducted to determine whether the visualization thumbnails added to Sidebar's usability.
2.1 First Study
We conducted the first study in early 2008. The participants were selected from the marketing communications staff ("communications") and administrative assistants ("admins") at HP Labs. Sending URLs of web sites as references for work-related content was a typical activity of both groups. Five of the participants (all female) were admins and the rest (4 female, 1 male) were from communications, with ages in the 20-60 range. We asked the participants to install and use Sidebar for three weeks. We used the diary-study methodology to capture participant experiences during the entire three-week period. Additionally, we conducted weekly semi-structured interviews and surveys with all participants to study how usage behavior and perceptions changed over time. We also encouraged participants to report any particularly good or bad experience via email (the link was provided on the Sidebar interface) as soon as possible after the relevant interaction.
2.2 Second Study
For the second study we selected five new participants with different job functions: research manager, scientist, writer, admin and research intern. Two were female, three male, and they ranged in age from 30 to 45. After installing the version of Sidebar
that included the visualization thumbnail, we conducted a two-week study. We did not explain the visualization, but instead asked them to explore it on their own. We interviewed them after 20 minutes and then again after one week and asked them to explain the various facets of the visualization.
3 Findings
3.1 Purpose of Using URLs during Communication
One of the goals of the study was to uncover the primary reasons why people send URLs to themselves and to other people. We identified five reasons: a) transferring data from one machine to another (laptop to desktop and vice versa) or one physical location to another (work to home); b) sharing events (an upcoming conference) or knowledge (articles, papers); c) decision making (which component or device should I buy? recommendations for books); d) archiving, since this is more permanent than bookmarking and is accessible from any machine, any browser, and any location; and e) reminding (taxes and forms to fill out, conference registration, etc.).
3.2 Adding Comments from Sidebar
Though Sidebar provided a link "Start a new email thread", we observed that while this was the favorite feature of one admin, many still preferred a manual cut-and-paste mechanism to send emails. It could be that these users were torn between habit and convenience, or they simply preferred to use the same mechanism as needed to insert other content into a message. When the user clicks the "Start a new email thread" hyperlink, Sidebar automatically adds all the people engaged in the various conversation threads as recipients of the generated email. Somewhat surprisingly, most users did not appreciate this convenience and were wary of accidentally sending email to unintended recipients.
3.3 Wiki+Sidebar = Topic Organization
One of our participants from the communications department is responsible for organizing customer visits to HP. After using Sidebar for a week, he began using the company's wiki tool along with Sidebar to organize each visit. He created a new wiki page for each visit, where he recorded organizational details and the agenda, and then included a link to the relevant agenda page within each message. In this manner, he could use Sidebar to organize his emails relevant to each meeting. Interestingly, in contrast to the concerns of most other participants, he found the pre-population of email addresses of people involved in a thread useful.
3.4 Effects of Combining Email and Browsing
Most of our target users use Microsoft Outlook for their email communications and Internet Explorer/Firefox for browsing. Keeping this in mind, we designed Sidebar such that all email-related tasks are removed from it in order to reduce cognitive load. But we observed that once users became comfortable with using Sidebar,
they wanted to be able to access and search all their email from Sidebar itself. When surveyed, almost 80% of the participants wanted a tool that could integrate email and web page browsing.
3.5 Communication Productivity
Each week we asked users if they felt that Sidebar had increased their communication productivity. Here we observed an interesting phenomenon: participants felt that Sidebar had resulted in increased productivity especially when they did not remember the URL. Note that in this scenario, participants typically had to first search inside Outlook using keywords to locate any one email that contained the (unknown) URL in question. They would then click on the URL link in that email to display the page in the browser and use Sidebar to extract other related emails. Thus it appears that users saw Sidebar's ability to display all the other messages as valuable. Participants' web browsing sessions consisted of viewing not only web pages that they were collaborating on, but also web pages that they were not. Thus, from a communications point of view, only a subset of the web pages viewed in the browsing session were relevant to the participants. Sidebar, however, shows relevant emails for any web page that the participant may have sent email about. This feature revealed something interesting. When viewing known URLs in the browser, the participants tended not to interact with the emails since they were already aware of the content of the conversations about the URL. However, when the URL was unknown, the participants always interacted with the emails displayed by Sidebar since they were specifically using Sidebar to extract related emails. Thus most participants had the perception that Sidebar was more valuable when the URL was not known. During this study we observed a number of occasions when participants desired other web pages related to the web page being viewed, or needed help to locate emails for an unknown URL. We conclude that when displaying communication related to a web page, the email communication channel becomes intertwined with browsing, and an integrated interface that combines the relevant functionalities of both becomes essential.
3.6 Multiple Communication Channels
People generally use multiple communication channels that typically consist of more than one email account, instant messaging, SMS, Twitter, etc. When they share URLs they also tend to use more than one of these channels. While email was a predominant channel at work, we observed that participants used Jabber (an IM client) or SMS to send short snippets of information. Typically, Jabber was used when information was not of much importance, and SMS was used mostly for urgent or time-critical issues. When users were viewing a URL, if they had used a channel other than their work email for sending the URL or communication related to it, this information would not appear on the Sidebar interface (since at the time of development Sidebar only indexed emails from Outlook), thus causing confusion. This was an interesting finding and one of the key problems of integrating web and communication channels.
3.7 Personal Productivity Trends
We observed that some users were quite surprised to discover both how many of their incoming email messages already contained URLs and how many messages containing links they were sending. Included with Sidebar is a web application for extracting reports about the Sidebar index. These reports show how long the user has been talking about a topic, what days of the week the topic is normally discussed and what the hottest topics of conversation are. Sidebar trends track the popularity of topics in the user's social sphere and help determine communication productivity around a topic.
3.8 A Personalized Index into Long Web Documents
When Sidebar is indexing the user's email it recognizes those messages which refer to particular sections within a larger document (using fragment identifiers [7]) and arranges the email messages using section headings extracted from the web document. Users can then navigate the larger document by clicking on sub-headings in the Sidebar, or navigate among messages by clicking on links to sections of the document. One of the co-authors of this paper utilized this feature extensively during his role as editor of a W3C working group document. As working group members sent emails with comments and links to the corresponding sections that had a URL fragment, Sidebar would group and display the emails referring to the appropriate section next to it. This is an example where the integration of email content with web documents makes editorial work more efficient; however, our user groups did not appear to notice this feature, and it is unlikely to be appreciated until the use of fragment identifiers becomes more pervasive.
3.9 Privacy
Initial observations (conducted before the longitudinal study) indicated that some users were concerned about the location of the email index created by Sidebar. We specifically addressed this by making clear to participants that the index was being created on their own local machine. During the weekly interviews we asked users if the email indexes were created as per their privacy expectations, and all 11 participants reported that they were. We were also concerned about inadvertent disclosure caused by having Sidebar open while browsing the web in a group setting. None of the users reported any discomfort, and several mentioned that it was very easy to toggle between opening and closing Sidebar in the browser.
3.10 Visualization Thumbnail
All participants understood that the people displayed in the thumbnail represented those involved in the email conversations about the currently displayed web page. Although the relationship between the position of a person and the importance of his/her emails was not apparent to the participants, all mentioned that seeing the number of people involved in a conversation suggested how important a web page was (in a collaborative context).
Everyone figured out that they were at the center of the circle, but the abstraction of messages as lines was not apparent to either the writer or the admin. Interestingly, in both cases they had tried hovering over and clicking on the message lines and people icons, so if we had supported an appropriate action in each case the meaning might have been more readily apparent. All users noted that the planets represented links related to the current web page, and thought that the size of a planet depended on the number of times the URL was mentioned in an email (the latter is not exactly correct, since we had actually computed the size based on a weighted function so that, for example, links I send carry more weight than those I receive). No users correctly interpreted the planet positions. Even though several noticed that pages closer to the current page were more to the right and that similar web pages appeared close together, they were confused when dissimilar pages also appeared close together at the outer edges.
4 Related Work
Several tools, such as Remembrance Agent [6], Margin Notes [5] and VistaBar [4], have been developed for indexing email, files and other online documents based on web page content. One of the main differences between Sidebar and these tools is the relevancy of the matched documents. Sidebar uses the web page URL to search for messages embedding the same URL, and is thus able to find messages that are, most of the time, about the web page. In contrast, [6], [5], and [4] use keyword matching between the content of a page and other documents. Anchored Conversations [2] is a tool for adding comments to documents; it is a plug-in for Microsoft Word that allows users to attach multiple chat windows to sections for real-time discussions. In the case of Sidebar, email messages with a URL fragment point to the corresponding sections under that fragment on a web page. There are several academic and commercial tools for searching and sorting email; two prominent solutions are Google Desktop Search [3] and Xobni [8]. Although one feature of Sidebar is that email messages are organized based on shared embedded URLs, Sidebar is more an integration tool between email communication and web content than a mail organizer or search tool, and in fact coexists well with those search tools.
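To illustrate the contrast with keyword matching, here is a rough sketch of URL-keyed indexing and lookup (our illustration, not HP's implementation), including the fragment-identifier grouping discussed in Section 3.8:

```python
# Messages are indexed by every URL they embed; visiting a page then looks up
# both the exact section (URL fragment) and the page as a whole.
import re
from collections import defaultdict
from urllib.parse import urldefrag

URL_RE = re.compile(r"https?://[^\s<>\"']+")

index = defaultdict(list)  # (page URL, fragment) -> list of message ids

def index_message(msg_id, body):
    for url in URL_RE.findall(body):
        page, fragment = urldefrag(url)
        index[(page, fragment)].append(msg_id)

def messages_for(url):
    page, fragment = urldefrag(url)
    exact = list(index.get((page, fragment), []))          # same section
    whole_page = [m for (p, _), ids in index.items() if p == page for m in ids]
    return exact, whole_page

index_message("msg1", "see http://example.com/spec#sec2 for details")
index_message("msg2", "comments on http://example.com/spec")
print(messages_for("http://example.com/spec#sec2"))  # (['msg1'], ['msg1', 'msg2'])
```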
5 Discussions and Conclusions
Sidebar provides a summary of user email which is relevant to a currently visited web resource. After conducting two user studies, we conclude that:
• Sidebar provides serendipitous discovery of emails when browsing. In this case, the email summaries served mostly as historical reminders, and the participants tended not to interact with them.
• Sidebar also aids in search. Once users had used existing email search tools to find one message containing a URL, they could select it to see all other relevant messages within the Sidebar.
• Some participants preferred manually creating email messages and copying URLs even though Sidebar could provide this service automatically. We note that users are very comfortable with existing email and web browsing tools, and that some users were nervous when address fields were automatically pre-populated.
• Hanging conversations off web pages is a powerful organizational tool, and we expect it to have the most benefit in organizations where there is already a suitable set of web addresses, or where users change their behavior to accommodate it. A particularly good usage pattern is to create a twiki page for each meeting and then include a link to it in all related emails.
• Participants expressed the desire to have a single unified experience combining web browsing with email and other communication channels.
• Although the document indexing capability of Sidebar using fragment identifiers was very useful for online editing, this does not seem to be a prevalent activity among users.
• In terms of privacy, although at first the participants raised some privacy concerns related to the usage of their email archives, Sidebar’s approach of storing personal email on the local machine matched user privacy expectations.
• Displaying people and related links in the “visual” version of Sidebar were both immediately appreciated by all our test users. Interestingly, while we had computed multi-dimensional weighted functions to determine position and size, our test users tended to assume a much simpler and more democratic relationship: the size of an object is proportional to the number of messages.
• The visualization has room for improvement, especially in the messaging display, which caused the most difficulty for our users. Improved interactivity should help.

In today’s workplace, a large percentage of communication is conducted using email, leaving behind large but mostly unorganized email archives. Today’s web provides a wealth of organizational structures, as well as technology for bringing other organizational structures onto the web. By more closely integrating email with the web, Sidebar brings organization to email and more collaboration to the web.
Acknowledgements Thanks to Venugopal Srinivasmurthy and Badrinath Ramamurthy for their assistance earlier in this research project, to Vlad Bolshakov who helped create the Outlook plugin, and to all those developers who contributed to the open source libraries on which our solution depends. Thanks also to our colleagues at HP for their advice and encouragement, especially our anonymous test users.
References 1. Close, T., Recker, J., Sayers, C., Badrinath, R.: Sidebar: Ad-hoc, yet organized Personal Collaboration. HP Labs Technical Report HPL-2008-17 (2008), http://www.hpl.hp.com/techreports/2008/HPL-2008-17.html 2. Churchill, E.F., Trevor, J., Bly, S.A., Nelson, L., Cubranic, D.: Anchored Conversations: Chatting in the Context of a Document. In: Proc. CHI 2000, pp. 454–461. ACM Press, New York (2000)
3. Google Desktop, http://desktop.google.com/features.htm 4. Marais, H., Bharat, K.: Supporting Cooperative and Personal Surfing with a Desktop Assistant. In: Proc. of UIST 1997, pp. 129–138. ACM Press, New York (1997) 5. Rhodes, B.: Margin notes: Building a contextually aware associative memory. In: Proc. IUI 2000, pp. 219–224. ACM Press, New York (2000) 6. Rhodes, B., Starner, T.: Remembrance agent: a continuously running automated information retrieval system. In: Proc. PAAM 1996, pp. 487–495. The Practical Application Company Ltd., Blackpool (1996) 7. URL Fragment Identifiers, http://www.w3.org/Addressing/URL/4_2_Fragments.html 8. Xobni, http://www.xobni.com/ 9. Sayers, C., Mesarina, M., Jain, J., Recker, J.: Visualizing Email Conversations and Related Web Resources, HP Labs Technical Report HPL-2008-138, Hewlett Packard (October 2008)
“You’ve Got IMs!” How People Manage Concurrent Instant Messages
Shailendra Rao1, Judy Chen2, Robin Jeffries3, and Richard Boardman3
1 Stanford University
2 University of California, Irvine
3 Google
shailo@stanford.edu, judychen@ics.uci.edu, {jeffries,rickb}@google.com
Abstract. Instant Messaging (IM) clients allow users to conduct multiple simultaneous conversations, which we term “concurrent IMs.” In this study we investigate how adults manage concurrent IMs both in the workplace and within the context of a goal-directed, time-bounded recreational task. We discuss differences in behavior between engaging in a single IM conversation and engaging in concurrent IMs. We document the errors that arise as a consequence of concurrent IMs and identify four main strategies users employ to manage them: controlling the pace of conversations, limiting the number of simultaneous conversations, window management, and using tabbed IM windows. Finally, we explore the pros and cons of these strategies and examine design tradeoffs to enable effective space and attention management while minimizing disruption to the user. Keywords: Instant messaging, concurrent IMs, multitasking, informal communication, notifications, tabs.
while using IM frequently occurs in the workplace and observed users participating in multiple simultaneous one-to-one IM conversations, which we term concurrent IMs. Although previous work has studied the management of multitasking – that is, engagement in multiple activities [8] – the phenomenon of engaging in concurrent IM conversations introduces several meta-issues on top of the already complex nature of multitasking. Although studies such as [5] and [6] have recognized the occurrence of concurrent IM conversations, we examine this behavior in depth and explore the issues that arise as a consequence of engaging in multiple IM conversations simultaneously. How does one decide on the degree of attention to give to a particular conversation? What are the challenges people encounter when managing multiple simultaneous conversations? What strategies are useful in dealing with those challenges? What design tradeoffs follow from engaging in concurrent IMs while multitasking?
2 Method
Because users' goals and context influence how they communicate over IM, we chose two real-world settings to observe and learn how people manage and negotiate multiple conversations. The first was an investigation of IM usage in the workplace, where users at a technology company communicate with co-workers as well as outside friends and family. In the second phase we explored IM usage in an online Fantasy Football draft, where participants chat with multiple people during a goal-directed, time-bounded task. In total, the collected observational data consisted of 29 hours (14 hours of IM in the workplace and 15 hours of Fantasy Football drafts). In addition, we performed 25 hours of interviews (20 with the workers and 5 with the Fantasy Football managers).

2.1 Phase I: IM in the Workplace
In the first phase of the study, we investigated the IM usage of 20 employees at a large technology company. All of the participants were experienced IM users (1+ years of usage). The participants in our workplace sample included a receptionist, an administrative assistant, an internal communications specialist, customer support representatives, software engineers, facilities coordinators, and several interns. We conducted a one-hour, semi-structured interview with each participant to understand their typical IM usage. We asked participants about their recent experiences with concurrent IMs and group chats, as well as how they prioritized conversations. After the initial interview, we asked the participants to provide us with an hour-long screen-capture of their IM activity. In a follow-up interview, we reviewed the screen-capture with each participant and asked them to provide us with context, informing us of the tasks they were working on while using IM, with whom they were chatting, and whether or not their conversations were related to the other tasks in which they were engaged. We compensated the participants for the initial interview with reward packages valued at $50. Participants received an additional $25 after they submitted a screen-capture of their IM activity and participated in a follow-up interview. We received screen-captures from 14 participants.
We anticipated that participants would have privacy concerns about participating in a study in which the content of their IM conversations would be visible. This is an especially sensitive issue in the workplace, where participants may feel self-conscious about holding non-work-related conversations, which could cause them to depart from their normal IM behavior. We were also aware that we could potentially observe a participant for hours and not see any IM activity. As an alternative to observing the participants live, we asked them to use screen-capture software to record their activity. We wanted to reassure participants that the content of their IM conversations was not the focus of our study, so the screen-captures were deliberately of low quality, enabling us to see screen activity but not read any specific text on the screen. Participants were also given full control over the timing and content of their submissions; they decided when and what they wanted to capture.

2.2 Phase II: Fantasy Football Draft and IM
The second phase of our study examined IM use in Fantasy Football, in which participants play the role of a manager of a National Football League (NFL) team. Near the start of the season, managers conduct a draft in which they forecast which NFL players will have the best statistics during the season and select players accordingly. After the draft, team managers earn points based on their players’ performance in weekly NFL games [3]. An online Fantasy Football draft user interface typically supports a group chat and a timer. The group chat is seen by all league members in their draft window and is usually used for draft-related discussion and banter. The timer ensures that all managers have no more than a specified amount of time to select a player. In addition to the timed task of drafting a roster and using the group chat, managers can engage in IM conversations, phone calls, face-to-face conversations, and email outside the draft interface. Draft participants can contact or be contacted by people in their league about content in the group chat, such as picks, trades, advice, and jokes, via a private backchannel. They can also be contacted by people outside their league about things that are not related to the draft. We chose to study an online Fantasy Football draft because it represents a setting in which both group chats and concurrent IMs can occur while people are multitasking. The draft is a unique environment because of its fast pace and massive inflow and outflow of communication, making it an interesting arena in which to study concurrent IM usage in the context of a goal-directed, time-bounded recreational task.

We observed seven managers from six different Fantasy Football leagues do their drafts and interviewed them afterwards. We recorded the Fantasy Football managers’ computer screens with screen-capture software. After the draft, we interviewed each participant, following the same protocol we used in the first phase of the study. We reviewed the screen-capture with the participants and asked them to comment on their behavior, communication, and task management strategies. Because we were studying people in a recreational setting, some of the limitations that restricted us in the workplace setting did not apply. Live observation was appropriate because the content of Fantasy Football participants’ conversations was less likely to be sensitive. The Fantasy Football managers were compensated with reward packages worth $50.
3 Results and Discussion
Table 1 summarizes the IM usage data from the participants in both phases of the study. We begin our discussion by examining how our participants across both phases behaved differently when managing an individual IM as opposed to concurrent IMs. Then, we document some of the common errors people made when engaged in concurrent IMs. We then identify four key strategies that people utilized to manage concurrent IMs, which emerged from our interviews and observations. Finally, we discuss key design tradeoffs that follow from our findings about concurrent IMs.
Table 1. A summary of the participants’ IM usage for the Phase I participants (left) and the Phase II participants (right)

Phase I (workplace) participants P1–P20, with gender (P1–P13: F, F, M, F, M, F, F, F, M, M, F, M, F) and IM status, in order: always on; always on; on when available; always on; always on; always on; always on; always on; always on; always on; always on; always on; always on; always on, invisible when unavailable; often on; always on; always on; always on; on when available; always on.

Phase II (Fantasy Football) participants, with gender and reported comfortable number of concurrent IMs:

Participant   Gender   Reported Comfortable # Concurrent IMs
F1            M        5
F2            M        3
F3            M        9
F4            M        4
F5            M        0
F6            M        4
F7            M        3
3.1 Comparing Behavior between Individual and Concurrent IMs
Both the workplace IM and Fantasy Football participants reported engaging in different behavior when participating in concurrent IMs compared to chatting in a single conversation.

Shorter Responses. Six workplace participants and two Fantasy Football participants reported that they gave shorter responses to their conversation partners when they were chatting with multiple people. One of the workplace participants said that he did
not mind giving terser responses, since he felt people generally expect short and abrupt conversations over IM.

Less Attention per Concurrent IM. Five workplace participants reported that they paid less attention to each conversation when there were concurrent IMs. With concurrent IMs, attention is divided across conversations. Such division is potentially unequal, depending on a user’s context, their relationship to their chatting partners, and the content of each dialogue. Splitting attention across concurrent IMs is not easy: four technology workers noted that they have a hard time keeping track of multiple IM conversations. As we expected, several participants recalled memory lapses where they had forgotten what had been said in certain conversations.

Multitasking Stress. Three workplace participants and one Fantasy Football manager explicitly noted that handling concurrent IMs can be stressful. One possible explanation for this stress is that distributing attention across multiple conversations increases cognitive load. IM system notifications, specifically blinking windows, can also make it difficult for a user to focus on a particular IM conversation, let alone deal with other tasks and applications.

3.2 Errors with Concurrent IMs
Participants reported making the following errors with greater frequency when managing concurrent IMs as opposed to a single IM conversation: sending a message to the wrong person, forgetting about chat partners, accidentally closing an IM window, and sending a message in a language the partner did not understand.

Sending a Message to the Wrong Person. The most frequent error participants reported was sending a message to someone other than the intended recipient. Six of the tech workers and one of the Fantasy Football managers recalled making this error. We also observed one Fantasy Football manager commit this error during their draft. Several participants reported being worried about making this mistake whenever they engage in concurrent IMs. As Grinter and Palen pointed out, the consequences of making this mistake can vary drastically in severity [5]. Mistakenly sending one casual chat line to the wrong friend may be inconsequential. However, sending a message to the wrong person, particularly in the workplace, can have serious consequences. One of our participants from the tech company reported that she once accidentally told her boss to “hold on a freaking second” while she was chatting with several people simultaneously. Luckily, her boss was understanding when she later explained her mistake.

Forgetting about Chat Partners. Three work IM users and one Fantasy Football participant recalled instances where they forgot about chat partners when they had concurrent IMs. This is expected when a user is dividing their attention unevenly across IMs. It may arise as a consequence of chat windows being obscured by other windows or of overlooked IM notifications.

Accidentally Closing Windows. We interviewed three workplace IM users who told us that they have accidentally closed IM windows by clicking the “x” on the window when they meant to click the minimize button. One Fantasy Football manager also
made this error during his draft. On many clients, concurrent IMs require multiple windows, which in turn can lead to window management errors. Recovering from this error can be as trivial as reopening the chat window or as difficult as reconstructing the topic and the conversation from scratch. This problem is exacerbated when IM users use clients that do not support conversation logging and history, or when they have not enabled this feature.

Confusing Languages. Two of our work IM participants regularly spoke to people over IM in different languages and scripts. One of these participants reported sending something in the wrong language over IM. Unlike the error of sending a message to the wrong person, this mistake was due not to addressing the wrong conversation window, but to missing a critical pragmatic cue.

3.3 Strategies for Managing Concurrent IMs
Our interviews and observations uncovered several key strategies participants used to manage concurrent IMs and deal with the aforementioned challenges.

Controlling the Number of Conversations. One of the ways people manage concurrent IMs is by reducing them to a number they feel comfortable managing. Grinter and Palen reported that some IM users felt that they had a personal threshold beyond which they were unable to monitor their conversations sufficiently [5]. This threshold depends on the individual and varies according to the nature of the conversations, the chat partners involved, and the deadlines of the other tasks they are managing. Common ways of controlling the number of conversations we observed included adjusting online status and visibility (e.g., available, idle, away) and using different screen names or IM services to divide up different groups of contacts and tasks. One workplace IM participant and one Fantasy Football manager reported that they frequently quit their IM programs when they reach their upper limit of concurrent IMs. Signing off or exiting the IM program altogether essentially reduces the number of IM conversations to zero. Another strategy for controlling the number of conversations was to merge them by creating a group chat. Interestingly, merging IMs into a group chat did not necessarily decrease the number of conversations. Some participants kept their individual IMs active to maintain private backchannels; this was common among the Fantasy Football managers. In this case, IM users are not decreasing the number of conversations, but rather establishing a shared space so that messages do not need to be repeated across individual conversations. With a group chat and private backchannels, chat participants are controlling both the amount and type of content in the concurrent IMs.

Controlling the Pace. IM is inherently asynchronous, since users can decide if and when they will respond to a message. While recent work has been done on predicting whether or not a user is likely to respond to an incoming message within a certain period of time [1], we explored responsiveness with respect to the way it was used to control the pace of IM conversations. Monitoring one’s response speed and attending to conversational cues about one’s conversation partner are some of the ways to achieve this. We often observed participants intentionally ignoring their chat partners while they were participating in a conversation with someone else or engaged in
another activity altogether. Participants often found themselves synchronizing their pace with their chat partners, which reduced the number of overlapping message transmissions and interleaved conversation threads.

IM Window Management. We observed two approaches to IM window management: grouping and closing. Participants typically kept all of their IM conversation windows in a specific area of the screen, leaving the rest of the screen available for other computer-related tasks. With respect to closing, there were three strategies. Aggressive closers closed a conversation window or tab before a conversation was over, typically after each message interchange. Moderate closers tended to close IM windows or tabs after a conversation ended. Non-closers usually left all IM windows and tabs open indefinitely, or until they quit the IM client or turned off their computer. We observed participants employing a combination of different strategies depending on their context and their chat partners.

Using Tabbed IM Windows. Many participants employed tabbed IM window features to manage concurrent IMs. From our observations, it appeared that the participants who used tabbed IM windows were likely to maintain more IM conversations than the participants who did not. The main advantage of tabbed IM windows is that they save screen real estate. Instead of several chat windows populating a user’s computer screen, there is a single window dedicated to IM conversations (see Figure 1). Furthermore, the participant needs to engage in less window management. Tabbed IM windows were reported to be less disruptive, since new conversations pop up as unfocused tabs rather than as new windows. They also do not capture keyboard input focus, reducing the likelihood of a participant unintentionally typing in the wrong IM window and sending a message to an unintended contact. However, tabbed IM windows have weaker visual cues than non-tabbed IMs. When a minimized IM window blinks, non-tab users can tell whom the message is from, since only one chat partner is associated with the window. With tabbed concurrent IMs, however, users cannot tell whom a new incoming IM message is from based only on the blinking notification. This is of particular concern for IM users who prioritize conversations based on person or content, since there is no way to differentiate conversations through window-level notification schemes.
Fig. 1. A tabbed IM window
Other Strategies for Managing Concurrent IMs. Nearly half (9) of the workplace IM participants reported that they prioritize concurrent IMs based on either their chatting partner or the content of the conversation. This supports the intuition that some people pick particular conversations to pay attention to when handling more than a single IM. It can be a conscious decision in which concurrent IM users impose meaning on their chat windows, rather than letting the window cues and placement always dictate their attention and response strategies.

It has been previously documented that a specialized language filled with abbreviations, acronyms, and contractions has evolved with text messaging, including IM and email [2, 4]. Three of our participants reported that they find themselves using abbreviations, shorthand, and acronyms more often when engaging in concurrent conversations than in an individual conversation, as a way of responding to their chat partners more efficiently. Some examples of such abbreviations are “busy ttyl” for “I’m busy, I’ll talk to you later”, “brb” for “be right back”, and one participant’s shared convention of “222” for “In a Meeting”. The text equivalent of “uh huh” and emoticons were employed so that users could quickly let their chatting partners know that they were paying attention.

Two workplace participants and one Fantasy Football manager noted that they would avoid asking in-depth questions of their conversation partners when holding multiple simultaneous conversations. Their justification was that an in-depth question could result in a long, engaging conversation requiring increased attention and greater cognitive effort. Six workplace IM users and two Fantasy Football participants also reported that they often had to repeat themselves during concurrent IM conversations. One strategy to expedite constructing these repetitive messages was to copy and paste text between different IM conversations.
4 Design Tradeoffs
This study uncovered two sets of key tradeoffs with concurrent IMs while multitasking: managing multiple windows versus managing tabs, and notifications versus disruptions.

4.1 Managing Multiple Windows versus Tabs
Tabs are one attempt to deal with the window management issues that arise with concurrent IMs. This approach brings a set of tradeoffs. Compared to separate non-tabbed windows, tabs require a different set of physical actions. Unlike non-tabbed windows, with tabs there is only one window to move and rearrange regardless of how many conversations are being managed. This can potentially make things easier for users handling concurrent IMs. But tabs can also demand more physical action and effort than non-tabbed windows. With tabs, if a window with concurrent IMs is minimized to the task bar, only the focused conversation’s title will be visible. Adding to an ongoing conversation with someone other than the partner in the focused tab means opening the tabbed window and then selecting the appropriate tab. This is an increase in effort compared to selecting a single non-tabbed window.
Working with tabbed IMs can affect how users impose meaning on their conversation windows. Tabs make it easy to focus on one conversation at a time, which helps people who privilege one partner’s chat over other ongoing IM conversations. Users can leave a conversation in the foreground of the tab set if they are awaiting a message that they deem important. Conversely, users can leave a conversation in an unfocused tab if they are trying to hide that content from people passing by (i.e., co-workers). Separate IM windows can help users who want to spread their attention across multiple conversations simultaneously. We observed one technology worker who purposely placed two conversation windows horizontally side by side. They reported that the conversations had equal importance to them, and this arrangement helped them attend to both equally. Tabs make only one conversation visible at a time, so this mechanism for assigning importance would not be possible.

4.2 Managing Alerts versus Disruptions
Our study has begun to uncover the tradeoffs between the alerts that notifications provide and the disruptions they may impose. This tradeoff depends on both a user's situation and their personal alert preferences. Designing an effective notification system is challenging and rests on many subtleties. The different types of notifications each tell us something different about alerts in IM. The visual cues of color, pop-ups, and blinking windows differ in intensity and effectiveness. Color, the weakest of the visual cues, does not attract attention as much as the other two. None of our users turned color off, suggesting that this notification did not by itself place excessive demands on attention in the context of IM. Pop-up notifications are stronger cues than color because the human visual system is sensitive to motion. All of our participants turned off the pop-up notifications for their contacts' status. There could be three reasons for this: 1) the contact status information is not useful, 2) the pop-up action itself is distracting, or 3) the utility of the information does not warrant such a strong notification cue. Contact status updates do not seem to need notifications, since over 80% (22/27) of our users made a conscious effort to keep their contact lists visible at all times. This suggests that notifications relying on motion need to be carefully considered and selected. Window blinking, the strongest visual notification, is not without its tradeoffs. It is attention grabbing and hard to ignore because of its constant motion. In some cases this alerts users appropriately, but in others it becomes a disruption rather than an alert. This suggests that window-blinking IM notifications need to be rethought. Our participants preferred to have sound turned off. Sound may not be as disruptive as other cues to the user, but unlike the visual cues it has the potential to disturb co-located people who are focused on their own tasks.
5 Conclusion
Although IM has been the focus of many studies, concurrent IM conversations have not yet been widely explored. Given the fragmented nature inherent in IM use, understanding the differences between one-to-one IM and concurrent IM use would enable designers to support effective space and attention management while minimizing disruption. Although still in its exploratory stages, this study has uncovered
that concurrent IM use is highly situated, requiring the user to constantly make decisions regarding attention, window, and conversation management. The study has also allowed us to gain a better understanding of the behavioral differences between one-to-one and concurrent IM, highlighting some of the challenges users face when engaged in concurrent conversations.
References 1. Avrahami, D., Hudson, S.E.: Responsiveness in instant messaging: Predictive Models Supporting Inter-Personal Communication. In: SIGCHI Conference on Human Factors in Computing Systems, pp. 731–740. ACM Press, New York (2006) 2. Baron, N.S.: See You Online: Gender Issues in College Student Use of Instant Messaging. J. Language and Social Psychology 23(4), 397–423 (2004) 3. Fantasy Football, http://en.wikipedia.org/wiki/Fantasy_football_American 4. Grinter, R.E., Eldridge, M.: Y do tngrs luv 2 txt msg? In: Seventh European Conference on Computer Supported Cooperative Work, pp. 219–238. Kluwer Academic Publishers, Netherlands (2001) 5. Grinter, R.E., Palen, L.: Instant Messaging in Teen Life. In: ACM Conference on Computer Supported Cooperative Work, pp. 21–30. ACM Press, New York (2002) 6. Isaacs, E., Walendowski, A., Whittaker, S., Schiano, D.J., Kamm, C.: The Character, Functions, and Styles of Instant Messaging in the Workplace. In: ACM Conference on Computer Supported Cooperative Work, pp. 11–20. ACM Press, New York (2002) 7. Kraut, R.E., Fish, R.S., Root, R.W., Chalfonte, B.L.: Informal Communication in Organizations: Form, Function, and Technology. In: People’s Reactions to Technology (Claremont Symposium on Applied Social Psychology), pp. 145–199. Sage, Thousand Oaks (1990) 8. Mark, G., Gonzalez, V.M., Harris, J.: No Task Left Behind? Examining the Nature of Fragmented Work. In: SIGCHI Conference on Human Factors in Computing Systems, pp. 321– 330. ACM Press, New York (2005) 9. Nardi, B.A., Whittaker, S., Bradner, E.: Interaction and Outeraction: Instant Messaging in Action. In: ACM Conference on Computer Supported Cooperative Work, pp. 79–88. ACM Press, New York (2000)
Investigating Children Preferences of a User Interface Design Jamaliah Taslim, Wan Adilah Wan Adnan, and Noor Azyanti Abu Bakar Faculty of Information Technology & Quantitative Sciences, Universiti Teknologi MARA Malaysia jamaliah@tmsk.uitm.edu.my
Abstract. Though there have been many studies of user interface design preferences, only a few have considered children’s preferences. This paper presents an investigation into children’s preferences regarding user interface design. The objective of studying this area is to investigate the differences in children’s preferences for the elements of a user interface design. An experiment was conducted regarding five elements of user interface design: font type, font size, font color, background color and interface type. Findings show that there are significant differences in children’s preferences for interface type, font type and background color. Further analysis was conducted, and the results indicate that there is a significant difference between gender groups for background color, interface type and font color. This study provides empirical evidence on the importance of considering children in interface design. Keywords: Children, User interface design, Preference, Color, Interface type.
1 Introduction
Currently, almost all applications designed for children are developed by adults who do not consider children’s skills and preferences. As a result, the applications may not be easily learned and used by children [1]. Besides that, the majority of the tools available are for expert users and are not suitable for novice users, particularly for children who have very limited knowledge of computers. In addition, the importance of individual differences, such as gender, has been emphasized in the human-computer interaction literature regarding user interface design. However, there is still a lack of empirical studies that examine gender differences among children in their preferences for user interface design. Further research is required to strengthen the empirical evidence on gender differences among children in user interface design issues [2]. The purpose of this paper is to investigate the differences in children’s preferences for the elements of a user interface design. An experimental study was conducted for this purpose.
for children do not consider their skills and preferences [1]. Interface developers should not design with the expectation that the child is able to understand interaction with an extremely complex machine. The principle of user-centered design practices is that there is no design that fits all; design should be driven by knowledge of the target users. There is a growing amount of attention given to children as a special user group [3]. Many authors have discussed interface design, and it is common to place an emphasis on the user in the discussion. Shneiderman [4] argued that any design should be based upon an understanding of its intended users. Among the important user characteristics that should be considered are age group and gender. Shneiderman also noted that it is very common in practice to find that children are not considered in user interface design guidelines. In fact, the involvement of children themselves in the design process is very unusual. Therefore, interface designers and developers should take responsibility for seeking good-quality product designs which will contribute positively towards children’s development and health [5].

Children’s interactions with technologies differ depending on their age level, reflecting their changing interests, characters, humors and contexts. According to Acuff and Reiher [6], between the ages of 8 and 12 children are in the rule/role stage. In this age group, interests gradually shift from fantasy to reality. Children become more interested in competition and prefer to play in pairs and groups. A sense of logic and reasoning and simple abstractions start developing. This is a stage of shifting from the main influence of parents and schools to a bigger influence from peers. From the age of 8 to 12, children start to understand more abstract terms and longer and more complex sentences. They develop the ability to analyse critically what they read. Children at the ages of 9 and 10 are still not very good at planning their stories and start telling the story straight away. Handheld computing devices and laptops are examples of current products targeting this age group. The design of these devices is more adult-like, for example using less bright colors than those designed for younger children. These new products often provide more complex interfaces, such as having several functions represented in one button and a variety of menu structures available to explore. The functionality gives children more freedom in performing their tasks.

Children, like adults, often use technology to perform their tasks. Markopoulos and Bekker [7] argue that interface design research needs to be extended to specifically address the needs of children. They have pointed out two major issues in the context of designing for children: age-specific interaction styles (e.g., how to structure menus, font, interface type, color, etc.) and the involvement of children in the design process. According to them, research on the former is very sparse. One of the studies related to children is by Inkpen [8], which reports that children aged 9 to 13 preferred point-and-click over drag-and-drop. In addition, Read et al. [9] discussed the different text input techniques suitable for children. This research is rather limited compared to the corresponding research for adults, and research on user interface elements such as font type, font size, color and interface type is still lacking.

Standard user-centered design approaches need to be adapted when considering the specific needs of children. Current design guideline compilations still focus mostly
on adult users. Gilutz and Nielsen [10] took the initiative to compile guidelines for web sites for children.
3 Method
An experimental study was conducted with 40 primary schoolchildren of Sekolah Kebangsaan Seksyen 6, Shah Alam, Malaysia. They were randomly divided into groups of five children. Their ages ranged from 10 to 12 years. A briefing on the purpose of the experiment and the instructions was given to each group before they started the experiment. Each participant was given a maximum of 15 minutes to complete the task. Five user interface elements were tested in this study, namely font type, font size, font color, background color and interface type. For font type, 4 conditions were tested: Arial, Comic Sans MS, Courier New and Times New Roman. For font size, 2 conditions were tested: 12 and 14. For background color, 5 colors were chosen for the experiment, namely green, blue, purple, red, and yellow. The interface types were categorized as simple and complex. The participants were asked to select the most preferred choice for each of these interface elements.
4 Results
Results from the Mann-Whitney test for analyzing gender differences have shown that there were significant differences between boys and girls in their preferences for background color (p = 0.001) and interface complexity (p = 0.036). In addition, there was marginal significance in their preferences for font type but no significant difference for font size. From the cross-tabulation analysis, it was found that the majority of girls preferred purple whereas the boys preferred blue for the background color. For the interface type, all girls chose the simple interface type, whereas 20% of the boys preferred the complex interface type. For font type, the majority of the boys chose Arial as the most preferred and Comic Sans MS as the least preferred. In contrast, the majority of the girls chose Comic Sans MS as the most preferred and Times New Roman as the least preferred. Further analysis was conducted to examine age differences among the children using the Kruskal-Wallis test. The results show that there were marginal age differences among children for background color (p = 0.063) and interface type (p = 0.073).
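For readers who wish to run comparable analyses, the sketch below shows how a Mann-Whitney U test for a gender difference and a Kruskal-Wallis test across age groups might be computed with SciPy. The preference ratings are hypothetical; the study's raw responses are not reproduced here.

from scipy import stats

# Hypothetical 1-5 preference ratings for one interface element,
# split by gender (not the study's actual data).
boys = [4, 3, 5, 4, 2, 4, 3, 5, 4, 3]
girls = [2, 1, 3, 2, 2, 1, 3, 2, 1, 2]

u, p = stats.mannwhitneyu(boys, girls, alternative="two-sided")
print(f"Mann-Whitney U = {u:.1f}, p = {p:.4f}")

# Kruskal-Wallis across three hypothetical age groups (10, 11, 12).
age10 = [3, 4, 2, 3, 4]
age11 = [4, 4, 3, 5, 4]
age12 = [2, 3, 2, 2, 3]
h, p = stats.kruskal(age10, age11, age12)
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")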
5 Conclusions
Interface design guidelines are not hard to find, but typically they are meant for adults rather than young users. This study examined children’s preferences in interface design. Five interface design elements were tested. Results showed that there are significant differences in children’s preferences for interface type and background color. In addition, the results also highlight the importance of considering the effects
of gender-based differences in user interface design for children. From these findings it is concluded that specific interface design guidelines are required for children, rather than simply relying upon general design guidelines, and that it is necessary to involve these users in the design process in order to formulate those guidelines.
References 1. Hutchinson, H.B., Bederson, B.B.: Interface for Children’s Searching and Browsing (2005) 2. Zaman, Geerts: Gender differences in children creative game play. Young People & New Technology, UK Northampton (2005) 3. Bruckman, A., Bandlow, A.: HCI for Kids. In: Jacko, J., Sears, A. (eds.) Human-Computer Interaction Handbook, pp. 428–440. Lawrence Erlbaum, Hillsdale, NJ (2003) 4. Shneiderman, B., Plaisant, C.: Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd edn. Addison-Wesley, Reading, MA (1998) 5. Wartella, E., O’Keefe, B., Scantin, R.: Children and Interactive Media. A Compendium of Current Research and Directions for the Future, Markle Foundation (2000) 6. Acuff, D.S., Reiher, R.H.: What Kids Buy and Why. The Psychology of Marketing to Kids. The Free Press, New York (1997) 7. Markopoulos, P., Bekker, M.: Interaction Design and Children. Interacting with Computers 15(2), 141–149 (2003) 8. Inkpen, K.M.: Drag-and-drop versus point-and-click: mouse interaction styles for children. ACM Transactions on Computer-Human Interaction 8(1), 1–33 (2001) 9. Read, J.C., MacFarlane, S.J., Casey, C.: Proceedings BCS HCI 2001, Lille, France, pp. 559–573. Springer, London (2001) 10. Gilutz, S., Nielsen, J.: Usability of Websites for Children: 70 Design Guidelines. Nielsen Norman Group (2002), http://www.NNgroup.com/report/kids
Usability Evaluation of Graphic Design for Ilmu’s Interface Tengku Siti Meriam Tengku Wook1 and Siti Salwa Salim2 1 Faculty of Information Science and Technology National University of Malaysia, 43600 Bangi Selangor 2 Faculty of Computer Science and Information Technology University of Malaya, 50603 Kuala Lumpur tsm@ftsm.ukm.my, salwa@um.edu.my
Abstract. Graphic design is fundamental to Ilmu’s interface (i.e., a WebOPAC for children) and is the focus of this study. A usability evaluation was carried out for the new prototype of Ilmu’s interface, which gives emphasis to the components of graphic design. Questionnaire and observation methods were used to accumulate the usability data. The usability of Ilmu's new interface is shown to be significantly better through t-tests and statistical testing using chi-square (χ2). Keywords: Usability, graphic design and children’s interface.
2 Hypotheses
The objective of carrying out the usability evaluation is to determine whether there are significant differences and effects between the Ilmu_1 and Ilmu_2 designs. The following five hypotheses serve as the basis for conducting a usability evaluation of Ilmu’s interface:
H1. There is a significant difference between the use of space in the Ilmu_1 and Ilmu_2 designs.
H2. There is a significant difference in the content arrangement between the Ilmu_1 and Ilmu_2 designs.
H3. There is a significant difference in the functional accessory between the Ilmu_1 and Ilmu_2 designs.
H4. There is a significant difference in the color coordination arrangement between the Ilmu_1 and Ilmu_2 designs.
H5. There is an excellent level of acceptance by Malaysian students of the new Ilmu_2 design.
Fig. 1. Relationship of independent variables between Ilmu_1 and Ilmu_2
The main aim of hypotheses 1–4 (H1–H4) is to demonstrate any significant difference in usability score between Ilmu_1 and Ilmu_2. This is tested using a t-test (paired sample test). Figure 1 shows the independent variables (Ilmu_1 and Ilmu_2), the components of graphic design (use of space, content arrangement, functional accessory, and color coordination – the main focus of the Ilmu_2 design), and the usability factors for each component of graphic design. The usability factors used in this research are effectiveness, accessibility, easy to learn, and enjoyable. The aim of hypothesis 5 (H5) is to observe students’ perspectives on Ilmu_2’s graphic design (use of space, content arrangement, functional accessory, and color coordination) in relation to the usability factors, hence drawing a conclusion about the level of acceptance by students of Ilmu_2’s interface. This is tested using chi-square (χ2).
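As a rough illustration of how H1–H4 and H5 might be tested, the sketch below runs a paired-sample t-test and a chi-square goodness-of-fit test with SciPy on hypothetical data; the study's actual scores and counts are not reproduced here.

from scipy import stats

# Hypothetical paired usability scores (one pair per student) for one
# graphic design component, rated for Ilmu_1 and then for Ilmu_2.
ilmu_1 = [2.8, 3.1, 2.9, 3.0, 2.7, 3.2, 2.9, 3.0]
ilmu_2 = [4.4, 4.5, 4.3, 4.6, 4.4, 4.5, 4.2, 4.6]

# H1-H4: paired-sample t-test for a difference between the designs.
t, p = stats.ttest_rel(ilmu_2, ilmu_1)
print(f"t = {t:.3f}, p = {p:.4f}")

# H5: chi-square goodness-of-fit on observed acceptance counts
# (excellent / acceptable / unacceptable) against a uniform expectation.
observed = [24, 5, 1]  # hypothetical counts for 30 students
chi2, p = stats.chisquare(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")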
3 Usability Evaluation Methods
Two usability evaluation techniques were applied, which involved students providing feedback via a questionnaire and observing students' interaction with the Ilmu_2 interface. One hundred students participated in the questionnaire exercise while twenty students were involved in the observation process.

3.1 Questionnaire
The survey required the students to answer the questions using a 1–5 Likert scale range. A t-test (paired sample t-test) was applied to monitor any significant difference in usability score between the Ilmu_1 and Ilmu_2 designs.

3.2 Observation
The observation required the researcher to observe children's behavior and their understanding of and ability to search and browse books using Ilmu_2. To ensure that data was collected consistently from the students during the observation, a checklist was used to record the findings, concentrating on the four components of graphic design. The data was gathered quantitatively according to the measurement criteria categorized by Dumas and Redish [6]:

Excellent – The Ilmu_2 interface is effective, practical and easy to learn for searching for the bibliographic information.
Acceptable – Students are satisfied with the searching.
Unacceptable – The Ilmu_2 interface is not ‘OK’; students are having difficulties using the Ilmu_2 interface to search for the bibliographic information.
4 The Results of Usability Evaluation
Figure 2 shows the range of mean scores for the components of graphic design between Ilmu_1 and Ilmu_2. The usability score of the Ilmu_2 interface shows an increase of 1.56 points for use of space, 1.58 points for content arrangement, 1.61 points for functional accessory and 1.12 points for color coordination.

Fig. 2. Mean scores of Ilmu_1 and Ilmu_2 for the components of graphic design (use of space: 2.91 vs. 4.47; content arrangement: 2.83 vs. 4.41; functional accessory: 2.76 vs. 4.37; color coordination: 3.27 vs. 4.39)
4.1 Results of H1
As shown in Figure 2, there is a significant difference between the use of space in Ilmu_1 and Ilmu_2 (t = 39.546, p < .05). Table 1 shows the mean scores and percentages of usability factors for the use of space component. Ilmu_2 has recorded a positive increase in the easy to learn and enjoyable factors.

Table 1. Use of Space Component

Usability Factors   Mean Score   Percentage
Effectiveness       4.43         33.05%
Easy to learn       4.87         33.47%
Enjoyable           4.87         33.47%
A walkthrough technique played a major role in the improvement of the use of space in Ilmu_2. Through its implementation, students are allowed to move the mouse (cursor) to the right or left during their 360° environment exploration. Students are free to explore and carry out daily activities on the screen without any assistance from teachers or their elders.

4.2 Results of H2
There is also a significant difference in content arrangement between Ilmu_1 and Ilmu_2 (t = 37.954, p < .05). The strength of content arrangement in Ilmu_2 lies in the application of a tree-maps technique. Text and graphic types of information are displayed hierarchically and in a structured manner, which enhances the usability of Ilmu_2. The location of objects such as menus, instructions, buttons, lines and images was aligned horizontally with the movement of the mouse (to the left and right) during exploration. A comic-strip technique was implemented in arranging the sub-subject folders in a cabinet.

Table 2. Content Arrangement Component

Usability Factors   Mean Score   Percentage
Effectiveness       4.462        33.73%
Easy to learn       4.378        33.09%
Enjoyable           4.39         33.18%
4.3 Results of H3
As shown in Figure 2, functional accessory has the most significant difference between Ilmu_1 and Ilmu_2 (t = 39.304, p < .05). This component in Ilmu_2 lies in the deployment of a label function, an animation function, terminology and the caterpillar character that acts as an assistant. Students were satisfied, and it was easy for them to use Ilmu_2 on their own. A clear and concise set of instructions on the menu using bigger fonts provided easy access.

Table 3. Functional Accessory Component

Usability Factors   Mean Score   Percentage
Effectiveness       4.365        33.32%
Easy to learn       4.39         33.51%
Enjoyable           4.345        33.17%
4.4 Results of H4
Color coordination has the least significant difference between Ilmu_1 and Ilmu_2 (t = 24.485, p < .05). Ilmu_2 uses a combination of light and cheerful colors. Appropriate selection of colors adds to the students’ enjoyment, as they feel happy and comfortable while they search and surf.

Table 4. Color Coordination Component

Usability Factors   Mean Score   Percentage
Effectiveness       4.47         33.91%
Easy to learn       4.383        33.25%
Enjoyable           4.33         32.84%
4.5 Results of H5
Results obtained from the statistical evaluation using chi-square (χ2) show scattered data for the excellent and acceptable parameters to be χ2 (10, N = 30) = 240.8, p < 0.05. Table 5 shows excellent feedback from students at a level of 83.93%, and none rejected Ilmu_2.

Table 5. Students’ acceptability towards Ilmu_2

Adaptability    Mean Score   Percentage
Excellent       23.5         83.93%
Acceptable      4.5          16.07%
Unacceptable    0            0%
5 Conclusion
Graphic design is a vital element in creating a children’s WebOPAC. The usability of Ilmu_2 is shown to be significantly better through t-tests and statistical testing using chi-square (χ2). Table 6 compares the graphic design techniques applied in Ilmu_1 and Ilmu_2.
Table 6. Comparison of the application of graphic design techniques (searching techniques: keyword, subject and location)

Use of space
  Keyword – Ilmu_1: exact match; Boolean operation. Ilmu_2: exact match.
  Subject – Ilmu_1: image or text hyperlink. Ilmu_2: image or text hyperlink.
  Location – Ilmu_1: –. Ilmu_2: pan/zoom.

Content arrangement
  Keyword/Subject – Ilmu_1: non-hierarchical. Ilmu_2: hierarchical (tree-maps).
  Location – Ilmu_1: non-hierarchical. Ilmu_2: hierarchical (comic strip).

Functional accessory
  Keyword – Ilmu_1: use of label, icon and button. Ilmu_2: magnification glass (lens); use of label, icon and button.
  Subject – Ilmu_1: use of label, icon and button. Ilmu_2: use of label, icon, button and image; worm character (interface agent).
  Location – Ilmu_1: –. Ilmu_2: use of label, icon, button and image; caterpillar character (interface agent).
References 1. Meriam, T.S., Wook, T., Salim, S.S.: User Testing of Children’s WebOPAC: A Malaysian Experience. In: The Seventh Asia-Pacific Conference on Computer Human Interaction, Taiwan (2006) 2. Hutchinson, H.B.: Children’s interface design for hierarchical search and browse. ACM SIGCAPH Newsletter, College Park, pp. 11–12 (2003) 3. Christoffel, M., Schmitt, B.: Accessing libraries as easy as a game: Visual Interface to Digital Libraries, pp. 25–38. Springer, Berlin (2002) 4. Murch, G.M.: Physiological principles for the effective use of color. In: IEEE Computer Graphics and Applications, pp. 49–54. IEEE Computer Society Press, Los Alamitos (1984) 5. Oosterholt, R., Kusano, M., Vries, G.: Interaction design and human factors support in the development of a personal communicator for children. In: Computer Human Interaction, pp. 450–457. ACM, Vancouver (1996) 6. Dumas, J.S., Redish, J.C.: Creating Task Scenario. A Practical Guide to Usability Testing. Intellect, USA (1999)
Are We Trapped by Majority Influences in Electronic Word-of-Mouth? Yu Tong and Yinqing Zhong Department of Information Systems, School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543 {tongyu,zhongyin}@comp.nus.edu.sg
Abstract. As an effective online mechanism for generating large-scale electronic Word-of-Mouth (EWOM), online feedback systems (OFS) offer a variety of system design cues to facilitate consumers’ decision making. However, such cues may lead consumers to make inferences based on an overall picture of the majority opinion without scrutinizing the content of reviews. This study draws on theories of majority/minority influence and dual-process theories to explore the influences of OFS design cues on consumers’ learning outcomes (i.e., awareness of product/service, confidence in judgment, intention to search for additional information and intention to conform to the majority). Numerical and power majority influences are examined through two design cues: review clustering format (list-clustering vs. pair-clustering) and source credibility (available vs. unavailable). Keywords: Word-of-mouth, online feedback system, majority influence, system design.
thus have led to the questions: Can a consumer’s product evaluation be influenced by cues embedded in OFS designs rather than by review content? If yes, through what system designs? While extant research has focused on the impacts of reviews on trust building and sales revenue (e.g., [4, 22]), relatively little effort has been devoted to examining the relationships between system designs, consumer learning and product evaluation. We seek to address this gap in the literature by drawing on theories of majority/minority influence and dual-process theories to explore the influences of two majority designs in OFSs (i.e., review clustering format and availability of source credibility) on consumers’ learning outcomes. This approach is appropriate, as there is evidence from dual-process theories that an individual may be influenced by heuristic cues without scrutinizing the content of the information [31].
2 Theoretical Foundation
2.1 Majority and Minority Influences
Prior studies on majority/minority influence hold inconclusive views on how individuals change their behaviors/attitudes based on the majority’s or minority’s view. Conversion theory [29], which forms the foundation of many early studies on this issue, posits that individuals undergo different forms of cognitive and motivational processes depending on the source of influence (i.e., majority or minority). When the influence is received from the majority, individuals choose to conform to the majority’s attitudes publicly without carefully examining the arguments. On the other hand, when the influence is received from the minority, individuals pay more attention to interpreting the information from the minority viewpoint. Consequently, they may convert their attitudes privately [15, 29], which induces more enduring changes [5].

In contrast to conversion theory, other studies propose that majority influence can also exert a significant impact on sustained attitude changes (e.g., [21, 25, 32]). Mackie’s [26] objective consensus approach contends that it is the majority that elicits greater message processing, as people tend to accept the majority viewpoint as objective reality. Consequently, people do not engage in a great deal of cognition over the minority message. Building on conversion theory and the objective consensus approach, Erb et al. [15] examine the effects of recipients’ prior attitudes on message scrutiny in the context of minority and majority influence. Specifically, a majority message is processed more extensively than a minority message when recipients hold a moderate prior attitude. When recipients hold an opposing prior attitude, however, the minority message is processed more extensively than the majority message.

Notwithstanding the contention in the majority/minority literature, previous studies commonly highlight the positive effects of processing message content in decision making. A minority view, even when wrong, stimulates reappraisal and a consideration of more alternatives [30]. Previous research contends that dual-process theories of information processing offer insight into why and how people process messages from the majority and/or minority group (e.g., [21]).
2.2 Dual-Process Theories of Information Processing

The dual-process theories of information processing encompass a family of theories that examine the roles played by both message content and contextual factors. Both the elaboration likelihood model (ELM) [31] and the heuristic-systematic model (HSM) [14] of persuasion identify two information processing modes that people use when assessing the validity of a message. Systematic processing, also known as the central route, refers to a process in which individuals carefully scrutinize all relevant information and try to incorporate it into what they already know. Heuristic processing, also known as the peripheral route, refers to a process in which individuals use cues or heuristics, such as "credibility implies correctness", to assess content validity [11]. When systematic processing is used, the attitude change tends to be relatively enduring and is likely to affect subsequent behaviors. When heuristic processing is used, on the other hand, individuals tend to be influenced by factors other than the message itself; as a consequence, any change in attitude is likely to be temporary and has less effect on how they actually act. Elaboration likelihood refers to the degree of systematic processing an individual uses in processing a message [23]. Previous research shows that elaboration likelihood is affected by three factors: the relevance of the topic to the person, enjoyment of thinking, and the diversity of arguments [23]. Specifically, the more important the topic, the more likely the individual is to scrutinize and think about the message. The more an individual enjoys mulling over arguments, the more likely he or she is to use systematic processing. When confronted with diverse opinions, with relevance and enjoyment of thinking held equal, individuals tend to process information systematically, as making peripheral judgments is not easy [1]. Dual-process theories have commonly been adopted in prior research to explain how group opinion, operating as a simple cue, leads individual members to rely on heuristics in their thinking, particularly when they are moderately motivated and unfamiliar with a given topic [3]. Individuals tend to reject the minority message easily using the heuristic "a lack of consensus implies a lack of validity"; instead of scrutinizing the arguments, they make quick judgments based on majority/minority cues [23].
3 Research Model and Hypotheses

Fig. 1 depicts the research model. Our main proposition is that consumers' learning outcomes are contingent on majority cues embedded in two OFS designs (i.e., review clustering format and source credibility). In line with previous literature [33], majority is defined along two dimensions: (1) number of members (i.e., the majority group is numerically greater than the minority group); and (2) relative power (i.e., the majority group is relatively more powerful than the minority group). The former is a common cue used when people rely on the sheer number of arguments to determine whether to accept a perspective [31]. Similarly, the credibility of the source can be used as a cue [27]. When source credibility is high, the message is more likely to be scrutinized and accepted.
Fig. 1. Research Model
3.1 Dependent Variable: Consumer Learning

Consumer learning plays a critical role in shaping consumers' consumption behaviors and manifests itself in three dimensions: cognitive, affective, and conative [20, 24, 34]. The cognitive dimension measures the ability of a medium, such as an OFS design, to attract attention and ultimately transfer product/service information to consumers' memory. In this study, awareness of product/service is examined as a dependent variable indicating the extent of product information consumers learn from the OFS [2]. This variable reflects consumers' efforts in generating awareness, establishing product/service knowledge, increasing comprehension of the brand name, and understanding the information presented [2, 8, 18]. The affective dimension measures attitudes either established or created by a medium, such as an OFS [28]. In this study, however, the 'absolute' value of product attitudes is not of interest, as it is contingent on the perspective of the reviews. Instead, we examine consumers' confidence in judgment, which is the degree to which consumers feel confident in their evaluation of the product [7]. Lastly, the conative dimension involves some type of behavioral intention, such as searching for additional information [17]. As with the affective dimension, purchase intention is outside this study's scope because the content of product reviews could be either positive or negative. Instead, intention to search for additional information and intention to conform to the majority are chosen as dependent variables in this dimension.

3.2 Review Clustering Format: List-Clustering vs. Pair-Clustering

Clustering is increasingly adopted by OFSs to organize reviews pertaining to a product/service. Through an empirical comparison between the standard list and the clustered list of a search engine interface, Zamir and Etzioni [35] reported that substantial differences exist in using these two formats. In this study, influences from the review
clustering format (i.e., numerical majority/minority) are manipulated into two types: list-clustering vs. pair-clustering. List-clustering refers to a format in which positive and negative reviews are grouped into two separate, adjacent lists. Deducing from the HSM, we posit that the list-clustering format makes the cue of numerical majority salient to consumers, as consumers can easily form an impression of reviewers' overall appraisal of the product/service. Because individuals tend to rely on the sheer number of arguments to determine whether to accept a perspective [31], instead of scrutinizing the content of reviews they are likely to follow the numerical majority's perspective using the heuristic "a lack of consensus implies a lack of validity". According to the HSM, any changes resulting from the peripheral route tend not to be lasting. To diminish these majority effects on consumers' information processing, we propose a specific review clustering format, namely pair-clustering. This format pairs up positive and negative reviews and displays them in an alternating manner within one list. In contrast to the list-clustering format, consumers need to go through all reviews (positive and negative) to glean the best information, as positive and negative reviews are displayed alternately. When the heuristic cue is not salient (i.e., it is difficult to judge the number of reviews on each side), consumers have to make their judgment by systematically scrutinizing the messages in the OFS. Thus, we hypothesize:

H1: The pair-clustering format will lead to higher awareness of product/service than the list-clustering format.
H2: The pair-clustering format will lead to higher confidence in judgment than the list-clustering format.
H3: The pair-clustering format will lead to higher intention to search for additional information than the list-clustering format.
H4: The pair-clustering format will lead to lower intention to conform to the majority than the list-clustering format.

3.3 Availability of Source Credibility

Source credibility, which refers to a communicator's ability to affect receivers' information acceptance in communication, can be considered a form of power within a group [13]. When an individual possesses higher credibility, his or her statements are perceived as more persuasive, even though such statements may represent the perspective of a numerically small group. In line with this categorization, a majority group is one that is considered more credible compared with a minority group. Research shows that source credibility is determined by two elements, source expertise and source bias [9]. Source expertise is defined as "the perceived competence of the source providing the information", and source bias as "the possible bias/incentives that may be reflected in the source's information" [9, p. 6]. Essentially, a source is considered more credible when it possesses greater expertise and has less incentive to bias the information [10]. In the context of an OFS, a consumer who provides a review of a product/service is less prone to bias, since he or she is unlikely to have an ulterior motive (e.g., to sell the product) [16]. Therefore, the level of expertise substantially determines the perceived credibility of the reviewer.
Due to the anonymous nature of the online environment, it is relatively difficult to evaluate a reviewer's expertise directly. In practice, many OFSs provide features to identify "expert" reviewers in a certain product category. For example, at Epinions.com, reviewers who consistently provide high-quality reviews can obtain a status such as category leader or top reviewer. When such reviewers write reviews about a certain product, their status is highlighted at the top of the reviews. As the calculation of status is based on cumulative and objective criteria (in terms of the quantity and quality of reviews), reviewers who gain such status from an OFS are perceived to be highly credible. Source credibility can be used as a cue or heuristic. When an individual is moderately involved in a focal topic, he or she may rely on a simple heuristic (i.e., "experts' statements can be trusted") to evaluate content validity [1]. In the context of an OFS, when consumers observe a top reviewer's product review, they may treat this source credibility as a cue and engage in heuristic processing. Under such circumstances, consumers' attitude changes are based on what the expert says rather than on scrutiny of the content of other reviews; it is therefore unlikely that they will undergo an enduring attitude change. However, when there is no indication of a reviewer's status, the heuristic cue of source credibility is not salient. In such circumstances, consumers who need to evaluate the product have to engage in systematic processing by scrutinizing the content of all relevant reviews. Thus, we hypothesize:

H5: Availability of a credible source will lead to lower awareness of product/service than the condition in which a credible source is unavailable.
H6: Availability of a credible source will lead to lower confidence in judgment than the condition in which a credible source is unavailable.
H7: Availability of a credible source will lead to lower intention to search for additional information than the condition in which a credible source is unavailable.
H8: Availability of a credible source will lead to higher intention to conform to the majority than the condition in which a credible source is unavailable.
4 Research Method

The research model will be tested in a laboratory experiment with three between-subjects factors: review clustering format (list-clustering vs. pair-clustering), source credibility (available vs. unavailable), and the attitude of the numerical majority (positive vs. negative). When a credible source is available, we counterbalance the alignment of the numerical majority and the credible source (aligned or not). Thus, we employ a completely counterbalanced, full factorial design, which yields 12 combinations in total; the short enumeration at the end of this section makes this count explicit. An air-con maintenance service company is chosen for the experiment for two main reasons. First, an intangible service is an ideal context for studying OFS effects, as it is hard for consumers to form a comprehensive judgment before consumption. Second, air-con maintenance service allows us to control for moderate initial motivation before subjects (university students) read the reviews.

4.1 Independent Variables

The review clustering format is manipulated on the product reviews page. The number and content of the reviews displayed in the two formats are identical. In the list-clustering
condition, all positive reviews will be placed in the left list and all negative reviews in the right list. In the pair-clustering condition, five pairs of alternating reviews (positive + negative) are displayed first, followed by the remaining five reviews with the majority attitude. Each review consists of an overall rating, representing the attitude toward the service, and a written comment. While only the first two lines of each review are displayed, subjects can read the full review in a popup window by clicking the "read full review" link at the end of the review. Two levels of source credibility are studied: available and unavailable. Subjects in the available-credible-source groups will notice a top-reviewer status placed on the first review on either the numerical majority or the minority side. Subjects in the unavailable-source-credibility groups will see all reviews without the top-reviewer status.

4.2 Dependent Variables

The measurements for the dependent variables are adapted from previously validated scales. Awareness of product/service is operationalized as the extent of message recall, measured using thought-listing techniques [5]. Subjects are asked to list all of the thoughts they had after reading the reviews, classifying each according to attributes provided by the researchers (e.g., price, membership). Two researchers will independently rate each subject's thoughts on a 1-to-10 scale, and any disagreement will be resolved. Confidence in judgment is measured using the attitude confidence scale from Berger and Mitchell [7], adapted to our context. It is measured with two 7-point scales anchored by the statements "not at all certain/very certain" and "not at all confident/completely confident". Both intention to search for additional information and intention to conform to the majority are measured using four 7-point scales anchored by the statements ("unlikely/likely", "improbable/probable", "uncertain/certain", "definitely not/definitely") adapted from Bearden et al. [6] to determine the likelihood that subjects intend to perform the two behaviors.

4.3 Pre-test

The literature suggests that argument quality can influence an individual's information processing [5]. A pre-test is conducted with 20 subjects drawn from the same pool as the main experiment. Forty reviews of the air-con maintenance service company (30 positive and 10 negative) are presented. All reviews contain the same number of sentences, and all sentences in a given review express the same attitude (positive or negative). To control for argument quality, subjects are asked to rate the perceived persuasiveness of each review. Ten positive reviews and five negative reviews judged to be at a similar level of persuasiveness are selected for the main experiment. Subjects are also asked whether these two numbers can correctly represent the numerical majority and minority groups. An OFS is developed specifically for this study. It is installed on a server in the same local area network as the laboratory computers to ensure consistently high network speed for all subjects. Subjects are asked to comment on the design of the OFS in various respects, such as speed and layout. Feedback from the subjects is used to fine-tune the design of the OFS. In addition, we check subjects' prior attitudes toward the air-con service company to ensure that they have moderate motivation before reading the reviews.
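To make the two clustering manipulations of Section 4.1 concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation; the review texts are placeholders, and the 10-positive/5-negative split follows the pre-test selection described above.

# Illustrative sketch of the two review clustering formats (not the
# authors' code). Review texts are placeholders; the 10 positive and
# 5 negative reviews follow the pre-test selection described above.

positive = ["positive review %d" % i for i in range(1, 11)]  # numerical majority
negative = ["negative review %d" % i for i in range(1, 6)]   # numerical minority

def list_clustering(pos, neg):
    # Two separate, adjacent lists: the size of the majority is
    # immediately visible, making the numerical-majority cue salient.
    return {"left_list": pos, "right_list": neg}

def pair_clustering(pos, neg):
    # One list: five alternating (positive, negative) pairs first,
    # followed by the remaining five majority-attitude reviews.
    interleaved = []
    for p, n in zip(pos, neg):
        interleaved.extend([p, n])
    interleaved.extend(pos[len(neg):])
    return {"single_list": interleaved}

print(list_clustering(positive, negative)["left_list"][0])
print(pair_clustering(positive, negative)["single_list"][:4])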
4.4 Participants and Experimental Procedures

One hundred sixty students from a large university are recruited and randomly assigned to the 12 groups. Subjects are told to evaluate a new air-con service company based on information from an OFS specialized in this industry. Before they look at the reviews, a brief introduction to the OFS and the procedure used to compute top reviewers is given to establish a common frame of reference. Next, subjects are instructed to browse the OFS and locate information on a specific service in order to become familiar with the system layout and its various features. Subjects then evaluate the experimental service at their own pace and may raise questions at any time. Afterwards, subjects complete a post-experiment questionnaire, which includes manipulation checks, measurements of the dependent variables, and demographic information. Lastly, subjects are debriefed and given a token for their participation.
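As a reading aid, the 12 experimental cells can be enumerated as follows. This sketch is ours, not the authors'; it simply makes the counting explicit: when a credible source is available, its alignment with the numerical majority is counterbalanced, yielding three source conditions in total.

# Enumerating the 12 experimental cells: 2 clustering formats x
# 2 majority attitudes x 3 source conditions (unavailable, or available
# and aligned/misaligned with the numerical majority) = 12.
from itertools import product

formats = ("list-clustering", "pair-clustering")
majority_attitudes = ("positive", "negative")
source_conditions = ("unavailable",
                     "available, aligned with majority",
                     "available, aligned with minority")

cells = list(product(formats, majority_attitudes, source_conditions))
assert len(cells) == 12
for i, cell in enumerate(cells, 1):
    print(i, cell)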
5 Concluding Remarks

This study aims to advance theoretical understanding in the area of EWOM and majority influence. First, this paper constitutes one of the first studies in the EWOM literature to investigate the effects of possible system designs on consumer learning outcomes. Second, it extends the literature by examining influences from both positive and negative reviews. Third, it advances the majority influence literature by considering a broader conceptualization of majority that includes both numerical and power majority. The study also offers practical implications for OFS designers. First, as consumers may be subject to influence from the review clustering format, OFS designers should select a format that helps consumers evaluate products appropriately; for example, they can give consumers the option to select their preferred format, or choose a format automatically according to consumers' indicated preferences in the OFS or their log history. Second, as source credibility can significantly influence consumers' evaluation processes, practitioners should scrutinize the algorithms used to compute top reviewers.
References

1. Areni, C.S., Ferrell, M.E., Wilcox, J.B.: The Persuasive Impact of Reported Group Opinions on Individuals Low vs. High in Need for Cognition: Rationalization vs. Biased Elaboration? Psych. and Marketing 17, 855–875 (2000)
2. Ariely, D.: Controlling the Information Flow: Effects on Consumers' Decision Making and Preferences. J. Consumer Res. 27, 233–249 (2000)
3. Axsom, D., Yates, S., Chaiken, S.: Audience Response as a Heuristic Cue in Persuasion. J. Pers. Soc. Psych. 53, 30–40 (1987)
4. Ba, S., Pavlou, P.A.: Evidence of the Effect of Trust Building Technology in Electronic Markets: Price Premiums and Buyer Behavior. MIS Quart. 26, 243–268 (2002)
5. Baker, S.M., Petty, R.E.: Majority and Minority Influence: Source-Position Imbalance as a Determinant of Message Scrutiny. J. Pers. Soc. Psych. 67, 5–19 (1994)
6. Bearden, W.O., Lichtenstein, D.R., Teel, J.E.: Comparison Price, Coupon, and Brand Effects on Consumer Reactions to Retail Newspaper Advertisements. J. Retailing 60, 11–34 (1984)
7. Berger, I.E., Mitchell, A.A.: The Effect of Advertising on Attitude Accessibility, Attitude Confidence, and the Attitude-Behavior Relationship. J. Consumer Res. 16, 269–279 (1989)
8. Braun, K.A.: Postexperience Advertising Effects on Consumer Memory. J. Consumer Res. 25, 319–334 (1999)
9. Brown, J., Broderick, A.J., Lee, N.: Word of Mouth Communication within Online Communities: Conceptualizing the Online Social Network. J. Interactive Marketing 21, 2–20 (2007)
10. Buda, R., Zhang, Y.: Consumer Product Evaluation: The Interactive Effect of Message Framing, Presentation Order, and Source Credibility. Internat. J. Management 9, 229–242 (2000)
11. Chaiken, S., Liberman, A., Eagly, A.H.: Heuristic and Systematic Information Processing within and beyond the Persuasion Context. In: Uleman, J.S., Bargh, J.A. (eds.) Unintended Thought, pp. 212–252. Guilford, New York (1989)
12. Dellarocas, C.: The Digitization of Word of Mouth: Promise and Challenges of Online Feedback Mechanisms. Management Sci. 49, 1407–1424 (2003)
13. Dholakia, R.R., Sternthal, B.: Highly Credible Sources: Persuasive Facilitators or Persuasive Liabilities? J. Consumer Res. 3, 223–232 (1977)
14. Eagly, A.H., Chaiken, S.: The Psychology of Attitudes. Harcourt, Brace, & Janovich, Orlando (1993)
15. Erb, H.-P., Bohner, G., Rank, S., Einwiller, S.: Processing Minority and Majority Communications: The Role of Conflict with Prior Attitudes. Pers. and Soc. Psych. Bull. 28, 1172–1182 (2002)
16. Grewal, R., Cline, T.W., Davies, A.: Early-Entrant Advantage, Word-of-Mouth Communication, Brand Similarity, and the Consumer Decision-Making Process. J. Consumer Psych. 13, 187–197 (2003)
17. Hoch, S.J., Ha, Y.W.: Consumer Learning: Advertising and the Ambiguity of Product Experience. J. Consumer Res. 13, 221–233 (1986)
18. Hoffman, D.L., Novak, T.P.: Marketing in Hypermedia Computer-Based Environments: Conceptual Foundations. J. Marketing 60, 50–68 (1996)
19. Hsieh-Yee, I.: Research on Web Search Behavior. Library and Inform. Sci. Res. 23, 167–185 (2001)
20. Hutchinson, J.W., Alba, J.W.: Ignoring Irrelevant Information: Situational Determinants of Consumer Learning. J. Consumer Res. 18, 325–345 (1991)
21. Kerr, N.L.: When is a Minority a Minority? Active versus Passive Minority Advocacy and Social Influence. European J. Soc. Psych. 32, 471–483 (2002)
22. Lee, J., Park, D.H., Han, I.: The Effect of Negative Online Consumer Reviews on Product Attitude: An Information Processing View. Electronic Commerce Res. and Applications 7, 341–352 (2007)
23. Littlejohn, S.W.: Theories of Human Communication, 7th edn. Wadsworth Publishing Company, California (1997)
24. Lutz, R.J.: Changing Brand Attitudes through Modification of Cognitive Structure. J. Consumer Res. 1, 49–59 (1975)
25. Maass, A., Clark III, R.D.: Hidden Impact of Minorities: Fifteen Years of Minority Influence Research. Psych. Bull. 95, 428–450 (1984)
26. Mackie, D.M.: Systematic and Nonsystematic Processing of Majority and Minority Persuasive Communications. J. Pers. Soc. Psych. 53, 41–52 (1987)
27. Maheswaran, D., Chaiken, S.: Promoting Systematic Processing in Low Motivation Settings: The Effect of Incongruent Information on Processing and Judgment. J. Pers. Soc. Psych. 61, 13–25 (1991)
28. Mehta, A.: Advertising Attitudes and Advertising Effectiveness. J. Advertising Res. 40, 62–72 (2000)
29. Moscovici, S.: Toward a Theory of Conversion Behavior. Advances in Experimental Soc. Psych. 13, 209–239 (1980)
30. Nemeth, C.J.: Differential Contributions of Majority and Minority Influence. Psych. Rev. 93, 23–32 (1986)
31. Petty, R.E., Cacioppo, J.T.: Communication and Persuasion: Central and Peripheral Routes to Attitude Change. Springer, New York (1986)
32. Tanford, S., Penrod, S.: Social Influence Model: A Formal Integration of Research on Majority and Minority Influence Processes. Psych. Bull. 95, 189–225 (1984)
33. Worchel, S., Grossman, M., Coutant, D.: Minority Influence in the Group Context: How Group Factors Affect when the Minority will be Influential. In: Moscovici, S., Mucchi-Faina, A., Maass, A. (eds.) Minority Influence, pp. 97–114. Nelson-Hall, Chicago (1994)
34. Wright, P., Rip, P.D.: Product Class Advertising Effects on First Time Buyers' Decision Strategies. J. Consumer Res. 7, 708–718 (1980)
35. Zamir, O., Etzioni, O.: Grouper: A Dynamic Clustering Interface to Web Search Results. In: Proc. of the Eighth WWW Conference, Toronto, Canada, pp. 1361–1374 (1999)
Leveraging a User Research Framework to Guide Research Investments: Windows Vista Case Study Gayna Williams Principal Program Manager, Developer Division, Microsoft*
Abstract. During the development of Windows Vista we had the opportunity to invest in new methods to understand user behavior. We leveraged standard usability methods to work on feature areas during development; however, we had to invent and adapt new approaches to measure holistic experiences. In this area user research methods are evolving, due to the integration of technologies and changes in the definition of a successful experience. While considering the methods that suited our needs, a user research framework was created. This helped us manage investments in research activities. The framework is organized along two dimensions: perspective and time. Perspective refers to the breadth of the experience being considered: 'narrow' defines a focus on an individual feature area or small product area, and 'broad' defines a focus on an integrated experience. Time can indicate either a product cycle or real time. For the product cycle, most of the research effort goes into evaluating the designs of features and experiences, aiming to predict user behavior for a particular release of a product, whereas real time is our research investment in understanding how products are used in the wild without our intervention. Each quadrant of the two-dimensional framework highlights different research methods and purposes. It is important to realize that the value of the framework comes from the integration of findings, which provides a rich holistic picture of our users to ultimately guide product decisions. This paper describes some of the methods that were evolved and created during the development of Windows Vista and their relationship to the user research framework. The methods described include user experience scorecarding, measurement of desirability, and the impact of the consumer adoption program. These methods continue to be used today in the development of Windows 7.
1 Introduction

One challenge in working on an operating system is that it contributes to a computer experience in more than one significant way. It provides stand-alone experiences and it contributes substantially to extended experiences. When developing Windows Vista, the user research team had the challenge of considering how to provide deep insight in particular areas to impact product creation, while also playing a critical role in understanding the quality of the holistic experience. Many parts contribute to the ecosystem that users of Windows experience. Our role was to understand this holistically and to drive that understanding into product development.
Before diving into the research framework, I'd like to provide some context. The Windows organization is large, including over 5000 people. Windows Vista was not the only product produced by this organization, as its core components contribute to other products (e.g., Windows Server) and service packs (e.g., Windows XP SP2). The user experience team is a centralized organization consisting of user research, design, and user assistance. During the Windows Vista development cycle these three groups stepped up their accountability to raise the importance of the product experience. For design, this meant demonstrating how design could lead product definition, and engaging continuously from product inception through to marketing messaging and branding. For user assistance, it meant a change in focus from being a team that documents help toward the goal of becoming a continuous publishing group with a data-driven content strategy. And for user research, we stepped up to consider how to drive accountability for user experience across teams in a holistic way, which is what this paper describes further.

I was the user research director of the research team. The team was approximately 24 people in size: 14 user researchers who did much of the iterative work with product teams and also owned particular experiences (some researchers also owned particular projects or research methods that benefited the whole team), two anthropologists, one project manager, one product planner (a role focused on identifying opportunities through working with internal and external partners), two data analysts, and a small development team (4 people) for building tools and managing the instrumentation projects.

The mission for the team was, "Deliver outstanding Windows client and partner experiences that build upon a deep understanding of people". It is important to understand the deliberate decision to use the word 'people' in the statement. So often in usability we focus on "the user," defining the user as the person actually using the system, in contrast to the customer, the person responsible for purchasing the system. However, we realized that we needed to understand many people within the ecosystem in order to deliver the right experience. We also realized that succeeding in delivering on what people perceive to be the Windows experience required assisting partners to understand how to make their part of the experience better. For example, most people experience Windows when they purchase a new computer. The original equipment manufacturer (OEM) is responsible for part of the experience, and when a user sees the desktop for the first time it is a joint responsibility of Windows and the OEM.

One option when we started to work on Windows Vista was to map user researchers to particular teams within the Windows Client Organization and then manage their workload in strict alignment with those teams. However, because we were organized as a central team, we had the opportunity to set our own priorities and focus areas. Our position gave us a unique opportunity to have a perspective across the whole experience. We decided to leverage this position to drive product development from the perspective of holistic customer understanding. Achieving this perspective required us to approach our work differently and invest resources differently. The necessity of getting this rich view shaped the user research framework.
2 User Research Framework

There are many tried and tested research methods for improving usability during a product cycle. It is encouraging that today more user researchers succeed in implementing these methods throughout a cycle, rather than being brought in at the end to validate decisions or create recommendations when time is too short to respond to them. More recently, challenges have arisen as product teams seek to know more about the emotional experiences of users with their products. Research methods have been evolving to accommodate this need, but when we were working on Windows Vista (2002-2006) these methods were less established than they are now; at best, the methods available then were for evaluating finished products, not products in development. How were we to get a sense of the overall emotional experience of Windows Vista three years ahead of the product release?

Another challenge was that the Windows engineering team is very large (i.e., a few thousand people). We knew most individual tasks with Windows Vista would require people to use UI elements produced by several teams, who might be in more than one division. We decided to step up to the challenge by creating a list of tasks to serve as a common reference point for much of our research work. How we created and leveraged these tasks is discussed later in the paper.

So to tackle the research work for Windows Vista we needed to invest in narrowly focused but deep usability activities that aligned with the product teams, and we had to work in lock-step with their schedules. However, to do our jobs well and deliver an outstanding holistic experience, we knew we had to invest in broader, expansive research that tackled some of the challenging new wave of requirements targeting 'experience'. These two types of investment are represented in the first row of the user research framework in Figure 1, which focuses on user research that aims to predict user behavior with the product when it is complete. The first, narrow perspective is the investment in mapping research activities to the requirements of product engineering. The second perspective is broad by comparison, extending across engineering insofar as an experience is encountered throughout the product, or across feature areas as it spans possible feature boundaries. Although the second is a critical investment if user research is to deliver 'experiences', it had less history to guide us in successfully integrating it into the product development cycle.

To achieve a holistic product understanding, we also went outside the product design cycle to study people currently using Windows. We realized that the current use of the product was influenced by how the previous version had been created, and this knowledge provided tremendous value. We invested in instrumentation, survey techniques, and field work throughout the development cycle, and as a result were always able to learn from current user behaviors as one input to informed decision-making. We were careful not to influence the current users we were learning from by revealing information obtained from other users exposed to prototypes or other information that we were using to help assess the future behavior of Windows Vista users.
Fig. 1. User Research Framework
The final perspective used to ground our understanding of people and their behaviors was obtained through an investment in collecting ethnographies. In this area we removed the restrictions of considering our product, and even our company's technologies, from the research brief, and focused on audiences and situations considered to be of future interest. This work provided a rich context for understanding the world in which our finished products would be situated (or not!). The latter two perspectives are key elements in the lower row of the framework. This row is referred to as Real Time because, for the most part, we are not influencing behavior when collecting observations by trialing software or scenarios with users. The lower left cell represents a narrow product perspective, meaning we define the audience we engage with by the product we are interested in, whereas the lower right cell is a life perspective, as we do our best to observe situations and audiences without making a priori decisions as to which products we wish to see used. The framework allowed us to consider how to invest resources in tool and method development, and how to invest our research time. Below, I go through the framework in greater detail.
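One compact way to read the framework is as a mapping from (time, perspective) quadrants to characteristic methods. The sketch below is our paraphrase of the examples discussed in Section 3, not an artifact the team produced.

# The four quadrants of the user research framework, with the example
# methods described in Section 3. Illustrative paraphrase only.
framework = {
    ("product cycle", "narrow"): ["iterative usability testing",
                                  "heuristic evaluation",
                                  "paper prototyping"],
    ("product cycle", "broad"): ["cross-product task list",
                                 "user experience scorecard",
                                 "desirability toolkit"],
    ("real time", "narrow"): ["customer feedback panel",
                              "Send-a-smile",
                              "consumer adoption panel"],
    ("real time", "broad"): ["exploratory ethnographies",
                             "customer engagement site visits"],
}

for (time_dim, perspective), methods in framework.items():
    print("%s / %s: %s" % (time_dim, perspective, ", ".join(methods)))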
3 Applying Windows Vista User Research to the Framework

3.1 Product Cycle – Narrow: Features/Product Area

This is the part of the framework that I feel is best understood, through well-established methods such as iterative usability testing, heuristic evaluations, and paper prototyping. As a team we were significantly invested in this work, which maps most closely to how the engineering teams work; when user researchers (URs) are well integrated with the teams they work with, it is easier for the research to have an impact. The URs were assigned to work with a particular themed area (e.g., Photos & Video, or Storage), which usually mapped to a particular product team (and sometimes
to more than one, as the elements of a themed area were distributed across teams). However, URs were aware that while they worked in detail with the teams on their areas, they were also accountable for driving the broader holistic goals of the product through broader user tasks. This work then fed into the larger experience work. Occasionally a UR might be in conflict with the team with which they worked most closely in order to drive a change that would benefit a high-level task; one of the challenging aspects of being a user researcher is maintaining a trusted relationship with a team while driving a user issue. Product teams who previously may have had a dedicated user researcher for their work had to adjust to the UR driving a broader charter.

3.2 Product Cycle – Broad: User Experiences

Cross-Product Experiences
An operating system supports and enables many different tasks for many different audiences (home users, enterprise, IT specialists). We needed to prioritize which areas of Windows Vista we cared about most. This required setting up criteria to evaluate different tasks. The criteria we considered included task frequency, known task difficulty (based on our previous understanding of customer challenges), and newly enabled tasks. We were able to leverage previous research work from field studies, the lab, and ethnographies to help identify these tasks. We also had to define what a task was and how it differed from a feature. A task is defined in user language; to complete the task the user may use several features. For example, for a user to download 50 photos and send her favorite to her friend in email involves multiple features provided by several different teams (devices, photo download, file management, email setup, email send/receive, add attachment, receive an attachment). Although we were responsible for creating the list of tasks, we also needed buy-in from the individual teams that these were indeed tasks they wanted to address with their features. We had to work with the development teams to create success criteria acceptable to both development and research (e.g., 80% of participants should complete the task successfully), and we had to incorporate some leeway in the success criteria to allow for emotional evaluation of experience and customer site-visit feedback. We established a list of over 160 tasks that we tracked during the development of Windows Vista. This list of tasks provided significant benefit to the development team throughout the development cycle. Frequently a group such as the performance test team would ask for the top scenarios that Windows Vista was targeting, and our list was defined in sufficient detail to be an actionable starting point for responding to such requests. The list of tasks also provided a critical starting point for driving accountability into engineering through the creation of a User Experience Scorecard (Fig. 2).

Fig. 2. Scorecard example

We iterated several times on creating a scoring system that teams would respond to and
that we felt reflected the experience we were on track to ship. This included allowing heuristic assessment of plans and specifications to be incorporated during the early stages of development, and evaluation as the milestones progressed in the development cycle. We used a three-color rating system (red, yellow, green). Because we had detailed task success measures, we used these as the primary basis for assigning the color rating. However, if an additional data source provided insight suggesting a serious user problem, we took that into account in the rating; mostly this would prevent a task that was completed successfully in a lab situation from being rated green if field insights suggested challenges. We made sure that anything less than green was accompanied by actionable bugs to be addressed. At the end of most ship cycles are quality gates that must be met for a product to be released. Typically quality gates are test quality measurements, such as reliability, performance, and security. The rigorous procedures of our scorecarding method enabled us to adapt our scorecard to become part of the quality gate process. From our list of tasks we defined a subset considered 'ship-stoppers': critical tasks for which a failure to meet the criteria would lead to the bugs and issues being examined at a more detailed and senior review level to ensure that things were fixed. The User Experience Scorecard and task list were used to drive many product changes, but identifying and eliminating task seams was a major benefit of the method. There were challenges for the user researchers in driving the issues. Most of the URs worked closely with particular feature teams, but not with all the teams that might contribute to a particular experience, so staying up to date on relevant feature plans required additional effort. This was one of our bets in terms of allocating resources: we decided that the benefit to user experience of investing the time to track experiences across the product, mapped to user tasks, would be greater than additional individual depth in particular niche areas. It was better to make the effort required to work across experiences than to leave users to work across siloed experiences after the product shipped. With this investment we uncovered many seams that might not otherwise have been addressed in the product.

Emotional Connection
We were very much aware that an emotional experience is inextricably tied to satisfaction with a product, especially in the consumer market. At the time of working on Windows Vista we found methods that had been trialed to evaluate desirability, but the challenge was how to use these methods during the development phase and how to make the insights actionable. Benedek and Miner [1], members of the research team, created a desirability toolkit to help us evaluate these experiences. The tool is very simple, but it provided insightful data that the URs and the designers could collaboratively turn into impactful action. After interacting with a product or prototype, a user is asked to select, from a list of words, those words that they associate with the experience. The UR then discusses with the user why they selected particular words; the most important part of the assessment is the user's explanations. We used this tool during lab usability tests, benchmarks, and in the field (with an automated version of the tool).
Miner and Benedek were responsible for mining the themes across the studies and assessing how a particular lab study (or situation) may have influenced the selection of words and explanations. This was another example of how results from the deep product work were used to inform the broader experience of the product.
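Returning to the scorecard described earlier, here is a minimal sketch of its rating logic. The 80% completion criterion is the example given in the text; the lower threshold and the exact field-insight override rule are our assumptions for illustration, since the real criteria were negotiated per task with the development teams.

# Hypothetical sketch of the three-color scorecard rating. The 80%
# success criterion is the example given in the text; the 50% threshold
# and the field-insight override rule are assumptions for illustration.

def rate_task(completion_rate, field_concerns):
    if completion_rate >= 0.80:
        # Lab success can be demoted if field insights suggest problems.
        return "yellow" if field_concerns else "green"
    if completion_rate >= 0.50:
        return "yellow"
    return "red"  # anything below green must carry actionable bugs

print(rate_task(0.85, field_concerns=False))  # green
print(rate_task(0.85, field_concerns=True))   # yellow
print(rate_task(0.40, field_concerns=False))  # red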
Productivity
Early in the development of Windows Vista we were asked what we could do to demonstrate improved productivity with the use of Windows Vista. As we unpacked what productivity meant in the context of Windows Vista use, we realized that it would be a difficult concept to measure for enterprise workers. After exploring the topic further with field representatives who work with our enterprise customers, we learned that they were less interested in demonstrating improved productivity than in knowing how we would assist people in climbing the learning curve as they deployed the new operating system. This insight led to a different approach to understanding how the enterprise learning experience should unfold. The feedback told us that we didn't need to build everything into the product to remove a seam; in this case, a companion experience could solve the problem. We developed an Enterprise Learning Framework (ELF) [2]. Working with enterprise users, we reviewed what should be included in the ELF. It included a timeline (the week before deployment, the day of, the day after, etc.) and the topics that would be relevant to which users at each time. The topics then hooked up to the help system. In working through the topics we leveraged the insights URs had gained from working deep with feature teams to determine what would be useful to mention, or areas in which users might have difficulties. We provided guidance to User Assistance about content to cover, something that might not otherwise have been included. To accompany the website, a whitepaper was produced by Nowicki [3], which leveraged her learning from the research and the creation of the framework. A triumph of the framework was in responding to enterprise customers' requests that it include both Office information and Windows Vista information, since they roll out desktops (Office and Windows), not individual pieces. So again, the investment in tackling productivity as a cross-product experience paid off, rather than asking teams to think about productivity individually.

3.3 Real Time – Narrow: Product

Customer Feedback Panel
We wanted to know a lot about users' behavior with Windows XP. To understand how a very large group of users were using Windows XP, we invested in creating the Windows Customer Feedback Panel [5]. Windows XP itself is not instrumented, so we built a research platform that allowed us to upload data collection tools to PCs over the Internet, which would then collect data from those machines on a regular basis. We recruited participants who were willing to allow us to gather instrumented data from their computers and associate it with other data sets related to them, enabling us to ask follow-up questions. The advantage of leveraging a panel of known users is that we could profile characteristics of usage that applied to particular user groups. We could also survey this set of users as needs arose. Because of the flexibility of the research platform we could adjust the data we were collecting; when new questions came up, we could adjust the data collection tools to provide answers. As with all research, it was important to consider sample bias. Although we were gathering data from more than 10K users, we knew they were slightly more technical than average and were installing the data collection tools on home machines more often than work machines. This research platform allowed us to gather data we had not previously been able to get, and it was extremely good at gathering hardware,
configuration, file arrangements, and installed-application data. Understanding these dimensions of usage aided teams, such as the application compatibility team and the performance team, that we would not have been able to help using our regular user research.

Send-a-smile
By this point we understood what was happening on panelists' PCs; we also wanted to capture spontaneous emotional moments that arise during use. We created a tool called 'Send-a-smile' as part of the customer feedback toolset (Fig. 3). A green smiley and a red frowning face were situated in the system tray (the icons near the clock). When a user had a good moment she could click on the green face, or after a bad moment click on the red face. These would pop up a window with a text field for entering a comment and a screenshot of what was visible on the desktop. Comments and images were returned to us through the feedback tool. It was a very engaging tool to use, but as with all verbose feedback tools it was challenging to review all the feedback and turn it into actionable suggestions or bugs to be entered into the bug database [6, January 2007]. We used Send-a-smile to gather feedback on the use of Windows XP and Vista, but it was product agnostic and was also leveraged by other teams at Microsoft.

Fig. 3. Send-a-smile

Customer Adoption Panel
Windows has extensive beta programs, but most people who participate in them, especially in operating system betas, tend to be relatively technically minded. We knew it was important to include less technical home users in the beta programs to get a rounded view of bugs and feedback on experiences. The research team owned the consumer adoption program for Windows Vista and had participants from throughout the US and overseas [6, January 2007]. The research program, called "Living with Windows Vista", was an opportunity to obtain all the usual bug feedback required from betas while also leveraging our research toolkit to evaluate additional dimensions of experience and use. This panel was relatively small (approximately 30 families) but we had deep engagement with them. The panel was invaluable: not only did it generate unique bugs, but we also used our observations to change features, and several default settings, based on problems encountered.

3.4 Real Time – Broad: Life Studies

Exploratory Ethnographies
The real time–broad cell covers work that is essentially about understanding people without intervention, or with as little intervention as possible. Two anthropologists were on the research team. They were tasked with exploratory work. Their research areas were broad and not necessarily tied to technology; they could consider areas that might benefit from the introduction of technology. This set of work included research
in different geographical locations to understand emerging markets, the digital divide, the relationship between baby-boomers and their parents, the dawn-to-dusk lives of small businesses, and other topics [6, 2005]. Each of the projects was uniquely designed; for example, some were single-day shadowing of participants, others were longitudinal over the course of a year. The challenge with this type of work was to allow sufficient freedom in the research to truly enable discovery about people's lives. The second challenge was how to share the insights from this work with the engineering and product marketing teams. One strength of the work was in creating team-member empathy for people and situations. This led to devising creative ways to communicate the findings, including photo-story narrations at the espresso coffee stand [4], posters in the buildings, and engagement through events related to the populations studied. Not every observation leads to feature improvement, but the work provides the rich perspective on people's lives and contexts that enables team members to see how our products, or potential products, might fit into those lives.

Customer Engagements
Getting product teams involved in site visits is an activity that has been promoted for many years. We invested time in programs that weren't research but were designed to drive empathy with customers. When team members are empathetic with their users, they are more receptive to recommendations from user research. We created programs entitled 'Know-a-Knowledge-Worker' and 'Get-to-Know-an-IT-Pro'. Senior team members and executives were assigned a participant and provided with sufficient guidance to conduct a site visit; they then spent time with a targeted customer to understand what they did in their day-to-day life at work, traveling to interact with them in their work context. The participants were not recruited based on their use of a particular technology, but based on what they did at work. We kept the reporting requirements from the visits to a minimum, as at the end of the day the benefit was having more than 100 people on the team who had experienced what their customers would be doing. It was clear that the visits made an impression, as references to them would come up in discussions during development.
4 Summary

Although I have mapped the research that took place for Windows Vista onto the User Research Framework, it is important to realize that the quadrants didn't act in isolation. It was the rich, integrated insights gained from working in all these ways that provided us with a holistic view of our customers. The framework also provided a way of describing the size of the investment in each quadrant. Teams get anxious when they can't clearly see a connection between research and specific feature impact. Even with this framework, the majority of resources are invested in narrow product work, which is the most obvious opportunity to impact the product; however, we know from our experience that paying attention to the other quadrants has valuable impact on the experience in less obvious ways. Many of the programs and tools established during the Windows Vista development cycle have continued to be used and enhanced by the
Windows 7 team, by other user research teams at Microsoft, and even to assist in the marketing of Windows Vista.
References

1. Benedek, J., Miner, T.: Measuring Desirability: New Methods for Evaluating Desirability in a Usability Lab Setting. In: Usability Professionals' Association 2002 Conference Proceedings (2002), http://www.microsoft.com/usability/UEPostings/DesirabilityToolkit.doc
2. Enterprise Learning Framework, http://www.microsoft.com/technet/desktopdeployment/bdd/elf/welcome.aspx
3. Nowicki, J.: Learning Windows Vista in the Work Place. Microsoft white paper (2006), http://download.microsoft.com/download/d/9/b/d9b9587e-427c-439f-b90c-69a1e643de4c/windows_vista_workforce.doc
4. Steele, N., Lovejoy, T.: Engaging our Audiences through Photostory. Visual Anthropology Review 20(1) (2004)
5. Windows Customer Feedback, http://wfp.microsoft.com/Welcome.aspx
6. Microsoft Presspass: Large-Scale Research Project Aims to Make Windows Vista Useful, Fun for All (January 2007), http://www.microsoft.com/presspass/features/2007/jan07/01-29LivingWithVista.mspx; Making Technology Conform to People's Lives, Interview with Anthropologist Tracey Lovejoy (April 2005), http://www.microsoft.com/presspass/features/2005/apr05/04-04Ethnographer.mspx; Developers Tap Real-Life Families to Find Out What Consumers Really Want from Windows Vista (January 2007), http://www.microsoft.com/presspass/features/2007/jan07/01-29VistaDevelopers.mspx
A Usability Evaluation of Public Icon Interface Sungyoung Yoon, Jonghoon Seo, Joonyoung Yoon, Seungchul Shin, and Tack-Don Han Dept. of Computer Science, Yonsei University, 134, Seodaemun-Gu, Seoul, 120-749, Republic of Korea {freesiz,jonghoon.seo,jyyoon}@msl.yonsei.ac.kr, seungchul.d.shin@samsung.com, hantack@kurene.yonsei.ac.kr
Abstract. Existing image code interfaces need an additional visual marker and an explanation of the service. To overcome these limitations, there has been research on using a public icon as an anchor. The public icon is human-readable and does not need an additional visual marker or explanation. In this paper, we carry out a usability evaluation of the public icon interface with a high-fidelity prototype, in comparison to existing image codes. In addition, we analyze user preferences from the results. From the analysis, we found that the public icon interface is better suited to use in public, because it is familiar to people, does not need additional materials or much cognitive load, and is in good harmony with current environments.

Keywords: Public icon, pictogram, color-based image code, image code, barcode.
2 Analysis

In this section, our goal is to analyze technical issues regarding data type and coverage. Existing barcodes, such as the 1D barcode [1], QR code [2], ColorCode [4], and the pictorial image code [9], put information in the code itself, in the form of an index or full text, and offer various types of service to the user after being decoded. Each image code has its own data type and data size [10]. QR code can store full text of up to 300 characters [11], [12]. ColorCode and the pictorial image code can store an index that can distinguish up to 17 billion items [13]. The data size of these image codes can be increased by attaching extra rows or columns without exceeding technological limits. Therefore, these various types of image codes have enough data coverage as anchors to provide a specific service to the user. On the other hand, the public icon only indicates what it means after being recognized [5]. Additionally, the data capacity of the public icon cannot be increased, in contrast to the other image codes. Therefore, the public icon interface essentially has to reference location information in order to provide an appropriate service to the user. It is reasonable to supplement this lack of stored information in this way, as no two pictograms indicate the same meaning at the same location. However, this method has a potential for error, because some places, such as a bus stop, have two pictograms with the same meaning within the GPS error range. We can remove this potential for error by using context from the user's own mobile device, such as a mobile phone or a PDA. There are many kinds of context in the user's own device: a user profile, schedule, text or multimedia messages, and phone records are good examples. Almost all cases of two bus-stop pictograms within a GPS error range simply indicate opposite directions. Considering the example we already implemented [5] and other context-aware applications [14], it is not difficult to deduce the appropriate direction from the context in the device. Though this method reduces service performance, the service can be expanded for a variety of purposes and becomes convenient to use. Table 1 summarizes this analysis.
Item                       | Data Type | Data Size                                         | Expansion | Weakness
QR code                    | Full-text | Varies by size (about 100 to 300 characters)      | O         | Blurring due to out of focus
ColorCode / Pictorial Code | Index     | Varies by size (17 billion patterns for 5x5 code) | O         | Color variation by illumination
Public Icon                | Index     | Just indicates what it means                      | X         | Shortcoming of data storage, low performance

Fig. 1. Examples of various image codes: (a) barcode, (b) QR code (2D), (c) ColorCode, (d) pictorial image code
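To make the disambiguation step described above concrete, the following is a minimal sketch of how a recognized public icon could be resolved to a single service using GPS plus device context. All names in it (the service records, the destination_direction context field, and the 30 m error radius) are hypothetical illustrations, not part of the implementation reported in [5].

```python
# Hypothetical sketch: resolving a recognized public icon to one service.
# The flat service list, the context fields, and the 30 m GPS error radius
# are illustrative assumptions, not the system described in the paper.

from math import hypot

GPS_ERROR_RADIUS_M = 30.0  # assumed GPS error range

def resolve_service(icon_type, position, services, context):
    """Pick the service of type `icon_type` nearest to `position`.

    services: list of dicts with 'icon', 'pos' (x, y in metres) and
              'direction' (e.g. 'northbound'), standing in for the real
              location-indexed service database.
    context:  device context such as {'destination_direction': 'northbound'},
              drawn from the user's schedule, messages, or call records.
    """
    nearby = [s for s in services
              if s['icon'] == icon_type
              and hypot(s['pos'][0] - position[0],
                        s['pos'][1] - position[1]) <= GPS_ERROR_RADIUS_M]
    if len(nearby) == 1:
        return nearby[0]
    # Ambiguous case, e.g. two bus-stop pictograms for opposite directions
    # within the GPS error range: fall back to context from the device.
    hint = context.get('destination_direction')
    for s in nearby:
        if s.get('direction') == hint:
            return s
    return nearby[0] if nearby else None
```

The context lookup is consulted only when GPS alone is ambiguous, which matches the observation above that the context fallback costs performance and should be a last resort.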
3 High-Fidelity Prototyping and Experimental Evaluation

To evaluate usability, we improved the high-fidelity prototype used in our previous study [5], and then carried out a usability evaluation with eighteen undergraduate students.

3.1 High-Fidelity Prototyping

We evaluated usability with eighteen participants using the high-fidelity prototype. We used a UMPC on which the public icon decoding program implemented in our previous study [5] was installed. The UMPC (SONY VAIO VGN-UX57LNS) has an Intel Core 2 Solo processor (1.2 GHz), 1 GB of DDR2 SDRAM, and a 1.3-megapixel CCD on the back side [15]. For the recognition algorithm we used AForge.NET 1.5.0.0, a C# framework designed for developers and researchers in the fields of computer vision and artificial intelligence [16]. Figure 2 depicts the overall system flow, including the recognition algorithm.
[Figure 2: the recognition algorithm (capture icon, binarization, noise filtering, code-area extraction, rotation, recovery from geometric distortion, down-sampling, with up to three recognition attempts), the framework (user profile, context such as location and time, and user-defined parameters feeding a predefined query maker), and the web service (HTTP request, service daemon, integrity check, loading of service data, HTTP response).]
Fig. 2. Flowchart of the public icon interface system: (top left) recognition algorithm, (top right) framework, (bottom) web service
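As a reading aid for the flowchart, here is a minimal restatement of the recognition loop in Python. Every stage function is a hypothetical placeholder for the corresponding AForge.NET-based image operation in the prototype, and the three-attempt limit mirrors the 'PID < 3?' test in the figure; this is a sketch of the control flow only, not the authors' code.

```python
# Hypothetical restatement of the Fig. 2 recognition loop. The stage
# callables passed in stand for the AForge.NET-based image operations
# used in the actual prototype; none of them are real library calls.

MAX_ATTEMPTS = 3  # corresponds to the "PID < 3?" test in Fig. 2

def recognize_public_icon(capture, binarize, filter_noise, extract_code_area,
                          correct_rotation, recover_distortion, downsample,
                          query_database, http_request):
    for _ in range(MAX_ATTEMPTS):
        image = filter_noise(binarize(capture()))   # preprocess the frame
        region = extract_code_area(image)
        if region is None:
            continue  # no code area found: recapture and try again
        region = recover_distortion(correct_rotation(region))
        code = downsample(region)
        result = query_database(code)  # predefined query maker + database
        if result is not None:
            return http_request(result)  # hand off to the web service
    return None  # recognition failed after three attempts
```

A failed capture or an unmatched database query simply triggers another pass rather than an error, which is how the flowchart's 'No' branches behave.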
In addition, samples of the public icon (Fig. 3), attached to real-size pictogram boards, were provided. We chose three types of sample in order to compare the public icon interface with the other image code interfaces. The first type is a plain public icon. The second is a ColorCode placed on the public icon; in this case, the public icon does not work as a visual marker, but the ColorCode does. We chose ColorCode instead of QR code because ColorCode harmonizes with diverse materials [13]. The last type is a pictorial image code. Fig. 3 shows an example of each of the three types.

3.2 Experimental Evaluation

The eighteen participants in the experiment were undergraduate students not specialized in computer engineering, none of whom had previous experience with an image code interface. All of them owned camera phones. Each was allowed five minutes to experiment. Once a user launches the public icon decoding program (Fig. 4), a real-time preview image is shown. If an icon or an image code appears in the preview image, the user can see the contents related to it by pressing the 'recognition' button. We assumed that samples bearing the same public icon provide the same service.
Fig. 3. Samples of the public icon: (a) Public icon (b) ColorCode on public icon (c) pictorial image code
Fig. 4. Public icon decoding program: (left-top) preview image, (left-bottom) current location, ‘image recognition’ button and user information, (right) service contents
Table 2. Results of the survey (Bad/Low: 0-33.3%, Average/Medium: 33.4-66.6%, Good/High: 66.7-100%)

Item                        | Public Icon | ColorCode + Public Icon | Pictorial Code
Harmony with environments   | Good        | Average                 | Average
Familiarity                 | Good        | Average                 | Bad
Physical effort             | Medium      | Medium                  | Low
Cognitive load              | Medium      | Medium                  | Good
Understanding what it means | Good        | Good                    | Average
Speed performance           | Bad         | Average                 | Average
After the experiment, we surveyed the users' responses with a questionnaire, permitting multiple answers. According to the survey, users felt that the public icon interface is in harmony with its environment and familiar to them. They also replied that the physical effort and cognitive load required to use the public icon interface are no worse than for the others, and that understanding what it means is very easy. However, they reported that the speed performance of the public icon interface is worse than that of the others. Table 2 shows the results.
4 Discussion

Physical Restrictions, Speed Performance. The public icon interface has a drawback in service provision related to the limitation of data size, because the public icon cannot store any information itself, whereas all of the existing image codes can. To compensate for this shortcoming, the public icon interface needs a GPS sensor. This seems reasonable, because in most cases there are no two identical public icons with different purposes at the same location; moreover, it is encouraging that GPS phones have become popular. There are possible exceptions, however, a bus stop being a good example. As mentioned above, we can remove these exceptional cases by using context from the user's own mobile device, though we should bear in mind the drop in performance that this method causes.

Harmony, Cognitive Load, Understanding. The public icon is already familiar to everyone, because most people have seen it for a long time; therefore, people feel that the public icon is in harmony with the environment of its facilities. However, this very familiarity makes its presence harder to notice compared with the other image codes, whose unique patterns or colors tend to attract attention. On the other hand, understanding what the public icon means is easier than for the pictorial image code: even people who have never decoded an image code with a camera phone already use the public icon as a signpost. From this we can infer a weak trade-off between cognitive load and understanding what the icon means.

User Interface. The public icon interface decodes a camera image only once, when the user presses the 'recognition' button. This method lowers the hit rate compared with a decoding method that repeatedly decodes every preview image in
the background. However, an important characteristic of public icons is that several of them are installed together at one location; the current decoding method is therefore appropriate for avoiding decoding a public icon from which the user does not want a service. Additionally, most public icons in the real world (as opposed to those on maps) are mounted in high positions and are large. This makes it possible to recognize a public icon from a distance and to share the service with more people at a time. On the other hand, a user in close proximity has the inconvenience of looking up at a high-mounted public icon through the phone.

For the Public Purpose. As noted several times in this paper, the public icon interface has many merits in comparison with the other image codes, in spite of a few demerits: it is human-readable, familiar, in harmony with the environment, and needs no additional explanation. Considering these advantages, the public icon interface is outstanding for public-oriented context-awareness services. It makes sense that everyone could obtain a public service simply by pointing a camera phone at a public icon after downloading a simple service application; this can also save time and trouble for governments, facilities, and society. However, some users may struggle to predict what service will be provided, and a variety of services may be mapped to a single public icon. In such cases, sufficient advertisement of the service by the facilities in charge would be a good answer.
5 Conclusion and Future Work

In this paper, we have performed a usability evaluation of the public icon interface, which uses a public icon as a visual marker. From the results of the evaluation, we identified disadvantages of the public icon interface, such as the difficulty of confirming which public icon will provide a service and of predicting what service will be provided before decoding. The necessity of GPS is also a drawback, although GPS phones are expected to become popular soon because of their usefulness [17]. In addition, the participants pointed out the lower speed performance compared with the other image codes. On the other hand, the public icon interface needs no additional materials or explanation on facilities to show its meaning or existence; the public icon is already in harmony with the environment; and the public icon is much bigger than the other image codes, so a user can decode it from much farther away. Therefore, numerous users can access the service conveniently, even in a crowded place. From this research, we found that the public icon interface is suitable for public purposes, and we identified what should be improved to use the public icon as a service pointer instead of the image codes. To improve the public icon interface, we plan to research an invisible image code to attach to public icons and a recognition algorithm for multiple public icons.
Acknowledgement This work was performed in the research project 'Mobile Computing Based Context-Awareness Service Framework' (11052), supported by the Seoul R&BD Program.
References 1. Barcodes, ISO Standards (September 2007), http://www.iso.org/iso/en/ CombinedQueryResult.CombinedQueryResult?queryString=bar+code/ 2. Info Plant Conducts Survey on QR Code. DigInfo (September 21, 2005), http://www.diginfo.tv/archives/2005/09/21/ info_plant_conducts_survey_on_2.html (September 2008) 3. Pavlidis, T., Swartz, J., Wang, Y.P.: Information Encoding with Two-Dimensional Bar Codes. IEEE Computer 25(6), 18–28 (1992) 4. ColorCode. ColorZip Media Inc. (September 2008), http://www.colozip.com 5. Kim, D., Shin, S., Yoon, S., Han, T.: Public Icon Communication Service System: A Human Readable Tag Interface for Context-awareness Service. In: The 10th International Conference on Ubiquitous Computing Poster 6. Rohs, M., Roduner, C.: Camera Phones with Pen Input as Annotation Devices. In: Proceedings of the Workshop PERMID (2005) 7. Roduner, C., Rohs, M.: Practical issues in physical sign recognition with mobile devices. In: Strang, T., Cahill, V., Quigley, A. (eds.) Pervasive 2006 Workshop Proceedings (Workshop on Pervasive Mobile Interaction Devices, PERMID 2006), Dublin, Ireland, May 2006, pp. 297–304 (2006) 8. ISO 7001:1990 Public information symbols, ISO Standards, http://www.iso.org/iso/iso_catalogue/catalogue_tc/ catalogue_detail.htm?csnumber=13565 (September 2008) 9. Cheong, C., et al.: Pictorial Image Code: A Color Vision-based Automatic Identification Interface for Mobile Computing Environments. In: Proceedings of the Eighth IEEE Workshop on Mobile Computing Systems and Applications (WMCSA 2007) (February 2007) 10. General EAN.UCC Specification v.6.0 (November 2006), http://www.ean.se/GSV6.0/HTML_Files/ Document_Library/Bar_Code/00103/index.html 11. International Standard ISO/IEC 18004, Information technology: Automatic identification and data capture techniques - Bar code symbology - QR Code, 1st edn., ISO/IEC (2000) 12. Info Plant Conducts Survey on QR Code, DigInfo, http://www.diginfo.tv/archives/2005/09/21/ info_plant_conducts_survey_on_2.html (October 2006) 13. Cheong, C., Kim, D.-C., Han, T.-D.: Usability Evaluation of Designed Image Code Interface for Mobile Computing Environment. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4551, pp. 241–251. Springer, Heidelberg (2007) 14. Chen, G., Lotz, D.: Survey of Context-Aware Mobile Computing Research, Dartmouth Computer Science Technical Report TR2000-381 15. VAIO Online Korea UX57LN/S, http://vaio-online.sony.co.kr/CS/handler/vaio/kr/ VAIOPageViewStart?PageName=notebook/enjoy/ UX57LNSN.icm&ProductID=UX57LNSN 16. AForge.NET Framework, http://www.aforgenet.com/framework/ 17. Kaasinen, E.: User Needs for Location-aware Mobile Services. Personal and Ubiquitous Computing 7(1), 70–79 (2003)
Little Design Up-Front: A Design Science Approach to Integrating Usability into Agile Requirements Engineering Sisira Adikari, Craig McDonald, and John Campbell Faculty of Information Sciences and Engineering, University of Canberra ACT 2601 Australia {Sisira.Adikari,Craig.McDonald,John.Campbell}@canberra.edu.au
Abstract. In recent years, Design Science has gained wide recognition and acceptance as a formal research method in many disciplines, including information systems, yet Design Science research in Human-Computer Interaction remains scarce. HCI is a discipline primarily focused on design, evaluation, and implementation, where design plays a role both as a process and as an artefact. In this paper, we present a design science approach using "Little Design Up Front" to integrate the User-Centred Design perspective into Agile Requirements Engineering. We also present the results of two agile projects that validate the proposition that incorporating a UCD perspective into Agile Software Development improves the design quality of software systems. Keywords: Design Science, Agile Requirements Engineering, Usability.
issues into finished products. As a result, end-user experience and satisfaction are directly affected. In this paper, we present a design science approach using "Little Design Up Front" to integrate the User-Centred Design (UCD) perspective into Agile Requirements Engineering. We also present the results of two agile software projects that validate the proposition that incorporating a UCD perspective into ASD improves the design quality of software systems.
2 Design Science

Design Science is a problem-solving paradigm that aims at creating and evaluating innovative artifacts addressing important and relevant organizational problems [7]. According to March and Smith, there are two fundamental design science processes, 'build' and 'evaluate', and four types of products: 'constructs', 'models', 'methods' and 'instantiations'. A construct forms the vocabulary of a domain, a model is a set of propositions expressing relationships among constructs, a method is a set of steps used to perform a task, and an instantiation is the realization of an artifact in its environment [8].

2.1 Design Science Research for Information Systems

In recent years, design science has gained wide recognition and acceptance as a formal research method in many disciplines, including Information Systems (IS). The Design Science paradigm has its roots in engineering and the sciences of the artificial [9]. Simon distinguished natural science from design science in that the former is concerned with how things are and the latter with how things ought to be [9]. Behavioral science research originates in natural science and aims at developing and justifying theories that explain or predict the organizational and human phenomena surrounding the analysis, design, implementation, management, and use of information systems. Design Science Research (DSR), on the other hand, aims at creating innovations that define ideas, practices, technical capabilities, and products through the analysis, design, implementation, management, and use of information systems [7], [8]. Since creating design solution artifacts for an important problem in Human-Computer Interaction (HCI) is a combined effort of the behavioral science and design science paradigms, the two research paradigms complement each other: behavioral science attempts to "understand" the problem, while design science attempts to "solve" it. According to Iivari [10], design science contrasts with natural-behavioral science research: the latter aims at finding empirical regularities, whilst design science aims at building artifacts. Hevner et al. [7] presented an IS research framework that combines both paradigms for understanding, executing, and evaluating IS research (see Figure 1). In the IS research framework, the Environment defines the scope of the problem domain, which includes organizations, technology, and people. IS Research is the research effort conducted by applying behavioral science, through the use of theories that explain or justify the business problem, and design science, to address the building and evaluation of artifacts designed to meet the identified business need. The Knowledge Base encompasses all the theoretical foundations, including the research methodologies and the kernel theories.
Fig. 1. Information Systems Research Framework [7]
In a recent paper, Hevner [11] further elaborated the IS research framework in terms of three inherent DSR cycles, to enhance the understanding of high-quality DSR in IS. Hevner pointed out that these three research cycles must be present and clearly identifiable in a DSR project. The research cycles within the IS research framework are shown in Figure 2.

Fig. 2. Design Science Research Cycles [11]

According to Hevner, the relevance cycle connects the contextual environment of the research project with the design science activities. Its main focus is to capture the problem to be addressed, or the requirements for the research, and to provide design solution artifacts to the environment for study and evaluation in the application domain. The rigor cycle connects the design science activities with the knowledge base that informs the research project; that is, it ensures innovation by providing existing knowledge to the research. The knowledge base consists of foundations, existing experience and expertise, and existing artifacts and processes. The main focus of the rigor cycle is to provide applicable knowledge for design science activities
and to feed back the updated knowledge to enrich the knowledge base. The internal design cycle iterates between the core activities of building and evaluating the design artifacts and processes of the research. The main focus of the design cycle is to create, evaluate and refine design artifacts until a satisfactory design is achieved. For this research project, we have deployed the information systems research framework together with the DSR cycles (Figures 1 and 2 above).
3 Agile Requirements Engineering and Practice

The main distinction between Agile Requirements Engineering (RE) and traditional RE is that the former welcomes rapidly changing requirements, even late in the software development process, whereas the latter gathers and specifies requirements up front, prior to software development. The dynamic nature of most organizations makes continuously changing requirements normal, so it is difficult to gather and specify complete, stable and accurate requirements up front. Rapid changes in competitive threats, stakeholder preferences, development technology, and time-to-market pressures make pre-specified requirements inappropriate [12]. A recent empirical case study [13] of ten software development organizations identified seven key agile RE practices: face-to-face communication over written specifications, iterative requirements engineering, requirements prioritization, managing requirements change through constant planning, prototyping, test-driven development, and the use of review meetings and acceptance tests. These practices are in line with agile principles [14] such as: satisfy the customer through early and continuous delivery of valuable software; welcome changing requirements even late in development; deliver working software frequently; business and developers work collaboratively throughout the project; build projects around motivated individuals; face-to-face conversation as the most efficient and effective method of communication; working software as the primary measure of progress; promote sustainable development; continuous attention to technical excellence and good design; simplicity; self-organizing teams; and regular reflection to become more effective.
4 User-Centred Design Integration with Software Engineering

In the HCI literature, many user-centric methods and techniques have been proposed to assist the production of usable, useful, and desirable software products [15], [16], [17]. Software product development nevertheless still follows a software development process in which functionality is the main priority. According to the literature, SE and HCI are largely two distinct communities. For the IEEE [18], SE is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software, whereas HCI is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use in a social context, and with the study of the major phenomena surrounding them [19]. Importantly, HCI is by no means considered a central topic in SE, and usability is treated as one of many non-functional requirements and quality attributes [20].
As recently reported in the literature, there is growing interest in incorporating a user-centric perspective into SE practice, so that usability awareness becomes widespread and software products become more user-centred and usable [21], [22]. This integrated approach is known as Human-Centred or User-Centred Software Engineering. Seffah et al. [23] discussed some of the most relevant HCI and SE integration frameworks and highlighted their strengths and weaknesses, as well as their level of objectivity in integrating HCI methods and principles into different software engineering methods. The frameworks they summarized were found to be useful for usability and software specialists interested in the development of methodologies and standards, who have researched or developed specific user-centered design techniques, or who have worked with software development methodologies. Generally, these frameworks provided insights into how to integrate user-centered best practices and user experiences with software engineering methodologies [20]. Discussing the importance of user modeling and usability modeling for user-centred software requirements, Adikari et al. [4] presented a framework for integrating the ISO 13407 process model into a typical software development life cycle. The particular emphasis of the framework was its potential for defining requirements that are more user-centred and task-oriented, with a shorter turnaround time.
5 Little Design Up-Front

Traditional RE stresses that requirements elicitation and specification must be complete up front, prior to software development. Similarly, UCD assumes that contextual research and design take place at the start of the project, providing detailed design information for subsequent development and evaluation. In agile environments, this assumption does not hold. Rather than defining requirements up front, agile software processes follow an evolutionary approach, defining requirements during the course of analysis; this is known as Just-In-Time (JIT) requirements analysis. As far as UCD is concerned, at least a little contextual information must be available to support the creation of design artifacts before proceeding further; a pure JIT design approach is therefore quite difficult and not appropriate for creating UCD-focused artifacts in agile environments. As a practical solution, we propose Little Design Up Front (LDUF): an approach that provides only the UCD information needed to support analysis and design in agile iterations. The objective is to provide just enough LDUF information to support the popular agile JIT analysis and design, so that the UCD perspective can be considered without overloading existing agile practices. The LDUF is drawn from design solutions created in a DSR setting using environmental requirements and applicable knowledge from the knowledge base, as shown in Figure 3. Figure 3 is similar to Figure 2, except that the relevance cycle of Figure 2 has been replaced with Requirements (an input from the environment to the DSR) and Solutions (an output from the DSR to the environment); these changes are in line with Figure 1, where Requirements and Solutions are represented by Business Needs and Application in the Appropriate Environment, respectively. Moreover, the emphasis on Create Little Design Artifacts is shown within the DSR.
Fig. 3. Design Science Research Cycles with LDUF
6 Research Design

This research consisted of two agile projects. The first project was conducted as a baseline against which to compare the project incorporating user-centred design. It was a typical agile project with three iterations, and its research design is shown in Figure 4.
Fig. 4. Research design – Agile project 1
There were three defined roles in project 1: Product Owner, Agile Coach, and Agile Team. The product owner provided abstract-level requirements for both projects and participated in tasks related to product backlog analysis. The agile coach provided direction to the project and was responsible for removing any process impediments. The agile team made the decisions necessary to achieve the goals of each iteration and carried out the software development. The second agile project was directed by a different agile coach, and two user-centred designers worked with a new agile team on the design analysis, providing the LDUF. The research design of the second agile project is shown in Figure 5.

6.1 Research Process

There were different agile teams and agile coaches for projects 1 and 2, and there was no other cross-over of resources except the product owner, who provided the business requirements of an accommodation management system for both projects. The product owner was part of each team and was available in all iterations for requirements verification and validation. Project 1 ran first, with three iterations. The first iteration focused on requirements analysis and setting up the product
backlog. The agile team worked under the guidance and direction of the agile coach to produce working software. At the end of the first iteration, the agile team formally presented the first version of the working software to the product owner for assessment. In consultation and agreement with the product owner, the product backlog was then updated and the second iteration was planned. The second and third iterations were conducted in the same way as the first, under similar agile settings and principles. At the end of the third iteration, the product owner formally assessed the final product delivered by the first project (P1) and signed it off. The second project ran in a similar fashion, except that two user-centred designers were engaged consistently with the team to put forward LDUF for design analysis. They worked very closely with the agile team and the product owner to create and assess paper-based artifacts in support of analysis, verification and validation. At the end of the third iteration, the product owner and the user-centred designers formally assessed the final product delivered by the second project (P2) and signed it off.

Fig. 5. Research design – Agile project 2
7 Product Evaluation

Products P1 and P2 were subjected to one-on-one usability evaluations with 16 participants, randomly drawn from a large pool of users. The evaluation ran
in three stages. First, product P1 was evaluated with the first 8 participants (group U1). Second, product P2 was evaluated with the second 8 participants (group U2), followed by the first 8 participants (U1). Third, product P1 was evaluated with the second 8 participants (U2). We followed this approach to minimize any learning-effect bias in the assessments. We used a number of scenarios to guide each participant through the product and the assigned user tasks. After the evaluation, each participant was given a pack containing the Product Reaction Cards (PRC) [23] and the System Usability Scale (SUS) [24] questionnaire. Participants were asked to refer to the PRC and tick all the words that best described their user experience with the product, then to prioritize the five words they thought most descriptive of the product, and finally to explain why they chose those five words. We used the Product Reaction Cards to help participants think deeply about their interaction experience. Finally, each participant was asked to fill out the SUS questionnaire.
8 Results

A repeated-measures Analysis of Variance (ANOVA) was conducted on the data for each question of the SUS questionnaire for both products. The aim was to determine whether there was a significant difference in agreement between the user groups in relation to their interaction with products P1 and P2. Table 1 shows the mean response values for each product, the statistical significance levels, the difference between mean values, and the percentage change in mean values.

Table 1. Analysed results: Product P1 and P2
According to Table 1, for each question there is a positive difference in user agreement in favour of Product P2. Importantly, the differences for Q3, Q4, and Q7 are statistically significant (taking p < 0.05 as the significance level), indicating that Product P2 is easy to use (Q3) and easy to learn (Q7), and that Product P1 requires additional support to be usable (Q4). Table 2 shows the SUS percentage for P1 and P2 reported by each participant. The mean for P1 is 47.31 and for P2 is 52.95, a difference of 5.64; the relative SUS difference between P1 and P2 is 11.92%. Accordingly, product P2 was found to offer better usability than product P1.
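For reference, SUS scores are computed from the ten questionnaire items using the standard rule from [24]. The sketch below shows that scoring rule together with a paired comparison of per-participant scores for the two products; with only two conditions, a paired t-test is equivalent to the per-question repeated-measures ANOVA used above. This is an illustrative sketch, not the analysis script used in the study.

```python
# Standard SUS scoring (Brooke [24]): odd-numbered items contribute
# (rating - 1), even-numbered items contribute (5 - rating); the sum of
# the ten contributions is multiplied by 2.5 to give a 0-100 score.

from scipy import stats

def sus_score(ratings):
    """ratings: the ten Likert responses (1..5) in SUS item order."""
    assert len(ratings) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # index 0 is item 1 (odd)
                for i, r in enumerate(ratings))
    return 2.5 * total

def compare_products(sus_p1, sus_p2):
    """Paired comparison of per-participant SUS scores for P1 and P2."""
    return stats.ttest_rel(sus_p1, sus_p2)
```

Because every participant eventually used both products, pairing the scores per participant is the appropriate error structure, which is why a repeated-measures design was used rather than an independent-groups test.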
Table 2. SUS values for Product P1 and P2
9 Conclusion

This paper presented the results of two agile projects that validate the proposition that incorporating a User-Centred Design perspective into Agile Software Development improves the design quality of software systems. A design science approach using "Little Design Up Front" was used to integrate the User-Centred Design perspective into the development process. The results show that users find products developed with this approach easier to learn and easier to use, and that such products require less support to be usable.
Acknowledgements We would like to thank Andrew Boyd, Donna Spencer, Dulan De Silva, Evan Laybourne, Narayanan Srinivasan, Rowan Bunning and Sandun Kodithuwakku for their advice and support in this research project.
References 1. Jokela, T.: Guiding Designers to the World of Usability: Determining Usability Requirements through Team Work. In: Human-Centred Software Engineering – Integrating Usability in the Software Development Lifecycle, pp. 61–78 (2004) 2. Sommerville, I.: Software Engineering, p. 122. Pearson Addison-Wesley, England (2004) 3. Bevan, N.: Design for Usability. In: Proceedings of HCI International, pp. 762–767 (1999) 4. Adikari, S., McDonald, C., Lynch, N.: Design Science-Oriented Usability Modelling for Software Requirements. In: Proceedings of HCI International, pp. 373–382 (2007) 5. Kane, D.: Finding a Place for Discount Usability Engineering in Agile Development: Throwing Down the Gauntlet. In: Proceedings of the Agile Development Conference, pp. 40–46 (2003)
6. Düchting, M., Zimmermann, D., Karsten, N.L.: Incorporating User Centered Requirement Engineering into Agile Software Development. In: Proceedings of HCI International, pp. 58–67 (2007) 7. Hevner, A., March, S.T., Park, J., Ram, S.: Design Science Research in Information Systems. MIS Quarterly 28(1), 75–105 (2004) 8. March, S.T., Smith, G.F.: Design and Natural Science Research on Information Technology. Decision Support Systems 15(4), 251–266 (1995) 9. Simon, H.: The Sciences of the Artificial. MIT Press, Cambridge (1996) 10. Iivari, J.: A Paradigmatic Analysis of Information Systems as a Design Science. Scandinavian Journal of Information Systems 19(2), 39–64 (2007) 11. Hevner, A.: A Three Cycle View of Design Science Research. Scandinavian Journal of Information Systems 19(2), 87–92 (2007) 12. Merisalo-Rantanen, H., Tuunanen, T., Rossi, M.: Is Extreme Programming Just Old Wine in New Bottles: A Comparison of Two Cases. Journal of Database Management 16(4), 41–61 (2005) 13. Cao, L., Ramesh, B.: Agile Requirements Engineering Practices: An Empirical Study. IEEE Software 25(1), 60–67 (2008) 14. Manifesto for Agile Software Development, http://agilemanifesto.org/ 15. Nielsen, J.: Usability Engineering. Academic Press, San Diego (1993) 16. Mayhew, D.J.: The Usability Engineering Lifecycle. Morgan Kaufmann, San Francisco (1999) 17. Constantine, L.L., Lockwood, L.A.D.: Software for Use: A Practical Guide to the Models and Methods of Usage-Centered Design. Addison-Wesley, Boston (1999) 18. IEEE: IEEE Std 610.12-1990. IEEE Standard Glossary of Software Engineering Terminology. IEEE, New York (1990) 19. ACM SIGCHI: Curriculum for Human-Computer Interaction. ACM Press, New York (1992) 20. Seffah, A., Desmarais, M.C., Metzker, E.: HCI, Usability and Software Engineering Integration: Present and Future. In: Human-Centered Software Engineering - Integrating Usability in the Software Development Lifecycle, vol. 8. Springer, Heidelberg (2005) 21. Zimmermann, D., Grötzbach, L.: A Requirement Engineering Approach to User Centered Design. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4550, pp. 360–369. Springer, Heidelberg (2007) 22. Seffah, A., Gulliksen, J., Desmarais, M.D. (eds.): Human-Centered Software Engineering - Integrating Usability in the Development Process. Springer, Heidelberg (2005) 23. Benedek, J., Miner, T.: Measuring Desirability: New Methods for Evaluating Desirability in a Usability Lab Setting. In: Proceedings of UPA, Orlando, Florida (2002) 24. Brooke, J.: SUS: A Quick and Dirty Usability Scale. In: Jordan, P.W., McClelland, I.L., Thomas, B. (eds.) Usability Evaluation in Industry, pp. 189–194. Taylor and Francis, London (1996)
Aesthetics in Human-Computer Interaction: Views and Reviews Salah Uddin Ahmed1, Abdullah Al Mahmud2, and Kristin Bergaust3 1 Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway 2 Department of Industrial Design, Eindhoven University of Technology, The Netherlands 3 Faculty of Art, Design and Drama, Oslo University College, Norway salah@idi.ntnu.no, a.al-mahmud@tue.nl, kristin@anart.no
Abstract. There has been growing interest in aesthetics in Human-Computer Interaction (HCI) in recent years. In this article we present a literature review investigating where and how aesthetics has been addressed by HCI researchers. Our objective is to identify the sectors of HCI in which aesthetics has a role to play. Aesthetics can be the common interest that involves both art and technology in HCI research, so that each discipline benefits from the other through mutual interaction. Keywords: Aesthetics, interaction, usability, art and technology.
aesthetics has been used by researchers, in which contexts it is used, and with what meaning. We would like to categorize the field of human-computer interaction into clusters of thematic areas in order to understand what the field looks like when viewed from the standpoint of aesthetics. The objective is to understand the meaning of aesthetics, remove confusion, and understand the uses, applications and possibilities of aesthetics in the context of human-computer interaction. The use of aesthetics in computing, especially in human-computer interaction, has attracted much attention recently. The call for papers of CHI 2008 (Conference on Human Factors in Computing Systems), with its slogan "Art, Science, Balance", shows how well the influence of aesthetics, and of art in general, has lately been recognized by the research community. But as the area is still very new, and as HCI researchers are not experts on art and aesthetics, there is room for misunderstanding. An investigation of our current understanding, and a snapshot of the field viewed from the perspective of aesthetics, will therefore help researchers to better understand the position and role of aesthetics in human-computer interaction. Moreover, aesthetics, as a link between technology and art, can help future researchers open up more possibilities for collaboration between art and human-computer interaction. The rest of the paper is organized as follows. Section 1.1 presents our research background and Section 1.2 describes the research method. Section 2 gives definitions of aesthetics and its importance in HCI. Section 3 provides the results of the review. Section 4 concludes the paper with a discussion and our viewpoints on the results obtained from the review.

1.1 Research Background

The research was conducted as part of the SArt project [1] within the Software Engineering group at the Department of Computer Science of the Norwegian University of Science and Technology. Our ultimate objective is to propose, assess, and improve methods, models, and tools for software development in an art context while facilitating collaboration with artists. As part of the research in SArt, we performed a literature review to conceptualize the intersection of software and art [2]. From that review we discovered that the intersection involves people from diverse backgrounds and interests: art critics, software developers, educators and so on. This is also visible in the software-dependent art projects in which SArt group members have participated. From our experience of working with artists, we have seen that technologists and artists hold different viewpoints: they differ in how they work as well as in how they make meaning of different concepts. In a field involving both technologists and artists, many issues have to be considered, such as having a common language, good collaboration, the coexistence of artistic and technical processes, and the evaluation of products from both technical and aesthetic viewpoints. This work is in line with our investigation of how to improve the collaboration between artists and technologists by working together and learning from each other. As part of that goal, we look here into the field of human-computer interaction to see how aesthetics, a concept from the arts, has
been applied by technologists, and to map the common ground of artists and technologists where aesthetics is the bridge between them.

1.2 Research Method

We conducted a systematic review of the literature published in recent conference proceedings and journals that are easily accessible from the ACM, IEEE and Springer digital libraries. We began the review following Kitchenham's principles [3]. At first we selected papers by reading titles and abstracts; later we modified the process, as we could not find many papers from titles and abstracts alone. We searched each entire article with a keyword search and, when a match was found, read the abstract and the part of the article containing the keyword. We discarded articles that merely mentioned the word without any significant relationship between aesthetics and the main work of the paper; otherwise we read the full article and added it to our final selection. A total of 67 papers were selected from the following top-level conference proceedings, limited to the years indicated:
• Conference on Human Factors in Computing Systems, CHI – years 1997 to 2008
• Conference on Designing Pleasurable Products and Interfaces, DPPI – years 2003, 2005, 2007
• DIS conferences – with no time limit
• TOCHI – with no time limit
• NordiCHI and HCII – with no time limit
The keywords we used were (a) aesthetics and (b) aesthetic. We chose both 'aesthetics' and 'aesthetic' so as not to miss mentions of words such as 'aesthetically'.
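The screening step just described can be summarized programmatically. The sketch below is a simplified illustration of the keyword-based filtering logic; the article records and the is_substantive check are hypothetical stand-ins for the manual title, abstract and full-text reading that was actually performed.

```python
# Simplified illustration of the screening procedure of Section 1.2.
# The record fields and the relevance check stand in for the manual
# reading done by the authors; they are not part of the actual study.

KEYWORDS = ("aesthetics", "aesthetic")  # "aesthetic" also matches derived
                                        # forms such as "aesthetically"

def is_substantive(article):
    # Placeholder for the manual judgement that aesthetics is
    # significantly related to the paper's main contribution.
    return True

def screen(articles):
    """articles: iterable of dicts with 'title', 'abstract', 'fulltext'."""
    selected = []
    for a in articles:
        text = " ".join((a["title"], a["abstract"], a["fulltext"])).lower()
        if not any(k in text for k in KEYWORDS):
            continue  # the keyword never appears: discard immediately
        if is_substantive(a):
            selected.append(a)
    return selected
```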
2 Aesthetics and HCI

New conferences and workshops exploring the consequences of the integration of computing into everyday life are emerging, and new terms are entering the HCI vocabulary: emotion, pleasure, experience, expression, and indeed aesthetics. Aesthetics is increasingly viewed as a key issue with respect to interactive technology. But aesthetics is a general term, often used in association with other terms; here we present what it actually means and how it is defined. The Oxford English Dictionary gives the following definitions of aesthetics: (i) the science that treats the conditions of sensuous perception; and (ii) the philosophy or theory of taste, or of the perception of the beautiful in nature and art [4]. In the preface to the Encyclopedia of Aesthetics, one of the most comprehensive references on this topic, Kelly states [5], "Ask contemporary aestheticians what they do, however, and they are likely to respond that aesthetics is the philosophical analysis of the beliefs, concepts, and theories implicit in the creation, experience, interpretation, or critique of art." The Encyclopædia Britannica defines aesthetics as the philosophical study of the qualities that make something an object of aesthetic interest, and of the nature of aesthetic
value and judgment [6]. To define its subject matter more precisely is, however, very difficult; Britannica notes that such self-definition could be said to have been the major task of modern aesthetics.

2.1 Why Aesthetics

Human-computer interaction started in the computing field and is now extending its scope in many directions by drawing in behavioral science, psychology, sociology and so on. Nevertheless, it has been alleged that human-technology interaction has focused almost exclusively on goal-driven behavior in work settings [7]. As computing expands its domain from the workplace to pervasive and domestic environments, interest in aesthetics in design is increasing within HCI. Gaver and Martin suggest that technology should address non-instrumental user needs, such as surprise, diversion, or intimacy [8]. Jordan proposed a hierarchy of such needs and claimed that, along with the functionality and usability of a system, different aspects of pleasure are important for enhancing the user's interaction with it [9].

2.2 Types of Aesthetics

In the context of HCI, two types of aesthetics are distinguished in [10]: classical aesthetics and expressive aesthetics. Classical aesthetics refers to traditional notions emphasizing orderly and clear design; expressive aesthetics refers to a design's creativity and originality. Studies show that classical aesthetics is perceived more uniformly by users, whereas perceptions of expressive aesthetics can vary with framing effects or with different cultural and contextual stimuli [7]. Several quality dimensions are mentioned in the literature: ergonomic, hedonic, instrumental and non-instrumental. Ergonomic quality comprises quality dimensions related to traditional usability, i.e., efficiency and effectiveness [11]. Hedonic quality comprises quality dimensions with no obvious relation to the task the user wants to accomplish with the system, such as originality, innovativeness, and beauty. Elsewhere, instrumental and non-instrumental quality are used in relation to the perception of user experience; these are essentially the same as ergonomic and hedonic quality [7].
3 Aesthetics in Human-Computer Interaction

Aesthetics comes into play at many stages and in many ways in HCI. In this section we present the different areas and contexts in which aesthetics is mentioned and addressed in the reviewed literature. From the review, we identified the following key areas in which aesthetics is used in HCI: artifact design; system design; attractiveness and look and feel of the user interface (UI); interaction with a system; usability and user experience; and research methods for HCI. The following table lists the articles according to the themes or areas of HCI they address.
Table 1. Thematic subject areas where aesthetics has been addressed in HCI literature

Themes                                 | What is addressed
Artifacts Design                       | Design of artifacts and gadgets, evaluation of artifacts, environment-related design of artifacts (ubiquitous computing)
System Design                          | Software applications, tools, artistic software, games [20], [21], [22]
Attractiveness and Look and Feel of UI | User interface of an application, mobile phone, web sites etc.
Interaction with a system              | Interactive art installations, museum guide, interactive learning system, ATM machines etc.
Usability and User Experience          | Users' feelings, emotion, usability
HCI Research methods                   |
3.1 Artifact Design

With the recent shift from a narrow focus on work to a broader view of interaction, industrial designers, communication designers, and newly minted interaction designers have all begun to play more important roles in the invention and development of new artifacts meant to address a broad set of problems and opportunities [14]. The design of future information appliances may benefit from considering the aesthetic aspects of the gadgets [8]. Digital technologies employed in everyday settings can combine concepts from both work and play, and such devices often act not merely as emulators or information sources but create a new form of appreciation, both conceptual and aesthetic [16]. Often these devices are created by artists and are displayed as interactive or digital art in art galleries.

Evaluation of Artifacts. Aesthetics is an issue not only in the design of these diverse computing devices and artifacts but also in their evaluation. After running a survey on heuristic evaluation of the match between the design of ambient displays and their environments, the authors of [35] assert, "The display should be pleasing when it is placed in the intended setting."

Ubiquitous Computing. Areas such as ubiquitous computing, augmented reality, and physical computing have made it evident that the personal computer is just one of many possible ways to design how humans interact with computers [12]. The design of these devices should be done carefully, so that it considers the contextual qualities of the environment, such as aesthetics, emotions and aspirations, whether the devices are placed indoors in homes or museums or outdoors in public places [15], [36], [34]. For example, one should notice an ambient display because of a change in the data it is presenting, not because its design clashes with its environment [35].
3.2 System Design

By system design, we mean the context of creating new tools or software applications. Hedonic quality plays a substantial role in forming users' judgments of appeal, and it should be explicitly taken into account when designing a software system [11]. In [17], the authors present a technique for creating a new kind of tool for 3D drawing. In creative settings where innovation and novelty are sought, artists and technologists work together in close collaboration.

3.3 Attractiveness and Look and Feel of UI

Aesthetics has been addressed by many articles concerning the attractiveness and look and feel of websites. There is already a transition towards aesthetically pleasing interfaces, and it will continue as more importance is placed on the aesthetics of user interfaces and as proper tools become available to interface designers for creating them [20]. A theoretical framework for assessing the attractiveness of web sites is introduced in [22], whereas aesthetics is considered a criterion for rating web sites in [23]. Aesthetic factors beyond usefulness and traditional usability are increasingly recognized as contributing to the overall success of a product or system [24], [25].

3.4 Interacting with a System

Aesthetics and interaction are interwoven concepts rather than separate entities [26]. In the aesthetics of interaction, the emphasis shifts from an aesthetically controlled appearance to an aesthetically controlled interaction, of which appearance is a part. The aesthetics of interaction moves the focus from ease of use to enjoyment of the experience [27].

Mixed Reality and Virtual Reality. The design of mixed reality or virtual reality devices is driven by many contextual requirements, of which aesthetics is an important part [36]. Artistic association is also important in virtual reality systems: "The curtain rain was chosen for its aesthetic qualities, both in terms of its striking visual image and sound, its asymmetric transparency, and not least, due to the artistic association of projecting a virtual desert into a curtain of water" [19].

Interactive Art. Interactive art is a new kind of art that is highly dependent on technology and user interaction. Such artworks often illustrate interdisciplinary collaboration between research, design, craft and art, and involve interaction with the user in new or innovative ways, as in the case of computational composites [12] or the computational textile kit in [29]. In [28], the authors present a method developed to support the design of innovative interactive technology.

3.5 Usability and User Experience

The use of aesthetics is not always warmly accepted by HCI researchers. In fact, as mentioned in [37], it is often seen by many professionals as inversely proportional to ease of use, i.e., usability. There has been a continuing debate on the conflicting impacts of usability and aesthetics in HCI [38]. Later, however, many researchers worked on
the positive impact of aesthetics. Empirical evidence now shows correlations between the perceived aesthetic quality of a system's user interface and overall user satisfaction [31], [32], leading to claims that aesthetic design can be a more important influence on users' preference than traditional usability [25]. Usability is important, but good aesthetic design can compensate for some usability deficits. Indeed, usability and user experience are related to the appraisal of a system, which depends on both instrumental and non-instrumental qualities [7].

3.6 HCI Research Methods

HCI has emerged as a design-oriented field of research, directed largely towards the innovation, design, and construction of new kinds of information and interaction technology. Three accounts of design theory have been named: the conservative account, the romantic account, and the pragmatic account, of which the pragmatic account is the one that considers issues such as creativity, craft, and aesthetics [33]. HCI researchers have adopted approaches based on the traditions of artist-designers; thus new methods have been developed in HCI, such as cultural probes, whose purpose is to inspire the creation of appropriate, pleasurable, even provocative designs [39].
4 Discussion and Conclusion

From the review we have seen where and how aesthetics has been used in the context of HCI. The outcome gives us a picture of the relationship between aesthetics and HCI. The consideration of aesthetics is visible in many sectors of HCI, from artifact design to research methods for collecting user data or evaluating artifacts. What we see from the review is that the most common use of aesthetics in HCI refers to visual aesthetics or expressive aesthetics. The conflict with usability also arises in the case of expressive aesthetics. We believe the contradiction arose because researchers referred only to visual or expressive aesthetics and compared its effects with usability. But aesthetics as a philosophy is a wide concept, not just the visual or static beauty of interfaces: it refers to the feelings associated with the use of, and interaction with, a system. Seen this way, the aesthetics of interaction does not conflict with usability; rather, usability is part of the aesthetics of interaction. High expressive aesthetics combined with low usability can affect the user's emotion negatively, and thus the aesthetics of interaction. A proper aesthetics of interaction should therefore define where and how expressive aesthetics is included, and how far it acts in alignment with usability and the overall user experience, affecting the user's emotion positively. In this paper, we have brought together different contexts of the use of aesthetics in HCI to present the different meanings and views attached to aesthetics in HCI. As technology advances, giving us more options for expressive and innovative interaction with computers, aesthetics in HCI will receive even more attention. A proper understanding of its meaning, value and impact will help future researchers to be more conscious of the role of aesthetics and possibly eliminate the confusion around it. Aesthetics can in that way bring together
artists, designers and technologists with creativity, inspiration and engagement in a collaborative and multidisciplinary milieu inside human computer interaction.
References 1. SArt Project, at Norwegian University of Science and Technology, http://prosjekt.idi.ntnu.no/sart/ 2. Ahmed, S.U., Jaccheri, L., Trifonova, A., Sindre, G.: Conceptual framework for the intersection of software and art. In: Braman, J., Vincenti, G., Trajkovski, G. (eds.) Handbook of Research on Computational Arts and Creative Informatics, Information Science Reference (2009) 3. Kitchenham, B.: Procedures for Performing Systematic Reviews. Keele University Technical Report TR/SE-0401 and NICTA Technical Report 0400011T.1 (2004) 4. Oxford English Dictionary, http://www.oed.com/ 5. Kelly, M. (ed.): Preface to Encyclopedia of Aesthetics, vol. 1. Oxford University Press, New York (1998) 6. Encyclopedia Britannica, http://www.britannica.com 7. Mahlke, S., Thüring, M.: Studying antecedents of emotional experiences in interactive contexts. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, San Jose, California, USA (2007) 8. Gaver, B., Martin, H.: Alternatives: exploring information appliances through conceptual design proposals. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 209–216. ACM, The Hague, The Netherlands (2000) 9. Jordan, P.W.: Designing pleasurable products. Taylor & Francis, London (2000) 10. Hartmann, J., Angeli, A.D., Sutcliffe, A.: Framing the user experience: information biases on website quality judgement. In: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, pp. 855–864. ACM, Florence, Italy (2008) 11. Hassenzahl, M., Platz, A., Burmester, M., Lehner, K.: Hedonic and ergonomic quality aspects determine a software’s appeal. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 201–208. ACM, The Hague, The Netherlands (2000) 12. Vallgårda, A., Redström, J.: Computational composites. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 513–522. ACM, San Jose, California, USA (2007) 13. Lehn, D.: Engaging constable: revealing art with new technology. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 1485–1494. ACM, San Jose, California, USA (2007) 14. Zimmerman, J., Forlizzi, J., Evenson, S.: Research through design as a method for interaction design research in HCI. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 493–502. ACM Press, San Jose, California, USA (2007) 15. Tolmie, P., Pycock, J., Diggins, T., MacLean, A., Karsenty, A.: Unremarkable computing. In: Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves, pp. 399–406. ACM Press, Minneapolis, Minnesota, USA (2002) 16. Gaver, W., Boucher, A., Law, A., Pennington, S., Bowers, J., Beaver, J., Humble, J., Kerridge, T., Villar, N., Wilkie, A.: Threshold devices: looking out from the home. In: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, pp. 1429–1438. ACM, Florence, Italy (2008)
17. Schkolne, S., Pruett, M., Schröder, P.: Surface drawing: creating organic 3D shapes with the hand and tangible tools. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 261–268. ACM, Seattle, Washington, United States (2001) 18. Santella, A., Agrawala, M., DeCarlo, D., Salesin, D., Cohen, M.: Gaze-based interaction for semi-automatic photo cropping. In: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 771–780. ACM, Montréal, Québec, Canada (2006) 19. Koleva, B., Taylor, I., Benford, S., Fraser, M., Greenhalgh, C., Schnädelbach, H., Lehn, D.v., Heath, C., Row-Farr, J., Adams, M.: Orchestrating a mixed reality performance. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 38– 45. ACM, Seattle, Washington, United States (2001) 20. Grossman, T., Kong, N., Balakrishnan, R.: Modeling pointing at targets of arbitrary shapes. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 463–472. ACM, San Jose, California, USA (2007) 21. Consolvo, S., McDonald, D.W., Toscos, T., Chen, M.Y., Froehlich, J., Harrison, B., Klasnja, P., LaMarca, A., LeGrand, L., Libby, R., Smith, I., Landay, J.A.: Activity sensing in the wild: a field trial of ubifit garden. In: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, pp. 1797–1806. ACM, Florence, Italy (2008) 22. Hartmann, J., Sutcliffe, A., Angeli, A.D.: Investigating attractiveness in web user interfaces. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 387–396. ACM, San Jose, California, USA (2007) 23. Ivory, M.Y., Hearst, M.A.: Statistical profiles of highly-rated web sites. In: Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves, pp. 367–374. ACM, Minneapolis, Minnesota, USA (2002) 24. Green, W.S., Jordan, P.W.: Pleasure With Products: Beyond Usability. Taylor and Francis, New York (2002) 25. Norman, D.A.: Emotional Design: Why We Love (Or Hate) Everyday Things Basic Books (2003) 26. Djajadiningrat, W., Gaver, W., Fres, J.W.: Interaction Relabelling and Extreme Characters: Methods for Exploring Aesthetic Interactions. In: Proceedings of the conference on Designing interactive systems: processes, practices, methods, and techniques (2000) 27. Isbister, K., Höök, K., Sharp, M., Laaksolahti, J.: The sensual evaluation instrument: developing an affective evaluation tool. In: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 1163–1172. ACM, Montréal, Québec, Canada (2006) 28. Ljungblad, S., Holmquist, L.E.: Transfer scenarios: grounding innovation with marginal practices. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 737–746. ACM, San Jose, California, USA (2007) 29. Buechley, L., Eisenberg, M., Catchen, J., Crockett, A.: LilyPad Arduino: using computational textiles to investigate engagement, aesthetics, and diversity in computer science education. In: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, pp. 423–432. ACM, Florence, Italy (2008) 30. Tohidi, M., Buxton, W., Baecker, R., Sellen, A.: Getting the right design and the design right. In: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 1243–1252. ACM, Montréal, Québec, Canada (2006) 31. Tractinsky, N., Katz, A.S., Ikar, D.: What is beautiful is usable. J. 
Interacting with Computers 13, 127–145 (2000) 32. Lindegaard, G., Dudek, C.: What is this evasive beast we call user satisfaction? J. Interacting with Computers 15, 429–452 (2003)
568
S.U. Ahmed, A. Al Mahmud, and K. Bergaust
33. Fallman, D.: Design-oriented human-computer interaction. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 225–232. ACM, Ft. Lauderdale, Florida, USA (2003) 34. Ballegaard, S.A., Hansen, T.R., Kyng, M.: Healthcare in everyday life: designing healthcare services for daily life. In: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, pp. 1807–1816. ACM, Florence, Italy (2008) 35. Mankoff, J., Dey, A.K., Hsieh, G., Kientz, J., Lederer, S., Ames, M.: Heuristic evaluation of ambient displays. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 169–176. ACM, Ft. Lauderdale, Florida, USA (2003) 36. Schnädelbach, H., Koleva, B., Flintham, M., Fraser, M., Izadi, S., Chandler, P., Foster, M., Benford, S., Greenhalgh, C., Rodden, T.: The augurscope: a mixed reality interface for outdoors. In: Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves, pp. 9–16. ACM, Minneapolis, Minnesota, USA (2002) 37. Karvonen, K.: The beauty of simplicity. In: Proceedings on the 2000 conference on Universal Usability, ACM, Arlington, Virginia, United States (2000) 38. Norman, D.A.: The Design of Everyday Things. MIT Press, London (1998) 39. Wolf, T.V., Rode, J.A., Sussman, J., Kellogg, W.A.: Dispelling “design” as the black art of CHI. In: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 521–530. ACM, Montréal, Québec, Canada (2006)
Providing an Efficient Way to Make Desktop Icons Visible
Toshiya Akasaka and Yusaku Okada
Department of Administration Engineering, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522 Japan
{to48_a,okada}@ae.keio.ac.jp
Abstract. Desktop icons allow users to access files and programs quickly. Some users struggle to adapt their window management strategy to secure the visibility of desktop icons. In this paper, we propose an approach that provides users with an efficient way to make desktop icons visible in order to reduce the workload of window management. The approach was developed based on careful consideration of the context in which we aim to help users. The experimental results showed that our approach made the process of making desktop icons visible faster. However, it was not confirmed that the workload of window management was reduced.
Keywords: Desktop icons, Display space management, Desktop environment, Window management.
and avoid automated window functions like maximize, imposing an additional burden on window management. Furthermore, reserving an area for desktop icons would leave a smaller area than the entire screen for placing open windows. In short, the idea of keeping desktop icons always visible does help users access icons quickly, but it also prevents them from concentrating on window management and, eventually, on their primary tasks. The goal of our research is to investigate an efficient way to access desktop icons. Our approach is to develop a method that provides an efficient way to make desktop icons visible. With such a method, users would not only be able to access icons more quickly, but would also be free to place windows over desktop icons, reducing the workload of window management. In this paper, we describe our approach and the experiments conducted to test its effectiveness. First, we describe in detail the context in which we aim to assist users. Then, our approach is described along with some comparisons with other approaches. Finally, we present the results of the experiments testing the effectiveness of our approach.
2 Context
The way to interact with desktop icons differs from user to user. The same user may also manage icons differently in different environments. An excellent way to help a certain user access icons in a particular setting may turn out to be useless once it is brought into use for a different user or in a different environment. We therefore need to confine our approach to a certain context. In this chapter, we describe the context in which we aim to help users make desktop icons visible.
2.1 Icon Management Strategy
Before describing the context, we must define the icon management strategy (IMS). An IMS is a strategy for accessing and placing desktop icons. The following four factors constitute an IMS:
1. How to make desktop icons visible. There are several ways to make desktop icons visible. Moving or iconifying windows is a common one. Special keyboard sequences such as <Windows + D> in Microsoft Windows also make icons visible. Alternatively, some users arrange windows carefully so that windows never cover important icons in the first place.
2. How to select a desktop icon. Usually desktop icons are selected by clicking the mouse, but the keyboard can also be used to select icons if the desktop window has the input focus. Voice instruction and finger touch are also options, as voice recognition and touch panels have gradually become robust.
3. Placements and sizes of desktop icons. Users exhibit various ways of placing desktop icons. The most usual placement is to have icons lined up on either side of the screen. However, some users like to group important icons and place them in an arbitrary area. The size of icons also
shows diversity: mostly, users can choose a favorite size from the three options of normal, small, and large.
4. Types and number of desktop icons. Virtually any file or program can be a desktop icon. Some users only place commonly used programs and temporary files that are useful for their primary tasks. Sometimes, on the other hand, desktop icons are chosen from an aesthetic viewpoint to decorate the desktop. The number of icons also varies greatly, ranging from a few to 30 or more.
Note that the first two factors determine how to access desktop icons, and that the latter two determine how to place them. Although decision making on these four factors is greatly dependent on users, user-independent elements can also affect decisions. For example, as for how to choose desktop icons' placements and sizes, some users like grouping important icons at an arbitrary place. However, they may alternatively have icons lined up on either side of the screen when they use a system with a small monitor or a low screen resolution; with little screen real estate (pixels), desktop icons have a high chance of being covered by windows unless lined up on either side of the screen. Here, the display configuration affects decision making on how to choose desktop icons' placements. Likewise, decision making on the other three factors can be affected not only by users but also by user-independent elements. Figure 1 shows the user-independent elements affecting decision making on the four factors of the icon management strategy and the relations of which element affects which factors. As shown in the figure, the window manager, input device, display configuration, and mode of using the computer are the user-independent elements affecting decision making; hereafter they are called environment elements. The details of the four environment elements are as follows:
1. Window Manager. A window manager is a module of system software providing users with services to control windows. It is responsible for providing users with desktop icons as well as docks, taskbars, program launchers, wallpaper, etc.
2. Input Device. This element relates to what types of input devices are available. A keyboard and mouse is a typical set of input devices, but as mentioned earlier there are other devices as well, such as voice input devices and touch panels.
3. Display Configuration. The display configuration consists of the size and number of monitors as well as the screen resolution.
4. Mode of Using Computer. This element relates to the purpose and authority of a user using the computer. People who use computers at work are likely to pursue high productivity, which is not always the case with people using computers at home. Home users have the right to modify the desktop as they want, while office users on shared personal computers may face some restrictions.
Fig. 1. Each environment element affects several factors of the Icon Management Strategy (IMS). This diagram shows which element contributes to the form of which factor.
These environment elements directly or indirectly affect the decisions made about the four factors of IMS. We already mentioned one example of how the display configuration affects one of the factors. The other elements also affect decision making on the factors. For example, some window managers do not have any special keyboard sequence to make desktop icons visible, affecting decision making on how to make desktop icons visible. Voice instruction, one way to select a desktop icon, is not viable unless a voice input device is available. Users on shared personal computers at work may not be able to create desktop icons as they want, restricting decisions made on the types and number of desktop icons. Likewise, the environment elements affect decision making on the four factors in many other ways, as indicated by the arrows in Figure 1.
2.2 Context of Our Approach
Having defined the icon management strategy, we are now in a position to describe the context in which we aim to help users, that is, what types of icon management strategies we assume. As stated in the introduction, our approach is to provide a new way to make desktop icons visible. This offers a new option for one of the four factors of IMS, namely, "How to make desktop icons visible." Obviously, the three other factors need to be kept consistent in order to evaluate the effect of our approach. However, as we have seen above, decisions on these factors are affected by the environment elements as well. Therefore, we need to specify these elements as well as the three factors. We first specified the environment elements so that the resulting elements would represent a typical office environment. Note that Windows XP (or below), which is
Environment Elements: Window Manager: Windows XP (or below). Input Device: keyboard and mouse. Display Configuration: 14''-24'' single monitor, 1024x768-1600x1200 pixels. Mode of Using Computer: using a personal computer at work.
Factors of IMS: How to select a desktop icon: mouse click. Placements and sizes of desktop icons: lined up on either side, normal size. Types and numbers of desktop icons: programs and temporary files, 10-20.
Fig. 2. The context in which we aim to assist users consists of the environment elements and the three factors of IMS. Arrows between the elements and factors are based on those shown in Figure 1.
actually an operating system, is designated as the window manager. We use the term Windows XP to indicate its window manager, which is tightly integrated with the operating system. Next, we specified the three factors of IMS so that the resulting strategy would be plausible under the specified environment elements. Figure 2 shows the specified elements and factors, which hereafter we collectively call the context.
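As a compact illustration, this context could be recorded as a simple configuration structure; the encoding and all key names below are ours, not part of the paper.

CONTEXT = {
    "environment": {
        "window_manager": "Windows XP (or below)",
        "input_devices": ["keyboard", "mouse"],
        "display": {"monitors": 1, "size_inches": (14, 24),
                    "pixels": ("1024x768", "1600x1200")},
        "mode_of_use": "personal computer at work",
    },
    "ims_factors": {
        "icon_selection": "mouse click",
        "icon_placement": "lined up on either side of the screen",
        "icon_size": "normal",
        "icon_types": ["programs", "temporary files"],
        "icon_count": (10, 20),
    },
}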
3 Approach
The context defined in the previous chapter provided a basis on which we investigated a way to make desktop icons visible without hindering users' window management. To achieve this, we developed an application called the Icon Space Saver (ISS). We decided to develop an application rather than modify the underlying window manager because Windows XP, chosen as part of the context, could not be modified. In this chapter, we describe the ISS and compare it with other possible approaches to helping users access desktop icons.
3.1 Icon Space Saver
The Icon Space Saver (ISS) allows users to designate an area around important desktop icons as the Icon Space (IS). Figure 3 shows how the ISS makes desktop icons visible. When launched, the ISS displays a semi-transparent vertical border on the desktop. The border can be moved only horizontally, by a grab-and-drag operation. The area between the border and the left edge of the screen is the IS. Windows staying within or straddling the border of the IS are automatically tossed out of the IS when the user moves the mouse cursor into any visible part of the IS.
Fig. 3. The Icon Space Saver (ISS) moves windows out of the Icon Space (IS) and brings them back to their original positions automatically; users have only to move the mouse cursor into and out of the IS. The ISS also prevents a maximized window from covering the IS.
After the user clicks an icon and moves the cursor out of the IS, all the tossed windows are brought back to their original positions. With this mechanism, users can make desktop icons visible just by moving the mouse cursor to the IS, requiring minimal additional mouse movement. They do not have to perform any window operations; covering windows move out of the IS and then return to their original positions automatically. The ISS also allows users to use the automated maximize function without covering desktop icons. When the maximize button of a window is pressed, the ISS catches the event and changes the behavior of the window so that it grows as large as possible without covering the IS. This offers users a simple one-click operation to make windows large while maintaining the visibility of desktop icons. (A sketch of this event logic is given at the end of this chapter.)
3.2 Comparisons with Other Approaches
The ISS is not the only possible option to assist users with making desktop icons visible. There are many other existing and potential solutions. In this section, we compare the ISS with some other approaches, and discuss the strengths and possible problems of the ISS.
Special Keyboard Sequence. Windows XP, part of the context of our approach, offers the special keyboard sequence <Windows + D> to hide all windows and make
desktop icons visible. With this function, users do not have to perform any window operation to make icons visible, and it makes all desktop icons visible, which is not always the case with the ISS. Two problems can be pointed out with this function. The first problem, discovered by Hutchings [1], is that it is hard to remember to use the operation, probably because it is not intuitive to type on the keyboard to make the desktop visible. The second problem is that the hidden windows are not restored automatically. This is problematic when the user wants a newly created window to be visible together with the existing windows. The ISS is free from these problems: it is intuitive to move the mouse cursor to the IS to access a desktop icon, and the ISS can restore the window layout automatically.
Quick Launch Tray. Windows XP offers a special interaction place called the quick launch tray, holding very small icons for program launchers. Among those icons is a special icon to make the desktop visible. Located on the taskbar, the quick launch tray is free from being covered by windows, so the icon for making the desktop visible has the advantage of being always accessible. Like <Windows + D>, the icon also has the advantage of making all desktop icons visible. These two advantages cannot be found in the ISS. However, the required operation for this function lacks intuitiveness, just as <Windows + D> does. It also requires the user to move the mouse cursor all the way to the bottom of the screen. The very small size of the icon is also a problem: the well-known Fitts's law [2] (movement time grows roughly as a + b log2(1 + D/W), where D is the distance to the target and W is its width) says that the smaller the target, the longer it takes to place the mouse cursor on it. The ISS, on the other hand, requires minimal additional mouse movement and does not need the cursor to be placed at a particular position.
Machine Learning Scheme. As machine learning algorithms have proven practical, a new approach to developing and improving user interfaces has emerged which incorporates machine learning to adapt the user interface to each individual user. For example, there have been efforts to make the UNIX command-line shell predict the user's next command based on the record of commands issued in the past [3]. A similar approach could potentially be used to assist users with accessing desktop icons: it may be possible to predict the exact file that the user wants to access next and bring the icon for that file to the top, near the current position of the mouse cursor. However, we avoided this approach. One reason is that stable access patterns to desktop icons are unlikely to exist. For commands in the UNIX shell, where programming is one of the most common tasks, some patterns are likely to exist; the context of our approach, in contrast, assumes that users use desktop icons for general purposes, which makes pattern extraction difficult. Our approach, the ISS, may require users to search for the icon they want among many icons, but in return it makes important icons visible without any misprediction.
Compared with other approaches, the ISS thus has some advantages and possible problems. The comparison with the machine learning scheme especially clarifies an important characteristic of the ISS, namely its priority on simplicity and stability over sophisticated technology.
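The following is a minimal sketch, in Python, of the ISS event logic described in Section 3.1. The window-manager facade (wm) and all of its methods are hypothetical stand-ins for the Win32 calls a real implementation would use; only the tossing/restoring behavior itself follows the description above.

class IconSpaceSaver:
    def __init__(self, wm, border_x):
        self.wm = wm                # hypothetical window-manager facade
        self.border_x = border_x    # right edge of the Icon Space (IS)
        self.saved = {}             # window handle -> original left position

    def on_mouse_move(self, x, y):
        if x < self.border_x:       # cursor entered a visible part of the IS
            # Toss every window overlapping the IS out of it, remembering
            # the original position so it can be restored later.
            for w in self.wm.list_windows():
                left = self.wm.window_left(w)
                if left < self.border_x and w not in self.saved:
                    self.saved[w] = left
                    self.wm.move_window(w, new_left=self.border_x)
        elif self.saved:            # cursor left the IS: restore the layout
            for w, left in self.saved.items():
                self.wm.move_window(w, new_left=left)
            self.saved.clear()

    def on_maximize(self, w):
        # Grow the window as large as possible without covering the IS.
        screen_w, screen_h = self.wm.screen_size()
        self.wm.set_geometry(w, left=self.border_x, top=0,
                             width=screen_w - self.border_x, height=screen_h)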
4 Experiment
The aim of the ISS is to provide users with an easy way to make desktop icons visible and, by doing so, to reduce the burden of window management and increase overall productivity. In this chapter, we describe the experiments that we conducted to test whether the ISS really achieves this aim. 14 persons took part in the experiments as subjects; all of them usually used computers in contexts similar to that of our approach. The display of the computer used in the experiments was a single 15-inch monitor with a resolution of 1024 x 768; this configuration was also chosen in line with the context of our approach.
4.1 Efficiency to Make Desktop Icons Visible
The ISS allows users to make desktop icons visible without any window operations and is obviously efficient in terms of workload. However, it needed to be confirmed that the ISS can also make the process of making desktop icons visible faster than window operations. Therefore, we first conducted experiments to examine how much the ISS could reduce the time it takes to make desktop icons visible.
Fig. 4. Without the ISS, task completion times get longer as the number of windows initially covering the desktop icons increases, while the times are stable over the three conditions when using the ISS
The experiment task was to click the desktop icon specified by voice instruction. A task began with a voice instruction and lasted until the subject clicked the specified icon. 17 desktop icons, all representing text files, were lined up on the right edge of the screen. The area around the desktop icons was initially covered by windows, so subjects first needed to move windows away from the area before clicking the specified icon. To move windows without using the ISS, they had to either iconify or grab-and-drag windows; it was up to each subject which of the two operations to use. When using the ISS, on the other hand, subjects were requested to use the ISS's function without performing any window operation. Subjects performed the task under the
three conditions, which differed in the number of windows that initially covered the area around the desktop icons; the number varied from 1 to 3. For each condition, a subject performed the task five times, and each time the task completion time was measured. Figure 4 shows the completion times averaged over all the subjects (14 subjects x 5 trials = 70 trials), with the error bars indicating the 95% confidence intervals. When subjects performed the task without using the ISS (w/o ISS), the task completion time got longer as more windows covered the area around the desktop icons. This rise in time is inevitable, as subjects needed to perform window operations on each window. In contrast, when subjects used the ISS (w/ ISS), the task completion times were stable over the three conditions, with relatively small deviations. This resulted in a performance gain for the conditions of 2 and 3 windows, and the differences were statistically significant at the 0.01 level (paired t-test). However, the ISS did not show any improvement for the condition of one window. In short, with only one window covering the desktop icons the ISS gave the subjects the same level of speed as usual window operations, but the ISS maintained that level even when several windows covered the desktop icons, effectively bringing a performance gain. From these results, we can conclude that the ISS can at least maintain the same level of speed as window operations in making desktop icons visible, and that it raises that level when several windows cover the desktop icons.
4.2 Productivity of Primary Task
Having confirmed that the ISS could make the process of making desktop icons visible faster, we then conducted experiments to examine whether the ISS could improve the overall productivity of a typical task for office workers. The experiment task was to compile a spreadsheet document. Figure 5 shows a typical screenshot of the desktop during the task. In the spreadsheet (lower right in Figure 5) there was a matrix with rows representing persons and columns representing items of information. The task was to fill the matrix by gathering information spread over many files placed on the desktop. The persons in the rows of the matrix were divided into four groups. Pieces of information in files other than the spreadsheet were described by group (upper part of Figure 5). Mappings between persons and groups were not shown in the spreadsheet, but in a separate text file (lower left). This situation caused subjects to keep the text file always visible, usually at the bottom right of the screen, as that place kept the mapping information from being occluded. Consequently, the desktop icons that subjects needed to access to gather information were frequently covered by the text file's window. In addition, subjects needed to manage three windows, as shown in Figure 5, or at least two (the spreadsheet and the text file). This created a situation in which subjects faced a dilemma between maintaining the visibility of desktop icons and managing windows. The 14 subjects were divided into two groups of 7. One group performed the task on a set of files with the ISS, and then did the task on a similar set of files without the ISS. The other group also performed the task twice, but in the reversed order: first without the ISS and then with it. This design was meant to counterbalance the learning effect. For each trial, the task completion time was measured.
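As a side note, the paired significance test used for these comparisons could be reproduced with a few lines of Python; scipy is assumed, and the argument arrays stand for the measured per-trial completion times, which are not reproduced here.

from scipy import stats

def compare_conditions(times_without_iss, times_with_iss, alpha=0.01):
    # Paired t-test over per-trial completion times of the two conditions.
    t, p = stats.ttest_rel(times_without_iss, times_with_iss)
    return p < alpha, t, p   # (significant at alpha?, t statistic, p value)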
Fig. 5. In order to fill the matrix of the spreadsheet, subjects needed to access several desktop icons while keeping the spreadsheet and text file visible
Fig. 6. The ISS did not make a statistically significant difference, although the sampled data showed an improvement of about 5% on average as well as a reduction in deviation (mean completion times: 791 s without the ISS, 751 s with the ISS)
The results are shown in Figure 6. It was not confirmed that the ISS made a statistically significant difference, although the sampled data showed a performance gain of about 5%. In addition, the subjects showed a smaller deviation when using the ISS. As mentioned in chapter 3, the ISS can make desktop icons visible without any window operation. This might have contributed to more stable window management, which in turn led to the smaller deviation.
5 Conclusion
In this study, we developed the Icon Space Saver (ISS), which aims to provide users with an efficient way to make desktop icons visible, thereby reducing the workload of window management. The experimental results showed that the ISS does make the process of making desktop icons visible faster. However, it was not confirmed that the ISS could reduce the workload of window management and raise the overall productivity of primary tasks. Confirming this requires further investigation, which will be the focus of a future study.
References
1. Hutchings, D.R., Stasko, J.: Revisiting Display Space Management: Understanding Current Practice to Inform Next-generation Design. In: Proceedings of Graphics Interface 2004, pp. 127–134 (2004)
2. Fitts, P.M.: The Information Capacity of the Human Motor System in Controlling the Amplitude of Movement. Journal of Experimental Psychology 47(6), 381–391 (1954)
3. Davison, B.D., Hirsh, H.: Experiments in UNIX Command Prediction. Technical Report ML-TR-41, Department of Computer Science, Rutgers University (1997)
An Integration of Task and Use-Case Meta-models
Rémi Bastide
IRIT – Université de Toulouse, ISIS – CUFR J.F. Champollion, Castres, France
Remi.Bastide@irit.fr
Abstract. Although task modeling is a recommended practice in the Human-Computer Interaction community, its acceptance in the Software Engineering community is slow. One likely reason for this is the weak integration between task models and other models commonly used in Software Engineering, notably the set of models promoted by the mainstream UML method. To overcome this problem, we propose to integrate the CTT model of user tasks into the UML, at the meta-model level. CTT task models are used to provide an unambiguous model of the behavior of UML use-cases. By so doing, we also bring the benefit of hierarchical decomposition of use-cases ("extend" and "include" relationships) to CTT. In our approach, CTT tasks also explicitly operate on a UML domain model, by using OCL expressions over a UML object model to express the pre- and post-conditions of tasks.
CTT models [8]) offer several advantages over the latter two, notably due to the richness of the temporal operators available. This is increasingly important, since modern user interfaces (direct-manipulation, multi-modal...) depart from the old-fashioned conversational, question-answer style, and are almost impossible to model with sequence diagrams. It is also a routine practice to develop an analysis model of the business objects of the system under design (the so-called "domain model") early in the development process, in order to precisely identify the business objects, their structure and their mutual relationships. This domain modeling is performed using UML class diagrams, leaving out premature implementation-related considerations. The main point of this paper is to promote CTT task models as the behavioral language for use cases. To this end, we first introduce our view of the expected design process. We then show how the metamodel of CTT can be tightly integrated into the UML metamodel of use-case diagrams, so that the notions of extend and include relationships become meaningful for CTT task models.
2 Design Process
For the sake of efficiency, formal modeling work has to be guided by strong methodological, process-oriented guidelines. A design process defines in which order the various artifacts have to be produced during the software lifecycle, the expected contents of these artifacts, and what information is needed as input and produced as output of the various modeling and design activities. The work presented here deals mainly with the initial phases of the process, namely requirements engineering and preliminary design.
• The goal of the requirements engineering phase is to form a consensus between the stakeholders (mainly the customer and the analysis team) regarding what the problem actually is, and what has to be developed in order to solve it. The main outcome of this phase is a common understanding between the customer and the development team of the problem domain: no work on the solution domain should be performed at this phase.
• Work on the solution domain begins at the preliminary design phase: this is where the first decisions on software architecture are made, and where the best practices of interaction design (in particular iterative prototyping with increasing fidelity level) should be used.
Of course, we do not recommend a strict separation between these two phases: it is quite common that work performed at the preliminary design phase uncovers new, unforeseen insights on the requirements, and that some iteration has to be performed between the two. Although iteration is frequent between these phases, it should always remain clear to the various actors whether they are working on the problem domain (i.e. the requirements) or on the solution domain (i.e. the design). Our claim is that task modeling is especially useful during the requirements engineering phase, and that it nicely complements the domain models and use-case
models that are developed during this phase. At this stage, class models are used to provide an analysis-level model of the domain (they formalize the vocabulary of the business domain), while use-cases and use-case diagrams are used to provide a user-oriented view of the system functionality. The natural language scenarios that are associated with use-cases are essential in easing the construction of a common understanding of the problem between the stakeholders, since they are written in the vocabulary of the business and can be understood and validated by the customer. Our view that task models are essentially a requirements analysis tool contradicts several authors, who recommend using task models at the design phase, for instance to drive the generation of dialogue [6] or abstract interface models. In our approach, requirement task models necessarily remain at a rather abstract level, since at this stage the user interface is not (and should not be) yet under design. It follows that requirement task models should not mention any user-interface specifics: rather, the task models will drive the user-centered design of the UI in the subsequent phases, where the user interface specialist will strive to design an interface that is best suited to the user task, while taking into account the limitations inherent to the target platform of the interactive system. We do not believe that (except maybe in very stereotyped situations, such as business form-filling applications) a satisfactory user interface can be automatically generated from a task model. Rather, in our view, the task model can be used as a test case for the user interface that will be designed using user-centered techniques such as incremental low-fidelity prototyping. To allow for the smooth integration of task models in the software design lifecycle, we propose to integrate task models and use-cases at the meta-model level [5, 14], thus opening the way for efficient use of Model-Driven Engineering (MDE) techniques such as model weaving and model transformation. The process we advocate is inspired by the "essential use-cases" work proposed by Constantine and Lockwood [4] and the work in [13]. In particular, since use-cases are meant to be an input to interaction design, they should be devoid of any specific reference to the user interface; otherwise they would be a premature commitment to a user interface design, before this design has been presented to and validated by users through low-fidelity prototyping. We propose that CTT task models should serve as the behavioral language for use-cases. In this usage of task modeling, task models are meant to provide an abstract view of the user's activity, exploring their goals as well as the temporal and causal structure of their interactions with the system. Task models are thus the formal counterpart of the natural language, narrative descriptions of scenarios that are routinely associated with use-cases, and that are still quite useful: natural language scenarios are ideal to communicate and form consensus with the customer, and can be developed and validated with the customer during brainstorming sessions. Task models, on the other hand, are useful to communicate with the design team, since they convey a precise semantics of the dynamics of human-computer interaction that has to be supported by the software to be produced. Fig. 1 illustrates our view of the early stages of the design process, highlighting the strong bonds between use-cases, the domain model and task models that are the main outcomes of the requirements analysis phase.
Fig. 1. First stages of the design process
3 Related Work
The need to bridge the gap between the current practices of Software Engineering (centered on UML diagrams) and user-centered design (including task analysis and modeling) has been stressed by numerous authors, and a remarkable variety of solutions to this problem has been proposed. The very father of the CTT notation [12] has identified the main trends of work in this field:
• Representing CTT by an existing notation: Nobrega et al. [10], for instance, provide semantics of the temporal operators of CTT in terms of UML activity diagrams. Nunes et al. [9] use the extension mechanisms provided by the UML (profiles, stereotypes) to represent the concepts of CTT in a UML framework.
• Developing automatic converters from UML to task models [6] (and back, we should add). It can be contended that, in the HCI literature, one can find proposals for generators from any kind of model to any other kind.
• Building a new UML for interactive systems "which can be obtained by explicitly inserting CTT in the set of available notations" [10]. This is the trend we follow in this paper, by integrating a metamodel of CTT inside the metamodel of UML itself.
Although we share the goals expressed in [10], our technical proposal is quite different from the one presented there.
− In the first place, we work formally at the metamodel level, whereas only a rough sketch of a solution was provided in [10]. We believe that the explicit use of metamodels brings several fundamental advantages, including the opportunity to use existing MDE tools such as model transformation languages or model weavers to extend the potential use of models. We have demonstrated this advantage in previous work [1], by showing how the notion of human errors can be integrated in task diagrams through the use of error patterns and automatic model transformations.
− Furthermore, it appears that our proposal is almost an "inside-out" reversal of the approach in [10]: the authors proposed a path to transform a use-case diagram (also called a use-case map) into a CTT task model, which could be further refined. On the contrary, we propose to use CTT as a language to specify the behavior of use-cases.
4 Alignment with the UML Use-Case Metamodel
The metamodel of UML use-cases is given in Fig. 2. This is actually the metamodel of use-case maps (diagrams that show the relationships between the use cases for a system), since UML is non-prescriptive as to what a use case actually is, i.e. as to what the behavioral description of a use-case should be.
Fig. 2. The UML use-case metamodel (from [11])
There has been some picky debate amongst specialists over this very metamodel [15]; several of its flaws have been pointed out, and better alternative metamodels have been proposed. Although we mainly agree with these criticisms, we have chosen to stick with the "official" metamodel, since our goal is to be as close as possible to the standard. It should also be noticed that the ill-defined notion of a use-case specialization relationship, formerly available in the UML, has been removed in the current version of the standard. Starting from this "official" metamodel of UML use-cases, we want to cleanly integrate a metamodel of CTT, in order to express that a CTT task model is used to describe the behavior of a use-case, and to show that "include" and "extend" relationships can be expressed over a CTT model. The metamodel of CTT illustrated in Fig. 3 improves on the one we previously published in [1] in several ways:
− Our earlier metamodel used Ecore [14] as the metamodeling language. The one presented here uses UML class diagrams for the same purpose, which allows us to cleanly express its relationships with other elements in the UML metamodel. For instance, it expresses that the notion of Actor in CTT is identical to the same notion in UML use cases. In turn, this enriches CTT with the features available for UML actors (for instance, one can design a specialization hierarchy of actors with increasing responsibilities).
− It explicitly aligns CTT with UML use cases, bringing their structuring features ("include" and "extend" relationships) to CTT.
Fig. 3. A metamodel of CTT integrated in the metamodel of UML (the TaskAllocation enumeration comprises User, Application, Interaction and Abstract)
In Fig. 3, the classes with a white background are imported from the UML metamodel, and should be related to the identical ones in Fig. 2. The classes with a filled background are specific to CTT. Basically, a CTT task model (CttTask) is a tree of nodes (CttNode) which can be related by transitions (CttTransition) that feature one of the CTT temporal operators (Operator). The use-case metamodel of Fig. 2 states that a use-case can have several "extend" and "include" relationships (* cardinality). The cardinalities chosen in our metamodel of CTT in Fig. 3 should be carefully considered:
• Include relationship: a CttNode has 0..1 include relationships, meaning that any CTT node can optionally include another CttTask (which in turn is a tree of CttNodes). This models a classical hierarchical decomposition, which makes it easy to reuse a task model in another one, by simply including it at the proper node. It is natural to allow for a maximum of one inclusion, since otherwise the temporal combination of the included CttTasks would be left undefined.
• Extend relationship: an extend relationship is ternary, relating a base to an extension through one extensionPoint. In our metamodel, a CttNode has an optional extensionPoint, meaning that it can optionally be extended. However, a CttNode can have several extensions, discriminated by condition: BooleanExpression in metaclass Extend (cf. Fig. 2).
It is noteworthy that the metamodel in Fig. 3 conveys the same information as the initial use-case metamodel, and more. For instance, the set of Include relationships for a given use-case (which are actually the relationships appearing on use-case maps) can be computed by exploiting the Include and Extend relationships of Fig. 3 recursively, using the hierarchical composition relationship between CttNodes. The metamodel in Fig. 3 also relates to the domain model, albeit implicitly: the preConditions and postConditions elements in CttNode are meant to be Boolean expressions expressed in OCL (Object Constraint Language) operating on a domain model defined by a UML class diagram. As OCL itself is not part of the UML metamodel, but defined in a separate, language-oriented specification, the relationship between task and domain models is not apparent in Fig. 3, but is nonetheless fundamental.
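To summarize the structure just discussed, the CTT metamodel of Fig. 3 could be rendered as plain classes, as sketched below. The rendering is ours (the paper defines the metamodel as a UML class diagram, not as code), and OCL conditions are held as strings precisely because OCL sits outside the UML metamodel proper.

from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class TaskAllocation(Enum):           # the enumeration shown in Fig. 3
    USER = "User"
    APPLICATION = "Application"
    INTERACTION = "Interaction"
    ABSTRACT = "Abstract"

class Operator(Enum):                 # CTT temporal operators (a subset)
    ENABLING = ">>"
    CHOICE = "[]"
    CONCURRENCY = "|||"

@dataclass
class Extend:                         # ternary: base --extensionPoint--> extension
    condition: str                    # BooleanExpression, as in metaclass Extend
    extension: "CttTask"

@dataclass
class CttNode:
    name: str
    allocation: TaskAllocation
    pre_condition: Optional[str] = None    # OCL over the UML domain model
    post_condition: Optional[str] = None
    include: Optional["CttTask"] = None    # 0..1: hierarchical reuse of a task model
    extensions: List[Extend] = field(default_factory=list)  # 0..*, one point each
    children: List["CttNode"] = field(default_factory=list)

@dataclass
class CttTransition:
    source: CttNode
    target: CttNode
    operator: Operator

@dataclass
class CttTask:                        # a task model: a tree of nodes plus transitions
    root: CttNode
    transitions: List[CttTransition] = field(default_factory=list)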
5 Conclusion
We have presented our view of a design process where task and use-case modeling are tightly integrated during the requirements engineering phase. CTT task models are used to provide an unambiguous description of use-case behavior, complementing natural language scenarios. An integration of CTT into the UML metamodel has also been presented, which opens the way to automatic processing of requirement models to be used in subsequent phases of design and implementation, for instance for test sequence generation.
References
1. Bastide, R., Basnyat, S.: Error Patterns: Systematic Investigation of Deviations in Task Models. In: Coninx, K., Luyten, K., Schneider, K.A. (eds.) TAMODIA 2006. LNCS, vol. 4385, pp. 109–121. Springer, Heidelberg (2007)
2. Cockburn, A.: Writing Effective Use Cases. Addison-Wesley Professional, Reading
3. Constantine, L., Campos, P.: CanonSketch and TaskSketch: innovative modeling tools for usage-centered design. In: OOPSLA 2005: Companion to the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pp. 162–163. ACM, New York (2005)
4. Constantine, L.L., Lockwood, L.A.D.: Structure and Style in Use Cases for User Interface Design. Constantine & Lockwood, Ltd.
5. Limbourg, Q., Pribeanu, C., Vanderdonckt, J.: Towards Uniformed Task Models in a Model-Based Approach. In: Johnson, C. (ed.) DSV-IS 2001. LNCS, vol. 2220, pp. 164–182. Springer, Heidelberg (2001)
6. Luyten, K., Clerckx, T., Coninx, K., Vanderdonckt, J.: Derivation of a dialog model from a task model by activity chain extraction (2003)
7. Montero, F., López-Jaquero, V., Vanderdonckt, J., González, P., Lozano, M.D., Limbourg, Q.: Solving the mapping problem in user interface design by seamless integration in IdealXML. In: Gilroy, S.W., Harrison, M.D. (eds.) DSV-IS 2005. LNCS, vol. 3941, pp. 13–15. Springer, Heidelberg (2006)
8. Mori, G., Paternò, F., Santoro, C.: CTTE: Support for developing and analyzing task models for interactive system design. IEEE Trans. Software Eng. 28(8), 797–813 (2002)
9. Jardim Nunes, N., Falcão e Cunha, J.: Towards a UML profile for interaction design: The Wisdom approach. In: Evans, A., Kent, S., Selic, B. (eds.) UML 2000. LNCS, vol. 1939, pp. 101–116. Springer, Heidelberg (2000)
10. Nóbrega, L., Jardim Nunes, N., Coelho, H.: Mapping ConcurTaskTrees into UML 2.0. In: Gilroy, S.W., Harrison, M.D. (eds.) DSV-IS 2005. LNCS, vol. 3941, pp. 237–248. Springer, Heidelberg (2006)
11. Object Management Group: Unified Modeling Language (UML), version 2.0. Technical report, OMG (2005), http://www.omg.org/technology/documents/formal/uml.htm
12. Paternò, F.: Towards a UML for interactive systems. In: Nigay, L., Little, M.R. (eds.) EHCI 2001. LNCS, vol. 2254, pp. 7–18. Springer, Heidelberg (2001)
13. Rosson, M.B.: Integrating development of task and object models. Commun. ACM 42(1), 49–56 (1999)
14. Stahl, T., Völter, M.: Model-Driven Software Development. Wiley, Chichester (2006)
15. Williams, C., Kaplan, M., Klinger, T., Paradkar, A.M.: Toward engineered, useful use cases. Journal of Object Technology 4(6), 45–57 (2005)
Model-Based Specification and Validation of User Interface Requirements
Birgit Bomsdorf (1) and Daniel Sinnig (2)
(1) Department of Applied Computer Science, Fulda University of Applied Sciences, Germany
(2) Department of Computer Science and Software Engineering, Concordia University, Montreal, Quebec, Canada
birgit.bomsdorf@hs-fulda.de, d_sinnig@cs.concordia.ca
Abstract. Core functional requirements as captured in use case models are too high-level to be meaningful to user interface developers. In this paper we present how use case models can be systematically refined into detailed user interface requirements specifications, captured as task models. We argue that the transition from functional to UI-specific requirements is a semi-formal step which necessitates the experience, skills and domain knowledge of the requirements engineer. In order to facilitate the transition we sketch out an integrated development methodology for use case and task models. Since the engineer is also responsible for establishing conformity between use cases and task models, we also show how this validation can be supported by means of the WTM task model simulator.
Keywords: Requirements specification, use case model, task model, model simulation.
specifications, but does not take into account task model specifications. Sinnig et al. [5] have defined a common semantic model for use case and task models, and propose a formal, but static, refinement relation between the two artifacts. We firmly believe that the requirements engineer should not be exempted from deciding whether or not a task model faithfully refines the use case it is developed from. On the contrary, finding the answer often depends on domain knowledge and properties specific to a project. Often refinement validation cannot be automated but has to be carried out manually by the requirements engineers themselves. In such cases, simulation and animation have proven to be powerful tools, assisting the requirements engineer in assessing the validity and accuracy of development artifacts [11, 12, 13]. Based on the discussion above, the contributions of this paper are twofold: (1) We propose a systematic and integrated development process according to which UI requirements are derived as a logical progression from a functional requirements specification. (2) We demonstrate how our tool, the WTM Simulator [12], assists the requirements engineer in verifying whether a task model is a valid refinement of a given use case model. The remainder of this paper is organized as follows: In Section 2, we sketch out, from a generic point of view, the key characteristics of the development process we propose. Sections 3 and 4 define use case models and task models as means for capturing functional and UI requirements, respectively. In Section 5 we introduce the WebTaskModel (WTM) approach and present its application to verifying conformity between use case and task models. Finally, in Section 6 we conclude and provide an outlook on future avenues. Related work is discussed throughout the paper.
2 Systematic and Integrated Development Process
The basic idea of our current work on a systematic and integrated development process is depicted in Fig. 1. Use cases are used to capture the bare functional requirements of the system, which are afterwards refined into UI-specific requirements by means of a set of task models. Both use cases and task models belong to the family of scenario-based notations, and as such capture sets of usage scenarios of the system. In theory, both notations can be used to describe the same information. In practice, and in our approach, however, use case models capture requirements at a higher level of abstraction whereas task models are more detailed. Ideally, the functional requirements captured in use cases are independent of a particular user interface [7, 14], whereas the refined requirements captured in the task models take into account the specificities of a particular type of user interface and the characteristics of a detailed user role. For example, if the application supports multiple UIs (e.g., Web UI, GUI, mobile, etc.) and multiple user types (e.g., novice user and expert user), then the use case model is instantiated into several task models: one for each "type" of user interface and user. In modern software engineering, the development lifecycle is divided into a series of iterations. Within each iteration, a set of disciplines and associated activities are performed while the resulting artifacts are incrementally refined and perfected. The development of use case and task models is no exception to this rule. On the one hand, ongoing prioritization and filtering activities during the early stages of development will
gradually refine the requirements captured in the use case model. On the other hand, a task model is best developed in a top-down manner, where a coarse-grained task model is gradually refined into a more detailed or more restricted task model. In both cases, it is important to ensure that the refining model is a proper refinement of its respective base model (and all its predecessor models). Validation is an important step of a model-based approach so as to avoid ill-defined or misbehaving models impacting the final design.
Fig. 1. From Functional Requirements to UI Requirements
To illustrate the introduced development process, we use an example that is based on a scenario in which a new web-based Invoice Management System (IMS) is to be developed. It should feature (among others) the following functionalities: “Order Product”, “Cancel Order”, “View Orders”, and “Ship Order”. All the functionalities shall be accessible through a Web UI and should support two user types: New Customer and Registered Customer. As a first step, a functional requirements specification in the form of a use case model is developed, which is shown next.
3 Functional Requirements Specification: Use Cases
Use cases were introduced in the early 90s by Jacobson [15]. He defined a use case as a "specific way of using the system by using some part of the functionality." Modern popularizations of use case models are often attributed to Cockburn [14]. Use case modeling is making its way into mainstream practice as a key activity in the software development process (e.g. the Rational Unified Process [16]), and there is accumulating evidence of significant benefits to customers and developers [17]. A use case model captures the "complete" set of use cases for an application, where each use case specifies possible usage scenarios for a particular functionality offered by the system. Every use case starts with a header section containing various properties (e.g. primary actor, goal, goal level, etc.). The core part of a use case is its main success scenario, which indicates the most common way in which the primary actor can reach his/her goal by using the system. A use case is completed by specifying the use case extensions. These extensions define alternative scenarios which may or may not lead to the fulfillment of the use case goal. An example use case is given in Fig. 2. It captures the interactions for the "Order Product" functionality of the previously mentioned Invoice Management System (IMS). The main success scenario of the use case describes the situation in which the primary actor directly accomplishes his/her goal of ordering a product. The
extensions specify alternative scenarios which may (3a, 6a) or may not (7a) lead to the abandonment of the use case goal. In the next section we show how the "Order Product" use case is refined by UI-specific task models.
Use Case: Order Product
Primary Actor: Customer
Goal: Customer places an order for a specific product.
Level: User-goal
Main Success Scenario:
1. Primary actor browses the product inventory and selects a specific product for purchase.
2. Primary actor specifies the desired quantity.
3. System validates the availability of the product quantity and displays purchase summary.
4. Primary actor provides/validates payment and shipping information.
5. System prompts primary actor to accept the terms and conditions and to confirm the order.
6. Primary actor accepts and confirms.
7. System has the payment authorization unit carry out payment and finalizes order.
8. System confirms and invoices the order.
9. Use case ends successfully.
Extension Points:
3a. The desired product is not available:
3a1. System notifies primary actor that product in desired quantity is not available.
3a2. Use case ends unsuccessfully.
6a. The primary actor cancels the use case:
6a1. Use case ends unsuccessfully.
7a. The payment information is invalid:
7a1. System notifies customer that payment information provided is invalid.
7a2. Use case resumes at step 4.
Fig. 2. “Order Product” Use Case
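For later reference, the use case of Fig. 2 can also be written down as structured data rather than prose; the encoding below is ours and merely foreshadows the machine-readable form that Section 5 derives.

ORDER_PRODUCT = {
    "name": "Order Product",
    "primary_actor": "Customer",
    "goal": "Customer places an order for a specific product.",
    "level": "User-goal",
    "main_success_scenario": {
        "S1": "browse inventory and select a product",
        "S2": "specify the desired quantity",
        "S3": "validate availability and display purchase summary",
        "S4": "provide/validate payment and shipping information",
        "S5": "prompt to accept terms and confirm the order",
        "S6": "accept and confirm",
        "S7": "carry out payment and finalize order",
        "S8": "confirm and invoice the order",
    },
    "extensions": {
        "3a": "product unavailable: notify actor (3a1), end unsuccessfully (3a2)",
        "6a": "actor cancels: end unsuccessfully (6a1)",
        "7a": "payment invalid: notify actor (7a1), resume at step 4 (7a2)",
    },
}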
4 Refined UI Requirements Specification: Task Models
Task modeling is by now a well-understood technique supporting user-centered UI design. The resulting specification is the primary input to the UI design stage in most HCI development approaches. Since we use task models to refine the raw requirements specification given by use cases, several task specifications may be defined for a single use case, one for each type of user interface and/or user type. A task model describes how users will be able to achieve their goals by means of the future application. Furthermore, it also indicates how the system will support the involved (sub)tasks. Several approaches to defining such models exist (e.g., CTT [13], TaO Spec [18], MAD [19] and VTMB [11]). The WebTaskModel (WTM) used here is a further development of our previous work [11] to account more appropriately for characteristics of interactive web applications. The enhancements, however, are applicable to conventional interactive systems as well. In the following we are not going to point out web-specific details, but introduce only those extensions that are relevant to this paper. A more comprehensive overview of WTM can be found in [12, 20]. Fig. 3 shows a subset of a task model refining the "Order Product" use case described above. The task model was specifically developed for a Web UI and the user type New Customer. As usual, the task hierarchy shows the decomposition of a task into its subtasks, which can be of different task types. In the specification of refined UI
requirements we distinguish between cooperation tasks, denoting pieces of work that are performed by the user in conjunction with the application, user tasks, denoting the user parts of the cooperation performed without system intervention, and system tasks, defining pure system parts (each task type is represented by its own symbol in Fig. 3). Abstract tasks, similar to CTT [13] and MAD [19], are compound tasks whose subtasks belong to different task categories.
Fig. 3. “Order Product” Task Model for the role New Customer
The order of task execution is given by temporal relations. In the notation used in the figure, temporal relations are denoted by symbols: one operator defines a selection of subtasks, while >> denotes tasks that are to be performed strictly one after the other in the specified order. The partial task model shown in Fig. 3 specifies the task order product, which is decomposed into the subtasks search for product (according to step S1 of the use case), specify quantity (step S2), feedback (S3 and S3a1) and payment (steps S4 – S8). The task feedback is decomposed into the subtasks display summary, for which we define the precondition C1: product quantity available, and display prod. unavailable, for which we define the precondition NOT C1. Both conditions are derived from the use case extension 3a. Please note that the conditions are not shown in the diagram but were assigned by means of the task property window of the WTM editor (see [20]). The task display prod. unavailable is a so-called stop task. It denotes the premature termination of the scenario and is the task model counterpart to use case step S3a1. In addition to the task model for the role New Customer, a task model for a Registered Customer is compiled. It differs from the presented task model in terms of how the payment task is broken down. Instead of having to provide the shipping and payment information in each case, a registered customer has the option to alter shipping or payment data or to entirely skip the involved subtasks. As seen, different sub-roles lead to slightly different UI requirements. If different UI types were to be supported, the use case model would also be refined into device-specific task models.
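The task tree of Fig. 3 (role New Customer) could be sketched in the same spirit with a small ad hoc Task class; the encoding and field names are ours, since WTM itself is a graphical notation.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Task:
    name: str
    pre: Optional[str] = None      # precondition, e.g. "C1"
    stop: bool = False             # stop task: premature scenario termination
    op: Optional[str] = None       # temporal relation to the next sibling
    children: List["Task"] = field(default_factory=list)

order_product = Task("order product", children=[
    Task("search for product", op=">>"),            # use case step S1
    Task("specify quantity", op=">>"),              # step S2
    Task("feedback", op=">>", children=[            # steps S3 / S3a1
        Task("display summary", pre="C1"),
        Task("display prod. unavailable", pre="NOT C1", stop=True),
    ]),
    Task("payment"),                                # steps S4-S8, not expanded here
])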
5 Tool Supported Validation
As mentioned above, use case models capture requirements at a higher level of abstraction, whereas task models are more detailed, taking into account the specificities of a particular type of user interface and the characteristics of a detailed user role. The question arises whether or not a task model faithfully refines the use case it is based on. The requirements engineer is not exempted from deciding this question, as finding the answer often depends on domain knowledge and project details. In the following we demonstrate how the tool WTM Simulator [12] can be used to check whether a task model is behaviorally equivalent to a given use case. Firstly, use cases are transformed into a formal (machine-readable) representation based on finite state machines. In the WTM approach, task models are represented by a set of task state machines, which are used within the final application as part of the UI controller [21]. Task state machines are also used to simulate task models within the development steps. In the work reported here, a formal correspondence between use case and task models is established to simulate their execution in conjunction. This will be presented by means of a concrete simulation example.
5.1 Mapping Use Cases to UC-FSM
At first, use cases are transformed into a finite state machine representation called UC-FSM. A UC-FSM is a labeled, directed, connected graph, where nodes denote states and edges represent state transitions. In a UC-FSM the execution of a step is denoted by a transition; the transition labels serve as references to the corresponding steps in the original use case description. We believe that UC-FSMs easily and intuitively capture the nature of use cases. As use cases are typically captured in purely narrative form, the derivation of the use case graph will be a manual activity. The composition of the use case graph from a given use case depends on the flow constructs implicitly or explicitly entailed in the use case. Examples of such flow constructs are: jumps (e.g. use case resumes at step X), sequencing information (e.g. the numbering of use case steps), or branches to use case extensions. Concrete details on the mapping process, as well as a slightly more elaborate formal model, can be found in [22].
Fig. 4. Use Case FSM for “Order Product” Use Case
Fig. 4 depicts the corresponding UC-FSM for the "Order Product" use case. As shown, all the steps of the use case are also present in the UC-FSM. Note that starting from the states {quant.selected}, {awaiting confirmation} and {confirmed}, two transitions
are defined, denoting the execution of steps in the main success scenario and, alternatively, the execution of steps defined in the corresponding extensions.

5.2 Task State Machine and UC-FSM Assignment

In WTM each task formally possesses a state machine describing a generic task life cycle (see Fig. 5). For each task the state machine can be extended to specify application-specific task behavior. The rules that are used for this purpose are of the form task.task-state.task-event → action, where task denotes the task whose behavior is extended, and task-state and task-event denote the state and corresponding trigger event upon which the action is to be performed. In the work presented in this paper, this "extension" technique is used to combine task state machines with the UC-FSM. The objective is to specify dependencies between task executions and use case steps.
State     | Meaning
initiated | if all preconditions are fulfilled the task can be started
skipped   | the task is omitted
running   | denotes the actual performance of the task and of its subtasks, if applicable
suspended | the task is interrupted
completed | marks a successful task execution

Fig. 5. Generic Task State Machine (states initiated, skipped, running, suspended, completed, and terminated, connected by the transitions Start, Skip, Restart, Suspend, Resume, End, and Abort)
In order to run a conformance simulation we extend the various task state machines such that they generate the trigger events needed to run the UC-FSM. The specification of the extension rules depends on which tasks are meant to be a refinement of which use case step. Due to the aforementioned difference in levels of abstraction, one use case step is often refined by several tasks. Table 1 (columns 1 and 2) depicts the refinement mapping between use case steps and tasks. Note that abort order product is added since S3a1 is a stop task. The mappings defined by the row of step S4 result from the differentiation of the role Customer in the task model. Column 3 of Table 1 depicts the state of the task state machine responsible for sending the corresponding use case event to the UC-FSM. Examples of rules resulting from Table 1 are:

display summary.completed.on_entry → send S3 to Use Case product order
display prod. unavailable.completed.on_entry → send S3a1 to Use Case product order
                                             → send abort to task order product
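Under the assumptions of the previous sketches, these two example rules could be wired up roughly as follows (again an illustration of the idea, not the WTM rule engine itself):

```python
# Sketch: extension rules task.task-state.task-event -> action, realized as
# callbacks fired when a task state machine enters a state.
class TaskSM:
    def __init__(self, name):
        self.name = name
        self.state = "initiated"
        self.rules = {}                            # {(state, event): [actions]}

    def add_rule(self, state, event, action):
        self.rules.setdefault((state, event), []).append(action)

    def enter(self, new_state):
        self.state = new_state
        for action in self.rules.get((new_state, "on_entry"), []):
            action()

display_summary = TaskSM("display summary")
display_unavailable = TaskSM("display prod. unavailable")
order_product_task = TaskSM("order product")

# display summary.completed.on_entry -> send S3 to use case product order
display_summary.add_rule("completed", "on_entry",
                         lambda: order_product_fsm.fire("S3"))
# display prod. unavailable.completed.on_entry -> send S3a1, then abort order product
display_unavailable.add_rule("completed", "on_entry",
                             lambda: order_product_fsm.fire("S3a1"))
display_unavailable.add_rule("completed", "on_entry",
                             lambda: order_product_task.enter("terminated"))
```

With the UC-FSM in the state quant.selected, display_summary.enter("completed") would move it to prod. available (scenario 1 in Fig. 6), while display_unavailable.enter("completed") would move it to prod. unavailable and terminate order product (scenario 2).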
Finally, we note that the table is created manually by the requirements engineer. In our experience, if the task model was specifically developed based on a given use case specification (as suggested in this paper), the corresponding refinement mappings are clearly defined and hence the construction of the table is a straightforward activity.
Table 1. Refinement Mapping between Use Case Steps and Tasks

Step | Task                                       | Task State
S1   | search for a product                       | completed
S2   | specify quantity                           | completed
S3   | display summary                            | completed
S3a1 | display prod. unavailable                  | completed / abort order product
S4   | New Customer: provide payment information  | completed
     | Registered Customer: alter data            | completed or skipped
S5   | prompt confirmation                        | completed
…    | …                                          | …
5.3 WTM Simulation Tool and Example
In [12] we presented a tool that supports the developer in validating task, role, and task-object models and their behavioral interrelations by means of model simulation. In the tool each task is represented by an icon showing static and dynamic information about the task (such as the task type, temporal relations, and the current state). A context menu attached to each task allows triggering one of the events that are defined by the generic task state machine and are currently valid. The WTM simulator provides the software engineer with different areas implementing several views on the models, e.g., showing the hierarchical task structure, listing all tasks that can be started or ended at the current point in time, respectively, and presenting task objects. Some examples are shown by the screenshots in Fig. 6. Here, the object area shows only USE CASE product order and its state changes resulting from task execution. Please note that modeling use cases as objects is only a workaround, since use case extensions are not yet implemented in the WTM simulator.

In the upper part of Fig. 6 the UC-FSM is in the state quant.selected. Since the condition C1 is fulfilled (see condition area), the task display summary can be performed at this point in time. After its completion the UC-FSM state changes to prod. available and provide shipping information is enabled (indicated by the arrow in Fig. 6). The second scenario shows the unsuccessful run in the case of NOT C1 (defined by C2): once the display task is executed, order product is terminated (thus the startable leaf task area is empty) and the UC-FSM switches to the state prod. unavailable.

During simulation the requirements engineer can check whether or not each task sequence allowed by the task model is a valid scenario according to the use case specification, and vice versa. Furthermore, the simulator also allows one to observe how the steps of a scenario under investigation affect task objects and domain objects, respectively. As in the case of the USE CASE object, the simulator tool represents them in the object area, showing their names, classes, and their manipulations in terms of state changes. Similarly, but not depicted in Fig. 6, a role area shows all defined roles, allowing the investigation of role changes resulting from task execution as well as the disabling and enabling of tasks caused by role changes. For example, the requirements engineer can check the validity of a user registration scenario (by which the role has to change from New Customer to Registered Customer) and its interplay with the use cases and task models, respectively, defined for each role.
Fig. 6. Simulating Task and Use Case Executions (upper part: scenario 1; lower part: scenario 2)
6 Conclusion

In this paper we presented our current work towards an integrated development methodology for the derivation of UI requirements from high-level functional requirements. The development approach reported here consists of two basic steps. First, a use case model is iteratively created to capture core application requirements. Next, the use case model is successively refined into a set of task models. While use cases capture "raw" functional requirements which are independent of a particular user interface, task models capture refined UI-specific requirements which take into account not only the specificities of a particular type of user interface but also the characteristics of a detailed user role. As a result, one use case is typically refined by several task models, one for each UI type or user role. The focus of this paper was on the systematic development of use case and task models. Our approach, however, also takes user roles and involved objects into account; their description has been omitted for the sake of conciseness.

The tool WTM Simulator was used to check conformity between a task model and a given use case model. In particular, we demonstrated how use cases can be translated into a state machine representation and formally combined with the task state machine approach of WTM, which in turn is used as input to the simulator. The results of the simulation guide and assist the developer in deciding whether the task model is a valid refinement of the underlying use case.

The research reported in this paper is the first offspring of a larger project, the goal of which is the establishment of a model-driven UI engineering framework encompassing all phases of the software lifecycle and the involved models. In our next working step we will elaborate the refinement of the functional requirements, e.g., by means of UML activity diagrams. We also aim to further extend the WTM Simulator such that it allows for direct input of structured textual use cases and (semi-)automatically generates refinement mappings between use case steps and tasks.
References

1. Kazman, R., Gunaratne, J., Jerome, B.: Why Can't Software Engineers and HCI Practitioners Work Together? In: Proc. of HCI Intern., Crete, Greece, pp. 504–508 (2003)
2. Ferre, X., Juristo, N., Windl, H., Constantine, L.: Usability basics for software developers. IEEE Software 18(1), 22–29 (2001)
3. Kazman, R., Bass, L., John, B.: Bridging the gaps between software engineering and human-computer interaction. In: Workshop at ICSE 2004, Scotland, UK (2004)
4. Sutcliffe, A.: Convergence or Competition between Software Engineering and Human Computer Interaction. In: Seffah, A., Desmarais, M.C., Metzger, M. (eds.) Human-Centered Software Engineering – Integrating Usability in the Software Development Lifecycle, pp. 71–83. Springer, Heidelberg (2005)
5. Sinnig, D., Chalin, P., Khendek, F.: Common Semantics for Use Cases and Task Models. In: Proc. of Integrated Formal Methods, Oxford, England, pp. 579–598 (2007)
6. Clemmensen, T., Norbjerg, J.: Separation in Theory – Coordination in Practice. In: Workshop Bridging the Gap between Software Engineering and HCI, Portland (2003)
7. Constantine, L.L., Lockwood, L.A.D.: Software for Use: A Practical Guide to the Models and Methods of Usage-Centered Design. Addison-Wesley, Reading (1999)
8. Constantine, L., Biddle, R., Noble, J.: Usage-Centered Design and Software Engineering: Models for Integration. In: Workshop Bridging the Gaps Between SE and HCI, Portland (2003)
9. Kujala, S.: Linking User Needs and Use Case-Driven Requirements Engineering. In: Human-Centered Software Engineering – Integrating Usability in the Development Process, pp. 113–125 (2005)
10. Paternò, F.: Towards a UML for interactive systems. In: Nigay, L., Little, M.R. (eds.) EHCI 2001. LNCS, vol. 2254, pp. 7–18. Springer, Heidelberg (2001)
11. Biere, M., Bomsdorf, B., Szwillus, G.: Specification and Simulation of Task Models with VTMB. In: Proc. of Computer-Human Interaction Conference, pp. 1–2 (1999)
12. Bomsdorf, B.: The WebTaskModel Approach to Web Process Modelling. In: Proc. of Task Models and Diagrams for User Interface Design, Toulouse, France, pp. 240–253 (2007)
13. Paternò, F.: Model-Based Design and Evaluation of Interactive Applications. Springer, Heidelberg (2000)
14. Cockburn, A.: Writing Effective Use Cases. Addison-Wesley, Boston (2001)
15. Jacobson, I.: Object-Oriented Software Engineering: A Use Case Driven Approach. ACM Press (Addison-Wesley), New York (1992)
16. Larman, C.: Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and Iterative Development, 3rd edn. Prentice Hall PTR, Englewood Cliffs (2004)
17. Merrick, P., Barrow, P.: The Rationale for OO Associations in Use Case Modelling. Journal of Object Technology 4(9), 123–142 (2005)
18. Dittmar, A., Forbrig, P., Stoiber, S., Stary, C.: Tool Support for Task Modelling – A Constructive Exploration. In: Proc. of DSV-IS, Hamburg, Germany, pp. 59–76 (2004)
19. Sebillotte, S., Scapin, D.L.: From users' task knowledge to high-level interface specification. International Journal of Human-Computer Interaction 6, 1–15 (1994)
20. Bomsdorf, B.: Modelling Interactive Web Applications: From Usage Modelling towards Navigation Models. In: Proceedings of the 6th International Workshop on Web-Oriented Software Technologies – IWWOST 2007, Como, Italy, pp. 194–208 (2007)
21. Betermieux, S., Bomsdorf, B.: Finalizing dialog models at runtime. In: Baresi, L., Fraternali, P., Houben, G.-J. (eds.) ICWE 2007. LNCS, vol. 4607, pp. 137–151. Springer, Heidelberg (2007)
22. Sinnig, D., Chalin, P., Khendek, F.: LTS Semantics for Use Case Models. In: Proceedings of ACM SAC 2009, Honolulu, HI (to appear, 2009)
A Position Paper on 'Living Laboratories': Rethinking Ecological Designs and Experimentation in Human-Computer Interaction

Ed H. Chi

Palo Alto Research Center, Augmented Social Cognition Group, 3333 Coyote Hill Road, Palo Alto, CA 94304 USA
echi@parc.com
Abstract. HCI has long moved beyond the evaluation setting of a single user sitting in front of a single desktop computer, yet many of our fundamentally held viewpoints about evaluation continue to be ruled by outdated biases derived from this legacy. We need to engage with real users in 'Living Laboratories', in which researchers either adopt or create functioning systems that are used in real settings. These new experimental platforms will greatly enable researchers to conduct evaluations that span many users, places, times, locations, and social factors in ways that were unimaginable before.

Keywords: HCI, Evaluation, Ecological Design, Living Laboratories, Methodology, Web Services.
slightly disillusioned with artificial intelligence research, yet believed computers were great tools for modeling and understanding human cognition. The aim of augmenting human cognition has remained a core value for Human-Computer Interaction research. With this aim, during the formation of the field, the need to establish HCI as a science pushed us to adopt methods from psychology, both because it was convenient and because the methods fit the needs.

The HCI field's rise paralleled the rise of the notion of personal computing: the idea that each person would have one computer at her command. Systems were evolving from many users using a single system to a single user multitasking with her own desktop computer. The costs of these systems forced researchers to think about how users would most productively accomplish knowledge work. The metaphor of the desktop, files, windows, and graphical icons on bitmapped displays arrived naturally. The study of how users would respond to icons flashing on the screen, and how users would move a pointing device like the mouse [2] to move a file from one location to the next, paralleled the stimulus-and-response experiments that psychologists were already routinely conducting. Fitts' law [1, 2], models of human memory [7], and cognitive and behavioral modeling methods like GOMS [1] enabled HCI researchers and practitioners to model a single user interacting with a single computer.
2 Outdated Evaluative Assumptions

Of course, the world has changed. Trends in social computing as well as ubiquitous computing have pushed us to consider research methodologies that are very different from those of the past. In many cases, we can no longer assume:

Only a single display. Users will pay attention to only one display and one computer. Much of fundamental HCI research methodology assumes that the singular occupation of the user is the display in front of them. Of course, this is no longer true. Not only do many users already use multiple displays, they also use tiny displays on cell phones and iPods as well as peripheral displays. Matthews et al. studied the use of peripheral displays, focusing particularly on glanceability, for example. Traditional HCI and psychological experiments typically force users to attend to only one display at a time, often neglecting the purpose of peripheral display designs.

Only knowledge work. Users are performing the task as part of some knowledge work. The problem with this assumption is that non-information-oriented work, such as entertainment applications and social networking systems, is often done without explicit goals in mind. With the rise of Web 2.0 applications and systems, users are often on social systems to kill time, learn the current status of friends, and serendipitously discover what might capture their interests.

Isolated worker. Users are performing some task by themselves. Much of knowledge work turns out to be quite collaborative, perhaps more so than first imagined. The traditional view of HCI assumed the construction of a single report by a single individual that is needed by a hierarchically organized firm. Generally speaking, we have come to view such assumptions with contempt. Information work, especially work done by highly paid analysts, is highly collaborative. Only the highly automated tasks that are
routine and mundane are done in relative isolation. Information workers excel at exception handling, which often requires the collaboration of many departments in different parts of the organizational chart.

Stationary worker. User location is stationary, and the computing device is stationary. A mega-trend in information work is the speed and mobility with which work is done. Workers are geographically dispersed, making collaboration across geographical boundaries and time zones critical. As part of this trend, work is often done on the move, or in the air while disconnected. Moreover, situation awareness is often accomplished via email clients such as Blackberries and iPhones. Many estimates now suggest that more people already access the Internet on their mobile phones than on desktop computers. This has certainly been the trend in Japan, a bellwether of mobile information needs.

Task duration is short. Users are engaged with applications on time scales measured in seconds and minutes. While information work can be divided into many smaller chunks of subgoals that can be analyzed separately, we now realize that many user needs and work goals stretch over long periods of time. User interests in topics as diverse as news on the latest technological gadgets and snow reports for snowboarding need to be supported over periods of days, weeks, months and even years. User engagement with web applications is often measured over much longer periods of time than in more traditional psychological experiments, which were geared toward understanding hand-eye coordination in single desktop application performance. For example, Rowan and Mynatt studied peripheral family portraits in the digital home over a year-long period and discovered that behavior changed with the seasons [14].

The above discussion points to how, as a field, HCI researchers have slowly broken out of the mold in which we were constrained. Increasingly, evaluations are done in situations in which there are just too many uncontrolled conditions and variables. Artificially created environments such as in-lab studies are only capable of telling us about behaviors in constrained situations. In order to understand how users behave across varied times, places, contexts and other situations, we need to systematically re-evaluate our research methodologies.
3 Re-thinking Evaluations

Fundamentally, traditional HCI research is bursting at the seams in two different ways: (1) ubiquitous computing research is challenging the notion of personal computing in front of a desktop, looking at computation that is embedded in the environment as well as computation done with ever more powerful devices that can be taken along while mobile [3, 4]; (2) social computing research is simultaneously challenging the notion of computing systems designed for the individual, instead of for a group or community [6, 12]. Both trends have required re-thinking our evaluation methodologies. Traditional CSCW research has already drawn on qualitative methodologies from the social sciences, including field observations and interviews, diary studies, survey methods, as well as focus groups and direct participation. Ubicomp, on the other hand, has used
a mixture of methods, but has more readily examined actual deployments with real users in the field.

In either case, it may be time for us to fundamentally re-think how HCI researchers ought to perform evaluations, as well as the goals of those evaluations. Since HCI systems are increasingly designed not for a single person but for a whole group, we need research that augments not just human intelligence but also group intelligence and social intelligence. Indeed, a natural extension of research in augmenting human intellect is the development of technologies that augment social intelligence, led by research in the Social Web and Web 2.0 movements. Traditional CSCW research has already studied the needs of coordination for a group and, to some extent, a community of practice. Many researchers are now conducting research in a social context, in which factors are less easy to isolate and control in the lab. Some research in the past might have treated variations in social contexts as part of the noise of the overall experiment, but this is clearly unsatisfactory, since larger subject pools are then necessary to overcome the loss in the power of the experiment. Moreover, we now know that many social factors follow distributions that are not normally distributed, making the prediction of individual factors in greatly varying social situations difficult, if not impossible.

Since users now interact with computing systems in varied ubiquitous contexts, ecological validity is often much more important than studying factors in isolation. In ubicomp applications, for example, productivity measurements are often not the only metrics that are important. Adoption of mobile applications, for instance, is now often cited as evidence of the usefulness of an application. One might argue that if using an application results in no productivity increase, then the fact that there is adoption of the application is irrelevant. However, this view is short-sighted, because the opposite is also true: if there is a productivity increase from using the application but there is no adoption (perhaps due to ease-of-use issues, for example), then it is also unclear what benefit the application will ultimately bring. Obviously, the best situation is to have both productivity improvements and real adoption. However, research resource constraints often conspire against achieving both. Interestingly, academic research tends to focus on the former rather than the latter, increasing the perceived gulf between the academics' ivory tower and the trenches of the practitioners.

An example that illustrates this gulf is the set of studies around color copiers and printers. It has been circulated here at PARC that researchers had studied the need for color output from copiers and printers, and had concluded that there was either negligible or no productivity increase from using color. Cost-benefit analysis showed that black-and-white copiers were often just as good and more economical than color copiers in the majority of cases. While it is unclear whether the studies took into account that increased use of color in various media might drive future demand for and utility of color systems, what is clear now is that the adoption of color copiers and printers occurred independent of productivity studies.
If what matters in industry is the adoption of technology, while academic research remains focused on measurements of productivity, we will never bring the two communities together, and technology transfer will forever remain challenging.
4 Evaluations Using 'Living Laboratories'

The Augmented Social Cognition group has been a proponent of the idea of the 'Living Laboratory' within PARC. The idea is that, in order to bridge the gulf between academic models of science and practical research, we need to conduct research within living laboratories. Many of these living laboratories are real platforms and services that researchers build and maintain, and, just like Google Labs or beta software, they may remain somewhat unreliable and experimental, yet useful and real. The idea is to engage real users in ecologically valid situations, while gathering data and building models of social behavior.

Looking at two different dimensions in which HCI researchers could conduct evaluations, one dimension is whether the system is under the control of the researcher or not. Typically, computing scientists build systems and want them evaluated for effectiveness. The other dimension is whether the study is conducted in the laboratory or in the wild. These two dimensions interact to form four different ways of conducting evaluations:

(1) Building a system, and studying it in the laboratory. This is the most traditional approach in HCI research and the one that is typically favored by CHI conference paper reviewers. The problems with this approach are that it is extremely time-consuming and that its experiments are not always ecologically valid. As mentioned before, it is extremely difficult, if not impossible, to design laboratory experiments for many social and mobile applications that are ecologically valid.

(2) Not building a system (but adopting one), and still studying it in the laboratory. For example, this is possible by taking existing systems, such as Microsoft Word and iWork Pages, and comparing the features of these two systems.

(3) Adopting an existing system, and studying it in the wild. The advantage here is to study real applications that are being used in ecologically valid situations. The disadvantage is that findings are often not comparable, since factors are harder to isolate. On the other hand, real findings can be immediately applied to the live system. The impact of the research is real, since adoption issues are already removed. As illustrated below, we have studied Wikipedia usage in detail using this method.

(4) Building a system, releasing it, and studying it in the wild. A well-publicized use of this approach is Google's A/B testing. According to Google, A/B testing allowed them to finely tune the Search Engine Result Pages (SERPs). Some details about this kind of online A/B experiment have been documented [8]. For example, how many search results a page should contain was studied carefully by varying the number across a great number of users. Because the subject pool is large, Google can say with some certainty which design is better on their running system. A major disadvantage of this approach is the effort and resources it takes to build and study such systems. However, for economically interesting applications
such as Web search engines, the tight integration between system and usage actually shortens the time to innovate between product versions.

Of these variations, (3) and (4) are what we consider to be 'Living Laboratory' studies.
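As a rough illustration of the mechanics behind such online experiments (our own sketch, not Google's actual infrastructure), all that is strictly needed is a stable assignment of users to variants and a per-variant metric log:

```python
# Minimal A/B sketch: hash each user id to a stable bucket, vary one design
# parameter per bucket, and compare an engagement metric across buckets.
import hashlib
from collections import defaultdict

RESULTS_PER_PAGE = {"A": 10, "B": 20}          # the design parameter under test

def variant_for(user_id: str) -> str:
    digest = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return "A" if digest % 2 == 0 else "B"

clicks = defaultdict(list)

def log_interaction(user_id: str, clicked_result: bool):
    clicks[variant_for(user_id)].append(1.0 if clicked_result else 0.0)

def click_through_rates():
    return {v: sum(xs) / len(xs) for v, xs in clicks.items() if xs}
```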
5 Examples of Living Laboratory Style Research

Here we illustrate how to conduct Living Laboratory studies with some examples.

GroupLens and MovieLens. First, an example of building a real system, releasing it, and studying it in the wild is the seminal work of the GroupLens [9] research group at the University of Minnesota. GroupLens was first created to deal with information overload, particularly the high volume of traffic in Usenet news. In this way, GroupLens was hoping to adopt an existing community and system, augment it with some technology, and study how the technology performed in the wild. The technology in question was collaborative filtering. The idea at the time was related to user profiling: users expressing interest in the same items must be somewhat similar and can form a virtual neighborhood; therefore, we can recommend to them items that their neighbors are interested in. The research group was somewhat successful in doing this, as enough users on Usenet news adopted the technology and provided feedback on the system. Later, the research group built a movie recommendation site on the Web that used similar collaborative filtering algorithms, called MovieLens [10]. The website retained a community of about 6,000 users that became an ecosystem in itself. Someone volunteered to keep the movie database up to date, and some participated in discussions about features the recommendation system should have. Later research on specific recommendation algorithms often split users into groups temporarily, where one group might receive one treatment while the other would receive another. The results were then compared to see how the two groups differed, including whether they evolved different group behaviors.
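The neighborhood idea can be stated in a few lines. The following toy sketch (ours, far simpler than GroupLens's actual algorithms) scores candidate items by how many of a user's nearest neighbors liked them:

```python
# Toy user-based collaborative filtering: neighbors share liked items;
# recommend what neighbors liked that the target user has not yet seen.
def recommend(user, ratings, k=3):
    """ratings: {user: set(liked_items)}"""
    overlap = lambda other: len(ratings[user] & ratings[other])
    neighbors = sorted((u for u in ratings if u != user),
                       key=overlap, reverse=True)[:k]
    scores = {}
    for n in neighbors:
        for item in ratings[n] - ratings[user]:
            scores[item] = scores.get(item, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)

ratings = {"ann": {"alien", "heat"},
           "bob": {"alien", "heat", "brazil"},
           "cam": {"alien", "brazil"},
           "dee": {"heat", "speed"}}
print(recommend("ann", ratings))               # ['brazil', 'speed']
```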
Fig. 1. The MovieLens system is an academic project with a live community
Games with a Purpose (gwap.com). Luis von Ahn's work on ESP games has evolved into a highly intriguing site called Games with a Purpose (gwap.com). On this site, users can engage in mini-games that are fun in themselves, but the games also end up collecting data that is useful in some other way. One well-known example is the image labeler, in which two users (without other means of communication) must agree on the same keyword to receive points. The objective is to agree on labels for as many images as possible in a given timeframe. Here the aim is to engage real users in realistic contexts, in which the goal is to entertain the user while gathering behavioral data that tells us something about the images. One can then analyze word choices over many data points, collective action (including any attempts at cheating), as well as longitudinal issues like the number of repeat visits, the diversity of users, or the virality of the game. Engagement measures, such as stickiness, can be directly measured.
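A toy version of the matching rule (our sketch, not gwap.com's implementation) shows how agreement turns play into labels:

```python
# Sketch of ESP-game matching: two players label the same image independently;
# the first keyword entered by one player that the other has also entered wins.
def match(labels_a, labels_b):
    seen_a, seen_b = set(), set()
    for a, b in zip(labels_a, labels_b):
        if a == b or a in seen_b:
            return a                           # agreed label becomes image metadata
        if b in seen_a:
            return b
        seen_a.add(a)
        seen_b.add(b)
    return None                                # no agreement within the round

print(match(["dog", "grass", "frisbee"], ["park", "frisbee", "dog"]))  # frisbee
```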
Fig. 2. The Games With A Purpose (gwap.com) website engages real users with games, while having them accomplish some task that is useful for research
WikiScanner / WikiDashboard over Wikipedia. One realistic approach is to adopt an existing community and system, create mashup applications that augment the original system with some new capability, and study their effects. For example, wikis are collaborative systems in which virtually anyone can edit anything. Although wikis have become highly popular in many domains, their mutable nature often leads them to be distrusted as a reliable source of information. Virgil Griffith, for example, took openly available data from Wikipedia and enabled people to discover the possible identities of Wikipedia editors by cross-referencing IP addresses with institution names. Our own research on social transparency also took this approach. We downloaded a copy of all of the edits on Wikipedia and tabulated the editing statistics for all articles and all users. This enabled us to create a visualization of the editing patterns for
each article and each user [13]. WikiDashboard has received tens of thousands of visits from Wikipedia users. We also know that both systems were discussed extensively in the Wikipedia community.
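The underlying aggregation is simple; here is a sketch (ours) of the kind of per-article, per-editor tabulation such dashboards are built from:

```python
# Sketch: tabulating editing statistics from a dump of (article, editor)
# edit records, the raw aggregate behind a WikiDashboard-style visualization.
from collections import Counter

edits = [("Alan Turing", "ann"), ("Alan Turing", "bob"), ("HCI", "ann")]

edits_per_article = Counter(article for article, _ in edits)
edits_per_editor = Counter(editor for _, editor in edits)
edits_per_pair = Counter(edits)                # who edited which article how often

print(edits_per_article["Alan Turing"])        # 2
print(edits_per_pair[("Alan Turing", "ann")])  # 1
```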
Fig. 3. An example page from WikiDashboard [13] project, which inserts a visualization of the social dynamics and edit patterns for every Wikipedia page
6 Conclusion

HCI research has greatly benefited from borrowing evaluation methods that were fine-tuned in other fields, especially the behavioral sciences. Evaluation methods are inseparable from the kinds of science and models that can be built in a field. HCI has long moved beyond the evaluation of a single user sitting in front of a single desktop computer, yet many of our fundamentally held viewpoints about evaluation continue to be ruled by outdated biases derived from this legacy. In this position paper, we have argued that traditional views of human performance in systems have long focused only on productivity. It is time for us to break out of these long-held views and look at evaluations in more holistic ways.
Fig. 4. A way to think about the role of Living Laboratory prototypes in scientific research
One way to do this is to engage with real users in 'Living Laboratories', in which researchers either adopt or create real, useful systems that are used in ecologically valid real settings. This enables a tight loop between characterization of behavior, models of the users and system, prototyping, and experimentation. The new Social Web platform is enabling researchers to build systems with amazing speed, enabling the whole loop to be completed in much shorter amounts of time than in the past. Similar experimentation platforms for mobile computing are just becoming reachable, with the iPhone and Google's Android leading the charge. These platforms will greatly enable Living Laboratory researchers to conduct evaluations that span many users, places, times, locations, and social factors in ways that were unimaginable before.

Acknowledgments. We thank PARC's Augmented Social Cognition team and the HCIC workshop for many helpful discussions on this position paper.
References

1. Card, S., Moran, T.P., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Mahwah (1983)
2. Card, S.K., English, W.K., Burr, B.J.: Evaluation of mouse, rate-controlled isometric joystick, step keys, and text keys for text selection on a CRT. Ergonomics 21(8), 601–613 (1978)
3. Carter, S., Mankoff, J., Klemmer, S., Matthews, T.: Exiting the cleanroom: On ecological validity and ubiquitous computing. HCI Journal (2008)
4. Chi, E.H.: Introducing Wearable Force Sensors in Martial Arts. IEEE Pervasive Computing 4(3), 47–53 (2005)
5. Engelbart, D.C.: Augmenting Human Intellect: A Conceptual Framework. Summary Report AFOSR-3223 under Contract AF 49(638)-1024, SRI Project 3578 for Air Force Office of Scientific Research, Stanford Research Institute, Menlo Park, CA (1962)
6. Grudin, J.: Groupware and social dynamics: Eight challenges for developers. Communications of the ACM 37(1), 92–105 (1994)
7. Jones, W.P.: On the Applied Use of Human Memory Models: The Memory Extender Personal Filing System. International Journal of Man-Machine Studies 25(2), 191–228 (1986)
8. Kohavi, R., Longbotham, R.: Online Experiments: Lessons Learned. Computer 40(9), 103–105 (2007), doi:10.1109/MC.2007.328
9. Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.: GroupLens: Applying Collaborative Filtering to Usenet News (special section: recommender systems). Communications of the ACM 40(3), 77–87 (1997)
10. Riedl, J., Konstan, J.: Word of Mouse: The Marketing Power of Collaborative Filtering. Warner Books, New York (2002)
11. Sears, A., Jacko, J.A.: The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies, and Emerging Applications. CRC Press, Boca Raton (2008)
12. Shneiderman, B.: Science 2.0. Science 319(5868), 1349–1350 (2008)
13. Suh, B., Chi, E.H., Kittur, A., Pendleton, B.A.: Lifting the Veil: Improving Accountability and Social Transparency in Wikipedia with WikiDashboard. In: Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2008), Florence, Italy, pp. 1037–1040. ACM Press, New York (2008)
14. Rowan, J., Mynatt, E.D.: Digital family portrait field trial: Support for aging in place. In: Proc. of CHI 2005 Conference on Human Factors in Computing Systems, pp. 521–530. ACM, New York (2005)
Embodied Interaction or Context-Aware Computing? An Integrated Approach to Design

Johan Eliasson, Teresa Cerratto Pargman, and Robert Ramberg

Department of Computer and Systems Sciences, Stockholm University/Royal Institute of Technology, SE-164 40 Stockholm, Sweden
{je,tessy,robban}@dsv.su.se
Abstract. This paper revisits the notion of context from an interaction design perspective. Since the emergence of the research fields of Computer-Supported Cooperative Work and Ubiquitous Computing, the notion of context has been discussed from different theoretical approaches and in different research traditions. One of these approaches is Embodied Interaction. This theoretical approach has in particular contributed to (i) challenging the view that user context can be meaningfully represented by a computer system, and (ii) discussing the notion of context as interaction, through the idea that users are always embodied in their interaction with computer systems. We believe that the particular view on users' context that the approach of Embodied Interaction suggests needs to be further elaborated in terms of design. As a contribution we suggest an integrated approach in which the interactional view of Embodied Interaction is interrelated with the representational view of context-aware computing.

Keywords: Embodied Interaction, Context-aware computing, Design, Representation, Context.
approach thereby portrays human agents as engaged in an interaction characterized by skilled and continuous coping. It thus describes an understanding of human-computer interaction that can be exemplified by a user finding herself able to handle all difficulties without losing focus in her activity even once. How she gets there, and how she manages to remain in this focused activity, is out of scope. But how well does this picture of skilled and engaged human-computer interaction guide design? And more specifically, how well does it guide the design of context-aware systems?

To understand these questions better we need to return to one of the targeted problems of Embodied Interaction: namely, the gap between the social conception of context and the technical one [2, 3]. Embodied Interaction is but the latest contribution to improving the understanding of this gap. It uses the philosophical tradition of phenomenology as a theoretical departure point for understanding interaction. This tradition has previously been presented in the HCI research community, since it seems to offer a way to take both social and technical views into account. Based on the "present-at-hand" mode of use, Winograd and Flores [4] discuss user activity in terms of "breakdown". Weiser [5] introduces the concept of transparency, in calm computing and ubiquitous computing research, based on the "ready-to-hand" mode of use. The approach of Embodied Interaction relies much on the idea of a well-practiced and smooth interaction with and through computers, and it de-emphasizes the developmental aspects of user activity. Without addressing these developmental aspects, it is difficult for designers to operationalize the approach of Embodied Interaction in their work. Without looking at how one becomes a skilled user in the interaction with a system, one opportunity for design is passed over. Advocating history of use, Chalmers [6] has argued that the ideal of transparency, which can also be found in Embodied Interaction, is an unachievable goal. Räsänen and Nyce [7] have pointed out that the approach of Embodied Interaction is reductionist in that it does not go beyond interaction and focuses too much on the here and now. We will go one step further and claim that Embodied Interaction does not take all modes of interaction present in the activity into account, and thereby misses out on how skill is acquired. In this respect we claim that the Embodied Interaction approach overlooks the interplay between learning and practice, between reflection and action, that characterizes any kind of human-computer interaction. This observation is particularly interesting for the design of context-aware computing systems because this field has a strong connection to Embodied Interaction.

This paper revisits the notion of context in the field of context-aware computing from an integrated design perspective on Embodied Interaction. We believe that the rich conceptualization of Embodied Interaction deserves to be further developed in terms of the design of context-aware computing systems. This leads to the following question: how do we design context-aware computing systems in the light of Embodied Interaction? In this paper we will not try to answer questions about proactivity in context-aware computing. Instead we follow Rogers [8] in that what we aim for is not proactive computing but proactive people.
2 The Notion of Context from an Embodied Interaction Perspective

Grounded in Merleau-Ponty's Phenomenology of Perception [9], Schutz's social phenomenology [10] and Heidegger's hermeneutic phenomenology [11], Dourish [1] suggests a theoretical approach to human-computer interaction which he coins Embodied Interaction. The Embodied Interaction approach views context not as information but as a relation; as human actors participate in the world, action does not occur in a particular context, but context is rather created and recreated in concert with interaction [12]. Because of this, context is not stable but a dynamic, constantly changing feature. What is to be regarded as context is thereby determined by the setting, the actors, and the interaction. According to the Embodied Interaction perspective, context is not some delineable aspect of a setting that can be encoded and represented [12]. Rather, context is something people do. In this way the context model in Embodied Interaction is an interactional model and not a representational model [12]. The view that context is what people do comes from the primacy of action in Embodied Interaction. An emphasis on action is shared with Situated Action [13], which is also one departure point for Embodied Interaction. Both approaches regard context and meaning as continually changing and only possible to recognize in how interaction unfolds. According to Embodied Interaction, the way we interact with a computer system is a sign of how we relate to the system. Meaning is also embodied, both in a physical and a wider sense. In this way our interaction is dependent on our physical, social and cultural body. The theoretical approach of Embodied Interaction argues against disembodied, objective and reflective use. What Embodied Interaction instead focuses on, inherited from embodiment [9] and being-in-the-world [11], is a moment of mindless interaction, a moment of skilled coping.

2.1 Challenges for Context Design from an Embodied Interaction Perspective

Dourish [1] suggests the following six principles as a backdrop for design (p. 162):

1. Computation is a medium
2. Meaning arises on multiple levels
3. Users, not designers, create and communicate meaning
4. Users, not designers, manage coupling
5. Embodied technologies participate in the world they represent
6. Embodied interaction turns action into meaning
When trying to design for context from the Embodied Interaction perspective, we are left with these broad design principles. Their breadth makes them difficult to operationalize, while the alternative of designing for context using objective representations is merely seen as positivist thinking, incompatible with the philosophy put forward by Embodied Interaction [12]. Take, for instance, the third and fourth design principles above: they directly address the role of designers, although they do so in a rather negative, excluding sense. Principles three and four state
what designers of these systems should not do. Thereby the role of the interaction designer seems to be marginalized to an enabling one. It is probably not meant that the ideal we should strive for is the ultimate and final system, allowing for every kind of appropriation and every kind of interaction. Dourish [12] notes that one and the same system should support evolution: "[...] our concern is not simply to support particular forms of practice, but to support the evolution of practice—the 'conversation with materials' [Dourish quoting Schön [14]] out of which emerges new forms of action and meaning." (p. 25). This seems like a contradictory claim, as the evolution of practice is only known in retrospect and in analysis. So how can this be used for claims about design? In a passage about place and space, Dourish [1] writes: "…place can't be designed, only designed for." (p. 91). If Embodied Interaction is about meta-design, then what are the remaining implications for design, and especially for the design of context?

Our interpretation of Embodied Interaction is that interaction designers should leave context and meaning as open to appropriation as possible. What designers ideally should strive for, then, is completely open systems. In these computer systems each user can interact with the most suitable content and structure. From this particular understanding of interacting with computers, the computer system has to be able to show every possible structure and the current state and configuration of the system [12]. From an Embodied Interaction perspective on human-computer interaction, we can design user interfaces, but not how they should work, as the creation of meaning should be left to users in their appropriation of the interfaces. Because we are not allowed to design how an interface should work, we also cannot explicitly support skill acquisition. In Embodied Interaction, skill acquisition is not an issue because it does not belong in the picture of skilled and engaged coping, and it thereby falls outside the scope of Embodied Interaction. As a result, acquiring skill becomes something magical, something designers need not attend to. The Embodied Interaction approach has an interactional model of context. But if the notion of representation is absent from the description of interaction with a system, how can designers design for this interaction? The concept of representations is key to the design of computer systems, and especially of context-aware computing.
3 The Notion of Context in the Field of Context-Aware Computing

In context-aware computing, the notion of representations of context is seen as a prerequisite for designing context-aware systems. The assumption is that it is possible to divide the context of a device (or a user) into smaller parts, that some of them are more or less objective and stable, and that it is thereby possible to meaningfully represent them in a computer system hosting the device. For example, Dey et al. [15] reason in terms of identifying and analyzing the constituent elements of context. In identifying and analyzing the constituent elements of context, ubiquitous computing research is bottom-up, starting with sensor data representing aspects of the physical environment [15]. One example is when sensor values such as GPS coordinates are used in navigational applications. Starting from sensor values,
context and meaning are then inferred up to the level of human interaction with the device. As described by Dey et al. [15]: "One hypothesis that a number of ubiquitous computing researchers share is that enabling devices and applications to automatically adapt to changes in their surrounding physical and electronic environments will lead to an enhancement of the user experience." (p. 100). One last step then is to use the model not only to adapt, but also to try to foresee what is going to take place next and let the application act proactively, guessing what users might soon need to have at hand. In this case the questions for system designers are how to adapt to context and how to act proactively in context. Obviously, it is a very hard problem to get all these abstractions, models and inferences right. It can certainly be questioned whether these systems will ever succeed outside very specific domains with very limited scope [8, 16].

3.1 Challenges for Design of Context-Aware Computing Systems

Context-aware computing has been blamed for making only small advances and for relying too much on systems engineering to solve problems originating in human interaction [3, 17]. It is also questionable whether we will see a major breakthrough in context-aware computing any time soon, as the problems of strong AI and proactive computing are still far from solved [8]. The problem for context-aware computing lies in the representational models that are built. In a representational model there are inherent questions about what is represented and how it is represented. The next question is how different representations are related. Computational representations use specific values, structures and interrelations. There is no vagueness involved; every possible value, structure and interrelation has to be decided in advance by the designer. The effect of these decisions is that the behavior of each model of context is also, at a basic level, determined in advance. Because of this, the user model and the system model will diverge as soon as the context-aware system is put into use. The context-aware computing solution to this divergence is either to add an exception to the model every time it diverges or to trust future AI advancements to solve all discrepancies.

In the field of context-aware computing, physical and digital representations of context are the building blocks of design. As opposed to human and social representations, designed representations are bounded in terms of structure and contents. In computer science, representations are the internal software components that together make up a computer program. These digital software components rely on physical hardware components, which in turn bound the representational power. A computer system is thus itself built on representations and therefore cannot be non-representational. But this still allows for non-representational use, with embodied physical or digital representations. This duality between non-representational use and designed, bounded representation is present in every interaction with something that is designed. The representations of context in context-aware computing are seen as objective because of their origin in sensor values. But this concept of objective context should not be interpreted as absolute. Even GPS coordinates, for instance, are only valid within their social frame, which in this case is a very wide frame.
Chalmers [18], in accordance with Ricoeur and Gadamer, writes: “‘Objectivity’ comes from distanciation: representation is fixed, dissociated from intention and only displays universally
shared references. […] objectivity is not absolute. Instead, we see degrees and forms of distanciation." (p. 213). Also on objectivity, Dourish [12] writes: "In contrast to the objective and quantitative nature of positivist theories, phenomenological theories are subjective and qualitative in orientation. By 'subjective' I mean that they regard social facts as having no objective reality beyond the ability of individuals and groups […]" (p. 21). The interpretation of this is not that everything is subjective in the sense that everyone has their own interpretation, different from everyone else's. If this were the case, we would not be able to relate to what others do; we would simply not be able to engage in any interaction without questioning every step of it. Instead we socially create meaning, which we use in interaction. That is, 'objective' and 'subjective' may not be so far apart. As the extreme of objective representations is never the case, and as it is impossible to design for the completely subjective, we need to find a point where we can agree. If "groups" in the previous quote is taken to be the people we design for, then we are essentially in agreement, and can meet halfway between objective and subjective.
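Before turning to the integrated approach, a caricature of the representational model described in Section 3 (entirely our own illustration) makes the designer's predicament concrete: every context value, and every adaptation, must be enumerated in advance:

```python
# Sketch of a bottom-up representational context model: raw sensor values are
# abstracted into discrete context states, each with a predesigned adaptation.
def infer_context(gps_fix, speed_mps):
    if gps_fix is None:
        return "indoors?"                      # no coverage: the model can only guess
    if speed_mps > 8:
        return "driving"
    if speed_mps > 1:
        return "walking"
    return "stationary"

ADAPTATIONS = {                                # fixed in advance by the designer
    "driving": "switch to turn-by-turn voice guidance",
    "walking": "show pedestrian map",
    "stationary": "show nearby points of interest",
    "indoors?": "warn about lack of coverage",
}

print(ADAPTATIONS[infer_context(gps_fix=None, speed_mps=0.0)])
# Any situation outside these categories requires a new rule: the user model
# and the system model diverge as soon as the system is put into use.
```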
4 Towards an Integrated Approach: Reintroducing the Concept of Representations to Embodied Interaction

At some level, computer technology is always designed. In fact, we both design human-computer interaction and design for human-computer interaction. One extreme is the socio-cultural approach. Relying on ethnographic methods, we start out by describing specific users as a basis for design and then design for context. In this view human action is in focus. Action is performed within context at the same time as context is interpreted and recreated. With this focus, context is never stable and therefore cannot be deliberately designed. The remaining option for a designer is to support user context formation by relying solely on user appropriation. Human action is subjective and situated, rendering each interaction different from the previous one. In Table 1 this corresponds to "action" as the mode of use; because of its subjective nature, system designers can only support interaction and design representations for context determined by users. The other extreme is the technology perspective, where we design representations of context and let users adapt to these representations. Context is modeled using objective and stable representations of sensor values. Users can then interact with this computer model, where use is objectifying and reflective. This mode of use, as seen in Table 1, is characterized by reflection on representations of context.

Combining results stemming from these two approaches is challenging, e.g. [12]: "Translating ideas between different intellectual domains can be both exceptionally valuable and unexpectedly difficult. One reason is that the ideas need to be understood within the intellectual frames that give them meaning, and we need to be sensitive to the problems of translation between these frames." (p. 20). Given fundamental ontological disagreements, it is questionable whether it can be done at all. On context in computer-supported cooperative work and ubiquitous computing, despite the seemingly contradictory approaches, there have been many attempts to bridge or at least narrow the gap between these two intellectual frames [3, 19]. An alternative to bridging the gap would be to acknowledge that both sides, computer
Table 1. Mode of use related to artefacts of design

Mode of use | Design artefacts
Action      | Representations for context
Reflection  | Representations of context
representations stemming from sensor data and analytical representations of context, are necessary. Instead of searching for one common ground for these views of context, we note that they are two sides of the same coin.

When learning a new system, much time and effort goes into figuring out how the system works instead of engaging in the activity itself. Once the system has been learned, it can be handled without reflection, with skilled and embodied interaction. But still there are instants when "an event 'leaps to the eye' because it is expected or is a deviation from that which one would expect" [20] (p. 294). Heidegger also noted this (here in the words of Dreyfus [21]): "…mental content arises whenever the situation requires deliberate attention." (p. 70). These points point us towards an answer in revisiting Heidegger's original view of hermeneutic phenomenology. His famous example of the hammer serves not only to show how the hammer is transparent in ready-to-hand use, but also how "breakdown" (when the head falls off and the hammer becomes present-at-hand) leads to acquiring skill (in avoiding this malfunction in the future). As Dreyfus [21] says when clarifying Heidegger, "…the occurent is necessary for explaining the functioning of the available…" (p. 121). Here Dreyfus uses the terminology "occurent" instead of present-at-hand and "available" instead of ready-to-hand.

Figure 1 shows how ready-to-hand action and present-at-hand reflection are interrelated. With this integrated view there is no necessity to choose between action and reflection, no necessity to choose between designing representations for context and designing representations of context. Instead the mode of use repeatedly shifts between action and reflection.¹ Take GPS positioning, for example. Most of the time the coordinates are correct and a user can interact with the navigational program without paying too much attention; the mode of use is here seen as "action". But there are certainly occasions when the mode of use shifts to reflection, for example when a breakdown in interpretation occurs because of a mismatch between the map position and the position in the real world. Another breakdown could occur when a user moves indoors and gets a message about lack of coverage. In both these examples, interaction is interrupted and the user may need to reflect upon what the problem is in order to find a solution (e.g., update GPS data or move outdoors) before interaction can be re-engaged. Objective representations for context are not only to be seen as harmful, constraining user context; they also form a structure to relate to in a hermeneutic interpretation. Instead of trying to give guidelines for one ultimate design, we need to acknowledge that a design, and thereby also the designer, is part of this hermeneutic development, and that continuous redesigns, done by both designer and user, are necessary for the system to stay relevant to a user.
¹ Since both modes of use can be found in the hermeneutic phenomenology of Heidegger, there might be no ontological disagreements in the end.
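The GPS example above can be caricatured in a few lines (our own sketch): interaction stays in the "action" mode until a breakdown forces a shift to "reflection", and repair re-engages action:

```python
# Sketch of the integrated view: the mode of use oscillates between embodied
# action and present-at-hand reflection, triggered by breakdown and repair.
def navigate(events):
    mode = "action"
    for event in events:
        if mode == "action" and event in ("position mismatch", "no coverage"):
            mode = "reflection"                # breakdown: the representation leaps to the eye
            print(f"breakdown ({event}): reflect on the representation of context")
        elif mode == "reflection" and event == "repaired":
            mode = "action"                    # e.g., GPS data updated, user moved outdoors
            print("re-engage embodied interaction")

navigate(["position ok", "no coverage", "repaired", "position ok"])
```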
Fig. 1. The two modes of use as interrelated
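To make the GPS example concrete, the shift between the two modes of Fig. 1 can be sketched in a few lines of code. This is an illustrative model only, not part of any system described in this paper; all names (Mode, NavigationSession, etc.) are hypothetical.

```python
from enum import Enum

class Mode(Enum):
    ACTION = "ready-to-hand"        # embodied, unreflective use
    REFLECTION = "present-at-hand"  # deliberate attention after breakdown

class NavigationSession:
    """Toy model of a GPS user whose mode of use shifts on breakdown."""

    def __init__(self):
        self.mode = Mode.ACTION

    def step(self, has_fix: bool, map_matches_world: bool) -> Mode:
        # A breakdown (no coverage, or a map/world mismatch) forces reflection.
        if not has_fix or not map_matches_world:
            self.mode = Mode.REFLECTION
        else:
            # Once the problem is resolved, interaction is re-engaged.
            self.mode = Mode.ACTION
        return self.mode

session = NavigationSession()
print(session.step(has_fix=True, map_matches_world=True))   # Mode.ACTION
print(session.step(has_fix=False, map_matches_world=True))  # Mode.REFLECTION (moved indoors)
print(session.step(has_fix=True, map_matches_world=True))   # Mode.ACTION (moved outdoors again)
```

The point of the sketch is that neither mode is an error state: the loop between them is the normal, hermeneutic pattern of use.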
5 Discussion

The Embodied Interaction perspective has both turned away from, and argued against, objective representations for context. Although the Embodied Interaction view of context contributes to a better understanding of human interaction with and through computers, at the same time it marginalizes objective representations for context without offering an alternative basis for design. Maybe it even marginalizes design as a whole. It is time to turn the perspective back again, to enable both design of context and design for context.

The alternative to designing systems completely open to appropriation is to use current descriptions of context as a basis for design. If we cannot use current descriptions, but instead need to leave more open for appropriation, then the role of the designer is marginalized accordingly. Computer systems always have room for interpretation and appropriation, but through careful design, appropriation and skill acquisition can be guided. Leaving more open to appropriation means constraining the choices that the designer has. A similar trend in design occurred when the concept of affordance became the one guideline overshadowing all others in HCI.

Given the hermeneutic phenomenology perspective, it poses no problem to reintroduce objective representations of context into the philosophy put forward by the Embodied Interaction approach. Action and reflection are just different modes of use, where present-at-hand reflection is an important complement to embodied ready-to-hand action; it is not one or the other. Users act in context by (hermeneutically) going back and forth from ready-to-hand embodied interaction to present-at-hand reflection and back again.

Our integrated approach undoubtedly has much in common with Winograd and Flores [4], who also treat "breakdown" as important, but there are differences.
They came to the conclusion of modeling computer use through a state machine representation of speech act theory, with labeled states and directed arcs. Our approach is to use present-at-hand categories, but not to build a general model that enforces some elaborate structure. Instead we only point to the interrelation between present-at-hand and ready-to-hand. This approach can be used either to build general systems with small descriptive power or specific systems with large descriptive power. But our main contribution is that the present-at-hand categories give us a way of talking about design, while still relating to ready-to-hand Embodied Interaction.

It is interesting to note what Dourish [1] writes about the states of ready-to-hand and present-at-hand. Dourish explicitly refers to these "states" when discussing coupling, using a computer system as an example: "If there were simply these two states […] However the truth is more complex. As we have seen, the tools through which we operate when interacting with a computer system are not simply physical objects, but software abstractions, too. There are very many of these abstract entities in operation at any given moment, and programs link them together in a variety of ways." (p. 139). This surely gives the impression of great complexity. Dourish ends the passage as follows: "The consequence, then, is that there are very many different levels of description that could be used to describe my activity at any given moment. Some, perhaps, are ready-to-hand and some present-at-hand at the same time […]" (p. 140).

But that some entities are ready-to-hand while others are present-at-hand is nothing new. On a conceptual level, even when Heidegger's hammer was ready-to-hand, some other part of the activity was present-at-hand. Computer systems do not change this. If we deliberately design these systems using present-at-hand categories, we might even bring Embodied Interaction one step forward.
References

1. Dourish, P.: Where the action is: the foundations of embodied interaction. MIT Press, Cambridge (2001)
2. Dourish, P.: Seeking a foundation for context-aware computing. Hum. Comput. Interact. 16, 229–241 (2001)
3. Barkhuus, L.: The context gap, an essential challenge to context-aware computing. Doctoral dissertation, IT University of Copenhagen, Copenhagen (2005)
4. Winograd, T., Flores, F.: Understanding computers and cognition: a new foundation for design. Ablex, Norwood (1986)
5. Weiser, M.: The Computer for the 21st Century. Scientific American 265, 95, 98–102, 104 (1991)
6. Chalmers, M.: A historical view of context. Computer Supported Cooperative Work: CSCW: An International Journal 13, 223–247 (2004)
7. Räsänen, M., Nyce, J.M.: A new role for anthropology?: Rewriting "context" and "analysis" in HCI research. In: ACM International Conference Proceeding Series, vol. 189, pp. 175–184 (2006)
8. Rogers, Y.: Moving on from Weiser's Vision of Calm Computing: Engaging UbiComp Experiences. In: Dourish, P., Friday, A. (eds.) UbiComp 2006. LNCS, vol. 4206, pp. 404–421. Springer, Heidelberg (2006)
9. Merleau-Ponty, M.: Phenomenology of perception. Routledge, London (2002/1962)
10. Schutz, A., Luckmann, T.: The structures of the life-world. Northwestern U.P., Evanston (1973)
11. Heidegger, M.: Being and time. Harper, New York (1962)
12. Dourish, P.: What we talk about when we talk about context. Personal Ubiquitous Comput. 8, 19–30 (2004)
13. Suchman, L.A.: Plans and situated actions: the problem of human-machine communication. Cambridge Univ. Press, Cambridge (1987)
14. Schön, D.: The reflective practitioner: how professionals think in action. Basic Books (1983)
15. Dey, A.K., Abowd, G.D., Salber, D.: A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Human-Computer Interaction 16, 97–166 (2001)
16. Dreyfus, H.L.: What computers still can't do: a critique of artificial reason. MIT Press, Cambridge (1992)
17. Håkansson, M.: Playing with context: explicit and implicit interaction in mobile media applications. Doctoral dissertation, Department of Computer and Systems Sciences (together with KTH), Stockholm University, Kista (2009)
18. Chalmers, M.: Hermeneutics, information and representation. European Journal of Information Systems 13, 210–220 (2004)
19. Chen, Y., Atwood, M.E.: Context-centered design: Bridging the gap between understanding and designing. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4550, pp. 40–48. Springer, Heidelberg (2007)
20. Schmidt, K.: The problem with 'awareness': Introductory remarks on 'awareness in CSCW'. Computer Supported Cooperative Work: CSCW: An International Journal 11, 285–298 (2002)
21. Dreyfus, H.L.: Being-in-the-world: a commentary on Heidegger's Being and time, division I. MIT Press, Cambridge (1991)
Supporting Multidisciplinary Teams and Early Design Stages Using Storyboards

Mieke Haesen, Jan Meskens, Kris Luyten, and Karin Coninx

Hasselt University – tUL – IBBT, Expertise Centre for Digital Media, Wetenschapspark 2, B-3590 Diepenbeek, Belgium
{mieke.haesen,jan.meskens,kris.luyten,karin.coninx}@uhasselt.be
Abstract. Current tools for multidisciplinary teams in user-centered software engineering (UCSE) provide little support for the different approaches of the various disciplines in the project team. Although multidisciplinary teams are getting more and more involved in UCSE projects, an efficient approach to communicate clearly and to pass results of a user needs analysis to other team members without loss of information is still missing. Based on previous experiences, we propose storyboards as a key component in such tools. Storyboards contain sketched information of users, activities, devices and the context of a future application. The comprehensible and intuitive notation and accompanying tool support presented in this paper will enhance communication and efficiency within the multidisciplinary team during UCSE projects.
1 Introduction

When combining HCI techniques and software engineering principles in user-centered software engineering (UCSE), the biggest challenge is the communication within a multidisciplinary team including the end users. MuiCSer, a framework for Multidisciplinary user-Centered Software Engineering processes, focuses on the benefits of both disciplines and was introduced to investigate the features and shortcomings of current UCSE models and tools [1]. One missing link in most user-centered processes is a tool to progress from informal design artefacts (e.g. scenarios) toward more structured design artefacts (e.g. task models). Most tools and techniques require specific knowledge about specialized notations or models, thus excluding most team members from being involved. Furthermore, functional information may be missing in informal design artefacts, while structured design artefacts may not always contain all non-functional information. We propose the use of storyboards as a comprehensible artefact, related to features of graphical user interface design tools, to overcome these shortcomings. In summary, the main contributions of this paper are:
• a novel user-centered design approach that uses storyboards as a common language in a multidisciplinary team;
• tool support for creating and editing storyboards in order to bridge the gap between the early stages of the UCSE process and the user interface design. This tool supports the connection between storyboards and artefacts created later in the process.
2 Related Work

User-centered processes recommend combining non-functional as well as functional requirements by involving a multidisciplinary team [2]. The early design stages of user-centered design (UCD) include a user needs analysis and generally result in several artefacts, such as usability requirements [3], scenarios [4] and personas [5], describing the user needs. These artefacts are written in a narrative style and are usually created by interaction designers. Similar artefacts are used in software engineering and agile development [6] (e.g. essential use cases, scenarios, story cards, user stories). Although several disciplines provide notations to describe user needs, the notations are not always comprehensible for all members of a multidisciplinary team. Lindgaard et al. [7] address the difficulties in presenting user needs for requirements engineering.

Earlier studies describe the needs of interaction designers in a multidisciplinary team. Brown et al. [8] conducted an ethnographic study to investigate the collaboration between user interaction designers and developers. The study describes the benefits of stories and sketches in the early stages of user-centered approaches and emphasizes the power of combining both. Assembling stories and sketches is a powerful technique to reveal errors and to consider temporal and contextual information. A recent study by Myers et al. [9] reports that designers experience difficulties when designing the behavior of user interfaces. While prototyping the appearance of user interfaces is straightforward, designing and communicating the behavior is an ongoing process. Furthermore, the survey revealed that designers frequently use sketches and storyboards.

Currently little tool support is available for storyboarding in multidisciplinary teams. Demais [10] and IBM Rational Requirements Composer1 focus on storyboards in the design process of multimedia applications, while Denim [11] and Highlight [12] feature storyboards for web applications. All these tools were developed to describe the behavior of software or web applications and support a first walkthrough of the future system or website. The storyboards created using these tools contain mock-ups of UI designs and their relationships, and thus are designed after the requirements gathering of a future application. The ActivityDesigner [13] tool allows storyboarding at the early stages of design. In this tool, designers can extract activities from concrete scenarios, making it possible to include rich contextual information about everyday lives as scenes. Based on the scenes, higher-level structures and prototypes can be created. The tool we present in this paper also provides the possibility to build storyboards during the gathering of requirements in order to facilitate the creation of artefacts at later stages.
1 http://www.ibm.com/developerworks/rational/library/08/1118_zhuo, last visited 6 January 2009.
3 Storyboards

In a UCSE process, a report of a user needs analysis, scenarios and personas are presented to the entire team after conducting the first user studies. Structuring artefacts that are written in a narrative style is a complex though important process. All artefacts created at later stages need to be consistent with these first results. Unfortunately, little tool support is available for this first transition in UCSE processes. This implies that the entire team needs to verify consistency between the informal results of a user needs analysis and artefacts created later in the process. A good understanding within the multidisciplinary team at this point can be crucial for the resulting user experience. We investigate how storyboards can be used in UCSE processes by a multidisciplinary team.

3.1 Users, Activities, Devices, Context

The professional use of storyboards originates from the film industry and is being introduced in several disciplines such as advertising and product design [14]. In UCSE a storyboard can have several meanings. Storyboards can depict manual steps, users interacting with a product, screen mock-ups of a new work practice, or the link with the system behind the scenes [6]. The focus on visual information renders a storyboard highly comprehensible for any member of the team, independent of their background or role in the team [14, 15]. In the context of our research, we define storyboards as sketches of real-life situations, depicting users carrying out several activities by using devices in a certain context. An example of a simple storyboard is presented in the center of Fig. 1.

Since storyboards contain a lot of information about the future use of an application, they can be used to provide a link between a user needs analysis and requirements gathering, containing functional as well as non-functional requirements. Furthermore, the natural style of presenting the use of a future system makes this artefact very comprehensible for all team members, including end users. Since the scenes of a storyboard contain contextual information, they are suitable for the specification of context-aware applications. This contextual information has to be taken into account during the entire development process, and thus storyboards can contribute to the evaluation, verification and validation of several stages.

3.2 Bridging the Early Stages of UCSE Processes

The creation of storyboards happens at the early stages of a UCSE process, after the creation of scenarios and personas. An example storyboard and the interrelationship between a storyboard and other artefacts are presented in Fig. 1. A storyboard is built by splitting up the scenario into scenes and presenting the scenes as sketches depicting users interacting with the future system. Connecting the scenes of a storyboard structures the narrative information of the scenario. The understandability of storyboards increases the number of team members who can collaborate during this phase. Even end users can be involved in creating or evaluating storyboards.
Fig. 1. A storyboard and its interrelationship with other artefacts in the development process. Situations and devices in the scenes are extracted from scenarios, while the user information is extracted from the scenarios as well as the personas. The storyboard is used as input for the creation of task flow diagrams and the UI designs.
Once all scenes are added to the storyboard, personas and devices can be highlighted in each individual scene. This enriches the information contained in the storyboard and can be used to make the transition to other artefacts. Task flow diagrams, presenting user actions and processes to complete a task, can be produced based on the information in, and the connections between, the scenes of the storyboard. At a later stage of the development process, the storyboard can guide the UI design and development. By carefully considering the situation of each scene, designers and developers build an application corresponding to the context, requirements and constraints contained in the storyboard. Interaction designers can use a storyboard to verify that the UI designs take all requirements into account. A storyboard also contributes to the preparation of usability tests. Using storyboards in UCSE processes increases the visibility of the project. New team members, for instance, can explore the requirements of the project at a glance by looking at the storyboard.
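As an illustration of the artefact relationships just described, a storyboard can be modeled as annotated scenes plus directed connections, from which a first task flow can be read off. This is a minimal sketch with hypothetical names, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Scene:
    title: str
    description: str   # the scenario sequence this scene depicts
    sketch: str        # path to a scanned sketch or photo
    personas: List[str] = field(default_factory=list)
    devices: List[str] = field(default_factory=list)

@dataclass
class Storyboard:
    scenes: List[Scene] = field(default_factory=list)
    links: List[Tuple[int, int]] = field(default_factory=list)  # directed scene connections

    def add_scene(self, scene: Scene) -> int:
        self.scenes.append(scene)
        return len(self.scenes) - 1

    def connect(self, src: int, dst: int) -> None:
        self.links.append((src, dst))

    def task_flow(self) -> List[Tuple[str, str]]:
        # The connections between scenes structure the narrative scenario
        # and can seed a task flow diagram.
        return [(self.scenes[a].title, self.scenes[b].title) for a, b in self.links]

sb = Storyboard()
s0 = sb.add_scene(Scene("Look up schedule", "Ann checks the bus times", "scene0.png",
                        personas=["Ann"], devices=["PDA"]))
s1 = sb.add_scene(Scene("Buy ticket", "Ann buys a ticket at the kiosk", "scene1.png",
                        personas=["Ann"], devices=["kiosk"]))
sb.connect(s0, s1)
print(sb.task_flow())  # [('Look up schedule', 'Buy ticket')]
```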
4 Tool Support for Storyboards

As stated above, storyboards contribute to the development process of software applications in multidisciplinary teams.
When suitable tool support is available for all team members, storyboards become more powerful and the visibility and traceability of a project increase. A literature survey [1] showed that there is a need for tools that support UCSE processes in the early stages of design. Since storyboards are created during the requirements gathering, storyboarding tools can partly cover transitions between the early stages of UCSE. Furthermore, storyboards are a very suitable artefact to specify the use of context-aware applications, and thus we decided to integrate tool support for storyboards into the Gummy [16] GUI builder tool.

Gummy supports the graphical design of multi-device and context-aware user interfaces. To enable this, Gummy automatically adapts its workspace according to the considered target platform and thus allows designers to create user interfaces for a wide range of devices without having to change their work practices. The inclusion of storyboards during this design stage better describes the context of a user interface and provides a more convenient way to describe the intended context of use. This way, the storyboards provide guidelines for the design of the UI.

In the storyboarding extension of Gummy, a team member, e.g. an interaction designer, starts the creation of the storyboard by loading a scenario into the workspace. Next, a sequence in the scenario can be selected and a new scene can be created. The sequence of the scenario is automatically added to the scene as a description, while the interaction designer can load an image and add a title. The image of the scene can be a photo of the user observations or a scanned sketch, which encourages designers to sketch in a creative and informal way [11]. A screenshot of the storyboarding tool is shown in Fig. 2.
Fig. 2. A screenshot of the storyboarding extension in Gummy. Scenarios can be loaded on the left panel. For a selected sequence in the scenario a new scene can be created. A scene can contain sketches of users interacting with the future system, a title and a description.
For each scene in a storyboard, team members can add annotations and point out the personas and devices. When the specifications of a device (e.g. screen size) are included in the scenes, this information can be taken into account when the workspace for the UI design is loaded. The contextual information of the scenes (e.g. sketches presenting the environment or courses of communication) can be used as guidelines for the UI design without obstructing the creativity of UI designers. By extending existing tool support, the visibility and traceability of UCSE processes can be enhanced. The storyboard extension makes it possible to include the results of the first UCSE stages (user needs analysis) and helps to process and structure these narrative artefacts. Furthermore, a visualization of a scenario by scenes makes it possible to see the usability requirements at a glance, which improves the communication and efficiency in the project team.
5 Ongoing and Future Work

Storyboards are implicitly used in different ways by multidisciplinary teams. This partly explains the many interpretations of storyboards and reveals the challenges in developing a storyboarding tool for multidisciplinary teams. In ongoing work we are carrying out a survey considering the roles in a multidisciplinary team and the tools used by members of a team. Observations and interviews are organized to investigate the current practices of multidisciplinary teams in industry. Furthermore, storyboards as defined in this paper are being introduced in a multidisciplinary project team, and the storyboarding tool will be evaluated during several iterations. Based on the findings of these studies, we will fine-tune the features of storyboards and the relationships between storyboards and other artefacts. As the current version of the storyboarding tool is intended for individual use, the user studies may also reveal some expectations of teams regarding a distributed and collaborative version of the tool. Furthermore, contextual information and platform specifications can be extracted from the scenes in a storyboard to guide the design of UIs in the Gummy GUI builder.
6 Conclusion

In this paper we described how storyboards can contribute to UCSE. By sketching users interacting with a future application, pointing out devices and adding annotations in the early stages of a UCSE project, these storyboards contain functional and non-functional requirements. Storyboards can contain rich contextual information and are based on an intuitive notation providing more structure than narrative scenarios of use. We integrated tool support for the creation and use of storyboards in the Gummy multi-device GUI builder. Ongoing and future studies are being carried out to examine the approach of multidisciplinary teams in industry and to adapt the storyboarding tool according to current practices.
This new level of tool support can simplify the creation of artefacts at later stages of a development process and improve the communication within a multidisciplinary team. The comprehensibility of storyboards allows non-technical team members to be involved in the first activities of model-based UI development. Consequently, the loss of information after a user needs analysis will decrease, while the visibility and traceability of a project increase. Storyboards are a common language in multidisciplinary teams, which contributes to the user experience of the final user interface.

Acknowledgements. Part of the research at EDM is funded by EFRO (European Fund for Regional Development) and the Flemish Government. The MuiCSer process framework and the Gummy tool, including the storyboarding extension, are based on our experiences in the IWT project AMASS++ (IWT 060051).
References

1. Haesen, M., Coninx, K., Van den Bergh, J., Luyten, K.: MuiCSer: A Process Framework for Multi-Disciplinary User-Centered Software Engineering Processes. In: Proceedings of Human-Centred Software Engineering, September 2008, pp. 150–165 (2008)
2. International Standards Organization: ISO 13407. Human-Centred Design Process for Interactive Systems. Geneva, Switzerland (1999)
3. Redmond-Pyle, D., Moore, A.: Graphical User Interface Design and Evaluation. Prentice Hall, London (1995)
4. Carroll, J.M.: Making use: scenario-based design of human-computer interactions. MIT Press, Cambridge (2000)
5. Pruitt, J., Adlin, T.: The Persona Lifecycle: Keeping People in Mind Throughout Product Design. Morgan Kaufmann, San Francisco (2006)
6. Holtzblatt, K., Wendell, J.B., Wood, S.: Rapid Contextual Design: A How-to Guide to Key Techniques for User-Centered Design (Interactive Technologies). Morgan Kaufmann, San Francisco (December 2004)
7. Lindgaard, G., Dillon, R., Trbovich, P., White, R., Fernandes, G., Lundahl, S., Pinnamaneni, A.: User needs analysis and requirements engineering: Theory and practice. Interact. Comput. 18(1), 47–70 (2006)
8. Brown, J., Lindgaard, G., Biddle, R.: Stories, Sketches, and Lists: Developers and Interaction Designers Interacting Through Artefacts. In: Proceedings of Agile 2008, pp. 39–50 (2008)
9. Myers, B.A., Park, S.Y., Nakano, Y., Mueller, G., Ko, A.: How designers design and program interactive behaviors. In: VL/HCC, pp. 177–184 (2008)
10. Bailey, B.P., Konstan, J.A., Carlis, J.V.: Demais: designing multimedia applications with interactive storyboards. In: ACM Multimedia, pp. 241–250 (2001)
11. Newman, M.W., James, A.L.: Sitemaps, storyboards, and specifications: A sketch of web site design practice. In: DIS 2000 Designing Interactive Systems, pp. 263–274. ACM Press, New York (2000)
12. Nichols, J., Lau, T.: Mobilization by demonstration: using traces to re-author existing web sites. In: IUI 2008: Proceedings of the 13th International Conference on Intelligent User Interfaces, pp. 149–158. ACM Press, New York (2008)
13. Li, Y., Landay, J.A.: Activity-based prototyping of ubicomp applications for long-lived, everyday human activities. In: Proceedings of the Conference on Human Factors in Computing Systems, CHI 2008, pp. 1303–1312 (2008)
14. van der Lelie, C.: The value of storyboards in the product design process. Personal Ubiquitous Computing 10(2-3), 159–162 (2006)
15. Sova, R., Sova, D.H.: Storyboards: a dynamic storytelling tool. Technical report, Sova Consulting Group, Tec-Ed Inc. (2006)
16. Meskens, J., Vermeulen, J., Luyten, K., Coninx, K.: Gummy for multi-platform user interface designs: Shape me, multiply me, fix me, use me. In: Proceedings of the Working Conference on Advanced Visual Interfaces, AVI 2008, pp. 233–240. ACM Press, New York (2008)
Agent-Based Architecture for Interactive System Design: Current Approaches, Perspectives and Evaluation

Christophe Kolski1, Peter Forbrig2, Bertrand David3, Patrick Girard4, Chi Dung Tran1, and Houcine Ezzedine1

1 LAMIH – UMR 8530, University of Valenciennes and Hainaut-Cambrésis, Le Mont-Houy, F-59313 Valenciennes Cedex 9, France
firstname.name@univ-valenciennes.fr
2 University of Rostock, Computer Science Department, Albert-Einstein-Str. 21, D-18051 Rostock, Germany
Peter.Forbrig@informatik.uni-rostock.de
3 LIESP, Ecole Centrale de Lyon, 36 avenue Guy de Collongue, F-69134 Ecully Cedex, France
Bertrand.David@ec-lyon.fr
4 ENSMA / LISI, Teleport 2, 1 Avenue Clément Ader, B.P. 40109, F-86961 Futuroscope Chasseneuil Cedex, France
girard@ensma.fr
Abstract. This paper proposes a survey concerning agent-based architectures of interactive systems. The survey is focused on certain models and perspectives. General agent-based architectures are presented first. Then agent-based approaches dedicated to CSCW systems are reviewed. The appearance of web services requires new agent-based approaches; basic ideas are introduced. Agent-based interactive systems also necessitate new tools for their evaluation; an example of a representative evaluation tool is presented.

Keywords: Human-computer interaction, architecture model, agent-based systems, CSCW, design, evaluation.
Fig. 1. Global overview of available architecture models
2 From Seeheim Model to Agent-Based Architectures

Two main approaches to architecture models were first elaborated: global models and agent-based models. Global models define a precise structure based on a fixed number of components whose role and nature are precisely defined. The well-known Seeheim model is the first of them [1]. It recommends developing user interfaces as a separate module, connected to a functional core on which it must lean. The interface itself is organized in three parts: the Presentation (devoted to the management of inputs and outputs), the Controller (defined as a component that manages the sequence of interaction elements) and the Application Interface (which allows the translation between the interactive "world" and the functional core). The main interest of the Seeheim model is to give original definitions that establish good foundations for all work on architecture and tools in HCI. For example, the Arch model [2] proposes some modifications of the Seeheim model (including the functional core in the model, defining an additional component, defining the notion of a "slinky model"), but keeps the main definitions, notably for dialogue control.

Nevertheless, global models bring forward some drawbacks, mainly when trying to apply an object-oriented approach. While a current object-oriented interactive application may involve hundreds of cases, the global structure gives no help in defining elementary interaction classes. MVC (Model-View-Controller) [3], and then agent-based architecture models, such as PAC (Presentation-Abstraction-Control) [4], AMF (Multi-Agent Multi-Facets) [5] and AoMVC (Agent-oriented MVC) [6], were designed to solve this problem. They define elementary software bricks composed of several parts (in fixed number or not), and define the relations that must exist between bricks and parts. Some of them have been defined as design patterns. In so doing, global functions such as Dialogue Control or Presentation are split across the elementary agents, which helps to support iterative design. Some tools to help define applications with these models have been designed; see for instance [7]. However, like global models, agent-based architecture models suffer from problems. Choosing the right level of decomposition is hard for non-experienced developers. Moreover, strictly enforcing the rules of the model (for example, a PAC object only knows its father and its sons) may be difficult when implementation considerations are taken into account.
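To fix ideas, a schematic reading of an agent-based model such as PAC can be sketched as follows: each agent bundles Presentation, Abstraction and Control facets, and agents communicate only through their Control facets along the father/son hierarchy. The code is an illustrative assumption, not taken from any of the cited systems.

```python
class PACAgent:
    """Schematic PAC agent: Presentation, Abstraction and Control facets."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent       # a PAC agent only knows its father...
        self.children = []         # ...and its sons
        if parent is not None:
            parent.children.append(self)
        self.state = {}            # Abstraction facet (application data)

    # Presentation facet: handles raw input/output for this agent.
    def on_user_event(self, event):
        self.control(("from_presentation", event))

    # Control facet: mediates between facets and routes along the hierarchy.
    def control(self, message):
        kind, payload = message
        if kind == "from_presentation":
            self.state["last_event"] = payload  # update the Abstraction
            if self.parent is not None:
                self.parent.control(("from_child", (self.name, payload)))
        elif kind == "from_child":
            child, event = payload
            print(f"{self.name}: coordinating event {event!r} from {child}")

root = PACAgent("dialogue")
button = PACAgent("ok_button", parent=root)
button.on_user_event("click")  # dialogue: coordinating event 'click' from ok_button
```

The sketch also shows why decomposition is delicate in practice: every interactor becomes an agent, and all coordination must thread through the Control facets.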
Hybrid models, which are supposed to benefit the most from both approaches, then emerged. Mainly, these models lean on a global definition of the architecture based on the Arch model, and use an object-oriented approach to refine some of the main components, such as the Presentation or the Controller. For example, PAC-Amodeus [8] facilitates the design of multimodal applications. Another example is H4, a model initially defined for the Computer-Aided Design area; tools were created for various applications, to help the design of applications [9, 10], to help their validation [11], or both [12]. Other related research proposes architecture models concerning distributed and plastic UIs [13, 14].
3 Agent-Based Architectures: Approaches Dedicated to CSCW Systems

CSCW systems are not only interactive systems, but also and mainly multi-user distributed systems. For these reasons their architecture must answer new requirements. Three important characteristics are: (1) the taxonomy of collaborations, which can be related either to the crossing, in a matrix, of location (local or distant) and temporal view (synchronous or asynchronous), as suggested by [15], or to the nature of the cooperation (asynchronous cooperation, in-session cooperation, in-meeting cooperation and close cooperation [16]); (2) awareness, i.e. information about the activities done by other actors, needed in synchronous cooperation, which can be actor-oriented (their effective participation) or production-oriented, expressed by the WYSIWIS (What You See Is What I See) acronym with a strict or relaxed view of working data; (3) the nature of cooperation activities, which, as initially proposed by [17], relates to the support of three main kinds of activities, i.e. production, conversation/communication and coordination between participants.

From an architectural point of view, CSCW systems are clearly inspired by interactive system architectures, i.e. layered, agent and hybrid architectures are also used for CSCW systems. We can mention the Zipper [18] and Dewan [19] models for layered collaborative systems, based mainly on an adaptation of the Arch model to multi-user distributed situations. ALV and AMF-C [20] are representatives of agent-based systems; they generalize the PAC agent model to collaborative distributed situations. CoPAC, PAC* and Clover (all described in [21]) are typical examples of hybrid systems. In this last case, they reuse the Arch model and adapt it to multi-user and distributed situations. All these architecture models take into account synchronous collaboration, allowing real-time interaction between cooperating actors. Distant and local interactions are treated in the same way, as only mediated interactions are taken into account, i.e. direct local non-mediated interaction is not supported. Asynchronous collaboration is not addressed, mainly because in this case multi-user interaction, awareness and cooperative operations are not done by interaction. Awareness of shared artifacts (data) and participating actors is more or less supported, as are strict and relaxed WYSIWIS. Concerning cooperation activities (production, conversation and coordination), these are either fundamental elements (for PAC* and Clover) or naturally integrated (AMF-C). Hybrid architectures are either agent-based only in the Control part of the model (CoPAC and PAC*), or agent orientation can also be used in other parts of the model.

The recent evolution of cooperative systems is related to the mobility of the actors, evolving in an augmented real environment with pervasive behavior of the environment and related context-aware computing.
The concept of nomadism (networking, handheld devices, mobile communicating objects technology, localization, and permanent or non-permanent connectivity) extends CSCW and allows us to introduce the concept of "capillary" CSCW [16]. We use this term by analogy with the network of blood vessels. As its name implies, the purpose of capillary CSCW is "to extend the capacities provided by co-operative working tools in increasingly finer ramifications, from their use on fixed proprietary workstations to small handheld devices". Its main characteristics are: management of the collaboration and coordination of the mobile actors; coherence and validity of the information exchanged between handheld devices, which are connected only intermittently to the network and to the "group", with the aim of having information that is as synchronized as possible; heterogeneity of the communication protocols of the handheld devices; and the interface constraints and overall capacity of the handheld devices in terms of screen size, transmission speed, memory and autonomy, as well as the interaction devices.

In the recent evolution of the AMF-C model, its transformation from a fully agent-based system to a hybrid system and the integration of the IRVO perception of a new interaction paradigm (interaction with real and virtual objects) allow it to fully address the problems of capillary cooperative systems. In this new mobility context, adaptation to different interaction devices, environmental situations, software and hardware platforms and user preferences becomes the core problem. Adaptation techniques can be classified into four categories, ranging from the easiest to implement to the most powerful: translation techniques; markup language-based approaches; reverse- and re-engineering techniques; and model-based approaches. Designing and implementing interactive collaborative applications that are adaptable (manually) or adaptive (automatically) to the context of use requires consideration of the characteristics of the user and the interactive platform, as well as the constraints and capabilities of each environment. A state-of-the-art survey shows that, among the large majority of existing approaches to adaptation, the model-based approach seems to be the most powerful. Such an approach uses high-level, abstract representations that can be instantiated later on in the development lifecycle to meet specific usability requirements. However, these approaches need to combine apparently independent models such as concepts (e.g. UML), tasks (e.g. CTT), platforms (e.g. CC/PP) or user profiles. The relationships between these models need to be defined at the design step and refined at run-time in order to achieve overall usability. Our belief is that what we refer to as an interaction model is the right place to glue together all the models and usability attributes. This model must support both the design stage, linking the other models, and run-time. In addition, because Software Engineering and HCI have shown the importance of clearly separating the functional core from presentation components, our interaction model is supported by a well-structured architecture. In this new version of the AMF-C architectural model [22], we maintain the basic characteristics of the model, i.e.
the multi-faceted approach, which allows the creation of new facets to clarify the behavior and allow automation of the implementation process; a graphical formalism that expresses the control structure of multi-user interactions and the real-time adaptation of awareness characteristics; and a run-time model that allows dynamic control of interactions. We add the IRVO interaction formalism, allowing the expression of new augmented-reality interactions, and we structure the system with a hybrid approach that allows XML specifications, engine-based interpretation, and connection to real components of the functional core or the management of new interaction devices to be mixed (Fig. 2).
Fig. 2. Relations between Arch model (dashed lines), AMF-C and IRVO models
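The model-based adaptation approach described above can be pictured with a small sketch: an abstract task is bound to different concrete presentations depending on the platform profile, with the binding refined at run-time. The task and platform dictionaries below are toy stand-ins for CTT- and CC/PP-style models; none of this code comes from AMF-C or the cited systems.

```python
# Abstract task model entry (stand-in for a CTT task).
task = {"id": "select_destination", "type": "choice", "options": 150}

# Platform profiles (stand-in for CC/PP descriptions).
platforms = {
    "desktop": {"screen_width": 1280, "pointer": True},
    "pda":     {"screen_width": 240,  "pointer": True},
    "phone":   {"screen_width": 128,  "pointer": False},
}

def bind_presentation(task, platform):
    """Refine the abstract task into a concrete interactor for one platform."""
    if platform["screen_width"] >= 640:
        return "searchable list with map preview"
    if platform["pointer"]:
        return "scrollable list"
    return "voice/keypad menu"  # the most constrained platform

for name, profile in platforms.items():
    print(name, "->", bind_presentation(task, profile))
```

The interaction model plays the "glue" role here: it is the one place where the task, platform and user models meet, both at design time and at run-time.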
4 Web Services and Agent-Based Architectures Web services lead to new possibilities and problems concerning distributed system design. Fig. 3 suggests a complex industrial organization exploiting web services.
Fig. 3. Example of different actors communicating directly or not via web services [23]
The traditional web services provide functionalities based on a classical client/server architecture, but agent-based architectures offer new perspectives in this field. They utilize the autonomous and proactive behaviors of agents. Interesting new approaches appear in the literature. For instance, a technical framework for AWS (agent-based web services) is described in [24]; it supports the idea of capturing, modeling and implementing service functionalities with autonomous and dynamic interactions. Technically, agent-oriented software construction, knowledge representation and interaction mechanisms are integrated. Fig. 4 gives an impression of the framework. DAML-S (DARPA agent markup language for services) is a semantic markup language for describing web services and related ontologies; it has been superseded by OWL-S [25]. A discussion of dynamic web-service invocation by agents can be found in [26]. Their infrastructure is a hybrid peer-to-peer model. Agents are used to specify service providers and service customers. For this purpose JADE [27] (Java Agent Development Environment) is used; it is a framework developed as an open-source project.
A web service can be published as a JADE agent service, and agent services can be published as web service endpoints (see also [24]). Such propositions have to be considered carefully when examining agent-based architecture perspectives for service-oriented interactive systems.
Fig. 4. Integrated technical framework for agent-based web services [24]
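The idea of pairing agents and web services can be illustrated with a deliberately simple sketch. It does not use the real JADE API; the agent class, the endpoint URL and the message handling below are hypothetical stand-ins for the gateway facilities described above.

```python
import urllib.request

class ServiceAgent:
    """Toy autonomous agent that wraps a web service endpoint."""

    def __init__(self, name, endpoint):
        self.name = name
        self.endpoint = endpoint  # hypothetical URL of the wrapped service
        self.inbox = []

    def receive(self, message):
        self.inbox.append(message)

    def step(self):
        # Autonomous behaviour: the agent decides when and how to invoke
        # the service on behalf of the requests it has accumulated.
        while self.inbox:
            request = self.inbox.pop(0)
            try:
                url = self.endpoint + "?q=" + request
                with urllib.request.urlopen(url, timeout=5) as response:
                    print(self.name, "->", response.status)
            except OSError as exc:
                # Proactive recovery could go here (retry, find another provider).
                print(self.name, "invocation failed:", exc)

agent = ServiceAgent("customer-agent", "http://example.org/quote")  # placeholder endpoint
agent.receive("item-42")
agent.step()
```

The design point is the inversion of control: instead of a client calling a fixed endpoint, an autonomous agent mediates the invocation and can re-plan when the service is unavailable.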
5 Agent-Based Architectures: The Evaluation Problem

The evaluation of interactive systems aims at ensuring that users are capable of realizing their tasks. The evaluation methods and tools are numerous and of different types; they are generally based on two global criteria: utility and/or usability [28]. When the interactive system uses an agent-based architecture, new methodological and conceptual questions appear. For instance: how can such systems be evaluated? Is it necessary to combine several evaluation methods? Is it possible to be assisted by automated or semi-automated evaluation tools? How can such tools be connected to the agent-based systems? How can the agents' behaviors be linked with the analyzed situations? There are several further questions. We are particularly interested in automated or semi-automated tools.

An electronic informer (EI) is a software tool that automatically captures the interactions between the user and the UI in real situations, in a discreet and transparent way, so that the user does not feel hampered by the tool. The captured data are objective and can be scientifically analyzed by the evaluators. For a review of EIs, we refer to [29]. Several tools are available, but very few of them take into account the specificities of agent-based interactive systems in their evaluation approaches [30, 31, 32, 33]. The architecture of a tool dedicated to such systems is shown in Fig. 5. This kind of EI aims at capturing not only interactions between the user and interface agents in terms of occurring UI events, like other EIs, but also interactions between the agents themselves in terms of interactions between services. It also aims to go further than other EIs in assisting evaluators in interpreting the analysis results of the captured data, in order to evaluate three aspects of an agent-based interactive system: the user interface (UI), some non-functional properties (such as response time, reliability, complexity, etc.), and
Fig. 5. Example of a tool for evaluating agent-based interactive systems [33]
the properties of the users operating the system (ability, habits, preferences, the progress of a certain user, etc.).

Seven independent modules compose this tool. Module 1 is responsible for capturing the events that occur in all agents of the system; it then saves them into a database that will be analyzed by the other modules. The connection between this EI and the evaluated agent-based system is based on the association of each type of agent (interface agents, controller agents, application agents) with a corresponding informer. The evaluation can be realized remotely: module 1 and the evaluated system can run on the same machine, or on two different machines on the network. After capturing data, this EI enables the evaluator to determine the tasks that the user has realized (module 2). Some synthetic calculations and statistics can be produced from the captured data, such as the number and frequency of occurring events, the average response time of service interactions, the time taken to realize a task, the number of successful or failed tasks, etc., for any chosen agent or for all agents in any chosen period of time. These analysis results are shown to the evaluator using tables or graphs (module 3). The tool also enables the generation of Petri nets (PNs), and the evaluator can
compare PNs (modules 4 and 5). A generated PN describes the user's actions in terms of UI events (that have occurred on interface agents) and the system's actions in terms of the executed services of agents in order to realize a certain task. Generated PNs are called observed PNs or real PNs. The evaluator can compare the real PNs of a certain user realizing a certain task with the theoretical PNs predicted by the designers for the same task, or compare the real PNs of different users realizing the same task. Exploiting the formal aspect of the PNs, such comparisons are very useful for evaluators to detect problems of the interface, the system or the users, such as: bad or useless user actions, non-optimal ways chosen by users to realize tasks, failed service interactions, and properties of users (habits, evaluation and comparison of the abilities of different users, supervision of the progress of the abilities of a certain user, etc.). The analysis results of module 3 and the generation and comparison of PNs can all be interpreted with the indications of module 6 (which enables the association with an open list of determined criteria) to help evaluators critique the system and propose useful suggestions to the designers for improvements. This tool is representative of a new generation of tools dedicated to agent-based interactive systems. A lot of research is still necessary in this domain (adaptation to different application fields and architecture models, help in real time, etc.).
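As an illustration of the capture side of such an electronic informer (modules 1 and 3), the sketch below associates one informer with each agent and computes simple event statistics over a shared log. All names are illustrative assumptions; the tool described in [33] is considerably richer, notably in its Petri net generation and comparison.

```python
import time
from collections import defaultdict

LOG = []  # shared database of captured events (module 1)

class Informer:
    """One informer is associated with each agent of the evaluated system."""

    def __init__(self, agent_name, agent_type):
        self.agent_name, self.agent_type = agent_name, agent_type

    def capture(self, event, detail=None):
        # Transparent capture: the observed agent is neither slowed nor altered.
        LOG.append({"t": time.time(), "agent": self.agent_name,
                    "type": self.agent_type, "event": event, "detail": detail})

def event_frequencies(log):
    """Module 3: synthetic statistics over the captured data."""
    freq = defaultdict(int)
    for record in log:
        freq[(record["agent"], record["event"])] += 1
    return dict(freq)

ui = Informer("map_view", "interface")
ctl = Informer("router", "controller")
ui.capture("button_click", "zoom_in")
ctl.capture("service_call", "recompute_route")
ui.capture("button_click", "zoom_in")
print(event_frequencies(LOG))
# {('map_view', 'button_click'): 2, ('router', 'service_call'): 1}
```

The per-agent-type association mirrors the tool's design: interface, controller and application agents each get their own informer, so UI events and inter-agent service calls end up in the same analyzable log.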
6 Conclusion and Perspectives

Since the eighties, many models and approaches have been proposed in the literature concerning so-called distributed or agent-based architectures of interactive systems. For lack of space, it was only possible to propose a brief overview of this domain, covering (1) general agent-based architecture models, (2) models dedicated to CSCW systems, (3) interactive systems based on web services, and (4) the evaluation of interactive systems using agent-based architectures. Many research and development perspectives can now be envisaged.

Currently, general agent-based architecture models are mainly used at the conceptual level. They allow good design of applications, minimizing dependencies and improving the maintainability of applications. They now need to be used more widely at the implementation level. Their inclusion in integrated development environments, such as Eclipse, might be the next step to allow tools to be developed. Help for software design, simulation, and evaluation are the main topics to be addressed.

Capillary cooperative systems need substantial context adaptation. These mechanisms are more easily elaborated in hybrid architectures using agents in several layers. The autonomous behavior and independence of agent-based systems constitute an important advantage. Much current research concerns context-aware interactive systems; different types or generations of adapted agent-based architecture models have to be progressively proposed. Agent-based systems might help to dynamically compose web services; in this way they can support the dynamic adaptation of workflow systems. Many research problems also have to be studied and solved regarding the evaluation of agent-based interactive systems.
Acknowledgements. The present research work has been supported by CISIT, the Nord-Pas-de-Calais Region, and the European Community (FEDER). The authors gratefully acknowledge the support of these institutions.
References

1. Pfaff, G.E.: User interface management system. Springer, Heidelberg (1985)
2. Bass, L., Little, R., Pellegrino, R., Reed, S.: The Arch Model: Seeheim revisited. In: Proceedings of the User Interface Developers Workshop, Seeheim (1991)
3. Goldberg, A.: Smalltalk-80, the interactive programming environment. Addison-Wesley, Reading (1983)
4. Coutaz, J.: PAC, an Object-Oriented Model for Dialog Design. In: Bullinger, H.-J., Shackel, B. (eds.) Proc. Interact 1987, 2nd IFIP International Conference on Human-Computer Interaction, Stuttgart, Germany, September 1-4, 1987, pp. 431–436 (1987)
5. Ouadou, K.: AMF: Un modèle d'architecture multi-agents multi-facettes pour Interfaces Homme-Machine et les outils associés (in French). PhD Thesis, ECL, Lyon (1994)
6. Goschnick, S., Sterling, L.: Shadowboard: an Agent-oriented Model-View-Controller (AoMVC) architecture for a digital self. In: Proc. Int. Workshop on Agent Technologies over Internet Applications (ATIA 2001), Tamkang University, Taipei, Taiwan (2001)
7. Jambon, F.: From Formal Specifications to Secure Implementations. In: Kolski, C., Vanderdonckt, J. (eds.) Computer-Aided Design of User Interfaces (CADUI 2002), pp. 43–54. Kluwer Academics, Dordrecht (2002)
8. Nigay, L.: Conception et modélisation logicielles des systèmes interactifs: application aux interfaces multimodales (in French). PhD Thesis, Joseph Fourier Univ., Grenoble (1994)
9. Texier, G., Guittet, L., Girard, P.: The Dialog Toolset: a new way to create the dialog component. In: Stephanidis, C. (ed.) Universal Access in HCI, pp. 200–204. Lawrence Erlbaum Associates, Mahwah (2001)
10. Depaulis, F., Maiano, S., Texier, G.: DTS-Edit: an Interactive Development Environment for Structured Dialog Applications. In: Kolski, C., Vanderdonckt, J. (eds.) Computer-Aided Design of User Interfaces (CADUI 2002), pp. 75–82. Kluwer Academics, Dordrecht (2002)
11. Francis, J., Girard, P., Boisdron, Y.: Dialogue Validation from Task Analysis. In: Duke, D.J., Puerta, A. (eds.) Eurographics Workshop on Design, Specification, and Verification of Interactive Systems (DSV-IS 1999), Braga, Portugal, pp. 205–224. Springer, Heidelberg (1999)
12. Baron, M., Girard, P.: SUIDT: Safe User Interface Design Tool. In: International Conference on Intelligent User Interfaces / Computer-Aided Design of User Interfaces (IUI-CADUI 2004), Madeira, Portugal, pp. 350–351. ACM Press, New York (2004)
13. Balme, L., Demeure, A., Barralon, N., Coutaz, J., Calvary, G.: CAMELEON-RT: A Software Architecture Reference Model for Distributed, Migratable, and Plastic User Interfaces. In: Markopoulos, P., Eggen, B., Aarts, E., Crowley, J.L. (eds.) EUSAI 2004. LNCS, vol. 3295, pp. 291–302. Springer, Heidelberg (2004)
14. Calvary, G., Daassi, O., Coutaz, J., Demeure, A.: Des widgets aux comets pour la plasticité des systèmes interactifs. Revue d'Interaction Homme-Machine 6(1), 33–53 (2005)
15. Ellis, C.A., Gibbs, S.J., Rein, G.L.: Groupware: some issues and experiences. Communications of the ACM 34(1), 39–58 (1991)
16. David, B., Chalon, R., Vaisman, G., Delotte, O.: Capillary CSCW. In: Stephanidis, C., Jacko, J. (eds.) Human-Computer Interaction Theory and Practice, LEA, pp. 879–883 (2003)
17. Ellis, C.A., Wainer, J.: A Conceptual Model of Groupware. In: Proceedings of CSCW 1994, pp. 79–88. ACM Press, New York (1994)
18. Patterson, J.F.: A taxonomy of architectures for synchronous groupware applications. In: Workshop on Software Architectures for Cooperative Systems, CSCW 1994. ACM SIGOIS Bulletin, Special Issue: Papers of the CSCW 1994 Workshops, vol. 15(3) (April 1995)
19. Dewan, P., Choudhary, R.: Coupling the User Interfaces of a Multiuser Program. ACM Transactions on Computer-Human Interaction 2(1), 1–39 (1995)
20. Tarpin-Bernard, F.: Architectures logicielles pour le travail coopératif (in French). PhD Thesis, Ecole Centrale de Lyon, France (1997)
21. Laurillau, Y.: Conception et réalisation logicielles pour les collecticiels centrées sur l'activité de groupe: le modèle et la plate-forme Clover (in French). PhD Thesis, Joseph Fourier University, Grenoble (2002)
22. Tarpin-Bernard, F., Samaan, K., David, B.: Achieving usability of adaptable software: the AMF-based approach. In: Seffah, A., Vanderdonckt, J., Desmarais, M.C. (eds.) Human-Centered Software Engineering: Software Engineering Models, Patterns and Architectures for Human-Computer Interaction. Springer, Heidelberg (2009)
23. Idoughi, D.: Contribution à un cadre de spécification et conception d'IHM de supervision à base de services web dans les systèmes industriels complexes, application à une raffinerie de sucre (in French). PhD Thesis, University of Valenciennes, France (2008)
24. Li, Y., Shen, W.-m., Ghenniwa, H., Lu, X.: Model-Driven Agent-Based Web Services IDE. In: Wang, S., Tanaka, K., Zhou, S., Ling, T.-W., Guan, J., Yang, D.-q., Grandi, F., Mangina, E.E., Song, I.-Y., Mayr, H.C. (eds.) ER Workshops 2004. LNCS, vol. 3289, pp. 518–528. Springer, Heidelberg (2004)
25. Paolucci, M., Sycara, K.: Autonomous Semantic Web Services. IEEE Internet Computing 7, 34–41 (2003)
26. Yang, H., Chen, J., Meng, X., Zhang, Y.: A Dynamic Agent-based Web Service Invocation Infrastructure. In: Proceedings of the First Int. Conf. on Advances in Computer-Human Interaction, Sainte Luce, Martinique, pp. 206–211 (2008)
27. Bellifemine, F., Caire, G., Poggi, A., Rimassa, G.: JADE: A software framework for developing multi-agent applications. Lessons learned. Information & Software Technology 50(1-2), 10–21 (2008)
28. Nielsen, J.: Usability Engineering. Academic Press, Boston (1993)
29. Hilbert, D.M., Redmiles, D.F.: Extracting usability information from user interface events. ACM Computing Surveys 32(4), 384–421 (2000)
30. Trabelsi, A., Ezzedine, H., Kolski, C.: Architecture modelling and evaluation of agent-based interactive systems. In: Proc. IEEE SMC 2004, The Hague, pp. 5159–5164 (2004)
31. Tarby, J.-C., Ezzedine, H., Rouillard, J., Tran, C.D., Laporte, P., Kolski, C.: Traces using aspect oriented programming and interactive agent-based architecture for early usability evaluation: Basic principles and comparison. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4550, pp. 632–641. Springer, Heidelberg (2007)
32. Ezzedine, H., Bonte, T., Kolski, C., Tahon, C.: Integration of traffic management and traveller information systems: basic principles and case study in intermodal transport system management. Int. J. of Computers, Communications & Control (IJCCC) 3, 281–294 (2008)
33. Tran, C.-D., Ezzedine, H., Kolski, C.: A generic and configurable electronic informer to assist the evaluation of agent-based interactive systems. In: 7th International Conference on Computer-Aided Design of User Interfaces, CADUI 2008, Albacete (June 2008)
BunBunMovie: Scenario Visualizing System Based on 3-D Character

Tomoya Matsuo and Takashi Yoshino

Wakayama University, 930 Sakaedani, Wakayama, Japan
yoshino@sys.wakayama-u.ac.jp
Abstract. There are many text-based contents, such as novels and scripts. Such contents have only a scenario and lack visual information. The purpose of this research is to provide a visualizing environment that can visualize text-based contents easily. Moreover, such an environment can also provide the opportunity to get pleasure out of a scenario. To visualize a scenario, it is necessary to create various motions of characters and to depict various situations. Therefore we propose a motion assortment function to create various motions of characters. The function uses a Japanese dictionary and a thesaurus search. We also propose an associated image display function that uses an image search to depict various situations. Experiments on the motion assortment function show that the proposed method can assort some motions. From experiments based on subjective assessment, we found that some subjects were inclined to use such an easy visualizing environment.

Keywords: Scenario visualizing, 3-D character, motion synthesis.
2 Related Works

Previous studies have proposed several systems that can recreate scenarios only on the basis of input information. Zeng developed a system named "3D Story Visualiser" [1, 2]. This system utilizes visual information in input sentences to create 3D scenes. The input is in the form of natural language and is processed with a tool known as NLS. The tool extracts nouns and prepositions from the input sentences, converts them into VRML form, and then outputs them as 3D images. In our system, the only task performed by the user is the input of sentences into the system. However, the 3D Story Visualiser can only reproduce information as a 3D image; the movements of the characters are not reproduced. The aim of our study is to create a system that produces animated characters.

Aoki developed a system for creating animated objects entitled the "Digital Movie Director" [3]. His system is based on TVML technology [4, 5], which was developed by the NHK Science & Technical Research Laboratories. In this system, the subject, predicate, and object used in the scene are set by the user. Moreover, the user can set camera angles and sound effects. In our system, the only task of the user is to input sentences. However, in the Digital Movie Director, the user first has to develop both the characters and the scene. Hence, a large amount of time is required to develop 3D images using this system. The Digital Movie Director cannot create a 3D movie unless both the scene and the movements of the characters are prepared in advance.
3 System for Visualizing 3D Character Scenarios

3.1 Objective

The objectives of our system are stated as follows:

1. The only task to be performed by the user is the input of sentences describing the scenario. We aim to develop a visualization system wherein no special operations are required of the user. A system that only requires sentences to be input can be easily used by all users. Our system first analyzes the input sentences and then determines the subject and predicate.

2. The system responds to verbs other than the registered verbs. It is difficult to simulate movements corresponding to all verbs. Hence, it is necessary for the system to respond to verbs other than those that are registered. Therefore, we have developed a "dictionary retrieval function" and a "movement synthesis function." These functions allow us to simulate movements that correspond to verbs that are not registered in the system.

3.2 BunBunMovie

The system that we have developed in this study is known as "BunBunMovie." It analyzes input sentences and uses the information in them to develop a 3D movie. Figure 1 shows the execution screen of our system. The upper part of Figure 1 shows
a screen that displays the input sentences. "3D character," "Related image," and "Sentences under analysis" are displayed on this screen. The lower part of Figure 1 shows the screen used to input sentences. The user inputs sentences and pushes the reproduction button, following which the sentences are visualized on the screen. The BunBunMovie system is programmed in C#. We use TVML to create 3D animated objects. MeCab [4] is used for morphological analysis.
(Figure 1 contents: the moving 3D character with its name and action, e.g. "Character: I, Action: Dance," related images of the character and the place, the sentences under analysis, the reproduction button, the input sentence field, and the input history.)
Fig. 1. Execution screen of the system
Flow of the sentence analysis process. We now explain the process for the analysis of the input. Figure 2 shows the flow of the procedure for sentence analysis; a simplified code sketch of this flow is given after the figure. The steps are as follows.
1. First, the system analyzes the input sentences using MeCab (Figure 2 (1)). The nouns that denote the subject and the place are determined from the relation between each noun and its case-marking particle.
2. The system examines whether the noun assumed to be a subject exists in a "subject list" (Figure 2 (2)). If the noun exists in the list, its related image is retrieved (Figure 2 (3)).
3. To analyze the verb, the system examines a "verb list" (Figure 2 (4)). If the verb under analysis does not exist in the list, the system refers to a dictionary (Figure 2 (5), (6), (7)).
4. The system then examines whether the noun assumed to denote a place exists in a "place list" (Figure 2 (8)). If the noun exists in the list, its corresponding image is retrieved (Figure 2 (9)).
5. The subject, the predicate, and the noun indicating the place are converted into TVML format (Figure 2 (10)).
3.3 Functions of the BunBunMovie System
The BunBunMovie system uses three functions, "word list," "dictionary retrieval," and "image retrieval," to recreate the desired scenarios from the information input by the users.
(Figure 2 contents: the input sentences undergo morphological analysis with MeCab (1), which assumes the subject noun and the place noun from the relation between each noun and its case-marking particle; the subject is checked against the subject list (2) and a related image is retrieved via Google image search with the face option (3); the verb is checked against the verb list (4); for unregistered verbs, the RUIGO TAMATEBAKO synonym dictionary (5) and the Yahoo national language dictionary (6) are consulted, their retrieval results are compared (7), and when a national language dictionary result exists among the synonym results, the verb list is examined again; the place noun is checked against the place-noun list (8) and a place image is retrieved via Google image search (9); the result is converted into TVML data (10) and played in the TVML Player.)
Fig. 2. Flow of the sentence analysis process
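The flow above can be summarized in code. The following is a minimal sketch in Python (the actual system is written in C#); the list contents and all helper names are hypothetical stand-ins for MeCab, the word lists, Google image search, and the dictionary retrieval function of Section 3.3, not the authors' implementation.

```python
# Illustrative sketch of the Fig. 2 pipeline; every list entry and
# helper below is a hypothetical stand-in, not original system code.

SUBJECT_LIST = {"I", "he", "uncle"}          # ~1,700 "living thing" nouns
VERB_LIST = {"run", "walk", "say", "jump"}   # 103 verbs convertible to TVML
PLACE_LIST = {"meadow", "mountain"}          # ~6,500 place nouns

def morphological_analysis(sentence):
    # Stand-in for MeCab: naively assume "subject verb place" word order.
    words = sentence.split()
    return words[0], words[1], words[-1]

def image_search(query, faces_only=False):
    # Stand-in for Google image search (with the optional face filter).
    return "image-for:" + query + (":faces" if faces_only else "")

def paraphrase_via_dictionary(verb):
    # Stand-in for the dictionary retrieval function (Section 3.3).
    return "say"

def analyze(sentence):
    subject, verb, place = morphological_analysis(sentence)        # (1)
    scene = {}
    if subject in SUBJECT_LIST:                                    # (2)
        scene["subject_image"] = image_search(subject, True)       # (3)
    if verb not in VERB_LIST:                                      # (4)
        verb = paraphrase_via_dictionary(verb)                     # (5)-(7)
    if place in PLACE_LIST:                                        # (8)
        scene["place_image"] = image_search(place)                 # (9)
    # (10) The analyzed triple would then be converted into TVML.
    return {"subject": subject, "verb": verb, "place": place, **scene}

print(analyze("I whisper meadow"))
```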
Use of "word list". The word list consists of words that already have corresponding images or motions in the system. It comprises three lists: a "subject list," a "verb list," and a "place list." The system checks these lists for words extracted from the input sentences and uses the corresponding resources to recreate the desired scenarios. The detailed description of each list is as follows:
1. Subject List. A noun that is recognized by the system as a subject is registered in the subject list. For example, nouns such as "I" and "he" are present in the subject list. All words registered in the subject list are nouns that fall in the "living thing" category of the Japanese dictionary. The number of registered words in our list is approximately 1,700.
2. Verb List. Verbs such as "run" and "walk" are registered in the verb list. The verbs registered in the verb list can be converted into TVML. There are 103 verbs in our list.
3. Place List. This list contains nouns that indicate a place or location, for instance, "meadow" and "mountain." The total number of words in the place list is approximately 6,500. The registered words are nouns that exist in the "place" category of the Japanese vocabulary dictionary. The place list is required to accurately recreate the location of a scene from its description.
Dictionary retrieval function. In order to recreate a scene properly, we need to recreate the movements of the characters in the scene. However, it is not possible to simulate all types of movements, since the number of verbs registered in the list is limited. This results in a 3D character that remains inanimate even though sentences describing its behavior have been input.
The dictionary retrieval function is used as follows.
1. First, the system obtains synonyms of the unregistered verb from the thesaurus (Figure 2 (5)).
2. Then, the system retrieves the definition of the unregistered verb from the Yahoo online national language dictionary (Figure 2 (6)).
3. The system analyzes the definition of the unregistered verb in the national language dictionary using MeCab.
4. It compares the information obtained from the national language dictionary with the synonyms and examines whether the description obtained from the dictionary matches any of the synonyms (Figure 2 (7)).
5. If a match is found, the system assumes the synonym to be a paraphrase of the unregistered verb.
6. The system examines whether this verb exists in the verb list (Figure 2 (4)).
7. On the basis of the information in the verb list, the appropriate movements are then assigned to the 3D characters.
For instance, if the dictionary retrieval function is used for an unregistered verb such as "whisper," the system paraphrases the verb "whisper" to the verb "say." The system can re-examine the verb list on the basis of the new information obtained and can recreate the desired scene accurately (a code sketch of this procedure is given at the end of this subsection).
The image retrieval function. When sentences are visualized, the movement of the 3D character alone does not provide sufficient visual information; we consider the appearance of the characters and information about the scene to be necessary as well. The system uses the image retrieval function to add this visual information. We use Google image retrieval to obtain visual information. Using the image retrieval function, users can obtain and display images related to the subject and the place described in the input. As an example, consider the following sentence input by a user: "My uncle was in a meadow." The system assumes "uncle" to be the subject and "meadow" to be the location. The images corresponding to this subject and place are then retrieved by the system and displayed. Using this function, further information, such as the role of a character and other information pertaining to the scene, can also be added. Our system also uses Google Image Search's face filter to retrieve only facial images related to a subject. This option displays images containing a particular face by priority. The probability of obtaining inaccurate images of a particular subject using this search option is very low.
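The seven steps of the dictionary retrieval function can be sketched as follows. This is an illustrative reading, not the original code: the synonym lookup, the definition lookup, and the tokenizer are passed in as stand-ins for the thesaurus, the Yahoo online dictionary, and MeCab.

```python
def paraphrase_via_dictionary(verb, verb_list, get_synonyms,
                              get_definition, tokenize):
    """Map an unregistered verb onto a registered one (steps 1-7).

    get_synonyms, get_definition and tokenize stand in for the
    thesaurus, the online national language dictionary, and MeCab."""
    synonyms = set(get_synonyms(verb))                        # step 1
    definition_words = set(tokenize(get_definition(verb)))    # steps 2-3
    # Steps 4-5: a synonym that also appears in the dictionary
    # definition is taken as a paraphrase of the unregistered verb.
    for candidate in synonyms & definition_words:
        # Steps 6-7: re-examine the verb list with the paraphrase.
        if candidate in verb_list:
            return candidate
    return None  # no registered paraphrase found

# Example: "whisper" -> "say", assuming the stubs return plausible data.
result = paraphrase_via_dictionary(
    "whisper",
    {"say", "run", "walk"},
    get_synonyms=lambda v: ["say", "murmur"],
    get_definition=lambda v: "to say something very quietly",
    tokenize=str.split,
)
assert result == "say"
```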
3.4 Development of the Movement Synthesis Function
Even though the dictionary retrieval function allows us to handle several unregistered verbs, it is not possible to process all unregistered verbs with it. Therefore, we have developed the movement synthesis function. This function combines the movements of verbs already registered in the verb list to develop movements for unregistered verbs. In other words, the movement synthesis function first selects registered verbs and then combines their movements.
A verb is selected on the basis of synonyms of the unregistered verb. The flow of the movement synthesis function, illustrated in Figure 3, is as follows (a code sketch of the selection step is given after the figure).
1. We first retrieve synonyms for the verbs that are already registered in the verb list. As a result, synonym tags are added to these verbs.
2. The system then retrieves synonyms of the unregistered verb from a thesaurus (Figure 3 (1)).
3. The system compares the synonym tags of the unregistered verb with those of the registered verbs and counts the number of agreements between them (Figure 3 (2)).
4. It then combines the movements of two or more registered verbs to develop the movement corresponding to the unregistered verb (Figure 3 (3)).
(Figure 3 contents: (1) the unregistered verb UNKNOWN with synonym tags [jump, enjoy, ...]; (2) the registered verbs A [smile, enjoy, ...], B [sad, cry, ...], and C [jump, snap, ...], whose synonym tags are compared with those of UNKNOWN; (3) the verbs whose synonym tags agree with those of the unregistered verb, here A and C, are chosen.)
Fig. 3. Movement synthesis function. UNKNOWN, A, B, and C denote verbs; the words in brackets denote their synonym tags.
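A compact reading of the selection step, using the tag sets from the example in Figure 3. The overlap count as ranking criterion and the choice of exactly two verbs follow the description above; the function and variable names are ours.

```python
def select_verbs(unknown_tags, registered):
    """Choose the two registered verbs whose synonym tags agree most
    with those of the unregistered verb (Fig. 3, steps 2-3); their
    motions are then combined (step 4). 'registered' maps each
    registered verb to its synonym tag set."""
    ranked = sorted(
        registered,
        key=lambda verb: len(registered[verb] & unknown_tags),
        reverse=True,
    )
    return ranked[:2]  # the motions of these two verbs are combined

registered = {
    "A": {"smile", "enjoy"},
    "B": {"sad", "cry"},
    "C": {"jump", "snap"},
}
# UNKNOWN has synonym tags {jump, enjoy}; A and C each share one tag.
print(select_verbs({"jump", "enjoy"}, registered))  # ['A', 'C']
```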
4 Experiments for Evaluation of the Movement Synthesis Function
We carried out an experiment to evaluate the performance of the movement synthesis function. The purpose of the experiment was to verify whether a user can correctly recognize the movements simulated by the system. In the experiment, ten students of Wakayama University observed the characters and movements developed by the system. They then provided feedback by filling out a questionnaire.
4.1 Experimental Procedure
The movement synthesis function generated five character movements: "set up the traps," "hang up," "see off," "amend," and "burst." The test users evaluated these movements. The number of verbs available for combination was approximately 70, and the movement synthesis function chose two of these verbs to develop each movement. The test users answered a questionnaire with a five-point evaluation scale and another description-based questionnaire.
4.2 Results of the Experiment
Table 1 shows the results of the five-point evaluation used in the questionnaire.
A movement with a high average rating was one that was correctly recognized by the users; however, there were also movements with low average ratings. The average ratings of the movements for "burst" and "see off" were high. Table 2 shows the feedback provided by the test users in the description-based questionnaire. The feedback was both positive and negative. One instance of positive feedback stated that the movement of the characters was interesting. On the other hand, we also received negative feedback stating that the movements of the characters were inaccurate.

Table 1. Results of the questionnaire of the experiment evaluating the movement synthesis function

Movement developed by the system | Movements used for synthesis | Average
set up the traps | go out + poke | 3.0
hang up | dig + apologize | 1.8
see off | go to + say no | 3.8
amend | be mad + be worried | 2.4
burst | be mad + jump | 3.4

The values in the "Average" column denote the mean values of the ratings given in response to the question "Is the movement of the character appropriate?" A five-point Likert scale was used: 1: Strongly disagree, 2: Disagree, 3: Neutral, 4: Agree, 5: Strongly agree.
Table 2. Impressions of the movement synthesis function

Positive feedback
・The fact that the system itself develops the movements of the character is interesting.
・Even though a movement of the character may not be recognizable, it is important that the characters react to more words.
・Both appropriate and inappropriate movements were present. The appropriate movements were useful.
Negative feedback
・The character expresses two separate movements for a single verb.
・Several movements of the character were not recognizable.
・The synthesized movements of a monotonous character are difficult to understand.

4.3 Discussion
The results of the five-point evaluation in the questionnaire show that the average ratings for the verbs "burst" and "see off" were high. This is an example of the effectiveness of the movement synthesis function. It should be noted that when one of the two verbs used in the synthesis of a movement is related to the target verb, the average rating is high. For instance, "jump" was used for the synthesis of "burst," and "go to" was used for the synthesis of "see off." This shows that if verbs with similar meanings are used for synthesis, the required movements can be developed accurately. However, the overall accuracy of the movement synthesis function, which is based on synonym information, is not high. One of the negative instances of feedback stated that the character expressed two separate movements for a single verb. In other words, the test user was unable to recognize that the movement of the character corresponded to a single verb. To solve this problem, we plan to improve the system so that it displays more natural movements. To this end, we intend to utilize the composite motions developed by Oshita [5].
5 Trial Evaluation of the System by Users
We performed trial experiments to allow users to evaluate the system. In the experiment, the test users input sentences describing a scene and then viewed the movie developed by the system. The purpose of the experiment was to allow users to evaluate the accuracy of the sentence analysis and the quality of the movie. Ten students from Wakayama University tested our system.
5.1 Experimental Process
The test users first input sentences and then observed the movies developed by the system on the basis of the information provided. The duration of the experiment was 10 minutes. After the experiment, the test users answered a questionnaire based on a five-point evaluation scale and a description-based questionnaire.
5.2 Results of the Experiment
When sentences in which the test user clearly described the subject and predicate, such as "I went to college" and "I climb the mountain," were input, the system accurately developed the movie. However, when only predicates, such as "Went dancing" and "It kicked and knocked it down," were input, the system was not able to develop a movie. Table 3 shows the questionnaire results for the system. Overall, the system was highly rated by the test users. Table 4 shows the test users' requests and impressions of the system. Some of the users requested the ability to develop their own images and characters to be recreated by the system. Other requests involved an improvement in the response of the system to different words and an improvement in the analytical accuracy of the system. Most of the test users felt that the system was interesting, although they stated that it was necessary to increase the number of words that can be translated by the system into images. Overall, most of the feedback that we received was positive.

Table 3. Questionnaire results for the system

Question | Average
1. The movements of the characters are interesting. | 4.4
2. A related image improves interest. | 3.9
3. I want to visually recreate my diary using this system. | 4.1

The values in the "Average" column indicate the mean values of the ratings given in response to the questions. A five-point Likert scale was used: 1: Strongly disagree, 2: Disagree, 3: Neutral, 4: Agree, 5: Strongly agree.

Table 4. Requests and impressions for the system

Requests for the system
・I would like to use sentences that allow the system to recreate images that I have already prepared.
・I would like to develop a character and then use it in the system.
・The system should be able to respond to more words.
・When the system cannot analyze sentences, it should display "We cannot analyze these sentences."
・The speed of the reproduction of images by the system should be improved.
Impressions of the system
・The system is interesting.
・I enjoyed using the system, although the movie was slightly unclear.
・Although the system is interesting, it is necessary to improve it.
・The system can only be used to visualize diaries when it responds to more words.
・If the input sentences are not grammatically correct, the characters do not respond well.
5.3 Discussion
From the results of the questionnaire with the five-point evaluation scale, we found that the rating for the item "I want to visually recreate my diary using this system" was high. However, there were several requests concerning the system: better analysis of the sentences, an increase in the number of words responded to by the system, and improved analytical accuracy. In the experiment, our system was unable to analyze many sentences. Therefore, the test users requested an improvement in the accuracy of the sentence analysis.
6 Conclusion
Conventional visualization systems are not able to recreate scenarios when the words extracted from the input information are not registered with the system. Moreover, these systems are not able to develop movements for each character. To resolve these issues, we proposed a system that uses a dictionary retrieval function and a movement synthesis function to recreate the required scenarios. The performance of our system was evaluated through trial tests by users. The results of our tests were as follows.
1. Developing appropriate movements for a character was a problem in conventional systems. We developed a movement synthesis function that utilizes synonyms to develop the movements corresponding to a particular verb.
To verify the accuracy of the movement synthesis function, we performed experiments that evaluated it. It was observed that many of the movements developed by the synthesis function were not appropriate. However, the use of the movement synthesis function allowed us to develop animated characters. Hence, the proposed technique has good potential.
2. We also performed test experiments that allowed users to evaluate our system. The test users stated that they were satisfied with the system, and we received feedback stating that the users found the system interesting. This demonstrates the potential of our system. However, our system only responds to sentences in which the subject and the predicate are input appropriately. Therefore, there were several requests concerning the accuracy of the analysis of natural sentences. We plan to further improve the analytical accuracy of the system and the movement synthesis function.
References
1. Aoki, T.: Digital Movie Director, http://www.rcast.utokyo.ac.jp/ja/research/pioneers/007/index.html
2. Douke, M., Hayashi, M., Makino, E.: A Study of Automatic Program Production Using TVML, Short Papers and Demos. In: Eurographics 1999, pp. 42–45 (1999)
3. Hayashi, M.: TVML (TV Program Making Language) Make Your Own TV Programs on a PC! In: International Conferences, Virtual Studios and Virtual Production (2000)
4. Kudo, T., Yamamoto, K., Matsumoto, Y.: Applying Conditional Random Fields to Japanese Morphological Analysis. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), pp. 230–237 (2004)
5. Oshita, M.: Smart Motion Synthesis. In: SIGGRAPH 2007 Posters (2007)
6. Zeng, X., Mehdi, Q.H., Gough, N.E.: Shape of the Story: Story Visualization Techniques. In: Seventh International Conference on Information Visualization, pp. 144–150 (2003)
7. Zeng, X., Mehdi, Q.H., Gough, N.E.: From Visual Semantic Parameterization to Graphic Visualization. In: Ninth International Conference on Information Visualization, pp. 488–493 (2005)
Augmented Collaborative Card-Based Creative Activity with Digital Pens Motoki Miura, Taro Sugihara, and Susumu Kunifuji School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan miuramo@jaist.ac.jp, sugihara@jaist.ac.jp, kuni@jaist.ac.jp
Abstract. Typically, practitioners of the KJ method use paper labels and four-colored ball-point pens to externalize their thoughts and ideas during the process. A similar approach and method is used in group KJ lessons. However, due to the large paper size required, this approach limits the effective capturing and sharing of outcomes. Considering the merits of the conventional paper–pen approach and the demand for quick sharing of outcomes after a session, we designed and implemented a system to digitize the group KJ session: not just the outcomes but also the details of the creative work processes. We use digital pens to capture the position and orientation of labels, as well as their contents, during the session. We confirmed the efficiency of our system through several KJ sessions. Keywords: CSCW, Creative meeting, Label work, KJ method.
2 Capturing Card Locations by Digital Pens
We used Anoto-based pens to store drawings on paper cards and a base sheet. An Anoto-based pen can recognize the position of drawings by scanning special dotted patterns on the paper. Using the unique features of these patterns, the system can distinguish between drawings made on the cards and those made on the base sheet. Consequently, the drawings can be used not only for handwritten notes but also for describing the relationships between the paper cards and the base sheet. When the user draws a line that covers the sheet and a card (Fig. 1, left), the pen recognizes the line as three drawings (Fig. 1, right). If these drawings are generated at almost the same time, we can infer that, at that time, the paper card was placed so as to connect the three drawings. We call this operation scanning and the connecting points joints.
Fig. 1. A line over the card border is separated into three lines
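A minimal sketch of how such a scan could be detected from time-stamped drawings, assuming each drawing is already attributed to the sheet or to a card by its dot pattern. The data layout and the time tolerance are our assumptions; the paper does not specify them.

```python
from dataclasses import dataclass

@dataclass
class Drawing:
    surface: str   # "sheet" or a card id such as "card-12"
    t: float       # timestamp in seconds
    start: tuple   # (x, y) in the surface's own coordinate system
    end: tuple

def detect_scan(drawings, max_gap=0.3):
    """Look for a sheet/card/sheet triple drawn almost simultaneously,
    i.e. one pen line crossing a card border, split into three drawings
    as in Fig. 1. Returns (card_id, joint1, joint2) in sheet coordinates,
    or None. max_gap is an assumed tolerance."""
    ds = sorted(drawings, key=lambda d: d.t)
    for a, b, c in zip(ds, ds[1:], ds[2:]):
        on_card = b.surface != "sheet"
        if (a.surface == "sheet" and on_card and c.surface == "sheet"
                and c.t - a.t <= 2 * max_gap):
            # The card connects the end of a and the start of c:
            # these two sheet positions are the joints.
            return b.surface, a.end, c.start
    return None

joints = detect_scan([
    Drawing("sheet", 0.0, (10, 10), (20, 20)),
    Drawing("card-7", 0.1, (0, 0), (30, 5)),
    Drawing("sheet", 0.2, (45, 45), (60, 60)),
])
print(joints)  # ('card-7', (20, 20), (45, 45))
```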
By extending this technique, we can recognize the orientation and overlapping state of paper cards if two joints are extracted by scanning (Fig. 2).
Fig. 2. Recognition of orientation and overlapping states
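The paper does not spell out this computation, but with two joints known in both sheet and card coordinates, the card's pose follows from a standard rigid 2-D transform fixed by two point correspondences:

```python
import math

def card_pose(sheet_j1, sheet_j2, card_j1, card_j2):
    """Recover a card's rotation and translation on the base sheet from
    two joints known in both coordinate systems (an illustrative
    computation, not the authors' code)."""
    angle = (math.atan2(sheet_j2[1] - sheet_j1[1], sheet_j2[0] - sheet_j1[0])
             - math.atan2(card_j2[1] - card_j1[1], card_j2[0] - card_j1[0]))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Translation carrying card_j1 onto sheet_j1 after the rotation.
    tx = sheet_j1[0] - (cos_a * card_j1[0] - sin_a * card_j1[1])
    ty = sheet_j1[1] - (sin_a * card_j1[0] + cos_a * card_j1[1])
    return math.degrees(angle), (tx, ty)

# A card rotated 90 degrees: joints (0,0)/(10,0) on the card appear
# at (5,5)/(5,15) on the sheet.
print(card_pose((5, 5), (5, 15), (0, 0), (10, 0)))  # approx. (90.0, (5.0, 5.0))
```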
To eliminate unnecessary pen drawings on the paper during scanning, we can simply use a semi-transparent plastic sheet while scanning (Fig. 3).
Fig. 3. Using a transparent plastic sheet to eliminate drawings
Fig. 4. Grouping (left) and Ungrouping (right) gestures
A similar method was proposed in [3] in research on digitized experiment record notes; however, to edit the card structure, we introduced extra pen gestures called grouping and ungrouping. Grouping is performed by a continuous round stroke from the top-level card to the child cards (Fig. 4, left). Ungrouping is performed by a continuous single stroke from the top-level card to the child cards (Fig. 4, right). These grouping and ungrouping operations can be used in common card-based creative activities, especially for making figures in the KJ method [4, 5]. A digital camera can, of course, store the status of the paper sheet in detail, but the reusability of the card content is crucial for creative tasks, and atomic data should be provided to enhance the process. In particular, the authentic KJ method involves procedures with repetitive tasks for refinement and deepening.
3 GKJ System
We developed a system named GKJ (Group KJ) that handles scanned handwritten drawings and gestures captured by multiple Anoto pens. The GKJ system consists of (1) Anoto pens, (2) an L-Box Digital Pen Gateway System (DPGW), and (3) a GKJ editor. A system overview is shown in Fig. 5. The L-Box DPGW collects pen data from multiple pens simultaneously, via a Bluetooth connection, and sends it to a MySQL table on a PC. The GKJ editor checks for updated data and uses the data to construct a digital representation of the current paperwork status. For further editing tasks, the GKJ editor provides functions for organizing the virtual cards, using a mouse and a keyboard as alternative input devices.
Fig. 5. GKJ system overview and data flow
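The paper does not state how the editor detects updated rows; one simple reading is a polling loop over the MySQL table. The row layout below is a hypothetical stand-in:

```python
import time

def editor_update_loop(fetch_rows_since, handle_stroke, interval=0.5):
    """Sketch of the GKJ editor's update cycle: the L-Box gateway writes
    pen data into a MySQL table and the editor polls for new rows. The
    assumed row layout (id, pen_id, surface, x, y, t) is illustrative;
    the paper only states that pen data is sent to a MySQL table."""
    last_id = 0
    while True:
        for row in fetch_rows_since(last_id):  # e.g. SELECT ... WHERE id > %s
            last_id = row["id"]
            handle_stroke(row)                 # update the virtual card view
        time.sleep(interval)
```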
4 Usage Scenario
Typically, a group KJ method session consists of two stages: card gathering and card unfolding. In the gathering stage, the participants discuss and collect cards with similar meanings or arguments. After that, they add extra cards to those gathered and write an abstract of the gathered cards on the added cards. Then they clip these cards together with paper clips, treating them as a single card. This is repeated until fewer than 6 to 9 cards remain in the stack. In the GKJ editor, the folding operation can be performed by the grouping gesture, and the folded cards are shown in the pile view (Fig. 6, left). In the authentic KJ method, the participants are basically prohibited from referring to the child cards during this stage. They then proceed to the unfolding stage. Usually the participants extract the cards on the base sheet, but this requires special care so as not to destroy the constructed structure of piled cards. Also, in the real world, it is difficult to re-organize the unfolded cards because the area necessary for the cards depends on the number of cards and the layout; a high number of cards prevents a trial-and-error approach. Therefore, we recommend that the participants use virtual cards to estimate a preliminary layout. The GKJ editor provides a function for unfolding virtual cards and pre-organizing the extracted virtual cards by dragging the top-level cards; Fig. 7 (right) shows the unfolded virtual card view. Using this function, the participants can effectively lay out the cards by considering the relationships between them. After the two stages, the participants obtain a figure (Fig. 8) which represents their issues and viewpoints as an outcome. The participants can review the process by replaying the operations with the GKJ editor. They can also export the process data or print figures in PDF format. Incidentally, the curved line in Fig. 8 is a Bezier curve whose control points were generated by the convex hull algorithm; the curved lines are automatically recalculated when the virtual cards are moved (a sketch of the hull computation is given after Fig. 8). As described above, the proposed GKJ system allows participants to freely choose a proper environment (real or digitized cards) for their task, including review by digitized log rollback and distribution of data.
Fig. 6 and Fig. 7. Digitized views of grouping: pile view (Fig. 6, left) and unfolded view (Fig. 7, right)
The high portability of the GKJ system makes it useful in a variety of environments. A group session with the GKJ system can be held with (1) a base paper sheet, (2) paper cards, (3) digital pens, (4) the L-Box, a small Linux box, and (5) a PC.
Fig. 8. Final KJ method figure in the GKJ system
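The curved enclosures mentioned above combine a convex hull with a Bezier curve. A sketch of the hull step, assuming the input points are the corners of the cards in a group, could look like this (Andrew's monotone chain, a standard algorithm; the Bezier evaluation itself is omitted):

```python
def convex_hull(points):
    """Andrew's monotone chain: returns hull vertices in
    counter-clockwise order. In the GKJ editor, the hull vertices would
    then serve as control points of a closed Bezier curve around the
    card group, recomputed whenever a card moves."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

print(convex_hull([(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]))
# [(0, 0), (4, 0), (4, 3), (0, 3)]; the interior point (2, 1) is dropped
```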
4.1 Practice
We used the GKJ system in small courses on collaborative card-based activity (Fig. 9 shows a class at our institute, and Fig. 10 shows a lecture course with city hall staff). We used pre-printed cards as material, and the participants positioned the cards spatially to represent their thoughts and considerations. In this case, the system may not enhance the ongoing work, but the participants enjoyed scanning and checking the digitized data. The instructor could conduct the course in the same manner as conventional courses that use cards (Fig. 9, left). Since the scanning is intuitive, the instructor could easily capture the card locations. The captured data was used to generate PDF files representing the layout after the course. The precise transition log of the cards was helpful for retrospection. In the lecture course at a city hall (Fig. 10), the participants first wrote their thoughts and work-related problems on plain paper cards with pens. Then they classified the cards by hand and discussed the issues. After the cards were organized by placing them on the base sheet, we scanned the card positions.
Fig. 9. Lecture courses with collaborative card-based activity at the university
Fig. 10. Lecture course with city hall staff
We obtained the following findings from observations of the sessions and comments from the participants.
(Advantages)
1. Card writing with a pen was straightforward and intuitive for participants.
2. The participants could naturally organize their cards because they could see other people's behavior.
3. The quick distribution of digitized logs and figures (PDF) is effective for reviewing the session and the discussion. Incidentally, the instructor had been providing digitized logs and session videos to participants even for conventional lectures, but it took a few days to digitize the outcomes.
4. The instructor and some participants mastered the scanning and enjoyed the operation.
5. The positions in the scanned data accurately represented the real figures.
(Drawbacks)
1. Sometimes the scanning failed due to errors. The most frequent mistake was a scanning section error. The GKJ system uses four A2-size sheets to compose an A0-size base sheet. Since the printed dot pattern of each A2 sheet was the same, the user needed to specify the section of the base sheet to the system before scanning by tapping a checkbox. Occasionally the user missed this presetting or scanned with
the wrong pen. To reduce this error, we now prepare four preset pens, one for each A2 sheet section. Even if the error occurs, the user can easily fix the misrecognition by rescanning.
2. Sometimes unnecessary scanning lines appeared on cards and base sheets. The cause was misrecognition during scanning, brought about by weak pen pressure, high scanning speed, or a lack of a gap between cards (less than 1 cm). To solve this issue, we added a "transparent" pen mode, which does not draw unnecessary lines while scanning.
3. Some participants wrote upside down on the cards, because it is difficult to recognize the top and bottom. This caused the card content to be shown wrong side up, and the position to be scanned incorrectly. The issue could be solved by implementing a function that automatically detects when the wrong side is up by considering the handwritten notes, and handles the card accordingly.
4. As described in points 1 to 3, scanning required skill and know-how. The user needed to understand the characteristics of the pen and of GKJ to operate the system adequately. However, the skill could be acquired with a few minutes of training, and a failed scan could be easily recovered by rescanning.
Even with its drawbacks, the GKJ system has the potential to augment conventional paper-based discussion. We also found that most of the drawbacks could be solved by further system refinements.
5 Conclusion
We proposed a method for capturing the location and hierarchical structure of paper cards written on with Anoto-based pens. The method enables the participants to record a precise, atomic transition log of the cards. We also developed a system, based on the proposed method, for instantly digitizing paper-based card organization tasks. Due to the simplicity of pen-based input, the GKJ system is universal; it can be used by office workers as well as the elderly and primary school children. We confirmed the effectiveness of the system through several sessions. We applied it to small group learning sessions of up to 10 persons, but the system is applicable to many participants and groups, since it can handle more than 40 pens at the same time. We will refine the GKJ system to improve its usability, and it should contribute to the effectiveness of group discussions that include various types of participants, such as town meetings.
Acknowledgement
The Digital Pen Gateway System and related technologies are from NTT Comware Tokai Corporation. Our research is partly supported by a Grant-in-Aid for Scientific Research (20680036, 20300046).
References
1. Klemmer, S.R., Newman, M.W., Farrell, R., Bilezikjian, M., Landay, J.A.: The Designers' Outpost: A Tangible Interface for Collaborative Web Site Design. In: Proceedings of UIST 2001, pp. 1–10 (2001)
2. Miura, M., Kunifuji, S.: A Tabletop Interface Using Controllable Transparency Glass for Collaborative Card-Based Creative Activity. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds.) KES 2008, Part II. LNCS, vol. 5178, pp. 855–862. Springer, Heidelberg (2008)
3. Ikeda, H., Furakawa, N., Konoishi, K.: iJITinLab: Information Handling Environment Enabling Integration of Paper and Electronic Documents. In: CSCW 2006 Workshop (Collaborating over Paper and Digital Documents) (2006)
4. Kawakita, J.: An Idea Development Method. Chuuko Shinsho, Chuuo Kouron-sha (1968)
5. http://www.mycoted.com/KJ-Method
6. Ohiwa, H., Takeda, N., Kawai, K., Shiomi, A.: KJ editor: a card-handling tool for creative work support. Knowledge-Based Systems 10(1), 43–50 (1997)
7. Munemori, J.: GUNGEN: Groupware for a new idea generation support system. Information and Software Technology 38(3), 213–220 (1996)
8. Misue, K., Nitta, K., Sugiyama, K., Koshiba, T., Inder, R.: Enhancing D-ABDUCTOR Towards a Diagrammatic User Interface Platform. In: Proceedings of KES 1998, pp. 359–368 (1998)
Usability-Engineering-Requirements as a Basis for the Integration with Software Engineering
Karsten Nebe (1) and Volker Paelke (2)
(1) University of Paderborn, C-LAB, Fürstenallee 11, 33098 Paderborn, Germany
(2) Leibniz University Hannover, Appelstrasse 9a, 30167 Hannover, Germany
Karsten.Nebe@c-lab.de, Volker.Paelke@ikg.uni-hannover.de
Abstract. Usability is growing to become an integral quality aspect of software development, but it is not an exclusive attribute of the generated product; it is also a fundamental attribute of the development process itself. The question is how to adapt software engineering processes (or models) in such a way that they can ensure the development of usable solutions. In this paper, the authors present an integration approach pursuing this goal. It draws on so-called 'Compliancy and Key Requirements' that can be used for the definition of software processes (or process models) and thereby support the integration of both disciplines. The requirements are based upon representative standards (DIN EN ISO 13407 and ISO/PAS 18152) but were enhanced by the results of an expert-based survey using interviews and questionnaires. Additionally, the requirements have been verified by experts and represent an evaluated knowledge base for the development of usable products. Keywords: Integration, Software Engineering, Usability Engineering, Standards DIN EN ISO 13407 and ISO/PAS 18152, Process Models, Process Definition, Process Improvement, Assessment.
the goals of SE and UE in a way that allows systematic and predictable implementations to be generated while considering the factors of cost, time, and quality adequately for both SE and UE purposes. In this paper, the authors present an integration approach pursuing this goal.
2 Integration Approaches
In theory and practice, a considerable number of integration approaches with distinct focuses exist [18]. Some of these approaches define common activities and artifacts for both SE and UE and integrate these specific activities into the development process. They aim at a 'soft integration' of UE aspects on a mutual basis, e.g., at interlinking related results (e.g., [17, 5, 2]). Most of these approaches focus on minimal organizational and structural transformation and/or change. Quite similar are approaches that aim at a common specification of activities and artifacts. They are grounded in communication and information exchange using shared definitions (e.g., [1, 21, 20]). These two kinds of approaches can be summarized as a group that aims directly at the operational development processes in organizations. Other integration approaches relate to the level of process definitions and process models (e.g., [6, 11, 3]). These aim to define pre-settings for development and contain both more concrete approaches (focusing on the integration of UE activities into an already existing SE model) and more fundamental aspects of process models (independent of any concrete SE model). In general, these approaches concentrate on the combination of phases, activities, and results (within existing structures) on the level of process models to build the basis for integration. In addition, there is a third group of integration approaches focusing on a higher level of abstraction. These approaches are independent of any specific process model or activities and instead describe organizational measures, principles, paradigms, or meta-models (e.g., [16, 7, 5, 19]). They aim at the definition of general procedures for development, which is comparable to standards in SE and UE at this level of abstraction. Accordingly, strategies for their implementation are abstract and need to be adapted to particular situations. Altogether, these groups of approaches aim to provide systematic procedures for developing usable software. At a closer look, they address three different levels of abstraction:
1. The abstract overarching level of standards in software engineering and usability engineering, serving as a framework to ensure consistency, compatibility, exchangeability, and quality within and beyond organizational borders, and to cover the improvement of quality and communication.
2. The level of process models for software engineering and usability engineering, providing a procedural model that can serve as a framework for an organization, with specific features, e.g., predictability, risk management, coverage of complexity, generation of fast deliverables and outcomes, etc.
3. The operational process level, which reflects the execution of activities and the processing of information within the organization. It is an instance of the underlying model and the implementation of activities and information processing within the organization.
These are related in a hierarchy: standards define the overarching framework, process models describe systematic and traceable approaches within such a framework, and at the operational level the models are tailored to fit the specifics of an organization.
2.1 Integration on the Level of Standards, Process Models and Operational Processes
It can be observed that this hierarchy of standards, process models and processes exists in both disciplines, but there have been few attempts to exploit these similarities for integration. With this goal in mind, the authors analyzed these three levels and presented a holistic approach for the integration of SE and UE [12, 13, 14]. In doing so, the authors identified similarities between SE and UE on the level of standards. The standards' detailed descriptions of processes, activities and tasks, output artifacts, etc. were analyzed and compared. For this, the SE standard ISO/IEC 12207 [8] was chosen for comparison with the UE standard DIN EN ISO 13407 [4]. At a high level, when examining the descriptions of each activity and relating tasks and outputs with each other, similarities could be identified in terms of the characteristics, objectives and proceedings of activities. Based on these similarities, single activities were consolidated into groups of activities (so-called 'common activities'). These 'common activities' are part of both disciplines, SE and UE, at the highest level, that of standards. The result is a compilation of five 'common activities' (Requirements Analysis, Software Specification, Software Design and Implementation, Software Validation, and Evaluation) that represent the process of development from both a SE and a UE point of view [12, 13]. These activities define the overarching framework for the next level, the level of process models. In a subsequent analysis, the authors assessed the maturity of software engineering process models with respect to their ability to create usable products [12, 14]. For that purpose, the authors used a two-step approach to synthesize the demands of usability engineering and performed an assessment of selected software engineering models. To obtain detailed knowledge about usability engineering activities, methods, deliverables and their quality aspects, the authors analyzed the two usability engineering standards DIN EN ISO 13407 and ISO/PAS 18152 [9]. The ISO/PAS 18152 defines detailed base practices that specify the tasks for creating usable products. These base practices were used as a foundation to derive requirements that represent the 'common activities' from a usability engineering perspective. The quantity of fulfilled requirements for each activity of the framework indicates the level of compliance of a software engineering model; it provides an estimate of how well the UE base practices are covered in a given SE model. The results of the assessment provide an overview of the degree of compliance of the selected models with usability engineering demands. It turned out that there is relatively little compliance with the usability engineering activities across all selected software engineering models. This is an indicator that only little integration between usability engineering and software engineering currently exists on the level of process models. The analysis did not only highlight weaknesses of SE models; it also pinpointed the potential for integration between software engineering and usability engineering:
Where base practices are not considered fulfilled, recommendations could be derived that would contribute to their accomplishment. The underlying base practices provide indications of what needs to be considered on the level of process models. This can be used as a foundation for implementing the operational process level. However, during the analysis it became apparent that there is a clear need for more detailed and adequate assessment criteria, by which more objective and reliable statements about process models and their ability to create usable software could be made. Such detailed criteria would also be useful to formalize process requirements that can influence the definition of user-centered SE models and development processes and thereby improve the interplay of SE and UE in practice. With this in mind, the authors performed semi-structured interviews with experts from the domain of UE to identify requirements from the UE perspective. The results have been analyzed and evaluated as described in the following section.
3 UE-Process-Requirements
In order to make software development processes user-centered, there is a need for explicit knowledge about relevant activities, their dependencies, their results, roles, quality aspects, etc. One goal is to develop such a knowledge base using existing findings and to enrich it with experts' knowledge. Therefore, the authors created an interview guideline and questionnaires that correspond to the overall process framework of common activities, particularly with regard to the usability engineering perspective. The analysis is based on the four human-centered design activities of DIN EN ISO 13407 ('context of use', 'user requirements', 'produce design solutions' and 'evaluation of use') and their respective base practices and specifics as defined in ISO/PAS 18152 (i.e., fundamental activities, basic conditions and constraints, relevance of activities, resulting outcomes, type of documentation, and respective roles and responsibilities). The goal was not to evaluate these standards but to add details for further use. A substantial part of the analysis referred explicitly to quality characteristics of the four human-centered design activities. The goal was to identify what constitutes the quality of a certain activity from the experts' point of view and what kind of (potentially measurable) success and quality criteria exist that are relevant on a process level and subsequently for implementation in practice. Examples from the questionnaire are: How can good activities be identified? How can good results or deliverables be identified? How can appropriate roles be identified? What are properties and characteristics of relevance and frequency? How could the progress of an activity or deliverable be measured and controlled? Based on the results, the authors identified activities, deliverables and roles that are necessary to ensure the development of usable products from the experts' point of view. Relevant factors of influence could be, for instance: "When will an activity A not be performed, and why?" or "Under which circumstances will an activity A be performed completely, and when just partly?" Additionally, criteria were sought that allow the progress of the development process to be measured.
It was expected that the results could be used not just as more detailed criteria for an assessment but would also provide an indication of the level of completeness of ISO/PAS 18152 and identify potential areas of improvement. To achieve this, the authors conducted semi-structured interviews and questionnaires with six experts in the field of UE [15]. The experts were well grounded in theoretical terms, i.e., standards and process models, as well as in usability practice.
3.1 Derivation of Requirements
As a result, about 470 statements from the experts were gathered, which were then consolidated and classified by adding references to their source (i.e., the interview partner and the question from the interview guideline); to one of the four activities ('context of use', 'user requirements', 'produce design solutions' or 'evaluation of use'); to whether a statement addresses quality aspects regarding the process, an activity, or a deliverable; to whether it complies with the activities' and base practices' goals (as defined in the two ISO standards); etc. Thus, overarching process and quality characteristics could be identified, which led to findings about the relevance, applicability and necessity of usability activities, methods and artifacts to be implemented in SE. Through several iterations of analysis, similar statements were merged and formalized into 107 'requirements for development processes or process models'. There are two distinct types of requirements: 'Compliancy Requirements' and 'Key Requirements'. Compliancy requirements represent the goals and base practices defined in the standards DIN EN ISO 13407 and ISO/PAS 18152 but refine them with the output of the analysis. The key requirements define core characteristics of the overall framework's usability activities, focusing on the quality of the activities and their results. Together, the requirements define the demands of UE and lead to the systematic creation of usable products. Examples of the resulting requirements are:
• Context analysis is an integral part of the process.
• Analysis takes place early in the process, before conceptual work is carried out.
• Analysis activities are performed iteratively until all incompletenesses and inconsistencies are eliminated.
• Resources and time for the elicitation and evaluation of user requirements are sufficiently provided.
• User requirements are addressed in the system design.
• User requirements are the input for the next process step and accordingly positioned in the development process.
• The requirements of the users of the system are defined.
3.2 Evaluation of Requirements
In a subsequent analysis, both the compliancy and key requirements were evaluated by 13 usability experts using questionnaires (three of these experts were also involved in the previous analysis). The questionnaire included a list of all 107 requirements, grouped by the four activities ('context of use', 'user requirements', 'produce design solutions' and 'evaluation of use'), and scales to rate the correctness of each requirement and its relevance for application in practice. Some examples of the requirements are shown in Table 1.
Table 1. Examples of the requirements for the UE activities 'context of use' (CoU), 'user requirements' (UR), 'produce design solutions' (PDS) and 'evaluation of use' (EoU), and the experts' ratings in terms of correctness and relevance (in practice)

Nr | Activity | Requirement | Correctness | Relevance
2 | CoU | Context analysis is an integral part of the process. | Correct | Very high
17 | CoU | The outcomes of the context analysis serve as the input for the next process step, and the activity itself is anchored within the process model accordingly. | Correct | High
27 | CoU | The characteristics of the intended users and their tasks, including user interaction with other users and other systems, are documented. | Correct | Very high
24 | CoU | The analysis is focused on the original context of the users (their goals, tasks, characteristics of the tasks and the environment, etc.). | Correct | High
33 | CoU | The analysis is independent of any existing solution/implementation; the context information is based on facts and not an interpretation of any situation. | Sufficient | Medium
46 | UR | A sufficient amount of user requirements are the basis for the next process step (PDS). | Correct | Very high
71 | PDS | The development of solutions is carried out in collaboration with the development team. | Correct | Very high
105 | EoU | It is checked that the system is ready for evaluation. | Sufficient | Medium
Looking at the overall results, it turned out that most requirements were rated correct by the majority of experts: 31 requirements by all 13 participants, 29 requirements by 12 experts, 27 requirements by 11, and 6 requirements by at least 10 experts. No requirement was rated incorrect. Altogether, there is a high compliance of the experts' opinions with the requirements. The number of requirements rated correct by at least 10 experts is 93, which represents 87% of all 107 requirements. The rating of the relevance was used to derive recommendations about the priority for application in practice (i.e., for the definition of processes); a small code sketch of this prioritization is given at the end of this section.
1. Those requirements that have been rated as 'correct' and range from a 'very high' to a 'high' scale of relevance (in general: the higher the relevance, the higher the priority).
2. Those requirements that have been rated as 'correct' and show a 'medium' scale of relevance.
3. Those requirements that depict a 'sufficient' scale of correctness.
4. Those requirements that show an 'acceptable' scale of correctness.
5. All remaining requirements.
When applying the requirements in practice, however, it is important to consider requirements of all four activities in equal measure. A partial implementation of selected requirements will not lead to usable products; only using them in a holistic way will support the systematic development of usable solutions. As a result of the analysis and evaluation, the compliancy and key requirements represent an evaluated knowledge basis for the development of usable products. The analysis is based on representative standards of UE, and the requirements add more specific criteria based on experts' knowledge. The requirements account for the integration of SE and UE, as they can be used for the definition and adaptation of SE process models as well as operational development processes.
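The five priority rules above can be stated compactly in code; this is our paraphrase of the rules, not part of the original study:

```python
def priority(correctness, relevance):
    """Priority class for applying a requirement in practice,
    following the five rules listed above (1 = apply first)."""
    if correctness == "correct" and relevance in ("very high", "high"):
        return 1
    if correctness == "correct" and relevance == "medium":
        return 2
    if correctness == "sufficient":
        return 3
    if correctness == "acceptable":
        return 4
    return 5

# Requirement 2 from Table 1 (correct / very high) falls into class 1;
# requirement 33 (sufficient / medium) falls into class 3.
print(priority("correct", "very high"), priority("sufficient", "medium"))
```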
4 Conclusions and Outlook
In summary, many integration approaches exist that aim to provide systematic procedures for developing usable software. At a closer look, they address three different levels of abstraction: standards, process models and operational processes. However, there have been few attempts to exploit integration in a holistic way that includes all three levels. The authors report on such an approach and present a systematic way of integrating usability engineering demands into the software engineering methodology. The results of an expert-based analysis (and subsequent evaluation) have been used to derive two distinct types of requirements: 'Compliancy Requirements' and 'Key Requirements'. Compliancy requirements represent the goals and base practices defined in the standards DIN EN ISO 13407 and ISO/PAS 18152, refined by the output of the analysis. The key requirements define core characteristics of the overall framework's usability activities, focusing on the quality of the activities and their results, and are also based on the analysis' results. The requirements represent an evaluated knowledge basis for the development of usable products. They contribute to an integration of software engineering and usability engineering, as they can be used for the definition and adaptation of software development processes and process models. In the future, we aim to evaluate these requirements in practical projects to observe process changes and their resulting effects on the usability of the products.
References
1. Constantine, L.L., Lockwood, L.A.D.: Software for Use: A Practical Guide to the Models and Methods of Usage-Centered Design. Addison-Wesley (ACM Press), New York (1999)
2. Constantine, L.L., Biddle, R., Noble, J.: Usage-centered design and software engineering: Models for integration. In: IFIP Working Group 2.7/13.4, ICSE 2003 Workshop on Bridging the Gap Between Software Engineering and Human-Computer Interaction, Portland (2003)
3. Düchting, M., Zimmermann, D., Nebe, K.: Incorporating User Centered Requirement Engineering into Agile Software Development. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4550, pp. 58–67. Springer, Heidelberg (2007)
4. DIN EN ISO 13407: Human-centered design processes for interactive systems. CEN European Committee for Standardization, Brussels (1999)
5. Ferre, X.: Integration of Usability Techniques into Software Development Process. In: Bridging the Gaps Between Software Engineering and Human-Computer Interaction, ICSE 2003 International Conference on Software Engineering, pp. 28–35. ACM Press, Portland (2003)
6. Göransson, B., Lif, M., Gulliksen, J.: Usability Design: Extending Rational Unified Process with a New Discipline. In: Jorge, J.A., Jardim Nunes, N., Falcão e Cunha, J. (eds.) DSV-IS 2003. LNCS, vol. 2844, pp. 316–330. Springer, Heidelberg (2003)
7. Granollers, T., Lorès, J., Perdrix, F.: Usability Engineering Process Model: Integration with Software Engineering. In: Proceedings of the Tenth International Conference on Human-Computer Interaction, pp. 965–969. Lawrence Erlbaum Associates, New Jersey (2002)
8. ISO/IEC 12207: Information technology - Software life cycle processes, 2nd edn., 2008-02-01. ISO/IEC, Genf (2008)
9. ISO/PAS 18152: Ergonomics of human-system interaction - Specification for the process assessment of human-system issues. ISO, Genf (2003)
10. Jokela, T.: An Assessment Approach for User-Centred Design Processes. In: Proceedings of EuroSPI 2001. Limerick Institute of Technology Press, Limerick (2001)
11. Kolski, C.: A call for answers around the proposition of an HCI-enriched model. ACM SIGSOFT Software Engineering Notes 23(3), 93–96 (1998)
12. Nebe, K., Zimmermann, D.: Suitability of Software Engineering Models for the Production of Usable Software. In: Proceedings of Engineering Interactive Systems 2007. LNCS. Springer, Heidelberg (2007a)
13. Nebe, K., Zimmermann, D.: Aspects of Integrating User Centered Design into Software Engineering Processes. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4550, pp. 194–203. Springer, Heidelberg (2007b)
14. Nebe, K., Zimmermann, D., Paelke, V.: Integrating Software Engineering and Usability Engineering. In: Pinder, S. (ed.) Advances in Human-Computer Interaction, ch. 20, pp. 331–350. I-Tech Education and Publishing, Wien (2008b)
15. Nebe, K.: Integration von Usability Engineering und Software Engineering: Konformitäts- und Rahmenanforderungen zur Bewertung und Definition von Softwareentwicklungsprozessen. Doctoral Thesis, Shaker Verlag, Aachen (in print, 2009)
16. Pawar, S.A.: A Common Software Development Framework For Coordinating Usability Engineering and Software Engineering Activities. Master Thesis, Blacksburg, Virginia (2004)
17. Schaffer, E.: Institutionalization of usability: a step-by-step guide. Addison-Wesley, Pearson Education, Inc., Boston (2004)
18. Seffah, A., Desmarias, M.C., Metzker, E.: Human Centered Software Engineering, HCI, Usability and Software Engineering Integration: Present and Future. In: Seffah, A., Gulliksen, J., Desmarais, M.C. (eds.) Human-Centred Software Engineering: Integrating Usability in the Development Lifecycle, vol. 8. Springer, Heidelberg (2005)
19. Sousa, K., Furtado, E., Mendoca, H.: UPi: a software development process aiming at usability, productivity and integration. In: Proceedings of the 2005 Latin American Conference on Human-Computer Interaction (CLIHC 2005). ACM Press, New York (2005)
20. Tidwell, J.: Designing Interfaces: Patterns for Effective Interface Design. O'Reilly, Sebastopol (2005)
21. Van Harmelen, M.: Object Modeling and User Interface Design: Designing Interactive Systems. Addison-Wesley, Boston (2001)
Design Creation Based on KANSEI in Toshiba
Yosoko Nishizawa and Kanya Hiroi
Toshiba Corporation, Design Center, 1-1-1 Shibaura, Minato-ku, Tokyo, Japan
{yosoko.takano, kanya.hiroi}@toshiba.co.jp
Abstract. In endeavoring to increase the quality of its designs, Toshiba has outlined a concept of “perceived quality” and evaluates designs on the basis of how high a level of perceived quality they achieve. From the results of a survey of the images users associate with designs, we defined six indices of perceived quality. These six indicators were used in the creation and evaluation of designs, and a number of the resulting products were put on the market and evaluated.
Keywords: KANSEI, design, product, quality of design, evaluation of design.
1 Introduction

As the source of added value shifts from quantity to quality, and from quality to more nebulous factors, we are driven by the necessity to create new value for customers. As one initiative in this direction, Toshiba is engaging in product development with “perceived quality” positioned as an added value of its products. To date, we have attempted to define the nature of perceived value and have derived the terms “perceived quality” and “appealing quality.” How to incorporate these concepts into product development, however, remains an ongoing process of trial and error on the ground. In this paper, we offer examples to illustrate Toshiba’s concept of “perceived quality,” and also discuss the methods by which the concept was derived.
2 The Concept of Perceived Quality

2.1 Derivation of Perceived Quality

“Perceived quality” is quality that can be expressed in terms of an individual’s feelings and the images to which they respond; that is, in terms of the subjective requirements of the individual. For an automobile, examples of such subjective requirements would be “Does it feel good to drive?”, “Is it stylish?”, and the like. Contrasting with this, there are other aspects of quality that can be expressed as objective, physical characteristics [1]. For a car again, examples would be high horsepower, good fuel efficiency, and the like. We can say that “perceived quality” resides in design features that appeal to the emotions, and is something that the customer judges subjectively.

What, then, is a design that appeals to the emotions, and what is a product in which this quality resides to a high degree? First, we studied what types of products appealed to the emotions and how customers evaluated these products. The results of these studies are shown in Figure 1, in which the numbers represent products. The products evaluated were either winners of design awards or products whose designs users rated highly. These results, together with a series of interviews, showed that a design with a high level of perceived quality is one that is beautiful, easy to use, and that offers feelings of security and pleasure. Further, the systemic structural relationships shown in Figure 2 also exist. This shows that design expression can be used to increase quality, enabling the creation of a product that makes a strong impression on customers.

We defined two types of perceived quality: basic perceived quality, and a perceived quality that goes beyond the basic to affect the emotions (appealing quality) (Figure 2). As a prerequisite for the creation of a design that affects the emotions (i.e., that possesses appealing quality), we established that the design must first produce feelings of pleasure (i.e., must possess basic perceived quality). We then searched the results of the above-mentioned user survey for the factors that could serve as indices.
Fig. 1. This is a positioning map. We mapped the results of a correspondence analysis of data from a Web-based questionnaire on design images.
Fig. 2. This figure shows the elements making up perceived quality as defined by Toshiba. A design expression that offers simplicity and ease of use produces feelings of pleasure in users. This can be understood as basic perceived quality, but by itself this is not enough. We must also consider appealing quality, which transcends feelings of pleasure to affect the emotions.
The results of the user survey showed that this basic perceived quality was made up of six elements. The indicators we defined for these six elements are “Aesthetic quality,” “Quality with feeling of warmth,” “Quality in use,” “Universal quality,” “Quality that transmits the ‘message’ of the product,” and “Original quality” (Figure 2).
Fig. 3. The six elements of perceived quality, derived by factor analysis.
2.2 Evaluating Perceived Quality

Using the six indicators to evaluate a group of popular products with excellent design features that were available on the market showed us that it was indeed possible to evaluate such products to some extent. The results of this survey revealed two patterns among products of high perceived quality: either the product was evaluated to some degree on all six indicators (Group B), or it was evaluated extremely highly on only some of the indicators (Group A). Figure 4 shows products that were selected as displaying a high level of perceived quality. Evaluated on the basis of the six indicators of perceived quality, the products in Group A received relatively high evaluations for “Quality with feeling of warmth” and “Aesthetic quality,” but low evaluations for “Quality in use” and “Universal quality.” The products in Group B, by contrast, received balanced evaluations across the entire spectrum of indicators, despite receiving fairly low evaluations for “Quality with feeling of warmth.” Users liked the Group A products more than the Group B products. As this shows, rating highly on all six indicators does not mean that a product will be judged to display a high level of perceived quality. Rather, a product that receives an extremely high evaluation on one axis is more likely to be selected as a product of high perceived quality. Given this, perceived quality can be considered as something that strongly displays a specific tendency rather than something that is balanced overall.
Fig. 4. An example of the evaluation of award-winning designs using the six indicators of perceived quality that we defined, based on the results of a questionnaire given to ordinary users. Both A and B were evaluated highly for only some indicators.
3 Creating Products of High Perceived Quality at Toshiba

Based on the results discussed above, we set the six indicators of perceived quality as shared guidelines for designers, and attempted to create products at Toshiba displaying a high level of perceived quality. This enabled us to develop a variety of such products. Examples include a range of IH cookers, a cellular phone (KOTO), and high-quality home electronic products (washing machines, ovens, and vacuum cleaners). The IH cookers and the KOTO cellular phone (Figure 5) incorporate Japanese-style design, and both received extremely high evaluations on some indicators of perceived quality. The form of the Japanese musical instrument, the koto, was used as a design element in the KOTO cellular phones (see Figure 5B), which were finished in vermillion to project the intended design image. These products were designed to embody “Aesthetic quality,” “Quality that transmits the ‘message’ of the product,” and “Original quality.”
Fig. 5. A is an IH cooker, and B is a “Koto” model cellular phone. Both products were evaluated highly for design.
On the other hand, the IH cookers (see Figure 5A) were evaluated extremely highly for “Quality with feeling of warmth” and “Original quality.” These IH cookers feature an unusual combination of the forms of conventional IH cookers and metal pots, presenting them as an integrated whole. Design efforts have also enabled the cookers to be presented as tableware. The original designs of the cookers incorporate materials traditionally used in different parts of Japan – stainless steel from Tsubame city, Nambu ironware from Mizusawa city, earthenware from Yokkaichi, etc. – and they have been marketed as products with which the feelings of users can find a resonance. In addition to the gold prize of the Japanese G Mark, this design has received Germany’s iF Award and Red Dot Award, indicating how highly it is regarded in Europe. In order to demonstrate the appeal of Toshiba design, we developed advertisements that focused on the products’ perceived quality (Figure 6).
Fig. 6. Examples of Toshiba advertisements that put perceived quality front and center.
In addition, Toshiba now sells many products and systems globally, so design must also be conducted on a global level. Issues for the future will include how to blend design elements having a global appeal with those whose appeal is unique to Japan, how to judge the right proportion of each, and how to incorporate their essence into a design. To respond to these issues, we are at present engaging in further study of perceived quality and revising our six indicators, so that they can function as a globally valid yardstick of perceived quality.
4 Conclusion

In the field of business-to-consumer products, which now represents a mature market, it will be increasingly important in future to use design to create products with originality and
high perceived quality. Toshiba has introduced a “yardstick” of perceived quality as a guide to answering the question of how this originality is to be created. In 2006, the Ministry of Economy, Trade and Industry also launched a program for the development of products that are both original and incorporate a new Japanese style, as an initiative for the creation of perceived value [2]. This year is positioned as a year for the creation of perceived value, and Japanese products falling within this category will be presented in exhibitions in Paris and elsewhere. In line with this movement, and focusing on the perceptions involved in perceived quality, Toshiba is making efforts to develop products that combine a globally resonant sensibility with an original Japanese sensibility.
References
1. The Japanesque Modern Committee: Towards a Japanesque Modern Style – Representing Japanese Tradition to the World, http://www.rieti.go.jp/jp/events/bbl/06041801.html
2. Policy Office for Design and Human Life System, Manufacturing Industries Bureau, METI
3. Kansei Initiative – Proposal of a Fourth Value Axis, IIST WORLD FORUM (June 16, 2008), http://www.iist.or.jp/wf/magazine/0618/0618_E.html
4. Opinions presented at the symposium “Kansei Initiatives” (Initiatives for the Creation of Perceived Value), held by the Japan Industrial Designers’ Association (JIDA) (June 18, 2007)
5. Hiroi, K.: About the sensibility value creation of the design. Research Leader, vol. 10, pp. 43–51. Technical Information Institute Co., Ltd., Japan (2007)
High-Fidelity Prototyping of Interactive Systems Can Be Formal Too
Philippe Palanque, Jean-François Ladry, David Navarre, and Eric Barboni
IHCS-IRIT, Université Paul Sabatier – Toulouse 3, France
{ladry,palanque,navarre,barboni}@irit.fr
Abstract. The design of safety-critical systems calls for advanced software engineering models, methods and tools in order to meet the safety requirements that will avoid putting human life at stake. When the safety-critical system encompasses a substantial interactive component, the same level of confidence is required of the human–computer interface. Conventional empirical or semi-formal techniques, although very fruitful, do not provide sufficient insight into the reliability of human–system cooperation, and offer no easy way to, for example, quantitatively compare two design options. The aim of this paper is to present a method, with supporting tools and techniques, for engineering the design and development of usable user interfaces for safety-critical applications. More precisely, we present the PetShop environment, a Petri net-based tool for the design, specification, prototyping and validation of interactive software. In this environment, models of the interactive application can be interactively modified and executed. This is used to support prototyping phases (when the models and the interactive application evolve significantly, to meet late user requirements for instance) as well as the operation phase (after the system is deployed). The use of the description technique (the ICO formalism) supported by PetShop is presented on a multimodal ground-segment application for satellite control, showing more precisely how prototyping can be performed at the various levels of the architecture of interactive systems.
Keywords: Model-based approaches, formal description techniques, interactive prototyping, reliability, evolvability.
tested with potential users of the system under development, while in SE the product is evaluated by different stakeholders, including the client or customer (the one who pays for or buys the product) and, less often, users (except in user-centered approaches such as task analysis and modelling). At the design stage, HCI approaches promote iteration through the production of prototypes to be presented to and used by “real” users. While such a design process is widely agreed upon, the debate is still vivid as to whether one should use low-fidelity [24] or high-fidelity prototyping [26, 14].

When it comes to complex applications at the interaction level [19] or at the application level [25], low-fidelity approaches only address a small part of that complexity. The outcome is too informal to be exploitable further on in the development process without losing a significant part of it. This limits the use of low-fidelity prototyping to the earlier phases of the development process, where the main design questions are addressed and lower-level ones are left to later phases. The main drawback of high-fidelity prototyping lies in the fact that the iterations are more time-consuming, and thus prevent the exploration of new ideas without jeopardizing the entire project through schedule overruns. Another inconvenience of high-fidelity prototyping is that the product of that phase most of the time corresponds to program code, making its integration into the rest of the application very difficult due to a lack of abstraction.

In this paper, we promote the use of an executable formal approach called Interactive Cooperative Objects (ICOs) within the high-fidelity prototyping phase of interactive systems development. This formal approach solves some of the limitations of the Rapid Application Development (RAD) techniques currently used for high-fidelity prototyping. Indeed, it provides abstraction through models, rapid execution through simulation, and testing through the generation of test cases and scenarios. In addition, when the prototyping phase is terminated, the outcome is not only a partially running prototype, but also a partial formal description of its behaviour that can then be passed on to the team in charge of developing the final system to be deployed.

Previous work we have done in this domain focused on the rapid prototyping of the interactive application [17]; our current work addresses the three levels of interactive systems prototyping: the interaction-technique level (including multimodal interactions with non-standard input devices such as tactile screens), the interactive-component level (including sophisticated widgets such as range sliders or semi-transparent pop-up menus) [16], and the interactive application in complex environments such as cockpits (both military and civil [1]), ground segments for satellite control rooms [20], and Air Traffic Management interactive applications.

This paper focuses on the use of the ICO formal description technique to support rapid prototyping of interaction techniques. More precisely, it presents how an interaction technique can be defined and then how it can “rapidly” evolve according to users’ feedback and users’ performance. Indeed, the tool-support environment for ICOs (called PetShop) has now been extended to provide additional facilities such as model-based logging of events and state changes to support the usability evaluation activities classically interleaved with rapid prototyping.
This paper also addresses how logging support can be used to carry out performance analysis of the interaction technique, thus limiting user testing to interaction techniques that have previously been formally analysed.
This paper is organized as follows. The next section presents some related work and research questions in the field of model-based approaches for interactive systems. The ICO notation is described in Section 3. Section 4 presents the CASE tool PetShop, which allows editing and execution of ICO models. Section 5 presents, on two small examples, how prototyping can be managed with PetShop and ICOs. Section 6 concludes the paper.
2 Model-Based Approaches for Interactive Systems

When formal methods were initially used for interactive systems [21], models were limited to the dialog part, making them less prominent for runtime use, as only one part of the interactive system was taken into account. In order to address issues raised by real-life applications, the current trend in interactive systems engineering is to develop models for all the parts of the system. A parallel track of research has targeted the modelling of new interaction techniques in order to deal with current practice in the field of HCI. To deal with WIMP and post-WIMP interaction techniques, several notations have been proposed, from data-flow-based notations such as Whizz’Ed [7], ICON [6], NiMMiT [27] or InTml [8], to event-based notations such as Marigold [29], HyNets [28] or ICO [19]. Hybrid models integrating both event-based and data-flow-based notations have also been presented in [12] and in [15]. With respect to that latter work, the work presented here extends [15] by removing the data-flow model dealing with input-device configuration and proposing a single event-based notation, described in the next section.

The work presented in this paper is about providing a modelling technique capable of representing the behaviour of an entire interactive application (from physical to functional interaction) using a dedicated Petri net dialect. It also targets new interaction techniques (e.g., multimodal, direct manipulation, ...) such as the ones used in the field of HCI. This paper shows how the CASE tool PetShop [1] embeds the system models (which represent an interactive system from the interaction technique through to the system functional core) using the ICO notation at runtime for:
• Prototyping of models,
• Execution of the application in order to check it,
• Analysis, as a way of supporting model construction by providing additional information about the properties of the models under construction.
3 The ICO Formalism

The ICO formalism is a formal description technique dedicated to the specification of interactive systems [19]. It uses concepts borrowed from the object-oriented approach (dynamic instantiation, classification, encapsulation, inheritance, client/server relationships) to describe the structural or static aspects of systems, and uses high-level Petri nets [10] to describe their dynamic, behavioural aspects.
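To make these ingredients concrete, here is a deliberately minimal Java sketch of a Petri net marking with ordinary input arcs plus the two ICO-specific arc kinds. All names are ours, not PetShop's, and the real OPN dialect is far richer (typed tokens, substitutions, code attached to transitions):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A deliberately simplified Petri net with ICO-style test and inhibitor arcs. */
class SimplePetriNet {
    final Map<String, Integer> marking = new HashMap<>();   // place name -> token count

    record Arc(String place, int weight) {}
    record Transition(String name, List<Arc> inputs, List<Arc> tests,
                      List<Arc> inhibitors, List<Arc> outputs) {}

    /** Enabled if input/test places hold enough tokens and every
     *  inhibitor place holds fewer tokens than its arc weight. */
    boolean isEnabled(Transition t) {
        for (Arc a : t.inputs())     if (marking.getOrDefault(a.place(), 0) < a.weight()) return false;
        for (Arc a : t.tests())      if (marking.getOrDefault(a.place(), 0) < a.weight()) return false;
        for (Arc a : t.inhibitors()) if (marking.getOrDefault(a.place(), 0) >= a.weight()) return false;
        return true;
    }

    /** Firing consumes tokens on input arcs only; test arcs read without consuming. */
    void fire(Transition t) {
        if (!isEnabled(t)) throw new IllegalStateException(t.name() + " is not enabled");
        for (Arc a : t.inputs())  marking.merge(a.place(), -a.weight(), Integer::sum);
        for (Arc a : t.outputs()) marking.merge(a.place(),  a.weight(), Integer::sum);
    }
}
```

A test arc reads tokens without consuming them, while an inhibitor arc blocks firing as long as its place holds enough tokens; ordinary firing consumes input tokens and produces output tokens.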
3.1 Cooperative Objects

The ICO notation is based on Cooperative Objects (COs). A Cooperative Object states how the object reacts to external stimuli according to its inner state. A CO's behaviour, called the Object Control Structure (ObCS), is expressed in a language based on Object Petri Nets (OPN) (see Fig. 1). An ObCS can have multiple places and transitions that are linked with arcs, as in standard Petri nets. As an extension to these standard arcs, ICO provides additional input arcs: test arcs and inhibitor arcs. Each place has an initial marking (represented by one or several tokens in the place) describing the initial state of the system.
Fig. 1. Metamodel of the COs exhibiting runtime features
With respect to “standard” Petri nets, the object-oriented nature of Cooperative Objects supports instantiation. Indeed, every ObCS can be instantiated, allowing multiple executions of the same class as in object-oriented programming languages. These instances can be parameterised by constructor arguments. This parameterisation is used to associate markings with the Petri net describing the behaviour of the instantiated Cooperative Object. For example, in the case of multiple-mouse interaction (e.g., in interactive cockpits such as the Airbus A380), each mouse driver is a distinct instance of an ObCS class with different class parameters (i.e., the number of the mouse), so that the behaviour model of each driver handles its own coordinates, represented in the marking of the instance. For more details about this type of modelling see [1]. Fig. 1 presents a subset of the class diagram of ICOs. As stated above, the main element used for prototyping is the fact that each class can have several instances (as shown on the right-hand side of the figure) and that instances can be played, paused or stopped.

3.2 Interactive Cooperative Objects

To deal with the specificities of interactive systems, the Cooperative Objects formalism has been extended. The resulting notation is called Interactive Cooperative Objects.
An ICO is a 6-tuple (CO, Su, Wid, Event, Act, Rend) where:
• CO is a Cooperative Object as described in Section 3.1,
• Su is a set of user services (a user service is a set of synchronized transitions),
• Wid is a set of interactive widgets (e.g., buttons, list boxes, ...) linked to the ICO class,
• Event is a set of user events coming from items of Wid,
• Act and Rend are the activation and rendering functions described below.
Act: An activation function defines the relationship between events triggered by users while interacting with user-interface objects (by manipulating input devices such as the mouse, the keyboard, voice-recognition systems, ...) and the transitions of the ObCS. When an event is triggered, the related transition is fired if it is fireable (according to the current marking of the Petri net).
Rend: A rendering function defines how state changes in the ObCS influence changes in the presentation (what the user perceives of the application). The state changes are linked to tokens entering or exiting places.
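As a rough, hypothetical illustration (the paper defines Act and Rend mathematically, not as an API), the two functions can be pictured as lookup tables that connect widget events to transitions and place changes to rendering calls:

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative encoding of ICO activation and rendering functions.
 *  All names here are invented for this sketch. */
class ActivationRendering {
    /** Act: (widget, user event) -> transition name in the ObCS. */
    final Map<String, String> activation = new HashMap<>();
    /** Rend: place name -> rendering action on the presentation. */
    final Map<String, Runnable> rendering = new HashMap<>();

    void onUserEvent(String widget, String event, ObCSStub obcs) {
        String transition = activation.get(widget + "/" + event);
        // The event only has an effect if the associated transition is
        // currently fireable under the net's marking, as stated above.
        if (transition != null && obcs.isFireable(transition)) {
            obcs.fire(transition);
        }
    }

    void onTokenChange(String place) {
        Runnable render = rendering.get(place);  // update what the user perceives
        if (render != null) render.run();
    }

    interface ObCSStub {
        boolean isFireable(String transition);
        void fire(String transition);
    }
}
```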
4 Prototyping of ICO Models Using the PetShop Tool

To support the manipulation of the ICO notation, a CASE tool called PetShop [1] has been developed. It includes a Java implementation of an object-oriented Petri net interpreter and some analysis tools for verifying properties of the models. The tool is publicly available at http://ihcs.irit.fr/petshop.

4.1 Structure

Fig. 2 represents the high-level structure of PetShop, in which it is possible to edit, execute and analyze the instances of an ObCS. When the user edits an instance, PetShop first updates the ObCS (the class) and then updates all the instances of this class. On the first execution of an instance, the instantiation engine takes the ObCS and creates an instance. This instance is then executed and can be directly managed by the user of PetShop (started, paused and stopped). When the instance is running, PetShop can also analyze the model (currently limited to the calculation of place invariants and transition invariants [10]). An example of the PetShop user interface is presented in Fig. 2.
Fig. 2. High-level structure of PetShop
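The class/instance update cycle described above can be sketched as follows; this is an assumption-laden stand-in, not PetShop's real internals:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the class/instance relationship: editing a class
 *  re-propagates the ObCS to every instance. Hypothetical names. */
class ObCSClass {
    String obcsDefinition;                      // the edited Petri net (stand-in)
    final List<ObCSInstance> instances = new ArrayList<>();

    void edit(String newDefinition) {
        obcsDefinition = newDefinition;          // update the class ...
        for (ObCSInstance i : instances) i.reload(newDefinition);  // ... then all instances
    }

    ObCSInstance instantiate(Object... constructorArgs) {
        ObCSInstance i = new ObCSInstance(obcsDefinition, constructorArgs);
        instances.add(i);
        return i;
    }
}

class ObCSInstance {
    enum State { PLAYED, PAUSED, STOPPED }
    State state = State.STOPPED;
    ObCSInstance(String definition, Object... args) { /* set initial marking from args */ }
    void reload(String definition) { /* rebuild the net for the edited class */ }
    void play()  { state = State.PLAYED; }
    void pause() { state = State.PAUSED; }
    void stop()  { state = State.STOPPED; }
}
```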
4.2 Edition of Models

The CASE tool PetShop allows the user:
• to graphically add Petri net items (places, transitions and the different arcs),
• to modify the initial construction parameters of the class (e.g., editing a set of variables that may have different values for each instantiation),
• to modify the initial marking of each place (corresponding to raw values or to references to the initial parameters of the class),
• to change the executable code in a transition,
• to modify the layout of the Petri net,
• to cut, copy and paste parts of the model,
• to undo and redo any change,
• to navigate through large models via a mini-map, or through a large set of models via a tree.

4.3 Execution of Models

In PetShop, a toolbar allows the user to start, stop or pause an instance of the ObCS. There are two modes of execution of instances:
• a normal execution, in which the user is a spectator and observes the execution of an instance; transitions are fired using random enabling substitutions,
• a step-by-step execution, in which the user can select a substitution to fire the transition.

At runtime, the execution of instances gives the following feedback to the user:
• the marking is shown by the number of tokens present in a place,
• the fireability of transitions is shown by colour changes: purple for fireable, gray for not fireable,
• the firing of a transition and the updating of the marking (through the evolution of tokens in the input and output places of the fired transition).

PetShop also provides observability and controllability services via an API for external programs (in our case, the window manager of the platform handling input devices). Observability services send events to subscribers when markings change, when substitutions change and when events are raised in code associated with the transitions. Controllability services receive events from external sources and fire the related transition of a user service. All execution traces can be logged to an external file, allowing further analysis such as usability evaluation of the interactive system [5].
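The observability and controllability services, together with the trace logging mentioned above, could look roughly like this; the interface and method names are invented for illustration, not PetShop's actual API:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

/** Observability: subscribers are told about markings, substitutions and events. */
interface ExecutionObserver {
    void markingChanged(String place, int newTokenCount);
    void substitutionChanged(String transition);
    void eventRaised(String eventName);     // raised from code inside a transition
}

class ObservableEngine {
    private final List<ExecutionObserver> observers = new ArrayList<>();
    void subscribe(ExecutionObserver o) { observers.add(o); }

    /** Controllability: an external program (e.g. the window manager handling
     *  input devices) asks the engine to fire the transition of a user service. */
    void fireUserService(String externalEvent) { /* look up and fire the transition */ }

    void notifyMarking(String place, int count) {
        for (ExecutionObserver o : observers) o.markingChanged(place, count);
    }
}

/** An observer that logs every notification to a file for offline analysis. */
class TraceLogger implements ExecutionObserver {
    private final PrintWriter out;
    TraceLogger(String path) throws IOException { out = new PrintWriter(new FileWriter(path)); }
    public void markingChanged(String p, int n) { out.printf("marking %s=%d%n", p, n); }
    public void substitutionChanged(String t)   { out.printf("substitution %s%n", t); }
    public void eventRaised(String e)           { out.printf("event %s%n", e); }
}
```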
5 Prototyping Interactive Systems with ICOs

This section presents the prototyping capabilities of PetShop and the ICO notation. These capabilities are presented on two examples extracted from case studies. They
show different aspects, illustrating how prototyping can be performed at different levels of the architecture of interactive systems.

5.1 Prototyping Interaction Techniques

The example in this section shows how the ICO notation can be used to prototype low-level interaction techniques. Such prototyping is critical for increasing the usability of interactive applications, as the fine-tuning of interaction can have a huge impact on the overall performance of users [13].
Fig. 3. ICO model of a mouse driver
The model of Fig. 3 describes a transducer for handling low-level events. It models how events from the input device (in this case a pointing device such as a mouse) are received and how they are transformed according to the needs of the interactive application. Dark transitions represent the transitions that are available according to the current marking of the model. Their black border means that they are connected to events, i.e., even though they are available according to the current marking, they must additionally receive an event to actually be fired. The model can receive four different events: mouseMove, mousePressed, mouseReleased and mouseClick. The current position of the cursor of the input device is stored in the place Currentxy. When a mouseMove event is received, the transducer has to transform the dx, dy parameters it receives into x and y positions in order to reflect the change on the mouse cursor. To keep the cursor inside a set of predefined bounds (for instance, the size of the screen or the size of a portion of a window), the transformation of the x and y values according to the dx and dy parameters has to be constrained. This is the role of the places named Bounds. As a notational aspect, these places are virtual places, i.e., virtual copies of a single place; this is used to reduce the number of arcs when the same place is connected to many transitions.

The code of the transitions mouseClick, mouseReleased and mousePressed contains the Trigger construct. This means that, when one of these transitions is fired, the model raises an event. Other models registered with the current model are then notified of each event triggered. The model in Fig. 4 shows how the previous model can be modified in response to requests for modification (after usability evaluation, for instance).
Fig. 4. Modified ICO model of a mouse driver (acceleration of mouse move events)
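To restate the transducer logic of Figs. 3 and 4 outside the Petri-net formalism, the following plain-Java sketch scales relative dx/dy movements by an acceleration coefficient (the token of the place Coef) and clamps the result to the Bounds places. The class name, method names and the coefficient value are illustrative assumptions; in the actual models this logic lives in the transition code:

```java
/** Plain-Java stand-in for the transducer transitions of Figs. 3 and 4. */
class MouseTransducer {
    private int x, y;                          // place Currentxy
    private final int minX, minY, maxX, maxY;  // places Bounds
    private double coef = 2.0;                 // place Coef (Fig. 4 only; value assumed)

    MouseTransducer(int minX, int minY, int maxX, int maxY) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
    }

    /** Transition fired on a mouseMove event: accelerate, then clamp. */
    void mouseMove(int dx, int dy) {
        x = clamp((int) Math.round(x + coef * dx), minX, maxX);
        y = clamp((int) Math.round(y + coef * dy), minY, maxY);
    }

    /** Transitions carrying the Trigger construct re-raise higher-level events. */
    void mousePressed()  { trigger("pressed",  x, y); }
    void mouseReleased() { trigger("released", x, y); }
    void mouseClick()    { trigger("click",    x, y); }

    private void trigger(String event, int x, int y) {
        // Models registered with this one would be notified here.
        System.out.printf("%s at (%d,%d)%n", event, x, y);
    }

    private static int clamp(int v, int lo, int hi) { return Math.max(lo, Math.min(hi, v)); }
}
```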
The modification introduces a new element into the interaction technique: acceleration. Indeed, the movements on the table where the mouse is located are typically much more constrained than the virtual space available to the cursor. For this reason, mouse drivers embed an acceleration mechanism that increases cursor movement according to speed. This is modelled by adding the places Coef to the models and connecting them to the transitions in charge of calculating the new position of the cursor. The code of these transitions shows that the dx and dy parameters are multiplied by the coefficient (stored in the token of the place Coef).

5.2 Prototyping Applications

While the prototyping of interaction techniques is critical for the fine-tuning of interaction, prototyping is also needed at a higher level. This section presents how PetShop and ICO support prototyping at the dialogue level of interactive applications. The prototyping aspects remain the same as for the interaction technique, i.e., models describing the behaviour of the application at the dialogue level can be interactively modified, and the impact of the modifications can be immediately perceived.

The application under consideration here is the Multi-Purpose Interactive Application (MPIA), an application available in the cockpits of several aircraft that handles a number of flight parameters. It is made up of three pages (called WXR, GCAS and AIRCOND). The WXR page is responsible for managing weather-radar information; GCAS is responsible for the Ground Anti-Collision System parameters, while AIRCOND deals with the settings of the air conditioning. Due to space constraints we do not present the interactive modifications of the models in detail, but the interested reader can find the detailed behaviour of this application (in a reconfiguration process after a hardware failure in a cockpit) in [18].
6 Conclusion

This paper has presented the ICO notation for the description of interactive systems via graphical models that can be edited and executed at runtime. The ICO notation, an extension of object Petri nets, has a dedicated CASE tool called PetShop. This runtime capability increases the possibilities of modelling by supporting prototyping, testing and verification. The paper has shown how prototyping of interactive applications can be performed at two different levels: the interaction technique and the dialogue model. The latter was illustrated on an industrial example dealing with cockpit applications in civil aircraft. We have studied the usability of ICOs and PetShop for prototyping phases in an informal way with software engineers involved in the field of Air Traffic Control applications [2]. Informally, we can report that the modification of models worked well, while the creation of models and the connection of models were not performed in a satisfying way. The tool is available for testing at http://ihcs.irit.fr/petshop.

The specific application area that we consider in the paper is ground-segment applications for satellite control, but the results have been applied and are applicable to other application areas with similar requirements.

Acknowledgements. This work is supported by the EU-funded Network of Excellence ResIST, http://www.resist-noe.eu, contract n° 026764, and the CNES-funded R&T Tortuga project, http://ihcs.irit.fr/tortuga/, contract n° R-S08/BS-0003-029. We would also like to thank the reviewers for their in-depth, thoughtful comments.
References 1. Barboni, E., Navarre, D., Palanque, P., Basnyat, S.: Addressing Issues Raised by the Exploitation of Formal Specification Techniques for Interactive Cockpit Applications. In: HCI Aero 2006, p. t.b.p., Seattle (2006) 2. Bastide, R., Navarre, D., Palanque, P.: A Tool-Supported Design Framework for Safety Critical Interactive Systems. Interacting with computers 15(3), 309–328 (2003) 3. Bastide, R., Palanque, P., Duc, L.: Integrating Rendering Specifications into a Formalism for the Design of Interactive Systems. In: DSV-IS 1998, pp. 171–190 (1998) 4. Beck, K.: Extreme Programming Explained: Embrace Change. Addison-Wesley, US (1999) 5. Bernhaupt, R., Navarre, D., Palanque, P., Winckler, M.: Model-Based Evaluation: A New Way to Support Usability Evaluation of Multimodal Interactive Applications In Maturing Usability, Quality in Software, Interaction and Value. In: Human-Computer Interaction Series, pp. 96–119. Springer, Heidelberg (2007) 6. Dragicevic, P., Fekete, J.-D.: Input Device Selection and Interaction Configuration with ICON. In: Proceedings of IHM-HCI 2001, People and Computers XV - Interaction without Frontiers, pp. 543–448. Springer, Heidelberg (2001) 7. Esteban, O., Chatty, S., Palanque, P.: Whizz’Ed: a visual environment for building highly interactive interfaces. In: Proceedings of the Interact 1995 conference, pp. 121–126 (1995) 8. Figueroa, P., Green, M., Hoover, J.: InTml: A Description Language for VR Applications. In: Proceedings of Web3D 2002, Arizona, USA, pp. 53–58 (2002) 9. Fowler, M., Highsmith, J.: The Agile Manifesto. Software Development (August 2001) 10. Genrich, H.J.: Predicate/Transitions Nets. In: Jensen, K., Rozenberg, G. (eds.) High-Levels Petri Nets: Theory and Application, pp. 3–43. Springer, Berlin (1991)
11. Gulliksen, J., Goransson, B., Boivie, I., Blomkvist, S., Persson, J., Cajander, A.: Key principles for user-centred systems design. Behaviour and Inf. Tech. 22, 397–409 (2003) 12. Jacob, R.: A Software Model and Specification Language for Non-WIMP User Interfaces. ACM Transactions on Computer-Human Interaction 6(1), 1–46 (1999) 13. Kabbash, P., Buxton, W.A.: The “prince” technique: Fitts’ law and selection using area cursors. In: Proceedings of the ACM CHI Conference, pp. 273–279. ACM Press, New York (1995) 14. Lim, Y., Pangam, A., Periyasami, S., Aneja, S.: Comparative analysis of high- and lowfidelity prototypes for more valid usability evaluations of mobile devices. In: Proc. of NordiCHI 2006, vol. 189, pp. 291–300. ACM, New York (2006) 15. Navarre, D., Palanque, P., Dragicevic, P., Bastide, R.: An Approach Integrating two Complementary Model-based Environments for the Construction of Multimodal Interactive Applications. Interacting with Computers 18(5), 910–941 (2006) 16. Navarre, D., Palanque, P., Bastide, R., Sy, O.: Structuring interactive systems specifications for executability and prototypability. In: Palanque, P., Paternó, F. (eds.) DSV-IS 2000. LNCS, vol. 1946, pp. 97–120. Springer, Heidelberg (2001) 17. Navarre, D., Palanque, P., Bastide, R., Sy, O.: A Model-Based Tool for Interactive Prototyping of Highly Interactive Applications. In: 12th IEEE International Workshop on Rapid System Prototyping, Monterey, USA, IEEE, Los Alamitos (2001) 18. Navarre, D., Palanque, P., Basnyat, S.: Usability Service Continuation through Reconfiguration of Input and Output Devices in Safety Critical Interactive Systems. In: Harrison, M.D., Sujan, M.-A. (eds.) SAFECOMP 2008. LNCS, vol. 5219, pp. 373–386. Springer, Heidelberg (2008) 19. Navarre, D., Palanque, P., Bastide, R., Schyn, A., Winckler, M., Nedel, L.P., Freitas, C.M.D.S.: A model-based approach for engineering multimodal interactive systems. In: Costabile, M.F., Paternó, F. (eds.) INTERACT 2005. LNCS, vol. 3585, pp. 170–183. Springer, Heidelberg (2005) 20. Palanque, P., Bernhaupt, R., Navarre, D., Ould, M., Winckler, M.: Supporting Usability Evaluation of Multimodal Man-Machine Interfaces for Space Ground Segment Applications Using Petri net Based Formal Specification. In: Ninth International Conference on Space Operations, CD-ROM proceedings, Rome, Italy, June 18-22 (2006) 21. Parnas, D.L.: On the use of transition diagram in the design of a user interface for interactive computer system. In: Proceedings of the 24th ACM Conference, pp. 379–385 (1969) 22. Peterson, J.L.: Petri Net Theory and the Modeling of Systems. Prentice-Hall, Englewood Cliffs (1981) 23. Reason, J.: Human Error, 302 pages. Cambridge University Press, Cambridge (1990) 24. Rettig, M.: Prototyping for tiny fingers. Commun. ACM 37(4), 21–27 (1994) 25. Risoldi, M., Amaral, V.: Towards a Formal, Model-Based Framework for Control Systems Interaction Prototyping. Rapid Integration of Software Engineering Techniques, 144–159 (2007) 26. Rudd, J., Stern, K., Isensee, S.: Low vs. high-fidelity prototyping debate. Interactions 3(1), 76–85 (1996) 27. Vanacken, D., De Boeck, J., Raymaekers, C., Coninx, K.: NiMMiT: a Notation for Modelling Multimodal Interaction Techniques. In: International Conference on Computer Graphics Theory and Applications, Portugal (2006) 28. Wieting, R.: Hybrid High-Level Nets. In: Proc. of the 1996 Winter Simulation Conference, pp. 848–855. ACM Press, New York (1996) 29. 
Willans, J.S., Harrison, M.D.: Prototyping pre-implementation designs of virtual environment behaviour. In: Nigay, L., Little, M.R. (eds.) EHCI 2001. LNCS, vol. 2254, pp. 91–108. Springer, Heidelberg (2001)
RUCID: Rapid Usable Consistent Interaction Design
Patterns-Based Mobile Phone UI Design Library, Process and Tool
Avinash Raj¹ and Vihari Komaragiri²
¹ Toronto, Canada
avinash.raj@hotmail.com
² Bangalore, India
vihari@gmail.com
Abstract. This paper is based on a research effort at Kyocera Wireless, India, that aimed to overcome limitations in the mobile phone design process by giving designers an improved design and specification tool and helping them deal routinely with some of the more deep-rooted constraints of phone design. The tool extends the idea of templates from simple visual elements to more abstract design components. It adds further value to this modularization of design by taking the approach of an extensive and ever-growing library of patterns to define and refine these components. The components cover most of the low- to medium-level building blocks of design. They are specified in the library as tuples (patterns) of <design problem, design solution, context, constraints>, at each level of the hierarchy. The components are visually represented using standardized shapes with placeholder and help text, and are made available as part of the design work surface of a visual prototyping tool such as MS Visio or Adobe Fireworks.
Keywords: Mobile phone UI design, patterns, architecture, design process, library.
same set of applications over multiple mobile phones of either the same vendor or, in fact, even of different vendors.

1.1 Current Mobile Phone Interaction Design Process

Though design guideline documents deliver coherence among User Interfaces, they cannot be used effectively to communicate how different components of the design will work together and how Users will interact with them. In addition, guidelines can become obsolete or ignored very quickly in the fast-developing world of mobile phones. Furthermore, the interaction designer is limited to the task of specifying the design, which a software engineer then implements in an embedded software development environment that is notorious for its lack of sophisticated APIs for UI creation. This indirection, and the limitations inherent in the development environment, also mean that a lot of design intent and time can be lost in translation. There needs to be a way to put design implementation in the hands of interaction designers, and this needs to be done in a “backward”-compatible manner. Even when new technologies like FlashLite and uiOne become the platforms of UI development, there will still be some phones that require UI development in native code, so the solution will have to support both styles of interaction design and implementation.

While phone vendors have started adopting a platform approach to software development, adding incremental features to existing code bases and builds, this has the bug/feature of perpetuating design from older phones, whether good or bad. There is also no easy way of upgrading a design element for greater usability, because it is difficult to trace a design element across the various features where it is used. A solution to this problem could be to ensure that interaction design is modular to the extent possible and utilizes design elements in a consistent, traceable manner.

1.2 Proposal to Solve Usability, Consistency and Time-to-Market Constraints

The aim of this research is to provide a tool where interaction designers can choose from a pre-packaged design element library and use the appropriate element by mapping the usability constraints and context of the pattern onto the needs and context of the feature being designed. Presented in the form of a Microsoft Visio template-based prototype tool, these patterns can be easily used by designers. After the usual steps of analyzing the design problem, identifying the User goals and then breaking them down into tasks, the paper proposes a change to the design process. Instead of trying to sketch the design from scratch from that point onwards, the designer simply uses the design tool and its template library to look for and reuse design modules that already exist. For the part of the design that does not yet exist, the designer builds newer tasks and flows from existing building-block objects. The designer then adds these newer creations as potential candidates to the pattern template library, to be verified for usability, incorporated, and then used by other designers in creating other features. The designer achieves speed and design consistency with this approach. The modular pattern library allows for reuse across designers and design teams, and for usability refinement, design evolution and backward compatibility as the product evolves. A minimal sketch of such a library entry is given below.
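Under the assumption that a pattern is stored as the <design problem, design solution, context, constraints> tuple described in the abstract (the types and matching rule here are our own illustration, not the actual tool's data model), a library entry and its lookup might look like this in Java:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

enum PatternLevel { WIDGET, PRIMITIVE, COMPOUND, FLOW }

/** One library entry: the <problem, solution, context, constraints> tuple. */
record DesignPattern(String problem, String solution, String context,
                     List<String> constraints, PatternLevel level) {}

class PatternLibrary {
    private final List<DesignPattern> patterns;
    PatternLibrary(List<DesignPattern> seed) { this.patterns = new ArrayList<>(seed); }

    /** The designer maps the feature's context onto the library and gets back
     *  candidate patterns at the requested level of the design hierarchy. */
    List<DesignPattern> candidatesFor(String featureContext, PatternLevel level) {
        return patterns.stream()
                .filter(p -> p.level() == level)
                .filter(p -> p.context().toLowerCase().contains(featureContext.toLowerCase()))
                .collect(Collectors.toList());
    }

    /** Newly built designs are proposed back to the library as candidate
     *  patterns, to be verified for usability before other designers reuse them. */
    void propose(DesignPattern candidate) { patterns.add(candidate); }
}
```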
2 RUCID Basics

In this section, we present a novel formulation of mobile phone interaction design architecture and build on this framework using the patterns-based approach inspired by Christopher Alexander [2]. Some samples of mobile phone interaction design patterns are presented here to illustrate the concept. We draw on the work of Alan Cooper [1] to derive an interaction design architecture for mobile phones. The mobile phone has many input Triggers (typically the 12-key keypad, plus five-way navigation keys and so on). The context of use of the mobile phone is much different from that of the mouse, and hence its interaction design structure is quite different as well. There is a need for a Primitive action (for example, a key press) to achieve not just generic input/output or application-specific commands but to directly address User goals. We address this in our model as follows.

The User’s goals in using an application can be broken down into some generic tasks common across applications. These generic tasks precede and succeed specific tasks called into existence by the needs catered to by the feature being designed. For example, “starting an application or closing it” are typically generic tasks; “playing a music track” or “composing an SMS message” are feature-specific tasks. Generic tasks are made up of Flows of input interaction in conjunction with the output – actions, symbols, graphics, and other feedback information – expressed on the screen (and speakers, vibrations etc.) of a mobile phone. The Flows are represented as Idioms on the left-hand side of the Alan Cooper inverted pyramid, while the output is represented on the right-hand side. The output shown on screen can be further divided into information, widgets and graphics. A Flow can be thought of as a sub-task, or a sequence of Primitive and/or Compound actions, that results in an application-specific function being executed. A Compound action in turn constitutes a sequence of Primitive User actions and phone reactions that achieves a User’s sub-objective. In a typical mobile phone design, a Flow that achieves a User objective, or a Compound action that achieves a sub-objective, may consist of just one Primitive action; for example, a press-and-hold of the hash key can by itself achieve the User goal of locking the phone. The same goal can also be achieved by accessing settings from the menu, choosing the keypad lock menu option and then enabling the keypad lock option.

The architecture (Fig. 1) of interaction design in mobile phones is at the heart of our pattern exploration. It anchors our search for interaction design patterns in mobile phones and also provides the means to organize, link and document them. Since this model also articulates the typical top-down process of design, it lends itself to very practical application, as evidenced by the prototype tool that we created. In the following pages we look at sample patterns generated at the Widget, Primitive, Compound and Flow levels, one sample each. The Compound, Primitive and Flow patterns that follow are tailored to the tool rather than to the Alexandrian pattern form. Altogether, 23 Primitive, 3 Compound, 15 Flow and 29 Widget patterns were captured during the course of this research.
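Before turning to the sample patterns, the hierarchy of Fig. 1 and the keypad-lock example can be sketched as data; the types and trigger names below are illustrative only, not part of the RUCID tool:

```java
import java.util.List;

/** Sketch of the design hierarchy of Fig. 1: a user goal is achieved by
 *  Flows built from Compound and Primitive actions on input Triggers. */
sealed interface Action permits Primitive, Compound {}
record Primitive(String trigger, String gesture) implements Action {}  // e.g. press, press-and-hold
record Compound(List<Action> steps) implements Action {}               // a sequence achieving a sub-objective
record Flow(String userGoal, List<Action> actions) {}                  // achieves an application function

class KeypadLockExample {
    public static void main(String[] args) {
        // A Flow may consist of a single Primitive: press-and-hold '#' locks the phone.
        Flow shortcut = new Flow("Lock keypad",
                List.of(new Primitive("HASH_KEY", "press-and-hold")));

        // The same goal via the menu: a Compound sequence of key presses
        // (the exact navigation steps are assumed for illustration).
        Flow viaMenu = new Flow("Lock keypad",
                List.of(new Compound(List.of(
                        new Primitive("MENU_KEY", "press"),
                        new Primitive("NAV_DOWN", "press"),      // navigate to Settings
                        new Primitive("SELECT_KEY", "press"),    // open Keypad Lock option
                        new Primitive("SELECT_KEY", "press"))))); // enable it

        System.out.println(shortcut + "\n" + viaMenu);
    }
}
```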
Fig. 1. “A New Mobile UI Design Architecture Model” that details the hierarchy of design levels, starting from “User goals” all the way down to “Action-Triggers”

Table 1. Sample Widget pattern (Soft key Window)

Problem: User needs to access additional functions that can be performed on a screen.
Context: For a given Screen, a User has more possible actions than the maximum number of Soft keys.
Solution: One of the options accessible through a Soft key can provide a gateway to multiple options. The User can move to these options and select the desired one.
Rationale: A limited number of Soft keys can be displayed at any one time. A dedicated key cannot be assigned to each option because: 1) the options and their number keep changing depending on the screen; 2) the surface area of the mobile phone is small and limited. Using a single key to access a variable-sized list allows any number of items to be accommodated.