Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany
6772
Gavriel Salvendy Michael J. Smith (Eds.)
Human Interface and the Management of Information Interacting with Information Symposium on Human Interface 2011 Held as Part of HCI International 2011 Orlando, FL, USA, July 9-14, 2011 Proceedings, Part II
Volume Editors Gavriel Salvendy Purdue University School of Industrial Engineering West Lafayette, IN, USA and Tsinghua University Department of Industrial Engineering Beijing, P.R. China E-mail:
[email protected] Michael J. Smith University of Wisconsin-Madison Department of Industrial and Systems Engineering Center for Quality and Productivity Improvement Madison, WI, USA E-mail:
[email protected]
ISSN 0302-9743 e-ISSN 1611-3349 ISBN 978-3-642-21668-8 e-ISBN 978-3-642-21669-5 DOI 10.1007/978-3-642-21669-5 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2009928850 CR Subject Classification (1998): H.5, K.6, H.3-4, C.2, H.4.2, J.1 LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Foreword
The 14th International Conference on Human–Computer Interaction, HCI International 2011, was held in Orlando, Florida, USA, July 9–14, 2011, jointly with the Symposium on Human Interface (Japan) 2011, the 9th International Conference on Engineering Psychology and Cognitive Ergonomics, the 6th International Conference on Universal Access in Human–Computer Interaction, the 4th International Conference on Virtual and Mixed Reality, the 4th International Conference on Internationalization, Design and Global Development, the 4th International Conference on Online Communities and Social Computing, the 6th International Conference on Augmented Cognition, the Third International Conference on Digital Human Modeling, the Second International Conference on Human-Centered Design, and the First International Conference on Design, User Experience, and Usability. A total of 4,039 individuals from academia, research institutes, industry and governmental agencies from 67 countries submitted contributions, and 1,318 papers that were judged to be of high scientific quality were included in the program. These papers address the latest research and development efforts and highlight the human aspects of design and use of computing systems. The papers accepted for presentation thoroughly cover the entire field of human–computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas. This volume, edited by Gavriel Salvendy and Michael J. Smith, contains papers in the thematic area of human interface and the management of information (HIMI), addressing the following major topics:
• Access to information
• Supporting communication
• Supporting work, collaboration, decision-making and business
• Mobile and ubiquitous information
• Information in aviation
The remaining volumes of the HCI International 2011 Proceedings are: • Volume 1, LNCS 6761, Human–Computer Interaction—Design and Development Approaches (Part I), edited by Julie A. Jacko • Volume 2, LNCS 6762, Human–Computer Interaction—Interaction Techniques and Environments (Part II), edited by Julie A. Jacko • Volume 3, LNCS 6763, Human–Computer Interaction—Towards Mobile and Intelligent Interaction Environments (Part III), edited by Julie A. Jacko • Volume 4, LNCS 6764, Human–Computer Interaction—Users and Applications (Part IV), edited by Julie A. Jacko • Volume 5, LNCS 6765, Universal Access in Human–Computer Interaction— Design for All and eInclusion (Part I), edited by Constantine Stephanidis
• Volume 6, LNCS 6766, Universal Access in Human–Computer Interaction— Users Diversity (Part II), edited by Constantine Stephanidis • Volume 7, LNCS 6767, Universal Access in Human–Computer Interaction— Context Diversity (Part III), edited by Constantine Stephanidis • Volume 8, LNCS 6768, Universal Access in Human–Computer Interaction— Applications and Services (Part IV), edited by Constantine Stephanidis • Volume 9, LNCS 6769, Design, User Experience, and Usability—Theory, Methods, Tools and Practice (Part I), edited by Aaron Marcus • Volume 10, LNCS 6770, Design, User Experience, and Usability— Understanding the User Experience (Part II), edited by Aaron Marcus • Volume 11, LNCS 6771, Human Interface and the Management of Information—Design and Interaction (Part I), edited by Michael J. Smith and Gavriel Salvendy • Volume 13, LNCS 6773, Virtual and Mixed Reality—New Trends (Part I), edited by Randall Shumaker • Volume 14, LNCS 6774, Virtual and Mixed Reality—Systems and Applications (Part II), edited by Randall Shumaker • Volume 15, LNCS 6775, Internationalization, Design and Global Development, edited by P.L. Patrick Rau • Volume 16, LNCS 6776, Human-Centered Design, edited by Masaaki Kurosu • Volume 17, LNCS 6777, Digital Human Modeling, edited by Vincent G. Duffy • Volume 18, LNCS 6778, Online Communities and Social Computing, edited by A. Ant Ozok and Panayiotis Zaphiris • Volume 19, LNCS 6779, Ergonomics and Health Aspects of Work with Computers, edited by Michelle M. Robertson • Volume 20, LNAI 6780, Foundations of Augmented Cognition: Directing the Future of Adaptive Systems, edited by Dylan D. Schmorrow and Cali M. Fidopiastis • Volume 21, LNAI 6781, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris • Volume 22, CCIS 173, HCI International 2011 Posters Proceedings (Part I), edited by Constantine Stephanidis • Volume 23, CCIS 174, HCI International 2011 Posters Proceedings (Part II), edited by Constantine Stephanidis I would like to thank the Program Chairs and the members of the Program Boards of all Thematic Areas, listed herein, for their contribution to the highest scientific quality and the overall success of the HCI International 2011 Conference. In addition to the members of the Program Boards, I also wish to thank the following volunteer external reviewers: Roman Vilimek from Germany, Ramalingam Ponnusamy from India, Si Jung “Jun” Kim from the USA, and Ilia Adami, Iosif Klironomos, Vassilis Kouroumalis, George Margetis, and Stavroula Ntoa from Greece.
This conference would not have been possible without the continuous support and advice of the Conference Scientific Advisor, Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the Communications and Exhibition Chair and Editor of HCI International News, Abbas Moallem. I would also like to thank the members of the Human–Computer Interaction Laboratory of ICS-FORTH, and in particular Margherita Antona, George Paparoulis, Maria Pitsoulaki, Stavroula Ntoa, Maria Bouhli and George Kapnas, for their contribution toward the organization of the HCI International 2011 Conference. July 2011
Constantine Stephanidis
Organization
Ergonomics and Health Aspects of Work with Computers Program Chair: Michelle M. Robertson Arne Aarås, Norway Pascale Carayon, USA Jason Devereux, UK Wolfgang Friesdorf, Germany Martin Helander, Singapore Ed Israelski, USA Ben-Tzion Karsh, USA Waldemar Karwowski, USA Peter Kern, Germany Danuta Koradecka, Poland Nancy Larson, USA Kari Lindström, Finland
Brenda Lobb, New Zealand Holger Luczak, Germany William S. Marras, USA Aura C. Matias, Philippines Matthias Rötting, Germany Michelle L. Rogers, USA Dominique L. Scapin, France Lawrence M. Schleifer, USA Michael J. Smith, USA Naomi Swanson, USA Peter Vink, The Netherlands John Wilson, UK
Human Interface and the Management of Information Program Chair: Michael J. Smith Hans-Jörg Bullinger, Germany Alan Chan, Hong Kong Shin'ichi Fukuzumi, Japan Jon R. Gunderson, USA Michitaka Hirose, Japan Jhilmil Jain, USA Yasufumi Kume, Japan Mark Lehto, USA Hirohiko Mori, Japan Fiona Fui-Hoon Nah, USA Shogo Nishida, Japan Robert Proctor, USA
Youngho Rhee, Korea Anxo Cereijo Roibás, UK Katsunori Shimohara, Japan Dieter Spath, Germany Tsutomu Tabe, Japan Alvaro D. Taveira, USA Kim-Phuong L. Vu, USA Tomio Watanabe, Japan Sakae Yamamoto, Japan Hidekazu Yoshikawa, Japan Li Zheng, P. R. China
Human–Computer Interaction Program Chair: Julie A. Jacko Sebastiano Bagnara, Italy Sherry Y. Chen, UK Marvin J. Dainoff, USA Jianming Dong, USA John Eklund, Australia Xiaowen Fang, USA Ayse Gurses, USA Vicki L. Hanson, UK Sheue-Ling Hwang, Taiwan Wonil Hwang, Korea Yong Gu Ji, Korea Steven A. Landry, USA
Gitte Lindgaard, Canada Chen Ling, USA Yan Liu, USA Chang S. Nam, USA Celestine A. Ntuen, USA Philippe Palanque, France P.L. Patrick Rau, P.R. China Ling Rothrock, USA Guangfeng Song, USA Steffen Staab, Germany Wan Chul Yoon, Korea Wenli Zhu, P.R. China
Engineering Psychology and Cognitive Ergonomics Program Chair: Don Harris Guy A. Boy, USA Pietro Carlo Cacciabue, Italy John Huddlestone, UK Kenji Itoh, Japan Hung-Sying Jing, Taiwan Wen-Chin Li, Taiwan James T. Luxhøj, USA Nicolas Marmaras, Greece Sundaram Narayanan, USA Mark A. Neerincx, The Netherlands
Jan M. Noyes, UK Kjell Ohlsson, Sweden Axel Schulte, Germany Sarah C. Sharples, UK Neville A. Stanton, UK Xianghong Sun, P.R. China Andrew Thatcher, South Africa Matthew J.W. Thomas, Australia Mark Young, UK Rolf Zon, The Netherlands
Universal Access in Human–Computer Interaction Program Chair: Constantine Stephanidis Julio Abascal, Spain Ray Adams, UK Elisabeth André, Germany Margherita Antona, Greece Chieko Asakawa, Japan Christian Bühler, Germany Jerzy Charytonowicz, Poland Pier Luigi Emiliani, Italy
Michael Fairhurst, UK Dimitris Grammenos, Greece Andreas Holzinger, Austria Simeon Keates, Denmark Georgios Kouroupetroglou, Greece Sri Kurniawan, USA Patrick M. Langdon, UK Seongil Lee, Korea
Zhengjie Liu, P.R. China Klaus Miesenberger, Austria Helen Petrie, UK Michael Pieper, Germany Anthony Savidis, Greece Andrew Sears, USA Christian Stary, Austria
Hirotada Ueda, Japan Jean Vanderdonckt, Belgium Gregg C. Vanderheiden, USA Gerhard Weber, Germany Harald Weber, Germany Panayiotis Zaphiris, Cyprus
Virtual and Mixed Reality Program Chair: Randall Shumaker Pat Banerjee, USA Mark Billinghurst, New Zealand Charles E. Hughes, USA Simon Julier, UK David Kaber, USA Hirokazu Kato, Japan Robert S. Kennedy, USA Young J. Kim, Korea Ben Lawson, USA Gordon McK Mair, UK
David Pratt, UK Albert “Skip” Rizzo, USA Lawrence Rosenblum, USA Jose San Martin, Spain Dieter Schmalstieg, Austria Dylan Schmorrow, USA Kay Stanney, USA Janet Weisenford, USA Mark Wiederhold, USA
Internationalization, Design and Global Development Program Chair: P.L. Patrick Rau Michael L. Best, USA Alan Chan, Hong Kong Lin-Lin Chen, Taiwan Andy M. Dearden, UK Susan M. Dray, USA Henry Been-Lirn Duh, Singapore Vanessa Evers, The Netherlands Paul Fu, USA Emilie Gould, USA Sung H. Han, Korea Veikko Ikonen, Finland Toshikazu Kato, Japan Esin Kiris, USA Apala Lahiri Chavan, India
James R. Lewis, USA James J.W. Lin, USA Rungtai Lin, Taiwan Zhengjie Liu, P.R. China Aaron Marcus, USA Allen E. Milewski, USA Katsuhiko Ogawa, Japan Oguzhan Ozcan, Turkey Girish Prabhu, India Kerstin Röse, Germany Supriya Singh, Australia Alvin W. Yeo, Malaysia Hsiu-Ping Yueh, Taiwan
Online Communities and Social Computing Program Chairs: A. Ant Ozok, Panayiotis Zaphiris Chadia N. Abras, USA Chee Siang Ang, UK Peter Day, UK Fiorella De Cindio, Italy Heidi Feng, USA Anita Komlodi, USA Piet A.M. Kommers, The Netherlands Andrew Laghos, Cyprus Stefanie Lindstaedt, Austria Gabriele Meiselwitz, USA Hideyuki Nakanishi, Japan
Anthony F. Norcio, USA Ulrike Pfeil, UK Elaine M. Raybourn, USA Douglas Schuler, USA Gilson Schwartz, Brazil Laura Slaughter, Norway Sergei Stafeev, Russia Asimina Vasalou, UK June Wei, USA Haibin Zhu, Canada
Augmented Cognition Program Chairs: Dylan D. Schmorrow, Cali M. Fidopiastis Monique Beaudoin, USA Chris Berka, USA Joseph Cohn, USA Martha E. Crosby, USA Julie Drexler, USA Ivy Estabrooke, USA Chris Forsythe, USA Wai Tat Fu, USA Marc Grootjen, The Netherlands Jefferson Grubb, USA Santosh Mathan, USA
Rob Matthews, Australia Dennis McBride, USA Eric Muth, USA Mark A. Neerincx, The Netherlands Denise Nicholson, USA Banu Onaral, USA Kay Stanney, USA Roy Stripling, USA Rob Taylor, UK Karl van Orden, USA
Digital Human Modeling Program Chair: Vincent G. Duffy Karim Abdel-Malek, USA Giuseppe Andreoni, Italy Thomas J. Armstrong, USA Norman I. Badler, USA Fethi Calisir, Turkey Daniel Carruth, USA Keith Case, UK Julie Charland, Canada
Yaobin Chen, USA Kathryn Cormican, Ireland Daniel A. DeLaurentis, USA Yingzi Du, USA Okan Ersoy, USA Enda Fallon, Ireland Yan Fu, P.R. China Afzal Godil, USA
Ravindra Goonetilleke, Hong Kong Anand Gramopadhye, USA Lars Hanson, Sweden Pheng Ann Heng, Hong Kong Bo Hoege, Germany Hongwei Hsiao, USA Tianzi Jiang, P.R. China Nan Kong, USA Steven A. Landry, USA Kang Li, USA Zhizhong Li, P.R. China Tim Marler, USA
Ahmet F. Ozok, Turkey Srinivas Peeta, USA Sudhakar Rajulu, USA Matthias Rötting, Germany Matthew Reed, USA Johan Stahre, Sweden Mao-Jiun Wang, Taiwan Xuguang Wang, France Jingzhou (James) Yang, USA Gulcin Yucel, Turkey Tingshao Zhu, P.R. China
Human-Centered Design Program Chair: Masaaki Kurosu Julio Abascal, Spain Simone Barbosa, Brazil Tomas Berns, Sweden Nigel Bevan, UK Torkil Clemmensen, Denmark Susan M. Dray, USA Vanessa Evers, The Netherlands Xiaolan Fu, P.R. China Yasuhiro Horibe, Japan Jason Huang, P.R. China Minna Isomursu, Finland Timo Jokela, Finland Mitsuhiko Karashima, Japan Tadashi Kobayashi, Japan Seongil Lee, Korea Kee Yong Lim, Singapore
Zhengjie Liu, P.R. China Loïc Martínez-Normand, Spain Monique Noirhomme-Fraiture, Belgium Philippe Palanque, France Annelise Mark Pejtersen, Denmark Kerstin Röse, Germany Dominique L. Scapin, France Haruhiko Urokohara, Japan Gerrit C. van der Veer, The Netherlands Janet Wesson, South Africa Toshiki Yamaoka, Japan Kazuhiko Yamazaki, Japan Silvia Zimmermann, Switzerland
Design, User Experience, and Usability Program Chair: Aaron Marcus Ronald Baecker, Canada Barbara Ballard, USA Konrad Baumann, Austria Arne Berger, Germany Randolph Bias, USA Jamie Blustein, Canada
Ana Boa-Ventura, USA Lorenzo Cantoni, Switzerland Sameer Chavan, Korea Wei Ding, USA Maximilian Eibl, Germany Zelda Harrison, USA
Rüdiger Heimgärtner, Germany Brigitte Herrmann, Germany Sabine Kabel-Eckes, USA Kaleem Khan, Canada Jonathan Kies, USA Jon Kolko, USA Helga Letowt-Vorbek, South Africa James Lin, USA Frazer McKimm, Ireland Michael Renner, Switzerland
Christine Ronnewinkel, Germany Elizabeth Rosenzweig, USA Paul Sherman, USA Ben Shneiderman, USA Christian Sturm, Germany Brian Sullivan, USA Jaakko Villa, Finland Michele Visciola, Italy Susan Weinschenk, USA
HCI International 2013
The 15th International Conference on Human–Computer Interaction, HCI International 2013, will be held jointly with the affiliated conferences in the summer of 2013. It will cover a broad spectrum of themes related to human–computer interaction (HCI), including theoretical issues, methods, tools, processes and case studies in HCI design, as well as novel interaction techniques, interfaces and applications. The proceedings will be published by Springer. More information about the topics, as well as the venue and dates of the conference, will be announced through the HCI International Conference series website: http://www.hci-international.org/ General Chair Professor Constantine Stephanidis University of Crete and ICS-FORTH Heraklion, Crete, Greece Email:
[email protected]
Table of Contents – Part II
Part I: Access to Information
Developing Optimum Interface Design for On-screen Chinese Proofreading Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alan H.S. Chan, Joey C.Y. So, and Steve N.H. Tsang
3
"Life Portal": An Information Access Scheme Based on Life Logs . . . . . . Shin-ichiro Eitoku, Manabu Motegi, Rika Mochizuki, Takashi Yagi, Shin-yo Muto, and Masanobu Abe
11
Proposal of the Kawaii Search System Based on the First Sight of Impression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kyoko Hashiguchi and Katsuhiko Ogawa
21
Development of a Tracking Sound Game for Exercise Support of Visually Impaired . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yoshikazu Ikegami, Keita Ito, Hironaga Ishii, and Michiko Ohkura
31
From Personal to Collaborative Information Management: A Design Science’s Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mei Lu and Philip Corriveau
36
A Classification Scheme for Characterizing Visual Mining . . . . . . . . . . . . . Elaheh Mozaffari and Sudhir Mudur
46
Transforming a Standard Lecture into a Hybrid Learning Scenario . . . . . Hans-Martin Pohl, Jan-Torsten Milde, and Jan Lingelbach
55
Designing Web Sites and Interfaces to Optimize Successful User Interactions: Symposium Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Robert W. Proctor and Kim-Phuong L. Vu
62
Petimo: Sharing Experiences through Physically Extended Social Networking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nimesha Ranasinghe, Owen Noel Newton Fernando, and Adrian David Cheok Comparison Analysis for Text Data by Using FACT-Graph . . . . . . . . . . . . Ryosuke Saga, Seiko Takamizawa, Kodai Kitami, Hiroshi Tsuji, and Kazunori Matsumoto A Comparison between Single and Dual Monitor Productivity and the Effects of Window Management Styles on Performance . . . . . . . . . . . . . . . Alex Stegman, Chen Ling, and Randa Shehab
66
75
84
Interface Evaluation of Web-Based e-Picture Books in Taiwan . . . . . . . . . Pei-shiuan Tsai and Man-lai You
94
A Digital Archive System for Preserving Audio and Visual Space . . . . . . Makoto Uesaka, Yusuke Ikegaya, and Tomohito Yamamoto
103
Experience Explorer: Context-Based Browsing of Personal Media . . . . . . Tuomas Vaittinen, Tuula Kärkkäinen, and Kimmo Roimela
111
Part II: Supporting Communication Service Science Method to Create Pictograms Referring to Sign Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Naotsune Hosono, Hiromitsu Inoue, Hiroyuki Miki, Michio Suzuki, Yuji Nagashima, Yutaka Tomita, and Sakae Yamamoto
123
MoPaCo: Pseudo 3D Video Communication System . . . . . . . . . . . . . . . . . . Ryo Ishii, Shiro Ozawa, Takafumi Mukouchi, and Norihiko Matsuura
131
Analysis on Relationship between Smiley and Emotional Word Included in Chat Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Junko Itou, Tomoyasu Ogaki, and Jun Munemori
141
Designing Peripheral Communication Services for Families Living-Apart: Elderly Persons and Family . . . . . . . . . . . . . . . . . . . . . . . . . . . Yosuke Kinoe and Mihoko Noda
147
Visual Feedback to Reduce Influence of Delay on Video Chatting . . . . . . Kazuyoshi Murata, Masatsugu Hattori, and Yu Shibuya
157
Research on the Relationships between Visual Entertainment Factor and Chat Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tomoyasu Ogaki, Junko Itou, and Jun Munemori
165
Multimodal Conversation Scene Analysis for Understanding People’s Communicative Behaviors in Face-to-Face Meetings . . . . . . . . . . . . . . . . . . Kazuhiro Otsuka
171
A Virtual Audience System for Enhancing Embodied Interaction Based on Conversational Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yoshihiro Sejima, Yutaka Ishii, and Tomio Watanabe
180
VizKid: A Behavior Capture and Visualization System of Adult-Child Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Grace Shin, Taeil Choi, Agata Rozga, and Mario Romero
190
Interactive e-Hon as Parent-Child Communication Tool . . . . . . . . . . . . . . . Kaoru Sumi and Mizue Nagata
199
SAM: A Spatial Interactive Platform for Studying Family Communication Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guo-Jhen Yu, Teng-Wen Chang, and Ying-Chong Wang
207
Part III: Supporting Work, Collaboration, Decision-Making and Business The Effects of Visual Feedback on Social Behavior during Decision Making Meetings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Merel Brandon, Simon Epskamp, Thomas de Groot, Tim Franssen, Bart van Gennep, and Thomas Visser Co-Creation of Value through Social Network Marketing: A Field Experiment Using a Facebook Campaign to Increase Conversion Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Asle Fagerstrøm and Gheorghita Ghinea Towards Argument Representational Tools for Hybrid Argumentation Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . María Paula González, Sebastian Gottifredi, Alejandro J. García, and Guillermo R. Simari
219
229
236
Development of a Price Promotion Model for Online Store Selection . . . . Shintaro Hotta, Syohei Ishizu, and Yoshimitsu Nagai
246
Design Effective Voluntary Medical Incident Reporting Systems: A Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lei Hua and Yang Gong
253
Technology-Based Decision-Making Support System . . . . . . . . . . . . . . . . . . Hanmin Jung, Mikyoung Lee, Pyung Kim, and Won-Kyung Sung
262
Economic Analysis of SON-Enabled Mobile WiMAX . . . . . . . . . . . . . . . . . Seungjin Kwack, Jahwan Koo, and Jinwook Chung
268
ICT-Enabled Business Process Re-engineering: International Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ya-Ching Lee, Pin-Yu Chu, and Hsien-Lee Tseng
278
A Methodology to Develop a Clinical Ontology for Healthcare Business . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mario Macedo and Pedro Isaías
285
Advances in E-commerce User Interface Design . . . . . . . . . . . . . . . . . . . . . . Lawrence J. Najjar
292
Information Technology Services Industry and Job Design . . . . . . . . . . . . Yoshihiko Saitoh
301
Dodging Window Interference to Freely Share Any Off-the-Shelf Application among Multiple Users in Co-located Collaboration . . . . . . . . Shinichiro Sakamoto, Makoto Nakashima, and Tetsuro Ito
305
Process in Establishing Communication in Collaborative Creation . . . . . . Mamiko Sakata and Keita Miyamoto
315
Real-World User-Centered Design: The Michigan Workforce Background Check System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sarah J. Swierenga, Fuad Abujarad, Toni A. Dennis, and Lori A. Post What Kinds of Human Negotiation Skill Can Be Acquired by Changing Negotiation Order of Bargaining Agents? . . . . . . . . . . . . . . . . . . . . . . . . . . . Keiki Takadama, Atsushi Otaki, Keiji Sato, Hiroyasu Matsushima, Masayuki Otani, Yoshihiro Ichikawa, Kiyohiko Hattori, and Hiroyuki Sato An Efficient and Scalable Meeting Minutes Generation and Presentation Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Berk Taner, Can Yildizli, Ahmet Ozcan Nergiz, and Selim Balcisoy
325
335
345
Part IV: Mobile and Ubiquitous Information Object and Scene Recognition Using Color Descriptors and Adaptive Color KLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Volkan H. Bagci, Mariofanna Milanova, Roumen Kountchev, Roumiana Kountcheva, and Vladimir Todorov What Maps and What Displays for Remote Situation Awareness and ROV Localization? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ryad Chellali and Khelifa Baizid Evaluation of Disaster Information Management System Using Tabletop User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hidemi Fukada, Kazue Kobayashi, Aki Katsuki, and Naotake Hirasawa Relationality-Oriented Systems Design for Emergence, Growth, and Operation of Relationality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Takuya Kajio, Manami Watanabe, Ivan Tanev, and Katsunori Shimohara
355
364
373
381
Real-Time and Interactive Rendering for Translucent Materials Such as Human Skin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hiroyuki Kubo, Yoshinori Dobashi, and Shigeo Morishima
388
Local Communication Media Based on Concept of Media Biotope . . . . . . Hidetsugu Suto and Makiba Sakamoto
396
Big Fat Wand: A Laser Projection System for Information Sharing in a Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Toru Takahashi and Takao Terano
403
Disaster Information Collecting/Providing Service for Local Residents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuichi Takahashi, Daiji Kobayashi, and Sakae Yamamoto
411
Comfortable Design of Task-Related Information Displayed Using Optical See-Through Head-Mounted Display . . . . . . . . . . . . . . . . . . . . . . . . Kazuhiro Tanuma, Tomohiro Sato, Makoto Nomura, and Miwa Nakanishi
419
Usability Issues in Introducing Capacitive Interaction into Mobile Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shuang Xu and Keith Bradburn
430
Balance Ball Interface for Performing Arts . . . . . . . . . . . . . . . . . . . . . . . . . . Tomoyuki Yamaguchi, Tsukasa Kobayashi, and Shuji Hashimoto
440
Study on Accessibility of Urgent Message Transmission Service in a Disaster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shunichi Yonemura and Kazuo Kamata
446
Part V: Information in Aviation Is ACARS and FANS-1A Just Another Data Link to the Controller? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vernol Battiste, Joel Lachter, Sarah V. Ligda, Jimmy H. Nguyen, L. Paige Bacon, Robert W. Koteskey, and Walter W. Johnson Flight Deck Workload and Acceptability of Verbal and Digital Communication Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summer L. Brandt, Joel Lachter, Arik-Quang V. Dao, Vernol Battiste, and Walter W. Johnson Conflict Resolution Automation and Pilot Situation Awareness . . . . . . . . Arik-Quang V. Dao, Summer L. Brandt, L. Paige Bacon, Joshua M. Kraut, Jimmy Nguyen, Katsumi Minakata, Hamzah Raza, and Walter W. Johnson Effect of ATC Training with NextGen Tools and Online Situation Awareness and Workload Probes on Operator Performance . . . . . . . . . . . . Ariana Kiken, R. Conrad Rorie, L. Paige Bacon, Sabrina Billinghurst, Joshua M. Kraut, Thomas Z. Strybel, Kim-Phuong L. Vu, and Vernol Battiste
453
463
473
483
Effects of Data Communications Failure on Air Traffic Controller Sector Management Effectiveness, Situation Awareness, and Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joshua M. Kraut, Ariana Kiken, Sabrina Billinghurst, Corey A. Morgan, Thomas Z. Strybel, Dan Chiappe, and Kim-Phuong L. Vu Pilot Information Presentation on the Flight Deck: An Application of Synthetic Speech and Visual Digital Displays . . . . . . . . . . . . . . . . . . . . . . . . Nickolas D. Macchiarella, Jason P. Kring, Michael S. Coman, Tom Haritos, and Zoubair Entezari How Data Comm Methods and Multi-dimensional Traffic Displays Influence Pilot Workload under Trajectory Based Operations . . . . . . . . . . Jimmy H. Nguyen, L. Paige Bacon, R. Conrad Rorie, Meghann Herron, Kim-Phuong L. Vu, Thomas Z. Strybel, and Vernol Battiste Macroergonomics in Air Traffic Control – The Approach of a New System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Luiza Helena Boueri Rebello A Preliminary Investigation of Training Order for Introducing NextGen Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R. Conrad Rorie, Ariana Kiken, Corey Morgan, Sabrina Billinghurst, Gregory Morales, Kevin Monk, Kim-Phuong L. Vu, Thomas Strybel, and Vernol Battiste Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
493
500
507
516
526
535
Table of Contents – Part I
Part I: Design and Development Methods and Tools Visual Programming of Location-Based Services . . . . . . . . . . . . . . . . . . . . . Antonio Bottaro, Enrico Marino, Franco Milicchio, Alberto Paoluzzi, Maurizio Rosina, and Federico Spini
3
Connecting Envisioning Process to User Interface Design Process . . . . . . Naotake Hirasawa, Shinya Ogata, and Kiko Yamada-Kawai
13
Learner-Centered Methodology for Designing and Developing Multimedia Simulation for Biology Education . . . . . . . . . . . . . . . . . . . . . . . Chi-Cheng Lin, Mark Bergland, and Karen Klyczek
20
User Interface and Information Management of Scenarios . . . . . . . . . . . . . Robert Louden, Matt Fontaine, Glenn A. Martin, Jason Daly, and Sae Schatz
30
Giving UI Developers the Power of UI Design Patterns . . . . . . . . . . . . . . . Jocelyn Richard, Jean-Marc Robert, Sébastien Malo, and Joël Migneault
40
The Cultural Integration of Knowledge Management into Interactive Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Javed Anjum Sheikh, Bob Fields, and Elke Duncker
48
Supporting of Requirements Elicitation for Ensuring Services of Information Systems Used for Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuki Terawaki
58
Visualizing Programs on Different Levels of Abstractions . . . . . . . . . . . . . . Jo-Han Wu and Jan Stelovsky
66
Measurement and Evaluation in Service Engineering . . . . . . . . . . . . . . . . . . Sakae Yamamoto, Miki Hiroyuki, and Hirohiko Mori
76
A Human Interface Toolkit for Developing Operation Support System of Complex Industrial Systems with IVI-COM Technology . . . . . . . . . . . . Yangping Zhou, Yujie Dong, Xiaojing Huang, and Hidekazu Yoshikawa
82
Part II: Information and User Interfaces Design A Conceptual Model of the Axiomatic Usability Evaluation Method . . . . Yinni Guo, Robert W. Proctor, and Gavriel Salvendy
93
Study on Evaluation of Kawaii Colors Using Visual Analog Scale . . . . . . . Tsuyoshi Komatsu and Michiko Ohkura
103
Representation of Decision Making Process in Music Composition Based on Hypernetwork Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tetsuya Maeshiro, Shin-ichi Nakayama, and Midori Maeshiro
109
Some Issues toward Creating Human-Centric Services . . . . . . . . . . . . . . . . Hirohiko Mori
118
A User-Centric Metadata Creation Tool for Preserving the Nation’s Ecological Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fatma Nasoz, Renee C. Bryce, Craig J. Palmer, and David J. Rugg
122
Comparison between Mathematical Complexity and Human Feeling . . . . Masashi Okubo, Akiya Togo, and Shogo Takahashi
132
How Do Real or Virtual Agent’s Body and Instructions Contribute to Task Achievement? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yugo Takeuchi and Hisashi Naito
142
Interaction Mediate Agent Based on User Interruptibility Estimation . . . Takahiro Tanaka and Kinya Fujita
152
Ontological Approach to Aesthetic Feelings: A Multilingual Case of Cutism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Akifumi Tokosumi and Fumina Teng
161
Constructing Phylogenetic Trees Based on Intra-group Analysis of Human Mitochondrial DNA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ivan Vogel, František Zedek, and Pavel Očenášek
165
A Qualitative Study of Similarity Measures in Event-Based Data . . . . . . . Katerina Vrotsou and Camilla Forsell
170
Feasibility Study of Predictive Human Performance Modeling Technique in Field Activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Naomi Yano, Toshiyuki Asahi, Shin’ichi Fukuzumi, and Bonnie E. John Surprise Generator for Virtual KANSEI Based on Human Surprise Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Masaki Zenkoyoh and Ken Tomiyama
180
190
Part III: Visualisation Techniques and Applications Explicit Modeling and Visualization of Imperfect Information in the Context of Decision Support for Tsunami Early Warning in Indonesia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Monika Friedemann, Ulrich Raape, Sven Tessmann, Thorsten Schoeckel, and Christian Strobl
201
Kansei Stroll Map: Walking around a City Using Visualized Impressions of Streetscapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuichiro Kinoshita, Satoshi Tsukanaka, and Takumi Nakama
211
Multivariate Data Visualization: A Review from the Perception Aspect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yan Liu
221
Methods for Service Sciences from Visualization Points . . . . . . . . . . . . . . . Hiroyuki Miki, Naotsune Hosono, and Sakae Yamamoto
231
Interacting with Semantics: A User-Centered Visualization Adaptation Based on Semantics Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kawa Nazemi, Matthias Breyer, Jeanette Forster, Dirk Burkhardt, and Arjan Kuijper
239
Riding the Technology Wave: Effective Dashboard Data Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lisa Pappas and Lisa Whitman
249
Peer-to-Peer File Sharing Communication Detection Using Spherical SOM Visualization for Network Management . . . . . . . . . . . . . . . . . . . . . . . . Satoshi Togawa, Kazuhide Kanenishi, and Yoneo Yano
259
Visualizing Stakeholder Concerns with Anchored Map . . . . . . . . . . . . . . . . Takanori Ugai
268
VICPAM: A Visualization Tool for Examining Interaction Data in Multiple Display Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Roshanak Zilouchian Moghaddam and Brian Bailey
278
Part IV: Security and Privacy Privacy Concern in Ubiquitous Society and Research on Consumer Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yumi Asahi
291
Modelling Social Cognitive Theory to Explain Software Piracy Intention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ameetha Garbharran and Andrew Thatcher
301
A Practical Analysis of Smartphone Security . . . . . . . . . . . . . . . . . . . . . . . . Woongryul Jeon, Jeeyeon Kim, and Youngsook Lee
311
Cryptanalysis to a Remote User Authentication Scheme Using Smart Cards for Multi-server Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Youngsook Lee, Jeeyeon Kim, and Dongho Won
321
Exploring Informational Privacy Perceptions in the Context of Online Social Networks: A Phenomenology Perspective . . . . . . . . . . . . . . . . . . . . . . Emma Nuraihan Mior Ibrahim
330
Server-Aided Password-Authenticated Key Exchange: From 3-Party to Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Junghyun Nam, Juryon Paik, Jeeyeon Kim, Youngsook Lee, and Dongho Won
339
Does Privacy Information Influence Users’ Online Purchasing Behavior? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jimmy H. Nguyen and Kim-Phuong L. Vu
349
Analysis of Authentication Protocols with Scyther: Case Study . . . . . . . . Očenášek Pavel
359
Routing Functionality in the Logic Approach for Authentication Protocol Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ocenasek Pavel and Hranac Jakub
366
An Approach for Security Protocol Design Based on Zero-Knowledge Primitives Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Očenášek Pavel
374
Part V: Touch and Gesture Interfaces Effects of Joint Acceleration on Rod’s Length Perception by Dynamic Touch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Takafumi Asao, Yuta Kumazaki, and Kentaro Kotani
381
ERACLE: Electromyography System for Gesture Interaction . . . . . . . . . . Paolo Belluco, Monica Bordegoni, and Umberto Cugini
391
Development of Tactile and Haptic Systems for U.S. Infantry Navigation and Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Linda R. Elliott, Elmar T. Schmeisser, and Elizabeth S. Redden
399
Utilization of Shadow Media - Supporting Co-creation of Bodily Expression Activity in a Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Koji Iida, Shiroh Itai, Hiroko Nishi, and Yoshiyuki Miwa
408
Virtual Interaction between Human and Fabric . . . . . . . . . . . . . . . . . . . . . . Shigeru Inui, Akihiro Yoneyama, and Yosuke Horiba
418
Hand Gesture-Based Manipulation of a Personalized Avatar Robot in Remote Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Teruaki Ito
425
Vector Keyboard for Android Platform-Based Devices . . . . . . . . . . . . . . . . Martin Klima and Pavel Slavik
435
Study on Haptic Interaction with Digital Map on Mobile Device . . . . . . . Daiji Kobayashi, Yoshitaka Asami, and Sakae Yamamoto
443
Characteristics of Information Transmission Rates Using Noncontact Tactile Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kentaro Kotani, Masayoshi Hayashi, Nobuki Kido, and Takafumi Asao Multimodal Threat Cueing in Simulated Combat Vehicle with Tactile Information Switching between Threat and Waypoint Indication . . . . . . . Patrik Lif, Per-Anders Oskarsson, Björn Lindahl, Johan Hedström, and Jonathan Svensson
450
454
Design of Vibration Alert Interface Based on Tactile Adaptation Model to Vibration Stimulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuki Mori, Takayuki Tanaka, and Shun’ichi Kaneko
462
Applicability of Touch Sense Controllers Using Warm and Cold Sensations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Miwa Nakanishi and Sakae Yamamoto
470
Information Processing for Constructing Tactile Perception of Motion: A MEG Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ayumi Nasu, Kentaro Kotani, Takafumi Asao, and Seiji Nakagawa
478
A Study on Selection Ability in the 3D Space by the Finger . . . . . . . . . . . Makoto Oka, Yutaro Ooba, Hidetaka Kuriiwa, Ryuta Yamada, and Hirohiko Mori Characteristics of Comfortable Sheet Switches on Control Panels of Electrical Appliances: Comparison Using Older and Younger Users . . . . . Yasuhiro Tanaka, Yuka Yamazaki, Masahiko Sakata, and Miwa Nakanishi Support for Generation of Sympathetic Embodied Awareness: Measurement of Hand Contact Improvisation under Load Fluctuation Stress . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Takabumi Watanabe, Yoshiyuki Miwa, Go Naito, Norikazu Matsushima, and Hiroko Nishi
488
498
508
Part VI: Adaptation and Personalisation Different People Different Styles: Impact of Personality Style in Web Sites Credibility Judgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rahayu Ahmad, Jieyu Wang, Karoly Hercegfi, and Anita Komlodi A Comprehensive Reference Model for Personalized Recommender Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Matthias Breyer, Kawa Nazemi, Christian Stab, Dirk Burkhardt, and Arjan Kuijper Dynamic Interface Reconfiguration Based on Different Ontological Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Elisa Chiabrando, Roberto Furnari, Pierluigi Grillo, Silvia Likavec, and Ilaria Lombardi
521
528
538
Analysis of Content Filtering Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Woongryul Jeon, Youngsook Lee, and Dongho Won
548
A Smart Movie Recommendation System . . . . . . . . . . . . . . . . . . . . . . . . . . . Sang-Ki Ko, Sang-Min Choi, Hae-Sung Eom, Jeong-Won Cha, Hyunchul Cho, Laehyum Kim, and Yo-Sub Han
558
Interactive Personalization of Ambient Assisted Living Environments . . . Alexander Marinc, Carsten Stocklöw, Andreas Braun, Carsten Limberger, Cristian Hofmann, and Arjan Kuijper
567
Development of a System for Proactive Information Service . . . . . . . . . . . Myon-Woong Park, Soo-Hong Lee, Young-Tae Sohn, Jae Kwan Kim, Ilju Bae, and Jae-Kwon Lim
577
My Personal User Interface: A Semantic User-Centric Approach to Manage and Share User Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Till Plumbaum, Katja Schulz, Martin Kurze, and Sahin Albayrak
585
Part VII: Measuring and Recognising Human Behaviour Effect of Menstrual Distress on Task Performance . . . . . . . . . . . . . . . . . . . . Keiko Kasamatsu, Mi Kyong Park, and Seiko Taki A Study of Attention Control by Using Eye Communication with an Anthropomorphic Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tatsuya Mita, Ryo Wada, Noriaki Kuwahara, and Kazunari Morimoto Auditory Feature Parameters for Music Based on Human Auditory Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Masashi Murakami and Toshikazu Kato
597
603
612
Construction of a Model for Discriminating between Electroencephalographic Patterns at the Time of Incorrect Inputs Based on Sensitivity Spectrum Analysis . . . . . . . . . . Raita Ohori, Daiki Shinkai, Yoshimitsu Nagai, and Syohei Ishizu Basic Study of Analysis of Human Brain Activities during Car Driving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Noboru Takahashi, Shunji Shimizu, Yukihiro Hirata, Hiroyuki Nara, Hiroaki Inoue, Nobuhide Hirai, Senichiro Kikuchi, Eiju Watanabe, and Satoshi Kato Bereitschaftspotential Modeling by DBNM and Its Application to BCI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shino Takata, Toshimasa Yamazaki, Maiko Sakamoto, Takahiro Yamanoi, and Kenichi Kamijo Emotional Human-Machine Interaction: Cues from Facial Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tessa-Karina Tews, Michael Oehl, Felix W. Siebert, Rainer Höger, and Helmut Faasch Development of an Eye-Tracking Pen Display for Analyzing Embodied Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michiya Yamamoto, Hiroshi Sato, Keisuke Yoshida, Takashi Nagamatsu, and Tomio Watanabe
618
627
636
641
651
Care Giving System Based on Consciousness Recognition . . . . . . . . . . . . . Noriko Yokoyama, Tomoyuki Yamaguchi, and Shuji Hashimoto
659
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
669
Developing Optimum Interface Design for On-Screen Chinese Proofreading Tasks Alan H.S. Chan, Joey C.Y. So, and Steve N.H. Tsang Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Kowloon Tong, Hong Kong
[email protected]
Abstract. This paper includes a review of some related empirical studies concerning display factors that may contribute to on-screen Chinese proofreading performance. The effects of typeface, font size, number of text lines, text direction, and copy placement on speed, accuracy, and subjective preferences in past reading tasks are discussed. This paper, in particular, introduces the development of a Chinese Proofreading System for proofreading experiments and delineates some research ideas for identifying the optimum interface design settings for on-screen Chinese proofreading tasks. The results of this research work are expected to provide useful design recommendations to assist in determining the display factor settings and text display layout that would improve work performance and satisfaction in Chinese comparison proofreading tasks. Keywords: Proofreading, Chinese Reading, Chinese Proofreading System, Usability Evaluation, Subjective preference.
1 Introduction
Proofreading is an important task for reducing input and printing errors to ensure accurate information presentation to readers. The two main ways of proofreading are comparison proofreading and noncomparison proofreading. Comparison proofreading is a critical process in electronic book production; it involves carefully reading the dead copy (the original copy) against a live copy and marking on the live copy any deviations from specifications (e.g., typeface, page and column breaks), misspellings, nonstandard grammar, and other errors in language. Noncomparison proofreading is usually done when there is no true dead copy, or when the dead copy is referred to only if the proofreader sees something puzzling. An overview of the display factors of typeface, font size, number of lines of text, text direction, and copy placement that are likely to affect speed, accuracy, and subjective preferences in proofreading tasks is presented below.
1.1 Typeface
The display factors of typeface and font size have been shown to affect text readability and reading task efficiency on computer screens [1]–[3]. Research studies on Chinese characters with regard to legibility assessment [4], reading performance [5], and character identification tasks [6] have been reported. Chi et al. [4] found that Hei characters are the most legible, followed by Ming, Kai, and Li characters. In reading a variety of message signs, Lai [5] found that participants achieved higher accuracy for Hei and Ming styles than for Kai style. Given the diverse results concerning the effects of font type on different perceptual tasks, it is of great interest to attempt to determine the best font type for the proofreading process.
1.2 Font Size
In measuring readability on computer screens as a function of character size, Snyder and Maddox [7] examined the effects of dot element size and character size and showed that smaller characters produced faster English reading speed than larger ones. However, in the same study, larger characters were found to produce faster search times in a search task. Larger characters are generally considered more readable than smaller ones, but it seems likely that there may be an optimum size such that changes in either direction (smaller or larger) will reduce reading performance [1, 8]. For reading or proofreading Chinese characters, the effects of font size are still largely unknown, though it has been shown that the legibility of Chinese characters is enhanced by increasing font size [9]. Therefore, on the basis of previous studies, it is expected that a larger font size should be preferred and result in better Chinese proofreading performance.
1.3 Number of Lines of Text, Line Spacing, and Line Length
Past research examining the number of lines of text in paged view has produced varied results in reading tasks [10]. Ling and van Schaik [11] investigated the effects of text presentation on information search tasks and found that wider line spacing resulted in better accuracy and faster search times. Previous work on line length was conducted for English reading, and the results revealed that line spacing and number of text lines are two opposing factors for determining the optimum line length for reading [12]–[14]. However, the optimum line length for reading Chinese is not yet known. There is thus a need to find out the optimum number of lines, line spacing, and line length for presenting a large amount of material in windowed view in Chinese proofreading tasks.
1.4 Text Direction
Chinese text has been traditionally written and printed vertically from top to bottom of a page, with columns moving from right to left. However, since the early 20th
century this traditional convention has been largely westernized and, in much printed material today, Chinese characters appear horizontally (usually starting in the top left corner). Sun et al. [15] found that horizontal Chinese characters were read twice as fast as vertical ones. Shih and Goonetilleke [16] revealed that for the Chinese population, a one-row word menu in Chinese was more effective than a one-column word menu in terms of search time and accuracy. Also, in a visual search study, Goonetilleke et al. [17] found that Chinese people used predominantly horizontal search patterns whilst searching for a Chinese target character in a background of distracting characters. Thus, the horizontal reading direction is expected to be preferred over the vertical direction and to result in better Chinese proofreading performance. However, it is believed that in a proofreading task, the text direction factor may interact with the copy placement factor.
1.5 Copy Placement
Copy placement refers to the way in which the live and dead copies are arranged. Anderson [18] recommended that the live copy always be placed nearest to the proofreader's pen position. For the common left-right copy placement setting, the dead and live copies are under the left and right hands, respectively, for right-handers. An above-below setting can also be used; a common practice is to place the dead copy at the top and the live copy at the bottom [19], such that the live copy is again close to the proofreader's hand. For Chinese characters particularly, a study of the effects of copy placement and its interaction with text direction on proofreading is necessary to help designers create better interfaces for proofreading tasks on computer screens.
2 Methodology
2.1 Experimental Design
This study will examine ways of improving proofreading performance and proofreader subjective preferences in two phases of research work. In view of the possible influence of the above-mentioned display factors on proofreading Chinese, the effects of typeface, font size, text direction, and copy placement on speed, accuracy, and subjective preferences in proofreading will be examined in the first phase of this research. While the first phase is aimed at finding the optimum values or settings for display factors to present printed material on a page, the second phase will be directed at investigating how a large amount of text can be displayed and read effectively using scrolling in the windowed view method when the material cannot be accommodated within a single viewing page. The proofreading time and performance (numbers of hits, misses, false alarms, and other errors) will be measured and analyzed. To evaluate subjective preferences, a 10-point Likert scale will be used to obtain participants' opinions on the attributes of proofreading comfort, ease, and fatigue [3].
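To make the planned performance measures concrete, a small sketch of how per-passage scores could be computed from logged responses is given below. This is a hypothetical illustration in Python, not the authors' Visual Basic implementation; the Mark structure, the exact-position matching rule, and all names are assumptions introduced here.

```python
# Hypothetical scoring sketch (not the authors' implementation):
# a participant's marks are compared against the errors embedded
# in the live copy to obtain hits, misses, and false alarms.
from dataclasses import dataclass

@dataclass(frozen=True)   # frozen -> hashable, so Marks can be stored in sets
class Mark:
    position: int         # character index in the live copy
    error_type: str       # "extra word", "missing word", "wrong word",
                          # "wrong order", or "extra spacing"

def score_passage(embedded: set, responses: set) -> dict:
    """Classify a participant's marks against the embedded errors."""
    hits = embedded & responses           # embedded errors correctly marked
    misses = embedded - responses         # embedded errors left unmarked
    false_alarms = responses - embedded   # marks where no error was embedded
    return {
        "hits": len(hits),
        "misses": len(misses),
        "false_alarms": len(false_alarms),
        "hit_rate": len(hits) / len(embedded) if embedded else 0.0,
    }
```

Matching marks to embedded errors by exact position is only one design choice; allowing a small tolerance window around each embedded error would be a natural alternative.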
2.2 Participants
Native Chinese university students with more than five years of computer experience will be invited to participate in each experiment. They will all be right-handed, both in operating a computer mouse and in handwriting in daily life. Participants will be tested with an Orthorator to ensure normal or corrected-to-normal vision.
2.3 Materials
A number of unrelated Chinese prose passages will be extracted from past examination papers of the Chinese Language subject of the Hong Kong Certificate of Education Examination. Mean passage length, including punctuation marks, will be around 550 characters. An average of five errors will be embedded per passage of a live copy. The errors will include extra word, missing word, wrong word, wrong order, and extra spacing. Both dead and live copies will be presented in a positive polarity condition with black text on a white background.
2.4 Apparatus and Software
A personal computer (Intel Xeon E5506, 2.13 GHz) with a 24-inch liquid crystal display monitor and an application program prepared with Microsoft Visual Studio 2008 Professional will be used for stimulus presentation and response capture. At this stage of the study, a Visual Basic program (2008 Express Edition) was used to develop the interface of the Chinese Proofreading System. An experimenter can adjust the typeface, font size, line length, line spacing, text direction, and copy placement according to specific test conditions. The system operation flow is shown in Fig. 1. An illustration of the proofreading interface is shown in Fig. 2. The proofreading system is embedded with five different check functions, which correspond to the errors of extra word, missing word, wrong word, wrong order, and extra spacing. Each check function is assigned a different color so that the error type it marks stands out. The dead copy and live copy are shown on the screen at the same time. The page number of the passage is shown in the top right-hand corner to indicate the progress of the test. The Next and Undo buttons are for proceeding to the next passage and returning to the last step, respectively. Proofreading time and hit accuracy for each passage are calculated by default formulas, and these data will be stored for analysis. For details of the system operation, readers can refer to Section 2.5. Generally, the Chinese Proofreading System contains three main components: User Maintenance, Participant Maintenance, and Passage Maintenance. User Maintenance allows the experimenter to assign experimental conditions to the participants. Participant Maintenance allows participants to input their personal information; their test performance is recorded. Passage Maintenance allows the experimenter to add or edit the passages.
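As a rough illustration of the check functions and per-passage logging just described, the fragment below models the five error types, their highlight colors, and a trial record. The paper does not specify the actual colors or the default formulas, so the assignments and the hit-accuracy formula here are assumptions, sketched in Python rather than the system's Visual Basic.

```python
# Illustrative model of the five check functions; the color choices
# and the accuracy formula are assumptions, not the system's actual ones.
import time

ERROR_COLORS = {
    "extra word":    "red",
    "missing word":  "blue",
    "wrong word":    "green",
    "wrong order":   "orange",
    "extra spacing": "purple",
}

class PassageTrial:
    """Logs marks and timing for one passage under one display condition."""

    def __init__(self, passage_id, n_embedded_errors):
        self.passage_id = passage_id
        self.n_embedded = n_embedded_errors   # about five per passage (Sect. 2.3)
        self.marks = []                       # (char_index, error_type) pairs
        self._start = time.monotonic()

    def mark(self, char_index, error_type):
        # Corresponds to clicking one of the five error-type buttons.
        if error_type not in ERROR_COLORS:
            raise ValueError(f"unknown error type: {error_type}")
        self.marks.append((char_index, error_type))

    def undo(self):
        # Mirrors the Undo button: discard the most recent mark.
        if self.marks:
            self.marks.pop()

    def finish(self, n_hits):
        # One plausible default formula: hits over embedded errors.
        return {
            "passage": self.passage_id,
            "time_s": time.monotonic() - self._start,
            "hit_accuracy": n_hits / self.n_embedded,
        }
```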
Fig. 1. System Operation Flow (start the program, enter personal information, test trial, then trial followed by subjective evaluation, repeated until the end of the passages)
Fig. 2. An illustration of a developed proofreading interface
2.5 Chinese Proofreading System

The purpose of this section is to describe the proposed Chinese Proofreading System that will be used for the planned proofreading experiments. During proofreading, participants are allowed to scroll or navigate the two documents independently or jointly by sliding the scroll wheel in the corresponding document frame areas. An illustration of the proofreading interface is shown in Fig. 2. Participants will be asked to compare the live copy against the dead copy and to highlight the errors in the live copy as quickly and accurately as they can by using the computer mouse to click the corresponding error type button shown at the bottom of the screen. The participants will then be asked to evaluate subjective preferences for that condition using a 10-point Likert scale on attributes of comfort (1 = least comfortable, 10 = most comfortable), ease (1 = least easy, 10 = most easy), and fatigue (1 = least fatiguing, 10 = most fatiguing). This process will be repeated until all experimental conditions have been tested. To avoid mental or visual fatigue, a two-minute rest will be given after a certain number of passages are proofread. Participants will take around one to two hours to complete the experiment.
3 Summary

In view of the likely influence of the display factors discussed above on Chinese proofreading, the effects of typeface, font size, text direction, and copy placement on
speed, accuracy, and subjective preferences in proofreading materials displayed on a single page will be examined in the first experiment of this study. A natural extension of this work is to investigate how a large amount of text can be effectively displayed on screen so as to maximize the performance and satisfaction of proofreaders when it is not possible to accommodate and present the material within a page of view. In each stage, the design details of an experiment will be carefully considered and confirmed. The experiments will then be conducted to collect participant task times, accuracy, and preference ratings for the different test conditions. The experimental data will be analyzed, and the results will be useful for formulating design recommendations and determining the display factor and screen design settings that will improve comparison proofreading performance for Chinese.

Acknowledgments. The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 110410). The authors thank SM Wong for the technical support provided.
References

1. Bernard, M.L., Chaparro, B.S., Mills, M.M., Halcomb, C.G.: Comparing the effects of text size and format on the readability of computer-displayed Times New Roman and Arial text. International Journal of Human-Computer Studies 59, 823–835 (2003)
2. Wang, A.H., Chen, C.H.: Effects of screen type, Chinese typography, text/background color combination, speed, and jump length for VDT leading display on users' reading performance. International Journal of Industrial Ergonomics 31, 249–261 (2003)
3. Chan, A.H.S., Lee, P.S.K.: Effect of Display Factors on Chinese Reading Times, Comprehension Scores and Preferences. Behaviour & Information Technology 24, 81–91 (2005)
4. Chi, C.F., Cai, D., You, M.: Applying image descriptors to the assessment of legibility in Chinese characters. Ergonomics 46, 825–841 (2003)
5. Lai, C.J.: An ergonomic study of Chinese font and color display on variable message signs. Journal of the Chinese Institute of Industrial Engineers 25, 306–331 (2008)
6. Yau, Y.J., Chao, C.J., Hwang, S.L.: Optimization of Chinese interface design in motion environments. Displays 29, 308–315 (2008)
7. Snyder, H.L., Maddox, M.E.: Optimal element size-shape spacing combinations for a 5×7 matrix in information transfer from computer-generated dot-matrix displays. Tech. Rep. HFL-78-3, ARO-78-1 (1978)
8. Mills, C.B., Weldon, L.J.: Reading text from computer screens. Computing Surveys 19, 329–358 (1987)
9. Cai, D., Chi, C.F., You, M.: The legibility threshold of Chinese characters in three-type styles. International Journal of Industrial Ergonomics 27, 9–17 (2001)
10. Dyson, M.C.: How physical text layout affects reading from screen. Behaviour & Information Technology 23, 377–393 (2004)
11. Ling, J., van Schaik, P.: The influence of line spacing and text alignment on visual search of web pages. Displays 28, 60–70 (2007)
12. Duchnicky, R.L., Kolers, P.A.: Readability of text scrolled on visual display terminals as a function of window size. Human Factors 25, 683–692 (1983)
13. Dyson, M.C., Kipping, G.J.: The effects of line length and method of movement on patterns of reading from screen. Visible Language 32, 150–181 (1998)
14. Rayner, K., Reichle, E.D., Pollatsek, A.: Eye movement control in reading: an overview and model. In: Underwood, G. (ed.) Eye Guidance in Reading and Scene Perception, pp. 243–268. Elsevier, Oxford (1998)
15. Sun, F.C., Morita, M., Stark, L.W.: Comparative patterns of reading eye movement in Chinese and English. Perception & Psychophysics 37, 502–506 (1985)
16. Shih, H.M., Goonetilleke, R.S.: Effectiveness of menu orientation in Chinese. Human Factors 40, 569–576 (1998)
17. Goonetilleke, R.S., Lau, W.C., Shih, H.M.: Visual search strategies and eye movements when searching Chinese character screens. International Journal of Human-Computer Studies 57, 447–468 (2002)
18. Anderson, L.K.: Handbook for Proofreading. NTC Business Books, Lincolnwood (1990)
19. Newby, G.B., Franks, C.: Distributed proofreading. In: Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 361–363. IEEE Computer Society, Washington (2003)
“Life Portal”: An Information Access Scheme Based on Life Logs Shin-ichiro Eitoku1, Manabu Motegi1, Rika Mochizuki1, Takashi Yagi1, Shin-yo Muto1, and Masanobu Abe2 1 Nippon Telegraph and Telephone Corporation, 1-1 Hikari-no-oka, Yokosuka-Shi, Kanagawa, 239-0847 Japan {eitoku.shinichiro,motegi.manabu,mochizuki.rika, yagi.takashi,muto.shinyo}@lab.ntt.co.jp 2 Okayama University, 1-1-1 Tsushima-naka Kita-ku Okayama-Shi Okayama, 700-8530 Japan
[email protected]
Abstract. In this paper, we propose a life log viewer that gives users new findings from the life logs of daily life and provides seamless integration with external information. We classify life logs into two types: "unintended life logs" are recorded automatically without the user's direct input, while "intended life logs" are recorded purposefully by the user. Based on this classification, we implement a prototype that has two characteristics. First, it can visualize a user's unintended life log from long-term and multi-dimensional points of view. Second, its user interface is designed to visualize the transitions from the analysis results of the unintended life log to event data in the intended life log, and from event data in the intended life log to search results that provide information about the events. We believe that this viewer is a stepping-stone to the "Life Portal," which integrates existing portals with life log analysis to create a stimulus for search initiation.

Keywords: Life log, Visualization, Scheduler, E-mail, GPS.
1 Introduction

Thanks to the rapid growth of the Internet, we can now easily access enormous amounts of information, including niche information that we could not imagine accessing decades ago. Needless to say, search engines play a key role. However, they fail to support the user most effectively in that they do not recognize the user's interests or suggest things that the user might want to know; the user needs to be inspired before acquiring the desire to initiate searches. The miniaturization of mobile terminals with various sensors (e.g., GPS devices and acceleration sensors) has made it possible to continuously collect some kinds of life log over long periods without the user's direct intervention (e.g., position from GPS devices, operations from a remote controller). In the MyLifeBits project, for example, Gemmell et al. proposed a platform to manage the personal life log extracted from many information sources [1].
Our idea is to use recollections as cues to initiate new searches; the result is that all searches start with life log analysis. We call this new search framework the "Life Portal." In this paper, we propose a life log viewer that can extract, for the user, new findings as search cues from the life log of daily activities, and that provides seamless integrated access to external information. For this purpose, we classify life logs into "unintended life logs" and "intended life logs." Unintended life logs are recorded automatically without the user's intervention. Intended life logs are recorded manually for some specific purpose. Based on this classification, we implemented a prototype that has the following characteristics. Point 1: it visualizes the user's unintended life log gathered over the long term from multi-dimensional viewpoints. Point 2: its user interface is designed to visualize transitions from the analysis results of the unintended life log to event data in the intended life log, and from event data in the intended life log to search results that provide information about the events. We conduct a simple experiment and confirm that these features of the prototype can give the user new viewpoints (motivation) to access other information.
2 Classification of Life Logs and Usages of Life Log

Generally speaking, we are happier, feel more satisfaction, or become excited when we find something that we did not expect. From this point of view, in terms of life logs and their usage, we introduce the classification hypothesis shown in Figure 1. There are two types of life log. The intended life log covers blogs, photos, schedules, and so on, all of which are created or intentionally recorded by the user for later use. The unintended life log includes locations obtained by GPS, motions identified by acceleration sensors, actions captured from a remote controller, and so on. These are automatically and continuously recorded over a long period with no direct user intervention. Our idea is that the user is more surprised when information comes from things he/she is less aware of. Also, the user is more surprised when the information obtained is not included in the life logs themselves. Therefore, user surprise increases in the following order: expected usage of the intended life log, unexpected usage of the intended life log, and unexpected usage of the unintended life log. For example, reviewing old photos (intended life log) is an expected usage; because users get no more information than what is included in the old photos, the degree of surprise is small. On the other hand, recommendation using the user's life log is one of the most effective usages at this moment, because the user gets unexpected information from the unintended life log. The recommendation system of Amazon.com, Inc. [2], for example, proposes several items that are related to an item selected by a particular user.
Fig. 1. Value in Life Log Usage (intended records such as blogs, photos taken by the user, and schedules support expected usages such as active recall; unintended records such as location from GPS devices, motion from acceleration sensors, and actions from a remote controller add value through unexpected usages such as recommendation and the information access discussed in this paper)
The recommendation is generated based on the tendencies of other users who bought the selected item, and the tendency is calculated from unintended life log data, i.e., the purchase histories or URL click logs of other users. The user can get information on an item that is not included in his/her life log (unexpected usage), and purchase histories or URL click logs are recorded automatically without the user's intervention (unintended life log). Our conviction is that new findings or ideas inspired by life log (especially unintended life log) observations will invoke new searches, another unexpected usage of the life log.
3 Concept of "Life Portal"

The dotted frame in Figure 2 shows the current information access framework. The motivations for information access include the user's interest or curiosity and are invoked by stimuli from the external world. Users can get information by applying search engines to the Internet or from the feedback to questions posted to a Social Networking Service (SNS). Here, it should be noted that the user is expected to be aware of what he/she wants in the first place; i.e., even if there is a theme that would prove interesting to the user, nothing happens unless the user initiates the search by himself/herself. Another limitation of the framework is that the stimuli that cause the user's interest or curiosity mainly come from the external world. Much information in the external world is recorded in the form of web pages (e.g., news sites and shop sites), which we can access and use easily at any time; human memory, however, is limited, so it is more difficult to access and use one's own memories. Against these limitations, the life log makes it possible to add a new aspect to this framework. First, the life log can provide "personal" events that can invoke new searches, because it records what the user, and only the user, experienced. Therefore, the recall of a forgotten event raises the chance that the user will become curious about the event. Second, life log analysis can provide the user with new viewpoints or tendencies, because the life log can continuously collect data over long periods from several information sources. By observing the data from long-term or multi-dimensional points of view, the user may reach a finding that differs from those he reached at the time. For example, if the user feels that he is busy today, his understanding can be improved by reference to activities over the last few weeks. Again, the new findings can invoke new searches. The integration of all components, shown by the solid frame in Figure 2, yields the "Life Portal." The visualization of unintended life logs and their combination with the current information access framework will create, in effect, a new information source that can invoke other interests or raise the user's curiosity.
Fig. 2. "Life Portal" and the Current Information Access Framework (in the current framework, stimuli from the external world raise interest or curiosity, which drives searching and browsing on portal sites and posting or calling for information on SNS sites; the Life Portal adds the life log, with support techniques such as recommendation and visualization, as a source of awareness, new viewpoints, tendencies, and recalled events)
Simply by recording his/her life logs, the user can find such interests or curiosity and access related information easily.
4 Related Works

Olguin et al. analyzed the frequency of face-to-face communication and the unrecognized behaviors of organizations [3]. They extracted the relationships between the subjects and estimated the subjects' condition. Ushiama et al. studied a method to extract the relationships between life logs collected from different information sources [4]. However, none of these papers discussed approaches to visualizing such life logs. On the other hand, there are some studies on visualizing life logs for recall. Major approaches to the visualization of different kinds of life log include timelines and mapping. Rekimoto proposed an approach that uses timelines to visualize the files in a PC [5]. Aizawa et al. studied the interface needed to access a life log holding data of outdoor actions [6]. De Silva et al. proposed a method to visualize and search the life log collected from peoples' activities in a house [7]. The method proposed by Ringel et al. shortens the time taken to search a life log by showing multiple search results on the same timeline [8]. Kim et al. use two axes for displaying a life log: a timeline and a map [9]. According to the result of the user's operation on one axis, the life logs displayed on the other axis are automatically changed. Eagle et al. collected life logs of 100 subjects for 9 months by using the sensors on their cellular phones (GPS, Bluetooth, and so on) and visualized the data on a map [10]. As another approach to helping the user understand past events easily, methods that use the comic style have been proposed [11][12]. However, none of these papers discussed how the user could be given new viewpoints, which is one of the important functions of the "Life Portal." In terms of combining life logs for visualization, Ueoka et al. proposed a method in which the kinds of life log are changed dynamically to match the time granularity [13]. Kapler et al. proposed a method to visualize activities on a timeline and a map at the same time [14]. However, the main usage of these methods is recall, and they do not support integration with current information access frameworks, such as having a function to access other sources of information.
5 Prototype System of "Life Portal"

The prototype system was designed to support the following functions:
- Visualization to find new information from unintended life logs
- Seamless integration with the current information access framework

The visualization of the unintended life log induces new viewpoints. This raises new interests or curiosity that may trigger the access of other information. By linking the new viewpoints, those the user already recognizes, and external information, the user can access and browse external information through the Life Portal.
As a basic function for supporting access to information motivated by the stimulus induced by the visualization, we introduce a scheduler. Note that the user enters the 4W attributes (who, what, when, and where) by himself/herself. Although scheduler items are not always actually entered, the information is still a good reference for knowing what the user was interested in. The life log sources used by our prototype are the information registered in the scheduler, e-mails sent and received, and GPS data. The following sections introduce the "visualization of the unintended life log" and "access to information" as provided by this prototype.

5.1 Panoramic View: Visualization to Find New Information from Unintended Life Log

The first main function is the "Panoramic View"; its aim is to show the frequencies with which the user contacted each person in each period. The frequencies of contact are calculated from the number of e-mails sent and received and the number of scheduled meetings or contacts. Visualization is implemented as color density (Figure 3): stronger colors mean higher contact frequencies. The vertical axis plots the timeline, and the horizontal axis plots the names of the people contacted by the user. In this prototype, frequency is calculated by the following equation (1).
$f[i] = u[i] / \max_{j=0,\ldots,n} u[j]$ (1)

where i and j are the IDs allocated to each person the user contacted, f[i] is the contact frequency concerning person i for a certain period, and u[i] is the sum of the number of schedule entries and e-mails sent and received in that period concerning the person whose ID is i.
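Read concretely, equation (1) normalizes each person's contact count by the most-contacted person in the same period. The short sketch below illustrates this; it is our own illustration (names included), not code from the prototype.

```python
# Illustration of equation (1): per-period contact frequency, normalized by
# the most-contacted person in that period.

def contact_frequencies(counts):
    """counts: dict person_id -> u[i], the number of schedule entries plus
    e-mails sent/received involving that person in the period."""
    peak = max(counts.values(), default=0)
    if peak == 0:
        return {i: 0.0 for i in counts}
    return {i: u / peak for i, u in counts.items()}  # f[i] = u[i] / max_j u[j]

week_counts = {"Mr. Q": 14, "Ms. F": 7, "Mr. M": 3}
print(contact_frequencies(week_counts))
# {'Mr. Q': 1.0, 'Ms. F': 0.5, 'Mr. M': 0.21428571428571427}
```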
The first characteristic is to show long-term trends and to permit comparisons among people at the same time. Both attributes are important for finding new viewpoints. For example, a manager can find any imbalance in his business communication by comparing the contact frequencies of his subordinates. That is, we visualize these frequencies on two axes, time and people, not just a timeline. The second characteristic is that data calculated at different granularities are displayed at the same time. For example, even though a user got 100 e-mails from a certain person, receiving 100 e-mails on just one day in a month has a quite different meaning from receiving 3 or 4 per day for a month. Therefore, we consider it important to display multiple levels of data at the same time.

5.2 Scheduler View: Seamless Integration with Current Information Access Framework

The second main function is the "Scheduler View"; its aim is seamless combination with the current information access framework. This is achieved by using the information in the user's scheduler (Figure 4 (A)). The characteristic is that the user can access related information and browse it in the "Scheduler View," since the schedule information is automatically linked to the visit location extracted from GPS data.
Fig. 3. Overview of "Panoramic View" (period 27/02/2009–22/02/2010; the vertical axis is the timeline and the horizontal axis lists the contacted persons' names; blue cells show frequencies calculated per week and red cells show frequencies calculated per month, with stronger color indicating higher frequency)
This prototype provides two kinds of information through the "Scheduler View," according to the date of the schedule information:
- the actual visited place and time corresponding to the selected schedule information (an example of another kind of the user's life log);
- information on shops near the place scheduled to be visited (an example of information other than the user's life log).

When the user selects the schedule information of a past event, the actual visited addresses and times are displayed on a map (Figure 4 (B)). There is sometimes a difference, in terms of time, between the information entered in the scheduler and the real action. Also, it is sometimes difficult to identify the address from just the information in the scheduler, because the place name sometimes consists of just a part of an address (e.g., "Tokyo," "Osaka") or an alternate term (e.g., "Office") from which it is difficult to identify the address of the visited place (e.g., "Yokosuka, Kanagawa, Japan"). Therefore, this prototype determines the visited place corresponding to the selected schedule information using the time attribute. First, the system extracts visited places and times from GPS data using the method proposed in [15], and calculates the similarity defined by equation (2) for each extracted visited place. We assume that the visited place with the highest similarity s(u, v, y, z) to the real action corresponds to the schedule information. The resulting matched pairs (e.g., place name entered in the scheduler: "Office" --- corresponding address: "Yokosuka, Kanagawa, Japan") are stored in a database, which is used for estimating the place at which a future event will occur (mentioned below).
$s(u, v, y, z) = \exp\left(-\left(\frac{z-y}{v-u} + \frac{v-u}{z-y} + \frac{\left|(v+u)-(z+y)\right|}{2(v-u)}\right)\right)$ (2)

where u is the planned start time registered in the scheduler, v is the planned end time registered in the scheduler, y is the start time the user stayed in the place, and z is the end time the user stayed in the place.
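The sketch below shows how equation (2) might be applied to pick the GPS-extracted visit that best matches a schedule entry. Times are hours since midnight for simplicity; this representation, the helper names, and the absolute value in the midpoint term (our reading of the damaged equation in the source) are assumptions, not details confirmed by the paper.

```python
import math

# Illustrative matching of a schedule entry [u, v] to GPS-extracted visits
# [y, z]. Only the ranking by s(u, v, y, z) matters for the match.

def similarity(u, v, y, z):
    duration_mismatch = (z - y) / (v - u) + (v - u) / (z - y)  # >= 2; minimal when durations agree
    midpoint_offset = abs((v + u) - (z + y)) / (2 * (v - u))   # shift between interval centers
    return math.exp(-(duration_mismatch + midpoint_offset))

def best_visit(planned, visits):
    """planned: (u, v); visits: iterable of (y, z, address) tuples."""
    u, v = planned
    return max(visits, key=lambda w: similarity(u, v, w[0], w[1]))

visits = [(9.0, 10.5, "Shibuya, Tokyo"), (14.5, 17.0, "Musashino, Tokyo")]
print(best_visit((14.0, 17.0), visits))  # -> (14.5, 17.0, 'Musashino, Tokyo')
```

The matched (place name, address) pair would then be stored, as the text describes, so that a future "Office" entry can be resolved to its real address.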
When the user selects the information about a future event entered in the scheduler, the system shows, on a map, information on the shops near the place where the user will be staying (Figure 4 (C)); this is an example of information that supports the user's future action. The system uses the place attribute (address) of the selected schedule entry in the database to conduct a search. In the example mentioned above, when the user registers schedule information whose place name is "Office," the system presumes that the scheduled event will be conducted at "Yokosuka, Kanagawa, Japan." The prototype then displays shop information near that address.
Fig. 4. Overview of "Scheduler View" ((A) overview: information registered in the scheduler, i.e., date (when), summary (what), attendee (whom), and place (where); (B) for a past event: the visited place and duration extracted from GPS data are shown on a map; (C) for a future event: shop names and addresses near the place the user will stay are shown on a map)
5.3 Example of Usage Flow

Figure 5 shows an example of the usage flow. The user browses the frequency of contact in the "Panoramic View" in Figure 5 (i). When the user clicks the contact frequency for a certain person in a certain period (Figure 5 (A)), the frequency in the selected term is displayed as a numerical value (Figure 5 (B)); its range is from 0.0 to 1.0, where 1.0 is the highest frequency. By clicking the button in Figure 5 (C), the prototype displays a summary of the schedule information for the selected term. When the user selects a certain summary of schedule information (Figure 5 (D)), details of the selected item (date, summary, attendee, and place information) are shown (Figure 5 (iv)). By clicking the place associated with this schedule information (Figure 5 (E)), the user can access another life log or information other than his/her own life log.
Fig. 5. Example of Flow in Usage of Prototype ((i) visualization of the frequency of contact, calculated from e-mails sent and received and schedule information, to give awareness and new viewpoints; (ii) displaying the contact frequency as a numerical value; (iii) displaying the titles of e-mails sent and received and summaries of schedule information related to the selected person; (iv) displaying the date, summary, attendee, and place of the selected schedule information; (v) for a past event, displaying the other life log related to the schedule information; (vi) for a future event, displaying external information, e.g., recommended shops in the area where the user will stay, using schedule information and GPS data)
For information about a past event, the actual visited place and time corresponding to the selected schedule information are shown (Figure 5 (F)). On the other hand, for information about a future event, information about shops near the scheduled meeting spot is displayed (Figure 5 (G)).
6 Visualization of Actual Life Log

We entered one subject's life logs (scheduler entries, e-mails received and sent for business, and GPS data) gathered over a 2-year period into the prototype (the subject is one of the authors). The subject then used this system and reported his experiences with the prototype. The subject pointed out that the "Panoramic View" allowed him to recognize more clearly the key turning points, which had been recognized only vaguely before. We can see that the subject exchanged e-mails and had meetings mainly with Mr. Q until October 2009 (Figure 6 (A)); however, from October 2009, the main contact person changed from Mr. Q to Mr. M and Ms. F (Figure 6 (B)). His work content actually changed around this time. He also pointed out a difference in the completion time of a project between the information in the schedule (intended life log) and that in the contact frequencies (unintended life log). He discovered that although the scheduler indicated that the project had already finished, he continued to contact the person in charge frequently for some time (Figure 6, right). These results indicate that the "Panoramic View" can stimulate new interests or satisfy curiosity through access to other information.
Fig. 6. Example of Frequency of E-mails Sent and Received (period 27/02/2009–23/02/2010; (A) frequent contact with Mr. Q until October 2009; (B) the main contacts change to Mr. M and Ms. F from October 2009; right: schedule information for "Event A," e.g., a related meeting on 26/05/2009 and the event itself on 02/06/2009)
7 Conclusion and Future Works

This paper introduced the framework of the "Life Portal" as a new value of the "unexpected usage" of life logs. It allows life logs to yield new motivation for accessing information. In the "Life Portal" framework, we classify life logs into two types. "Unintended life logs" are recorded automatically without the user's intervention; "intended life logs" are deliberately recorded by the user. Based on this classification, a prototype was developed that could visualize the unintended life log in a panoramic view, combine unintended and intended life logs, and provide seamless access to related information through schedule information. We conducted a simple experiment and confirmed that the prototype could give the subject new viewpoints that motivated him to access other information. In the future, we will conduct more detailed experiments to investigate the influence of visualization on finding new viewpoints. We will also develop a function to access information other than shop information.
References

1. Gemmell, J., et al.: MyLifeBits: fulfilling the Memex vision. In: Proc. of the Tenth ACM International Conference on Multimedia (MULTIMEDIA 2002), pp. 235–238 (2002)
2. Amazon.co.jp, http://www.amazon.co.jp/
3. Olguin, D.O., et al.: Sensible Organizations: Technology and Methodology for Automatically Measuring Organizational Behavior. IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics 39(1), 43–54 (2009)
4. Ushiama, T., et al.: A Life-Log Search Model Based on Bayesian Network. In: Proc. of the IEEE 6th International Symposium on Multimedia Software Engineering, ISMSE 2004 (2004)
5. Rekimoto, J.: Time-Machine Computing: A Time-centric Approach for the Information Environment. In: Proc. of UIST 1999, pp. 45–54 (1999)
6. Aizawa, K., et al.: Efficient Retrieval of Life Log Based on Context and Content. In: Proc. of CARPE 2004, pp. 22–30 (2004)
7. De Silva, G.C., et al.: An Interactive Multimedia Diary for the Home. IEEE Computer, Special Issue on Human Centered Computing 40(5), 52–59 (2007)
8. Ringel, M., et al.: Milestones in Time: The Value of Landmarks in Retrieving Information from Personal Stores. In: Proc. of INTERACT (2003)
9. Kim, I., et al.: PERSONE: Personalized Experience Recording and Searching On Networked Environment. In: Proc. of CARPE 2006, pp. 49–53 (2006)
10. Eagle, N., et al.: Reality Mining: Sensing Complex Social Systems. Personal and Ubiquitous Computing 10(4), 255–268 (2006)
11. Sumi, Y., Sakamoto, R., Nakao, K., Mase, K.: ComicDiary: Representing Individual Experiences in a Comics Style. In: Borriello, G., Holmquist, L.E. (eds.) UbiComp 2002. LNCS, vol. 2498, pp. 16–32. Springer, Heidelberg (2002)
12. Cho, S.B., et al.: AniDiary: Daily Cartoon-Style Diary Exploits Bayesian Networks. IEEE Pervasive Computing 6(3), 66–75 (2007)
13. Ueoka, R., et al.: Virtual Time Machine. In: Proc. of the 11th International Conference on Human-Computer Interaction, HCII 2005 (2005)
14. Kapler, T., et al.: GeoTime Information Visualization. In: Proc. of IEEE Information Visualization 2004, pp. 25–32 (2005)
15. Nishino, M., et al.: A place prediction algorithm based on frequent time-sensitive patterns. In: Proc. of Pervasive 2009 (2009)
Proposal of the Kawaii Search System Based on the First Sight of Impression Kyoko Hashiguchi and Katsuhiko Ogawa Faculty of Environment and Information Studies, Keio University, 5322 Endo Fujisawa-shi, Kanagawa-ken, 252-0882, Japan {t07624kh,ogw}@sfc.keio.ac.jp
Abstract. We propose a blog search engine called "Kawaii Search" (where kawaii means pretty) to search blogs based on the impression of their text on the printed surface, considering factors such as the format and layout of the text and the density of words. Particularly in Japan, blogs reveal the personality characteristics of users depending on how they place their text. For example, some writers leave more space between lines or use pictographs and "Gal words" [1], which consist of slang or abbreviations. Further, words can be written using four types of characters: kanji, hiragana, katakana, and the alphabet. Each results in a different impression that reveals a writer's personality. Given this approach, blog readers can not only read blogs, but also interpret each writer's personality. By focusing on impression differences, we propose a new search algorithm specialized for Japanese blogs. To show that these differences can act as the base of our search algorithm, we conducted an experiment that verified the algorithm on the following three blog patterns: "kawaii" (pretty or lovely), "majime" (serious or industrious), and "futsu" (normal). The results show that in terms of accuracy, our algorithm categorized "kawaii" well; however, "majime" and "futsu" did not show good results.

Keywords: Impression, Blog search engine, text formatting, Japanese blogosphere, information retrieval.
1 Introduction

Blog search systems are generally based on the statistical and structural information in the blog text, including the frequency or relationships of words [2]. These systems search for blog articles that suit user requirements based on the content, using keyword-based search techniques such as PageRank and TF-IDF [3]. With the increasing variety of blog writers' styles and of the reasons readers search blogs, a more sophisticated search system is required. For example, users not only want to read an article that matches the content they are searching for, but also want to find a blog that meets their aesthetic requirements. Particularly in Japanese blogs, pictographs and emoticons are frequently used to express a blog writer's individuality. Conventional search systems such as Google do not reveal the atmosphere or personality of a blog's text; however, when people read blogs and diaries, they often look
not only at the words, but also at the design and layout of the text. The Kawaii Search system quantitatively analyzes qualitative information such as the impression and layout of blogs. We therefore propose this search system to find blogs that are visually preferred by users.
2 Kawaii Search

Kawaii Search is a system that searches blogs based on their appearance. In this section, we describe typical differences found in blog appearances, along with the concept and vision of Kawaii Search.

2.1 Blogs in Japan

Compared to blogs in other countries, blogs in Japan use more character sets, including kanji, hiragana, katakana, and Roman letters. Further, Japanese blog writers use more spacing, symbols, pictograms, and emoticons than blog writers in other countries. These various styles give readers different impressions for each individual blog. As an example, a blog discussing "Ichiro" and consisting of more kanji, less blank space, and no pictograms may give the impression of a serious blog, as shown in Figure 1(a). Conversely, even though the topic is the same, a blog that consists of less kanji, more blank space, and many pictograms will have a pretty (or light) impression, as shown in Figure 1(b). Different writing patterns create different impressions.
Fig. 1. Two blogs discussing “Ichiro”; (a) consists of more kanji, less blank space, and no pictograms; (b) consists of less kanji, more blank space, and many emoticons
In Japan, blogs written by celebrities are popular among the public [4]. Each individual writer has a distinct writing pattern. Since writers know that many people read their articles, they may consider ways to increase readership through the blog's textual layout. A writer may arrange an article using effective spacing and emoticons so that it appears prettier, or use kanji for a smarter appearance. In other words, a blog's
appearance indicates the impression of the article, and the style of writing reflects the writer's personality and character. A blog's readers not only understand the meaning of the blog, but also read the writer's personality and character. Because a blog's appearance is thus important for both reader and writer, we propose a search system based on the impression of the text on the printed surface, considering factors such as the format and layout of the text and the density of words.

2.2 Concept

The Kawaii Search concept is described in this section. In addition to keyword-based blog search, Kawaii Search shows blogs similar in appearance to those of celebrities. As illustrated in Figure 2, three celebrity blogs are shown at the top of the Kawaii Search site. These blogs are categorized by the following icons: Majime (serious), Kawaii (pretty), and Futsu (normal). Above these icons is a textbox in which users can enter their search keywords. At the bottom of the Kawaii Search interface, search results are listed, with the title and thumbnail of each "hit." Instructions are as follows:
1. Enter the keyword and click one of the three icons shown (i.e., Majime, Kawaii, or Futsu).
2. The site will show blog articles similar to the one that the user selected.
Figure 3 shows example results from a search request.
Fig. 2. Kawaii Search interface
In Figure 3, the first article is a kawaii celebrity blog, and the second article is the top search hit for kawaii. Since both writers use many pictograms and spaces, those articles give a similar impression. Conversely, the third article is a majime celebrity blog, and the fourth article is the top search hit for majime. Both articles consist of more words and no pictograms or emoticons. As is evident from these examples, Kawaii Search can successfully match blogs that have a similar appearance to the base blog.
Fig. 3. Example Kawaii Search results: (a) kawaii celebrity blog; (b) search result similar to the kawaii celebrity blog; (c) majime celebrity blog; (d) search result similar to the majime celebrity blog
3 The Kawaii Search Algorithm

3.1 Kawaii Value

It is difficult to quantify the judgmental standards used when a person reads a blog article; however, in many cases, we can judge the external characteristics of the blog using the overall impression of its sentences. This includes measures such as line layout, character arrangement, the condition of sentences, and so on. Kawaii Search focuses on these constituents, using the following six variables: Conspicuous Value, Words Value, Vertical Space Value, Hiragana Value, Emoticon Value, and Pictogram Value. An explanation of each of these follows.

Conspicuous Value. The Conspicuous Value is based on how obvious Japanese characters are to the human eye. The conspicuousness of a word depends on the word itself and the words surrounding it. As described in Figure 4(a), when we compare "鸞" (phoenix) and "一" (one), "鸞" seems bolder. On the other hand, as shown in Figure 4(b), "鸞" in "鳥鸞鳥" is not as bold as "鸞" in Figure 4(a). We define this conspicuousness as the Conspicuous Value, based on the strokes of a character. The calculation method is as follows. As shown in Figure 5, the red boxes identify the groups of words, and the blue numbers are the number of strokes in each word. For example, to calculate the Conspicuous Value of the word 晴れ (sunny) in the sentence 今日は晴れです (it is sunny today), we first add 12 and 3, the number of strokes for 晴れ. Second, if the word has multiple characters, we calculate the average by dividing the resulting sum by the number of characters in the word (i.e., 15/2 = 7.5). Third, we add the number of strokes in the words next to this word; in the example, the neighboring words are は and です (i.e., 4 + 4 + 3 = 11). Fourth, we divide this sum by the number of characters in those surrounding words (i.e., 11/3 = 3.67). Finally, we divide the average number of the word by the
average number of the surrounding words (i.e., 7.5/3.67 = 2.04). The Conspicuous Value for 晴れ is therefore 2.04. By using these steps, all words are assigned Conspicuous Values.
Fig. 4. Conspicuousness: (a) high conspicuousness; (b) low conspicuousness
Fig. 5. Example words and corresponding strokes for calculation of the Conspicuous Value of each word
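To make the arithmetic concrete, here is a minimal sketch of the Conspicuous Value computation for the worked example above. The stroke counts and word boundaries are supplied by hand; the paper does not say how the prototype obtains them, so that part is assumed away.

```python
# Conspicuous Value arithmetic for 晴れ in 今日は晴れです, using the stroke
# counts from the paper's worked example. Stroke counting and word
# segmentation are hand-supplied here; the prototype's method is unspecified.

def conspicuous_value(word_strokes, neighbor_strokes):
    """word_strokes: stroke counts of the target word's characters;
    neighbor_strokes: stroke counts of the characters in adjacent words."""
    word_avg = sum(word_strokes) / len(word_strokes)              # 15 / 2 = 7.5
    neighbor_avg = sum(neighbor_strokes) / len(neighbor_strokes)  # 11 / 3
    return word_avg / neighbor_avg

print(conspicuous_value([12, 3], [4, 4, 3]))
# 2.045...; the paper rounds the intermediate 3.67 and reports 2.04
```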
Other Values. In this section, we describe the other proposed variables, i.e., the Words Value, Vertical Space Value, Emoticon Value, Pictogram Value, and Hiragana Value. Figure 6 illustrates these values using an example. Words Value is the number of words used in the given blog article. This value is used to measure the length of the article.
Fig. 6. Example blog excerpt with Words Value, Space Value, Emoticon Value, Pictogram Value, and Hiragana Value shown
Vertical Space Value, Emoticon Value, and Pictogram Value are the frequencies of appearance of those types of elements in the given blog article. The Vertical Space Value corresponds to vertical space created by <br> tags; the Emoticon Value corresponds to emoticons; and the Pictogram Value refers to pictograms. Figure 7 shows examples of emoticons. These values quantify the characteristics of the article as follows:
Emoticon Value = Emoticon / Words (1)

Vertical Space Value = Space / Words (2)

Pictogram Value = Pictogram / Words (3)
In each of the above equations, Words refers to the number of words in the given blog article.
Fig. 7. Example blog excerpt ("I practiced track and field today") showing emoticons; in the first blog article, the Emoticon Value is zero; progressing left to right, the Emoticon Value rises
The Hiragana Value is the appearance frequency of hiragana in the given blog article, as illustrated in Figure 8. This value can be interpreted as the degree of softness, as expressed in the formula below (where Letters does not include pictograms):

Hiragana Value = Hiragana / Letters (4)
Fig. 8. Example blog excerpts illustrating hiragana; progressing left to right, the Hiragana Value rises
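For concreteness, a minimal sketch of equations (1)-(4) over pre-extracted token counts follows. The real indexer derives these counts from Mecab output and its own pictogram/emoticon detection; here they are assumed to be given, and the function name is ours.

```python
# Minimal sketch of equations (1)-(4). Token counts are assumed to be
# already extracted (the real system derives them from Mecab output and
# its own pictogram/emoticon detection).

def blog_values(words, letters, hiragana, emoticons, pictograms, spaces):
    return {
        "words": words,                    # Words Value: article length
        "emoticon": emoticons / words,     # (1)
        "vertical_space": spaces / words,  # (2) vertical space from <br> tags
        "pictogram": pictograms / words,   # (3)
        "hiragana": hiragana / letters,    # (4); letters exclude pictograms
    }

print(blog_values(words=120, letters=300, hiragana=180,
                  emoticons=6, pictograms=9, spaces=24))
```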
3.2 Blog Templates: Kawaii, Normal, Serious

We identified three blog templates as patterns for kawaii, futsu, and majime [5]. Each pattern's value is detailed below.

Majime Value = 0.36 × Conspicuous Value + 0.33 × Words Value − 0.51 × Vertical Space Value (5)

Kawaii Value = 0.31 × Vertical Space Value + 0.26 × Pictogram Value + 0.22 × Emoticon Value + 0.16 × Words Value + 0.16 × Conspicuous Value − 0.25 × Hiragana Value (6)

Futsu Value = 0.28 × Vertical Space Value + 0.23 × Emoticon Value + 0.16 × Hiragana Value − 0.35 × Words Value − 0.06 × Pictogram Value (7)
We defined the Majime Value to retrieve blog articles with many characters and few spaces; the Kawaii Value retrieves blog articles with many spaces, pictograms, emoticons, and words; and the Futsu Value picks out blog articles with spaces, emoticons, and hiragana. Blog articles are scored using these three values, and high-scoring articles are selected (Table 1).

Table 1. The characteristics of each Value

Majime Value: high score from Conspicuous Value and Words Value; low score from Vertical Space Value.
Kawaii Value: high score from Vertical Space Value, Pictogram Value, Emoticon Value, Words Value, and Conspicuous Value; low score from Hiragana Value.
Futsu Value: high score from Vertical Space Value, Emoticon Value, and Hiragana Value; low score from Words Value and Pictogram Value.
3.3 System Structure

In this section we describe the Kawaii Search system, which is composed of three building blocks and is implemented in PHP and MySQL. The first component is the crawler, which downloads the blog articles. The second component is the indexer, which receives blog articles from the crawler and isolates the text. Next, the indexer analyzes the text using Mecab [6], which splits the text into its individual morphemes. In many cases, images, links, and advertisements are included in the given blog article. Our system does not accept any images except for the pictographs, as determined by the algorithm. The indexer saves only the text, pictographs, emoticons, and <br> tags (line breaks) in the underlying database, and it calculates the six values described above and stores them in the database. The third component is the searcher. Users enter keywords and click on an icon categorized as Kawaii, Futsu, or Majime. The searcher obtains the six parameters of the blog article that the user clicked. At the same time, the searcher obtains, via the database, the blog articles that match the keywords and retrieves their six parameters. Next, the searcher calculates the difference between the acquired values and the values of the blog article that the user clicked. Scoring is based on the differences (the fewer the differences, the better). Finally, the searcher sorts the results in descending order and displays them.
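The searcher's scoring step can be pictured as ranking candidate articles by how little their six values differ from those of the clicked blog. The paper states only that fewer differences score better; the sum of absolute differences used below (and the value normalization it would need in practice, e.g., for the raw word count) is our assumption.

```python
# Sketch of the searcher's ranking step: articles whose six values differ
# least from the clicked blog's values rank first. The exact distance is not
# given in the paper; a sum of absolute differences is assumed here (raw
# counts such as Words Value would need normalization in practice).

KEYS = ["conspicuous", "words", "vertical_space",
        "hiragana", "emoticon", "pictogram"]

def rank_results(clicked, candidates):
    """clicked: the six values of the blog the user clicked;
    candidates: list of (article_id, values_dict) matching the keywords."""
    def distance(values):
        return sum(abs(values[k] - clicked[k]) for k in KEYS)
    return sorted(candidates, key=lambda c: distance(c[1]))  # best match first

clicked = {"conspicuous": 1.2, "words": 80, "vertical_space": 0.30,
           "hiragana": 0.55, "emoticon": 0.10, "pictogram": 0.12}
candidates = [
    ("blog_a", {"conspicuous": 1.1, "words": 90, "vertical_space": 0.28,
                "hiragana": 0.50, "emoticon": 0.09, "pictogram": 0.10}),
    ("blog_b", {"conspicuous": 2.0, "words": 400, "vertical_space": 0.02,
                "hiragana": 0.30, "emoticon": 0.00, "pictogram": 0.00}),
]
print([a for a, _ in rank_results(clicked, candidates)])  # ['blog_a', 'blog_b']
```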
4 Evaluation

4.1 Experimental Method

By using our Majime, Kawaii, and Futsu indices, we conducted an evaluation experiment to assess whether the searched blog articles represent the personality of their writers. The subjects in this evaluation were 12 college students in their 20s (6 women and 6 men); each participant reported having read blog articles before. We set
the search keywords "Ukeru" (where Ukeru has many kinds of meanings, such as interesting, receive, catch, and fun), "Aho" (where Aho means fool), and "Rikujo" (where Rikujo means track and field, land, and so on), which can retrieve various blogs in terms of impression. In the experiment, the top 10 (of 1200) blog articles retrieved using the Majime Value, Kawaii Value, and Futsu Value were used. Subjects were first shown reference blog articles that had been evaluated by many people to be cute, normal, and serious. For each of the top 10 blog articles, subjects were asked to evaluate its similarity with the cute blog article, the normal blog article, and the serious blog article; the following 5-point scale was used: (5) very similar; (4) a little similar; (3) cannot say either way; (2) not very similar; and (1) not similar at all.

4.2 Experimental Results and Discussion

Experimental Results. Figure 9 shows the mean values of the scores reported by the subjects. The upper side of the figure lists the keywords used in the experiment. For example, searching the blog articles that contained the keyword "Aho" by using the Kawaii Value returned articles that many people found cute, but only a few articles that were found serious or normal. In contrast, searching the articles by using the Futsu Value returned articles that were found to be normal and serious to almost the same extent.
Fig. 9. Mean Kawaii Value, Majime Value, and Futsu Value reported by subjects for each keyword: Aho (fool), Ukeru (interesting, receive, catch, and fun), and Rikujo (track and field, land, and so on)
Discussion. As is evident in Figure 9, for any keyword, the Kawaii Value successfully selects blogs that people find cute. The Majime Value and Futsu Value could not be distinguished, as their results were similar. The Kawaii Value is an index under which blog articles with many pictograms, spaces, and emoticons are easily picked; the appearance of such blog articles is easy to distinguish from that of majime and futsu blog articles, so articles that users find pretty are easy to retrieve. The Majime Value picks out articles with many words and little space between the lines. For the search keyword "Ukeru," the results were successful: because the word "Ukeru" has several possible meanings (i.e., receive, get, and fun), the crawler downloads many types of blogs. For the other keywords, however, the results were too similar to be useful, which suggests a problem with the articles that the crawler downloaded. For the search keywords "Aho" and "Rikujo," the spacing tends to be wide when blog articles have many words. For example, for "Rikujo," there are many blog articles recording race times, and such articles have many words and much space. In addition, for "Aho," there are many articles transferred from other sites such as Twitter [7] or 2channel [8]; these blog articles also have many words and much space between the lines. Therefore, blog articles with little space between the lines are hard to select. The Futsu Value also could not be distinguished, as its results were similar; here the problem lies in the Futsu Value itself. Blog articles that include many hiragana letters and much space come in many kinds of appearance, so the Futsu Value retrieved blog articles of various appearances. We need to adjust the crawled set of blog articles and the Futsu Value.
5 Results and Considerations

In this paper, we proposed Kawaii Search, which searches blogs based on the impression their text forms on human readers. We performed experiments to verify the utility of the Kawaii Search algorithm and found that the Kawaii Value produces good results for selecting pretty blogs, whereas the Majime Value and Futsu Value did not produce good results. For our future work, we first need to correct the Majime Value and Futsu Value. Second, in addition to the six values described above, there are many factors that affect the impression of a word, such as font and color; we need to expand the search algorithm by adding these types of factors. Third, when users read blog articles, the hardware used (e.g., PC, iPad, smartphone) may have varying screen sizes, which affects the impression of the text. In this paper, we considered a standard PC or laptop screen; we need to consider the size of the screen on which the reader actually reads the blog article. Fourth, we analyzed the precision of our system based on an evaluation by 12 college students; in the future, we plan to improve the search results by increasing the number of reviewers. Finally, we plan to solve these problems and improve search efficiency.
References

1. Tanabe, K.: Speech Patterns of Japanese Girls or Gals: Symbol of Identity and Opposition to Power. OPAL 3. Queen Mary, Univ. of London, London (2005)
2. Lindahl, C., Blount, E.: Weblogs: Simplifying Web Publishing. IEEE Computer 36(11), 114–116 (2003)
3. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, NY (1983)
4. ameba, http://official.ameba.jp/
5. Hashiguchi, K., Ogawa, K.: MENKUI SEARCH: Search System Based on the First Sight of Impression. Keio University, graduation thesis (2011)
6. mecab, http://mecab.sourceforge.net/
7. twitter, http://twitter.com/
8. 2channel, http://www.2ch.net/
Development of a Tracking Sound Game for Exercise Support of Visually Impaired Yoshikazu Ikegami, Keita Ito, Hironaga Ishii, and Michiko Ohkura Shibaura Institute of Technology, 3-7-5 Toyosu Koto-ku, Tokyo, Japan {m110007,m108017,m105005,ohkura}@sic.shibaura-it.ac.jp
Abstract. We developed an exercise support system that the visually impaired can use alone at home. Our system uses the entertainment characteristics of games to encourage users to continue exercising. We focused on continuity, fun, and system usability, and improved our system by conducting experiments repeatedly.

Keywords: visually impaired, system, sports.
1 Introduction

Many visually impaired people have the desire to exercise [1]. However, they often cannot exercise for reasons that include a lack of time and a lack of facilities available for the visually impaired. Based on this background, we developed an exercise support system that the visually impaired can use alone at home. Exercising must be continued to be effective; therefore, we developed an exercise support game that utilizes fun to motivate continued exercise. After development, we improved the fun of our system based on the results of evaluation experiments. In addition, we focused on its usability and improved it so that the visually impaired can use it even when they are alone.
2 Development

2.1 Outline

Figure 1 is an overview of the system. We employed a bicycle-type device for use at home because it needs relatively little space. The system is controlled by a PC. The device sends data such as the degree of leaning of the steering wheel and the number of pedal rotations to the PC, which sends sound data to the headphones.

2.2 Game

Because our previously developed system for the visually impaired that employed sound localization received favorable comments [2], we also employed sound localization for our new system. Fig. 2 shows an image of the game.
Fig. 1. Overview (the steering wheel and pedal send data on the degree of leaning and pedal rotation to the PC, which sends sound to the user's headphones)
Fig. 2. Image of game (the target sound is located around the user)
The game flow is as follows:
1. Users wear the headphones and ride the bicycle-type device.
2. The PC outputs a target sound that is assumed to be located around the user.
3. Users pedal the bicycle to catch the target, getting closer to it by handling the steering wheel and pedaling.
4. When users reach the target, it disappears and a score is given. Then, a new target appears.
5. After a certain number of targets have appeared, one stage is finished and the next stage begins.
6. When three stages have finished, the users receive their total score, ranking, and mileage by voice.
We employed a lion's roar and the sound of maracas as the target sounds.
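The chase mechanic in steps 2-4 can be pictured as a simple 2-D update loop: each pedal rotation advances the user along the current heading, the steering lean turns the heading, and the target is caught within some radius. The constants and the motion model below are invented for illustration; the paper does not describe the actual implementation.

```python
import math

# Toy sketch of the chase mechanic (steps 2-4). All constants and the motion
# model are invented for illustration; the real system's model is not described.

STEP_PER_ROTATION = 0.5   # metres advanced per pedal rotation (assumed)
TURN_PER_LEAN = 0.05      # radians of turn per unit of steering lean (assumed)
CATCH_RADIUS = 1.0        # metres (assumed)

def update(x, y, heading, lean, rotations):
    heading += TURN_PER_LEAN * lean
    x += STEP_PER_ROTATION * rotations * math.cos(heading)
    y += STEP_PER_ROTATION * rotations * math.sin(heading)
    return x, y, heading

x, y, heading = 0.0, 0.0, 0.0
target = (5.0, 0.5)
while math.hypot(target[0] - x, target[1] - y) > CATCH_RADIUS:
    x, y, heading = update(x, y, heading, lean=0.2, rotations=1)
print("target caught at", (round(x, 2), round(y, 2)))
```

In the real system the target's position would drive the sound localization in the headphones rather than an on-screen display.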
3 Evaluation

3.1 Experiment

We experimentally evaluated our prototype system with 20 subjects: seven were totally blind, five had amblyopia, and eight were unimpaired. We recorded the log data of the system and administered five-point scale questionnaires. The questionnaire results that focused on fun are shown in Fig. 3. The game was evaluated as enjoyable by both the visually impaired and the unimpaired.
Fig. 3. Questionnaire results about system’s fun
3.2 Improvement of System

Based on the questionnaire results, we made the following improvements.
• Added a ranking function.
• Added a function to raise the degree of difficulty when a user plays well.
• Added a function for pedaling backward.
• Added various game modes and target sounds.
• Simplified the game's start.
To raise the difficulty, targets appeared at positions that are difficult to localize, and the time limit was shortened. We also added a new game mode with new target sounds, such as sheep bleats and dog barks.

3.3 Evaluation Experiment of Improved System

We re-evaluated our system after the above improvements. We set it up in a school for visually impaired students, and 17 students freely used it for two months. As in the previous experiment, we recorded log data and administered questionnaires whose main items were the following.
• Which parts of the system should be improved?
• Which elements of the system were enjoyable?
Figure 4 shows how many times each student used the system. Although some used it more than ten times, many students only used it a few times. This indicates that our current system needs more improvement to increase continuity. Figure 5 shows the questionnaire results that focused on the enjoyable system elements. Riding a bicycle-type device and getting scores were considered fun.
Fig. 4. Frequency that the system was used (times the system was used, per student)
Fig. 5. Questionnaire results: enjoyable elements ((1) catching sounds, (2) operating the bike, (3) exercising, (4) getting scores, (5) otherwise)
4 Conclusions
In this study, we developed a system that encourages the visually impaired to continue exercising at home. The evaluation experiment results suggest that our system provided enjoyable support for continued exercise for the visually impaired. Data from the evaluation experiment of our improved system showed advances in continuity and fun, and revealed which elements were considered fun. We will implement a competition function in response to the strong demand for it from young visually impaired students.
References
1. National Rehabilitation Center for Persons with Disabilities: A report of a survey about exercise and sports for graduates of the national rehabilitation support center for the visually impaired (in Japanese), http://www.rehab.go.jp/achievements/japanese/19th/paper20.html
2. Ishii, H., Inde, M., Ohkura, M.: Development of a game for the visually impaired. In: 16th World Congress on Ergonomics, IEA 2006, CD-ROM, Maastricht (July 2006)
From Personal to Collaborative Information Management: A Design Science’s Approach* Mei Lu¹ and Philip Corriveau² 1 Samsung Information Systems America 75 West Plumeria Drive, San Jose, California, USA 2 Intel Corporation, 5200 NE Elam Young Parkway, Hillsboro, Oregon, USA
[email protected],
[email protected]
Abstract. This paper reports findings from the evaluation of five solution concepts aimed at addressing challenges in managing projects, tasks, and different modes of work. Both users and Information Technology (IT) departments resonated best with the concept of a "project workspace," conceptualized as a persistent space that allows users to organize, track, and resume active project work. Even though users agreed that multi-tasking and interruptions were characteristic of their daily jobs, they did not desire mechanisms to block or alter information flows. Instead, users wanted information management to be streamlined in the contexts of collaboration and teamwork. The most desirable scenarios included spontaneous retrieval of information related to a customer or colleague, quick information assembly for different phases of a project, effective management of team tasks, and seamless connection from personal to team workspaces. Keywords: Information management, search, tasks, projects, collaboration.
1 Introduction

Abundant research has studied business users' challenges in personal information management. Three main areas of issues are documented: 1) Interruptions and multitasking -- users typically handle multiple tasks on their computers and work with frequent interruptions from both internal and external sources; 2) Source and tool fragmentation -- users' work or decision-making often relies on information from disconnected sources, devices, or applications; 3) Challenges with information organization -- users spend minimal effort organizing information and thus later have difficulty finding and re-using useful information. Two main gaps exist in the literature: 1) Most research focuses on end-user perspectives; few studies have examined IT departments' or overall business views; 2) While a number of design implications have been generated from different studies, fewer projects have evaluated multiple potential solutions in business settings and compared their benefits and priorities. This research adopts the design science research approach [11], which creates and evaluates IT artifacts intended to solve identified organizational problems. The authors provide seven guidelines for effective design
* The research was conducted when the author was working at Intel Corporation.
science research – "design-science research requires the creation of an innovative, purposeful artifact (Guideline 1) for a specified problem domain (Guideline 2)" (p. 8). It requires thorough evaluation of the artifact (Guideline 3) and innovative solving of known or unknown problems (Guideline 4). The artifact must be rigorously defined (Guideline 5) based on the construction of the problem space (Guideline 6). "Finally, the results of the design-science research must be communicated effectively" (Guideline 7). These seven guidelines can be summarized into four major milestones: 1) a thorough understanding of the problem space; 2) creation of innovative artifacts; 3) evaluation of the effectiveness of the artifacts; 4) communication of results. In this project, we first tried to understand the challenges business users face in personal information management through literature reviews. Second, through team brainstorming and ideation, five potential solution concepts were proposed. Afterward, to evaluate the concepts and further explore ideas for desired solutions, in-depth interviews were conducted with 28 participants from 14 US companies. The remainder of the paper is organized according to the major steps of design science research: we start with a definition of the problem space, then discuss the designs or proposed solution concepts. Further, we report findings from the evaluation research and provide recommendations for design and development.
2 The Problem Space: Challenges in Information Management

Personal information management refers to individual user activities to acquire, organize, or retrieve information on personal computers. The dynamics and complexity of personal information management in business environments have increased dramatically in the past decades, as individual work, collaboration, and business operations have all become more computerized. The challenges are well documented and can be summarized into three main areas.

Multi-tasking and interruptions: Users typically handle multiple tasks on computers and work with frequent interruptions from both internal and external sources [2, 7, 10, 12, 16]. Interruptions cause distraction from the current work, redundant effort, forgotten tasks, and cognitive burden and delays when users try to resume previous work [12, 16]. Current information organization tools do not provide adequate support for frequent task suspensions and resumptions. Long-term projects are more complex and harder to return to than short-term activities. Further, as pointed out by Chudoba et al. [6], the issue of multi-tasking and interruptions can be potentially worsened by the fact that employees in large companies typically work simultaneously in multiple teams and projects.

Information source and tool fragmentation: Users' work creation or decision-making often depends on information from disconnected sources or applications [3, 13, 15]. Lu et al. [15] demonstrate that the variety and fragmentation of tools and information sources are detrimental to teams' performance. Further, the tools people use are often inadequate for their goals [14]. For example, Bellotti et al. [2] report that users embed extensive task management activities into e-mail communication, which leads to ineffectiveness in managing priorities, deadlines, and workload. Several other studies have observed that e-mail is overly and inadequately used for multiple purposes, such as task and contact management, and personal archiving [8, 18, 19].
Burdens of information organization and re-use: Most users devote little time to organizing information on computers [5]. Those who do organize have to spend scattered effort classifying different types of information, such as e-mail, documents and web bookmarks [5]. Storing information in different formats and structures leads to challenges in retrieving and re-using information [9]. To summarize, key user needs for personal information management include: 1) The ability to easily organize information and preserve contexts for different tasks and projects, so that users can resume tasks or work after temporary suspension [7, 12, 16]; 2) Assistance with tracking, organizing and being reminded of short-term activities; 3) Integration of information from different sources so that users can easily access all information related to different topics of work [3, 13, 17]; 4) Tools to contextualize interruptions -- as observed by Mark et al. [16], collocated employees work longer uninterrupted sessions than distributed employees, suggesting collocated employees are more likely to be interrupted at natural breaks. Tools may help users better communicate work contexts so that interruptions can be more appropriately timed.
3 Design Artifacts: The Solution Concepts

A team of 19 people at Intel gathered for a six-hour brainstorming and ideation session. The participants included user researchers and human factors engineers, designers, market researchers, marketing and strategic planners, platform architects, software architects from Intel's IT department, and a researcher from an external research firm who facilitated the discussions. The brainstorming session started with a review of literature and research data on user needs compiled by a user researcher and a market researcher. This was followed by a team discussion on information assistance, or how computers or intelligent agents can assist users with information management. To stimulate innovative ideas, the team first tried to generate metaphors and real-world analogies of "assistance." Examples included: a golden retriever fetching the newspaper for its owner in the morning; an employee at Home Depot helping a disoriented customer in a store; a tunnel that helps travelers get to their destination more quickly; animals leaving traces along the paths they have walked; and a person's frustration when he or she cannot find a car key at home, and the actions he or she may take to prevent that. The team was then divided into four sub-groups to brainstorm on solutions. The brainstorming concluded with a consolidation of ideas from the sub-groups. Afterward, a smaller team of researchers and designers continued to refine the ideas into five solution concepts, which are summarized in Table 1.
4 Evaluation of the Artifacts: The In-Depth User Research

Semi-structured in-depth interviews were conducted with a total of 28 employees from 14 businesses in two US cities: New York City and Kansas City. Of the businesses, three were small businesses with fewer than 100 employees, five were mid-sized (100-1000 employees) and six were large enterprises (>1000 employees). They came from a variety of industry verticals. We interviewed two people separately from each business – a knowledge worker and an IT decision maker, who might be the chief information officer (CIO), a director or manager of the IT department, or the person who served as the IT staff in a small business.
Table 1. Proposed solution concepts and targeted user needs

Project workspace. Summary: A persistent space that organizes and tracks all relevant information for each project. Users no longer need to switch from one application to another to view different elements for a project – all elements can be viewed in one "workspace." Previous contexts are preserved if interruption occurs. Targeted needs: Integration of information from different sources; preservation of work contexts; easy work resumption after interruption.

Easy search. Summary: Users can easily search across both internal (e.g., a PC's hard drive or company network) and external (e.g., the Internet) information sources. Searches can be extended across different file types, and can be initiated from any application. Targeted needs: Reduced need for manual information organization; integration of information from different sources.

Proactive search. Summary: It learns about users' interests and activities, and proactively seeks and compiles information relevant to a user's job or tasks. Targeted needs: Integration and notification of useful information.

Task handler. Summary: It enables users to use handhelds or computers to create and track tasks. Users use handhelds (e.g., through voice commands) to communicate with their computers to create or update tasks. Further, the program automatically analyzes and structures tasks to track due dates and reminders, and to identify bottlenecks in workload. Targeted needs: Assistance with tracking, organization and reminders for short-term tasks and activities.

Mode selector. Summary: Users can select different work modes (e.g., "In a Meeting" and "Presenting"). Depending on the mode, the solution reconfigures a user's desktop and applications, and communicates the user's status. Non-essential elements are faded into the background, and non-critical incoming communications may be blocked or routed. Targeted needs: Interruption reduction or contextualization; better access to tools and information needed for different modes of work.
The two-hour interviews were conducted in participants' offices and had two parts. The first part was about users' current practices in information management. In the second part, we evaluated the five solution concepts, described with visual storyboards and scenarios. According to Hevner et al. [11], IS research "must address the interplay of business strategy, IT strategy, organizational infrastructure and IS infrastructure" (p. 4). Thus, in the interviews, we asked participants to assess the usefulness and uniqueness of the proposed solutions from different perspectives. IT decision makers were invited to discuss the perceived benefits of the solution concepts and major adoption hurdles from IT and their company's business perspectives. With knowledge workers, we evaluated how the solutions might be relevant to work effectiveness and efficiency. At the end, the participants were invited to rate the usefulness and uniqueness of the concepts on a five-point scale. In this qualitative research, the quantitative rating was used as a way to stimulate deeper thinking on why a participant liked or disliked a concept, and the extent of its relevancy for his or her job. Participants were encouraged to support their viewpoints with real examples, or to demonstrate issues they had encountered on their computers.
4.1 Information Flows and Personas

We identified two kinds of information flows that were both critical to users' daily jobs and business operations: structured and unstructured. Structured information was data stored in databases with consistent formats, which could be computed and analyzed by computers. Examples included data in manufacturing production, inventory, financial, customer relationship management, and business intelligence systems. Unstructured information was content that could not be easily interpreted and analyzed by machines, such as e-mail messages, web sites, and documents. For both structured and unstructured information, users' daily information flows involved internal colleagues and external collaborators, for example, customers, suppliers and partners.

Personas are archetypical representations of major categories or segments of users [1]. By creating personas, we sought to typify user needs, values, and behavior patterns, and to assist designers and architects in envisioning solutions against those needs. We observed four major business personas with different patterns of priorities.

Senior management: Their workdays were highly unstructured. E-mail and meetings were the two most important communication mechanisms. Newly received information often determined what they needed to do next. Constant access to e-mail and business information (e.g., budget information in the financial database) was critical. They had a strong preference for a device that could be carried in a pocket and allowed them to be always connected with business information and key staff.

Road warriors: They worked with both structured and unstructured information while on the road. They typically had job roles such as sales, account management, and customer service. Their primary goals were to be instantaneously prepared for customer meetings, to quickly solve problems, or to easily relay information back to their company on customer orders, issues, or market trends.

Office workers: They were mainly knowledge workers or middle management. They typically worked in offices, mostly with unstructured information. They had challenges in getting information organized and shared, and in quickly finding related information for work creation or completion.

Special workforces: A large portion of the workforce performed business-critical functions with structured data from thin clients, light-weight computing devices that mainly served as connecting points between servers and monitor displays. IT described a number of advantages of using such devices, including ease of security management; ease of device deployment and replacement without the need to install the operating system, applications and data; and ease of update and maintenance.

4.2 Acceptance of the Solution Concepts

In this section, we summarize user feedback on the five solution concepts.

Project Workspace. This concept was the best received. It appealed more to knowledge workers/middle management than to other personas -- senior management and road warriors' work tended to be less structured around projects; special workforces typically worked with information in structured databases. The concept received an average usefulness rating of 4.2 out of 5, and a uniqueness rating of 4.1.
Users could identify the benefits we had hypothesized for this concept, including: one location for different types of information for a project; easy retrieval of information related to a particular topic; easy focus on, or resumption of, tasks after interruptions; easy accumulation of project history and preservation of work contexts; and the potential for easy sharing of project information and more effective collaboration. The main question users asked was how easy it might be to learn to use. IT was particularly concerned with the potential learning curve, and regarded it as highly detrimental to IT's reputation if users were unwilling to adopt a solution IT deployed. In addition, IT questioned whether the solution would create multiple layers of information organization and duplication of files. The concern was about the potential extra burden for data storage management and file backup.

Task Handler. This concept appealed more to senior management and road warriors. It received a usefulness rating of 3.9, and a uniqueness rating of 3.7. It was hypothesized to allow more effective management of tasks that require immediate or near-term action. However, users identified more with benefits that were initially not at the core of the hypothesis, such as:
• Voice input into the PC: Several participants, especially senior management or people with support staff, viewed voice input as more efficient than typing. For example, in a law firm, lawyers often used audio dictation as a way to communicate with assistants. Participants mentioned that they could use such a tool to create voice mails or delegate tasks to other people. Participants also perceived benefits in using a voice tool to create input into structured databases. For example, a salesperson could use this tool to record customer requests or feedback in a customer relationship management system immediately after leaving the customer's office; or a railroad operator could use it to easily update the status of a cargo car in databases.
• Spontaneous capture of thoughts: Such a tool allowed users to capture thoughts which they might otherwise forget, as the operations manager in a large retail company said, "I am kind of tangent thinker. I might be walking down the street and thinking of something. This would enable me to kind of document it and get it active."
IT expressed more concerns about this concept than knowledge workers did. One was security: if such a device would constantly sync with their computers or the enterprise's networks, IT questioned the cost and burden of implementing a virtual private network (VPN) for the device. IT and senior management were not convinced that such a potentially expensive device was justified only for the purposes of tracking and managing daily tasks. They regarded task management as a basic skill expected of their employees. They expected such a device to perform more business-critical functions, for example, for users to create voice input into business operation databases, particularly at places without computer access or network connection; for users to manage team tasks or quickly relay tasks to other people; or for senior management to access and manage business-critical tasks in operation databases.

Easy Internal and External Search. This concept received an average usefulness rating of 3.9, and a uniqueness rating of 3.3. An easy search tool was regarded as a must-have; however, it would not satisfy all needs for information retrieval and organization.
Participants mentioned that they would still like to browse first and use search as a last resort. Search was especially important for finding historical documents created either by themselves or by others in the company. Key needs included speed of search; the ability to search across one's own hard drive and shared network repositories; and search within different file types, including documents, e-mails, calendars, PDF files, and binary files (for example, computer-aided design documents). Several IT managers said that they did not see the need to combine internal and external searches – "They serve different purposes," and "there might be security issues." In addition, IT managers were concerned with the burden of search indexing, storage maintenance, backup, and access right control, as well as the potential impact on the performance of document management systems.

Proactive Search. This concept received an average usefulness rating of 3.7 and a uniqueness rating of 3.4. It appealed more to senior management and to people who worked with time-sensitive information. For example, a person who managed investor relationships mentioned that it was extremely important to stay current with anything people might say about the company. The participants could identify the potential value of proactive and tailored information delivery, which would integrate and track information from different sources and allow them to be constantly informed about topics of interest. However, they worried that the tool would deliver more irrelevant information and thus cause overload or distraction. Some participants mentioned that they would like to retain control over when and how to find information, as one participant said, "I would rather search when I think I need it." Or as another user put it, "I don't want my computer to be too smart. It is just a tool." In addition, several participants asked about their privacy – "will I be always tracked on what I read?"

Mode Selector. The participants typically did not find this concept appealing. It received an average usefulness rating of 3.0 and a uniqueness rating of 3.7. They did not consider themselves to be working in different modes. More importantly, they seemed proud of the dynamics of their workdays created by different information flows, and regarded this as an indicator of being successful in the workplace. Even when they needed to work without interruption, for example, when in a meeting with executives, they did not see the need for sophisticated applications to block or alter incoming communication flows, or to help rearrange their information or applications. We sensed a strong sentiment that users did not like something that implied that they needed help organizing their work or remembering their tasks. "If you have something important to do, you really should be just doing it yourself. You shouldn't need this, all this stuff to help you ... And this seems like it's something for people who can't concentrate..."
5 Discussion and Recommendations

Even though personal information management is an individual activity that happens on personal computers, we find that the most important needs can best be described within the contexts of users' collaboration with internal and external teams. From the participants' discussions about their current practices of information management,
and their reactions to the potential solution concepts, their information management priorities are anchored mainly around two notions: projects and people. Both knowledge workers and IT regard effective management of information related to projects and customers as most critical to their work and business objectives. They desire solutions that integrate both structured and unstructured information and streamline the flows in different phases of project activities. They desire solutions that support quick information retrieval and re-use for better services or support to either internal colleagues or external customers. User activities and needs appear to evolve as a project advances through its life cycle. Three phases of project activities were identified in the study: exploration, creation, and conclusion. Project-based information management is intertwined with management related to three main groups of people: customers, experts and team members.

Exploration and Initiation. As one participant said, "a project often becomes a project after a fair amount of discussions." A good search tool is critical to identify 1) existing documents or information that can be re-used for a potential project; 2) expertise in the company that is relevant for project decisions or the formation of teams. As a Vice President in a legal firm said, "quick (document) turn around (to a customer) is critical." A desirable scenario is for users to be able to use a search tool to pull information from different sources, such as shared document repositories, e-mail messages, and structured databases. Afterwards, users can easily identify reusable information (for example, the budget, scope and duration of a similar project) and quickly form a proposal or project plan for a potential customer. As the project is formally established, a user can gather all the relevant information from the earlier exploration and discussion, and create a formal workspace for the next phases of the project.

Work Creation. Consistent with several studies [3, 7, 15, 17], users need better ways to organize active work and information related to projects. Currently, a user may find active information related to a project scattered across different applications, such as e-mail, word processing, spreadsheet, and presentation applications. There is no effective mechanism to easily bring different information related to a topic into one view. Over time, users need to easily classify and accumulate information from different sources or of different types, e.g., e-mail, documents, instant messaging, notes, web pages, and work history, in a persistent space. While the concept of "project workspace" was presented as a workspace for individuals, the participants wanted their personal workspace to be seamlessly connected to a team space so that individual work can be shared with teams. During the course of work, users often need spontaneous information retrieval for unplanned events. As described by one participant, "everything is planned, but it could change or shift around depending on what priorities take place and what new information is discovered." An example of a useful scenario: when a user receives a phone call from a customer inquiring about project status, he or she can use one interface to retrieve all information related to the customer, including e-mail messages, customer account information in a database, or attachments this customer has sent via e-mail but that have been saved to a hard drive.
As a participant said, “When a phone call comes in, if it (software) automatically pulls up the files, that would be a
fabulous thing for a sales person." Users may use similar tools in a meeting to quickly gather information about the meeting topics, presenters or attendees. Users need mechanisms that unobtrusively help them capture and track their tasks. Task management needs to be embedded in applications or other information management tools. Blandford and Green [4] identify important factors determining people's choices of tools for time and task management: how portable, accessible, shareable, and updateable the tools are. In this study, the majority of users informally managed short-term tasks and commitments with tools they were already familiar with, for example, e-mail, documents, calendars, and sticky notes. Very few of them spent time setting up a formal task list using a dedicated tool. Users seem unwilling to invest time in learning a new tool; IT and senior managers are not eager to invest in extra hardware or software. Management of shared or delegated tasks is regarded as more important than management of one's own individual tasks. In one desired scenario, creating a task list for a team is as easy as creating a table in a document or e-mail message. Once the list is sent or saved in the team space, the tasks are automatically updated in the different owners' individual task lists. A user can also easily "tag" information embedded in different sources to update his or her own task list, for example, highlight a sentence in an e-mail and select to tag it as "my task."

Project Conclusion. Upon successful completion of a project, users and IT want to easily archive all related information. IT wants to manage the content or information according to document retention policies, for example, in compliance with legal requirements. For end users, the primary goal is to ensure easier retrieval for future projects or reference. A project workspace as described above can help users accumulate all related information over the course of creation, so that the burden is not huge at the end of the project.

In conclusion, while reporting findings on user needs and desired scenarios for collaborative information management, we have described a methodology designed to gather in-depth feedback from business users on the value of potential technologies at the early stage of a product or strategic planning cycle. With this method, we seek to gather information from three perspectives -- 1) End users: whether a solution can fit into and improve users' daily jobs, and help them better fulfill their primary goals; 2) IT: whether a solution can be easily managed and supported by the IT department; 3) Business, in this study represented by IT, whose decisions on IT investment are tied to business needs: whether a solution is tied into overall business objectives or strategies. Within Intel, these data have been used, in conjunction with quantitative data, to inform decision-making on platform feature prioritization and ecosystem enablement. These data are also used to create detailed user scenarios to inform technical capability design or gap analyses.
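As an illustration of the team-task scenario described under Work Creation (a task table saved to the team space fanning out into owners' individual lists, and a highlighted sentence tagged as "my task"), the following toy model in Python sketches one possible data flow. All names and the data model are hypothetical; this is not a system from the study.

from dataclasses import dataclass

@dataclass
class Task:
    text: str
    owner: str
    done: bool = False

class TeamSpace:
    # Toy model of the desired scenario: a task list saved in the team
    # space is automatically reflected in each owner's individual list.
    def __init__(self):
        self.personal = {}  # owner -> list of Task

    def save_task_table(self, rows):
        # rows mimic a table typed into a document or e-mail message:
        # (owner, task text) pairs
        for owner, text in rows:
            self.personal.setdefault(owner, []).append(Task(text, owner))

    def tag_as_my_task(self, user, highlighted_sentence):
        # "tag" a sentence highlighted in an e-mail as the user's own task
        self.personal.setdefault(user, []).append(Task(highlighted_sentence, user))

space = TeamSpace()
space.save_task_table([("ana", "Draft the project proposal"),
                       ("raj", "Collect budget data from finance")])
space.tag_as_my_task("ana", "Send the revised scope to the customer by Friday")
for owner, tasks in space.personal.items():
    print(owner, [t.text for t in tasks])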
References 1. Anderson, G., Bramlett, B.W., Gao, J., Palmer, R., Marsh, D.: Intel Usage-to-Platform Requirements Process. Intel Technology Journal 11(1), 23–34 (2007) 2. Bellotti, V., Ducheneaut, N., Howard, M., Smith, I.: Taking Email to Task: the Design and Evaluation of a Task Management Centered Email Tool. In: Proc. CHI 2003, pp. 345–352 (2003)
3. Bergman, O., Beyth-Marom, R., Nachmias, R.: The Project Fragmentation Problem in Personal Information Management. In: Proc. CHI 2006, pp. 271–274 (2006)
4. Blandford, A.E., Green, T.R.G.: Group and Individual Time Management Tools: What You Get Is Not What You Need. Personal and Ubiquitous Computing 5, 213–230 (2001)
5. Boardman, R., Sasse, M.A.: Stuff Goes into the Computer and Doesn't Come Out: A Cross-Tool Study of Personal Information Management. In: Proc. CHI 2004, pp. 583–590 (2004)
6. Chudoba, K., Wynn, E., Lu, M., Watson-Manheim, M.B.: How Virtual Are We? Measuring Virtuality in a Global Organization. Information Systems Journal 15(4), 279–306 (2005)
7. Czerwinski, M., Horvitz, E., Wilhite, S.: A Diary Study of Task Switching and Interruptions. In: Proc. CHI 2004, pp. 175–182 (2004)
8. Ducheneaut, N., Bellotti, V.: E-Mail as Habitat: An Exploration of Embedded Personal Information Management. Interactions, 30–38 (September–October 2001)
9. Dumais, S., Cutrell, E., Cadiz, J.J., Jancke, G., Sarin, R., Robbins, D.C.: Stuff I've Seen: A System for Personal Information Retrieval and Re-Use. In: Proc. SIGIR 2003, pp. 72–79 (2003)
10. Gonzalez, V.M., Mark, G.: Constant, Constant, Multi-tasking Craziness: Managing Multiple Working Spheres. In: Proc. CHI 2004, pp. 113–120 (2004)
11. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design Science in Information Systems Research. MIS Quarterly 28(1), 75–105 (2004)
12. Iqbal, S.T., Horvitz, E.: Disruption and Recovery of Computing Tasks: Field Study, Analysis, and Directions. In: Proc. CHI 2007, pp. 677–686 (2007)
13. Karger, D.R., Jones, W.: Data Unification in Personal Information Management. Communications of the ACM 49(1), 77–82 (2006)
14. Kratkiewicz, G., Mitchell, G.: An Adaptive Semantic Approach to Personal Information Management. In: Proc. IEEE International Conference on Systems, Man and Cybernetics, pp. 1395–1400 (2004)
15. Lu, M., Watson-Manheim, M.B., Chudoba, K.M., Wynn, E.: Virtuality and Team Performance: Understanding the Impact of Variety of Practices. Journal of Global Information Technology Management 9(1), 4–23 (2006)
16. Mark, G., Gonzalez, V.M., Harris, J.: No Task Left Behind? Examining the Nature of Fragmented Work. In: Proc. CHI 2005, pp. 321–330 (2005)
17. Pickering, C., Wynn, E.: An Architecture and Business Process Framework for Global Team Collaboration. Intel Technology Journal 8(4), 373–382 (2004)
18. Whittaker, S., Sidner, C.: Email Overload: Exploring Personal Information Management of Email. In: Proc. CHI 1996, pp. 276–283 (1996)
19. Whittaker, S., Bellotti, V., Gwizdka, J.: Email in Personal Information Management. Communications of the ACM 49(1), 68–73 (2006)
A Classification Scheme for Characterizing Visual Mining Elaheh Mozaffari and Sudhir Mudur Concordia University, Computer Science & Software Engineering, 1515 St. Catherine St. West, Montreal, Quebec, Canada {e_mozafa,mudur}@cs.concordia.ca
Abstract. Visual mining refers to the cognitive process which integrates the human in the analysis of information when using interactive visualization systems. This paper presents a classification scheme which provides a user-centered representation of the goals and actions that a user performs during the visual mining process. The classification scheme has been developed using content analysis of published literature containing precise descriptions of different visual mining tasks in multiple fields of study. There were two stages in the development. First, we defined all the sub-processes of the visual mining process. Then we used these sub-processes as a template to develop the initial coding scheme prior to utilizing specific data from each of the publications. As analysis proceeded, additional codes were developed and the initial coding scheme was refined. The results of the analysis are represented in the form of a classification scheme of the visual mining process. The naturalistic methods recommended by Lincoln and Guba have been applied to ensure that the content analysis is credible, transferable, dependable and confirmable. Keywords: Visual mining, large dataset analysis, human information behaviour.
1 Introduction In today’s applications, data are becoming available in large quantities. Fields as diverse as bioinformatics, geophysics, astronomy, medicine, engineering, meteorology and particle physics are faced with the problems of making sense out of exponentially increasing volumes of available data [1]. Therefore, one of our greatest challenges is to take advantage of this flood of information and turn raw data into information that is more easy to grasp and analyze. Over the years, a large number of interactive visualization systems have been developed, all claiming to help users analyze, understand and gain insight into the large quantity of available data through appropriate transformations of the raw data into visual representations. We refer to the human analytical process that uses such visually represented information as being the Visual Mining (VM) process. It concerns the cognitive process which integrates the human factor during the course of mining and analyzing information through the visual medium. It contributes to the visual discovery of patterns which form the knowledge required for informed decision making. G. Salvendy, M.J. Smith (Eds.): Human Interface, Part II, HCII 2011, LNCS 6772, pp. 46–54, 2011. © Springer-Verlag Berlin Heidelberg 2011
Purely from a technology perspective, there are many studies which have focused on techniques and tools for building visualization systems. Also, the importance of understanding users' workflows and practices has been recognized by many researchers [2-4]. Jim Gray points out that without integration of users' workflows and interactions with the information, even the best system will fail to gain widespread use [5]. However, there are to date no reports on studies from the perspective of user behavior in visual mining of large data sets. To understand users of large datasets while they perform visual mining, studies of users' information behaviours are critical. Information behavior is defined as: "The totality of human behavior in relation to sources and channels of information, including both active and passive information-seeking, and information use" [6]. However, as previously mentioned, studies with a specific focus on scientists' information behaviour (how they look for required information and actually use it) in the visual mining process are rare. The study reported in this paper answers this call, and aims to improve our understanding of the information behaviors of users during the process of visual mining of large datasets. The rest of the paper is organized as follows. In section 2 we review related studies and identify the problem. In section 3 we describe and justify our choice of methodology for addressing this problem. Section 4 includes a description and justification of our method of choosing visual mining case study samples and the analysis of user information behavior in visual mining in these samples. The results of this research are presented in section 5. Section 6 concludes the paper and discusses potential future work.
2 Background

Information behaviour has been the focus of much research in the last few decades in the field of library and information science. Highlights of the studies on information behaviour include Wilson's (1981) model of information-seeking behaviour [7], Dervin's (1983) sense-making theory [8], Ellis's (1989 and 1993) behavioural model of information seeking strategies [9, 10], Kuhlthau's (1991) model of the stages of information seeking behaviour [11], Belkin's (1993) characterization scheme of information-seeking [12] and Wilson's (1997) problem solving model [13]. The studies presented above are, however, inadequate with regard to their suitability for representing users' information behavior in the VM process. These studies cannot completely model user information behavior in visual mining. Their sole goal is to describe the information-seeking activity, the causes and consequences of that activity, or the relationships among stages in information-seeking behavior [14]. For example, the model proposed by Belkin et al. represents dimensions of information-seeking behaviors in information retrieval systems. Information seeking is one of the sub-processes of the VM process (as we shall explain in more detail in section 3); therefore it cannot completely describe the VM process. In addition, the studies done with library patrons focus on user tasks that are perhaps learned behaviors due to their prior knowledge of how libraries work. They tend to ask questions that they know can be answered. Visualizations might support a different way of asking questions and getting answers [15].
This paper presents a study that began with the aim of extending our understanding of the interdisciplinary process of visual mining, and in doing so looked to strengthen and improve our understanding of users' information behavior in the visual mining process. The study has yielded a classification scheme for characterizing visual mining. Such a classification scheme has many different applications: supporting requirements analysis in the engineering of interactive visualization software; studying and assessing different tasks which typically occur in visual mining; improving the functionality and interface design of newer interactive visualization systems; and, by providing a system-independent taxonomy of tasks, evaluating and classifying existing interactive visual mining systems based on what they support.
3 The Sub-processes of Visual Mining

Today, many different groups around the world are undertaking research on visualization for data mining and analysis in order to effectively understand and make use of the vast amount of data being produced. Different terms are used to describe this process: visual data mining [16, 17], visual exploration [18] and visual analytics [19], to name but a few. In addition, there appears to be some variation in the understanding that people have of the process, even under the same term. Niggemann [20] defined visual data mining as visual representation of data close to the mental model. Ankerst [21] considered visual data mining a step in the knowledge discovery process which utilizes visualization as a communication channel between the computer and the user. Visual analytics is defined as an iterative process that involves information gathering, data preprocessing, knowledge representation, interaction and decision-making. In order to gain further insight, it integrates methods from knowledge discovery in databases, statistics and mathematics, together with human capabilities of perception and cognition [22]. From these definitions, one common theme can be recognized: they all rely on the human visual channel and take advantage of human cognition. They also emphasize three aspects: task, visualization and process. From the above approaches it may be noted that in VM, data visualization takes place either before or after a data mining algorithm, or perhaps even independently. For the purposes of our research, however, we will focus on the human involvement in the visual data exploration process, which utilizes human capabilities to perceive, relate and conclude. We consider VM to be a hypothesis formation process that primarily uses the visual medium. Such visualization allows the user to interact directly with visually represented aspects of the data, analyze, gain insight and perhaps even formulate a new hypothesis. Later on, the user can evaluate the best possible hypotheses and make a judgment based upon them. In fact, this visual information exploration process helps to form the knowledge for informed decision making. In order to identify analysis tasks and human information interactions in the VM process, we first look at how analysis works and then extend its sub-processes to the context of VM. The analytical process itself is both structured and disciplined. Analysts are usually asked to perform several different types of tasks, such as assessing, forecasting and developing options. Assessing requires the analyst to
describe their understanding of the present world around them and explain the past. Forecasting requires that they estimate future capabilities, threats, vulnerabilities, and opportunities. Finally, developing options requires them to establish different optional reactions to potential events and assess their effectiveness and implications. The process begins with planning. The analyst must determine how to address the issue, what resources to use, and how to allocate time to various parts of the process. Then, they must gather relevant information and evidence in order to relate them to their existing knowledge. Next, they are required to generate multiple candidate explanations in the form of hypotheses, based on the evidence and assumptions, in order to reach a judgment about which hypothesis can be considered the most likely. Once conclusions have been reached, the analyst broadens their way of thinking to include other explanations that were not previously considered and provides a summary of the judgments they have made [19]. Building upon the above description of the analytical process, we defined the sub-processes of VM as follows:
1. The user initiates the VM process by planning how to address the issue, what resources to use and how to allocate time to various parts of the process to meet deadlines. The next step is to gather all relevant information by seeking information through searching, browsing, monitoring and generally being aware [23].
2. Searching refers to active attempts to answer questions, look for a specific item or develop understanding around a question or topic area. Browsing is an active yet undirected form of behavior. For example, when performing physical acts such as 3D navigation tasks or scrolling/panning, the user has no special information need or interest, but becomes actively exposed to possible new information. Monitoring is a directed and passive behavior. The user does not feel a pressing need to engage in an active effort to gather information, but may be motivated to take note of any expected or unexpected information. Also, when the user has a question in mind but is not specifically acting to find an answer, they take note of any relevant information that appears. Being aware is a passive, undirected behavior and is similar to browsing, except that the user could locate information or data unexpectedly.
3. The next step in the VM process is to relate the findings with the knowledge that is hidden in the expert's mind.
4. Based on the findings, the user then generates multiple candidate explanations in the form of hypotheses. By applying analytical reasoning, the user can use their prerogative to either confirm or reject any hypothesis and formulate a judgment about which is the most relevant.
5. Once conclusions have been reached, the user broadens their thinking to include other possible explanations that were not previously considered. Then the user summarizes the analytical judgment either as an assessment, an estimation or an evaluation of options, depending on the goal.
6. As the concluding step, the user usually creates a product that conveys the analytical judgment in the form of reports, presentations or whatever other form of communication is deemed appropriate.
4 Method

In this section, we describe our methodology for characterizing the visual mining process. But first, we provide the justification for our choice of methodology. Surveys and interviews are the most common research methods for studying users' information behavior [24]. In a typical user study or survey, the users' motivation, knowledge and expertise considerably influence user performance and thus the final conclusions. Of course, using domain experts provides more realistic results [25]. However, it is not easy to recruit enough participants (domain experts) for interviews and surveys in this type of study, nor is it possible to have access to them for any extended period of time, because most of the experts are distributed across different external institutions. Therefore we turned to scientific publications, which in general clearly record the behavior of experts engaged in the visual mining process and, equally importantly, are peer reviewed. We adopted the qualitative direct content analysis approach [26] to reveal the visual mining behavior of scientists from such publications. Qualitative content analysis is an unobtrusive method which uses non-living forms of data, generally categorized as texts. It is well established that one kind of text that can be used for qualitative data inquiry in content analysis is official publications [27, 28]. The advantages of working with prior published works are: 1) the data are stable and non-changing; 2) the data exist independent of the research -- the data are not influenced through the researcher's interaction, as is the case with interviews, and already exist in the world regardless of the research currently being done [29]; and 3) they provide information about procedures and past decisions that could not be observed by researchers [27]. We obtained the information on work practices through analysis of the end results of research as described in published scientific literature. The naturalistic methods recommended by Lincoln and Guba were applied to ensure that the content analysis was (to the extent possible) credible, transferable, dependable and confirmable.
5 Case Study Samples

For our content we initially chose sixty-one published papers primarily concerned with reports on the effective use of visualization for analysis and mining of large datasets. The chosen papers were from four different domains, namely medicine, bioinformatics, epidemiology and geoscience. Each paper was studied, and those which did not report actual case studies by experts were excluded from further consideration. The final numbers of papers containing case studies that described interaction with visual information in each domain are given in Table 1. Every one of these papers was analyzed and used in the information interaction coding process described next.
Table 1. Numbers of papers analyzed in each domain for qualitative content analysis

Context         Number of papers
Bioinformatics  8
Medical         9
Geoscience      11
Epidemiology    3

6 Analysis of Case Studies
The sub-processes of visual mining explained in section 3 were used to develop the initial coding scheme prior to data analysis [26]. As analysis proceeded, additional codes were developed and the initial coding scheme was revised and refined. Coding of the data took place in multiple iterations. (1) Initial coding of each paper began with manual annotation of the paper by reading the case studies line by line, to highlight each relevant concept of human interaction and label it. Subsequent iterations of reading and coding of each paper, in constant comparison with previous papers and coding, allowed categories and themes to emerge. We used NVivo 9, software that helps work with unstructured information such as documents, surveys, audio, video and pictures in order to assist better decision-making [30]. NVivo 9 allowed us to code relevant concepts of VM in the articles and assign them to nodes, which can be hierarchical (tree nodes) or non-hierarchical (free nodes) as required. The relevant concepts of visualization were first coded as free nodes. Then, after coding a few articles and comparing them with previous ones, they were either modified to tree nodes, renamed or deleted as required. Coding with NVivo 9 was convenient since it allowed adding, renaming, deleting or merging of codes as required; it did not, however, automate the coding process. (2) Consistent coding was addressed by including several iterations of coding over a period of a year. (3) The peer debriefing technique was used to confirm the interpretations and coding decisions. The peer-debriefer, a disinterested observer, analyzed the research materials and questioned the data, meaning, and interpretation. She was a colleague with a PhD in Computer Science who was not involved in the study. She had knowledge about qualitative research and the phenomenon under investigation. The interactions between the researcher and the peer-debriefer were also included in the audit trail. She also acted as the auditor. (4) The coding changes were maintained by creating static models in NVivo 9 for future reference. In addition, ideas, discussions, interpretations and decisions were recorded in memos in NVivo 9 to keep track of the development of the analysis. These allowed an audit trail to be maintained. (5) An external auditor examined the audit trail. (6) Dynamic models illustrating code relationships were used to visualize, explore and present connections and patterns in the data. (7) At the end, member checking, which is the most important action in a naturalistic inquiry [31], was conducted to test the results of the analysis with a geographer and a research fellow in biomedical engineering. They confirmed the results and verified the interpretations.
7 Results

The above-mentioned process led us to formulate a set of criteria which characterizes the VM process. Table 2 presents these criteria and their possible values as the task model for visual mining.

Table 2. Classification Scheme of VM

Criteria             Values
Goal                 assess, estimate, develop options
Information seeking  searching, browsing, monitoring, being aware
Retrieval            pattern, hypothesis, judgment
In the resulting classification scheme, the user's goal in visual mining is to understand the current situation and explain the past (assess), to estimate future capabilities (estimate), or to develop different possible options (develop options). In order to accomplish these goals, the user must gather relevant information and evidence through active or passive information-seeking activities which, as already described, are classified as searching, browsing, monitoring and being aware. The item(s) retrieved during these activities can be a pattern, a hypothesis or a final analytical judgment. Finally, in order to further validate the classification scheme, typical real-world visual mining tasks were extracted and listed from the reviewed literature. All extracted tasks were re-described using the VM classification scheme in order to validate the model. Then, to ensure that further refinement was not needed, visual interaction tasks were extracted from ten new papers, all containing reports of visualization case studies. All these tasks were comprehensively described by the VM task model. This process was repeated again with an additional five papers. Since no changes were required in the classification scheme, we concluded that our final classification scheme was stable and no further refinements were needed.
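To illustrate how the scheme can be applied mechanically, the three criteria and their values from Table 2 could be encoded as simple enumerations, with each extracted task re-described as a triple. This is a sketch in Python; the sample task wording below is invented for illustration and is not taken from the analyzed papers.

from enum import Enum

class Goal(Enum):
    ASSESS = "assess"
    ESTIMATE = "estimate"
    DEVELOP_OPTIONS = "develop options"

class InformationSeeking(Enum):
    SEARCHING = "searching"
    BROWSING = "browsing"
    MONITORING = "monitoring"
    BEING_AWARE = "being aware"

class Retrieval(Enum):
    PATTERN = "pattern"
    HYPOTHESIS = "hypothesis"
    JUDGMENT = "judgment"

# A visual mining task re-described with the scheme
# (hypothetical example task):
task = {
    "description": "Scan a choropleth disease map for unexpected clusters",
    "goal": Goal.ASSESS,
    "information_seeking": InformationSeeking.BROWSING,
    "retrieval": Retrieval.PATTERN,
}
print({k: (v.value if isinstance(v, Enum) else v) for k, v in task.items()})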
8 Conclusion and Future Work

To understand users of large datasets while they perform VM, studies of users' information behaviours are critical. However, studies that focus on scientists' information behaviour in the visual mining process are rare. To this end, this paper has presented a summary of a study concerned with human interactions with visually represented information, which aimed to improve our understanding of the information behaviors of users of large datasets. By carrying out a trustworthy qualitative content analysis procedure using published papers reporting visual information interaction tasks, we have found that user behaviours in this context can be differentiated along
a small set of three criteria. These three criteria were represented in the form of a classification scheme of the visual mining process. This classification scheme makes it possible to describe real-world visual mining tasks, which play an important role in the analysis of large datasets. In our future work we plan to use these criteria in modelling user behavior through behavioral strategies, validating these strategies against known case studies in different domains, and applying them in the comparative evaluation of visualization systems and in the design of newer systems.
References
1. Mann, B., Williams, R., Atkinson, M., Brodlie, K., Williams, C.: Scientific Data Mining, Integration and Visualization. Report of the workshop held at the e-Science Institute (2002), http://www.nesc.ac.uk/talks/sdmiv/report.pdf
2. Atkinson, M., De Roure, D.: Data-intensive Research: Making best use of research data. e-Science Institute (2009)
3. Van de Sompel, H., Lagoze, C.: All Aboard: Toward a Machine-Friendly Scholarly Communication System. In: Hey, A.J.G., Tansley, S., Tolle, K. (eds.) The Fourth Paradigm: Data-intensive Scientific Discovery, pp. 193–199. Microsoft Research, Redmond (2009)
4. Borgman, C.L.: Scholarship in the Digital Age: Information, Infrastructure, and the Internet. MIT Press, Cambridge, MA (2007)
5. Gray, J.: Scientific Data Management in the Coming Decade. SIGMOD Record 34(4), 34–41 (2005)
6. Wilson, T.D.: Human information behavior. Informing Science 3, 49–55 (2000)
7. Wilson, T.D.: On user studies and information needs. Journal of Documentation 37(1), 3–15 (1981)
8. Dervin, B.: An overview of sense-making research: concepts, methods and results to date. International Communications Association Annual Meeting, Dallas, Texas (1983)
9. Ellis, D.: A behavioural approach to information retrieval design. Journal of Documentation 46, 318–338 (1989)
10. Ellis, D., Cox, D., Hall, K.: A comparison of the information seeking patterns of researchers in the physical and social sciences. Journal of Documentation 49, 356–369 (1993)
11. Kuhlthau, C.C.: Inside the search process: information seeking from the user's perspective. Journal of the American Society for Information Science 42, 361–371 (1991)
12. Belkin, N.J., Marchetti, P.G., Cool, C.: Braque: Design of an Interface to Support User Interaction in Information Retrieval. Information Processing and Management 29, 325–344 (1993)
13. Wilson, P.: Information behavior: An inter-disciplinary perspective. In: Vakkari, P., Savolainen, R., Dervin, B. (eds.) Information Seeking in Context, pp. 39–50. Taylor Graham, London (1997)
14. Wilson, T.D.: Models in information behaviour research. Journal of Documentation 55, 249–270 (1999)
15. Morse, E.L.: Evaluation of Visual Information Browsing Displays. PhD Thesis, University of Pittsburgh (1999)
54
E. Mozaffari and S. Mudur
16. Keim, D.A.: Information Visualization and Visual Data Mining. IEEE Transaction on Visualization and Computer Graphics 8, 1–8 (2002) 17. Simoff, S.J., Michael, H., Böhlen, M.H., Mazeika, A.: Visual Data Mining - Theory, Techniques and Tools for Visual Analytics. Springer, Heidelberg (2008) 18. Tominski, C. Event-Based Visualization for User-Centered Visual Analysis. Ph.D. thesis, University of Rostock, Rostock, Germany (2006) 19. Thomas, J.J., Cook, K.A.: Illuminating the Path: The Research and Development Agenda for Visual Analytics. IEEE press, New York (2005) 20. Niggemann, O.: Visual Data Mining of Graph-Based Data. Ph.D. Thesis, University of Paderborn (2001) 21. Ankerst, M.: Visual Data Mining. Dissertation (Ph.D. thesis). Faculty of Mathematics and Computer Science, University of Munich (2000) 22. Keim, D.A., Mansmann, F., Schneidewind, J., Thomas, J., Ziegler, H.: Visual Analytics: Scope and Challenges. LNCS. Springer, Heidelberg (2008) 23. Bates, M.J.: Toward an Integrated Model of Information Seeking and Searching. In: Fourth international Conference on Information Needs, Seeking and Use in Different Contexts, vol. 3, pp. 1–15 (2002) 24. McKechine, L.E.F., Baker, L., Greenwood, M., Julien, H.: Research method trends in human information literature. New Review of Information Behaviour Research, 3, 113– 125 (2002) 25. Plaisant, C.: The challenge of information visualization evaluation. In: Proc. of the Conference on Advanced Visual Interfaces (AVI). ACM, NY (2004) 26. Kyngas, H., Vanhanen, L.: Content analysis (Finnish). Hoitotiede 11, 3–12 (1999) 27. Patton, M.Q.: Qualitative research and evaluation methods, 3rd edn. Sage Publications, Thousand Oaks (2002) 28. Bhowmick, T., Griffin, A.L., MacEachren, A.M., Kluhsmann, B., Lengerich, E.: Informing Geospatial Toolset Design: Understanding the Process of Cancer Data Exploration and Analysis. Health & Place 14, 576–607 (2008) 29. Hesse-Biber, S.N., Leavy, P.: The practice of qualitative research. Sage publications, Thousand Oaks (2006) 30. QSR International, http://www.qsrinternational.com/ news_whats-new_detail.aspx?view=367 31. Lincoln, Y.S., Guba, E.G.: Naturalistic inquiry. Sage Publications, Inc., Beverly Hills (1985)
Transforming a Standard Lecture into a Hybrid Learning Scenario Hans-Martin Pohl1, Jan-Torsten Milde2, and Jan Lingelbach2
1 University of Applied Sciences, Heinrich-von-Bibra-Platz 1b, 36037 Fulda, Germany 2 University of Applied Sciences, Marquardstrasse 35, 36039 Fulda, Germany
[email protected], {jan-torsten.milde,jan.lingelbach}@informatik.hs-fulda.de
Abstract. We describe the successful transformation of a traditional learning setting, a standard lecture, into a hybrid learning arrangement. Based on the 3C model of Kerres and de Witt, the lecture has been extended to integrate exercises. Students are motivated to work in smaller groups. In order to allow students to work according to their own pace and motivation, the learning material is distributed using the e-Learning platform. This material includes high-quality video recordings of the lectures. The evaluation of this transformation process shows evidence that students profit from the extended hybrid learning arrangement. Keywords: Hybrid learning scenario, interactive slide presentation, blended learning.
1 From Standard Lecture to a Hybrid Learning Scenario
E-Learning has been very successful in recent years. This is especially true in the context of higher education, where many universities have set up e-Learning systems to support their teachers and students. While these systems provide a rich set of possibilities to teachers, it can be seen that many (if not most) of them do not integrate e-Learning into their teaching settings. At this point in time, the didactics of e-Learning scenarios needs to get more attention. Using e-Learning or blended learning as a central part of the general teaching methodology has a substantial effect on the design of both teaching and learning style. In this paper we focus on the transformation problems and outcomes that arise when using new e-Learning-based teaching methods and integrating new media into higher education teaching (see [Ker05]). The transformation process leads to a more open learning setting, which we expect to lead to higher learning motivation and better learning outcomes (see [Sch04], [Nie04], [Sch09]). We will provide observations on how the didactics have been influenced by each of the transformation steps taken (see [Ojs06], [Car02], [Bro89], [Lav91]).
In our example, a traditional introductory course at bachelor level has been transformed into a blended learning arrangement. The course "Introduction to Electrical Engineering" of the winter term 2009/2010 is a first-year course (see [AISOP]). It is structured into a lecture and additional exercises, which were taught separately from the lecture. The students in this introductory course have a very inhomogeneous background. Especially their knowledge of mathematics and physics varies substantially. This leads to a number of problems. While it was possible for most of the students to keep up with the content of the lecture, many had problems when it came to doing the exercises, which in turn formed the basis for the final written exam. Some students also had problems with the complexity of the lecture and were unable to structure the content to their needs, making it hard for them to solve the exercises.
1.1 The 3C Model as a Didactical Framework for the Design of Blended Learning
In most blended learning arrangements, lectures or seminars are combined with the application of new media. This technical support is usually based on readily available learning management systems (LMS). In the optimum case, the combination obtains the best of both worlds. On the one hand, there is direct contact between teachers and students, with the opportunity to interact directly. On the other hand, the learner-centered view is stressed: offering electronic materials increases the students' self-determination, as they can choose their own learning paths and determine their learning speed independently [Dem07]. Through test scenarios and checks on learning progress, the teacher can intervene in a steering and accompanying role. Hence, this corresponds to the frequently demanded "shift from teaching to learning". However, the main question in blended learning attempts is always that of the right mixture of online study and face-to-face study. The variety of electronic support must also be used carefully: the mere supply of digital materials is in the rarest cases a successful didactic implementation of a concept [Bac01]. For Driscoll [Dri02], blended learning scenarios are a good entry into learning with new media. In particular, the media competence of the learners, which mostly still has to be developed, must be taken into consideration. The design of such teaching arrangements is always a question of the teacher's concept. As Kerres and de Witt wrote: "In our interpretation, blended learning basically refers to (at least) the mix of different:
− didactical methods (expository presentations, discovery learning, cooperative learning, …) and
− delivery formats (personal communication, publishing, broadcasting, …)" [Ker03]
The 3C model by Kerres and de Witt [Ker03], which underlies this article, creates a framework for exactly these questions: it describes the components of the learning arrangement and their weighting, and determines the suitable methods and formats in a second step.
It is based on three components: content, construction, and communication. Content is all the material the learner should be able to recall. All types of media are possible, e.g., literature, scripts, slides, pictures, animations, podcasts, video podcasts, and lecture recordings. The content is necessary to allow the learner to internalize the externalized knowledge and construct his or her own knowledge. The construction component is necessary when the information from the material should become available for actions: the learner must reflect on the learned facts and fit them into the whole issue. For an integrated learning arrangement, the communication component is essential, e.g., to discuss learning results with other students, to reflect on the facts and their practical reuse in contact with the teacher, or to get a different view through interaction with others. All three components can be realised in a number of different media types, including synchronous and asynchronous online presentations. In a traditional teaching setting, content will be presented by the teacher (e.g., as a slide presentation) in a non-communicative setting (the teacher is presenting, the students are listening) with a very low fraction of construction time (students will repeat content for themselves after the presentation). In our transformation process we attempted to change this situation to make teaching more communicative and constructive. The slide presentations in the lecture have been changed to a more interactive style of presentation to support the content component. Exercises have been integrated into the lecture, thus closing the temporal gap between content presentation and practical application to enhance the construction component. The complete lecture has been video recorded and is provided as an online video stream to the students. In order to support the students with their exercises, sample solutions have been worked out and have also been video recorded. The complete learning material of the lecture is located on the central e-Learning platform of the university and can be used by students asynchronously to the lectures and exercises. Furthermore, communication tools were provided there to promote the communication component.
2 Interactive Presentation
The interactive slide presentation has been realised using the "Oxford Paper Show" system (see [papershow]). This system allows coupling traditional slide projection with a computer-based presentation. The underlying technology is based on special paper with extremely tiny control marks printed onto it. These marks are recorded and processed by a camera integrated into a special pen, which in turn is wirelessly connected to the computer using Bluetooth. In our arrangement we copied all slides onto the paper and created scans of these slides. A physical version of the slides was given to the teacher, and a virtual version of the slides existed on the computer. As an effect of this technology, teachers do not have to change their style of presentation. The slides can be selected with the digital pen and will be projected by the computer. It is possible to write onto the (paper) slides. These comments will be shown on the virtual slides in real time. Every slide transaction is therefore captured
and can be recorded. This allows an asynchronous playback of the slide presentation. In addition, all comments on the slides are stored and can later be distributed electronically. A noticeable effect of the integration of this new technology into teaching is the calm presentation style. The teacher explains the content to the students and simultaneously writes down comments. As such, he remains seated and does not move across the classroom. The students seemed to be much more focused and kept listening to the teacher. Instead of writing down everything the teacher explained, more individual notes were taken.
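The capture-and-replay idea behind this setup can be made concrete in a few lines of code. The sketch below is only an illustration under assumptions: Paper Show's actual pen protocol and file format are not described here, so the event structure, field names, and the draw callback are hypothetical stand-ins for whatever the real system uses.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class PenEvent:
    """One decoded pen sample (hypothetical format): which slide,
    where on the paper, and when during the lecture."""
    t: float      # seconds since the start of the lecture
    slide: int    # slide identified via the printed control marks
    x: float      # normalized paper coordinates in [0, 1]
    y: float

class SlideSession:
    """Records every slide transaction so it can be replayed later."""
    def __init__(self):
        self.start = time.monotonic()
        self.events: list[PenEvent] = []

    def on_pen_sample(self, slide: int, x: float, y: float):
        # Called for each sample arriving from the pen over Bluetooth.
        self.events.append(PenEvent(time.monotonic() - self.start, slide, x, y))

    def save(self, path: str):
        with open(path, "w") as f:
            json.dump([asdict(e) for e in self.events], f)

def replay(path: str, draw):
    """Re-draw the comments with the original timing, enabling the
    asynchronous playback described above; draw(slide, x, y) is a
    placeholder for the rendering backend."""
    with open(path) as f:
        events = [PenEvent(**e) for e in json.load(f)]
    last = 0.0
    for e in events:
        time.sleep(e.t - last)  # reproduce the teacher's pacing
        draw(e.slide, e.x, e.y)
        last = e.t
```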
3 Integration of Exercises
In order to consolidate the lecture's content, exercises were integrated into the lecture. The exercises were given to the students at the end of each lecture. After 20 minutes, a sample solution was presented by the teacher using the Paper Show technology. Further exercises were given, which were to be solved as part of the students' self-study phase. If requested, the solutions for these exercises were presented in the following lecture. Integrating the exercises resulted in a deeper understanding of the lecture's content. Students took part in the lecture more actively, and misunderstandings were identified immediately. Another effect of the exercise integration is a reduction of the lecture's pace. The teacher focused on the most relevant topics, thus reducing the complexity of the lecture. The students profited from the interactive presentation style of the sample solutions. The exercises are relatively complex, and the stepwise explanation helped them to comprehend the solutions. If needed, the presentation could be replayed multiple times. To interact and communicate with other students outside the lectures, the students very often used asynchronous methods like threaded discussions. It was interesting to see that the standard learning management system was not used for this interaction. Instead, a server outside the university, run by student representatives of the Department of Applied Computer Science, was used.
4 Video Recordings of the Lectures
In order to facilitate an intensive self-study phase, all lectures have been video recorded and were put onto the e-Learning platform for individual download (see [AISOP]). In addition, a streaming server was set up. This made it possible to watch the lectures even on a low-bandwidth connection. It only takes about two hours of post-processing (mainly automatic transcoding) until the lectures are available to the students. For the recording of the lectures, a mobile recording system had been designed and set up by the central e-Learning laboratory of the university. The system has been built around a "Sony AnyCast Station", a mobile video recording system allowing the recording and mixing of up to six video channels. We attached two cameras to the
system. These cameras are fully remote controllable. Using two cameras provided us with higher flexibility in the camera work of the recording, resulting in a more "interesting" video providing a higher level of immersion. In order to capture the signal of the presentation computer, a splitter had been attached to the system, thus making it possible both to show the signal on the data projector and to record it in high quality. The sound has been recorded using wireless microphones. The recording, video mixing, and camera control are done in real time by a single person. We trained a couple of student tutors, who are now able to record lectures on their own. As the recording system is designed to be mobile, it can be used in any standard lecture class. Setting up the system takes about 15 minutes for two people. The students used the video recordings very intensively. The videos were watched almost around the clock. Quite often only smaller parts were selected and were watched multiple times. The recordings also had positive effects on the lectures. Students explained that it was comforting to them that they could repeat topics, even when they did not understand a topic during lecture time. The attendance rate did not drop, so students still wanted to take part in the "live" lecture. A similar effect was noticed by the teacher. It took him little time to get used to the recording situation. Knowing that the content was available for replay, he was able to refer to the video if needed. That reduced the need to repeat things during the lecture, leaving more time for the exercises.
5 Presentation on the e-Learning Platform
In addition to the video recordings and the slide presentations, further material was put onto the central e-Learning platform of the university. This material included the texts of the exercises and a large number of exercises from past years. The students rarely used this additional material and rather stuck to the content that was of immediate interest to them. The e-Learning platform provided means of communication, such as forums and chats. These were used by the students for organisational purposes only. A content-related discussion did not take place. The e-Learning platform formed the basis for the self-paced study style of the students. In earlier years, students tended to learn "on demand", starting to learn about two weeks before the final exam was due. With the platform online and the material available, the students started to work in a more continuous way.
6 Conclusions
At the end of the term, the students were asked to fill out a questionnaire. A total of 78 persons attended the course. Of these, 55 persons took part in the evaluation. The questionnaire is standardised, which makes automatic processing of the data possible. The evaluation is archived for documentation purposes. Therefore, long-term comparison becomes possible and will be performed during the next years.
The analysis of the evaluation showed a high acceptance rate of the used methodologies and technologies. More than 55% of the participants took advantage of the new media. Almost 70% observed positive learning effects when using the e-Learning material for their course preparation and course repetition (see Fig. 1).
Fig. 1. Evaluation results
Switching from a standard lecture to a hybrid learning scenario has resulted in a number of positive effects. Most of the participants explained that e-Learning played an important role in their learning success. The students liked the online support and would like to see it extended in the future. As shown in Figure 2, especially online exercises and audio and video recordings were requested. This is a very positive outcome, as the self-activation of the students was one of the central targets of this transformation.
Fig. 2. Requested forms of online support
Offering an open learning situation to the students leads to a more intensive learning experience and results in a deeper understanding of the content. The integration of exercises into the lectures provided a transfer to the practical application of the theoretical content. The technologies used allowed preserving a classical teaching/presentation style while transforming the material into the digital world. The teacher was able to focus on the lecture and was not distracted by technological problems.
References
[Ker05] Kerres, M.: Didaktisches Design und E-Learning. In: Miller, D. (Hrsg.) E-Learning - Eine multiperspektivische Standortbestimmung, pp. 156–182. Haupt Verlag (2005)
[Sch04] Schulmeister, R.: Didaktisches Design aus hochschuldidaktischer Sicht - Ein Plädoyer für offene Lernsituationen (Stand: March 4, 2010), http://www.zhw.uni-hamburg.de/pdfs/Didaktisches_Design.pdf
[Sch09] Schulmeister, R.: Studierende, Internet, E-Learning und Web 2.0. In: Apostolopoulus, N., et al. (Hrsg.) E-Learning 2009 - Lernen im digitalen Zeitalter, pp. 129–140. Waxmann (2009)
[Ojs06] Ojstersek, N., Heller, I., Kerres, M.: E-Tutoring. Zur Organisation von Betreuung beim E-Learning. In: Arnold, R., Lermen, M. (Hrsg.) eLearning-Didaktik, pp. 107–116. Schneider Verlag Hohengehren (2006)
[Nie04] Niegemann, H.M., et al.: Kompendium E-Learning. Springer, Heidelberg (2004)
[papershow] http://www.papershow.com/de/index.asp (July 5, 2010)
[AISOP] E-Learning Plattform der Hochschule Fulda (July 5, 2010), http://elearning.hs-fulda.de/aisop/
[Bro89] Brown, J.S., Collins, A., Duguid, P.: Situated Cognition and the Culture of Learning. Educational Researcher 18, 32–42 (1989)
[Lav91] Lave, J., Wenger, E.: Situated Learning: Legitimate Peripheral Participation. Cambridge University Press, New York (1991)
[Car02] Carman, J.M.: Blended Learning Design: Five Key Ingredients (July 5, 2010), http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.3197&rep=rep1&type=pdf
[Pel01] Pellegrino, J.W. (Hrsg.): Knowing What Students Know: The Science and Design of Educational Assessment. National Academy Press, Washington, D.C. (2001), ISBN 9780309072724
[Bac01] Bachmann, G., Dittler, M., Lehmann, T., Glatz, D., Rösel, F.: Das Internetportal LearnTechNet der Uni Basel: Ein Online-Supportsystem für Hochschuldozierende im Rahmen der Integration von E-Learning in der Präsenzuniversität. In: Haefeli, O., Bachmann, G., Kindt, M. (Hrsg.) Campus 2002 – Die Virtuelle Hochschule in der Konsolidierungsphase, pp. 87–97. Waxmann, Münster (2002), ISBN 978-3830911913
[Ate04] Attewell, J., Savill-Smith, C. (Hrsg.): Learning with Mobile Devices – Research and Development. Learning and Skills Development Agency (2004), ISBN 1-85338-833-5
[Dem07] Demetriadis, S., Pombortsis, A.: E-Lectures for Flexible Learning: a Study on their Learning Efficiency. Educational Technology & Society 10(2), 147–157 (2007)
[Foe02] Foertsch, J., Moses, G., Strikwerda, J., Litzkow, M.: Reversing the lecture/homework paradigm using eTEACH web-based streaming video software. Journal of Engineering Education 91(3), 267–274 (2002)
[Wie10] Wiethäuper, H.: E-Learning: Integration von Mediendidaktik und Lerntechnologie in Bildungsprozesse (Stand: January 14, 2011), http://www.uni-marburg.de/fb21/sportwiss/mitarbeiter_seiten/wiethaeuper/elearning/zw_medien_technologie.pdf
[Dri02] Driscoll, M.: Blended Learning: Let’s Get Beyond the Hype (Stand: January 20, 2011), http://www-07.ibm.com/services/pdf/blended_learning.pdf
Designing Web Sites and Interfaces to Optimize Successful User Interactions: Symposium Overview Robert W. Proctor1 and Kim-Phuong L. Vu2
1 Department of Psychological Sciences, Purdue University, West Lafayette, IN, USA 2 Department of Psychology, California State University Long Beach, Long Beach, CA, USA
[email protected],
[email protected]
Abstract. Since the Web became widely available in the mid 1990s, it has come to be used by a range of people for many purposes. Effective user interactions are required for a Web site or product to accomplish its intended goals. Given the user-intensive nature of the Web and the many usability issues associated with performing tasks on the Web and with mobile devices, it is important for designers and researchers to understand issues that relate to how to optimize interfaces for Web design and other systems involving human-computer interaction. This symposium is devoted to issues involved in the design of Web sites and interfaces to promote successful user interactions. Keywords: Information Display, Input Devices, Mobile Devices, Organization of Information, Web Design.
1 Introduction
Since the World Wide Web became widely available in the mid 1990s, it has come to be used by a variety of people for many different purposes, including e-commerce, social networking, data display, information sharing and collaboration, and mobile transactions. Effective user interactions are required for a Web site or product to accomplish its intended goals. Given the user-intensive nature of the Web and the numerous usability issues associated with performing tasks on the Web and with mobile devices, designers and researchers need to understand issues relating to Web design and Web usability. There is often a tendency to pit academicians against practitioners, basic research against applied research, and theoretical knowledge against experiential knowledge. Yet, from our experience, we have found that an approach that emphasizes multiple perspectives and multiple methods is most beneficial for acquiring knowledge and advancing technology [1]. Communication among individuals with various backgrounds, interests, and training is essential for facilitating the development and transfer of knowledge between researchers and practitioners in the domain of human–computer interaction (HCI), among other applied domains. Because the most complete understanding of usability problems arises from combining the insights of practitioners in industry and government with the knowledge of academicians gained from controlled research, we consistently strive to encourage interaction among experts from the different communities.
To that end, in 2005, we edited the Handbook of Human Factors in Web Design [2], in which we articulated the general view described above specifically for Web design, stating: It is our opinion that the handbook should reflect the varied backgrounds and interests of individuals involved in all aspects of human factors and Web design. Consequently, we have made a concerted effort to obtain contributions from a diverse group of researchers and practitioners. The contributors are an international ensemble of individuals from academia, industry, and research institutes. Furthermore, the contributors have expertise in a variety of disciplines. We think that awareness of the wide range of views and concerns across the field is essential for usability specialists and Web designers, as well as for researchers investigating theoretical and applied problems concerning Web use and design. (p. xv) Consistent with this view, the book covered a full range of topics relevant to human factors in Web design, from historical developments and human factors principles in general to specific human factors applications in Web design. These content areas included content preparation for Web sites, search engines and interface agents, issues of universal accessibility, usability engineering, Web applications in academic and industrial settings, information security, and emerging technologies such as wireless communication and e-health. That same year, we also organized a symposium for HCII 2005, “Human Factors Considerations in Web and Application Design”, which highlighted many of the then current issues concerning usability in Web design. Papers presented in the symposium included: “Web-based Presentation of Information: The Top Ten Mistakes and Why They Are Mistakes” [3], “User Search Strategy in Web Searching” [4], “Cross Cultural Web Design” [5], “Understanding Online Consumers: A More Complete Picture” [6], “Web-based Programs and Applications for Business” [7], and “User Interface Design Guidelines” [8]. All of the topics covered in the 2005 handbook and symposium are still relevant today, but technological and societal developments have fueled many changes in Web use since that time. Social networking was in its infancy, mobile computing was being developed, and 4G wireless was not available. Rapid developments in these areas, as well as others, have greatly changed the computing landscape, and all of the developments involve new human factors issues associated with use of the technology. To capture these developments, we have edited a second edition of the handbook, which was recently published [9]. Because most of the topics that were relevant in 2005 continue to be relevant today, many chapters of the second edition provide updated information on those topics. New chapters are devoted to topics that have emerged as important since the first edition. They include: Organization of Information for Concept Sharing and Web Collaboration, Web-Based Organization Models, Web Portals, Human Factors of Online Games, Accessibility Guidelines and the ISO Standards, Use of Avatars, and Mobile Interface Design for M-Commerce. Much as the 2005 symposium was intended as an adjunct to the first edition of the handbook, this 2011 symposium is intended to be a companion to the recently published second edition, highlighting a subset of usability topics of interest to
designers and researchers interested in the Web and HCI. Fitting with our emphasis on communication among academicians and practitioners, the papers represent contributions from persons working in industry and academia.
2 Overview of Symposium
This symposium is devoted to issues involved in organization and display of information for HCI in general and Web design in particular. The first two papers address user interactions on e-commerce sites. Najjar presents “Advances in Ecommerce User Interface Design,” in which he emphasizes that interface design is not a static field but a dynamic one in which possible interface features to incorporate into designs are continually changing. He describes new user interface features of which e-commerce designers may want to take advantage when designing Web sites. Examples are provided for many features, including social media connections, automated product recommendations, contextual product visualization, flash sales, and mobile commerce. When performing e-commerce transactions, personal information is transmitted through the Web. The topic of privacy is addressed by Nguyen and Vu in their paper, “Does Privacy Information Influence Users’ Online Purchasing Behavior?” In their study, users indicated whether they would make a purchase of an inexpensive or expensive item at different Web sites. Privacy information was made salient by a search engine called PrivacyFinder, but this information did not appear to influence users’ purchasing behaviors. Pappas and Whitman, in their paper “Riding the Technology Wave: Effective Dashboard Data Visualization,” discuss how to optimize data dashboard displays, identifying the types of data that are best represented in particular formats and techniques for displaying multiple visualizations. They emphasize that the choice of what to display and how to display it depends on the needs of the particular user. In “A Conceptual Model of Using Axiomatic Evaluation Method for Usability of the Consumer Electronics,” Guo, Proctor, and Salvendy describe concepts from Axiomatic Design theory that are based in information theory, and they discuss how it can be applied as a usability evaluation method for mobile consumer electronics. An experiment is described in which participants identified more usability problems associated with a cell phone when employing axiomatic evaluation than when using a more traditional usability method. Mobile devices are also of concern to Xu and Bradburn, whose paper, “Usability Issues in Introducing Capacitive Interaction into Mobile Navigation,” focuses on user interactions with mobile devices. They present an experiment that evaluates the use of capacitive touch sensors that are able to distinguish light and forceful touches, as a possible option for interface design. Xu and Bradburn discuss issues associated with implementation of capacitive touch devices and propose initial guidelines for their use. In the final paper, “Movement Time for Different Input Devices,” Bacon and Vu describe an experiment showing movement times for three input devices commonly used in HCI tasks. Movement time was shortest when the input modality was a button press on a response panel, intermediate when it was a computer mouse, and slowest when it was a touch screen. The authors discuss implications for the design of display-control configurations using these input devices.
This symposium contains both basic and applied knowledge derived from experiments and design experience. Each paper provides a unique contribution to understanding issues for optimizing interfaces and Web sites for human use.
References
1. Proctor, R.W., Vu, K.-P.L.: Complementary Contributions of Basic and Applied Research in Human Factors and Ergonomics. Theor. Iss. in Erg. Sci. (in press)
2. Proctor, R.W., Vu, K.-P.L. (eds.): Handbook of Human Factors in Web Design. Lawrence Erlbaum Associates, Mahwah (2005)
3. Tullis, T.: Web-based Presentation of Information: The Top Ten Mistakes and Why They Are Mistakes. In: HCI International 2005, Human-Computer Interfaces: Concepts, New Ideas, Better Usability, and Applications, vol. 3. Lawrence Erlbaum Associates, Mahwah (2005)
4. Fang, X.: User Search Strategy in Web Searching. In: HCI International 2005, Human-Computer Interfaces: Concepts, New Ideas, Better Usability, and Applications, vol. 3. Lawrence Erlbaum Associates, Mahwah (2005)
5. Rau, P.-P.P., Choong, Y.-Y., Plocher, T.: Cross Cultural Web Design. In: HCI International 2005, Human-Computer Interfaces: Concepts, New Ideas, Better Usability, and Applications, vol. 3. Lawrence Erlbaum Associates, Mahwah (2005)
6. Volk, F., Kraft, F.: Understanding Online Consumers: A More Complete Picture. In: HCI International 2005, Human-Computer Interfaces: Concepts, New Ideas, Better Usability, and Applications, vol. 3. Lawrence Erlbaum Associates, Mahwah (2005)
7. Vaughan, M., Dumas, J.: Web-based Programs and Applications for Business. In: HCI International 2005, Human-Computer Interfaces: Concepts, New Ideas, Better Usability, and Applications, vol. 3. Lawrence Erlbaum Associates, Mahwah (2005)
8. Najjar, L.: Accessible Java-Application User Interface Design Guidelines. In: HCI International 2005, Human-Computer Interfaces: Concepts, New Ideas, Better Usability, and Applications, vol. 3. Lawrence Erlbaum Associates, Mahwah (2005)
9. Vu, K.-P.L., Proctor, R.W. (eds.): Handbook of Human Factors in Web Design, 2nd edn. CRC Press, Boca Raton (2011)
Petimo: Sharing Experiences through Physically Extended Social Networking Nimesha Ranasinghe1, Owen Noel Newton Fernando1, and Adrian David Cheok1,2
1 Keio-NUS CUTE Center, IDM Institute, National University of Singapore, 119613, Singapore 2 Keio University, Hiyoshi, Kohoku-ku, Yokohama City, Kanagawa, Japan {nimesha,newtonfernando,adriancheok}@mixedrealitylab.org
Abstract. This paper presents an experience-sharing platform, Petimo, which consists of two modules, Petimo-World and Petimo-Robot. This system extends the traditional social networking concept into the physical world by incorporating a child-friendly soft robotic toy for an easy and safe social experience. It adds a new physical dimension to social computing and provides extra safety in making friends by physically touching each other's robots. The Petimo system can be connected to any social network, and it provides safety and security for children. Petimo-World shares many basic features with traditional online social networks in order to share personal experiences. Petimo-World stands out from all other virtual worlds with its interesting and sophisticated interactions, such as the visualization of friends' relationships through spatial distribution in 3D space to clearly show the closeness of a friendship, personalized avatars, and the sending of special gifts/emoticons.
1 Introduction
Most of the time, people are able to use a full range of expressions in face-to-face communication: language, expressions, gestures, all the senses (hearing, sight, touch, smell, and taste), and interaction with artifacts and space. However, remote communication has to rely on a more limited range at present: text, sound, image, and video, alone or in combination [1]. Thus, the main motivation of the proposed research is to enable the sharing of experience in remote communication. However, we have little understanding and knowledge of the feeling, emotion, or mood provoked in users, though it is a basic element of the human mind [6]. The most common definition of experience is "the knowledge or skill which comes from practice rather than from books or something that happens to one and has an effect on the mind and feelings", as explained in [3]. Noticeably, in Japanese culture there is a field of study known as "Kansei". It is a process which expresses the feelings gathered through all the senses (i.e., hearing, sight, touch, smell, and taste). This process has a broad interpretation including sense, sensibility, emotion, feeling, and experience. Furthermore, when developing the prototype system we have incorporated several steps to address the main motivation, as defined in the Kansei process [4, 5]. In present society, social networks have become the latest trend for online experience sharing and online communication, especially among young children.
Social networks facilitate making new friends while keeping old friends in close contact, as well as letting users express themselves and their personal experiences to friends. The users of social networks such as Facebook, MySpace, and Twitter use one or more means of communication, such as text, images, audio, or video, to communicate their experiences. Especially with the expansion of digital media, the attraction of teenagers and younger children to social networks and other activities in the cyber-world is growing. Among the many issues related to current social networking services, two main problems have been identified to be answered through this research. Firstly, although social networking services have greatly improved during the last couple of years, the traditional methods of using them have not changed: users still have to access social networking services through either a computer or a mobile phone interface. Much research has been conducted on the importance of touch for children as well as adults, especially for childhood and infant development [2, 7]. For this reason, people are more physically isolated even though they have enough tools to connect remotely [8]. The lack of physicality in these modes of communication is one of the main motivations for this research. Secondly, cyberspace is increasingly becoming an unsafe and more victimized environment, especially for children [9]. This results in conflicting messages between parent and child, social isolation, and cyber-connectivity with unknown people with unverified identities. Psychologists have theorized about the meaning of online relationships during adolescence and warned about the dangers of sexually exploitative online relationships [10]. With these motivations we introduce "Petimo", which is aimed at providing a novel means of physical interaction for social networks as well as a novel platform for children's family interactions. As finer inspirations to the described research theme, Petimo provides an experience communication platform through social networking, physically extending social networking concepts towards children's and family communications. "Petimo" and "Petimo-World" are the two main components in this research, where Petimo is a soft robotic companion and Petimo-World is a 3D virtual world designed for children. Both Petimo and Petimo-World are influenced by the popular Japanese "Kawaii" (cute) culture [11], as shown in Figure 1. Petimo-World is a 3D virtual world with added social networking capabilities, used along with the soft robotic toy named Petimo. Petimo extends the virtual social network into the real world and provides physical interactions and direct communication with the characters in Petimo-World. When children play with their friends using Petimo, the virtual characters in Petimo-World interact with each other accordingly. This research is also motivated by Japanese "Kawaii" values and aims at designing an experience-sharing platform with this insight of cuteness. From this perspective, we decided to focus on designing a robot with a warm feeling and a tender image of personality. By using feminine colors and a smooth surface, we aim to reduce the mechanical feeling and give the Petimo robot a more human sensation. With a spherical outer shell and a curve-shaped display, it is more like a pet, which can play with the children with its lovely cute eyes.
In addition, based on Japanese manga (comic) [12] culture, characters are designed with big eyes and egg-shaped
faces that bring a soft feeling to children, and the "chubby cheeks" are similar to a baby's face [13]. Not limited to just online interaction, Petimo is expected to further extend social interaction into the real physical world while providing features similar to those of typical social networking services. We embed interaction modes such as shaking, touching, and squeezing in the robot with the objective of allowing natural human communication through the device. By introducing physical face-to-face friend adding in social networks, we aim to provide closer coherence between the online and real worlds in addition to providing security. Finally, we believe that this approach will introduce a new paradigm for remote communication by adding experience communication to the existing social networking concept, thus increasing the natural quality of life.
Fig. 1. Petimo Robot and Petimo-World
2 Related Works
Modern online social networks have been enhanced with many interesting features, as worldwide user numbers are expected to rise boundlessly. MySpace and Facebook can be considered some of the most common online social networks for adults. Safe social networking cannot be expected through these networks, especially for children, as they may provide unsafe methodologies for socializing. Conversely, social networks that are specially designed for children, like Hello Kitty Online [15] and Club Penguin [14], can be categorized as social networks similar to Petimo-World. They provide messaging and social networking services like email, emoticons (emotion icons), actions such as waving or dancing, discussion boards, online video sharing, etc. This may create certain security lapses for child safety, especially by exposing children to abuse by strangers. More importantly, these are purely virtual worlds that do not have the advantage of physical interaction and safe friend-making features like Petimo. Poken [16] allows users to connect through a small tangible device in the shape of a palm with four fingers. In this way, users make friends and exchange social information based on the time and place at which they meet. While this may be effective for adults to interact and socialize with one another, there are potential problems for young children in using Poken. The physical device itself is relatively small, enclosed in a hard casing. Currently, as the Poken user interface has a quite simple contact-adding mechanism, emotional and experience-based communication is difficult to
articulate among users. Petimo is designed for children, with a soft, squeezable enclosure, a cute design, and an integrated color display to allow children to perform bidirectional emotional communication such as sending emoticons and gifts. Tangible and physical objects have rich affordances, which users can learn simply by grasping and manipulating them [17]. Previous generations of children, before the explosive growth of computers and the Internet, learned by exploring and manipulating physical objects. The power of information and the Internet means that computers have taken over from other toys and the natural physical environment as the tool for learning. It is hard to deny that computers hold immense power for children to learn from. However, there still exists a gap between the digital computer and the physical world. Learning using the computer as a tool neglects the lessons we can learn from interacting with real physical objects. Therefore, our approach supports traditional exploratory play with physical objects, extended and enhanced by the interactive power of digital technology.
3 System Description
The software architecture of the system is depicted in Figure 2. The Petimo-World client side comprises two software components, the Petimo-World Client and the Petimo-Interface Client. The Petimo-World Client is an extension of the Multiverse Client. Petimo-Interface is the software component that implements the communication between the Petimos. Petimo-Interface connects directly to the Petimo-World server, while the communication between two Petimo-World browsers is done through the Multiverse online gaming platform. The Petimo-World server is a centralized server that stores the data related to Petimo-World users and coordinates the communication in Petimo-World.
3.1 Petimo-World Features
This section presents a detailed technical description of the two levels in Petimo-World, known as the macro and micro worlds. Petimo-World is designed as two levels, named the macro world and the micro world. The macro world has a novel 3D friend-list arrangement, while the micro world is a garden-like environment where other friends can visit and play. The macro level was developed to provide the user management functionality of Petimo-World. The main user is represented as a character named Seedar that has a shape very similar to the Petimo robot. The friends are arranged around the Seedar in spherical orbits, where the whole set of Seedars is immersed in a pink-colored galaxy. When the user logs in to Petimo-World, he or she is directed to the macro level. The user's Seedar appears on the screen with friends arranged in spherical orbits in the galaxy. The user can navigate through the galaxy and reach the friend Seedars. The macro level provides interactions such as visiting a friend's micro level or removing a friend through right-clicking on the Seedar character. As the arrangement of the friends in the macro world is based on the concept of spherical orbits, a Perlin noise [18] based approach was chosen for the algorithmic
base because it renders a more natural arrangement of friends in a spherical orbit. The friends are scattered into spheres based on the grouping created by a grouping algorithm as shown in Figure 3.
Fig. 2. Software architecture of Petimo
Fig. 3. Spatial arrangement in macro world
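The spherical-orbit arrangement just described can be sketched in code as follows. This is a minimal Python illustration rather than the Petimo-World implementation: the grouping algorithm and the exact Perlin-noise formulation are not specified here, so the sketch takes precomputed friend groups as input and substitutes a simple hash-based noise for Perlin noise to perturb the orbit radius.

```python
import hashlib
import math

def pseudo_noise(seed: str) -> float:
    """Deterministic noise in [-1, 1] from a hash; a stand-in for the
    Perlin noise used in Petimo-World, which yields smoother jitter."""
    return hashlib.sha256(seed.encode()).digest()[0] / 127.5 - 1.0

def arrange_friends(groups, base_radius=10.0, gap=5.0, jitter=1.5):
    """Place each friend group on its own spherical orbit around the
    user's Seedar at the origin; returns friend -> (x, y, z)."""
    positions = {}
    for orbit, (_, friends) in enumerate(sorted(groups.items())):
        radius = base_radius + orbit * gap
        n = len(friends)
        for i, friend in enumerate(friends):
            # A golden-angle spiral spreads points roughly uniformly
            # over the sphere for any group size n.
            theta = math.acos(1 - 2 * (i + 0.5) / n)  # polar angle
            phi = math.pi * (1 + 5 ** 0.5) * i        # azimuth
            r = radius + jitter * pseudo_noise(friend)  # noisy orbit
            positions[friend] = (r * math.sin(theta) * math.cos(phi),
                                 r * math.sin(theta) * math.sin(phi),
                                 r * math.cos(theta))
    return positions

# Example: two friend groups mapped onto two orbits.
print(arrange_friends({"close": ["mina", "taro"],
                       "school": ["ken", "yui", "sam"]}))
```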
By clicking on friends' Petimo characters, users can visit their friends' micro worlds, which lie below the macro world. The micro world is a garden-like environment, as shown in Figure 4, representing the world inside the Petimo planet.
Fig. 4. Overview of micro world
3.2 Petimo-Robot Features
Petimo includes a friend-adding function using close-proximity radio frequency identification (RFID) technology. As shown in Figure 5, children can add friends by
activating the "Add Friend" option on the Petimo menu and physically touching their friends' Petimos. This internally results in the exchange of unique 64-bit identification keys between the two Petimos and the sending of this event to the online user verification system for authentication, after which the relationship is created.
Fig. 5. Friend adding function
The user input sensing includes a resistive touch-sensing pad with smooth scrolling, primarily for child-friendly menu navigation. Pressure-activated squeeze areas on the robot surface facilitate the exchange of special gifts and emoticons online. To ensure rich content and personal experience sharing, a vibrotactile effect generator, a sound output module, and a display module have also been used for actuation. A mini, low-cost, energy-saving color Organic Light Emitting Diode (OLED) [19] display has been used in Petimo as the primary medium for interactive feedback, as shown in Figure 6. The unique RFID key-exchanging mechanism extends the communication bandwidth comprehensively without the additional complexity associated with tangible interfaces.
Fig. 6. Petimo Robots and OLED display (Emoticon and Gift sending)
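The friend-adding flow just described can be summarized in code. The sketch below is hypothetical: only the flow itself — exchange of unique 64-bit keys on physical touch, followed by authentication at the online verification system before the relationship is created — is taken from the text, while the class names, message handling, and verification policy are assumptions.

```python
import secrets

class VerificationServer:
    """Toy stand-in for the online user verification system."""
    def __init__(self):
        self.registered = {}   # 64-bit key -> owner
        self.friendships = set()

    def register(self, key: int, owner: str):
        self.registered[key] = owner

    def add_friend_event(self, key_a: int, key_b: int) -> bool:
        # Create the relationship only if both robots are known; a real
        # deployment could also apply parental controls here.
        if key_a in self.registered and key_b in self.registered:
            self.friendships.add(frozenset((key_a, key_b)))
            return True
        return False

class PetimoRobot:
    def __init__(self, owner: str, server: VerificationServer):
        self.key = secrets.randbits(64)  # unique 64-bit identification key
        self.server = server
        server.register(self.key, owner)

    def touch(self, other: "PetimoRobot") -> bool:
        """Physical touch: the RFID link exchanges keys, then the
        event is sent to the server for authentication."""
        return self.server.add_friend_event(self.key, other.key)

server = VerificationServer()
a, b = PetimoRobot("Alice", server), PetimoRobot("Ben", server)
assert a.touch(b)  # relationship created after online verification
```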
4 Communication Module
The communication module is the heart of the Petimo platform and is bundled with Petimo-World. However, it is able to perform tasks independently with the Petimo server. This allows Petimo users to use the robot and interact with the system without interacting with Petimo-World. The ability to configure more than one communication module on one PC makes it possible to configure several Petimo robots for one user and thus to map them to different characters in the virtual world. This module has two sub-modules: robot-to-PC and PC-to-server communication. Robot-to-PC communication is implemented through the Bluetooth protocol, while PC-to-server communication is implemented through TCP/IP sockets.
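A minimal sketch of the two sub-modules is shown below, with two assumptions that do not come from the text: the Bluetooth link is taken to appear on the PC as a serial port (as with the common RFCOMM/Serial Port Profile), and the server is assumed to answer each forwarded event with a response; the port name, server address, and message format are placeholders.

```python
import socket
import serial  # pyserial; SPP Bluetooth links usually show up as serial ports

SERVER_ADDR = ("petimo-server.example.org", 9000)  # placeholder address
BT_PORT = "/dev/rfcomm0"                           # placeholder device

def relay_robot_events():
    """Robot-to-PC traffic is read over Bluetooth (as serial lines);
    PC-to-server traffic goes over a TCP/IP socket, mirroring the two
    sub-modules of the communication module."""
    with serial.Serial(BT_PORT, baudrate=115200, timeout=1) as robot, \
         socket.create_connection(SERVER_ADDR) as server:
        while True:
            line = robot.readline()       # e.g. b"SQUEEZE left\n"
            if not line:
                continue                  # read timed out; poll again
            server.sendall(line)          # forward the event upstream
            reply = server.recv(1024)     # assumed request/response protocol
            if reply:
                robot.write(reply)        # e.g. an emoticon to display

if __name__ == "__main__":
    relay_robot_events()
```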
5 Conclusion
In this paper, the importance of multi-sensory communication media along with social networking, as well as the importance of sharing personal experiences, is
considered. We have extensively described Petimo as a revolutionary, interactive, and friendly soft robotic device that can fundamentally change social networks by providing a novel approach for children to make friends easily in a more protected and safe social networking environment. Petimo, together with Petimo-World, encourages the building of real social networks through interactions, as users interact by squeezing, touching, and sending gifts or emoticons to their friends, family, and parents. This can change the younger generation's tendency of being disconnected from family and loved ones by bridging the gaps of existing social network security issues and acting as a powerful means to support a child's safe path toward a secure and personally enriching social networking experience. The individual concepts gleaned from this work can be widely used in future work on new interfaces that could not have been imagined before. Additional Authors. Kening Zhu, Dilrukshi Abeyrathne, Kasun Karunanayaka, Chamari Priyange Edirisinghe, Roshan Lalintha Peiris, and James Keng Soon Teh are from the National University of Singapore. Yukihiro Morisawa, Charith Fernando, Miyuru Dayarathna, Anusha Indrajith Withana, Nancy Lan-Lan Ma, and Makoto Danjo are from Keio University, Japan. Acknowledgement. This research is carried out under CUTE Project No. WBS R7050000-100-279, partially funded by a grant from the National Research Foundation (NRF) administered by the Media Development Authority (MDA) of Singapore.
References
1. Hertenstein, M.J.: Touch: Its Communicative Functions in Infancy. Human Development 45(2), 70–94 (2002)
2. Field, T.: Touch. MIT Press, Cambridge (2003)
3. Longman Dictionary of Contemporary English
4. Levy, L., Yamanaka: On Kansei and Kansei Design - A Description of Japanese Design Approach. In: International Association of Societies of Design Research (IASDR 2007) Conference, Hong Kong (2007)
5. Elokla, N., Morita, Y., Hirai, Y.: Using the Philosophy of Kansei: Happiness with Universal Design Product. In: International DesignEd Asia Conference, Hong Kong (2008)
6. Nagashima, T., Tanaka, H., Uozumi, T.: An overview of Kansei engineering: a proposal of Kansei informatics toward realising safety and pleasantness of individuals in information network society. International Journal of Biometrics 1(1), 3–19 (2008), ISSN 1755-8301
7. Hertenstein, M.J.: Touch: Its Communicative Functions in Infancy. Human Development 45(2), 70–94 (2002)
8. Eriksen, T.H.: Tyranny of the Moment: Fast and Slow Time in the Information Age. Pluto Press (2001)
9. Cho, C.H., Cheon, H.J.: Children's Exposure to Negative Internet Content: Effects of Family Context. Journal of Broadcasting & Electronic Media 49(4), 488–509 (2005)
10. Wolak, J., Mitchell, K.J., Finkelhor, D.: Escaping or connecting? Characteristics of youth who form close online relationships. Journal of Adolescence 26(1), 105–119 (2003)
11. Lee, D.: Inside look at Japanese cute culture (September 2005), http://uniorb.com/ATREND/Japanwatch/cute.htm
12. Schodt, F.L.: Manga! Manga!: The World of Japanese Comics. Kodansha America (March 1986), http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0870117521
13. Hatch, J., Rasinski, T.V.: Comic Books: From Superheroes to Manga. Red Brick Learning (2005)
14. Club Penguin Online (CPO) (2008), http://www.clubpenguin.com
15. Hello Kitty Online (HKO) (2008), http://www.sanriotown.com/main/index.php?lang=us
16. Welcome to Poken (2008), http://www.doyoupoken.com
17. Ishii, H., Ullmer, B.: Tangible bits: towards seamless interfaces between people, bits and atoms. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Atlanta, Georgia, United States, pp. 234–241. ACM, New York (1997)
18. Perlin, K.: An image synthesizer. In: Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, pp. 287–296. ACM, New York (1985)
19. OLED (2008), http://www.4dsystems.com.au/prod.php?id=29
Comparison Analysis for Text Data by Using FACT-Graph Ryosuke Saga1, Seiko Takamizawa2, Kodai Kitami2, Hiroshi Tsuji3, and Kazunori Matsumoto2
1 Kanagawa Institute of Technology, Faculty of Information and Computer Science, 1030 Shimo-ogino, Atsugi, Kanagawa, 243-0292, Japan 2 Kanagawa Institute of Technology, Graduate School of Engineering, 1030 Shimo-ogino, Atsugi, Kanagawa, 243-0292, Japan 3 Osaka Prefecture University, Graduate School of Engineering, 1-1 Gakuen-cho, Nakaku, Sakai, 559-8531, Japan {saga,matumoto}@ic.kanagawa-it.ac.jp, {seiko.takamizawa,kodai.kitami}@gmail.com,
[email protected]
Abstract. This paper describes a method for applying the Frequency and Co-occurrence Trend (FACT)-Graph to comparison analysis. FACT-Graph is a method to visualize changes in keyword trends and relationships between terms over two time periods. The usefulness of FACT-Graph has been shown in tracking trends in politics and crime. To apply FACT-Graph to comparing information, we use class transition analysis, separate the analysis periods into categories that are the targets of comparison, and collate the features of each comparison target. In a comparison analysis using 138 articles from two newspapers, we compare topics such as politics and events by using the relationships between terms found in the FACT-Graph results. Keywords: Comparison Analysis, Visualization, FACT-Graph, Text Mining, Knowledge Management.
1 Introduction
As information systems progress, several business organizations have begun to focus on knowledge management to create business value and sustain competitive advantage by using data in data warehouses [1][2]. To make these data warehouses work to their advantage, they have to recognize their strong points, develop a strategy, and make effective investments. To recognize advantages, comparison analysis is often done by using cross-tabulation and visualization analysis. Comparison analysis is relatively easy when the comparative data are expressed quantitatively. However, the most significant data often occur as text data and are difficult to obtain from pre-defined attributes. Therefore, text data in questionnaires, reports, and so on must be analyzed. Text mining is useful for analyzing text data to obtain new knowledge [3]. In text mining, the applicable areas are wide-ranging, including visualization, keyword extraction,
summarization of text, and so on. We have developed the Frequency and Co-occurrence Trend (FACT)-Graph for trend visualization of time-series text data [4]. FACT-Graph has been used to visualize trends in politics and crime and to extract important keywords that look unimportant at a glance. This paper describes a method to compare two targets by using FACT-Graph. However, FACT-Graph targets time-series data, so we cannot apply it directly to comparison analysis. Therefore, we transform the data for analysis on the basis of class transition analysis to enable FACT-Graph to carry out comparison analysis. The rest of this paper is organized as follows: Section 2 describes the overview and underlying technologies of FACT-Graph. Section 3 describes how to apply FACT-Graph to comparison analysis. Section 4 presents a case study of two Japanese newspapers. Finally, we conclude the paper.
2 FACT-Graph
FACT-Graph is a method to create visualized graphs of large-scale trends [4]. It is shown as a graph that embeds a co-occurrence graph and information on keyword class transitions. FACT-Graph lets us see hints of trends and has been used for analyzing trends in politics and crime with an analysis tool [5][6]. FACT-Graph uses nodes and links: it embeds the change in a keyword's class transition and co-occurrence in nodes and edges. It has two essential technologies: class transition analysis and co-occurrence transition.
2.1 Class Transition Analysis and Co-occurrence Transition
Class transition analysis shows the transition of a keyword's class between two periods [7]. This analysis separates keywords into four classes (Class A to D) on the basis of term frequency (TF) and document frequency (DF) [8]. The results of the analysis detail the transition of keywords between two time periods (before and after), as shown in Table 1. For example, if a term belongs to Class A in a certain time period and moves into Class D in the next time period, then the trend regarding that term is referred to as "fadeout". FACT-Graph identifies these trends by the node's color; for example, red means fashionable, blue unfashionable, and white unchanged. For convenience, we call the fashionable patterns Pattern 1, the unchanged patterns Pattern 2, and the unfashionable patterns Pattern 3. Additionally, FACT-Graph visualizes relationships between keywords by using co-occurrence information to show and analyze topics that consist of multiple terms. As a result, a useful keyword can be identified from its relationships with other keywords even though the keyword itself seems unimportant at a glance, and the analyst can extract such keywords by using FACT-Graph. Moreover, from the results of the class transition analysis, the analyst can comprehend trends in keywords and in topics (consisting of several keywords) by using FACT-Graph. FACT-Graph also pays attention to the transition of the co-occurrence relationships between keywords. This transition is classified into the following types:
(a) The co-occurrence relation continues in both analytical periods.
(b) The co-occurrence relation appears in the later analytical period.
(c) The co-occurrence relation disappears in the later analytical period.
The relationship in type (a) indicates that the words are very close together, so we can consider them essential elements of the topic. Relationships of types (b) and (c), on the other hand, indicate temporary topical change.

Table 1. Transition of keyword classes; Class A (TF: high, DF: high), Class B (TF: high, DF: low), Class C (TF: low, DF: high), and Class D (TF: low, DF: low)

After \ Before | Class A | Class B   | Class C        | Class D
Class A        | Hot     | Common    | Broaden        | New
Class B        | Cooling | Universal | Widely         | New
Class C        | Bipolar | Locally   | Active Locally | New
Class D        | Fade    | Fade      | Fade           | Negligible
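To make this step concrete, the following sketch assigns terms to Classes A–D from TF/DF counts and labels the transition between two periods. The thresholds, the sample counts, and the abbreviated label table are illustrative assumptions, not the authors' implementation.

```python
# Sketch of class transition analysis: assign each term to Class A-D by
# TF/DF thresholds in each period, then label the (before, after) pair.
def assign_class(tf, df, tf_threshold, df_threshold):
    """Class A: high TF, high DF; B: high TF, low DF;
    C: low TF, high DF; D: low TF, low DF."""
    if tf >= tf_threshold:
        return "A" if df >= df_threshold else "B"
    return "C" if df >= df_threshold else "D"

# Only a few transition names shown (cf. Table 1); others fall through.
TRANSITION = {("A", "A"): "Hot", ("A", "D"): "Fade",
              ("D", "A"): "New", ("D", "D"): "Negligible"}

def transition_label(term, stats_before, stats_after, tf_th, df_th):
    before = assign_class(*stats_before[term], tf_th, df_th)
    after = assign_class(*stats_after[term], tf_th, df_th)
    return TRANSITION.get((before, after), f"{before}->{after}")

# Usage: stats map each term to its (TF, DF) counts in a period.
before = {"olympic": (120, 40), "judo": (3, 1)}
after = {"olympic": (150, 55), "judo": (40, 20)}
print(transition_label("olympic", before, after, tf_th=50, df_th=10))  # Hot
print(transition_label("judo", before, after, tf_th=50, df_th=10))     # D->C
```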
2.2 Output of FACT-Graph
Figure 1 gives an overview of the steps for producing a FACT-Graph. First, the text data are morphologically analyzed. A morpheme is the smallest meaningful unit of a sentence (a, the, -ed, etc.); morphological analysis tools divide the text into morphemes and judge each part of speech. This step also extracts the attributes of each document, such as date of issue, length, category, and so on. The term database is then built. The user sets parameters such as the analysis span, document/term filters, and the thresholds used in the analysis. The term database is then divided into two databases (first- and second-half periods) according to the analysis span. Each term's frequency is aggregated in the respective database, and keywords are extracted from the terms under the established conditions. These keywords then go through the class transition and co-occurrence procedures. The chart that reflects these processing results is the FACT-Graph.
[Figure: time-series text data is morphologically analyzed into a term database; the database is split into two spans (t1–t2 and t2–t3); transition analysis of keywords and links, governed by user-set parameters (analysis period; TF, DF, and co-occurrence thresholds; number of keywords), yields the FACT-Graph.]
Fig. 1. Overview of Outputting FACT-Graph
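A minimal sketch of the term-database step, assuming pre-tokenized documents in place of the Japanese morphological analysis the paper relies on:

```python
# Sketch of the term-database step: split time-stamped documents into two
# spans and aggregate each term's TF and DF per span. Tokenization is
# simplified to token lists; the paper uses a morphological analyzer.
from collections import Counter

def term_stats(docs):
    """docs: list of token lists. Returns {term: (TF, DF)}."""
    tf, df = Counter(), Counter()
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))  # each document counts once toward DF
    return {t: (tf[t], df[t]) for t in tf}

def split_by_span(dated_docs, boundary):
    """dated_docs: list of (date, tokens); boundary divides the two spans."""
    first = [toks for d, toks in dated_docs if d < boundary]
    second = [toks for d, toks in dated_docs if d >= boundary]
    return term_stats(first), term_stats(second)
```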
3 Comparison Analysis by FACT-Graph
3.1 Approach
To apply FACT-Graph to comparing information, we focus on its class transition analysis. As mentioned above, class transition analysis is carried out over two time periods, and FACT-Graph shows the changes between them. The other side of the coin is that FACT-Graph shows the result of a comparison between the periods, where the periods are simply the categories "Before" and "After". In other words, although FACT-Graph treats time-series text data, it effectively performs a comparison analysis between the two categories "Before" and "After". By replacing the periods with the targets of comparison, we can compare the targets by using FACT-Graph. However, applying FACT-Graph to comparison analysis raises three problems: how to process the target data, how to interpret class transition analysis in a comparison setting, and how to express the co-occurrence relationships.
3.2 Converting Target Data
To apply FACT-Graph to comparison analysis, we need to convert the date data. FACT-Graph requires time data in the target data, because the time data separate all target data into two periods. For comparison analysis, in contrast, time data are not necessary and do not exist in the target data in the first place. We therefore attach pseudo time data to the target data as a category: one target is assigned to the period between t1 and t2 (the "Before" period), and the other to the period between t2 and t3 (the "After" period). This makes it possible to perform comparison analysis with FACT-Graph.
3.3 Explanation of Class Transition Analysis and Co-occurrence
The interpretation of a comparison analysis by FACT-Graph differs from that of a trend analysis, but the essential idea is the same. The concept of comparing two targets is the same as in class transition analysis even though the meanings in the FACT-Graph change, and we can compare the two targets in the same way we read a FACT-Graph. For example, if the same keyword belongs to Class A (high TF and high DF) in both targets, the targets have equivalent features with respect to the topic that the keyword indicates. If a keyword belongs to Class A in one target but to Class B, C, or D in the other, the topic is characteristic of the former target.
FACT-Graph has three types of co-occurrence. For comparison analysis, co-occurrence means that one or both targets use the terms together. That is, type (a) means that a co-occurrence relationship exists in both targets, whereas the other two types mean that the relationship exists in only one of the targets.
For trend analysis with FACT-Graph, the ebb and flow of terms matters, so terms are classified into four classes, Class A to D, by the levels of TF and DF. For comparison analysis, however, knowing whether a term
exists at all is necessary for finding the features of the comparison targets. Therefore, we add a new class, Class E, for terms that exist in only one of the comparison targets. This class is drawn with a broken circular line in the visualized graph. Summarizing the above discussion, the FACT-Graph for comparison analysis visualizes terms as shown in Fig. 2.
[Figure legend: Class A: high TF and high DF; Class B: high TF and low DF; Class C: low TF and high DF; Class D: low TF and low DF; Class E: TF = DF = 0 in one of the targets. Pattern 1 / type (a): terms characteristic of both targets, with links in both targets. Pattern 2 / type (b): terms characteristic of Target B, with links in Target B only. Pattern 3 / type (c): terms characteristic of Target A, with links in Target A only.]
Fig. 2. FACT-Graph for Comparison Analysis
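A minimal sketch of the conversion described in Sect. 3.2, reusing the split_by_span helper sketched earlier; the pseudo dates 0 and 1 and the boundary value are arbitrary choices:

```python
# Sketch of Sect. 3.2: tag the documents of two comparison targets with
# pseudo dates so the time-series pipeline treats Target A as "before"
# and Target B as "after".
def to_pseudo_timeseries(target_a_docs, target_b_docs):
    dated = [(0, toks) for toks in target_a_docs]
    dated += [(1, toks) for toks in target_b_docs]
    return dated  # feed to split_by_span(dated, boundary=1)
```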
4 Experiment
4.1 Data Set
We carried out an experiment with FACT-Graph to verify whether comparison analysis can be performed. We used editorials published between 2006 and 2008 in The Mainichi and The Asahi, two of Japan's major newspapers. Editorials were chosen because they pick up on important issues and are often written on the basis of interviews or opinions. Generally, these articles are written from several viewpoints, and their assertions are characteristic of, and differ between, the publishers.
Note that we regard very infrequent words as unnecessary terms, because they are likely to be noise or error words; we therefore removed the terms whose TF is less than 2 and whose DF equals 1. In this case study, we limited the articles to those on the topic of the Olympic Games: The Mainichi had 64 such editorials and The Asahi 74. We use the Jaccard coefficient as the co-occurrence measure and adopt the relationships whose co-occurrence exceeds 0.3. For the class transition analysis in FACT-Graph, we set the class threshold at the top 20% of ranked terms on the basis of Zipf's law [9], often called the 20-80 rule.
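A sketch of the two filters just described, assuming documents are represented as token lists; the 0.3 Jaccard threshold and the top-20% cut follow the text, while the ranking key is an assumption:

```python
# Sketch of the experiment's filters: keywords = top 20% of terms ranked by
# frequency (the Zipf-based cut); links = keyword pairs whose Jaccard
# coefficient over the documents exceeds 0.3.
def top_keywords(term_stats, ratio=0.2):
    """term_stats: {term: (TF, DF)}; keep the top `ratio` of terms by TF."""
    ranked = sorted(term_stats, key=lambda t: term_stats[t][0], reverse=True)
    return ranked[:max(1, int(len(ranked) * ratio))]

def jaccard(term_x, term_y, docs):
    dx = {i for i, toks in enumerate(docs) if term_x in toks}
    dy = {i for i, toks in enumerate(docs) if term_y in toks}
    union = dx | dy
    return len(dx & dy) / len(union) if union else 0.0

def cooccurrence_links(keywords, docs, threshold=0.3):
    return [(x, y)
            for i, x in enumerate(keywords) for y in keywords[i + 1:]
            if jaccard(x, y, docs) >= threshold]
```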
4.2 Result of Analysis
Figure 3 shows the FACT-Graph obtained under these conditions. In this graph, blue nodes and links indicate features of The Mainichi, and red nodes and links features of The Asahi. Taking a global view of the FACT-Graph, the term "Olympic", the most important word, is bigger than the other nodes and belongs to Class A and Pattern 1. "Beijing", "Japan", and "China" have much the same pattern and class as "Olympic". Therefore, in this analysis period, the biggest topic in the graph concerns the Beijing Olympics.
In the central part of Figure 3, the nodes of Pattern 3 connected by type (c) links are clustered together. Many of these are words relevant to the games themselves, such as "Kitajima" (a Japanese gold-medal-winning swimmer), "Judo", and "Skating". From this, we can say that The Mainichi describes the Olympic Games themselves without referencing much else.
Figure 4 enlarges the lower right of Figure 3. There are many Pattern 2 nodes about China in this area, and all of them are connected to each other by type (b) links. From this area, we can conclude that The Asahi wrote articles about the Olympics that reference the government of China.
Figure 5, on the other hand, enlarges the upper left of Figure 3. The target data contain topics not only about Tokyo's bid for the 2016 Summer Olympic Games but also about the Turin Winter Olympics and the Beijing Summer Olympics. In particular, the 2016 Summer Olympic Games is deeply tied to the Tokyo gubernatorial election. In fact, this area has nodes such as "Ishihara" and "governor of Tokyo", and The Asahi used these terms extensively even though the relationships between the terms often occurred in The Mainichi. In the same way, The Asahi also mentioned "Taiwan", "Nationalist Party", and so on. That is, The Mainichi writes about elections and the Olympics separately, whereas The Asahi treats the topics politically, writing about the Olympics while referring to a political party of a foreign country. As a result of this analysis limited to articles about the Olympics, we found that the articles in The Asahi have a stronger political tone than those in The Mainichi.
4.3 Consideration
As shown above, comparison analysis of text data by using FACT-Graph is possible. However, two analytic issues remain.
1. FACT-Graph for trend analysis attaches more importance to the features of the later period than to those of the first period; that is, it represents trends from the viewpoint of the later period. Indeed, Figure 3 shows the features of The Asahi more strongly. It would be better to observe both targets equally, so the text data should be analyzed twice, interchanging Target A and Target B, and both graphs examined.
2. For setting the threshold of the class transition analysis, we used Zipf's law. However, the value should be chosen so as to capture the terms' features, depending on the characteristics of the data.
Fig. 3. Result of Visualization with Mainichi Newspaper and Asahi Newspaper
Fig. 4. Enlarged view of the lower right of Figure 3
Fig. 5. Enlarged view of the upper left of Figure 3
5 Conclusion
This paper described a method for comparing two targets by using FACT-Graph, which visualizes trends in time-series text data. To apply FACT-Graph to comparison analysis, we substituted the comparison targets for the time-series periods on the basis of class transition analysis. We also explained the two essential technologies as they apply to comparison analysis and then performed such an analysis. To validate the usability of FACT-Graph, we compared the features of The Asahi and The Mainichi newspapers using editorials from both. From the results of the comparison analysis targeting the word "Olympic", we found that The Asahi tended
to write more political articles than The Mainichi, and we showed that the proposed method can be used for comparison analysis between two targets.
We have three items of future work. The first is visualizing the features of both analysis targets equally. To perform comparison analysis more accurately, we have to treat all features of the target data as equal, so we should integrate figures that express the features of both targets. The second is integrating graphs among three targets. In this paper, we integrated graphs between two targets; however, comparing three targets may be necessary, so we should integrate graphs among three targets while reconsidering the interpretation of class transition analysis. Finally, we should verify the optimal thresholds for TF and DF.
Acknowledgement. This research was supported by The Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Young Scientists (B), 21760305, 2009.4-2011.3.
References
1. Tiwana, A.: The Knowledge Management Toolkit: Orchestrating IT, Strategy, and Knowledge Platforms. Prentice-Hall, Englewood Cliffs (2002)
2. Inmon, W.H.: Building the Data Warehouse. John Wiley & Sons, Chichester (2005)
3. Feldman, R., Sanger, J.: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, Cambridge (2007)
4. Saga, R., Terachi, M., Sheng, Z., Tsuji, H.: FACT-Graph: Trend Visualization by Frequency and Co-occurrence. In: Proceedings of the 31st Annual German Conference on Advances in Artificial Intelligence, Kaiserslautern, Germany, pp. 308–315. Springer, Heidelberg (2008)
5. Saga, R., Tsuji, H., Tabata, K.: Loopo: Integrated Text Miner for FACT-Graph-Based Trend Analysis. In: Proceedings of the Symposium on Human Interface 2009 on Human Interface and the Management of Information. Information and Interaction. Part II: Held as Part of HCI International 2009, pp. 192–200. Springer, Heidelberg (2009)
6. Saga, R., Tsuji, H., Miyamoto, T., Tabata, K.: Development and case study of trend analysis software based on FACT-Graph. Artificial Life and Robotics 15, 234–238 (2010)
7. Terachi, M., Saga, R., Tsuji, H.: Trends Recognition. In: IEEE International Conference on Systems, Man & Cybernetics (IEEE/SMC 2006), pp. 4784–4789 (2006)
8. Salton, G. (ed.): Automatic Text Processing. Addison-Wesley Longman Publishing Co., Inc. (1988)
9. Baayen, R.H.: Word Frequency Distributions. Springer, Heidelberg (2002)
A Comparison between Single and Dual Monitor Productivity and the Effects of Window Management Styles on Performance
Alex Stegman, Chen Ling, and Randa Shehab
College of Industrial Engineering, University of Oklahoma, 200 W. Boyd St., Room 124, Norman, Oklahoma 73019
[email protected]
Abstract. Several research studies have been published on user opinion and productivity with dual monitor systems. These studies found that users typically enjoy using multiple monitors, but none found a strong increase in performance or productivity. Other researchers have focused on improving multiple monitor usability, but often without any statistical framework. This study compared single and dual monitor configurations on three productivity measures: task time, cursor movement, and number of window switches. Additionally, window management styles (WMS) were studied to help designers better understand user behavior. WMS were broken into two categories, toggler and resizer, and then compared with the WMS created by Kang and Stasko (2008). The results showed a significant effect of the number of open applications and a significant difference between single and dual monitors in the number of window switches. The only significant difference between the toggler and resizer WMS was in the number of window switches, as an interaction between style and task. Keywords: Dual Monitors, Window Management Style, Productivity.
1 Introduction
Research on computer monitors varies greatly; example topics include monitor size, the benefits of using LCD rather than CRT, and visual fatigue. Research on the use of multiple monitors in a work environment indicates that productivity increases when dual monitors are used in lieu of single monitors [13]. The present study was performed to describe the differences in work patterns between single and dual monitor configurations in a controlled experiment simulating an engineering work environment. The study compared task time, cursor movement, and the number of toggles between applications for single and dual monitor configurations. It also outlined how users with different work patterns assign applications to monitors and how they utilize the virtual workspace.
2 Literature Review
Over the past decade, several important pieces of literature have been written on the use of multiple monitors and the associated problems. Productivity remains a relevant topic because businesses and individuals are interested in the possible benefits of additional monitors. Through a better understanding of this technology, new operating systems and applications can be designed to take full advantage of the virtual real estate gained when multiple monitors are utilized.
Research has shown that a person using a single monitor will often use it as a single space, whereas users of multiple monitors consciously treat the space on each monitor separately [6]. The author also noted that user tasks were split into primary and secondary tasks, and that most of the secondary tasks were for communication, such as instant messaging or email, and personal resources, such as calendars and web browsers. In addition, multiple monitor users often use the secondary monitor(s) in direct support of a primary task on a primary monitor. While increasing display space does not resolve the limitations of information management, such as the placement of windows and task bars, [6] went on to say that it does help by allowing users to spread information out and organize their windows more effectively. However, this study discovered that not all information is treated equally; people prefer to allocate their attention to one task while having additional information readily available.
Research on time saved when using multiple monitors showed no significant difference in the amount of time spent switching between windows or in the number of visible windows when comparing single and dual monitor configurations. The research also showed that participants often hid specific window content that was determined to be distracting or unnecessary [7].
Researchers and software developers have also studied multiple monitor usage by using tracking software to gather data on each participant's work PC over an extended time period [8]. The results showed that single monitor users were nearly twice as likely to have their active window fully visible, and much less likely to have their email fully visible. The researchers also found that the amount of unused space increased as the amount of space and the number of monitors increased, meaning that participants used the space less efficiently. In conclusion, [8] observed that users often interact with several windows simultaneously to complete a task. The authors explained that multiple monitors help users view more windows at any given moment, but users still switch between windows just as frequently. They speculated that multiple monitor users may use larger window sizes and believed that this pattern may not actually be related to monitor size. The research also showed that multiple monitor users do not rely on the taskbar as much as single monitor users and that information is spread across many monitors. Usage patterns with multiple monitors were more dynamic than with a single monitor. The researchers, however, were unable to statistically test these patterns because the data were collected from an uncontrolled source.
Focusing more on multiple monitor usage patterns, [9] identified three main usage strategies for interacting with dialog boxes. They found that each participant
used one of the basic strategies, but only a small fraction used the same strategy throughout the entire experiment. These findings indicate that a single solution to the problems encountered when using multiple monitors may not fit all users or all types of tasks. Given that participants exercise multiple strategies, software and operating system designers must think of robust solutions. The researchers indicated that further study of multiple monitor usage patterns would benefit designers but may not provide clear solutions for increasing usability.
Research on the effects of multiple monitors on work performance and patterns for lightweight computer tasks was completed by [13]. In a controlled experiment, the researchers compared how participants performed on single and dual monitor computers while working with word processors, web browsers, and spreadsheets. During the tasks, cognitive load, task time, and window operations were recorded. Tracking software also logged the opening and closing of windows, the locations of windows, and the moving and resizing of windows. The cognitive workload measure indicated no significant differences between single and dual monitors, or between the sequences in which they were used. The results, however, did show a trend of multiple monitor users feeling less workload than single monitor users.
An analysis of the different window management styles (WMS) of single and dual monitor users was also performed by [13]. The categories were formed based on how participants accessed windows and on the methods by which they organized their screen space. Participants accessed their windows either by using Alt+Tab to switch the window in view or by moving and resizing their windows. Three additional categories were formed for organizing screen space: "Maximizers", "Near Maximizers", and "Careful Coordinators". Maximizers kept windows at the maximum size while using Alt+Tab to switch between windows. Near Maximizers resized their windows to occupy most of the screen space, while Careful Coordinators resized their windows so that several windows were visible simultaneously. These categories, however, did not significantly affect task time or workload when compared across the number of monitors being used or the sequence of use. Even though the differences were not significant, when multiple monitors were used during the first session, both Alt+Tab users and Move/Resizers benefited. During the second session, the Alt+Tab users performed faster when using a single monitor, which could be attributed to the learning curve. Across the two sessions, the researchers observed that Alt+Tab users were less likely to use Alt+Tab with multiple monitors during the first task assignment and were probably led to act more like Move/Resizers by unfamiliarity with the task. In contrast, the Move/Resizers were able to coordinate their windows more rapidly and less frequently because of the increased screen space. The workload measures also showed that the Move/Resizers felt less workload in the multiple monitor setting in the second session. A higher workload may have been experienced when multiple monitors were first used, since the user had to manage more screen space and more windows. As users gained experience with the additional monitors, they developed strategies that allowed them to utilize the increased screen space efficiently with less workload.
3 Problem Definitions and Hypothesis
The primary objective of the study was to compare work behaviors and usage patterns when participants used a single monitor versus dual monitors. Differences were examined for three metrics. Task time was measured as the number of seconds it took to complete each individual task. Cursor movement was quantified as the number of pixels the cursor moved during each computer task the participant completed. Finally, the total number of switches between windows was recorded as the number of times a participant switched from one application to activate another. This metric was chosen because it captures productivity lost when cognitive processes are interrupted as users seek out information, as well as the time lost in activating windows; an explanation of how interrupting tasks affects productivity can be found in [2].
Each of the three metrics was examined with respect to the WMSs formulated in this research, as well as the WMSs developed by [13]. Because a user's WMS and work patterns may vary as task complexity changes, altering the number of open applications may provide insight into how work patterns change as workload varies. Therefore, the number of applications used was varied among 2, 3, and 4.
The first set of hypotheses focused on the benefit of dual monitors over single monitors in terms of time, cursor movement, and how often participants had to switch between windows. The hypothesis for each of these dependent variables was that dual monitors would allow participants to perform faster with less cursor movement and fewer window switches. This notion was founded on participants being able to spread windows across the larger virtual workspace when using dual monitors: with more information viewable, the amount of cursor movement to the task bar, as well as the number of window switches, would decrease.
The second set of hypotheses focused on the effects of different WMSs on performance. Two WMSs were defined and tested for differences: toggler (TOG) and resizer (RES). A TOG was defined as a user who primarily sizes windows to occupy most of the screen, relying more heavily on the task bar to activate windows. A RES was defined as a user who sizes windows so as to view multiple windows simultaneously. It was hypothesized that RESs would show a decrease in all three dependent variables, on the reasoning that since RESs can view more information at once, they would not have to search for windows using the task bar or activate applications by clicking overlapping windows; a RES does not have to activate windows to see pertinent information. The results of these three TOG and RES hypothesis tests were then compared with the WMS of [13] to identify any advantages either set of WMS may have over the other.
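For illustration only, the following sketch shows how the three metrics could be derived from a hypothetical tracking log; the study's actual tracking software is not described at this level of detail, so the data formats here are assumptions.

```python
# Illustrative computation of the three productivity metrics from a
# hypothetical event log (formats assumed, not taken from the study).
import math

def cursor_distance(samples):
    """samples: chronological list of (x, y) cursor positions in pixels."""
    return sum(math.dist(a, b) for a, b in zip(samples, samples[1:]))

def window_switches(active_window_log):
    """active_window_log: chronological list of active-window IDs; a switch
    is counted whenever the active window changes."""
    return sum(1 for a, b in zip(active_window_log, active_window_log[1:])
               if a != b)

print(cursor_distance([(0, 0), (3, 4), (3, 10)]))                 # 11.0 px
print(window_switches(["excel", "outlook", "outlook", "excel"]))  # 2
```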
4 Methodology The computer tasks used for the experiment were designed to simulate normal engineering computer work that would be completed in a typical office setting while
using multiple windows simultaneously on the computer monitor(s). Software packages were chosen to simulate the software typically used in an office environment: drafting, data entry, spreadsheet analysis, email, and data gathering. A total of six tasks were designed, each categorized by how many active windows were needed during the task. Three categories of total active windows were used (two, three, and four), with two iterations of each. The two-window task used Excel and Outlook; the three-window task added Internet Explorer; and the four-window task added AutoCAD to the suite. Each participant performed all six tasks during the experiment, three on a single monitor and three on dual monitors. Prior to the experiment, the participants completed a short questionnaire and training. After each monitor setting, the participants completed a questionnaire, and a post-completion survey was administered at the end.
Overall, 36 participants were recruited, mostly from the School of Industrial Engineering at the University of Oklahoma. Participant age ranged from 18 to 39 years, with an average of 23 years and a standard deviation of 3.5 years. The gender distribution was 67% male and 33% female. The primary equipment was a DELL Optiplex 745 PC with two 19-inch monitors, a standard keyboard, and an optical mouse. The dual monitors were placed equidistant from the midsagittal plane.
Three independent variables were studied: 1) the number of screens (single or dual), 2) the number of active windows open during the tasks (2, 3, and 4), and 3) the WMS (TOG or RES). WMS was also examined in [13]; however, the categories are defined differently in this research. Instead of having separate categories for sizing and for accessing application windows, the categories used here combine the two. Unlike in [13], "Alt-Tabbing" was prohibited by experimental instruction in order to control the behavior. This research focused on whether or not users view information simultaneously, with the thought that simplifying the categories would reveal significant differences between the WMSs.
5 Data Validation, Correlations, and Results
For task time, the fitted-value plot showed a troublesome trend: as the fitted value increased, the residual value also increased. The residual plots for cursor movement suggested that the assumption of normality may have been violated, given the slight curvature of the line and the presence of several outliers; the fitted-value plot showed a curved trend with increasing fitted values. The violation of a zero-mean error term could clearly be seen in the histogram, where negative residuals occurred much more frequently than positive ones. The plots for window switches likewise showed a violation, given the increasing trend of the residual values as the fitted value increased; the histogram, however, showed a well-defined bell-shaped curve, and the order plot showed no violation.
A correlation between cursor movement and task time was found (r² = 0.7466, p < 0.0001), indicating that as cursor movement increases, task time also increases. The correlation between task time and the number of window switches showed that as the
number of switches increases, task time also increases (r² = 0.5330, p < 0.0001). Finally, the correlation between cursor movement and the number of window switches showed a positive trend for cursor movement as the number of window switches increases (r² = 0.5191, p < 0.0001). Since cursor movement and the number of window switches were both correlated with task time, both can be considered indicators of productivity.

Table 1. Means and standard deviations of all data points

# of Windows | # of Monitors | Sample Size | Mean Task Time | Task Time S.D. | Mean Mouse Movement | Mouse Movement S.D. | Mean # of Window Switches | Window Switches S.D.
2 | 1 | 34 | 461.06 | 168.55 | 92806.99 | 39233.39 | 9.59 | 7.21
2 | 2 | 34 | 484.41 | 154.48 | 101314.65 | 50355.19 | 4.06 | 3.09
3 | 1 | 34 | 687.91 | 267.78 | 144077.49 | 66694.41 | 34.88 | 10.79
3 | 2 | 34 | 643.97 | 253.18 | 124435.74 | 63203.57 | 30.79 | 12.49
4 | 1 | 34 | 1314.82 | 456.51 | 224794.09 | 96363.69 | 32.68 | 8.72
4 | 2 | 34 | 1310.09 | 414.08 | 225310.30 | 93169.08 | 32.06 | 9.55
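The r² values reported above are squared Pearson correlations; the sketch below illustrates the computation, using the condition means from Table 1 purely as sample vectors (the paper's values were computed over individual observations, which are not available here).

```python
# Squared Pearson correlation between two metric vectors (Python 3.10+).
# Vectors are the condition means from Table 1, used only to illustrate
# the computation, not to reproduce the reported r-squared values.
from statistics import correlation

task_time = [461.06, 484.41, 687.91, 643.97, 1314.82, 1310.09]
cursor_px = [92806.99, 101314.65, 144077.49, 124435.74, 224794.09, 225310.30]

r = correlation(task_time, cursor_px)  # Pearson's r
print(f"r-squared = {r ** 2:.4f}")
```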
As the means suggest, there were no significant differences for the number of monitors used, nor any significant interactions, in task time. The number of windows did, however, produce a significant difference (p < 0.0001); Tukey analysis showed that all three tasks differed from each other, although the two- and three-window task times were relatively close. For mouse movement, despite the means being higher for the 2- and 4-window tasks, no significant monitor effect was found; there was a significant difference for the number of windows (p < 0.0001), and though there were no significant interactions, the 3-window task exhibited a decreasing trend from the single to the dual monitor condition. Tukey analysis confirmed that all three tasks differed. For the number of window switches, there were significant differences in both the number of windows and the number of monitors (p < 0.0001 and p = 0.0028, respectively). Unlike task time and mouse movement, the Tukey analysis showed that the 3- and 4-window tasks were similar, while the 2-window task differed from both.
A statistical analysis was performed on the WMS in two ways. The first analysis included only participants who exhibited one style of window management, called pure TOGs (PTs) and pure RESs (PRs). Of the 34 total participants, three were classified as PRs and 15 as PTs. The second analysis examined the patterns of all 34 participants. Task time for PTs and PRs was significantly different across the number of windows (p < 0.0001). Mouse movement revealed significant differences only for the number of windows; Tukey analysis showed that the 3-window task required less mouse movement than the 4-window task. The number of window switches did not produce a significant main difference (p = 0.420); only the interaction between the number of windows and the WMS was significant for the number of window switches (p = 0.040).
An additional statistical analysis was performed on all of the participants, considering each participant's WMS for each task. However, it showed significant results only for the number of windows used, for both task time and mouse movement. The means for each type of WMS are given in Table 2.

Table 2. Means of TOGs (T) and RESs (R) with respect to number of monitors; the second value in each cell is for pure users (identified by bold italics in the original)

# of Monitors | Style | Mean Task Time | Mean Mouse Movement | Mean # of Switches
1 | R | 952.59 / 1028.83 | 172859.61 / 176445.11 | 35.77 / 32.00
1 | T | 1024.70 / 939.70 | 189972.23 / 182445.29 | 32.88 / 31.93
2 | R | 845.00 / 1088.17 | 164926.70 / 182445.29 | 32.88 / 29.50
2 | T | 1021.04 / 1032.23 | 178188.46 / 170398.93 | 30.94 / 32.63
6 Conclusions, Recommendations, and Future Work
The experiment showed mixed results on the benefit of using dual monitors instead of single monitors. Unlike [13], no significant difference was found between single and dual monitors with respect to task time. It was surprising that, despite including more complicated tasks than those found in the literature, no significant differences in task time were found. Contrary to what the actual task times reveal, participants felt they had performed faster in the dual monitor configuration. Two logical interpretations of these results present themselves: either there truly were no differences in task time, owing to the lack of operating system and software support for multiple monitors, or problems within the experiment obstructed any significant differences from being found. It is possible that resizing windows and locating them on the additional monitor offset the productivity benefits of dual monitors because of the time spent moving windows.
Looking at the results for cursor movement, the differences between the single and dual monitor configurations were minimal, especially for the four-window tasks. Cursor movement was a dependent variable that had gone unexamined in previous research. Even though no significant differences were found in this study, the survey revealed that participants thought they moved the mouse less in the dual monitor configuration.
The benefit of using dual monitors was seen in the decreased number of window switches required to complete a task. However, the time saved by reduced window switching was most likely offset by relocating and positioning windows.
Other possible reasons for the lack of difference between the single and dual monitor results could be limitations within the experiment, such as a lack of experience with AutoCAD or a lack of exposure to dual monitor computers. Drafting software experience was required of each participant, but
participants were not required to be familiar with AutoCAD; owing to the simple nature of the drafting required, proficiency with AutoCAD specifically was deemed unnecessary. Even though each participant was trained on the exact functions used during the drafting task, a lack of experience may have adversely affected task times, mouse movement, and the number of window switches. Also, there was no control for dual monitor experience, so the results may have been affected by user experience; however, when these variables were examined, no strong differences were seen between inexperienced and experienced AutoCAD users or multiple monitor users.
The third possible explanation for why the experiment found no differences between single and dual monitors is that the four-window task could be completed using three windows at a time. Studied closely, the four-window tasks resemble a set of two separate tasks rather than one task requiring all four windows simultaneously; therefore, only three windows were used at once. Had the instructions stated that the user should perform both subtasks simultaneously, the results might have been different. Alternatively, with a between-subjects experimental design, equal tasks would not have been required.
Another issue that may have affected the data arose during the design of the experiment. The researchers determined that it was necessary to create varying tasks with the same number of windows and equivalent task times. Using a GOMS analysis, two tasks were created for each of the 2-, 3-, and 4-window levels. During the experimental trials, however, the researchers noticed that the task times for the variations within each set were not equal, and they determined that the cause of the disparity was the varying amounts of reading and comprehension required by the tasks. The tasks were then adjusted to equalize the task times, and the GOMS analysis results were adjusted to reflect the changes. It is possible that these adjustments caused some of the observed differences. Without a formal task analysis that incorporates a keystroke-level analysis and considers mental activity, it was impossible to form tasks that are truly equivalent. It is therefore recommended that future researchers develop a formal analysis that can be used to form tasks that are accurately equal in complexity.
To conclude the dual monitor analysis, there may still be advantages to having multiple monitors despite the lack of evidence that they increase productivity in terms of time savings. The post-experimental survey revealed that people enjoyed using dual monitors and believed they were more productive with them. If users are more satisfied with a piece of equipment, job fatigue may be reduced while motivation and work morale increase. It is then plausible that providing multiple monitors to a workforce would increase productivity for drafting, data mining, programming, and gathering and recording large amounts of information. Multiple monitors are especially advantageous when dynamic information (data the user must check frequently) is displayed. With a single monitor, the user must either allocate prime screen real estate to keep that information instantly viewable or search for the windows more often, reducing productivity. With a multiple monitor system, a user can designate an area or monitor for viewing secondary information.
The results of the three WMS analyses were surprising: no significant differences were found for any of the WMS. The trends of the means generally showed some differences, so an additional study with a larger sample size might show a statistically significant result. Although Kang and Stasko's styles showed larger performance differences than the TOG and RES WMS, their categorization method invites criticism. Kang and Stasko's styles are based both on window size and on how much information can be viewed. While the styles worked for single monitors, they did not effectively categorize dual monitor users, because a participant could be classified into multiple styles simultaneously when two different styles were used on different monitors. During the analysis, categorizing each participant was difficult because of the parameters set for each style, largely because window size was used as a factor in determining a participant's WMS. For example, during a four-window task, a participant might have two maximized windows on one monitor, toggling between them, while on the second monitor having two windows sized so that both are fully visible at once. That user would be a Maximizer on one monitor and a Careful Coordinator on the second. Under the TOG and RES styles, the same participant would be identified as a RES, because they are generally trying to view as many windows as possible simultaneously.
The TOG and RES categorization method was judged advantageous because it focuses on how a participant gathers information. The most important factors in how windows were managed were how the participant retrieved information and how much information they viewed at one time; the potential performance improvement from viewing more information at once is what drives a user to manage windows differently. While Kang and Stasko's method considered how windows are positioned, treating window size as a factor confounded the categorization process. The exact size of the windows does not matter; what matters is whether the participant can view multiple sources of information.
Future studies of multiple monitor productivity should examine the influence of many different factors, such as monitor size, monitor separation, integration with laptops, software, operating systems, and WMS. From the results of this research, it is recommended that more attention be placed on WMS and on the number of windows being used simultaneously. Alternatively, increasing task complexity by forcing participants to reference windows more often should be examined; with either method of increasing task complexity, it is also important to measure how the different window management styles compare as complexity increases. Finally, future research should focus on recruiting equal sample sizes for each WMS and on longer experiment times.
References
1. Ashdown, M., Oka, K., Sato, Y.: Combining Head Tracking and Mouse Input for a GUI on Multiple Monitors. In: Computer Human Interaction – Extended Abstracts (2005)
2. Bailey, B., Konstan, J., Carlis, J.: Measuring the Effects of Interruptions on Task Performance in the User Interface. IEEE, pp. 757–762 (2000)
3. Card, S., Moran, T., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum, Hillsdale (1983)
4. Covin, J., Tobler, N., Anderson, J.A.: Productivity and multi-screen displays. Rocky Mountain Comm. Review, Dept. Comm. Univ. Utah 2(1), 31–53 (2004)
5. Czerwinski, M., Smith, G., Regan, T., Meyers, B., Robertson, G., Starkweather, G.: Toward characterizing the productivity benefits of very large displays. In: Proceedings of Interact 2003, pp. 9–16 (2003)
6. Grudin, J.: Partitioning Digital Worlds: Focal and Peripheral Awareness in Multiple Monitor Use. In: Proceedings of Computer Human Interaction 2002, pp. 458–465 (2003)
7. Hutchings, D., Stasko, J.: New operations for display space management and window management. GVU Technical Report GIT-GVU, pp. 2–18 (2002)
8. Hutchings, D., Czerwinski, M., Smith, G., Meyers, B., Robertson, G.: Display space usage and window management operation comparisons between single monitor and multiple monitor users. In: Proceedings of AVI 2004, pp. 32–39 (2004)
9. Hutchings, D.R., Stasko, J.: Mudibo: Multiple Dialog Boxes for Multiple Monitors. In: Proceedings of Computer-Human Interaction 2005 Extended Abstracts, pp. 1471–1474 (2005)
10. Hutchings, D.R., Stasko, J.: Revisiting display space management: understanding current practice to inform next-generation design. In: Proceedings of Graphics Interface, pp. 127–134 (2004)
11. Jaschinski, W., Heuer, H., Kylian, H.: Preferred position of visual displays relative to the eyes: A field study of visual strain and individual differences. Ergonomics 41(7), 1034–1049 (1998)
12. Jon Peddie Research: 2008 CAD report (2008), http://www.jonpeddie.com/publications/cad_report/
13. Kang, Y., Stasko, J.: Lightweight task/application performance using single versus multiple monitors: A comparative study. In: Proceedings of Graphics Interface 2008, pp. 17–24 (2008)
14. Mackinlay, J.D., Heer, J.: Wideband Displays: Mitigating Multiple Monitor Seams. In: Proceedings of the Computer-Human Interaction Conference (2004)
15. Ringel, M.: When one isn't enough: an analysis of virtual desktop usage strategies and their implications for design. In: Proceedings of Computer Human Interaction Extended Abstracts 2003, pp. 762–763 (2003)
16. Robertson, G., Czerwinski, M., Baudisch, P., Meyers, B., Robbins, D., Smith, G., Tan, D.: The Large-Display User Experience, pp. 44–51. IEEE Computer Society, Los Alamitos (2005)
17. St. John, M., Harris, W., Osga, G.A.: Designing for multitasking environments: Multiple monitors versus multiple windows. In: Proceedings of the Human Factors and Ergonomics Society 41, pp. 1313–1317 (1997)
18. St. John, M., Manes, D.I., Oonk, H.M., Ko, H.: Workspace Control Diagrams and Head-Mounted Displays as Alternatives to Multiple Monitors in Information-Rich Environments. In: Proceedings of the Human Factors and Ergonomics Society 43 (1999)
19. Tan, D.S., Czerwinski, M.: Effects of visual separation and physical continuities when distributing information across multiple displays. In: Proceedings of INTERACT, pp. 252–265 (2003)
20. Tullis, T., Albert, B.: Measuring the User Experience. Morgan Kaufmann, San Francisco (2008)
21. Valletta, R.: Computer Use and the U.S. Wage Distribution, 1984–2003. FRBSF Working Paper 2006-34, pp. 32–39 (October 2006)
Interface Evaluation of Web-Based e-Picture Books in Taiwan
Pei-shiuan Tsai 1,2 and Man-lai You 1
1 Graduate School of Design, National Yunlin University of Science & Technology, Taiwan
2 Department of Early Childhood Education, Taipei Municipal University of Education, Taiwan
[email protected],
[email protected]
Abstract. Web-based e-Picture books can integrate multimedia elements and offer a reading experience different from that of printed picture books. This research aims to understand the state of development of three e-Picture book websites in Taiwan and to give recommendations for improvement. Through 1) questionnaires filled out by 12 adults who browsed the e-Picture books and 2) observation and in-depth interviews of 10 surveyed users (two teachers, two parents, and six children), the research analyzed the interface designs of three e-Picture book websites in Taiwan: "Guru Bear Parent-Child Common Reading Network: Dear Bear Reading Room", "Kiddo Book", and "CCA (Council for Cultural Affairs) Children Cultural Center: Picturebook Garden". The results were: 1) most books used a page-flipping pattern; 2) stories developed primarily linearly; 3) interactivity was limited; 4) most books were adapted from physical printed books; and 5) registering an additional account and installing browsing software are difficult for children. The recommendations for future publishers and designers were: 1) increase the interactivity of the story; 2) make good use of multimedia interactive design elements; 3) enhance user control; 4) integrate e-Picture book platforms; and 5) create all-new e-Picture books. Keywords: e-Picture book, e-Storybook, e-Book, Usability.
1 Introduction
After Apple's launch of the iPad, the e-Book market reached a new peak. E-Books on the current market are mainly read on e-Book readers, for which every manufacturer has its own specifications. Most e-Books, however, present themselves on the reader as static images and text, or are merely electronic versions of printed books. Strictly speaking, there are few e-Books that use multimedia elements and interactive design to create content exclusive to the e-Book form.
Unable to read words yet, pre-school children usually attend to colors and novel patterns. An e-Picture book is drawing-based, supplemented
by a small amount of text, and can help cultivate reading habits. Designing e-Picture books that combine multimedia elements with a convenient online platform is, from the standpoints of creation, production, and browsing, a new attempt and challenge: it allows economical and convenient transmission and distribution, and it provides reading content with different media expression and sensory stimulation. This research selected three e-Picture book websites in Taiwan for interface evaluation and analysis, to understand the current state of development and to provide recommendations for improvement.
2 Literature Review
2.1 Electronic Book
An electronic book is a text- and image-based publication in digital form produced on, published by, and readable on computers or other digital devices (Wikipedia, n.d.). The Chinese term for "electronic book" is translated directly from English. Van Dam was the first to mention electronic books in the literature. In a broad sense, the term means media that store and transmit text and picture information through electronic channels (Lwo, 1995). Barker (1992) argued that "electronic book" describes a new type of book, different from traditional paper books. Like paper books, electronic books are composed of pages; the difference is that each page of an electronic book is designed, dynamic electronic information. An electronic book can be considered an aggregation of multi-page, responsive, lively multimedia (including text, picture, or voice information).
A picture book is an art form that combines visual and verbal narratives in a book format; a true picture book tells the story with both words and pictures. An electronic picture book (e-Picture book, EPB) presents a picture book in electronic form, on media including CD-ROM and the Web. The multimedia elements applied include text, pictures, animation, voice, sound effects, and music. User control mainly operates through mouse and keyboard (the interactive operation pattern): mouse manipulation includes dragging and clicking, whereas keyboard manipulation includes character entry and key presses. The story material comes either from adaptation or from original creation, and an e-Picture book comprises multiple pages that graphically depict or tell a story. In this research, "e-Picture book" means the web-based e-Picture book.
2.2 Usability
Usability is the ease of use and learnability of a human-made object. Lazar (2006) highlights ease of use as an equally important usability consideration; he also advocates a balanced approach to Web design that allows the appropriate use of media elements such as graphics, plug-ins, and animation. Shneiderman (1993) emphasizes consistency and predictability in interface design that provides for a high
level of user control. Usability means that the people who use the product can do so quickly and easily to accomplish their own tasks. This definition rests on four points: (1) usability means focusing on users; (2) people use products to be productive; (3) users are busy people trying to accomplish tasks; and (4) users decide when a product is easy to use (Dumas & Redish, 1999). Usability is a quality attribute that assesses how easy user interfaces are to use; the word "usability" also refers to methods for improving ease of use during the design process. Usability is defined by five quality components: (1) Learnability: how easy is it for users to accomplish basic tasks the first time they encounter the design? (2) Efficiency: once users have learned the design, how quickly can they perform tasks? (3) Memorability: when users return to the design after a period of not using it, how easily can they reestablish proficiency? (4) Errors: how many errors do users make, how severe are these errors, and how easily can they recover from them? (5) Satisfaction: how pleasant is it to use the design? (Nielsen, 2003)
In conclusion, a usability analysis can tell us which part of a page users view first, how long they stay, and where they go next. Usability is about analyzing how a user interacts with a website and using that information to make the website as user-friendly as possible.
3 Method
3.1 Procedure
1. Participants browsed separately assigned e-Picture books (two for each type on each website).
2. Questionnaire surveys and interviews were conducted using five-point Likert items: a. strongly disagree; b. disagree; c. neither agree nor disagree; d. agree; e. strongly agree. The questionnaire covered satisfaction with items such as overall page design, ease of finding e-Picture books, ease of operation, appropriateness of the story display size, the design of the story animation, and suitability for children's independent operation. The interviews focused on operating the e-Picture books on the different websites, to understand the reasons for satisfaction or dissatisfaction.
3.2 Subjects
1. Twelve adults filled out the questionnaires (ten had graduated from childhood-education-related departments and two from a department of Chinese language and literature; their occupations included three mothers of young children, two kindergarten teachers, two elementary school teachers, and five early childhood education educators).
2. Ten people, comprising six children (one 10-year-old boy, one 10-year-old girl, two 7-year-old boys, and two 7-year-old girls), two kindergarten teachers, and two parents, received one-to-one in-depth interviews on the e-Picture books they had browsed. Four of the adults were drawn from those who had filled out the questionnaires.
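The per-item statistics reported later in Table 2 are ordinary Likert summaries; a minimal sketch follows, where the response vector is made up (chosen so that it happens to reproduce the Q1-1 row).

```python
# Sketch of the questionnaire aggregation behind Table 2: each item's
# five-point Likert responses (1 = strongly disagree ... 5 = strongly agree)
# are summarized by mean, sample SD, min, and max. Responses are illustrative.
from statistics import mean, stdev

responses = {"Q1-1": [4, 4, 5, 3, 4, 4, 4, 5, 3, 4, 4, 4]}  # N = 12 raters

for item, scores in responses.items():
    print(f"{item}: mean={mean(scores):.1f} sd={stdev(scores):.3f} "
          f"min={min(scores)} max={max(scores)}")
# -> Q1-1: mean=4.0 sd=0.603 min=3 max=5
```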
Table 1. Comparison of three e-Picture book websites in Taiwan (data as of the 2011.01.31 update)

 | Guru Bear Parent-Child Read Together Net | Kiddo Book | CCA Children Culture Hall – Picturebook Garden
Chinese name | 咕嚕熊親子共讀網:熊熊閱讀室 | 奇豆線上書房 | 兒童文化館:繪本花園
Website | http://www.gurubear.com.tw/ | http://www.kiddobook.com/ | http://children.cca.gov.tw/garden/
Establishment time | 2008.07 | 2006.12 | 1999.09
Membership registration | Yes | Yes | Not required
Install browsing software | No | Yes | No
Fee | Credits needed for reading; credits earned by publishing articles | NT$290/month | Free
Trial subscription | 3 trial subscriptions per category | 6-book trial subscription (trial contents are not the complete version) | —
Book types | Interactive English Picturebook; E-Live Picturebook | Electronic Book | —
Story amount | 24 (Interactive English); 27 (E-Live) | 35; 111 | 144
Story voice | Yes (both types) | Yes | Yes
Story animation | Good (Interactive English); Better (E-Live) | None; Limited | Better
Extension activities | Interactive learning | Game zone; three-dimensional play zone; creation zone | Game, reading guide, and discussion (for each story)
Fig. 1. Example of Interactive English Picturebook
Fig. 2. Example of Kiddo book
Fig. 3. Example of CCA Children Culture Hall - Picturebook Garden
4 Results and Conclusions
The results of the questionnaires and interviews are summarized as follows:
1. Design of the website homepage. All three websites received reasonably good evaluations, with average scores above 3.6.
2. Design of the interface. Because the e-Picture books of "Kiddo Book" need extra installation of browsing software and have more interactive function options, its average score for ease of manipulation is the lowest of all the websites. Nevertheless, users considered the interface design of "Kiddo Book" the best: it provides switching between previous/next pages, different caption options (English/Chinese/Chinese plus phonetic notation), options for text location, play/pause, and automatic/manual play; that is, it provides more user control.
3. Screen size of the e-Picture book. The screens of "Children Culture Hall" and of the "Guru Bear Parent-Child Read Together Net – Interactive English Picturebook" are the smallest, and their average scores are lower. The e-Picture books of "Kiddo Book" play in almost full screen and received the highest score, 4.7.
4. Design of the story animation. The story animation of the "Guru Bear Parent-Child Read Together Net – E-Live Picturebook" and of "Children Culture Hall" has more camera effects and more dynamic expression of the characters, and received higher scores.
5. Story voice. The "Guru Bear Parent-Child Read Together Net – E-Live Picturebook" and "Kiddo Book" were most highly praised by the interviewees: different roles are dubbed with different voices.
6. Suitability for children's independent operation. "Kiddo Book" needs extra installation of browsing software, so it received the lowest score, 2.4. "Children Culture Hall" does not require registering a user name and is the easiest to manipulate, so it received the highest score, 4.2. The four 7-year-old children interviewed indicated that registering a user name was difficult and that they had no such experience; installing software to view the picture books was also difficult for all the children.
As a whole, the e-Picture book designs had the following points in common: a) a page-flipping pattern; b) linear stories; c) insufficient interaction; d) most were adapted from printed books; and e) they required registering an extra account and installing browsing software, which is difficult for children. So far there is no fully satisfactory design for web e-Picture books, and there is still much room for improvement.
Table 2. Analysis of Questionnaire Statistics (N=12)

Item   Mean   S.D.    Min   Max
Q1-1   4.0    0.603   3     5
Q1-2   3.6    0.900   2     5
Q1-3   3.8    0.622   3     5
Q2-1   3.4    0.996   2     5
Q2-2   3.3    0.754   2     4
Q2-3   3.8    0.577   3     5
Q3-1   3.3    0.622   2     4
Q3-2   4.1    0.515   3     5
Q3-3   3.8    0.452   3     4
Q3-4   3.0    0.739   2     4
Q3-5   4.0    0.739   3     5
Q4-1   2.5    0.798   1     3
Q4-2   3.0    0.853   1     4
Q4-3   3.8    0.754   2     5
Q4-4   4.7    0.492   4     5
Q4-5   2.6    0.900   1     4
Q5-1   3.1    0.669   2     4
Q5-2   4.0    0.953   2     5
Q5-3   2.7    0.888   1     4
Q5-4   3.6    0.996   2     5
Q5-5   4.0    0.739   3     5
Q6-1   3.2    0.937   1     5
Q6-2   3.8    0.866   2     5
Q6-3   3.9    0.515   3     5
Q6-4   2.4    0.996   1     4
Q6-5   4.1    0.793   3     5
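The S.D. column follows the sample (n−1) formula, which can be verified against the per-participant scores in the Appendix. A minimal Python check using the Q4-4 row from the Appendix:

```python
import statistics

# Per-participant scores for Q4-4 (screen size of "Kiddo Study Room"),
# copied from the Appendix (P1..P12).
scores = [5, 5, 5, 4, 4, 5, 5, 4, 5, 5, 4, 5]

mean = statistics.mean(scores)   # 4.67, reported as 4.7
sd = statistics.stdev(scores)    # sample standard deviation: 0.492
print(f"Mean={mean:.1f}  S.D.={sd:.3f}  Min={min(scores)}  Max={max(scores)}")
```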
5 Suggestions

1. Increase the interactivity of the stories. The e-Picture books on the three websites were adapted from physical printed books; the publishers operated them as little more than added value for the original publications. Most of the story contents develop linearly and lack interaction. In the future, designers should bring the advantages of multimedia and the web into full play and introduce branching or multiple routes of story development to strengthen the interaction between readers and story contents.
2. Make good use of multimedia interactive design elements. e-Picture books should place more emphasis on multimedia interactive design and give readers reading experiences that differ from those of printed books. Unfortunately, the multimedia performance of the existing e-Picture books on the three websites was, on the contrary, not as good as that of the Living Books series. They should exploit the advantages of the web to interact with their readers and extend reading activities.

3. Enhance user control. "Kiddo Book" provides function options including previous/next page switching, different subtitle options (English/Chinese/Chinese with phonetic notation), text location options, play/pause, and automatic/manual play; it provides the most user control of the three websites. Most of those surveyed recommended adding control options for the size of the screen and of the subtitles, so that users could decide the picture and subtitle sizes themselves, further enhancing the users' control.

4. Integrate an e-Picture book platform. At present, most e-Picture book websites in Taiwan are operated by individual publishers, and some even require installing browsing software for reading, which is inconvenient for readers. In the future, an integrated platform should be constructed so that users can read e-Picture books from different publishers in one place. Users would no longer need separate accounts, and the platform could further the goals of mutual exchange and healthy competition.

5. Create all-new e-Picture books. Most of the existing e-Picture books came from adapting physical printed picture books, that is, from digitizing existing publications. The images are usually scanned, or at most parts of them are processed into simple animation. It is a pity that the creators did not approach e-Picture books from the angle of multimedia elements and interaction. The creation and publication of all-new e-Picture books should therefore be encouraged.
Appendix: The Results of Questionnaire Statistics

Scores are per participant (P1–P12), followed by the average (Ave) and standard deviation (S.D.).

Q1-1 I like the website design of "Guru Bear Parent-Child Read Together Net": 4 4 4 4 3 3 5 4 4 4 4 4 (Ave 4.0, S.D. 0.603)
Q1-2 I like the website design of "Kiddo Book": 4 4 5 2 3 3 4 5 3 4 3 4 (Ave 3.6, S.D. 0.900)
Q1-3 I like the website design of "Children Culture Hall": 4 4 4 4 3 5 3 3 4 3 4 4 (Ave 3.8, S.D. 0.622)
Q2-1 I can find the e-Picture book of "Guru Bear Parent-Child Read Together Net" easily: 2 4 4 4 2 4 5 2 4 4 3 3 (Ave 3.4, S.D. 0.996)
Q2-2 I can find the e-Picture book in "Kiddo Book" easily: 4 3 4 2 2 3 4 4 3 3 3 4 (Ave 3.3, S.D. 0.754)
Q2-3 I can find the e-Picture book in "Children Culture Hall" easily: 4 4 3 5 4 4 3 4 4 4 4 3 (Ave 3.8, S.D. 0.577)
Q3-1 I feel it is easy to manipulate "Bear Reading Room – Interactive English Picture book" in "Guru Bear Parent-Child Read Together Net": 4 3 4 3 3 4 3 2 3 4 3 3 (Ave 3.3, S.D. 0.622)
Q3-2 I feel it is easy to manipulate "Bear Reading Room – E-Live Picture book" in "Guru Bear Parent-Child Read Together Net": 4 5 4 4 3 4 4 5 4 4 4 4 (Ave 4.1, S.D. 0.515)
Q3-3 I feel it is easy to manipulate "Bear Reading Room – Electronic Picture Book" in "Guru Bear Parent-Child Read Together Net": 4 4 4 3 3 4 4 4 4 4 4 3 (Ave 3.8, S.D. 0.452)
Q3-4 I feel it is easy to manipulate "Kiddo Study Room" in "Kiddo Book": 4 3 4 2 3 3 2 2 3 4 3 3 (Ave 3.0, S.D. 0.739)
Q3-5 I feel it is easy to manipulate "Picture book Garden" in "Children Culture Hall": 4 4 3 5 4 5 3 4 4 5 3 4 (Ave 4.0, S.D. 0.739)
Q4-1 I feel the screen size of "Bear Reading Room – Interactive English Picture book" in "Guru Bear Parent-Child Read Together Net" is appropriate: 2 3 4 3 3 2 2 1 2 3 2 3 (Ave 2.5, S.D. 0.798)
Q4-2 I feel the screen size of "Bear Reading Room – E-Live Picture book" in "Guru Bear Parent-Child Read Together Net" is appropriate: 3 3 4 4 3 2 4 1 3 3 3 3 (Ave 3.0, S.D. 0.853)
Q4-3 I feel the screen size of "Bear Reading Room – Electronic Picture book" in "Guru Bear Parent-Child Read Together Net" is appropriate: 2 4 4 5 4 3 4 4 4 4 3 4 (Ave 3.8, S.D. 0.754)
Q4-4 I feel the screen size of "Kiddo Study Room" in "Kiddo Book" is appropriate: 5 5 5 4 4 5 5 4 5 5 4 5 (Ave 4.7, S.D. 0.492)
Q4-5 I feel the screen size of "Picture book Garden" in "Children Culture Hall" is appropriate: 4 3 3 3 2 2 2 1 2 4 2 3 (Ave 2.6, S.D. 0.900)
Q5-1 I like the story animation of "Bear Reading Room – Interactive English Picture book" in "Guru Bear Parent-Child Read Together Net": 4 3 3 4 3 4 2 2 3 3 3 3 (Ave 3.1, S.D. 0.669)
Q5-2 I like the story animation of "Bear Reading Room – E-Live Picture book" in "Guru Bear Parent-Child Read Together Net": 4 5 5 5 4 2 5 4 4 4 3 3 (Ave 4.0, S.D. 0.953)
Q5-3 I like the story animation of "Bear Reading Room – Electronic Picture book" in "Guru Bear Parent-Child Read Together Net": 3 4 4 3 2 3 3 1 2 2 3 2 (Ave 2.7, S.D. 0.888)
Q5-4 I like the story animation of "Kiddo Study Room" in "Kiddo Book": 5 5 4 3 2 5 3 3 3 3 4 3 (Ave 3.6, S.D. 0.996)
Q5-5 I like the story animation of "Picture book Garden" in "Children Culture Hall": 4 3 3 4 4 4 4 5 5 5 4 3 (Ave 4.0, S.D. 0.739)
Q6-1 I think it is suitable for elementary school children to self-manipulate "Bear Reading Room – Interactive English Picture book" in "Guru Bear Parent-Child Read Together Net": 3 4 4 5 3 3 3 1 3 3 3 3 (Ave 3.2, S.D. 0.937)
Q6-2 I think it is suitable for elementary school children to self-manipulate "Bear Reading Room – E-Live Picture book" in "Guru Bear Parent-Child Read Together Net": 2 4 4 5 3 4 4 5 4 4 3 3 (Ave 3.8, S.D. 0.866)
Q6-3 I think it is suitable for elementary school children to self-manipulate "Bear Reading Room – Electronic Picture book" in "Guru Bear Parent-Child Read Together Net": 3 4 4 5 3 4 4 4 4 4 4 4 (Ave 3.9, S.D. 0.515)
Q6-4 I think it is suitable for elementary school children to self-manipulate "Kiddo Study Room – Electronic Picture book" in "Kiddo Book": 2 4 4 2 3 2 1 1 2 3 3 2 (Ave 2.4, S.D. 0.996)
Q6-5 I think it is suitable for elementary school children to self-manipulate "Picture book Garden – Electronic Picture book" in "Children Culture Hall": 5 4 4 5 3 5 3 4 4 5 3 4 (Ave 4.1, S.D. 0.793)
A Digital Archive System for Preserving Audio and Visual Space

Makoto Uesaka, Yusuke Ikegaya, and Tomohito Yamamoto

College of Information Science and Human Communication, Kanazawa Institute of Technology, 7-1 Oogigaoka, Nonoichi, Ishikawa 921-8501, Japan
[email protected]
Abstract. Digital archive systems have become widespread in various fields because they can preserve precious cultural heritage, books, pictures, and videos without any deterioration. Moreover, by providing their information on the web, digital archive systems can share many things among general users and pass them down easily to new generations. In this research, we focus on the spatial information of a place or an event, which can provide a high sense of presence and retrieve personal memories, and we develop a digital archive system that can preserve this kind of spatial information.

Keywords: Digital Archive, Omnidirectional image, Multi-channel audio, Spatial information.
1 Introduction

Digital archive systems have become widespread in various fields because they can preserve historical and cultural heritage such as paintings, pictures, and sculptures without any deterioration [1]-[4]. Moreover, by providing their data on the web, digital archive systems make it possible to share many things among general users and pass them down to the next generation. However, these digital archives, especially those provided on the web, tend to be composed of flat pictures and stereo sound. As a result, users cannot get the same reality or presence as from the real things. In the field of Virtual Reality (VR), to solve this problem, some studies preserve historical heritage in a highly realistic way and display it with special display systems. For example, Abe et al. preserved the "Maijishan Grotto" in China with stereo cameras and displayed it with stereo graphics [5]. In addition to such research, not only precious historical heritage but also daily life from all around the world has been archived. For example, Watanabe et al. archived personal data of people in Tuvalu and reported on the daily life of Tuvalu, which has suffered from rising sea levels due to global warming [6]. In research on digital archives of historical heritage, it is possible to preserve and display artifacts in a highly realistic way. However, such systems tend to need expensive equipment for measuring and displaying. Moreover, a highly realistic
display tends to need a large space for setup and lacks mobility. As a result, only precious historical heritage can be archived with such systems. On the other hand, research on digital archives of daily life provides only photographs, video, or text on the web. Therefore, it is difficult to feel the reality of people's daily lives deeply. In this research, we develop a digital archive system that provides highly realistic information using no expensive equipment, only mobile devices. In our system, archived data are provided from a website, which can display an omnidirectional image using WebGL and stereo sound using HTML5. Users can also download the archived contents into our audio-visual display [7] and enjoy them in a highly realistic way. Moreover, for the archive system, we shot some contents that are not historical heritage but personally or locally important scenery and spaces.
2 A Digital Archive System

2.1 System Overview

Our digital archive system is composed of two components (Fig. 1). One is the web system, which provides archived contents. The other is the spatial audio-visual display, which reproduces downloaded contents. Users can enjoy archived contents by the following procedure:

1. Users access our website from a local PC.
2. Users select an archive from the lists and preview it in the browser.
3. Users download a preferred archive to the local PC, set up the archived data in our display system, and enjoy it.

The server side of the web system dynamically generates web pages for the client side. The client side provides previewing, downloading, and so on. The spatial audio-visual display is composed of a head-mounted display (HMD) and multiple mobile devices, and it can reproduce downloaded contents with a highly realistic representation.
Fig. 1. System Overview
A Digital Archive System for Preserving Audio and Visual Space
105
2.2 Client and Server Side of the Web System

Fig. 2 shows the top page of the website. The top page has a function for searching contents from a map of the Hokuriku area in Japan, or by title, tags, and creation date. Fig. 3(a),(b) show preview images of archived contents. The images in Fig. 3(a),(b) derive from the same omnidirectional image but different view angles. In this preview mode, users can easily see the archived space and listen to the archived surround sound. In this system, the archived image is mapped onto a sphere model, and a virtual camera is placed in the center of the sphere using WebGL. If users want to see the omnidirectional view, they just click and drag on the displayed image. Moreover, the sound data correspond to the movement of the omnidirectional view (each sound source has its own position in the sphere). If the user moves the viewing direction, the direction and volume of the sound also change, using the HTML5 API.
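The paper does not spell out how the sound volume follows the view; the sketch below is one plausible reading of the described behavior, with the function names and the cosine attenuation law being illustrative assumptions rather than the authors' implementation:

```python
import math

def sound_params(view_azimuth_deg, source_azimuth_deg, base_volume=1.0):
    """Illustrative view-dependent gain/pan for a sound source placed on
    the preview sphere; only the horizontal angle is considered here."""
    # Angle between the viewing direction and the source, wrapped to [-180, 180].
    diff = (source_azimuth_deg - view_azimuth_deg + 180.0) % 360.0 - 180.0
    rad = math.radians(diff)
    # Sources behind the viewer are attenuated; sources ahead play at full volume.
    gain = base_volume * (0.5 + 0.5 * math.cos(rad))
    # Left/right pan in [-1, 1] from the horizontal offset.
    pan = math.sin(rad)
    return gain, pan

# Example: dragging the view 90 degrees away from a source halves its gain.
print(sound_params(view_azimuth_deg=0.0, source_azimuth_deg=90.0))
```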
Fig. 2. Top page of web site
Fig. 3. Preview page, (a) and (b)
The server side of the web system is composed of Fedora as the OS, MySQL as the database server, and Apache as the HTTP server. On the server side, dynamic page generation, the upload and download processes, and the search function are implemented in PHP.

2.3 Spatial Audio-Visual Display System

To enjoy archived contents in a highly realistic way, users download contents to a local PC and set them up in our audio-visual display system [7]. Our system is composed of multiple terminals, such as PCs or iPhones, as shown in Fig. 4. Each terminal is connected to a LAN, and the terminals are divided into one server, one visual client, and multiple sound clients. In this system, the server first simulates an archived space. After the simulation, the server transmits the information of the virtual space to the terminals by UDP multicast. The terminals then present a high sense of presence to the user using the received information. To reproduce spatial sound, the sound clients calculate the distance decrement of sound using three positions (the listener's position, the sound source's position, and the sound client's position). Using these positions, the system calculates the volume in real time and reproduces spatial sound through the volume differences between the clients. To represent the spatial view, the visual client uses a head-mounted display and a web camera. To present the spatial view, the visual client detects the user's posture with a head-tracking sensor and reflects it in the viewing angle in the virtual space. This system is basically implemented at the software level and allows any device to become a terminal if it can be connected to the network. Therefore, using devices such as desktops, laptops, or smartphones, which are already widespread, our system can be built relatively inexpensively.
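The exact attenuation law for the distance decrement is not given in the paper; the following sketch shows the idea with an assumed inverse-distance model, where the spatial impression arises from volume differences between sound clients:

```python
import math

def client_volume(listener, source, client, ref_dist=1.0):
    """Per-client volume for one sound source, following the paper's idea:
    the spatial impression comes from volume differences between clients.
    Positions are (x, y) tuples in meters; the 1/d law is an assumption."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Loudness at the listener from the source (inverse-distance decrement).
    loudness = ref_dist / max(dist(listener, source), ref_dist)
    # Clients nearer the source reproduce more of that loudness.
    weight = ref_dist / max(dist(client, source), ref_dist)
    return loudness * weight

# Example: two clients around a listener at the origin.
src = (2.0, 0.0)
print(client_volume((0, 0), src, client=(1.0, 0.0)))   # client toward source: louder
print(client_volume((0, 0), src, client=(-1.0, 0.0)))  # client away: quieter
```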
Fig. 4. Spatial audio-visual display
3 Contents

Digital archive systems developed so far have mostly focused on highly valuable heritage such as historical sculptures or fine arts. One reason why such heritage has been archived is that creating such archives has been very expensive. However, there are many other precious things in the world worth archiving, such as beautiful rural scenery, local festivals and events, or personally valuable spaces. Some of this scenery and some of these events exist only for a moment, or tend to fade away as the population of their area decreases. In this research, we focus on such things, archive them in a low-cost way, and provide them in a highly realistic way. Concretely, we have archived the beautiful rural scenery of the Hakusan area in Ishikawa and the daily student life of Kanazawa Institute of Technology. In the Hakusan area, the population decreases year by year because there are few jobs. Therefore, some villages will vanish in the future, and the beautiful rural scenery called "Satoyama" (a very special type of natural environment that cannot exist without moderate intervention by human beings) will also vanish. Archiving such rural areas contributes to preserving the areas themselves, and if the scenery is lost, it will still be possible to tell the next generation how the area used to be. Students have many enjoyable, miserable, or emotional experiences in their student life. Ordinarily, they can remember fragments of such memories through photographs or video. If our system shows where they were and how they were in a highly realistic way, they can feel as if they have come back to their student life. Moreover, after a student becomes a parent, he or she can tell their children about their student life.

3.1 Making an Omnidirectional Image and Video

Fig. 5(a) shows the complete shooting kit for an omnidirectional image (a general digital camera, a panoramic camera mount, and a tripod). To make an omnidirectional image, it is first necessary to take pictures while rotating the camera mount in 30-degree steps. Next, the camera mount is angled upward and downward, and pictures are taken while rotating the mount. Finally, a picture of the top is taken. From these pictures (about 40 in total), an omnidirectional image is created with editing software (Panoweaver). Fig. 6 shows pictures before editing, and Fig. 7 shows a created omnidirectional image. To make an omnidirectional video, a Ladybug2 (Point Grey Research) is used. Fig. 5(b) shows the shooting kit. This six-lens camera can create omnidirectional video in real time (4.7 megapixels per image).

3.2 Making Spatial Sound

In this research, an R-09HR (Roland) and a general headphone were used to record sound. However, any device is acceptable if it can record the sound source clearly in WAVE format. The recorded sound sources are placed in the virtual sphere onto which a created omnidirectional image is mapped. After that, the audio data and visual data are integrated into one file set (the contents information is described in XML format). When reproducing contents, our audio-visual display reads the XML data and simulates the archived space.
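The XML schema of the file set is not shown in the paper; the sketch below writes and reads back a hypothetical manifest of the described kind, and every element and attribute name in it is invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest: one omnidirectional image plus sound sources
# positioned on the virtual sphere (azimuth/elevation in degrees).
archive = ET.Element("archive", title="Satoyama in Hakusan")
ET.SubElement(archive, "image", file="satoyama.jpg")
sounds = ET.SubElement(archive, "sounds")
ET.SubElement(sounds, "source", file="stream.wav", azimuth="40", elevation="-10")
ET.SubElement(sounds, "source", file="birds.wav", azimuth="200", elevation="30")

ET.ElementTree(archive).write("contents.xml", encoding="utf-8")

# The display system would read the manifest back and place each source:
for src in ET.parse("contents.xml").getroot().iter("source"):
    print(src.get("file"), src.get("azimuth"), src.get("elevation"))
```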
Fig. 5. Shooting equipment for an omnidirectional image (a) and video (b)
Fig. 6. Unidirectional pictures
Fig. 7. An omnidirectional image
3.3 Examples of a Digital Archive

Figs. 8-10 show examples of archived contents. Fig. 8 shows old Japanese houses called "Gasshou zukuri," which are built of wood and straw. Maintaining these houses costs a great deal. Moreover, in this area the young generation moves out to the cities to find jobs, and only the older generation stays. Therefore, it is very difficult to keep this scenery for the next generation. Fig. 9 shows Satoyama in Hakusan. Satoyama was very common Japanese scenery in the old days. In Satoyama, farmers cultivated their vegetables and cut firewood for their stoves in winter. However, such farmers are rare nowadays, and Satoyama vanishes year by year.
Fig. 8. An archive of “Gasshou zukuri” houses in Gokayama
Fig. 9. An archive of “Satoyama” in Hakusan
Fig. 10 shows the students' project "Tsukimi Kouro" at Kanazawa Institute of Technology. In this project, students light up various places in Kanazawa city with handmade lamps. The students put a great deal of effort into this project. Therefore, this archive would help them remember their student life vividly.
Fig. 10. An archive of “Tsukimi Kouro” at Ishiura Shrine in Kanazawa
4 Conclusion

In this research, we developed a digital archive system that preserves highly realistic spatial information and provides it at low cost. Moreover, we created some local and personal archived contents. In future work, we will increase the contents and release the website to the public. Moreover, we will develop the web system so that general users can create omnidirectional views and surround audio to make highly realistic spatial archives.
References

1. Nagasaki Archive, http://nagasaki.mapping.jp/
2. World Digital Library, http://www.wdl.org/en/
3. National Archives of Japan, http://www.digital.archives.go.jp/
4. Daijyouji Temple Digital Museum of the Maruyama School, http://museum.daijishan.or.jp
5. Abe, N., Kawai, T., Ohya, J., Zha, H., Ando, M.: Digital Archiving of Maijishan and Stereoscopic VR Content. TVRSJ 4(3), 275–282 (2009) (in Japanese)
6. Suzuki, M., Watanabe, Y., Endo, S., Watanabe, H.: Tuvalu Visualization Project. In: SIGGRAPH ASIA 2009 Sketches, Article No. 259 (2009)
7. Takahashi, K., Yamamoto, T.: 3D Audio-Visual Display Using Mobile Devices. In: ACM SIGGRAPH 2010 Posters (2010)
Experience Explorer: Context-Based Browsing of Personal Media

Tuomas Vaittinen (1), Tuula Kärkkäinen (2), and Kimmo Roimela (3)

(1) Nokia Research Center, P.O. Box 407, FI-00045 Nokia Group, Finland, [email protected]
(2) Tampere University of Technology, Unit of Human-Centered Technology, P.O. Box 589, 33101 Tampere, Finland, [email protected]
(3) Nokia Research Center, P.O. Box 1000, FI-33721 Tampere, Finland, [email protected]
Abstract. We designed and built a system for browsing digital content and activity data created and gathered with mobile phones. We evaluated the system with 13 users to study the value of the context-based visualizations in real life. In addition to supporting reminiscing, content aggregated on the map revealed life patterns supporting reflection. Aggregation of items from several people also revealed common interests among friends. Keywords: Personal content, context, lifelogging, self reflection, user study.
1 Introduction

People want to be able to reminisce about and reflect on the important events and everyday incidents of their lives. Modern technology supports the recording of a wide variety of content types. Furthermore, a mobile phone can be used for recording the user's context as well. In addition to private reminiscing, the social uses of lifelogs have been identified as important [1], but little research has been carried out on them. In this paper, we present a system for browsing personal media, enhanced by using context information recorded by a mobile phone for organizing the content items. The system is designed for browsing the user's own content as well as content that the user's friends have shared with him or her. In addition, we describe a user trial that studied what kind of value the gathered content and one's recorded context, visualized in relation to friends, time, and location, would provide for users.
2 Related Work

2.1 Previous Lifelogging Systems

The ability to store all the relevant documents, photos, and messages during one's life has enticed people since the days of Vannevar Bush [2]. His iconic article described a vision of the Memex, a desk-sized tool for storing documents on microfilm and searching
them effectively. Since then, several "memex"-like approaches have also been presented for desktop computers; MyLifeBits, for example, stores all the viewed photos, read documents, and listened-to audio [3]. Typically, lifelogging systems focus on browsing the user's private content and do not include features for sharing it. Furthermore, early lifelogging systems tend to make only limited use of information about one's social interactions. Current mobile phones are exceptionally well suited for recording the owner's life events. In addition to being almost always carried, a mobile phone contains a considerable amount of information about the owner's communications and social connections. Beyond content and activities, a mobile phone is also well suited to recording the user's context. Context can be defined as any information that characterizes the situation of, for example, a person [4]. As an example, a system called Affective Diary records the user's movement, level of arousal, and the names of Bluetooth devices around the user [5]. It combines this context information with content, like photos and SMS messages, to create visualizations of the user's activity [5].

2.2 Browsing the Lifelogs

When digital memories are stored automatically, the vast amount of content quickly becomes a challenge. The items valuable for reminiscing easily become obscured by irrelevant items [6]. In looking for ways to cope with this, people and location have been identified as important memory cues in searching photos [7]. Furthermore, people tend to group their photos by events, and later they rely on this event grouping when browsing the photos [8]. The strong tie between content and the event in which it was captured suggests that browsing should be supported simply by automatically grouping the content by events [9]. A timeline view for presenting stored content about one's life is a natural and well-established approach used in many research systems, such as MyLifeBits [3], LifeLines [10], Lifestreams [11], and Stuff I've Seen [12]. While the basic idea of the implementations is the same, different visualization methods and additional information have been used to increase the value of the view. LifeLines uses the thickness and color of the line to indicate the significance of the events presented on the lines [10]. Stuff I've Seen displays search results with a preview of each item in chronological order, indexing the results with landmark events from, for example, the user's personal calendar [12], [13].

2.3 Value of Lifelogs

Photos are a common trigger for reminiscing, and lately one of the most popular research approaches to lifelogging has been the continuous visual capturing of one's life with a wearable camera [14], [15], [16]. The findings of Kalnikaité et al. [16] suggest, however, that different data types and views of the data support different types of remembering, from detailed recall to inferring past events and habitual patterns. They suggest that image data should be the cornerstone of lifelogs, but that richer recordings of the past should be collected. Lifelogs can support general recall of emotionally valuable events, but they have been shown to help in self-reflection as well [17].
Digital memories are often shared with the people who were present at the time of capture [17]. Hence, the information should be shown and provided as a preset when selecting recipients for sharing. Recently, popular photo browsing applications like iPhoto and Picasa have incorporated face recognition to enhance the photos with information about who appear in them. Although the described type of content analysis works for photos, it requires some manual work from the user, and with a variety of media types, different methods are needed for determining who was there.
3 Experience Explorer

We built a system called Experience Explorer (ExEx) for recording and browsing digital memories and visualizing one's life history in relation to friends. The system is designed for browsing the user's own content and content shared with him or her by friends. The information about the user's context and interactions with friends is used to support event-based finding of content, to invite reflection on one's past actions, and to provide an overview of one's social life. The possibilities for following other users' lives too closely are, however, minimized. A discussion of the general privacy issues of the logging system has been published separately [18].

3.1 System Components

The system consists of three main components: (1) a persistent context-logging client on the mobile phone, (2) a central server hosting the data collected with the logging client, and (3) a desktop web application for browsing the collected data. For logging the context data, we use the Nokia Simple Context Collector. Its client runs as a background task on the user's S60 phone and periodically samples context data such as GPS location, network cell ID, and the surrounding Bluetooth (BT) and Wireless Local Area Network (WLAN) device environment. In addition, it collects music listening data, call log information, and sent and received SMSs. All the data is uploaded to a network server, from where our central server fetches it and stores the resulting context information in a database. In addition to the context data, our server scans selected third-party services, such as Flickr, for personal digital media uploaded by the users. The metadata about each content item is harvested and stored in the same database as the context data. The original content remains on the third-party servers, but details such as the title, description, and thumbnails for visual content are stored in our database for indexing. A more detailed description of the platform has been published separately [19]. Having the joint dataset of context and content metadata, we apply a number of algorithms to determine relations between the users and the content items.

3.2 User Interface

The UI of ExEx is implemented in Adobe Flex and has three main views: the lifethreads, map, and item info views. The lifethreads and map views are used for browsing content previews and for selecting items to be viewed individually in the item info view or to be shared within the system. The content item types supported by
the system are photos, videos, listened-to music tracks, text messages, phone calls, and location tracks (i.e., sequences of stored GPS coordinates).

Lifethreads View. The lifethreads view presents items in chronological order on the user's own and his or her friends' lifethreads. The user's own lifethread is drawn vertically on the left, with his or her items laid on the line (see Fig. 1). The more items created with similar timestamps, the wider the user's lifethread is at that point. Friends' lifethreads (in the middle and on the right in Fig. 1) are displayed next to the user's lifethread, with their items in the same vertical chronological order. The horizontal position of a friend's lifethread at each point in time depends on the friend's distance from the user at that time, as sketched after Fig. 1. The user has several options for the distance criterion used in the visualization: physical distance visualizes how far the user's friends have been from the user geographically, communication activity shows how much the user has communicated with each of the friends, and music taste shows how similar the user's music taste has been to the friends' during the viewed time. When the physical distance is used as the closeness criterion, the approach aims to visually group together all the items created at the same event, providing a more complete picture of a shared experience than a single user's items would.
Fig. 1. Lifethreads view
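How a closeness value maps to the horizontal pixel position of a lifethread is not specified in the paper; the following sketch shows one plausible mapping for the physical-distance criterion, with the normalization entirely assumed:

```python
def thread_x(distances_km, view_width_px=800, max_km=100.0):
    """Illustrative horizontal placement of friends' lifethreads: the
    physically closest friend is drawn nearest the user's own thread
    (x = 0), capped at the edge of the view."""
    positions = {}
    for friend, d in distances_km.items():
        frac = min(d, max_km) / max_km          # 0 = together, 1 = far away
        positions[friend] = int(frac * view_width_px)
    return positions

# Example for one point in time with the "physical distance" criterion.
print(thread_x({"Anna": 0.4, "Ben": 12.0, "Cai": 250.0}))
```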
In the case of photos and videos, the items are presented as thumbnails, and the rest of the media types as icons. By holding the mouse cursor over an item, a tooltip is displayed. Items can be selected for closer viewing in the item info view by double-clicking them.

Map View. The map view lays the items out on top of map graphics at the locations where they were created. The location is inferred from recorded context data, matching it to the creation time of the item. As the user's context is continuously recorded, the location can often be deduced from the GPS data recorded at or around the
creation time. When the GPS signal is blocked, the system falls back on WLAN access points and mobile network cell IDs. For both of these, it determines the location of each base station from the GPS data recorded, by any user, while within the range of that base station. Items are displayed as thumbnails or icons, as in the lifethreads view. To avoid filling the map with hundreds of overlapping paths, GPS tracks are shown as icons instead of drawing the full paths on the map.

Item Info View. The item info view displays a larger version of the item in question and the metadata related to it. The location of the item is shown on a small map below the item. A list of the friends who were present at the time of creation is also displayed. Next to the friends, thumbnails and icons of related items from oneself and others are displayed. The content shown as related is selected using multiple criteria: geographical proximity, similarity of creation time, similarity of the surrounding Bluetooth context, and the number of shared tags or keywords. Each item is scored with the sum of these individual criteria, and the results are presented with the highest-scoring item first (see the sketch below).

Privacy Considerations. The intention was to provide as rich information about one's social connections as possible within the acceptable limits of privacy. A user's exact location at a certain time is visible only to those friends with whom the user has explicitly shared a GPS track or an item with location information. The location of a user is, however, used in the lifethreads visualization showing the user's distance to friends, and for listing the friends present at an item's creation.
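The paper names the four relatedness criteria and states that their scores are summed, but gives no formulas; those below, including the distance and time decay constants and the Jaccard overlap, are assumptions for illustration:

```python
from datetime import datetime, timedelta

def relatedness(item, candidate):
    """Sum of illustrative per-criterion scores; the paper names the
    criteria but not the formulas, which are assumed here."""
    score = 0.0
    # Geographical proximity: nearby items score close to 1.
    d = ((item["lat"] - candidate["lat"]) ** 2
         + (item["lon"] - candidate["lon"]) ** 2) ** 0.5
    score += 1.0 / (1.0 + 100.0 * d)
    # Similarity of creation time: decays over roughly a day.
    dt = abs((item["time"] - candidate["time"]).total_seconds())
    score += 1.0 / (1.0 + dt / 86400.0)
    # Shared Bluetooth devices and shared tags (Jaccard overlap).
    for key in ("bt", "tags"):
        union = item[key] | candidate[key]
        score += len(item[key] & candidate[key]) / len(union) if union else 0.0
    return score

t0 = datetime(2011, 7, 9, 12, 0)
photo = {"lat": 61.50, "lon": 23.76, "time": t0,
         "bt": {"aa", "bb"}, "tags": {"party"}}
other = {"lat": 61.50, "lon": 23.77, "time": t0 + timedelta(hours=2),
         "bt": {"aa"}, "tags": {"party", "friends"}}
print(relatedness(photo, other))   # higher score = shown earlier in the list
```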
4 Evaluation

A user trial was conducted in Finland to evaluate the system. It focused on the reminiscing and self-reflection value that the context-related UI components provided to users, as well as on gaining insight into the optimal ways to present the information.

4.1 Method

The study consisted of two phases: 6 to 9 weeks of gathering content for browsing, and 8 to 12 days of using the full system, including the ExEx web application for browsing the content. To provide the users with enough content for reminiscing and reflecting on their lives, the content collection and logging started early. The moment the implementation of the web application reached sufficient quality to deploy to the users, the second phase started. During the study, the users had the Nokia Simple Context Collector running on their mobile phones. They also used ShoZu (www.shozu.com) for tagging photos and videos with GPS coordinates and uploading them to Flickr, which was used for storing the users' photos. During the first phase, participants could browse their photos and videos only in Flickr, but in the second phase all their stored content was automatically visible to them in ExEx as well. The users were lent Nokia N95 8GB
multimedia phones and were compensated for flat-rate data plans for the duration of the study. Participants paid the rest of their phone bill, i.e., calls and SMSs, normally. The first phase started with a group interview. After the participants had their data plans activated, we met them again and handed over the phones. The instructions were to use the phone as they would normally, but to upload the photos they took to Flickr. In addition, if they listened to music while on the move, they were asked to use the phone as their primary player. At the end of the phase, participants were met individually to introduce the desktop web application for browsing the content. They signed into ExEx on a laptop and saw their data for the first time. After describing how they interpreted the different elements of the UI, they were given small tasks. For example, they were asked to check with whom they had communicated most around the time the study started. The aim was to find usability problems and to ensure that all users familiarized themselves with the core functionalities of the system. During the tasks, the users were asked to think aloud as much as they could. In the second phase, the users used ExEx individually for 8 to 12 days. The usage of the different UI components was logged. The participants filled in a diary about their usage and experiences. After this, the users were met individually for the last time to interview them about their experience of using the web application. The interviews were recorded and transcribed for analysis. The observations from the usage and the diaries, as well as the interview quotes from the sessions, were grouped, and conclusions were drawn by reviewing the salient themes that appeared.

4.2 Participants

The participants were recruited via mailing lists. From the groups who signed up, we selected three groups of friends to participate in the study. Two groups had five participants and one had three, so the total was 13 users. Eleven of the users were male and two were female (the skew was mostly due to a last-minute cancellation by one of the groups). Their ages varied from 19 to 32 years (average 23). Ten of the users were students and three were in working life (a factory worker, an IT consultant, and a surveying engineer). The participants were all active media consumers and familiar with social media sites. Two of them were active Flickr users already before the study, and three had Flickr accounts but had only tried the service a couple of times.

4.3 Results

By the end of the trial, the logging had been running for 7.5 to 11 weeks, depending on the user (including both phases of the trial). During that time, on average 1,997 items were logged for each user. The differences between the users were quite large: 3 of the users had fewer than 1,000 items, and one had more than 6,000 items to browse. The differences were mainly caused by the participants' varying music listening habits. Some users listened to only a couple of albums during the trial, whereas some listened to music many hours a day. Each user had on average 427 calls, 320 SMSs, 70 photos, 1.3 videos, 1,087 music tracks, and 91 location tracks of their own to browse. In addition, they had some items their friends had made visible to them. The shared items were mainly photos, although some sharing experiments with the other media types were done as well.
According to the diary data, the participants used ExEx 2 to 5 times during the individual usage period. The sessions lasted from 5 to 20 minutes, and each user's total usage time was between 21 and 70 minutes. In addition, the users used the system in both interviews for about 45 minutes each. The actual usage was not as active as we had hoped for, but the users were well familiar with all the features of the system, and everyone had enough experience to reflect on their own needs for such systems. The perceived benefits of the logging included not only utilitarian uses of one's personal log, like reducing memory load, and personal-reflection issues like seeing where one has been and how one has been using the phone; the information related to one's friends was considered equally important. The following sections discuss in more detail the interplay of reminiscing, personal reflection, and information about friends. In addition, the roles of individual items and aggregated visualizations are considered. The changes that the logging caused in the participants' everyday behavior are discussed separately in another paper [18].

Reminiscing. Lifelogging tools naturally lend themselves to reminiscing about past events. The most evocative item types in our study were reported to be the photos and SMSs. Photos were mentioned by two thirds of the users and SMSs by one third. The special value of the photos was highlighted by the usage logs as well. For example, 42% of the clicks on related items in the item info view were on images, twice as many as on the second most popular type, messages (20%). The chosen presentation style naturally had an effect on the value of the individual items. Seeing only the time and the name of the caller of a phone call did not trigger many memories, whereas some information about the contents of the call might have. In everyday life, content items and phone interactions clustered on the map were more intriguing for the participants than the pure recorded location tracks. However, when a location track recorded a memorable event, for example a tour outside one's hometown and away from daily routines, the pure track also provided value for reminiscing. Actions taken outside one's daily routines and circles added to the value of content as well. For example, users who had traveled abroad were especially interested in seeing their actions from the trip on the map. "I was so looking forward to seeing the phone calls and text messages there (on the map of the service). The ones that I made when I was abroad." (female, 22 years)

Self-reflection. Another potential benefit of lifelogging tools is the reflection on one's habits that they might enable. A single music track may have been listened to many times on many occasions, so displaying the plain item in the UI does not pinpoint a certain event for reminiscing as precisely as photos and messages do. However, due to the number of music listening events logged in our study, they started to reveal participants' daily life patterns, which were valuable for reflection. The music and other routine actions like phone calls formed interesting and meaningful information when aggregated on the map. For example, the frequent paths formed by these icons provided participants with an outsider's view of their actions: "This is really intriguing. You can actually see my route from my home to the city center.
On my way to work, there are all these, music, some text messages, phone calls, a line is formed by these, what I’ve done. This is really nice.” (female, 22 years)
Friends' Activities and Social Context. ExEx was designed not only to collect and show the user's own actions but also to show the social context of one's life and information about one's friends' lives. The overall importance of seeing friends' content in the system came up frequently in the interviews, and it was also highlighted by the number of clicks users made on friends' photos: 72% of the times a user clicked a photo in the lifethreads or map views, it was a photo from someone other than the user. The lifethreads view with the physical-distance criterion and the related items in the item info view both gathered the user's and his or her friends' items from the same event together. These features were valued by the users: "If there are photos from some event you get all of them conveniently since friends' items are attached as well" (male, 25 years). Two participants mentioned in the interview that they had found something previously unknown that they had in common with their friends. For example, the lifethreads view with the music track items from friends helped them notice listened-to artists common to them and their friends. Again, aggregated visualization helped make the individual items meaningful in the web application.

Lifethreads vs. Map. The lifethreads and map views were seen to support each other well. Most of the users considered the lifethreads view the most important view of the service. The map view was used about every time the service was used, but still less than the lifethreads view. According to the logs, 63 percent of the time spent in one of the two modes was spent in the lifethreads view. It was described as an easy way to get a general view of recent activity. According to the diaries and interviews, users were checking their own activity (especially music listening), whether their friends had been close, and whether someone had shared new photos with them. "Well, I start always from checking a couple of last days, has anyone been near me or published some photos recently or something" (female, 21 years)

4.4 Discussion and Future Work

Although we were able to collect data about the value the logged data and the ExEx visualizations provided to users, many issues would benefit from further study. Sharing happened mostly with photos, so a long-term study would be needed to get a better view of, for example, issues in location sharing. Sharing of tracks happened especially while traveling abroad, suggesting self-presentational value, but more data should be collected to get a more complete picture of the social uses of lifelog data. The question of real-time sharing also came up, since one participant noticed he had been waiting at the railway station on the same evening as his friend. Information about the past supports reminiscing, but viewing and sharing live information would support social activities. The slow response time of the lifethreads view and the occasional unpredictable changes in the closeness visualization were criticized and had reduced the motivation of some of the participants to use the system more extensively. However, the view of one's social activity clearly provided value, and insights on implementing similar visualizations were gained.
5 Conclusions

In this paper, we discuss how automatically captured context data can be used in the UI for organizing personal content in a meaningful way, as well as for visualizing one's life history in relation to friends to support reminiscing and self-reflection. In our study, items with evocative contents, like photos and SMSs, were inherently valuable for reminiscing about singular events. Furthermore, the aggregation and location-based presentation of the content also increased the value of logged actions, like phone calls and music tracks. When aggregated on the map, they revealed life patterns supporting reflection. Since the system allowed sharing items with friends, the reminiscing about joint experiences could be based on items from several participants. The results show that lifelog information recordable with current mobile phones, visualized in appropriate ways, would not only support reminiscing but also help users realize new things about their own life patterns and themselves. Moreover, as social encounters and communication with others play an essential role in people's lives, the information about the user's friends was also highly appreciated in the study. Hence, taking privacy needs into account, lifelogging systems should support sharing and showing content from friends as well.

Acknowledgments. We thank Minna Wäljas, Joel Hakulinen, Tero Hakala, Joonas Itäranta, Ville-Veikko Mattila, Rich Hankins, and the C3 team from NRC, as well as Mahtava development, for their contribution to the study. We would also like to thank Thomas Olsson and David Murphy for their insightful review of the paper.
References

1. Peesapati, S.T., Schwanda, V., Schultz, J., Lepage, M., Jeong, S.-y., Cosley, D.: Pensieve: Supporting Everyday Reminiscence. In: 28th ACM SIGCHI Conference on Human Factors in Computing Systems, pp. 2027–2036. ACM Press, New York (2010)
2. Bush, V.: As We May Think. Atlantic Monthly 176(1), 101–108 (1945)
3. Gemmell, J., Bell, G., Lueder, R., Drucker, S., Wong, C.: MyLifeBits: Fulfilling the Memex Vision. In: 10th ACM International Conference on Multimedia, pp. 235–238. ACM Press, New York (2002)
4. Dey, A.: Understanding and Using Context. Pers. Ubiquit. Comput. 5(1), 4–7 (2001)
5. Ståhl, A., Höök, K., Svensson, M., Taylor, A.S., Combetto, M.: Experiencing the Affective Diary. Pers. Ubiquit. Comput. 13(5), 365–378 (2009)
6. Czerwinski, M., Gage, D.W., Gemmell, J., Marshall, C., Pérez-Quiñones, M., Skeels, M., Catarci, T.: Digital Memories in an Era of Ubiquitous Computing and Abundant Storage. Commun. ACM 49(1), 44–50 (2006)
7. Naaman, M., Harada, S., Wang, Q., Garcia-Molina, H., Paepcke, A.: Context Data in Geo-referenced Digital Photo Collections. In: 12th ACM International Conference on Multimedia, pp. 196–203. ACM Press, New York (2004)
8. Gargi, U., Deng, Y., Tretter, D.: Managing and Searching Personal Photo Collections. Technical report HPL-2002-67, HP Labs (2002), http://www.hpl.hp.com/techreports/2002/HPL-2002-67.pdf
9. Bentley, F., Metcalf, C., Harboe, G.: Personal vs. Commercial Content: The Similarities Between Consumer Use of Photos and Music. In: 24th ACM SIGCHI Conference on Human Factors in Computing Systems, pp. 667–676. ACM Press, New York (2006)
10. Plaisant, C., Milash, B., Rose, A., Widoff, S., Shneiderman, B.: LifeLines: Visualizing Personal Histories. In: 14th ACM SIGCHI Conference on Human Factors in Computing Systems, pp. 221–227. ACM Press, New York (1996)
11. Fertig, S., Freeman, E., Gelernter, D.: Finding and Reminding. SIGCHI Bull. 28(1), 66–69 (1996)
12. Dumais, S., Cutrell, E., Cadiz, J., Jancke, G., Sarin, R., Robbins, D.: Stuff I've Seen: A System for Personal Information Retrieval and Re-use. In: 26th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 72–79. ACM Press, New York (2003)
13. Ringel, M., Cutrell, E., Dumais, S., Horvitz, E.: Milestones in Time: The Value of Landmarks in Retrieving Information from Personal Stores. In: Ninth IFIP TC13 International Conference on Human-Computer Interaction, pp. 184–191. IOS Press, Amsterdam (2003)
14. Hodges, S., Williams, L., Berry, E., Izadi, S., Srinivasan, J., Butler, A., Smyth, G., Kapur, N., Wood, K.: SenseCam: A Retrospective Memory Aid. In: Dourish, P., Friday, A. (eds.) UbiComp 2006. LNCS, vol. 4206, pp. 177–193. Springer, Heidelberg (2006)
15. Mann, S., Fung, J., Aimone, C., Sehgal, A., Chen, D.: Designing EyeTap Digital Eyeglasses for Continuous Lifelong Capture and Sharing of Personal Experiences. In: CHI 2005 Extended Abstracts on Human Factors in Computing Systems, pp. 2002–2006. ACM Press, New York (2005)
16. Kalnikaité, V., Sellen, A., Whittaker, S., Kirk, D.: Now Let Me See Where I Was: Understanding How Lifelogs Mediate Memory. In: 28th ACM SIGCHI Conference on Human Factors in Computing Systems, pp. 2045–2054. ACM Press, New York (2010)
17. Olsson, T., Soronen, H., Väänänen-Vainio-Mattila, K.: User Needs and Design Guidelines for Mobile Services for Sharing Digital Life Memories. In: 11th International Conference on Human-Computer Interaction with Mobile Devices and Services, pp. 273–282. ACM Press, New York (2008)
18. Kärkkäinen, T., Vaittinen, T., Väänänen-Vainio-Mattila, K.: I Don't Mind Being Logged, but Want to Remain in Control: A Field Study of Mobile Activity and Context Logging. In: 28th ACM SIGCHI Conference on Human Factors in Computing Systems, pp. 163–172. ACM Press, New York (2010)
19. Belimpasakis, P., Roimela, K., You, Y.: Experience Explorer: A Life-Logging Platform Based on Mobile Context Collection. In: 3rd IEEE Conference on Next Generation Mobile Applications, Services and Technologies, pp. 77–82. IEEE Press, New York (2009)
Service Science Method to Create Pictograms Referring to Sign Languages

Naotsune Hosono (1), Hiromitsu Inoue (2), Hiroyuki Miki (3), Michio Suzuki (4), Yuji Nagashima (5), Yutaka Tomita (6), and Sakae Yamamoto (7)

(1) Oki Consulting Solutions Co., Ltd., 4-11-15 Shibaura, Minato-ku, Tokyo 108-8551, Japan
(2) Chiba Prefectural University of Health Sciences
(3) Oki Electric Ind. Co., Ltd.
(4) Association of Architectural of Japanese DEAF
(5) Kogakuin University
(6) Keio University
(7) Tokyo University of Science
[email protected]
Abstract. This paper discusses a method to create pictograms referring to several local sign languages, applying the concept of Service Science with Multivariate Analysis (MVA). Since pictograms are universal communication tools, human centred design (HCD) and context analysis with the Persona model are applied. The experiments consist of two steps. Through the proposed method, the relationship between selected words and local sign languages is first explained by sensory evaluation with the subjects. Under the HCD cycle, the pictogram designer then summarizes the expressions of several local sign languages with this method. The acquired user experience is included as a design guideline for the contexts of emergency and traveling situations. Considering the results of the second experimental phase, which validate the resulting designs, the proposed method serves as a guideline for creating pictograms that refer to several sign languages.

Keywords: Service Sciences, Human Centred Design, Pictogram, Universal Communication, Sensory Evaluation.
1 Introduction

This paper discusses a method to create pictograms or icons referring to several local sign languages, using the concepts of service science and Multivariate Analysis (MVA) [1]. Since pictograms and icons are universal communication tools, Human Centred Design (HCD) [2] and context analysis with the Persona model of Alan Cooper [3] are applied in this research. This research was started in order to investigate the context of universal communication through local sign languages.
HCD is based on the context of use, which is organized around four factors: the user, the product, the task, and the environment in use (Figure 1). The research scope covers not only linguistic studies of sign language but also HCD with context of use [4].
Fig. 1. Context of use of Guidance on usability (inputs: users (foreign people, impaired people), tasks (travelling, emergency), equipment (communication service, collaboration service), environment (in use at station, airport, ambulance, hospital); outputs: satisfaction, efficiency, effectiveness)
2 Research Purpose and Issues

The purpose of this research is to work out a method for creating meaningful pictograms or icons referring to several local sign languages [5]. Sign language (SL) is fundamentally a method of communication from one person to another. The main factors of sign language are hand shape, location, and movement. There is a dilemma in that SL is a language with motion, whereas pictograms and icons are still images. After considerable discussion among the researchers, hand shapes and locations are drawn as in an animation snapshot, and movements are indicated by arrows, referring to snapshots of the related local sign languages.
3 Research Procedures

Considering the research purpose and issues above, the following research procedures were prepared:

− Phase 1: Determine a concept
− Phase 2: Create a Persona model and scenario
− Phase 3: Extract keywords on emergency and travelling situations
− Phase 4: Conduct the first sensory evaluation with 7 local sign languages
− Phase 5: Design summarized pictograms
− Phase 6: Conduct the second sensory evaluation with 8 local sign languages
− Phase 7: Conclude a method
3.1 The Context Determination

Based on the concept described above, two context situations were initially chosen [6]. Alan Cooper proposed the Persona model related to HCD, in which several Personas representing situations are imagined in order to simulate and find out how they would behave under a certain context. This method is widely accepted by manufacturers for creating new product plans and has been applied to service science as well.

3.2 Persona Model and Scenario Creation

The first step is to create two Personas by applying the Persona model under HCD [3]. The first Persona is a deaf person who suffers a sudden illness while commuting in the morning and is carried to the hospital by ambulance. The second is an office worker who lives in Hong Kong and has to visit Tokyo on business and then for pleasure (Figure 2). Diary-like scenarios underlying the Personas were written based on discussions with three colleagues using the brainstorming method. These scenarios mainly pay attention to the dialogues between the Persona and the people around them [7]. The first scenario, of the deaf person in an emergency, consists of about 600 words (equivalent to 3,000 Japanese characters), and the second, of the traveling woman, about 1,700 words (equivalent to 8,500 characters).
Fig. 2. An example of a Persona model (the profile, goals, trip information, and mobile usage of "Yie Ling (依林)", a 27-year-old office worker from Hong Kong on a one-week business and pleasure trip to Japan)
3.3 Keywords Extraction on Emergency and Traveling Situations
This research focuses on dialogues with several participants, referring to observations from the viewpoints of the provider and the receiver under service science principles. The next phase is to extract words that are fundamentally essential to the dialogues of the scenarios. Thirty-seven words were selected and categorized through discussions with three colleagues. Looking at the dialogues in the scenarios under the selected contexts, the hardest process is initiating a dialogue with a stranger. In modern times people are concerned about security and are extremely cautious when approached by an unfamiliar person, so several interjections were included to assist the initiation of dialogues.

3.4 First Sensory Evaluation with 7 Local Sign Languages
The research initially focuses on creating pictograms or icons for making dialogues, since the fundamentals of sign language are hand shape, location, and movement. It refers to a collection of animation figures covering seven local sign languages; its author, a deaf architect, gave overwhelming support to the research by supplying the database and permitting reference to it. The seven local sign languages are American, British, Chinese, French, Korean, Japanese, and Spanish [8].
Fig. 3. An example of a voting sheet for “Expensive”
In the experiment, subjects are first shown an expression from the collection of animation figures covering the seven local sign languages. After the subjects are informed of the sign's meaning, they are requested to vote with 19 tokens for whichever of the seven different local sign language expressions (samples) best coincides with the informed image. They are asked to place all 19 tokens, with zero votes on some samples permitted. This sensory evaluation method easily yields relative comparisons among the seven local sign language expressions and is more applicable than the ordering method or the paired comparison method. An example of a voting sheet for “Expensive” is shown in Figure 3. Correspondence analysis, a multivariate analysis (MVA) technique, is then applied using the statistical software SPSS (Statistical Package for the Social Sciences) [9, 10]. The outcome is plotted on a plane such that similar local sign languages lie close together. A characteristic of correspondence analysis is that subjects holding general, standard ideas are positioned near the centre, whereas those with extreme or specialized ideas are positioned away from it. The crossing point of the first and second dimensions (eigenvalues) is the centroid, or average point. The subjects of the first experiment were 13 people in their twenties, comprising nine science-course students and four humanities-course students; some had experience of living overseas or of sign language interpreting. After voting with the tokens, all subjects were asked about their confidence level using the Semantic Differential (SD) method. Figure 4 is an example of an outcome chart, in which “Expensive” is plotted.
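The correspondence analysis itself is performed in SPSS; purely as an illustration of the underlying computation, the classical decomposition can be reproduced with NumPy. The vote matrix below is hypothetical (one row per subject, one column per local sign language, each row summing to the 19 tokens) and is not the study's actual data:

    import numpy as np

    # Hypothetical token-vote matrix: rows = subjects, columns = the seven
    # local sign language expressions; each row sums to 19 tokens.
    N = np.array([[10, 3, 2, 0, 1, 2, 1],
                  [ 4, 6, 3, 2, 2, 1, 1],
                  [ 2, 2, 8, 3, 2, 1, 1]])  # ...one row per subject

    P = N / N.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # row masses (subjects)
    c = P.sum(axis=0)                # column masses (sign languages)
    # Standardized residuals, then SVD: the core of correspondence analysis.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates on the first two dimensions, as in Fig. 4.
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    print(col_coords[:, :2])         # one point per sign language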
Fig. 4. A plot chart of “Expensive” with seven sign languages (axes: Dimension 1 × Dimension 2)
3.5 Summarized Pictograms Design
Through the first sensory evaluation of 37 words with seven local sign languages, many sign language expressions were identified as representing
the meaning well. Among the 37 words, the seven whose votes converged most, “when?”, “good-bye”, “painful”, “thank you”, “where?”, “toilet”, and “expensive”, were selected by means of brainstorming. Following the cycle process of HCD, the original designer was asked to summarize the outcomes of the sensory evaluation described above and to design an animation-like pictogram referring to several local sign languages. The newly designed pictogram was then added to the seven local sign languages (American, British, Chinese, French, Korean, Japanese, and Spanish).

3.6 Second Sensory Evaluation with 8 Local Sign Languages
The next procedure follows the same manner as the first experiment in Phase 4. After the subjects are informed of the sign's meaning, this time they are requested to vote with 23 tokens for whichever of the eight different local sign language expressions, including the newly designed one, best coincides with their image. The procedure was otherwise the same as in the first step, and correspondence analysis (MVA) with SPSS was performed once again. The outcome, including the newly designed pictogram, is plotted together with the other seven local sign languages in order to measure whether the newly created pictogram is representative of the cluster.
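Whether the newly designed pictogram is “representative of the cluster” can be checked numerically from the resulting plot: its point should lie near the centroid of the sign-language points. A minimal sketch, with entirely hypothetical two-dimensional coordinates:

    import numpy as np

    # Hypothetical first-two-dimension coordinates for the seven local sign
    # languages and the newly designed pictogram "new" (not the study's data).
    coords = {"American": (0.9, -0.2), "British": (1.1, 0.3),
              "Chinese": (-0.5, 0.1), "French": (-0.3, -0.4),
              "Korean": (-0.6, 0.2), "Japanese": (-0.7, -0.1),
              "Spanish": (1.0, 0.5), "new": (-0.2, 0.0)}

    pts = np.array([v for k, v in coords.items() if k != "new"])
    centroid = pts.mean(axis=0)      # gravity point of the seven languages
    dist = np.linalg.norm(np.array(coords["new"]) - centroid)
    print(f"distance of new pictogram from centroid: {dist:.3f}")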
Fig. 5. A plot chart of “Expensive” with eight sign languages (axes: Dimension 1 × Dimension 2)
The subjects of the second experiment were 20 engineering-department students in their twenties, including two female students. All but three were different subjects from the first experiment. After voting with the tokens, all subjects were again asked about their confidence level using the Semantic Differential (SD) method.
Figure 5 is an example of an outcome chart in which “Expensive” is plotted. The newly designed pictogram is plotted close to the Japanese, Korean, Chinese, and French sign languages, whereas the American, British, and Spanish ones are plotted further away. These plot deployments are similar in the seven- and eight-sign-language experiments.

3.7 Conclude the Method
Comparing the two outcomes, Phase 4 with seven local sign languages and Phase 6 with eight, the following is concluded:
• All seven newly designed animation pictograms are positioned in the centre of the cluster of the related local sign languages.
• Even though almost all of the subjects differed between the first and second experiments, the general outcome plots hold similar patterns in space.
• The oriental sign languages, namely Japanese, Korean, and Chinese, tend to be plotted closely together.
4 Conclusion and Discussion
This paper discusses a method for extracting a summarized expression from several local sign languages in order to draw pictograms, by applying sensory evaluation with MVA. The experiments consist of two steps. The first step is to find whether a pictogram is a majority, common expression of a word among the seven local sign languages. This step appears valid in practice, since the Japanese, Korean, and Chinese sign languages are similar for historical reasons and are in fact plotted close to each other. The second step is to prove that the characteristics of the pictogram represent the meaning of the word. Almost all of the newly designed pictograms are positioned in the centre of the cluster and are therefore representative of it. In the proposed method, the relationship between the selected words and the local sign languages is first explicated through sensory evaluation by the subjects. Under the HCD cycle, the pictogram designer then summarizes the expression of several local sign languages using this method. The acquired user experience can be included as a design guideline, for instance for the contexts of emergency and traveling situations. One issue is that the quality of a newly designed pictogram depends on the designer's ability to summarize several source expressions. The newly designed pictograms in this research are still biased toward sign languages; to become an easier communication tool, they require further improvement so that they are simple and easy for everybody to understand. Considering the results of the second experimental phase, which validated the resulting designs, the proposed method serves as a guideline for creating pictograms by referring to several sign languages.
Acknowledgements. This research is supported and funded by the SCOPE (Strategic Information and Communications R&D Promotion Programme) project organized by
the Ministry of Internal Affairs and Communications (MIC) of Japan. The collection of local sign language data was supplied, with permission for research use, by Mr. M. Akatsuka of the Architectural Association of Japanese DEAF (AAJD). We are also grateful to Dr. K. Nakazono of NTT for his comments on this research.
References
1. Hosono, N., Inoue, H., Tomita, Y.: Sensory analysis method applied to develop initial machine specification. Measurement 32, 7–13 (2002)
2. International Organization for Standardization: ISO 9241-210 (former ISO 13407:1999), Ergonomics – Human-centred design processes for interactive systems (2010)
3. Cooper, A.: About Face 3. Wiley (2007)
4. Miki, H., Hosono, N.: Universal Design with Information Technology (Japanese version). Maruzen (2005)
5. Horton, W.: The Icon Book. John Wiley & Sons, New York (1994)
6. International Organization for Standardization: ISO 9241-11, Ergonomic requirements for office work with visual display terminals (VDTs) – Guidance on usability (1998)
7. International Organization for Standardization: ISO 9241-110, Ergonomic requirements for office work with visual display terminals (VDTs) – Dialogue principles (2006)
8. Akatsuka, M.: Seven sign languages for tourists: Useful words and expressions. Chinese-Japanese-American Working Group (2005)
9. Field, A.: Discovering Statistics Using SPSS, 3rd edn. Sage Publications, Thousand Oaks (2009)
10. SPSS: Categories in Statistical Package for the Social Sciences ver. 18. SPSS (2009)
MoPaCo: Pseudo 3D Video Communication System
Ryo Ishii, Shiro Ozawa, Takafumi Mukouchi, and Norihiko Matsuura
NTT Cyber Space Laboratories, NTT Corporation, 1-1, Hikari-no-oka, Yokosuka-shi, Kanagawa 239-0847, Japan
{ishii.ryo,ozawa.shiro,mukouchi.takafumi,matsuura.norihiko}@lab.ntt.co.jp
Abstract. We propose a pseudo 3D video communication system that imparts motion parallax adjusted to the viewpoint position of the user, enabling the user to view video pictures in which depth can be perceived with an ordinary equipment setup, namely a monocular camera and a 2D display. We implemented the system, and evaluation experiments showed that imparting motion parallax allows it to represent distances that reflect actual face-to-face situations more closely than 2D video can. In addition, subjective evaluations confirmed that motion parallax gives users the feeling that the conversational partner is actually present and makes it easier for them to comprehend the positional relationship of the conversational partner in space.
Keywords: Video communication, motion parallax, depth perception, interpersonal distance.
1 Introduction
Our aim is to implement a video communication system that not only enables ordinary conversation but also provides a highly realistic feeling, enabling users to work together naturally while sharing a mutual space and observing each other, as if they were actually face-to-face. For this purpose, it is important to give the user the feeling that the conversational partner is in front of him or her, and to transmit naturally nonverbal information such as interpersonal distance, gaze, and pointing gestures, using a body that shares a mutual space with the conversational partner. An ordinary video communication system that uses a 2D display to show footage captured with a monocular camera lacks most of the depth information about the conversational partner and the surrounding space. In addition, the view of the conversational partner and background does not change to match the changing viewpoint when the user moves. For this reason, in addition to the problems of a feeling of spatial separation from the conversational partner [1, 2] and a lack of feeling of the partner's presence [3], there is the serious problem that nonverbal information associated with depth (such as interpersonal distance and pointing gestures) cannot be transmitted correctly. To transmit such information, it is necessary to impart depth information to the video footage, and many research projects are currently tackling these challenges. It is known that
motion parallax and binocular parallax are major cues that enable humans to sense depth at close distances [4]. Of these, binocular parallax can be used comparatively simply, so several methods have been proposed that use binocular parallax to represent depth by providing a stereoscopic view of the conversational partner [5]. To present correct depth information naturally in video communication, however, it is essential to present video with motion parallax corresponding to the observation position of the user. In this study, we propose a method that implements video communication with motion parallax using a simple setup. Since the method is intended for generally available systems, it was implemented with a configuration similar to that of an existing video communication system, namely a monocular camera and a 2D display. In this paper, we also report on evaluation experiments using the implemented system, in which we examined the perception of depth toward the conversational partner due to motion parallax, impressions of the conversation video, and the accuracy with which pointing gestures are transmitted, and we thus report on the effectiveness of motion parallax.
2 Related Work
A video expression method that enables users to perceive depth using motion parallax alone has been proposed. Suenaga [6] proposed a three-dimensional perspective display with motion parallax, which projects video pictures corresponding to the user's viewpoint position, tracked by stereo cameras, onto a 2D display. However, that system provides only CG model contents; that is, it cannot generate photographed video in real time. The objectives of this study are to propose a new method of implementing motion parallax using only a monocular camera in real time, and to verify the effect of motion parallax on the transmission of information about nonverbal behavior and on impressions of the resulting video.
3 Proposed MoPaCo System
3.1 Overview of MoPaCo System
In this study, we propose a real-time video communication system called MoPaCo (a contraction of “Motion Parallax Communication”), which implements motion parallax while using only a monocular camera. Fig. 1 shows images of MoPaCo's motion parallax video representations of a conversational partner, corresponding to different viewpoint positions of the user. For a user at some distance from the conversational partner in the video, the display can give the feeling that the two are linked as if through a window. We expect this motion parallax video representation to eliminate spatial separation, improve the feeling of presence
of the conversational partner, and enable the transmission of nonverbal information associated with depth by imparting depth information to video pictures. To present a motion parallax video of a conversational partner on a 2D display, corresponding to the viewpoint positions of different users, the following processes are necessary:
(1) Measurement of each user's viewpoint position
(2) Construction of a 3D space having information on the dimensions and positional relationships of the people and the background, based on information obtained from a camera or other means
(3) Rendering of the 3D space constructed in step (2) on a 2D display, corresponding to each user's viewpoint position obtained in step (1)
We propose techniques for performing steps (1) and (2) using only a monocular camera, and implemented the MoPaCo system on that basis. In this section, we describe the details of each process.
Fig. 1. Concept images of video representations caused by motion parallax (scenes from the front and from the right side, showing the user, the 2D display, the conversational partner, and the background)
3.2 Measurement of User's Viewpoint
We propose using only a monocular camera to detect the user's viewpoint. Before calculating the 3D position from the facial parts information (coordinate positions) in the 2D image, the system measures the eye separation distance of each user as preprocessing. It acquires the distance from the user to the camera using the depth-from-focus function that ordinary cameras use for achieving focus. During this process, template matching is performed on the image captured from the camera to measure the positions of both eyes (2D coordinates within the image) and the orientation of the head. The system then calculates the user's eye separation distance from the user-to-camera distance, the information measured from the image, and the angle of view and resolution of the camera.
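The paper does not give the underlying formulas; under a simple pinhole-camera assumption, the viewpoint can be recovered from the detected eye positions roughly as follows. The function and the numbers are illustrative assumptions, not the system's actual implementation:

    import math

    def viewpoint_from_eyes(eye_l, eye_r, eye_separation_cm,
                            img_w, img_h, fov_w_deg):
        # Pinhole model: focal length in pixels from the horizontal angle of view.
        f_px = (img_w / 2) / math.tan(math.radians(fov_w_deg) / 2)
        d_px = math.dist(eye_l, eye_r)        # eye separation in pixels
        z = f_px * eye_separation_cm / d_px   # distance from camera (cm)
        # Viewpoint = midpoint between the eyes, back-projected to depth z.
        u = (eye_l[0] + eye_r[0]) / 2 - img_w / 2
        v = (eye_l[1] + eye_r[1]) / 2 - img_h / 2
        return (u * z / f_px, v * z / f_px, z)

    # e.g. eyes detected 80 px apart in a 1280x720 image, a 60-degree
    # horizontal angle of view, and a 6.3 cm eye separation measured
    # once in preprocessing
    print(viewpoint_from_eyes((600, 360), (680, 360), 6.3, 1280, 720, 60))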
Based on this information (the eye positions, head orientation, and eye separation), real-time capture is started; the system obtains the positions of both eyes (2D coordinates within the image) and the orientation of the head from each captured image, and from these calculates the viewpoint distance z of the user from the camera at that time. Note that the x- and y-coordinates are calculated from the 2D coordinates within the image and the pixel pitch of the image.

3.3 Construction of a 3D Space
In this study we propose a method of constructing 3D information from an image captured by a single camera. This is done by performing background difference processing using background information acquired beforehand, keeping the 2D plane and dividing it into a person area and a background area, and creating a multi-layer structure with those areas arranged as layers according to their depth-wise positions. The use of 2D images ensures that a high-resolution display is possible. In addition, if only the background difference is subjected to image processing, the processing costs are low and real-time processing is therefore possible. Using the person and background images, the system then generates multiple full-size layers with a distance relationship to the camera (in the rest of this paper, the layers using the person image and the background image are called the person layer and the background layer). The distance information for the background layer is measured beforehand, by the depth-from-focus method provided in the camera's autofocus function, when the background difference image is acquired. For the person layer, the user's viewpoint position is used. These distances become the distance information from the camera for the person layer and the background layer, respectively. Based on this distance information, the system uses Equation (1) to calculate the full size (width w_i × height h_i) of each layer i from the acquired distance d_i and the camera's angle of view (width θ_w, height θ_h). This procedure configures a 3D space having full-size and position information.

    w_i = 2 · d_i · tan(θ_w / 2),   h_i = 2 · d_i · tan(θ_h / 2)        (1)
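Equation (1) transcribes directly into code; the distances and angles below are hypothetical values chosen only to illustrate the computation:

    import math

    def layer_size(d_i, fov_w_deg, fov_h_deg):
        # Equation (1): full size of layer i at distance d_i from the camera.
        w_i = 2 * d_i * math.tan(math.radians(fov_w_deg) / 2)
        h_i = 2 * d_i * math.tan(math.radians(fov_h_deg) / 2)
        return w_i, h_i

    # e.g. person layer at 150 cm and background layer at 340 cm, for a
    # camera with an assumed 60 x 40 degree angle of view
    print(layer_size(150, 60, 40))   # person layer (width, height) in cm
    print(layer_size(340, 60, 40))   # background layer (width, height) in cm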
3.4 Rendering of 3D Space Based on User's Viewpoint
As shown in Fig. 2, the person layer and background layer generated by the 3D spatial information module are projected in perspective to match the viewpoint position of the user, using the 2D display as the projection surface. Motion parallax video is thus implemented.
Fig. 2. The person layer and background layer are projected (diagram labels: viewpoint, 2D display, layers of image, conversational partner)
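Geometrically, the projection in Fig. 2 is a ray-plane intersection: each point on a layer is projected toward the user's viewpoint until the ray crosses the display plane. The sketch below is an assumed simplification of the renderer, with the display plane at z = 0, the user in front of it (negative z), and the layers behind it:

    def project_to_display(viewpoint, point, layer_depth):
        # Intersect the ray from the viewpoint through a layer point
        # (layer_depth cm behind the display plane z = 0) with that plane.
        vx, vy, vz = viewpoint            # vz < 0: user is in front of display
        px, py = point
        t = -vz / (layer_depth - vz)      # parameter where the ray crosses z = 0
        return (vx + t * (px - vx), vy + t * (py - vy))

    # As the viewpoint moves (here 10 cm to the right, 60 cm from the screen),
    # the projected position of a background point shifts: motion parallax.
    print(project_to_display((0, 0, -60), (0, 0), 200))   # viewer centred
    print(project_to_display((10, 0, -60), (0, 0), 200))  # viewer moved right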
3.5 Results of Implementation
Using the proposed method described above, we implemented the MoPaCo system, which enables bidirectional viewing of motion parallax video in real time. The development environment was a computer with an Intel Core i7 Extreme 980X CPU, 12 GB of memory, and an NVIDIA GeForce GTX 480 graphics board. The results of this implementation are shown in Table 1. In this table, “Reflection of conversational partner's image” means the delay until the conversational partner's video appears on the user's display from the captured video, and “Reflection of motion parallax” means the delay until motion parallax appears in the video after a movement of the user's viewpoint position. The table shows the frame rate and response (time lag) for each part. The frame rate is sufficiently high; in contrast, the response is somewhat slow.

Table 1. Implementation results

  Captured image size                              1280×780 (HD)   1920×1080 (Full HD)
  Reflection of conversational   Frame rate        30 fps          18 fps
  partner's image                Response          260 ms          330 ms
  Reflection of motion           Frame rate        30 fps
  parallax                       Response          300 ms
Fig. 3 shows a scene as observed under the 2D video, motion parallax video with MoPaCo (MP), and actual face-to-face (REAL) conditions. In comparison with the face-to-face condition, there was no parallax in the video under the 2D condition even when the user's head moved, so the human dimensions and positional relationships did not match. Under the MoPaCo condition, in contrast, the dimensions and positional relationship between the person and the background were reproduced in the video.
Fig. 3. Scenes for three observational conditions: actual face-to-face (REAL, through a glass window), 2D video, and motion parallax video with MoPaCo (on a 2D display)
4 Experiments
4.1 Experiment Procedure
Using the MoPaCo system, we conducted experiments with test subjects to verify the transmission, by motion parallax video representation, of interpersonal distance (which is particularly important among nonverbal information), as well as the effect of motion parallax on increasing the feeling of the partner's presence, positional awareness of the partner's space, spatial comprehension, and so on. In the experiments, the position of the conversational partner was varied under face-to-face, conventional 2D video, and motion parallax video conditions. After observing each situation, the subjects gave their evaluations. There were a total of 18 experimental conditions, consisting of combinations of three observation methods and six distances of the conversational partner, as shown in Table 2.

Table 2. Experimental conditions

Observation conditions (3 conditions)
• REAL condition: observing the conversational partner through a glass window
• 2D condition: observing the conversational partner in an image displayed on a 2D display (the image is rendered for the user's viewpoint position when sitting straight in the chair)
• Motion parallax with MoPaCo (MP) condition: observing the conversational partner in an image having motion parallax, shown on a 2D display

Distance to the observer (6 conditions)
• The mannequin's position is set to 0, 20, 40, 80, 120, and 200 cm from the partition (display plane), in other words, 150, 170, 190, 230, 270, and 350 cm from the subject.
To measure the perceived distance to the conversational partner in verifying the effect of interpersonal distance representation, we adopted a subjective evaluation in which subjects indicated the distance from their chest to the chest of the conversational partner by the length of an unmarked piece of string. Asking the subjects to answer
with a string length is designed to eliminate personal differences in subjective criteria, in comparison with methods in which answers are numerical values. If the string length given as the answer under the REAL condition is the same as that under the MP condition, we can say that motion parallax video makes it possible to perceive depth to the conversational partner in a manner similar to an actual face-to-face situation. We obtained the subjects' impressions of motion parallax by asking them to evaluate the seven items listed in Table 3, such as the feeling that the conversational partner is present, using a six-step Likert scale (1–6 points, six being best) on a questionnaire sheet for the 2D and MP conditions.
Table 3. Subjective evaluation items
• Ease of viewing video: Was the video of the conversational partner easy to see?
• Stereoscopic effect: Did you feel that the conversational partner's space was in 3D?
• Intuition of distance comprehension: Could you intuitively grasp the feeling of depth between you and the conversational partner?
• Presence of conversational partner: Did you feel that the conversational partner was present?
• Face-to-face feeling: Did you feel that you had actually met the conversational partner?
• Ease of spatial comprehension: Was it easy to understand the positional relationship between the conversational partner and the background?
• Feeling through window: Did you feel that you had met the conversational partner through a window via the display?
Fig. 4. Experimental equipment (plan view: subject's space and partner's space separated by a partition holding a glass window or 2D display; the subject sits 150 cm from the partition, the mannequin stands 0–200 cm behind it, and the background is at 340 cm)
The experimental setup was such that the subject was seated on a chair 150 cm from a partition in which a glass window was installed, as shown in Fig. 4, and was able to observe the space of the conversational partner through the glass window and display. Under the 2D and MP conditions, the display was installed directly behind the glass window. The glass window (46 cm high × 80 cm wide) was smaller than the display, so the edges of the display were not visible to the subjects. Note that we used a humanoid mannequin
in place of an actual person, to ensure that the visual stimulus of the conversational partner was uniform. Before the experiment, each subject observed the actual mannequin and verified its size. Taking order effects into consideration, the sequence of trials was randomized. Subjects were asked to sit in the chair and raise their head upon hearing the start signal, then observe the mannequin while remaining seated but moving their head freely. No time limit was set, but all subjects finished their observations within ten seconds and then made their evaluations. They made their perceived-distance evaluations after each trial of each experimental condition; in contrast, they made their video impression evaluations after going through all the experimental conditions.

4.2 Experiment Results for Perceived Distance
Experiments were conducted with ten subjects (nine males, one female, age range 20–60). We first show the results obtained for perceived distance in Fig. 5. The figure shows the average string lengths measured over all subjects: the actual distance from the subject to the conversational partner is plotted along the horizontal axis, and the average string length (perceived distance) along the vertical axis. The dotted line shows where the actual position of the conversational partner matches the string distance. The perceived distances given as string lengths were clearly much smaller than the actual distances (the dotted line). Significant differences were seen overall, with the string lengths becoming shorter in the order REAL → MP → 2D. In other words, it was confirmed that the MP condition is better able than the 2D condition to represent interpersonal distance in a manner close to that of the REAL condition. However, because a difference in depth perception exists between the MP and REAL conditions, we think it necessary to clarify the relationship between depth perception and motion parallax video, and to determine how distance under the MP condition can be represented as accurately as under the REAL condition.
Fig. 5. Experiment results (horizontal axis: actual distance from subject to conversational partner (cm); vertical axis: average string length, i.e. perceived distance (cm))
4.3 Experiment Results for Video Impression
Fig. 6 shows the average subjective evaluation scores for the questions that the 10 subjects answered on the questionnaire. We performed a paired t-test for each evaluation item; these results are also shown in the graphs. “Ease of viewing video” caused us some concern: since head movements under the MP condition result in large changes in the video, visually induced motion sickness or other discomfort might go unnoticed. However, no such problem could be confirmed. We consider that MoPaCo's feature of imparting motion parallax to video pictures in accordance with the viewpoint position gives humans a natural visual effect. This is supported by results showing that MP was evaluated significantly higher than 2D in paired t-tests for the six evaluation factors other than “Ease of viewing video” (Stereoscopic effect: t(9) = 2.68, .10 …
Fig. 6. Impression evaluation results for motion parallax video
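For reference, the paired t-test used above (df = 9 for ten subjects) corresponds to the following computation. This is a minimal sketch: the Likert scores are invented placeholders, not the study's data, and SciPy is assumed to be available:

    from scipy import stats

    # Hypothetical per-subject Likert scores (1-6) for one evaluation item,
    # e.g. "Stereoscopic effect", under the 2D and MP conditions (n = 10).
    scores_2d = [2, 3, 2, 4, 3, 2, 3, 3, 2, 4]
    scores_mp = [4, 5, 3, 5, 4, 4, 5, 4, 3, 5]

    # Paired t-test across the same ten subjects; df = n - 1 = 9.
    t, p = stats.ttest_rel(scores_mp, scores_2d)
    print(f"t(9) = {t:.2f}, p = {p:.3f}")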
5 Conclusion
We have proposed a pseudo 3D video communication system that imparts motion parallax adjusted to the viewpoint position of the user, enabling the user to view video pictures in which depth can be perceived with an ordinary equipment setup, namely a monocular camera and a 2D display. The system was implemented, and evaluation experiments showed that imparting motion parallax enables it to represent distances that reflect actual face-to-face situations more closely than 2D video can. In addition, subjective evaluations confirmed that motion parallax gives the user the feeling that the conversational partner is actually present and makes it easier to comprehend the positional relationship of the conversational partner in space. In the future, we plan to conduct real-time conversations to further confirm the effectiveness of using motion parallax in video communication.
References
1. Williams, E.: Coalition formation over telecommunications media. European Journal of Social Psychology 5, 503–507 (1975)
2. Strickland, L.H., Guild, P.D., Barefoot, J., Paterson, S.A.: Teleconferencing and leadership emergence. Human Relations 31, 583–596 (1978)
3. Heath, C., et al.: Disembodied conduct: communication through video in a multimedia environment. In: Proc. CHI 1991, pp. 99–103. ACM Press, New York (1991)
4. Cutting, J., Vishton, P.: Perceiving Layout and Knowing Distances. In: Epstein, W., Rogers, S. (eds.) Perception of Space and Motion, pp. 69–117. Academic Press, New York (1995)
5. Towles, H., Chen, W.C., Yang, R., Kum, S.U., Fuchs, H., Kelshikar, N., Mulligan, J., Daniilidis, K., Holden, L., Zeleznik, B., Sadagic, A., Lanier, J.: 3D Tele-Immersion Over Internet 2. In: ITP 2002 (2002)
6. Tsuyoshi, S., Yoshio, M., Tsukasa, O.: 3D Display Based on Motion Parallax Using Non-contact 3D Measurement of Head Position. In: Proceedings of OZCHI 2005 (2005)
Analysis on Relationship between Smiley and Emotional Word Included in Chat Text
Junko Itou1, Tomoyasu Ogaki2, and Jun Munemori1
1 Faculty of Systems Engineering, Wakayama University, 930, Sakaedani, Wakayama 640-8510, Japan
{itou,munemori}@sys.wakayama-u.ac.jp
2 Graduate School of Systems Engineering, Wakayama University, 930, Sakaedani, Wakayama 640-8510, Japan
[email protected]
Abstract. In this research, we analyze the relationships between smileys and emotional words in chat text, aiming to apply these relationships to an embodied character chat system. Smileys add various kinds of meaning, especially mental information, to plain chat text and help make text communication successful. We focus on the way smileys and emotional words are used, so that chat atmospheres can be estimated. We performed an experiment to investigate the relationships between smileys and emotional words in chat dialogue.
1 Introduction
Online communication systems such as e-mail, chat, remote meetings, and distance learning have become common tools. We can now communicate using graphical chat systems in which agents talk in a virtual 3D space in place of the users. In a chat system that employs embodied characters, users obtain messages by watching the characters' actions as well as by reading plain text. A character plays the role of an agent of its user, not only expressing the user's emotional states or intentions, which cannot be conveyed by a chat message alone, but also making the chat lively. As a result, the meaning the user intends for the chat messages becomes clearer. To control the actions of embodied characters, however, users need to input text or click a button consciously at each relevant moment, which is a troublesome task for a chat user. A character chat system is therefore required in which nonverbal expressions are automatically controlled according to the dialogue atmosphere and the input chat messages. Previous work on controlling nonverbal expressions of embodied characters mainly discusses the consistency of nonverbal expressions with the speech utterances or the conversational goal of each agent [1][2]. However, when we consider dialogues between a pair of embodied characters, we need to consider the interdependences between the nonverbal expressions displayed by those two characters.
Emotional words and smileys add various nonverbal information to text messages and express the user's emotional state, just as embodied characters do. In the remainder of this article, we therefore focus on the relationships between emotional words and smileys, aiming to apply these relationships to an embodied character chat system.
2 Related Research on Chat Systems with Embodied Characters and Nonverbal Expressions
In most current graphical communication systems, the user must choose and specify the character's action explicitly, or the system focuses on the consistency of nonverbal expressions with the speech utterances. Cassell [1] proposed an agent system that interpreted users' inputs and automatically generated responses with the nonverbal expressions needed to fulfill the desired function. That research focused on the relation between the words spoken by users and the agent's responses in words or actions. However, it has been reported in social psychology that there are interdependences between the nonverbal expressions displayed by each person in daily conversations [3]-[6]. Social psychology has investigated what features are found in the nonverbal expressions people give during conversation. In the experiments by Matarazzo, nodding by listeners in a conversation encouraged the speakers' utterances, resulting in animated conversation between speakers and listeners [5]. In other experiments, by Dimberg, the facial expressions of test subjects were affected by those of their partners [6]: the subjects smiled when their partners smiled at them, whereas they gave expressions of tension when their partners had angry faces. These results imply that there are positive correlations, or synchronicities, between the nonverbal expressions of conversation partners. Table 1 marks these positive correlations with an asterisk (*).
Table 1. Interdependences between nonverbal expressions of different persons
By employing this knowledge from social psychology, we aim to realize automatic actions of embodied characters. People substitute smileys for nonverbal expressions in text communication, which means we can estimate the mental state of a user from the usage of smileys and emotional words.
3 Nonverbal Expressions Shown by Smileys and Emotional Words
3.1 Goal
In most current graphical chat systems, users must choose and specify the characters' actions explicitly. By employing the knowledge described in the previous section, we aim to realize automatic actions and reactions of embodied characters in order to make users' chats lively. It is preferable for the characters' actions and reactions to be produced from the users' messages, because users attend only to their chat and it is very troublesome to set all actions and reactions manually for each message. We will apply the interdependences between nonverbal expressions described in Section 2 to the actions of embodied agents in a chat system.
3.2 Emotional Words and Frequently Used Smileys
People interpret smileys in various ways, so they use and receive them differently. To investigate the relationship between smileys and emotional words, we collected smileys generally used in text communication and the emotional words those smileys evoke. First, we collected a total of 151 kinds of smileys from the smiley list registered in IME 2007; Fig. 1 shows some of them. We asked 7 college students who ordinarily used smileys in text communication whether they knew the meanings of the smileys and used them. There were 9 kinds of smileys that more than 5 subjects replied they often used, and 48 kinds they had not used at all. The 9 kinds of smileys are shown in Table 2.
Fig. 1. Example of smileys
Table 2. Frequently-selected smileys
Next, we asked the subjects to write the words they associate with the 9 smileys when converting text into these smileys, and to categorize the 9 smileys into the 35 emotional word categories [7] in Table 3 and into 8 basic emotions [8]. The 8 basic emotions are ‘‘delight’’, ‘‘grief’’, ‘‘affection’’, ‘‘aversive’’, ‘‘fear’’, ‘‘rage’’, ‘‘amazement’’, and ‘‘caution’’.
Table 3. Emotional word category
4 Experimental Results
4.1 Categorized Results
The subjects tended to choose “delight” and “affection” among the 8 basic emotions for smileys (1)–(5). They also associated smileys (1)–(3) with friendly emotional words such as pleasure, joy, and satisfaction. For smileys (4) and (5), joy, pleasure, and satisfaction were given as for smileys (1)–(3), but consideration and warmth were chosen as well; it became clear that these emotional categories are indispensable in chat. Smileys (4) and (5) showed approximately the same result, but the expression of emotion was revealed to be stronger in (5). If two smileys express the same mental state, the strength of the emotion should be reflected in the embodied character of a chat system according to the situation. Many subjects categorized smiley (6) into “grief” among the 8 basic emotions, but some chose uneasiness, fear, pity, pain, or unpleasantness. This smiley appears to depend greatly on context; for smiley (6), information beyond the smiley itself is necessary to estimate feelings. Smiley (7) shows a person shedding tears. Most subjects held the same image of smiley (7) and selected “grief” and sorrow. For smiley (8), “grief” was mainly selected, but some subjects chose “affection” and “aversive”. Smiley (9) divided the answers the most: subjects selected “delight”, “affection”, “grief”, and “aversive” among the 8 basic emotions. They tended to choose sorrow, loneliness, and regret for smiley (8), and disappointment, uneasiness, hesitation, regret, and pain for smiley (9). The words associated with smiley (8) are apologies such as “I'm sorry” and “I apologize”, requests such as “Thank you in advance”, and expressions of thanks such as “Thank you”.
4.2 Smileys in Chat Texts
We performed a second examination to investigate how smileys are used in real chat conversations and what kinds of smileys appear. The experimental subjects were 10 college students, divided into 5 pairs. They connected to the chat server from different rooms. We instructed them to exchange chat messages freely for 15 minutes. They could use only plain text and
smileys in the chat system. We showed them the list of smileys that more than 3 subjects, in the previous subsection's experiment, answered they frequently used. In addition, we announced that they did not have to use a smiley when they thought it unnecessary. As a result, we obtained 234 sentences and 55 smileys. Of the smileys that appeared, 67.3% can be categorized as (1)–(5) in Table 2, 12.7% belong to (9), and 7.3% to (7). These account for a total of 87.3% of the smileys that appeared in the chat texts, so these smileys and the basic emotions associated with them are essential features for chatting smoothly and emotionally (a tallying sketch is given below).
4.3 Future Work
As future work, we will apply the above findings to a chat system that employs embodied characters. As discussed in Section 2, the actions of the characters are determined not only by the messages of a user but also by the actions of the conversation partner's character. For example, when one character laughs in response to a message of the user it represents, the partner character should smile back without a text reply from the chat partner.
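The category shares reported above can be tallied mechanically from the chat log. The sketch below is illustrative only: the smiley strings and their category numbers are placeholders, not the actual entries of Table 2:

    from collections import Counter

    # Placeholder mapping from smiley strings to Table 2 category numbers.
    SMILEY_CATEGORY = {"(^_^)": 1, "(^o^)": 2, "(T_T)": 7, "(>_<)": 9}

    def smiley_distribution(messages):
        counts = Counter()
        for msg in messages:
            for smiley, cat in SMILEY_CATEGORY.items():
                counts[cat] += msg.count(smiley)
        total = sum(counts.values())
        return {cat: n / total for cat, n in counts.items()} if total else {}

    log = ["It was fun (^o^)", "oh no (T_T)", "sorry (>_<)", "great (^_^)"]
    print(smiley_distribution(log))   # share of each smiley category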
Fig. 2. Overview of a chat system that employs embodied characters
We have proposed a chat system that has embodied characters. Fig. 2 gives an overview of the system. The main window shows information about logged-in users, and the chat window shows a log of the users' chat. A user's agent character and the chat partner's agent character act in various ways according to the users' input chat text. When a message includes keywords corresponding to “think”, the partner character shows a reaction to stimulate the dialogue, following the research by
Matarazzo. Following the observations in Section 2, we intend to reflect these relationships between actions and reactions, as sketched below.
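A minimal sketch of this intended action/reaction control, assuming a simple keyword lookup; the keywords, actions, and reaction names below are hypothetical, not the system's actual vocabulary:

    # The user's character acts on keywords or smileys in the message; the
    # partner's character then reacts following the interdependences of
    # Table 1 (e.g. a smile is answered with a smile, "think" with a nod).
    KEYWORD_ACTIONS = {"(^_^)": "smile", "(T_T)": "cry", "think": "ponder"}
    REACTIONS = {"smile": "smile_back", "cry": "look_concerned",
                 "ponder": "nod"}

    def animate(message):
        action = next((act for key, act in KEYWORD_ACTIONS.items()
                       if key in message), "idle")
        return action, REACTIONS.get(action, "idle")

    user_action, partner_reaction = animate("Let me think about it")
    print(user_action, partner_reaction)   # -> ponder nod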
5 Conclusions
In this article, we investigated the relationships between smileys and emotional words for application to an embodied character chat system. The experiments revealed that the basic emotions “delight”, “affection”, and “grief” appear most frequently in daily chat conversations, so the embodied characters should be made to act in ways that satisfy the relationships among nonverbal expressions and the relationships between emotional words and smileys. Finally, we plan to investigate whether the atmospheres produced by the actions of embodied characters based on smileys accord with the real atmospheres of chat conversations.
References
[1] Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L., Chang, K., Vilhjalmsson, H., Yan, H.: Embodiment in Conversational Interfaces: Rea. In: CHI 1999, pp. 520–527 (1999)
[2] De Carolis, B., Pelachaud, C., Poggi, I., De Rosis, F.: Behavior Planning for a Reflexive Agent. In: Proc. of International Joint Conference on Artificial Intelligence (IJCAI 2001), pp. 1059–1066 (2001)
[3] Beattie, G.W.: Sequential patterns of speech and gaze in dialogue. Semiotica 23, 29–52 (1978)
[4] Kendon, A.: Some functions of gaze direction in social interaction. Acta Psychologica 26, 22–63 (1967)
[5] Matarazzo, J.D., Saslow, G., Wiens, A.N., Weitman, M., Allen, B.V.: Interviewer Head Nodding and Interviewee Speech Durations. Psychotherapy: Theory, Research and Practice 1, 54–63 (1964)
[6] Dimberg, U.: Facial Reactions to Facial Expressions. Psychophysiology 19(6), 643–647 (1982)
[7] Yoshida, M., Kinase, R., et al.: Multidimensional scaling of emotion. Japanese Psychological Research 12(2), 45–72 (1970)
[8] Plutchik, R.: The Multifactor-Analytic Theory of Emotion. The Journal of Psychology 50, 154–171 (1960)
Designing Peripheral Communication Services for Families Living-Apart: Elderly Persons and Family
Yosuke Kinoe and Mihoko Noda
Graduate School of Intercultural Communication, Hosei University, 2-17-1, Fujimi, Chiyoda-ku, Tokyo 102-8160, Japan
[email protected], [email protected]
Abstract. We developed a new augmented communications environment that aims to engender a greater sense of social proximity among geographically distributed family members and improve their emotional well-being. First, a field study was conducted to determine important peripheral communication cues for sensing the presence and mood of family members, as well as memory triggers. Secondly, design principles were extracted from the study results to guide the development of a first-of-a-kind prototype, the “SharedEpisodes”, which delivers visual peripheral cues in line with the story of family episodes. Finally, an initial field evaluation was conducted. The overall responses of the participants were positive toward this type of communications environment, which supports awareness of the presence and state of family members and the exchange of peripheral communication cues based on family episodes. Future work involves methodological improvement and prototype enhancements, including the choice of alternative modalities.
Keywords: telecommunication, periphery, peripheral cues, distributed family.
1 Introduction
This paper describes an effort to develop a new augmented communications environment that aims to engender a greater sense of social proximity among geographically distributed family members. We focus on supporting the exchange of the peripheral communication cues individual families cultivated while they lived together [1], and on improving the emotional well-being of families that live apart. The present research consists of a field study of peripheral communication cues in family relationships, the development of an augmented communications environment based on the field study, and its initial field evaluation.
1.1 Social Background
The recent changes in social structure [2] and family composition steadily influence the transformation of family functioning [3] and views of the family [4][5]. The number of families that live apart, either by choice or necessity, has been increasing due to
various social circumstances. Additional trends, in part due to greater longevity, indicate a growing elderly population, often living in isolation from the rest of their families. In Japan, the percentage of people living alone, including elderly people living solitary lives, increased by a staggering 28.6% between 1995 and 2005, while the percentage of nuclear family households dropped by 2.5% [6]. This seems likely to remain the trend for the coming decades. Similarly, in the United States, the proportion of nuclear family households (two married parents and a child) dropped from 40% of all households in 1970 to 23% in 2005, while in the same period the number of single-adult households climbed from 16% to 28% [7]. These dramatic shifts in household composition entail significant changes in the nature of family relationships and, we argue, place increased importance on the role of communications technology for social benefit.
1.2 Related Works: Supporting Families Who Live-Apart
Field studies of technology use in the home have been conducted for a variety of purposes, including natural observational studies of family awareness [8] and assisting family members living apart, in particular seniors. Such technologies include a jacket for a “hug over a distance” [9], family portraits [10], an augmented “planter” [11], a synchronized décor [12], information organizing systems [13], and an interactive installation supporting touch over a distance [14]. Social and emotional factors have also been considered within the eldercare experience of “aging in place” [15]. However, there have been comparatively few studies on the details of background communication among close individuals [11].
2 Peripheral Communications
We believe that reflecting individual family values, as they appear in the family's life-world, is important for developing communications support for families. Those values are associated with family episodes, memorabilia, emotional bonds or attachments [16], habits, communication styles, and special objects kept in the home [17]. People who live together consciously or unconsciously convey, perceive, and share various kinds of peripheral information in their everyday lives (Fig. 1). Examples include tone of voice, singing in the kitchen or shower, the pace of footsteps, doors being opened or slammed shut, light or music leaking through a door, the movement of personal belongings such as keys, hats, and bags, and the aroma of coffee brewing or cookies baking. Each family has its own style of using these cues to gain awareness of the mood or physical presence of other family members. When family members move apart, these cues are no longer shared, which, we believe, diminishes the sense of close contact the family previously enjoyed. Our research project is investigating how technology can help convey these subtle but significant elements of peripheral information to family members or partners living apart, so as to increase peripheral awareness of the state of loved ones without the technology becoming intrusive or overly demanding of foreground attention.
Fig. 1. Family who live together convey, perceive, and exchange various peripheral communication cues among family members. They sometimes use these cues to gain awareness of the mood or physical presence of other family members.
3 Field Study: Seniors and Their Families
In designing a communications environment for close individuals who live apart, we believe it is important to understand meaningful peripheral cues and family episodes, and how those cues were used by these individuals while living together. We conducted a field study consisting of a series of empirical sessions combining multiple qualitative research methods. The aims were to collect and determine important peripheral cues for sensing the presence and mood of family members, family episodes, and memory triggers that evoke feelings of missing one another, and to verify that our assumptions concerning the use of peripheral communications were valid.
3.1 Method and Research Settings
Each session of the field study involved interactive semi-structured interviews, completion of a questionnaire, in-situ contextual inquiry, and open-ended discussion. Eight respondents (one male and seven female), ranging between 48 and 83 years of age, participated. We also conducted complementary interviews with two care personnel of an assisted-living facility to understand the background of the everyday lives of elderly residents staying in the facility. The respondents were classified into three classes based on their current residence status: an elderly couple, pairs of an elderly solitary parent and an independent child, and residents living in an assisted-living facility. Table 1 summarizes their profiles. The field study took place in the suburbs of Tokyo, Japan, between July and December of 2008. The sessions, spread over several weeks, were conducted on an individual basis to assure participants' privacy and were divided into two or three components. Table 2 gives an overview of the sessions. Prior to the interviews, an introductory session helped participants better understand and become aware of their everyday background communications with family members, and established an appropriate rapport. The typical topics of the sessions included (a) profile information including family structure and an overview of family
history, the conversational partner, topics, frequency, and method of communication, (b) cues for sensing the presence of other family members, (c) cues for sensing the mood of other family members, (d) memory triggers that evoke feelings of missing, (e) family memorabilia while living together and living apart, and (f) topics they wish to convey, and the reality of their everyday family communications.

Table 1. Summary of profiles of the respondents
  #  Class / Relation                                       Gender  Age  Living apart from  Duration  Living with
  1  elderly solitary parent       mother                   F       83   daughter           30 years  n/a
  2  & independent child           daughter (married)       F       58   parent             30 years  husband, son
  3                                mother                   F       79   daughter           20 years  n/a
  4                                daughter (married)       F       48   parent             20 years  husband, son
  5  elderly couple                husband (in hospital)    M       83   partner            4 years   n/a
  6                                wife                     F       74   partner            4 years   son
  7  resident living in an assisted-living facility         F       81   son                30 years  n/a
  8  resident living in an assisted-living facility         F       83   sister             13 years  n/a
Table 2. Overview of the structure and topics of the field study sessions
  #      Module                               Detail
  1      • informed consent, rapport          1. introduction
           building                           2. profile information including family structure and family history
         • questionnaire                      3. conversational partner, topics, frequency, method of communication
         • semi-structured interview          4. cues for sensing presence and mood of other family member
                                              5. memory triggers that evoke feelings of missing
                                              6. cues for sensing presence and mood of other family member (cont'd)
  2,(3)  • informed consent (cont'd)          7. memory triggers that evoke feelings of missing (cont'd)
         • questionnaire                      8. family memorabilia (while living together and living-apart)
         • semi-structured interview          9. topics you wish to convey, a gap between the wish and reality in
         • in-situ contextual inquiry            their everyday family communications
         • open-ended discussion             10. debriefing
3.2 Analysis
Approximately sixteen hours of verbal reports were obtained from the field interviews over sixteen sessions in total, covering a broad range of topics relating to communication practice among the families. All the verbal reports were transcribed and analyzed according to the standardized analysis procedure of the VPA Method [18]. In this method, analysts first clarify the analysis viewpoints and define an analytic model consisting of a set of specific terms for characterizing the data. Analysts then consistently characterize the bulk of the verbal reports using the standardized “Segmentation & Tagging” procedure: all transcripts are divided into segments, and each segment is encoded with a tag, a combination of attributes selected from each analysis viewpoint of the analytic model. In this study, the following viewpoints
were established for encoding: (a) topical theme (e.g. recent news, memorable episode), (b) situation (Five Ws), (c) relationship with the person mentioned, (d) primary modality, and (e) emotional response to the topic. A sketch of this encoding appears after Table 3.
3.3 Results
The field interviews covered very broad topics, such as recent joyful news from a grandchild, a regrettable incident with a married son, family memorabilia from a few decades of family history, what kinds of cues were used, and how those cues were used to gain awareness of the moods of family members.
Use of Conventional Communications Media. Several shortcomings of conventional communications media were noted. In general, emotional characteristics of communications, including one's overall mood and expressions of sarcasm or humor, were felt to be less easily conveyed through communications media than in person. In particular, senior participants were hesitant to initiate contact using telecommunications technologies even when they wished to speak with their loved ones.
Peripheral Cues. A total of 54 distinct peripheral communication cues were identified. These were analyzed according to the primary modality of the individual cues (Table 3). Reference data obtained from our previous study, which used the same research framework for young adults in Montreal [1], are also presented (Table 3, right). The results indicate that visual (Tokyo: 68.5%; Montreal: 43.5%) and auditory (13.0%; 46.0%) were the two dominant modalities of peripheral communication cues with family. Very few cues were reported for the olfactory modality (5.6%; 3.1%). A small number of cues involved the other modalities of somatosensory (0%; 1.9%) and taste (0%; 0%), involved multiple modalities over a particular interval, or did not involve any particular modality; these were classified as “others” (13.0%; 7.5%).

Table 3. Summary of peripheral communication cues obtained from the field study for elderly persons and their families in Tokyo (left), and reference data from our previous field study using the same research framework for young adults and their families in Montreal (right)
(a) Tokyo
#  Gender/Age  Living apart from  Visual  Auditory  Olfactory  Other
1  F/83        daughter           4       0         0          2
2  F/58        parent             6       2         0          2
3  F/79        daughter           4       0         0          1
4  F/48        parent             6       0         0          0
5  M/83        partner            6       1         2          1
6  F/74        partner            8       3         0          1
7  F/81        son                3       1         1          0
8  F/83        sister             0       0         0          0

(b) Montreal (Kinoe & Cooperstock, 2007)
#  Gender/Age  Living apart from  Visual  Auditory  Olfactory  Other
1  M/26        brother            14      14        0          0
2  M/23        brother            17      16        1          0
3  F/19        parent             9       16        0          3
4  F/22        partner            4       3         1          0
5  F/22        partner            13      9         0          4
6  F/23        partner            9       13        3          2
7  F/23        partner            4       3         0          0
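To make the "Segmentation & Tagging" step of Section 3.2 concrete, the sketch below encodes transcript segments with tags drawn from the analysis viewpoints and then tallies cues by primary modality, as in Table 3. It is a minimal illustration in Python; the viewpoint names, attribute values, and example segments are invented for illustration and are not the study's actual codebook.

```python
from collections import Counter

# Illustrative analytic model: one attribute set per analysis viewpoint.
# These values are hypothetical examples, not the study's actual codebook.
ANALYTIC_MODEL = {
    "theme": {"recent_news", "memorable_episode", "memorabilia"},
    "modality": {"visual", "auditory", "olfactory", "other"},
    "emotion": {"positive", "negative", "neutral"},
}

def tag_segment(text, **attrs):
    """Encode one transcript segment with a tag: one attribute per viewpoint."""
    for viewpoint, value in attrs.items():
        assert value in ANALYTIC_MODEL[viewpoint], f"unknown {viewpoint}: {value}"
    return {"text": text, **attrs}

segments = [
    tag_segment("Her cane was left in its holder.",
                theme="memorable_episode", modality="visual", emotion="positive"),
    tag_segment("I could hear her radio through the wall.",
                theme="recent_news", modality="auditory", emotion="neutral"),
]

# Aggregate cues by primary modality, as in Table 3.
by_modality = Counter(s["modality"] for s in segments)
total = sum(by_modality.values())
for modality, count in by_modality.items():
    print(f"{modality}: {count} ({100 * count / total:.1f}%)")
```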
Unforgettable Episodes of Family. Interestingly, a significant portion of the peripheral communication cues were mentioned in association with family episodes. Those episodes concretely describe how the cues were used in a particular situation (i.e. the Five Ws) with a particular affect. For example, participant #4 in Tokyo smiled genially and explained that her mother's cane staying put in its holder was an explicit cue clearly indicating that her mother was in good condition. Family episodes seem essential and sometimes serve as a kind of symbol of the close relationships among family members who live apart. The semantics of the cues have been developed within the family through a long period of shared life history, even though the cues were not so meaningful to others.
4 Designing Peripheral Communication Services
Based on the analysis results of the field study, design principles were established to guide the development of a first-of-a-kind prototype for peripheral communication services. The principles were continuously refined and served as the cornerstone throughout the development.

4.1 Design Principles
The following principles were established for designing a prototype that aims to support geographically distributed families in maintaining and cultivating their relationships in a natural way.
• Principle #1: Allow family members to exchange their unique peripheral communication cues at an adequate level of fidelity.
• Principle #2: Fit unobtrusively into family members' everyday lives; no additional attention or action is required for initiating and terminating their peripheral communications.
• Principle #3: Deliver the cues in line with the story of family episodes.

4.2 The "SharedEpisodes": A Prototype for Peripheral Communications
We developed several types of design prototypes for facilitating peripheral communications among family members by utilizing different modalities of peripheral communication cues, such as visual, auditory, and cutaneous sensation. The "SharedEpisodes" is a first-of-a-kind prototype that emphasizes visual cues. First, based on the analysis of the field study, several distinctive family episodes were selected from the field interviews and decomposed into elements of the target situation. Second, core features of the target family episode were incorporated into three layers of the prototype: basic, common, and episode-specific. The "SharedEpisodes" senses specific events and peripheral communication cues caused by a family member's behavior and visually delivers those cues to the remote party. Fig. 2 shows an example of its application featuring (a) episode #1 of an elderly mother and her daughter and (b) episode #2 of an elderly couple.

Focus Episode #1: between an elderly mother and her married daughter. A married daughter living with her husband and children is often concerned about the health condition of her elderly mother, who lives apart. Her mother often enjoys going for a walk and sometimes leaves her cane behind when she is in good condition. While she lived with her mother, the daughter reasoned about her mother's condition by seeing the cane left in its holder
near the front door. The "SharedEpisodes" supports their peripheral communications based on this episode. It senses the contact state between the mother's cane and its holder using RFID tags and a reader, and then delivers the scene of the mother's favorite cane resting in its holder to a digital picture frame in her daughter's house (Fig. 2-a). This aims to help the daughter imagine her mother's condition through visual imagery of personal relevance, without any intentional foreground communication.

Focus Episode #2: between a wife and her husband staying in a hospital. The husband of an elderly couple has been staying in a hospital for several years. His wife hesitates to make a call because phone use is restricted in the hospital. They built up various heart-warming memories around gardening while they lived together. The husband knows well that his wife habitually enjoys gardening at the turn of the year, and he remembers a beautiful hanging basket she made for him. The "SharedEpisodes" thus senses the movement of the trowel she uses in gardening and the contact state between the trowel and the hanging basket in the garden using RFID tags and a reader, and then visually delivers the cues to a digital picture frame in the hospital room (Fig. 2-b). This aims to help the husband imagine his wife's everyday life, especially her gardening, through visual imagery of personal relevance, without any intentional foreground communication.
Fig. 2. The "SharedEpisodes" is a prototype for peripheral communication services. It senses meaningful events and peripheral communication cues caused by a family member's specific behavior and visually delivers those cues to the remote party. (a) Elderly mother and her married daughter. (b) Elderly couple: wife and her husband who is hospitalized for a long time.
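The sensing flow of Focus Episode #1 can be sketched as a simple event loop: detect a change in the cane/holder contact state and deliver the scene to the remote frame. The paper specifies RFID tags/reader and a digital picture frame but no software interface, so the classes below are hypothetical stand-ins, and the image file name is illustrative.

```python
import time

class RfidReader:
    """Hypothetical stand-in for the RFID tags/reader used in the prototype."""
    def cane_in_holder(self) -> bool:
        raise NotImplementedError  # query tag presence at the holder antenna

class PictureFrame:
    """Hypothetical stand-in for the remote digital picture frame."""
    def show(self, image_path: str) -> None:
        raise NotImplementedError  # deliver an image to the remote frame

def run_shared_episodes(reader: RfidReader, frame: PictureFrame) -> None:
    """Sense the cane/holder contact state; deliver the scene on each change."""
    last_state = None
    while True:
        in_holder = reader.cane_in_holder()
        if in_holder != last_state and in_holder:
            # The cane is back in its holder: deliver the familiar scene to
            # the daughter's picture frame (file name is illustrative).
            frame.show("cane_resting_in_holder.jpg")
        last_state = in_holder
        time.sleep(5)  # peripheral by design: low-rate, unobtrusive polling
```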
5 Evaluation of the "SharedEpisodes"
We conducted a quick field evaluation of the "SharedEpisodes." The aims were to evaluate its effects on family communications, relationships, and emotional well-being, and to validate the design principles established earlier.
5.1 Methods and Research Settings
The field evaluation took place in the suburbs of Tokyo, Japan, between December 2008 and January 2009. Four respondents who had joined our previous field study, ranging between 48 and 83 years of age, participated again: an elderly mother and her daughter (participants #3 and #4 in Table 1) and an elderly couple (participants #5 and #6 in Table 1). The evaluation involved interactive semi-structured interviews, completion of a questionnaire, and an open-ended discussion. The "SharedEpisodes" was installed in their living spaces. Due to network limitations, the tool was modified to deliver cues that had been recorded off-line. The participants were asked to evaluate the total communications environment, including the tool, after they had used it for seven days. The interview sessions were conducted on an individual basis, and participants were also asked to answer the same questionnaire at both the start and the end of the deployment.

5.2 Results
All participants except participant #3 (the elderly mother) reported that the "SharedEpisodes" increased opportunities for remembering family members and for gaining awareness of their presence. However, their responses were not always naively positive; they also contained complicated and sensitive feelings about gaining awareness of the presence of their family members. Several changes between pre- and post-deployment were found in the responses of participants #4 (the daughter of the elderly mother) and #6 (the wife of the elderly couple) to the part of the questionnaire on the status of their well-being. Participant #6 changed her answer from "agree" (pre) to "completely agree" (post) for the question "I feel a warm bond with my family member." She reported that she could sense various aspects of her husband's life in the hospital while using the prototype. On the other hand, participant #4 changed her answer from "somewhat agree" (pre) to "completely agree" (post) for the question "I get anxious about my life in the future." She explained that after she began to use the prototype, she came to realize the great influence of her mother's presence on her and became anxious about the possible pain of losing her mother's presence in the future.

5.3 Discussion
Contextual Use of Family Episodes. The participants' overall responses were positive toward communication support that aims to facilitate peripheral communications among families living apart, in line with the story of family-specific episodes.

Mixed Feelings. One of the expected effects was to allay the anxieties of the audience; however, the responses were more complex. For example, participant #4's response obviously involved mixed feelings of gratefulness and anxiety for her mother while gaining awareness of her mother's presence. This suggests that the process of maintaining a close family relationship involves not only delightful but also essentially mixed, sensitive, and sometimes anxious feelings for the family.

Enhancing the Design Principles. Our strategy is to deploy the design principles throughout the development. Design principles #1, #2, and #3 provided a useful framework but proved insufficient for realizing successful peripheral communication
services. The results suggested the importance of allowing users to overlook or remain unaware of the cues by obscuring them in the background when the cues are not needed. This requirement closely relates to the key concept of "periphery" [19].

The Choice of Modality and Presentation Method. A digital picture frame was very helpful for presenting the cues visually in a household setting, yet it sometimes attracted a certain degree of unexpected attention from users. Ambient display technology or "peripheral audio" [1] is a promising alternative.

Methodology and Process for Developing Peripheral Communications Services. Recently, various service engineering processes have been presented. Anaby-Tavor et al. proposed a model-driven process consisting of a sequence of activities: analysis, strategy design, development, and deployment [20]. The methodology adopted in this study provides a sub-process that consists of (a) qualitative research in the field, (b) qualitative analysis and derivation of family values including cues, episodes, and family memorabilia, (c) establishment of design principles, and (d) development of the peripheral communications environment and its field evaluation. These correspond to the analysis, design, and development phases of that process.
6 Conclusion
We developed a peripheral communications environment that aims to engender a greater sense of social proximity to distributed family members by facilitating the exchange of peripheral communication cues among families living apart. A field study was conducted to determine important peripheral communication cues and episodes in family relationships. Design principles were established to guide the development of the "SharedEpisodes," a prototype that delivers visual cues in line with the story of family episodes. An initial field evaluation was also conducted. The participants' overall responses were positive toward the tool, but also seemed to involve mixed feelings. We are assessing the effectiveness of peripheral technologies with respect to various design factors, such as the choice of modalities, the use of high- or low-fidelity media, and the contextual use of family-specific episodes, and evaluating their influence on users' emotional responses. Our future work involves methodological enhancements and refinement of the design principles and the prototype, including the choice of alternative modalities. We hope that this study, and the associated technologies being developed, will stimulate further research into the use of peripheral communications, helping distributed family members maintain awareness of each other's state and emotional well-being. The potential audience for our peripheral technology also seems to include close individuals in other conditions of separation, such as diaspora communities and immigrants.

Acknowledgments. This work was partially supported by Grants-in-Aid for Scientific Research (20500675, 23300263). We thank our study participants and the nursing-care staff who volunteered their time to provide us with insight into the role of peripheral cues in family communication. We thank Dr. Jeremy R. Cooperstock, who jointly conducted our previous studies in Montreal, and C. Honda, C. Ojima, and the Kinoe Lab members of Hosei University, who devotedly supported our field studies in Tokyo.
References
1. Kinoe, Y., Cooperstock, J.R.: Peripheral telecommunications: Supporting distributed awareness and seamless transitions to the foreground. In: Okadome, T., Yamazaki, T., Makhtari, M. (eds.) ICOST 2007. LNCS, vol. 4541, pp. 81–89. Springer, Heidelberg (2007)
2. Murdock, G.P.: Social Structure. Free Press (1965)
3. Parsons, T., Bales, R.F.: Family Socialization and Interaction Process. Routledge (1956)
4. Holstein, J.A., Gubrium, J.F.: Constructing the Life Course, 2nd edn. General Hall (2000)
5. Fineman, M.A.: The Autonomy Myth: A Theory of Dependency. New Press, New York (2005)
6. Population Census of Japan, Statistics Bureau, Ministry of Internal Affairs and Communications of Japan, http://www.stat.go.jp/english/data/kokusei/2005/nihon/index.htm (translated)
7. US Census Bureau datasets, http://factfinder.census.gov
8. Hutchinson, H., Mackay, W., Westerlund, B., Bederson, B.B., Druin, A., Plaisant, C., et al.: Technology Probes: Inspiring Design for and with Families. In: Proc. ACM CHI 2003 (2003)
9. Vetere, F., Gibbs, M.R., Kjeldskov, J., Howard, S., Mueller, F., et al.: Mediating Intimacy: Designing Technologies to Support Strong-Tie Relationships. In: Proc. ACM CHI 2005 (2005)
10. Vetere, F., Gibbs, M.R., Kjeldskov, J., Howard, S., Mueller, F., et al.: Mediating Intimacy: Designing Technologies to Support Strong-Tie Relationships. In: Proc. ACM CHI 2005 (2005)
11. Miyajima, A., Itoh, Y., Itoh, M., Watanabe, T.: "Tsunagari-kan" Communication: Design of a New Telecommunication Environment and a Field Test with Family Members Living Apart. Intl. J. of Human-Computer Interaction 19(2), 253–276 (2005)
12. Tsujita, H., Tsukada, K., Siio, I.: SyncDecor: Appliances for Sharing Mutual Awareness between Lovers Separated by Distance. In: Extended Abstracts ACM CHI 2007, pp. 2699–2704 (2007)
13. Taylor, A.S., Swan, L.: Artful Systems in the Home. In: Proc. ACM CHI 2005 (2005)
14. Hayashi, T.: Mutsugoto (2007), http://www.tomokohayashi.com/mutsugoto2007.html
15. Hirsch, T., Forlizzi, J., Hyder, E., Goetz, J., Stroback, J., Kurtz, C.: The ELDer Project: Social and Emotional Factors in the Design of Eldercare Technologies. In: Proc. CUU 2000 (2000)
16. Bowlby, J.: A Secure Base: Parent-Child Attachment and Healthy Human Development. Basic Books, New York (1988)
17. Csikszentmihalyi, M., Rochberg-Halton, E.: The Meaning of Things: Domestic Symbols and the Self. Cambridge University Press, Cambridge (1981)
18. Kinoe, Y., Mori, H.: Discovering Latent Relationships among Ideas: A Methodology for Facilitating New Idea Creation. In: Bullinger, H., Ziegler, J. (eds.) Human–Computer Interaction: Ergonomics and User Interfaces, pp. 1242–1246. LEA (1999)
19. Weiser, M., Brown, J.S.: The Coming Age of Calm Technology (1996), http://www.ubiq.com/hypertext/weiser/acmfuture2endnote.htm
20. Anaby-Tavor, A., Amid, D., Sela, A., Fisher, A., Zhang, K., Tie Jun, O.: Towards a Model Driven Service Engineering Process. In: 2008 IEEE Congress on Services - Part I, pp. 503–510 (2008)
Visual Feedback to Reduce Influence of Delay on Video Chatting
Kazuyoshi Murata, Masatsugu Hattori, and Yu Shibuya
Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto 606-8585, Japan
{kmurata,shibuya}@kit.ac.jp,
[email protected]
Abstract. When there is a certain delay in video chatting, participants often misunderstand their partners' responses and make unintended interruptions. In this paper, to overcome these problems, we present two kinds of visual feedback: the Scroll Wave Indicator and the Afterimage Indicator. An experiment was conducted to confirm the effectiveness of these indicators. The results of the experiment showed that the Scroll Wave Indicator helped participants to understand the remote participant's response timing and decreased unintended interruptions even when there was a 2 [sec] round-trip delay. Keywords: visual feedback, delay time, video chatting.
1 Introduction
Video chat systems have become popular for supporting communication among spatially distributed friends, couples, and families. However, such video systems have unavoidable problems caused by delay. For example, Ruhleder and Jordan [1] identified unintended interruptions as one of the particular problems caused by delay in video-based remote communication. Fig. 1 illustrates an example of unintended interruptions in a delayed situation. If there is no delay, an utterance of participant A is transmitted to participant B's site right away, and participant B receives the utterance immediately. However, if there is a certain delay, each participant receives the other participant's utterance after that delay. In this example, participant A utters something (A1) and then waits for a response from participant B. However, there is no response for a while because of the delay. After a certain time, participant A starts another utterance (A2) because he/she assumes that participant B does not want to respond. Then participant A is suddenly interrupted by an utterance of participant B (B1') and is confused. At participant B's site, participant B receives a delayed utterance (A1'). Participant B starts responding to participant A (B1) after a certain thinking time, and then he/she is suddenly interrupted by the next utterance of participant A (A2'). Participant B is also confused because utterance A2 is not related to A1' and B1. These interruptions are called unintended interruptions.
Fig. 1. An example of unintended interruption
Thus, when there is a certain delay in video chatting, participants often misunderstand their partners' responses and make unintended interruptions. These misunderstandings and unintended interruptions disrupt the conversation and confuse the participants. In this situation, because participant A cannot know whether there is a delay, he/she cannot distinguish the waiting time due to the delay from the thinking time of participant B. However, if a participant can recognize the relationship between the timing of the playback of his/her utterance at the remote site and the timing of the other participant's response to it, he/she can understand the response timing of the other participant and avoid these unintended interruptions. In this paper, we propose a visual feedback method to overcome these problems. The proposed method consists of two kinds of indicators: the Scroll Wave Indicator and the Afterimage Indicator. Many studies have aimed to reduce the delay of video-based telecommunication. However, it is impossible to eliminate delay completely. The purpose of our proposed method is to reduce the influence of delay by improving the user interface of the video chat system instead of removing the delay itself.
2 Proposed Method
Our proposed method provides visual feedback that shows the status of the voice and video playback at the remote site. Fig. 2 illustrates an example of the display timing of this feedback. The display timing is delayed by the length of a one-way delay from the actual start and stop timing of the voice and video playback. That is, every start and stop timing of the visual feedback is delayed by the length of a round-trip delay from the corresponding timing of the participant's own utterances. Consequently, the length of the partner's response time observed at the local site (TL) is equal to its length at the remote site (TR).
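The timing rule above reduces to a small computation: every start and stop of the feedback for the user's own voice is shifted by one round-trip delay. A minimal sketch, assuming the one-way delay is known and symmetric; the function and variable names are ours, not from the paper.

```python
def feedback_times(utterances, one_way_delay):
    """Return when the feedback for the user's own utterances is displayed.

    utterances: list of (start, stop) times of the user's own utterances at
    the local site. Each feedback interval is shifted by one round-trip
    delay: the voice reaches the partner after one one-way delay, and the
    confirmation of its playback there takes one more.
    """
    rtt = 2 * one_way_delay
    return [(start + rtt, stop + rtt) for (start, stop) in utterances]

# With a 1 [sec] one-way delay, an utterance spoken during 0.0-2.0 [sec] is
# shown as feedback during 2.0-4.0 [sec]; the response gap seen locally (TL)
# then equals the partner's real thinking time at the remote site (TR).
print(feedback_times([(0.0, 2.0)], one_way_delay=1.0))  # [(2.0, 4.0)]
```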
A participant who sees this feedback can recognize when his/her partner responded to his/her utterance. Consequently, misunderstandings of the partner's response timing and occurrences of unintended interruptions are prevented.
Fig. 2. An example of the display timing of the proposed feedback. The visual feedback is displayed after a round-trip delay from the actual timing of A's utterance, so that TL = TR = B's thinking time.
We introduced the Progress Bar Indicator to decrease unintended interruptions in voice chatting in our previous study [2]. However, it is difficult to apply this indicator to video chatting. Kawashima et al. proposed the Visual Filler to solve these problems [3]. However, the Visual Filler requires automatically inferring the end timing of the user's utterance. This inferring process is not needed for our proposed method.

2.1 Scroll Wave Indicator
Fig. 3 shows the appearance of our proposed method. The Scroll Wave Indicator shows the state of the voice playback of a user and his/her partner. The waveform in the upper area represents the amplitude of the partner's voice playback. This waveform appears from the left end of a black rectangular area and scrolls toward the left side of the indicator at a constant rate as the partner's voice is played at the user's site. That is, the waveform is displayed after the length of a one-way delay from the actual start and end timing of an utterance of the partner. The waveform in the lower area represents the amplitude of the user's own voice. This waveform is also displayed after the length of a round-trip delay. It appears from the black line and then scrolls toward the left side of the indicator at a constant rate. The black line is shifted to the right of the center of the lower area, where a dashed line is drawn. The distance between the black line and the dashed line corresponds to the length of the round-trip delay. That is, when the waveform reaches the halfway point between the black line and the dashed line, after scrolling for the length of a one-way delay, the user's voice starts to play at the partner's site. Subsequently, the waveform reaches the dashed line after scrolling for the length of a round-trip delay. The waveform beyond the dashed line represents the part of the user's voice that was being played at the partner's site when the partner uttered the voice that is now being played at the user's site. By comparing the positional relation between the partner's waveform and the user's waveform, the user is able to understand when the partner's utterance that is currently being played was uttered at the partner's site. In addition, the user is able to infer from the shape of the waveform which part of his/her utterance has been played at the partner's site. The user is thus able to refrain from unnecessary utterances such as "Can you hear me?"
Fig. 3. Appearance of the proposed method: the video image of a partner, the Scroll Wave Indicator for voice support (showing the speech waveforms of the partner and of the participant), and the Afterimage Indicator (a delayed self-video image combined with the real-time self-video image).
2.2 Afterimage Indicator
In video chatting, participants often communicate not only verbal messages but also nonverbal messages, e.g., nods or gestures. Accordingly, we propose the Afterimage Indicator as visual feedback that presents the timing at which such nonverbal messages are played through a video image at the remote site. The appearance of the Afterimage Indicator is also shown in Fig. 3. In the Afterimage Indicator, a video image of the user that is delayed by the length of a round-trip delay is overlaid on the real-time video image of the user. This overlaid image helps the user to understand the timing at which the user's nonverbal messages are played at his/her partner's site.
3 Experiment
We conducted an experiment to confirm the following expected effects of our proposed method.
• The proposed method helps a user to understand the timing of his/her partner's response.
• The proposed method decreases unintended interruptions.

3.1 Procedure
Six pairs of volunteers, ranging in age from 22 to 25 years, participated in this experiment. They were four students, seven graduate students, and one alumnus of our university. They knew each other well and communicated face-to-face daily. The two participants of a pair were in separate rooms, and both of them wore headsets, through which they talked to each other. They were asked to hold a three-minute discussion. The discussion topic was to line up three candidates for a certain goal, for example, "You are going to have lunch. What kind of lunch will you have?" They were asked to hold the discussion with our proposed method and, in addition, with a usual video chat system (hereafter, the "conventional method"). The video chat system had two video windows, one of the partner and one of the participant himself/herself, and the participants talked to each other through headsets. The experiment was performed under three round-trip delay conditions: 0, 1, and 2 [sec].

3.2 Measures
We measured the rate of unintended interruptions in this experiment. In addition, participants were asked to answer a questionnaire for subjective evaluation after each discussion.

Rate of unintended interruptions. The rate of unintended interruptions (hereafter, "RUI") was defined by the following equation:

RUI = NIU / NUP

NIU: the number of a participant's utterances that were interrupted by a successive utterance of the partner.
NUP: the number of utterances of the participant.
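The RUI definition translates directly into code. In the sketch below, each utterance of a participant is marked as interrupted or not; this encoding is our own illustration, not the authors' analysis script.

```python
def rui(interrupted_flags):
    """Compute RUI = NIU / NUP for one participant.

    interrupted_flags: one boolean per utterance of the participant (NUP in
    total); True marks an utterance interrupted by the partner (NIU counts
    these).
    """
    nup = len(interrupted_flags)
    niu = sum(interrupted_flags)
    return niu / nup if nup else 0.0

# Example: 2 interrupted utterances out of 10 gives RUI = 0.2.
print(rui([False] * 8 + [True] * 2))
```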
In this experiment, participants were able to communicate using both voice, through the headset, and body motion, through the video image on the PC. Therefore, we considered two kinds of unintended interruptions: unintended interruptions by a partner's voice through the headset, and unintended interruptions by a partner's body motion through the video image. We defined the former as RUI by voice and the latter as RUI by video. The Scroll Wave Indicator was expected to decrease RUI by voice, and the Afterimage Indicator was expected to decrease RUI by video.
Questionnaire. The participants were asked to answer a questionnaire including the following questions.
1. You could understand the timing when your partner responded to your utterance. (1: disagree – 5: agree)
2. How often did you watch the self-video image on the PC? (1: didn't watch it – 5: always watched it)
4 Result and Discussion
Fig. 4 shows the result of RUI by voice. In the case of the 2 [sec] delay condition, a t-test revealed a significant difference between the RUI by voice with the proposed method and that with the conventional method (p<.05). There was no significant difference in the case of the 1 [sec] condition. Fig. 5 shows the average score of question #1. An ANOVA showed a significant main effect of delay condition in the case of the conventional method (F(2,11)=4.696, p<.05). A post-hoc test revealed that the score of the 2 [sec] delay condition was significantly lower than that of the 0 [sec] delay condition. However, there was no significant difference in the case of the proposed method.
Fig. 4. The result of RUI by voice (*: p<.05), comparing the conventional and proposed methods under the 1 [sec] and 2 [sec] round-trip delay conditions.

Fig. 5. The average score of question #1, comparing the conventional and proposed methods under the 0, 1, and 2 [sec] round-trip delay conditions.
Fig. 6. The average score of question #2, comparing the conventional and proposed methods under the 0, 1, and 2 [sec] round-trip delay conditions.
Thus, as the length of the delay was extended, it became harder to understand the timing of a partner's response with the conventional method. On the other hand, even when the delay was 2 [sec], participants using the proposed method were able to understand the timing easily. These results indicate that the Scroll Wave Indicator helps a participant to understand the timing of his/her partner's response. Furthermore, these results suggest that the Scroll Wave Indicator decreases unintended interruptions. With regard to the Afterimage Indicator, its effect could not be analyzed because there were few unintended interruptions caused by a participant's body motion. In this experiment, the task performed by the participants did not require them to use nonverbal expressions through body motion, so they did not need to use the video image on the PC. The score of question #2 is shown in Fig. 6. There was no difference between the score of the proposed method and that of the conventional method. This result indicates that participants barely watched their self-video image. Some participants commented that the overlaid delayed self-video image was an obstacle to seeing the usual self-video image. To solve this problem, we need to improve the Afterimage Indicator so that it combines the effectiveness of the indicator with the visibility of the self-video image.
5 Conclusion
In this paper, the Scroll Wave Indicator and the Afterimage Indicator were proposed and evaluated experimentally. The evaluation showed that the Scroll Wave Indicator helped participants understand the remote participant's response timing and decreased unintended interruptions even when there was a 2 [sec] round-trip delay. However, the effectiveness of the Afterimage Indicator was not shown. Future work will involve an experiment using tasks that must be performed with both verbal and nonverbal communication.
References 1. Ruhleder, K., Jordan, B.: Co-Constructing Non-Mutual Realities: Delay-Generated Trouble in Distributed Interaction. Computer Supported Cooperative Work 10(1), 113–138 (2000) 2. Murata, K., Nakamura, M., Shibuya, Y., Kuramoto, I., Tsujino, Y.: Visual feedback to reduce the negative effects of message transfer delay on voice chatting. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4558, pp. 95–101. Springer, Heidelberg (2007) 3. Kawashima, H., Nishikawa, T., Matsuyama, T.: Visual filler: Facilitating Smooth Turntaking in Video Conferencing with Transmission Delay. In: CHI 2008 extended abstracts on Human factors in computing systems, pp. 3585–3590 (2008)
Research on the Relationships between Visual Entertainment Factor and Chat Communication
Tomoyasu Ogaki, Junko Itou, and Jun Munemori
Graduate School of Systems Engineering, Wakayama University, 930, Sakaedani, Wakayama 640-8510, Japan
Faculty of Systems Engineering, Wakayama University, 930, Sakaedani, Wakayama 640-8510, Japan
[email protected], {itou,munemori}@sys.wakayama-u.ac.jp
Abstract. In this article, we analyze the effects of visual entertainment factors included in visual cues on chat communication, aiming to enliven the exchange of chat messages. Visual cues such as smileys, avatars, and pictograms are essential for making our communication successful. However, visual cues are conventionally used only as a substitute for expressing users' intentions. We therefore propose a chat system with characters that change their forms according to the chat messages, including visual cues, input by users, and we investigate the effects on the exchange of chat messages.
1 Introduction
The use of online communication is becoming widespread, and the diversity of online communication tools has increased in recent years. For example, several online communication tools such as e-mail, chat systems, and distance learning are currently in use. Further, chat systems provide a variety of features, from conventional text-based ones to those that use a graphical interface in which virtual 3D agents represent users in a virtual 3D space. Using visual cues including smileys, avatars, and pictograms, users can visually express their emotions or intentions in their chat messages. Visual cues facilitate smooth communication between chat system users. Many people use visual cues in their daily electronic text-based conversations. Over 70% of cellular phone users use pictographs in their communication, according to an investigation that took into account people of all age groups, from senior citizens to young people [1]. According to another study, approximately 90% of university students use emoticons in their daily e-mail messages [2]. Moreover, an investigation revealed that among young people aware of avatars, approximately 50% used an avatar on the Internet and approximately 70% wanted to use one [3]. Therefore, it can be deduced that the visual cues mentioned above are used by many people from different generations in their daily communication, including chat communication, and that visual cues are useful for everyday conversations.
Previous work on chat systems mainly discusses how visual cues can be used for smooth communication. In this work, we focus on the use of visual cues for creating a sense of fulfillment as well as for entertainment. We propose a chat system with visual entertainment factors and features that let users raise their own original characters through chat, and we investigate how these visual entertainment features affect chat communication.
2 Application of Visual Cues to Communication Systems
2.1 Situations Where Visual Cues Are Used
Visual cues include emoticons, pictographs, and avatars. Emoticons are combinations of characters and signs that can be employed to express a user's emotion. Emoticons are commonly used at the end of a sentence in e-mail, chat conversations, and even bulletin boards. Pictographs are small pictures embedded in text and are used on cellular phones and instant messengers. Pictographs can be of various types, such as those expressing emotional states or those depicting animals, plants, and buildings. Users employ an avatar as their agent in chat, instant messengers, and online games. An avatar is a human-like character that can express the user's emotions as easily as emoticons and pictographs can. From the above, it is clear that these visual cues play an important role in expressing a user's emotions and intentions more accurately in an e-mail or chat system.

2.2 Communication Systems
Many visual cues have already been incorporated into communication systems. Instant messengers are applications that enable members who are logged in to chat with one another. The latest instant messengers allow users to exchange messages containing pictographs, animations, and sounds. Koda proposed a communication tool that writes messages on a picture of a character displayed on the screen of the user's personal computer [4]. The user can choose from many different types of character templates according to the message content. "MEDIAC Messenger" is a chat system in which users can use 3D characters as their agents [5]. In this messenger, 3D characters perform various actions in a window on the desktop and react to the words typed in chat messages. This chat system focuses on the relationships among emotional words, the actions of the characters, and the reactions of the characters. It also allows visual cues such as pictographs, smileys, and avatars. However, the main purpose of these systems is to convey the user's emotions and intentions. Our aim is to use visual cues to obtain active feedback and passive responses in a chat system.
3 Chat Systems with Visual Entertainment Features
3.1 Goal
Our goal is to evaluate the proposed chat system on the basis of its visual entertainment features. We use visual cues not only for smooth communication but also for humorous communication; therefore, we introduced the "evolving of a character" into our chat system as a visual entertainment feature. We designed characters that can induce positive feelings in users by changing their appearance, in order to confirm how the character affects users and whether it triggers chat communication.

3.2 Evolving of a Character
The designed character evolves into new forms according to the words, smileys, and emoticons used in chat messages. An overview of the evolving of a character is shown in Fig. 1. The characters change their forms when the number of input chat messages exceeds a threshold value. When a user often inputs "positive" words or smileys, the state of the character changes, with high probability, to that on the left side of the second step, as shown in Fig. 1. On the other hand, when a user inputs more "negative" words or smileys, the state of the character tends to change to that shown on the right side of the second step. The "positive" words include "happy", "delight", "smile", and so on. The "negative" words include "angry", "tired", "sad", "bored", and so on. The system accepts twenty types of keywords and seven types of characters. A minimal sketch of this rule follows Fig. 1.
Fig. 1. Evolving of a character
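The keyword lists, the message-count threshold, the branch labels, and the probabilistic choice below are illustrative placeholders: the actual system recognizes twenty keyword types and seven character forms whose parameters are not given in the paper.

```python
import random

# Illustrative word lists and parameters (not the system's actual values).
POSITIVE = {"happy", "delight", "smile"}
NEGATIVE = {"angry", "tired", "sad", "bored"}
THRESHOLD = 30  # messages before the character may change its form

class EvolvingCharacter:
    def __init__(self):
        self.form = "egg"
        self.messages = 0
        self.positive = 0
        self.negative = 0

    def observe(self, message: str) -> None:
        """Count messages and tally positive/negative keywords in them."""
        self.messages += 1
        words = set(message.lower().split())
        self.positive += len(words & POSITIVE)
        self.negative += len(words & NEGATIVE)
        if self.messages >= THRESHOLD:
            self._evolve()

    def _evolve(self) -> None:
        # More positive input raises the probability of the "bright" branch,
        # mirroring the left/right branching of the second step in Fig. 1.
        total = self.positive + self.negative or 1
        p_bright = self.positive / total
        self.form = "bright" if random.random() < p_bright else "dark"
        self.messages = 0  # start counting toward the next evolution step
```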
3.3 System Structure
Our chat system consists of a server and multiple clients. When users start the client system, windows open on their screen, as shown in Fig. 2. The right window displays the characters (the users' avatars), the chat log, the input form, and a list of emotion icons. In the top portion of this window, the users' avatars perform various actions and evolve according to the chat text input by users. After user A inputs a message in the field provided in the chat window and sends it to the server, the server extracts a keyword from the sent message by morphological analysis to determine the action that character A should perform. The instructions for the action that the character should perform and user A's message are then sent from the server to each client. When a user selects an emotion icon from the list as an input, a balloon containing that emotion icon is displayed. The left window displays a list of smileys, divided into five categories: "delight", "anger", "sorrow", "surprise", and "others". Users can easily input smileys into chat messages from this list.
Fig. 2. Overview of proposed system
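A minimal sketch of the server-side flow just described: extract a keyword from an incoming message and broadcast the message together with the action the character should perform. A real implementation would use a Japanese morphological analyzer; here a plain word split stands in for it, and the action table is invented.

```python
# Hypothetical action table; the real system maps twenty keyword types to
# character actions (the mapping is not published in the paper).
ACTION_TABLE = {"happy": "dance", "sad": "droop", "angry": "stomp"}

def handle_message(sender: str, text: str, clients: list) -> None:
    """Server side: pick a character action from the message, broadcast both."""
    words = text.lower().split()  # stand-in for morphological analysis
    keyword = next((w for w in words if w in ACTION_TABLE), None)
    action = ACTION_TABLE.get(keyword, "idle")
    for client in clients:
        client.send({"user": sender, "text": text, "action": action})

class EchoClient:
    """Tiny demo client that just prints what the server broadcasts."""
    def send(self, payload: dict) -> None:
        print(payload)

handle_message("userA", "I am so happy today", [EchoClient()])
# -> {'user': 'userA', 'text': 'I am so happy today', 'action': 'dance'}
```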
4 Experimental Results
To investigate the role of visual entertainment features in triggering chat communication, we performed an evaluation experiment using three types of chat systems: (a) text-based chat, (b) avatar chat, and (c) chat with an evolving avatar. In chat system (a), users can input only text. In chat system (b), they can use text as well as an avatar that doesn't change its form. Chat system (c) is the proposed chat system.
The experimental subjects were ten college students, divided into five pairs. They were seated away from each other to prevent them from knowing their partner's emotions or reactions. We instructed each subject to chat online freely for 15 min using smileys and/or emotion icons. They used the chat systems in order from chat system (a) to chat system (c). The results are summarized in Table 1. Averages were calculated for each type of data. The row "visual cues" gives the average number of smileys and emotion icons that users selected. From Table 1, it is clear that the participants tended to use visual cues actively and that the number of messages and the wordage increased significantly when they used chat system (c). This result shows that the use of visual entertainment features triggers users' conversations and does not disturb the exchange of messages between users.

Table 1. Averages of different kinds of chat-related data
                     (a)      (b)      (c)
number of messages   17.88    20.25    23.00
total wordage        273.50   319.25   369.63
visual cues           0.13     4.00     9.74
The results of a questionnaire filled out by the subjects are provided in Tables 2 and 3. Table 2 shows how chat system (c) affects chat conversation. From the evaluated values listed, it can be concluded that the character and the emotion icons did not disturb the chat communication between users and that the participants could communicate as smoothly using chat system (c) as they did using a traditional chat system such as system (a).

Table 2. Effect of the chat system with an evolving character on chat conversations
Questionnaire Item                                                          Average
(i)   Use of the chat system with an evolving character disturbs
      your chat conversation (1: yes - 5: not at all)                       4.2
(ii)  Use of the chat system with an evolving character makes you
      input messages proactively (1: not at all - 5: yes)                   3.9
(iii) Emoticons are useful in chat conversation (1: not at all - 5: yes)    4.2
According to Table 3, the evaluated value for "ease in conversing" for chat system (c) was 4.1, with a variance of 0.1. On the other hand, the variance of the same item for chat system (a) was 0.8. This implies that most subjects tended to feel that they could easily exchange messages using chat system (c). From these results, we deduce that our proposed chat system enlivens chat conversations with the help of visual entertainment features that help trigger a dialogue between subjects, and that the subjects enjoyed their chat.

Table 3. Evaluation of each chat system
                            (a)    (b)    (c)
Ease in conversing          3.0    3.5    4.1
Fun in conversing           2.8    3.7    4.6
Triggering of conversations 2.3    3.5    4.4
5 Conclusions
In this article, we discussed the effects of visual entertainment features on the chat conversations of users in a chat system. Through a comparison test of three types of chat systems, we were able to confirm that visual entertainment features trigger the exchange of chat messages. In this experiment, each chat system was used only once and for a short time. Therefore, we could not observe how continual use of our chat system affects chat conversation. In the future, we plan to consider this point and improve our chat system.
References
1. MyVoice Communications, Inc.: The research on commonly-used functions of cellular phones (2008), http://www.myvoice.co.jp/biz/surveys/11601/
2. Okamoto, E.: Research on transition of emotion by face mark. Sonoda Women's University (2002)
3. NTT ADVERTISING, Inc.: The research on utilization and recognition of 'avatar' in internet sites. NEWS RELEASE (October 17, 2007), http://www.ntt-ad.co.jp/news/20071017/20071017.pdf
4. Koda, T.: Development and Analysis of an Emotionally Expressive Communication Tool "Petaro". Transactions of the Human Interface Society 8(1), 101–108 (2006)
5. Itou, J., Hoshio, K., Munemori, J.: A Prototype of a Chat System Using Message Driven and Interactive Actions Character. In: 10th International Conference on Knowledge-Based Intelligent Information & Engineering Systems, KES 2006 (2006)
Multimodal Conversation Scene Analysis for Understanding People’s Communicative Behaviors in Face-to-Face Meetings Kazuhiro Otsuka NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corp., 3-1 Morinosato-Wakamiya, Atsugi-shi, Kanagawa-pref., 243-0198 Japan
[email protected] http://www.brl.ntt.co.jp/people/otsuka/
Abstract. This presentation overviews our recent progress in multimodal conversation scene analysis, and discusses its future in terms of designing better human-to-human communication systems. Conversation scene analysis aims to provide the automatic description of conversation scenes from the multimodal nonverbal behaviors of participants as captured by cameras and microphones. So far, the author’s group has proposed a research framework based on the probabilistic modeling of conversation phenomena for solving several basic problems including speaker diarization, i.e. “who is speaking when”, addressee identification, i.e. “who is talking to whom”, interaction structure, i.e. “who is responding to whom”, the estimation of visual focus of attention (VFOA), i.e. “who is looking at whom”, and the inference of interpersonal emotion such as “who has empathy/antipathy with whom”, from observed multimodal behaviors including utterances, head pose, head gestures, eye-gaze, and facial expressions. This paper overviews our approach and discusses how conversation scene analysis can be extended to enhance the design process of computer-mediated communication systems. Keywords: Conversation scene analysis, meeting analysis, multimodal interaction, nonverbal behavior.
1 Introduction
Face-to-face conversation is one of the most basic forms of communication in daily life, and group meetings are used for conveying/sharing information, understanding others' intentions/emotions, and making decisions. Despite the rapid progress in network infrastructure, mobile devices, and social media, visual communication among spatially separated people is still impeded by unnatural and uncomfortable usability, and it remains quite different from face-to-face conversation. To design better communication systems, the author believes that it is critical to elucidate the mechanisms of human communication, i.e. how people communicate and interact with each other. In recent years, the analysis of human conversations such as group meetings has been acknowledged as an emerging research field, and a growing number of researchers are becoming engaged in this field [1].
In face-to-face conversations, people exchange not only verbal messages but also nonverbal messages, such as eye-gaze, head and body gestures, facial expressions, and prosody [2]. Therefore, it is expected that conversation structures and dynamics can be largely understood by observing the nonverbal behaviors of the conversing participants. Conversation scene analysis aims to provide an automatic description of conversation scenes from such multimodal nonverbal behaviors of participants as captured by cameras and microphones. It ranges from low-level (e.g. physical and/or behavioral) to high-level attributes (e.g. mental and/or social). So far, the author's group has proposed a research framework that probabilistically models conversation phenomena for solving several basic problems, including speaker diarization, i.e. "who is speaking when" [3, 4], addressee identification, i.e. "who is talking to whom" [5, 6, 3], the estimation of visual focus of attention (VFOA), i.e. "who is looking at whom" [5, 6, 7], interaction structure, i.e. "who is responding to whom" [8, 9], and interpersonal emotion such as "who has empathy/antipathy with whom" [10], from observed multimodal behaviors including utterances, head pose, head gestures, eye-gaze, and facial expressions. This paper overviews our approach to multimodal conversation scene analysis. In Section 2, we introduce our concept of the probabilistic modeling of conversation phenomena and several examples. Section 3 details our sensing approach, including a real-time system for conversation scene analysis. In Section 4, we introduce our latest effort toward understanding interpersonal emotions such as empathy and antipathy during conversations. Finally, in Section 5, we discuss how conversation scene analysis can advance the design process of human-human communication systems.
2 Conversation Scene Analysis Based on Probabilistic Conversation Models
The main components of our approach to conversation scene analysis are probabilistic conversation models and the multimodal sensing of human nonverbal behaviors. This section sheds light on the modeling part; the sensing part is presented later, in Section 3. Our approach to conversation modeling features (i) probabilistic modeling, (ii) a hierarchical generative model, and (iii) unsupervised inference. The approach assumes that conversation phenomena can be modeled as hierarchical generative models, where a higher layer governs the dynamics of the lower layer. Based on this idea, the author has developed a dynamic Bayesian network (DBN) with three layers. Section 2.1 introduces our methods for estimating the joint visual focus of attention (VFOA) of meeting participants, i.e. "who is looking at whom", together with addressee identification, "who is talking to whom" [5, 6]. Section 2.2 introduces the problem of estimating an interaction structure, i.e. "who is responding to whom" [8, 9]. Section 2.3 presents our recent approach for estimating VFOA by jointly modeling head pose and image-based gaze detection [7].
Fig. 1. Conversation Model. (a) Concept of layered structure, (b) Basic model for estimating gaze patterns and conversation regimes. (c) Enhanced model for inferring interaction structures involving utterance and head gestures.
Fig. 2. Gaze pattern and indicated types of conversation, called conversation regimes. (a) most frequent gaze patterns in a 4-person conversation (left to right). Nodes represent people and arrows indicate interpersonal gaze directions. No arrow indicates looking at no one. (b) The three conversation regimes are convergence, dyad-link, and divergence. The typical topology of gaze patterns is shown for each regime. A regime can indicate participants’ roles such as speaker, addressees, and side-participants during a conversation.
2.1 Joint Estimation of Visual Focus of Attention and Conversation Structures
Gaze, also called visual focus of attention (VFOA), has been of interest to psychologists for many years. They have identified several functions of gaze, such as monitoring others, expressing one's attitudes/intentions, and regulating turn-taking [11, 12]. In 2005 [5], the authors targeted the problem of estimating VFOA, i.e. "who is looking at whom", in multiparty conversations, and proposed the first joint model of multiple people's gaze together with the conversational context, called the "conversation structure". The structure indicates "who is talking to whom, and who is listening to whom". It also addresses the subsequent
problem of addressee identification. Fig. 1 illustrates the main concept of the model. The top layer, called the conversation regime, represents the global status of the conversation, especially the patterns of message exchange among people, e.g. "who is talking to whom and who is listening to whom". The middle layer, the interaction layer, represents how people interact with each other, e.g. gaze interaction. The bottom layer is the behavior layer, which includes the various nonverbal behaviors of meeting participants such as head pose, presence/absence of utterance, head gestures, and facial expressions. These human behaviors can be measured with sensors [5], computer vision techniques [3, 6, 9], and audio signal processing [3, 4]. The resulting model can be represented as a Bayesian network, as shown in Fig. 1(b) and (c). The model in Fig. 1(b) is directed at the problem raised in this section, while the one in Fig. 1(c) is used in the next section. In the model in Fig. 1(b), the conversation regime controls the dynamics of gaze patterns and utterances. The gaze pattern is a hidden variable and is estimated from the head-pose measurements in the bottom observation layer. The key idea of this model is the hypothesis that the topology of the gaze pattern (the set of gaze directions of all participants) is strongly related to the basic conversation structures, i.e. "who is talking to whom, and who is listening to whom", which we call conversation regimes. Fig. 2(a) shows typical gaze patterns and our conversation regimes. The model is based on the common observation that "the speaker orients his/her gaze to the addressees, who tend to look at the speaker". With this model, the regimes and the gaze patterns are jointly estimated from the utterances and the head directions measured with sensors or by face tracking in videos; the estimation is implemented using an MCMC (Markov chain Monte Carlo) method. The author's contributions include shedding light on the joint gaze directions of people and rendering the problem suitable for computational analysis.

2.2 Estimation of Multimodal Interaction Structures, i.e. "Who Responds to Whom, When, and How?"
In addition, an extended model has been proposed for analyzing cross-modal nonverbal interactions, i.e. "who responds to whom, when, and how", from multimodal cues including utterances, gaze, and head gestures [8, 9]. Fig. 1(c) shows the structure of this model. The model focuses in particular on head gestures such as nodding, shaking, and tilting, which play important roles in conversations. A speaker's head gestures appear as visible signs of addressing, questioning, and stressing, and a hearer's gestures can be interpreted as signs of listening, acknowledgement, and agreement/disagreement [13]. These gestures are used to regulate various interactions such as question/answer, addressing/back-channel response, and yielding the floor. As an additional hidden variable to be estimated, this model incorporates the causal relationships among behaviors, called the interaction structure, i.e. "which behavior triggers which behavior", where behavior refers to utterances and head gestures. Fig. 3 shows two ways of visualizing the estimation results. In Fig. 3(a), the wave from the rightmost person (P4) to the leftmost person (P1) indicates that P4's nod is a reaction toward P1. In Fig. 3(b), reactions are indicated by arrows starting from the responder and pointing to the triggering behavior of the other person.
Fig. 3. Visualization of interaction structures: (a) spatial diagram, (b) temporal diagram. In (a), arrows indicate estimated gaze directions, and a wave starts from the responder toward the person who triggered the response. In (b), gray rectangles show detected utterance intervals, lines below the utterances show detected head gestures, and white arrows show the estimated responses, starting from the responder toward the triggering person.
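As a toy illustration of the generative link between the conversation-regime layer and the gaze-pattern layer in Fig. 1(b), the sketch below samples a gaze pattern given a regime. In the actual model this direction is inverted: MCMC inference estimates the hidden regime and gaze pattern from observed head pose and utterances. All probabilities here are invented for illustration.

```python
import random

PEOPLE = ["P1", "P2", "P3", "P4"]

# Invented probabilities that a listener looks at the current speaker,
# one per conversation regime (illustration only).
P_LOOK_AT_SPEAKER = {"convergence": 0.8, "dyad-link": 0.5, "divergence": 0.25}

def sample_gaze_pattern(regime, speaker):
    """Toy generative step: the regime biases every participant's gaze.

    Listeners look at the speaker with a regime-dependent probability,
    otherwise at someone else at random; the speaker picks any other person.
    """
    pattern = {}
    for person in PEOPLE:
        others = [q for q in PEOPLE if q != person]
        if person != speaker and random.random() < P_LOOK_AT_SPEAKER[regime]:
            pattern[person] = speaker
        else:
            pattern[person] = random.choice(others)
    return pattern

# Under "convergence", most listeners' gaze converges on the speaker.
print(sample_gaze_pattern("convergence", speaker="P1"))
```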
2.3 Estimation of Visual Focus of Attention from Head Pose and Image-Based Gaze Detection
So far, head pose has been considered a reasonable indicator of gaze direction (VFOA) in the field of meeting analysis. However, head pose can be ambiguous, and the determination of the correct gaze direction can often fail because a single head pose affords multiple gaze directions. For more reliable estimation, our new conversation model incorporates an image-based gaze detector, which enables joint modeling of head and eye-gaze dynamics [7]. Its superiority over head-only models is particularly strong when dealing with complex attention shifts and gaze-aversion dynamics.
3 Real-Time Meeting Analysis System
Sensing people's behavior is another important part of conversation scene analysis. People's nonverbal behaviors are mainly exchanged through the audio and visual channels. Therefore, audio signal processing and computer vision techniques are expected to be useful for sensing and measuring nonverbal behaviors. To advance sensing technology, the author's group developed a real-time system for analyzing group conversations using an omnidirectional camera-microphone sensor [3], which is combined with face position/pose tracking and voice detection with arrival direction, to determine "who is talking to whom" and "who attracts the most attention", as shown in Fig. 4. This system targets round-table meetings with 4 to 8 people and captures the meeting scenes with omnidirectional camera-microphones, as shown in Fig. 4(c). Of particular note, we developed a GPU-based face tracker, called STCTracker [14], that uses memory-based particle filtering [15] and realizes both real-time speed and robustness. Real-time response is the key to realizing practical applications such as computer-mediated teleconferencing and conversational robots/agents.
Fig. 4. Real-time system for group meeting analysis, (a) meeting scenes, (b) real-time 3D visualization, (c) omnidirectional camera-microphone
Fig. 5. Model of emotional interaction. The model links the emotional interaction (empathy/antipathy/unconcern), gaze patterns, and facial expressions between two people, based on the co-occurrence of facial expressions when people interact to confirm/exchange/share their emotions with others.
4 Inferring Emotional Interactions: Who Feels Empathy with Whom?

Beyond the structural aspects of conversations, the next challenge is discovering people's emotions, i.e., "what kind of emotion is aroused by others" and "how human emotions change through interactions in conversations". Recently, the author's group has focused on interpersonal emotion that develops in the course of conversational interactions. We pay particular attention to empathy and antipathy, and aim to automatically infer "who shared empathy/antipathy with whom and when" [10]. To that end, we focus on the facial expressions of the participants as a useful cue, and hypothesize that the interplay of facial expressions between two people is strongly related to their emotional interactions, which shape empathy/antipathy between the two. For example, a typical interaction in conversations is for the recipient of a smile (a facial expression) to return it. Here, if the recipient feels empathy with the first
person in terms of his/her opinion and/or personality, the recipient shares the feeling of the first person. Note that empathy sharing can also be evidenced by a sad-sad exchange in addition to a smile-smile exchange, e.g., when talking about sad events. On the other hand, responding with an opposite expression, e.g., anger in response to a smile, can indicate antipathy. No response indicates unconcern or is sometimes a sign of antipathy toward the other person. Based on this hypothesis, we have developed a probabilistic model that links the emotional interaction (empathy/antipathy), the co-occurrence of facial expressions between the pair of people, their gaze directions, and their utterances. MCMC (Markov chain Monte Carlo) simulation yielded a posterior distribution of emotional interaction that well replicated the judgments of external human observers who watched the video.

Here, we introduce a new scheme for evaluating human emotions. Instead of assuming that there is one solid ground truth in the mind of the targeted person that can be reliably measured, we focus on the statistical nature of how humans discern others' emotions from an observer's viewpoint, i.e., human observation/interpretation inherently varies from person to person. From this point of view, we evaluated the estimation results by measuring the distance between two distributions: the posterior distribution from the model computation and the histogram obtained from annotations by multiple human observers.
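The paper does not specify the distance measure between the two distributions; as one concrete possibility, the sketch below uses the Hellinger distance, with made-up numbers standing in for the model posterior and the annotation histogram.

    import math

    labels = ["empathy", "antipathy", "unconcern"]
    model_posterior = [0.62, 0.10, 0.28]   # illustrative MCMC output
    annotator_hist = [0.70, 0.10, 0.20]    # e.g., votes of 10 human observers

    def hellinger(p, q):
        # 0 for identical distributions, 1 for disjoint support
        return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b))**2
                                   for a, b in zip(p, q)))

    print(f"Hellinger distance = {hellinger(model_posterior, annotator_hist):.3f}")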
Fig. 6. Design framework for developing human-to-human communication system based on conversation scene analysis
5 Discussion of Communication System Design

The author believes that conversation scene analysis has the potential to yield enhanced communication system designs for spatially separated people. Fig. 6 illustrates our research and design framework, based on conversation scene analysis, for developing such systems. In Fig. 6, the group meeting on the left can be analyzed by the techniques described in this paper, and the result of the analysis can be used for synthesizing/visualizing meeting scenes for remote viewer(s); see the right-hand side of Fig. 6. This flow is denoted by arrow (a) in Fig. 6. To date, we have developed a simple visualization scheme, as mentioned in the earlier sections; further sophistication is an important research topic for giving remote users a better understanding of remote meetings. Moreover, for bi-directional communication, the return flow, which also involves analysis and synthesis, as indicated by arrow (b), is required
to send the remote user's image/audio back to the meeting participants on the left. Moreover, to evaluate such computer-mediated communication systems ((c) in Fig. 6), the ideas and techniques of conversation scene analysis are useful. The analysis is expected to yield several quantitative measures for assessing how naturally people behave and communicate through the system, based on their behaviors, including gaze, utterances, gestures, and facial expressions. A comparison of behaviors between a face-to-face setting and remote communication systems would provide important clues for assessing naturalness in remote communication sessions. Such quantitative assessments and related insights will enable designers to improve remote systems in terms of both the analysis and synthesis components (indicated as (d) in Fig. 6). For example, a remote system might attenuate the impact of some nonverbal behaviors because of its limited visual communication channel, and the analysis could tell system designers which nonverbal behaviors were weakened and must be reinforced. In addition, this framework would also be useful for developing agent/robot communication systems.
6 Conclusion

This paper introduced our approach to multimodal conversation scene analysis and discussed its future in terms of designing better human-to-human communication systems. So far, the author's group has proposed a research framework based on probabilistic modeling of conversation phenomena to solve several basic problems, including speaker diarization, addressee identification, interaction structure estimation, estimation of the visual focus of attention (VFOA), and inference of interpersonal emotion, from observed multimodal behaviors including utterances, head pose, head gestures, eye gaze, and facial expressions. In addition, this paper discussed how conversation scene analysis can be extended to, and can enhance, the design process of computer-mediated communication systems. Multimodal analysis of people's communication enables a deeper understanding of human-human relationships in all aspects of human life, not only in the online world but also in real life.

Future work includes the following. First, it is necessary to target more realistic environments, which include not only human-human conversation but various activities involving objects, tools, and environments; continuous effort is therefore needed to develop more robust sensing technologies, for both vision and audio processing, that can deal with the low-quality images and audio captured in highly cluttered and noisy environments. Second, full multimodal integration and modeling is essential for understanding high-level aspects of human communication such as cultural and language differences. Third, this line of work can contribute to social computing, such as social network analysis. Finally, it would be interesting to explore possible applications in psychology and sociology research.
References

1. Gatica-Perez, D.: Automatic Nonverbal Analysis of Social Interaction in Small Groups: A Review. Image and Vision Computing 27, 1775–1787 (2009)
2. Argyle, M.: Bodily Communication, 2nd edn. Routledge, London and New York (1988)
3. Otsuka, K., Araki, S., Ishizuka, K., Fujimoto, M., Heinrich, M., Yamato, J.: A Realtime Multimodal System for Analyzing Group Meetings by Combining Face Pose Tracking and Speaker Diarization. In: ACM ICMI 2008, pp. 257–264 (2008)
4. Ishizuka, K., Araki, S., Otsuka, K., Nakatani, T., Fujimoto, M.: A Speaker Diarization Method based on the Probabilistic Fusion of Audio-visual Location Information. In: ICMI 2009, pp. 55–62 (2009)
5. Otsuka, K., Takemae, Y., Yamato, J., Murase, H.: A Probabilistic Inference of Multiparty Conversation Structure based on Markov-Switching Models of Gaze Patterns, Head Directions, and Utterances. In: ACM ICMI 2005, pp. 191–198 (2005)
6. Otsuka, K., Yamato, J., Murase, H.: Conversation Scene Analysis with Dynamic Bayesian Network based on Visual Head Tracking. In: ICME 2006, pp. 949–952 (2006)
7. Gorga, S., Otsuka, K.: Conversation Scene Analysis based on Dynamic Bayesian Network and Image-based Gaze Detection. In: ACM ICMI-MLMI 2010 (2010)
8. Otsuka, K., Sawada, H., Yamato, J.: Automatic Inference of Cross-modal Nonverbal Interactions in Multiparty Conversations. In: ACM ICMI 2007, pp. 255–262 (2007)
9. Otsuka, K., Yamato, J.: Fast and Robust Face Tracking for Analyzing Multiparty Face-to-Face Meetings. In: MLMI 2008 (2008)
10. Kumano, S., Otsuka, K., Mikami, D., Yamato, J.: Analyzing Empathetic Interactions based on the Probabilistic Modeling of the Co-occurrence Patterns of Facial Expressions in Group Meetings. In: 9th IEEE Conference on Automatic Face and Gesture Recognition, FG 2011 (2011)
11. Kendon, A.: Some Functions of Gaze-direction in Social Interaction. Acta Psychologica 26, 22–63 (1967)
12. Goodwin, C.: Conversational Organization: Interaction Between Speakers and Hearers. Academic Press, London (1981)
13. Maynard, S.K.: Interactional Functions of a Nonverbal Sign: Head Movement in Japanese Dyadic Casual Conversation. J. Pragmatics 11, 589–606 (1987)
14. Mateo Lozano, O., Otsuka, K.: Real-time Visual Tracker by Stream Processing. Journal of Signal Processing Systems, 285–295 (2008)
15. Mikami, D., Otsuka, K., Yamato, J.: Memory-based Particle Filter for Face Pose Tracking Robust under Complex Dynamics. In: IEEE CVPR 2009, pp. 999–1006 (2009)
A Virtual Audience System for Enhancing Embodied Interaction Based on Conversational Activity

Yoshihiro Sejima1, Yutaka Ishii2, and Tomio Watanabe3

1 Graduate School of Science and Engineering, Yamaguchi University, 2-16-1 Tokiwadai, Ube, Yamaguchi, Japan
2 Information Science and Technology Center, Kobe University, 1-1 Rokkodai, Nada, Kobe, Japan
3 Faculty of Computer Science and System Engineering, Okayama Prefectural University, 111 Kuboki, Soja, Okayama, Japan
[email protected],
[email protected],
[email protected]
Abstract. In this paper, we propose a model for estimating conversational activity based on the analysis of enhanced embodied interaction, and develop a virtual audience system. The proposed model is applied to a speech-driven embodied entrainment wall picture, which is a part of the virtual audience system, for promoting enhanced embodied interaction. This system generates activated movements based on the estimated value of conversational activity in enhanced interaction and provides a communication environment wherein embodied interaction is promoted by the virtual audience. The effectiveness of the system was demonstrated by means of sensory evaluations and behavioral analysis of 20 pairs of subjects involved in avatar-mediated communication. Keywords: Human Interaction, Nonverbal Communication, Virtual Communication, Enhanced Interaction, Virtual Audience.
1 Introduction

Advances in information technology have enabled humans to communicate through computer-generated (CG) characters called avatars in virtual worlds such as Second Life and online games [1]. Many studies have been performed on the development of communication systems using avatars [2]-[4] and on the evaluation of embodied conversational agents in such systems [5],[6]. However, it is difficult to enhance the embodied interaction in these systems and to create an environment where the talkers can share a sense of unity, because characteristics that enhance such embodied interaction have not been introduced thus far. Therefore, the development of an embodied interaction support system that enhances embodied interaction and increases talker involvement is essential. In a previous study, we developed a speech-driven embodied entrainment system called InterWall, wherein interactive CG sunflowers act as listeners [7]. This system
can support human interaction and communication by generating embodied entrained movements, such as nodding and body movements, based on the speech input of a talker. We confirmed the importance of providing a communication environment in which not only the avatars but also the CG objects placed around them are involved in the virtual communication. In this study, in order to promote enhanced embodied interaction, we analyzed the interaction enhanced through avatars by using an embodied virtual communication system. We then proposed a model for estimating conversational activity based on this analysis and developed a virtual audience system in which the proposed model is applied to InterWall. The system uses the proposed model to generate not only embodied entrained movements but also activated movements. The effectiveness of the developed system is demonstrated by means of sensory evaluations and behavioral analysis in an avatar-mediated communication system.
2 Analysis of Enhancing Embodied Interaction

2.1 Experimental Method

Using the embodied virtual communication system "EVCOS," two subjects engaged in free conversation under conditions in which embodied interaction was expected to be enhanced. Figure 1(a) shows a sample virtual communication scene between two CG characters called VirtualActors (VAs). A VA is an interactive avatar that represents the subject's interactive behavior, such as nodding and gestures, on the basis of the nonverbal information he/she provides in the virtual space [8]. The experimental process was as follows. First, the subjects used the system for a 3-min free conversation; the displayed images and the voices of both talkers were captured to record the communication scene using avatars. Next, the subjects watched the video using Windows Media Player and picked out the enhanced interaction scenes. Finally, non-participating subjects also watched the video using the same method (Figure 1(b)). The subjects were four pairs of male students.
Fig. 1. Example of communication and picking-up scene
2.2 Analysis of Enhancing Interaction

To detect the enhanced interaction periods, we defined a binary signal that was ON during the communication scenes selected by the subjects and OFF otherwise. Figure 2 shows an example of the analysis results. The period indicated by the arrowed line is that of the communication scene selected by the subject; the same period was selected even by the non-participating subjects. Speech overlap has been observed in live communication [9], [10]; therefore, the enhanced interaction was analyzed by focusing on speech overlap. Speech data with a maximum amplitude of ±32767 was normalized to 1 and converted, every 30 ms at a sampling rate of 11 kHz, into a binary burst-pause signal x(i), generated with a fill-in value of 166 ms. The overlap between the speech inputs of subjects 1 and 2 was evaluated using the following expression:
overlap(i) = x1(i) \times x2(i) \qquad (1)

where x1(i) and x2(i) are the binary voice signals of talker 1 and talker 2, respectively. A sample speech-overlap analysis result is shown in Figure 2. The figure shows that both the rate of speech overlap and the degree of the histogram increase, and a positive correlation between speech overlap and enhanced interaction can be seen.
Fig. 2. Example of analysis of enhanced interaction
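A minimal sketch of the binarization and of expression (1) is given below (Python). The paper does not state the amplitude threshold, and the exact fill-in procedure is not specified; the interpretation of "fill-in" as bridging pauses shorter than 166 ms, like the threshold value, is an assumption.

    import numpy as np

    FRAME_MS, FILL_MS = 30, 166                  # values from the paper
    FILL_FRAMES = round(FILL_MS / FRAME_MS)      # about 6 frames

    def burst_pause(amplitudes, thresh=0.05):    # threshold value is assumed
        """Binarize normalized per-frame amplitudes into burst-pause,
        bridging silent gaps shorter than the fill-in duration."""
        x = (np.abs(amplitudes) > thresh).astype(int)
        on = np.flatnonzero(x)
        for a, b in zip(on[:-1], on[1:]):
            if 1 < b - a <= FILL_FRAMES:
                x[a:b] = 1                       # fill in the short pause
        return x

    def overlap(x1, x2):
        return x1 * x2                           # expression (1)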
3 Estimated Model of Conversational Activity

Based on the analysis of the enhanced interaction, we propose a model for estimating the conversational activity from the degree of speech overlap. In this model, the present speech overlap is first calculated from expression (1). Next, the conversational activity is estimated by feeding the calculated speech overlap into the weighted moving average of expression (2). The assignment of a larger weight to the present speech overlap causes the estimated value to increase rapidly. Figure 3 shows an example of the time changes in the estimated value and the histogram. When the
degree of the histogram increases, the estimated value also increases. Thus, the conversational activity can be estimated from the speech overlap.

u(i) = \frac{\sum_{j=1}^{K} (K+1-j)\, overlap(i-j)}{\sum_{j=1}^{K} j} \qquad (2)

where u(i) is the conversational activity and K is a constant (= 150 frames).
The enhanced interaction scenes were estimated on the basis of a threshold applied to the value estimated using the conversational activity model; the threshold was set in a preliminary experiment. When the estimated value exceeded the threshold, the period was estimated to be an enhanced interaction scene.
Fig. 3. Example of time changes in estimated value and histogram
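Expression (2) translates directly into code. The sketch below is a straightforward (unoptimized) Python transcription; only the threshold value, which the paper sets in an unpublished preliminary experiment, is a placeholder.

    import numpy as np

    K = 150  # frames, as in the paper

    def conversational_activity(ov):
        """Expression (2): weighted moving average of the overlap signal,
        with the most recent lag (j = 1) given the largest weight K."""
        denom = K * (K + 1) / 2                  # sum of j for j = 1..K
        u = np.zeros(len(ov))
        for i in range(len(ov)):
            for j in range(1, K + 1):
                if i - j >= 0:
                    u[i] += (K + 1 - j) * ov[i - j]
            u[i] /= denom
        return u

    def enhanced_scenes(u, thresh=0.1):          # threshold value is assumed
        return u > thresh                        # True marks enhanced periods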
4 Virtual Audience System

4.1 Concept

The concept of the virtual audience system is shown in Figure 4. In human face-to-face communication, not only verbal messages but also nonverbal actions such as nodding and body movements are rhythmically related and mutually synchronized between the talkers [11]. This synchrony of embodied rhythms, called entrainment, results in the sharing of embodiment in human interaction. By focusing on entrainment in embodied communication, we have already analyzed the entrainment between the speech input of a speaker and the nodding and body movements of a listener in face-to-face communication, and we have developed iRT (InterRobot Technology), which generates a variety of communicative actions and movements, such as nodding and body movements, on the basis of speech input [12]. In addition, we have developed interactive CG sunflowers called InterFlower and have demonstrated that InterFlower can effectively support human interaction and communication [13].
When the avatars are involved in an embodied interaction, the virtual audience generates interactive movements such as bending forward to enhance the embodied interaction; this leads to sharing of embodiment in virtual space and realizes smooth communication.
Fig. 4. Concept of virtual audience system
4.2 System Setup

Figure 5 shows the setup of the virtual audience system. The virtual space is generated using the Microsoft DirectX 9.0 SDK on a workstation (HP Workstation xw4200) running Windows XP. The movements of the head, arms, and body of each VA are measured, in terms of position and angle, using four magnetic sensors (Polhemus FASTRAK) placed on the subject's head, wrists, and back. The voice is sampled at 16 bits and 11 kHz. The movement and voice data of each VA are transmitted over Ethernet, and the VAs are rendered at a frame rate of 30 fps. The virtual audience consists of textures of the same size as a wall in the virtual space; these textures represent six CG sunflowers and the background. The sunflowers act as listeners by producing communicative actions and movements that are coherently related to the speech input [7]. When a subject's speech is input to the system, the CG sunflowers emulate nodding movements, and their leaves emulate body movements. A nodding movement is defined as the falling-rising motion of the flower with respect to the vertical at a speed of 0.03 rad/frame. A body movement is defined as the upward-downward motion of a leaf at a speed of 0.0125 rad/frame. In addition, when the value estimated using the proposed model exceeds the threshold, the sunflowers exhibit enhanced movements such as bending forward. The bending-forward motion is defined as the falling-rising movement of the entire flower body with respect to the vertical at a speed of 0.012 rad/frame, and the sunflower leaves emulate enhanced "wigwag" movements at a speed of 0.04 rad/frame. The moving-average (MA) model was used as the nodding response model [12]. The MA model estimates the nod timing y(i) as the weighted sum of the binary speech signal x(i) in each 1/30-s frame, as shown in expression (3):
y(i) = \sum_{j=1}^{J} a(j)\, x(i-j) + w(i) \qquad (3)

where a(j) are the linear prediction coefficients and w(i) is a noise term.
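As a sketch, expression (3) amounts to the following convolution-plus-noise; the prediction coefficients a(j) are learned from face-to-face data in [12] and are only placeholders here.

    import numpy as np

    def nod_timing(x, a, noise_std=0.0, seed=0):
        """Expression (3): y(i) = sum_{j=1..J} a(j) x(i-j) + w(i),
        evaluated per 1/30-s frame on the binary speech signal x."""
        rng = np.random.default_rng(seed)
        J, y = len(a), np.zeros(len(x))
        for i in range(len(x)):
            for j in range(1, J + 1):
                if i - j >= 0:
                    y[i] += a[j - 1] * x[i - j]
            y[i] += rng.normal(0.0, noise_std)   # the noise term w(i)
        return y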
Fig. 5. Setup of virtual audience system
5 Communication Experiment

5.1 Experimental Method

The setup of the experiment, in which two remote subjects converse from separate rooms, is shown in Figure 6. The experiment was performed under two conditions: free conversation and consensus building. In the free-conversation experiment, two modes were compared: one in which the virtual audience generated entrained movements such as nodding and body movements (entrained mode), and one in which it generated both entrained and enhanced movements (combinational mode); the entrained mode served as the baseline because its effectiveness had already been confirmed. In the consensus-building experiment, the combinational mode was compared with a static mode in which the virtual audience does not move. The experimental process was as follows. First, the subjects used the system to communicate with each other. Next, they used the system for a 3-min free conversation in each mode. Then, in each mode, they discussed specific topics without any time restriction. The topics were selected from the website "Goo Ranking," where ranking surveys are conducted on various topics, as listed in Table 1 [14]. In the consensus-building experiment, the subjects predicted rankings 1-3 among the four highest-ranked items and continued to converse until a consensus was reached. Finally, they were instructed to perform a paired comparison of the modes, selecting the better mode based on their preferences.
The survey used a seven-point bipolar rating scale ranging from -3 (not at all) to 3 (extremely), with 0 denoting a neutral rating; it was completed after each mode. Each pair of subjects was presented with the two modes in a random order, and each pair was of the same sex (20 males and 20 females).
Fig. 6. Example scene of the communication experiment

Table 1. Ranking of conversational topics discussed in the consensus-building experiment
5.2 Results of Sensory Evaluation
The questionnaire results are shown in Figure 7. Figure 7(1) shows the results of the sensory evaluation in the free-conversation experiment. According to the Wilcoxon signed-rank test, "enjoyment" and "excitement" were significant at the 1% level, and "preference" and "unification" at the 5% level. Figure 7(2) shows the results of the sensory evaluation in the consensus-building experiment, where all items were significant at the 1% level. These results show the effectiveness of the system. The results of the paired comparison are shown in Figure 8. In the free-conversation experiment, the combinational mode was selected by 82.5% of the subjects (33 out of 40); in the consensus-building experiment, it was selected by 92.5% (37 out of 40).
Fig. 7. Result of sensory evaluation by questionnaire
Fig. 8. Result of the paired comparison (free conversation: combinational mode 82.5% vs. entrained mode 17.5%; consensus building: combinational mode 92.5% vs. static mode 7.5%)
5.3 Results of Behavioral Analysis
The interactions among the subjects were analyzed by focusing on the conversational activity in the communication experiment. The rate of the estimated value of the proposed model was evaluated in each experiment using expression (4):
Activity = \frac{1}{N} \sum_{i=1}^{N} u(i) \qquad (4)

where u(i) is the conversational activity and N is the length of the conversational section.
The analysis results are shown in Figure 9, which gives the rate of conversational activity based on the proposed model together with the results of the statistical analysis. In the free-conversation experiment, there was a significant difference at the 1% level
between the combinational and entrained modes. In the consensus-building experiment, there was no significant difference between the two modes. These results show that embodied interaction is enhanced by the virtual audience in free conversation.

Fig. 9. Rate of conversational activity (%) in each mode (entrained, combinational, static) for the free-conversation and consensus-building experiments; ** p < 0.01
6 Conclusion

In this paper, we proposed a model for estimating conversational activity based on speech overlap in conversation, and we developed a virtual audience system that enables enhanced embodied interaction. Using the system, we performed communication experiments and carried out a sensory evaluation and speech-overlap analysis under free-conversation and consensus-building conditions. The results showed that the developed system effectively enhances embodied interaction.

Acknowledgments. This work was supported by the CREST program (Core Research for Evolutional Science and Technology) of JST (Japan Science and Technology Agency).
References

1. Linden Lab: Second Life, http://secondlife.com/
2. Miyajima, T., Fujita, K.: Control of avatar's facial expression using fundamental frequency in multi-user voice chat system. In: Proc. of the 6th International Conference on Intelligent Virtual Agents, p. 462 (2006)
3. Kusumi, T., Ogura, K., Miura, A.: The Development of a Positive Community using Virtual Space for Cancer Patients. In: Proc. of the Second International Symposium on Universal Communication, pp. 490–493 (2008)
4. Kato, R., Yoshitomi, Y., Asada, T., Fujita, Y., Tabuse, M.: A Method for Synchronizing Nods of a CG Character and a Human Using Thermal Image Processing. In: Proc. of the 18th IEEE International Symposium on Robot and Human Interactive Communication, pp. 848–853 (2009)
5. Kipp, M., Gebhard, P.: IGaze: Studying Reactive Gaze Behavior in Semi-immersive Human-Avatar Interactions. In: Proc. of the 8th International Conference on Intelligent Virtual Agents, pp. 191–199 (2008)
6. Baylor, A.L., Kim, S.: The Effects of Agent Nonverbal Communication on Procedural and Attitudinal Learning Outcomes. In: Proc. of the 8th International Conference on Intelligent Virtual Agents, pp. 208–214 (2008)
7. Sejima, Y., Watanabe, T.: A Speech-Driven Embodied Entrainment Wall Picture System for Supporting Virtual Communication. In: Proc. of the 3rd International Universal Communication Symposium, pp. 309–314 (2009)
8. Watanabe, T., Ogikubo, M., Ishii, Y.: Visualization of respiration in the embodied virtual communication system and its evaluation. International Journal of Human-Computer Interaction 17, 89–102 (2004)
9. Ito, H., Shigeno, M., Nishimoto, T., Araki, M., Niimi, Y.: The Analysis of the Atmosphere in the Dialogs. Technical report of IPSJ, SLP, vol. 2002(10), pp. 103–108 (2002)
10. Nishimura, R., Kitaoka, N., Nakagawa, S.: Response Timing and Prosody Change Modeling in Conversations and Their Application to a Spoken Dialog System. In: Proc. of the 48th SIG-SLUD of the Japanese Society for Artificial Intelligence, pp. 37–42 (2006)
11. Condon, W.S., Sander, L.W.: Neonate movement is synchronized with adult speech. Science 183, 99–101 (1974)
12. Watanabe, T., Okubo, M., Nakashige, M., Danbara, R.: InterActor: Speech-driven embodied interactive actor. International Journal of Human-Computer Interaction 17, 43–60 (2004)
13. Yoshida, M., Watanabe, T., Yamamoto, M.: Development of a Speech-Driven Embodied Entrainment System with 3DCG Objects. Transactions of Human Interface Society 9(3), 87–96 (2007)
14. goo ranking (November 3, 2008), http://ranking.goo.ne.jp/
VizKid: A Behavior Capture and Visualization System of Adult-Child Interaction

Grace Shin, Taeil Choi, Agata Rozga, and Mario Romero

College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
{gshin37,tchoi6,agata,mario}@gatech.edu
Abstract. We present VizKid, a capture and visualization system for supporting the analysis of social interactions between two individuals. The development of this system is motivated by the need for objective measures of social approach and avoidance behaviors of children with autism. VizKid visualizes the position and orientation of an adult and a child as they interact with one another over an extended period of time. We report on the design of VizKid and its rationale. Keywords: Spatiotemporal visualization, mutual orientation, instantaneous distance, behavior analytics.
1 Introduction

The development of VizKid was motivated by the increased prevalence of Autism Spectrum Disorders (ASDs) in the United States and the concomitant need for objective measures of social behavior to help diagnose the condition and to track children's development [1]. In particular, measures of the extent to which children with autism seek or avoid social interactions figure prominently in evaluating treatment outcomes for this population [2]. Current methods for measuring such behaviors typically involve parent or teacher-report questionnaires [3, 4], or time-sampled direct observations of specific behaviors [5-7]. The behaviors of particular interest are typically the number of approaches the child makes to the interactive partner, the child's responsiveness to the partner's social bids, and the amount of time the child spends in proximity to the partner versus alone in solitary play [5-7]. Whereas parent and teacher reports of such behavior are inherently subjective and may be unduly influenced by external factors, direct observations are often too labor and time intensive to scale up. The current project takes a first step toward developing new objective measures for capturing and visualizing the extent to which a child seeks or avoids social interaction. We take as our starting point interactions between two individuals. After consulting with behavior analysts and therapists at a local autism treatment center, we built a video visualization system that supports the analysis of social approach and social avoidance through interactive graphs of mutual distance and orientation between the two individuals.
2 Related Work

VizKid belongs to the large family of visualization systems that extract meaningful features from image sequences with the aim of highlighting evidence of target events without requiring linear browsing of the video. Here, we provide some of the most relevant examples from the literature. Daniel and Chen present one of the first abstract visualizations of behavior in video [10]. They visualize motion in a translucent space-time cube by mapping greater motion to greater opaqueness, thus enabling an operator to see through inactive regions and focus on the space-time volumes where the action occurred. Ivanov et al. present a visualization of the history of living spaces [12]. The authors provide 2D visualizations of space augmented with motion detection and video data. Through motion detection and path reconstruction, they visualize the historical flow of people through a building and provide contextual detail, such as people, objects, and actions, through strategic camera views. Romero et al.'s Viz-A-Vis visualizes activity as a stack of 2D aggregate motion heat maps over the space under observation, similar to a geographic information system [14]. The translucent heat maps have a near one-to-one correspondence with the architectural space, which naturally supports space-centric queries. Viz-A-Vis also visualizes aggregate activity in places and periods of interest in the cells of an activity table. Large patterns of space usage are visible and open to interpretation and analysis, coupled with sequences from the original video. Kubat et al.'s TotalRecall visualizes long-term video from real home environments in a 2D representation [13]. TotalRecall slides frames like cards spread out from a deck; the visual effect is that each 2D location in the visualization combines multiple spatiotemporal coordinates, providing an overview structure. Crnovrsanin et al. present a proximity-based visualization plotting traces as distance to a point of interest versus time [9]; it is particularly relevant to our re-mapping of coordinate systems to highlight relevant events. DeCamp et al. reconstruct the 3D geometry of the space under observation and project the historical paths of its occupants into 3D coordinates [11]. Botchen et al. present a 2D time-lapse video visualization with highlighted abstractions of target objects and activities [8].
3 System Implementation

The goal of VizKid is to facilitate the observation and analysis of the flow of interaction between two individuals. Specifically, the system's success will depend on the extent to which it helps behavior analysts understand reciprocal interactions between the child under observation and the person interacting with the child. We implemented the backend of VizKid in Matlab and the frontend in Processing, a Java-based open-source programming language geared toward interactive visualizations. The next sections describe the three phases of the system: data collection, data annotation and aggregation, and data visualization.
Fig. 1. (a) The inside view of the assessment room. (b) The observation room. (c) A schematic of the camera deployment in the assessment room.
3.1 Data Collection

We collected the data for designing VizKid at Georgia Tech's Child Study Laboratory (CSL), an experimental environment designed to mirror a typical playroom while facilitating the collection of high-quality video and audio data for behavioral experiments. CSL consists of two components. The first is an assessment room measuring 14 by 21 feet where data collection takes place. The assessment room is equipped with child-friendly furniture and toys (see Figure 1a). The second component of CSL is an observation and control room from which we can monitor the activity in the assessment room and manage the data collection. A human operator controls the cameras to optimize the data collection based on position, orientation, and observed behaviors (see Figure 1b). The assessment room is equipped with 11 cameras, eight around the perimeter of the room and three overhead cameras that fully cover the floor plan (see Figure 1c). For developing VizKid, we collected video from the overhead camera positioned directly in the middle of the ceiling. The overhead cameras are Axis 209 MFD recording motion JPEG at a resolution of 640 by 480 pixels (VGA) and at 30 frames per second. We replaced the standard lens with a shorter 2.8 mm lens with aperture F2.6 and an angle of view of 106°.
One adult and one child participated in a one-hour recording session at CSL. We provided the participants with a set of play materials (a painting set, a train set, and blocks) and told them to play and engage as they wished. We classified a large number of captured activities, including table-top interaction, floor play, and larger movements around the room. To manually pinpoint location and orientation, we selected a representative 15-minute segment of video and manually coded 450 frames at a frequency of one frame every two seconds.

3.2 Data Annotation and Aggregation

We built a simple Matlab application for clicking on the center of the shoulders and on a vector heading denoting the orientation of each individual. This resulted in four clicks per frame, or 1800 clicks for the 450-frame sequence. This Wizard of Oz solution replaces a computer vision system that would track blob location and orientation. In the future, we will automate this extraction process by placing colored shoulder pads or similar fiducial markers on the individuals' shoulders and by using robust computer vision techniques to accurately compute location and orientation. Figure 2 shows the world coordinate system of the two individuals, the adult and the child. The distance d between the adult and the child is measured, in pixels, from the center of the adult's shoulders to the center of the child's shoulders. The orientation values θ1 and θ2 are obtained by calculating the angles between the line connecting the two individuals and their individual orientations, as defined above. Note that we are not marking the orientation of the head, which would require a fiducial marker on it; in our figure, the orientation of the head is denoted by the small black triangle. Rather, we are marking the orientation of the vector perpendicular to the line connecting the shoulders, where we will place the markers. We considered it more robust and less invasive to compute the orientation of the chest as an approximation of social orientation. In future work, we may place fiducial markers on the head as well, especially if our preliminary experiments determine the necessity for them.
Fig. 2. The world coordinate system of the individuals
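A hedged sketch of this geometry follows, assuming each annotation supplies a shoulder-center point and a clicked heading point per person; the function and variable names are ours, not from the VizKid code.

    import math

    def heading_angle(center_pt, heading_pt):
        """Angle (radians, image coordinates) of the clicked heading vector."""
        return math.atan2(heading_pt[1] - center_pt[1],
                          heading_pt[0] - center_pt[0])

    def pair_geometry(c_adult, h_adult, c_child, h_child):
        """Returns (d, theta1, theta2): pixel distance and each person's
        chest orientation relative to the line connecting the two."""
        dx, dy = c_child[0] - c_adult[0], c_child[1] - c_adult[1]
        d = math.hypot(dx, dy)
        line = math.atan2(dy, dx)                # adult-to-child direction
        def rel(heading, base):
            a = (heading - base) % (2 * math.pi)
            return min(a, 2 * math.pi - a)       # unsigned angle in [0, pi]
        theta1 = rel(h_adult, line)              # adult vs. line to child
        theta2 = rel(h_child, line + math.pi)    # child vs. line back to adult
        return d, theta1, theta2

Here h_adult and h_child are the heading angles returned by heading_angle for the two clicked vectors.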
From the subjects' locations, we compute the Euclidean distance between them in image space using the Pythagorean theorem, together with the angle of the line connecting the two points. We do not calibrate the cameras or reconstruct physical world coordinates; thus, distance is measured not in meters or feet, but in pixels. Because of wide-angle perspective projection from a 3D world to a 2D image space, and because of wide-angle lens optical distortions, the mapping between pixel distances and physical
distance in a one-camera system is a computationally under-constrained problem. Furthermore, a heuristic approximation of physical distance is complex and requires some understanding of the scene, such as people's heights. Again, this metric simply approximates the common idea of social distance. Part of the purpose of the current work is to determine the level of accuracy necessary to provide useful support for behavior analysis. If we determine that pixel distance is not enough, we will reconstruct physical distance with more complex vision algorithms. Because we wish to visualize distance and orientation on the same graph, we normalize the two measures to the same unit-less scale. To normalize distance, we linearly map the diagonal of the image (an approximation of the room's diagonal) to 1.0 and the distance between two adjacent pixels to 0.0. Thus, the furthest two people can be apart is 1.0 and the closest is 0.0. Again, this measure is a simple approximation that does not account for the complexities of wide-angle perspective and optical distortion. From the subjects' individual orientations, we define and compute a normalized measure of mutual orientation. We define mutual orientation to range between 0 and 1, where 0 means facing each other and 1 means facing away from each other; everything in between is a linear mapping between the two extremes. Note that this definition is a many-to-one mapping. For example, two people facing north will add to 0.5, two facing south will add to 0.5, and one facing north and one facing south will also add to 0.5. Again, our goal is to determine whether a simple and approximate metric of social orientation suffices for effective behavior analysis. Figure 3 provides some examples of our simplified definition of mutual orientation.
Fig. 3. Our normalized definition of mutual orientation
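The two normalizations then reduce to a few lines. This is a sketch of the definitions in the text (using the 640 x 480 overhead-camera resolution), not the actual VizKid code.

    import math

    W, H = 640, 480
    DIAG = math.hypot(W, H)                      # image diagonal in pixels

    def norm_distance(d_pixels):
        """Linearly maps the image diagonal to 1.0 and (approximately)
        two adjacent pixels to 0.0."""
        return min(d_pixels / DIAG, 1.0)

    def mutual_orientation(theta1, theta2):
        """0 = facing each other (theta1 = theta2 = 0), 1 = facing away
        (theta1 = theta2 = pi); linear and many-to-one, as in Figure 3."""
        return (theta1 + theta2) / (2 * math.pi)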
The distance and mutual orientation data obtained via the process detailed above result in two time series. To gain a historical overview, we aggregate the data; to visualize the aggregate, we map distance and mutual orientation to polar coordinates (see Figure 4). We place the adult at the center of the polar coordinate system and fix the adult's orientation to always point north. It is important to note that we define an adult-centric coordinate system because we are interested in the child's behavior, the dependent variable that we cannot control; if we placed the child at the center of the reference system, the visualization would become unstable and hard to read. Also, it is common for behavioral interventions to control the behavior of the therapists, who in our case correspond to the adult in the room. By filtering on controlled and discrete behaviors, we expect to be able to compare the differing results in the child's behavior.
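A minimal sketch of this aggregation follows (the binning itself is detailed after Fig. 4); the bin resolution, and the assumption that the angular position around the adult is kept as a signed angle, are ours.

    import math
    import numpy as np

    R_BINS, A_BINS = 10, 24                      # assumed bin resolution

    def polar_heatmap(samples):
        """samples: (norm_distance, theta) pairs, theta in radians measured
        around the adult with 0 = the direction the adult faces (north)."""
        hist = np.zeros((R_BINS, A_BINS))
        for d, theta in samples:
            r = min(int(d * R_BINS), R_BINS - 1)
            a = int((theta % (2 * math.pi)) / (2 * math.pi) * A_BINS)
            hist[r, a % A_BINS] += 1             # one count per coded frame
        return hist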
Fig. 4. The polar coordinate systems for the adult-centric graph. The adult is in the center always pointing north. This aggregation does not account for the orientation of the child.
In the adult-centric polar coordinate system, we place the child at radial distance d from the center. The angular position θ of the child indicates where the child is with respect to the adult. In other words, we simply map θ1 to θ, keeping 0° pointing north (90° in polar coordinates). Recall that θ1 is the angle between the orientation of the adult and the line connecting the adult and the child. Next, we discretized the polar coordinate space into bins. Each time the child's location falls into a particular bin, the system increases that bin's counter by one. Thus, the bin count over a specific period reflects the frequency with which the child was in that particular location. Note that the adult-centric polar coordinate system does not account for the orientation of the child; in our current implementation, we ignore that information. Through a user study, we plan to determine whether that information is necessary. If it is, we plan to compute a vector sum of all the orientations at a particular location and, at an interactive request from the user, visualize a vector field of the sums in the adult-centric polar coordinate system.

3.3 Data Visualization

We developed VizKid in Processing, a high-end interactive graphics Java library. VizKid is an information visualization system that supports the analysis of social orienting (distance and mutual orientation) between two people interacting in the observation space. Figure 5 shows the three components of VizKid: the video panel in the upper left corner, the timeline panel on the bottom, and the aggregate panel in the upper right corner. The video panel (Figure 5a) shows the raw video frames and the vectors of the child's and the adult's locations and orientations. This panel allows the user to view the actual footage corresponding to the distance and mutual orientation data at a specific point in time. It provides a reification tool for understanding the concrete details abstracted by our proxy visualizations of distance and orientation; users can see specific objects, places, gestures, and actions. The timeline panel (see Figure 5c) contains playback control buttons that allow the user to play, pause, rewind, and fast-forward the video while brushing both the timeline view and the aggregate view at the
Fig. 5. A screen shot of the VizKid user interface: (a) The raw video panel; (b) the aggregate panel; and (c) the timeline panel
correct points in time. Users can observe the interaction flow between the child and the adult in the video and relate it to the visualizations. The aggregate panel displays the polar-coordinate information for the child's distance and relative orientation from the adult, described above in Section 3.2, using a heat map (see Figure 5b). The heat map represents the child's spatiotemporal location relative to the adult over some pre-specified period of interest to the analyst. This version of the heat map is in gray scale, with white indicating that the child rarely appeared in a particular bin position and darker shades of gray indicating increased frequency at that position. Because the graph is adult-centric, the location of the heat map clearly conveys where, with respect to the adult, the child spent their time. In other words, if the graph shows a dark region to the left of the center of the circle and close to its edge, the child spent most of the time far away from the adult and tended to stay to the adult's left. The blue dot denotes the position being brushed in the timeline (approximately frame 170 on the x-axis).
A double-sided arrow slide bar at the bottom of the timeline allows users to specify the window of time over which they wish to aggregate position and orientation data; it is a tool for dynamic queries. This aspect of the visualization goes beyond a single moment in time, allowing the user to define and observe at a glance how the child interacted with the adult over some specific period, such as a particular condition within an experiment or even the course of the entire experiment. Figure 5c shows the timeline panel, which graphs normalized position and orientation on the vertical axis against time on the horizontal axis. The yellow line shows the normalized distance, and the green area is formed by adding and subtracting the normalized mutual orientation from the normalized distance. This common information visualization technique, called ThemeRiver, is meant to make visible the patterns in a multivariate time series. Moment by moment, the instantaneous mutual orientation is both added to and subtracted from the instantaneous distance. Thus, the possible range of values goes from -1 to 2: the smallest possible value for distance is 0, and the largest possible value for mutual orientation is 1, so subtracting the latter from the former gives -1; adding the largest possible orientation, 1, to the largest possible distance, 1, gives 2. Hence the combined normalized scale is [-1, 2]. To interpret the visualization, the user needs to keep track of the center and the width of the green area: the wider the area, the less the individuals are oriented toward each other; the higher the center, the more distant the individuals. It is important to note that a single (x, y) coordinate in this graph is ambiguous, because multiple distance-orientation combinations may add up to the same value; we disambiguate the graph by including both metrics, in yellow and green.
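The timeline band can be reproduced in a few lines of matplotlib; the sketch below uses synthetic stand-in data, since the study's actual time series is not published.

    import numpy as np
    import matplotlib.pyplot as plt

    t = np.arange(450)                           # one coded frame every 2 s
    dist = 0.5 + 0.3 * np.sin(t / 40)            # synthetic normalized distance
    orient = 0.25 + 0.2 * np.cos(t / 25)         # synthetic mutual orientation

    # green band: distance +/- orientation; yellow line: distance itself
    plt.fill_between(t, dist - orient, dist + orient, color="green", alpha=0.4)
    plt.plot(t, dist, color="gold")
    plt.ylim(-1, 2)                              # combined normalized scale [-1, 2]
    plt.xlabel("frame"); plt.ylabel("normalized value")
    plt.show()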
4 Conclusions and Future Direction

We developed VizKid, a capture and visualization system aimed at facilitating more fine-grained examination of children's social approach and avoidance behaviors over the course of an extended interaction. The main contribution of VizKid is the user interface, particularly the integration of the visualization of the interactions between a child and an adult with the original video frames, and a means of aggregating and visualizing the distance and orientation data over various time scales. Our next step is to deploy the system with our collaborators at a local treatment center for children with autism and, via a series of case studies, examine how they apply it to practical analysis problems, refining the system based on their feedback. On the technical end, we will incorporate computer vision techniques to automatically extract the spatiotemporal data reflecting the relative orientations and positions of the individuals being observed. One proposal for doing so is to attach differently colored patches to the adult's and the child's shoulders and to use color detection techniques to automatically detect the position of each shoulder. By doing so, we will be able to calibrate the positions of the shoulders and, consequently, the positions and orientations of the adult and the child. Based on the psychological and behavioral literature on measuring social behavior in autism, future functionality of the system includes: 1) additional capabilities for quantifying the aggregated data; 2) specific measures of who initiates social contact; and 3) the ability
to track the child’s social approach and avoidance behavior to multiple individuals at the same time. We expect this functionality to approach the affordances necessary for VizKid to collect and analyze data in real environments, such as in a daycare or in a school setting. Acknowledgements. The work described in this paper was supported by the NSF Expeditions Award 1029679. We thank the mother and child who participated in our data collection and the behavior analysts who guided our design.
References

1. Rice, C.: Prevalence of autism spectrum disorders - Autism and Developmental Disabilities Monitoring Network. MMWR Surveillance Summary 58(10), 1–20 (2009)
2. White, S.W., Koenig, K., Scahill, L.: Group Social Skills Instruction for Adolescents With High-Functioning Autism Spectrum Disorders. Focus on Autism and Other Developmental Disabilities (September 24, 2010) (online-first publication)
3. Gresham, F.M., Elliott, S.N.: The Social Skills Rating System. American Guidance Service, Circle Pines, MN (1990)
4. Castelloe, P., Dawson, G.: Subclassification of children with autism and pervasive developmental disorder: A questionnaire based on Wing's subgrouping scheme. Journal of Autism and Pervasive Developmental Disorders 23(2), 229–241 (1993)
5. Lord, C., Magill-Evans, J.: Peer interactions of autistic children and adolescents. Development and Psychopathology 7(4), 611–626 (1995)
6. Hauck, M., Fein, D., Waterhouse, L., Feinstein, C.: Social initiations by autistic children to adults and other children. Journal of Autism and Developmental Disorders 25(6), 579–595 (1995)
7. Ingram, D.H., Mayes, S.D., Troxell, L.B., Calhoun, S.L.: Assessing children with autism, mental retardation, and typical development using the Playground Observation Checklist. Autism 11(4), 311–319 (2007)
8. Botchen, R.P., Schick, F., Ertl, T.: Action-Based Multifield Video Visualization. IEEE Transactions on Visualization and Computer Graphics 14(4), 885–899 (2008)
9. Crnovrsanin, T., Muelder, C., Correa, C., Ma, K.: Proximity-based Visualization of Movement Trace Data. In: IEEE Symposium on Visual Analytics Science and Technology, Atlantic City, New Jersey, October 12–13 (2009)
10. Daniel, G., Chen, M.: Video Visualization. In: Proceedings of the 14th IEEE Visualization Conference (VIS 2003). IEEE Computer Society, Los Alamitos (2003)
11. DeCamp, P., Shaw, G., Kubat, R., Roy, D.: An Immersive System for Browsing and Visualizing Surveillance Video. In: ACM Multimedia, MM 2010, Milan, October 25–29 (2010)
12. Ivanov, Y., et al.: Visualizing the History of Living Spaces. IEEE Transactions on Visualization and Computer Graphics 13(6), 1153–1160 (2007)
13. Kubat, R., et al.: TotalRecall: Visualization and Semi-Automatic Annotation of Very Large Audio-Visual Corpora. In: Ninth International Conference on Multimodal Interfaces (ICMI 2007) (2007)
14. Romero, M., Summet, J., Stasko, J., Abowd, G.: Viz-A-Vis: Toward Visualizing Video through Computer Vision. IEEE Transactions on Visualization and Computer Graphics 14(6), 1261–1268 (2008)
Interactive e-Hon as Parent-Child Communication Tool

Kaoru Sumi1 and Mizue Nagata2

1 Hitotsubashi University, Information and Communication Technology Center, 2-1 Naka, Kunitachi, Tokyo 186-8601, Japan
2 Jumonji University, Sugasawa, Niiza-shi, Saitama 352-8510, Japan
[email protected]
Abstract. In this paper, we describe a medium for helping children understand content, called Interactive e-Hon. It works by transforming text into an easily understandable storybook style with animation and dialogue. In this system, easy-to-understand content is created by a semantic tag generator based on natural language processing, an animation generator using an animation archive and animation tables, a dialogue generator using semantic tag information, and a story generator. Through our experiment, we show that this method of transmitting visual images together with verbal information is effective for promoting understanding. Keywords: Understanding, animation.
1 Introduction

When providing information to children in words, we must carefully choose words or concepts that the children know, or we must include explanations of the words or concepts themselves. Explanations with visual information, however, are more intuitively understandable than purely verbal explanations. Children's picture books include both verbal and visual information so that children can easily understand the content. If we could dynamically generate a picture book from words, it would improve children's understanding of content. We think that image media such as pictures or animation can efficiently support understanding, particularly in the case of children. Simultaneously presenting verbal information with visual information should support communication and understanding, and communication via a picture book broadens children's vocabulary and helps them learn about unknown concepts. We have developed a system, Interactive e-Hon [1], for helping children understand content. Interactive e-Hon transforms text from electronic content into an easily understandable "storybook world." The Japanese word hon means "book," while ehon means "picture book." By transforming text into animation, Interactive e-Hon can generate a dynamic picture book. Attempts to transform natural language into animation began in the 1970s with SHRDLU [2], which represents a building-block world and shows animations of adding or removing blocks. In the 1980s and 1990s, more applications [3][4][5] appeared, in which users operate human agents or other animated entities derived from natural
language understanding. Recently, there has been research on the natural behavior of life-like agents in interactions with users [6][7][8]. The main theme in this line of inquiry is the question of how to make these agents as human-like as possible in terms of dialogicality, believability, and reliability. WordsEye [9] is a text-to-scene generation system that includes spatial data. In contrast, our system generates animations but not scenes. In this paper, we describe the effect of Interactive e-Hon using the system as a dynamic picture book based on existing digital text.
2 Interactive e-Hon

Interactive e-Hon helps children understand difficult content through the use of animation. Our idea is that visual data attracts a child's interest, and that the use of actual examples, like metaphors, facilitates understanding because each person learns according to her own unique mental model [10][11], formed based on her background. Interactive e-Hon is a fully automatic word translation medium that provides expression through the use of 3D animation and dialog explanation to help users understand Web content or any other electronic resources, such as news, novels, and essays. For given content, animation and a dialog explanation spoken by a voice synthesizer are synchronized. Figure 1 shows the system framework of Interactive e-Hon. The system generates documents with semantic tags (.tag files), morphological and dependency structure information (.morph files), and animation files (.ehon files), based on the .x file format of DirectX. We use Japanese for text.
[Figure 1 (diagram): content text from the Web is processed on a server by a tag generator, animation generator, dialogue generator, and story generator, supported by an ontology, a world view database, a Japanese thesaurus, an animation database, a SOAR AI engine, a Japanese morphological/dependency structure analyzer, a Japanese lexicon, and a voice synthesizer; the resulting animation and dialogue are presented on a PC.]

Fig. 1. System framework of Interactive e-Hon
3 Experiment on Parent-Child Communication

The users of Interactive e-Hon are assumed to be parent-child pairs. By observing real users, we could evaluate the effect of the system through their interactions with it. We therefore conducted experiments with real subjects to examine whether Interactive e-Hon was helpful for the users' understanding. We used pairs consisting of a teacher and a child instead of a parent and a child. The subjects were two preschool teachers and four children. Teacher A was in her fifties and had been teaching for 25 years. Teacher B was in her forties and had been teaching for 3 years. Children S and H were both boys, approximately 5 years and 6 months old. Child C was a girl, also 5 years and 6 months old. Child M was a girl, 4 years and 3 months old. In the experiment, the subjects viewed the content of "the origin of the teddy bear" via a dialogue and an animation generated from text taken from the Web. The content was presented on the screen with an explanation via the dialogue of the parent agent. Each pair of users was asked to sit in front of the display and talk freely while viewing the system. Their interactions were recorded on video, and each teacher was asked to respond to a questionnaire afterward. Figure 2 shows a screen shot of the resulting content.

3.1 Confirming Children's Understanding

According to the responses, the teachers and children had previously been unaware of "the origin of the teddy bear's name." We concluded that the content was not very easy for the children, because both teachers said that it included some difficult words and concepts. Consider the following examples of a teacher checking a child's understanding.

<Example 1>
Teacher A: Do you know about the United States of America?
Child S: I don't know. (He shakes his head.)
Teacher A: You don't know?

<Example 2>
Teacher A: Do you know about bear hunting? It means catching a bear.
Child S: (He nods.)

According to their responses, both teachers reported that visualization was an advantage of the system. Because of the visualization, the teachers could easily explain concepts even to the small children, who had limited vocabularies. Regarding the effectiveness of visualization, we observed that both teachers repeatedly pointed to the display during their interactions with the children.
<Example 3>
Teacher A: The Washington Post is a newspaper, you know. (She points to the display.)
Child S: (He nods.)
Teacher A: That story was published in a newspaper in the United States of America. (She points to the display.)

<Example 4>
Teacher B: Then, the company made and sold the stuffed bears. They sold a lot of them. (She points to the display.)

In Figure 2, the mother and child agents talk about the content, and the original text can be seen in the text box above the animation. The following is the dialogue explanation for this example:

Parent Agent: President Roosevelt went bear hunting. Then, he met a small, dying bear.
Child Agent: The President met a small bear who was likely to die.
Parent Agent: But, what do you think happens after that?
Child Agent: I can't guess. Tell me the story.
Parent Agent: The President refused to shoot and kill the bear. And, he helped it instead.
Child Agent: The President assisted the small bear.
Parent Agent: The occurrence was carried by the Washington Post as a heartwarming story, with a caricature by Clifford Berryman.
Child Agent: The episode was carried by the newspaper as a good story.
Fig. 2. Sample view from Interactive e-Hon
The next example shows the possibility that displaying text on the screen supports the understanding of children who can read.

<Example 5>
Teacher A: Then, what is it called in the text?
Child H: I don't know.
Teacher A: Here it is. (She points to the display.)
Child H: Teddy.
Teacher A: Yes. Oh, yes.

3.2 Attracting Children's Attention

Teacher A reported that the animation attracted the children's attention, indicating another advantage of the system. Child H was very interested in the animation on the display from the beginning to the middle of the session.

<Example 6>
Child H: What's this? What?
Teacher A: (She nods.) (The story starts.)
Child H: Ooooh! Ooooh!
Explanation using animated representation can thus facilitate children's understanding and make explanation easier. It also attracts a child's attention, as shown by the above example.

3.3 Combining Content with Existing Knowledge

Teacher A pointed out that the content allows children to combine their experience and existing knowledge with their imaginations. The next examples illustrate this type of interaction.

<Example 7>
Child S: In my house……
Teacher A: Yes?
Child S: I have a stuffed bear… A big one…. I have it during sleeping time….
Teacher A: Oh. That's nice.
Child S: Such a big one.
Teacher A: You have a bear in your house.
Child S: Yes.

<Example 8>
Teacher B: Do you have a stuffed bear in your house?
Child M: (She nods.) Yes. A blue ribbon one.
Teacher B: It has a blue ribbon? Like this? (She points to the display.)
Child M: (She nods.) I always take care of it.
Teacher B: You always take care of it.
Child M: (She nods.)
3.4 Promoting Children's Questions

We also observed the children asking the teachers for explanations.

<Example 9>
(Voice: The company exhibited the stuffed bear at an expo.)
Child M: What does "exhibit" mean?
Teacher A: It means "bring and show." He is bringing it, see? (She points to the display.)

This example illustrates the possibility of a child working with an adult and actively acquiring knowledge through their interaction.

3.5 Accelerating Children's Understanding

The next example demonstrates how interaction with the teacher accelerated a child's understanding.

<Example 10>
(Voice: Then, 3000 bears were ordered and there was a teddy bear boom in America. So the name "teddy bear" became established.)
Teacher A: Ah… Americans thought the teddy bear was cute, and it attracted their attention. Then, everyone said, "I want to buy a bear." So the company made a lot of them, like this. (She points to the display while explaining.) 3000 bears were ordered, you know? That's so many, isn't it?
Child S: Yes it is.
Teacher A: You understand? 3000 bears is a lot.
Child S: Yes.
Teacher A: Then, so many bears were ordered. All these people in America said, "I want to buy a teddy bear," and they bought them. (She points to the display.)
Child S: Now, you know what?
Teacher A: Yes?
Child S: Ah, the teddy bear was everybody's favorite?
Teacher A: Yes. It was everybody's favorite.

From the expressions "3000 bears were ordered" and "there was a teddy bear boom in America," it would be difficult for a child to infer the consequence expressed as "the teddy bear was everybody's favorite." Additionally, the concepts of "3000 bears," "order," and "boom" are not easy, and the inference that ordering 3000 bears indicated a boom is also difficult. As a result of interaction via the Interactive e-Hon system, however, Child S understood and paraphrased the idea in his own word, "favorite." Consequently, through this experiment, we demonstrated the possibility of actively supporting children's understanding by having them use our system to interact with an
adult. The experiment also showed the advantages of visualization and of explanation by showing a related concept.
4 Discussion

Through our experiment using pairs consisting of a teacher and a child, instead of a parent and a child, we observed some effects of parent-child communication via Interactive e-Hon. We observed many instances of a teacher pointing to the display to explain concepts. Because pointing is effective for explanation, the timing of intervals in the content presentation is important. Interactive e-Hon presents content for each paragraph of the original text, with an interval between paragraphs: when the content for one paragraph ends, users click the "next" button to show the next paragraph. Further study is needed to verify an interval length that does not interrupt the users' communication.

We think that visual information plays an important role in the intuitive understanding of content; a visualization is often more readily understood than a lengthy verbal explanation. In the experiment, we observed a child paraphrasing unknown words into his own words through inferences such as this: when many bears were ordered, many people said they wanted to buy a teddy bear (teacher); if many people wanted to buy a teddy bear, then the teddy bear was everybody's favorite (child). This kind of interaction leads to children's understanding and paraphrasing. We also observed that the visualization in Interactive e-Hon attracted the children's attention and encouraged their questions. By enabling the children to combine the content with their existing knowledge, Interactive e-Hon facilitated communication.

Interactive e-Hon is not a passive medium like television but a medium that mediates users' communication. In the experiment, parent-child interaction via Interactive e-Hon was facilitated much like interaction around a picture book. This communication style leads to correct, improved understanding through the users' discussion, and it encourages further interaction between users. Because the text contained some words that were difficult for the children, we could have edited the original text to make it easier; if a parent changed the words or edited the content, the content would improve. If the interface were enriched in this manner, the system could also serve as a tool for parents or teachers to make educational material and create content for children.
5 Conclusion

We have discussed the effects of using Interactive e-Hon, a system that facilitates children's understanding of electronic content by transforming it into animation and dialogue. The system supports understanding of the external world through combined visual and verbal media. Through our experiment, we have shown that this method of transmitting visual images together with verbal information is effective for promoting understanding.
References

1. Sumi, K., Tanaka, K.: Automatic conversion from E-content into virtual storytelling. In: Subsol, G. (ed.) ICVS-VirtStory 2005. LNCS, vol. 3805, pp. 260–269. Springer, Heidelberg (2005)
2. Winograd, T.: Understanding Natural Language. Academic Press, London (1972)
3. Vere, S., Bickmore, T.: A basic agent. Computational Intelligence 6, 41–60 (1990)
4. Bolt, R.A.: "Put-that-there": Voice and gesture at the graphics interface. In: Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 1980). ACM Press, New York (1980)
5. Badler, N., Phillips, C., Webber, B.: Simulating Humans: Computer Graphics, Animation and Control. Oxford University Press, Oxford (1993)
6. Cassell, J., Vilhjalmsson, H.H., Bickmore, T.: BEAT: the Behavior Expression Animation Toolkit. In: Prendinger, H., Ishizuka, M. (eds.) Life-Like Characters, pp. 163–187. Springer, Heidelberg (2004)
7. Tanaka, H., et al.: Animated Agents Capable of Understanding Natural Language and Performing Actions. In: Prendinger, H., Ishizuka, M. (eds.) Life-Like Characters, pp. 163–187. Springer, Heidelberg (2004)
8. Marsella, S., Gratch, J., Rickel, J.: Expressive Behaviors for Virtual Worlds. In: Prendinger, H., Ishizuka, M. (eds.) Life-Like Characters, pp. 163–187. Springer, Heidelberg (2004)
9. Coyne, B., Sproat, R.: WordsEye: An Automatic Text-to-Scene Conversion System. In: SIGGRAPH 2001, Proceedings of the 28th Annual Conference on Computer Graphics. ACM, Los Angeles (2001)
10. Johnson-Laird, P.N.: Mental Models. Cambridge University Press/Harvard University Press, Cambridge/Mass. (1983)
11. Norman, D.A.: The Psychology of Everyday Things. Basic Books, New York (1988)
12. Sumi, K.: Anime Blog for Collecting Animation Data. In: Virtual Storytelling. LNCS. Springer, Heidelberg (2007)
13. Nishida, T., Kinoshita, T., Kitamura, Y., Mase, K.: Agent Technology. Ohmsha (2002) (in Japanese)
14. Liu, H., Singh, P.: Commonsense reasoning in and over natural language. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds.) KES 2004. LNCS (LNAI), vol. 3215, pp. 293–306. Springer, Heidelberg (2004)
SAM: A Spatial Interactive Platform for Studying Family Communication Problem Guo-Jhen Yu, Teng-Wen Chang, and Ying-Chong Wang Graduate School of Computational Design, National Yunlin University of Science and Technology {g9734703,tengwen,g9934714}@yuntech.edu.tw
Abstract. Communication within the nuclear family is a complex but pressing problem, owing to the small number of family members and the diverse daily schedules of modern society. Because family members live together every day, they must be considerate of and coordinate with one another to avoid becoming estranged through indifference. With ambient-environment and sensible-space technology in place, the aim of this study is to explore a possible calm interface; with the nuclear-family context in mind, the problem is how to build such an interface and what it implies for family communication. To build an interface that uses plants as media, this study exploits plant phototropism. Following this concept, we built a wall-formed plant interface connecting two sensor-equipped family spaces. Starting from a study of family communication, this research implements an ambient environment (Spatial Ambient environment, SAM) utilizing sensible-space technology and a calm interface.

Keywords: Interactive Behavior, Spatial Interface.
1 Introduction

Communication within the nuclear family is a complex but pressing problem, owing to the small number of family members and the diverse daily schedules of modern society. Because family members live together every day, they must be considerate of and coordinate with one another to avoid becoming estranged through indifference. What distinguishes the family environment from other environments is that members are expected to consider and make sense of one another. However, misunderstandings and unpleasant events still happen in unexpected situations. For example, one member may seek interaction at a moment when another is busy or resting, or a member may wish to be cared for without anyone noticing. Such events gradually hurt family feelings and can eventually break relationships. A main cause of these events is neglecting others' feelings and lacking good communication.

On the other hand, with today's technology, a great deal of contextual information surrounds people, both virtually and physically. In addition to the
contextual information, researchers have conducted studies on human gestures and activities to expose the possible intentions behind human behaviors that are indirect and ambient. Weiser introduced the term "calm" in 1997, arguing that information delivery should be tranquil and demand little attention [1]. Consequently, the purpose of this study is to find a method that reflects the family situation and gives members moderate (calm) information feedback, encouraging them to contemplate their relationships through more interaction.

1.1 Activity Classification of Family Members

When technology is involved in supporting communication among family members, the major issues are convenience, safety, and awareness [2]. Regarding communication behavior, Argyle noted that linguistic communication includes oral and written communication, while non-linguistic communication comprises facial expressions, gestures, tone, attitude, and people's location and interpersonal distance [3]. Thus, where people sit in a living room can be used to guess their relationships, although such data is quite ambiguous. Additionally, the daily routines of each member affect the others. Huang classified time into "essential time," "restraint time," and "free time" [4]. "Essential time" covers necessities such as brushing teeth, bathing, and eating. "Restraint time" is the time members spend doing their own work. "Free time" is the period for casual activities. Members contact one another in various situations under these different time classifications.

1.2 Sensible Space

On the technology side, the development of physical computing and embedded computers supports sensing technology for awareness and management of physical objects and environments. This expansion allows people to control both virtual and physical objects through body gestures or direct manipulation. We can therefore collect data from sensors set in a space and convert it into useful information through computational analysis and processing; for example, a user turning a handle can indicate that he or she wants to go out or has come back. This provides sensing ability to the surrounding spaces in a family environment.

1.3 New Media as Interface

In addition to the sensible space, the interface plays an important role in message transmission in the space. Human-computer interface research studies how the computer obtains data and how to make the messages provided on an interface easy to read, allowing people to perform tasks without interference. New types of interfaces use different media technologies to provide more natural feedback. For instance, an ambient display is a novel way to present information through light and sound in the surroundings; this approach is not limited to computer monitors but lets people manipulate properties of the environment, with the media conveying information with clear affordances [5]. New media thus provide a new type of interface. Moreover, new media such as a plant interface
that changes over time produce stronger feelings and stimuli than a purely digital interface, urging users to contemplate more deeply. An ambient environment can become aware of the condition of the people in a space, and this awareness incrementally changes the environment itself. Consequently, the environment becomes livable and grows with its inhabitants. Bringing computing technology into the family environment allows the environment to display and transmit data, which provides the basic platform for this research. Chen previously proposed a social-media approach to family communication [6]. This study takes as its object a nuclear family comprising parents and two children. Applying the concept of calm technology to the family environment calls for a spatial media interface that helps members reflect on the problem of getting along.
2 The Problem

With the ambient-environment and sensible-space technologies mentioned above in place, the aim of this study is to explore a possible calm interface. With the nuclear-family context in mind, the problem is how to build such an interface and what it implies for family communication.
3 Literature Reviews

To understand how an environment can sense and display data, and how spatial interaction can be designed, we review the literature from the following viewpoints.

3.1 Using Plants as an Ambient Display

Calm technology engages both the center and the periphery of our attention and in fact moves back and forth between the two [1]. Such a display technique lets people stay focused on the work at hand while data is shown in the surroundings, delivered passively without interfering with their attention. In calm technology, data transmission stays between the center and the periphery [7]. In this study, the living room, dining room, kitchen, and bedroom, the main spaces where people live, are set as the center, and the aisles as the periphery.

In 2004, Easterly combined a rubber tree with a physical computing device and WiFi to turn network data into information, controlling the water given to the plant so as to display network information to users; over a long period, this produced visible variation in the plant's growth [8]. PlantDisplay integrates this idea further with the notion of an ambient display, using the gradual transformation of a plant interface to present people's feelings and affection; such an interface can show the quality of time, and users can be moved and impressed by the meaning behind it when they see it [9]. A plant interface cannot present concrete information, but it can ingeniously convey abstract affect.
3.2 Space Awareness and Spatial Interaction

Chen proposed the concept of the ambient trigger in 2006, enabling the environment to recognize the state of a designer in a space; through the movement of a "trigger," the system recognizes the designer's state and filters out redundant, external data [10]. Huang mapped users' steps to changes in the 3D viewing angle of a large-scale projection and used this method to communicate with clients [11]. Beyond making people unaware of the computer's existence, the environment and its objects themselves possess computational ability, and presetting their usage can assist users' demands more precisely. Wan et al. placed a physical interface utilizing plant phototropism in an open space and compared it with other spaces to observe whether people altered their behavior because of the interface's presence. Although the effect was minimal at the beginning, it became largest after a two-week test period; the interface can slowly reinforce and change people's essential activities [12]. The Interactive Grass system created an environment that reflects the working atmosphere by designing a network-connected worktable with grass as the interface [13]. A plant interface can thus serve as an organic medium for people sharing an environment, using plants' unique temporal properties to show the context of the environment and implicitly affect people's behavior.
4 How We Approach

To build an interface that uses plants as media, this study exploits plant phototropism. The system controls the position of a light source so that disparities in the plants' growth curves produce a distinctive visual variation, as shown in Fig. 1. To apply this interface to reflect the interactions of family members, we selected the distance between members, their location, and time as the data for displaying members' favorable impressions in the model of family interaction. The property this study aims to control is the slow variation in plant growth over time caused by changes in the supplied water or light.
Fig. 1. Plants exhibit visual variation because of phototropism
We applied this concept to a practical family environment, placing the plant interface on the aisle as a wall connecting two spaces. When moving between the two spaces,
family members can see the variation caused by plant phototropism and receive some mental feedback, further prompting them to reflect on the family's interactions.
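As an illustration of the control idea in this section, the sketch below moves a single light along the plant wall toward the side linked to the room with less interaction, so that phototropism slowly bends the plants toward the space that needs attention. The data model and numbers are our own assumptions; the SAM implementation is not specified at this level of detail.

```python
# Illustrative controller for the light-position idea described above.
# Assumption: position 0 is the wall side linked to room A, `width` the
# side linked to room B; interaction counts per room are given. None of
# these names come from the SAM implementation itself.

def choose_light_position(interaction_a, interaction_b, width=100):
    total = interaction_a + interaction_b
    if total == 0:
        return width // 2              # no data yet: keep the light centered
    share_a = interaction_a / total
    # Less interaction in room A -> smaller share_a -> light nearer side A,
    # so over weeks the plants lean toward the space needing attention.
    return round(share_a * width)

# Example: room A is quiet (2 interactions/day), room B lively (10/day);
# the light sits near side A and the wall gradually bends that way.
print(choose_light_position(2, 10))    # -> 17
```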
5 SAM System

Following the above concept, we built the wall-formed plant interface and two sensor-equipped family spaces. Taking the living room as the first design prototype, we call this prototype the Spatial Ambient environment, or SAM for short. We use foliage plants as the prototype plant interface, as shown in Fig. 2-(a). Sensors gather data on specific actions of people in the living room and kitchen. After the regional system (space server) receives the data, it delivers them to the main system (SAM server), which analyzes the information and manages the position and timing of the plant light through physical computing. Over a long-term experimental period, the position of the light causes differences in the plants' growth direction. When all the plants lean toward one side of the wall, it means that the family space linked to that side needs attention, as in Fig. 2-(b).
Fig. 2. SAM interface prototype. (a) Wall-formed plant interface. (b) Magnification of part of the interface.
5.1 Framework

The Spatial Ambient environment (SAM) includes a sensor module, a database module, a computing module, and an organic visualization module. SAM is responsible for controlling and conveying information among the sensor modules and links to the database to analyze the data. The SAM interface is divided into contact light and ambient light; the system framework is shown in Fig. 3.
Fig. 3. System communications flow
We take the living room as the early-stage study environment, embedded with the designed sensor modules. Every module communicates wirelessly to transmit information. This study takes the location of people and the distance between them as the data sources. Contact light and ambient light are introduced separately below. (1) Contact light. The contact light system assumes that a smaller distance between people indicates a more intimate relationship. Therefore, the closer two people are, the more lights are turned on; and the longer people stay together, the longer the contact lights stay on, as shown in Fig. 4. (2) Ambient light. Depending on the situation in the spaces, the ambient light displays yellow, red, or green to indicate "essential time," "restraint time," and "free time," respectively. The color gives family members a direct cue to understand what kind of situation the members in that space are in.
Fig. 4. SAM interface
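The contact-light and ambient-light rules just described can be summarized in a few lines of code. The following sketch is only a reading of the text above: the color mapping follows the paper, while the distance thresholds and durations are invented for illustration.

```python
# Sketch of the contact-light and ambient-light rules described above.
# The yellow/red/green mapping to essential/restraint/free time follows
# the text; distance thresholds and durations are illustrative guesses.

AMBIENT_COLORS = {
    "essential time": "yellow",
    "restraint time": "red",
    "free time": "green",
}

def contact_lights(distance_m, minutes_together):
    # Closer pairs switch on more contact lights ...
    if distance_m < 0.5:
        n = 4
    elif distance_m < 1.0:
        n = 3
    elif distance_m < 2.0:
        n = 2
    else:
        n = 1
    # ... and the longer they stay together, the longer the lights stay on.
    on_minutes = 10 + minutes_together
    return n, on_minutes

print(contact_lights(0.8, 30))        # -> (3, 40)
print(AMBIENT_COLORS["free time"])    # -> green
```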
5.2 The Activity Analysis

This part explains the placement of the sensor modules in the living room and the analysis of members' behavior as they approach each other. The experimental space is located in SOFTLab, NYUST. Sample behaviors in the living room are shown in Fig. 5, and Fig. 6 displays the positions and serial numbers of the sensors. Reed pressure sensors are used on the couch; when a person sits on the couch, a sensor is triggered, indicating that someone is on the couch.
Fig. 5. Interactive behaviors in living room
Fig. 6. The placement of the sensors on the couch
We mainly analyze contact between two people, using the distance between members sitting on the couch and the time they spend there. A sensor detects the position of the first person, and the position of the second person then determines the distance between them. As shown in Table 1, the system receives the input data and outputs the corresponding response. When a third person sits in the space, the system receives input such as the combined situation of F0 and F1.

Table 1. Actions corresponding to the SAM sensor and visualization modules (CL = contact light)

Action: A user sits in the living room
  SAM sensor module: F0 or F1 or F2 or F3; Duration > 30/m
  SAM visualization: CL1; Duration = 8/h
  Visualization condition: plants will grow obliquely

Action: Another person sits in the living room
  SAM sensor module: F0, F3 or F1, F3 or F2, F0 or F3, F0; Duration = 20/m
  SAM visualization: CL1, CL2; Duration + 10/m

Action: One person approaches the other
  SAM sensor module: F0, F2 or F1, F0 or F3, F1; Duration = 20/m
  SAM visualization: CL1, CL2, CL3; Duration + 20/m

Action: They then sit together even more closely
  SAM sensor module: F0, F1 or F1, F2 or F2, F1 or F2, F3 or F3, F2; Duration = 20/m
  SAM visualization: CL1, CL2, CL3, CL4; Duration + 30/m
  Visualization condition: plants grow upward
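Table 1 can also be read as a small rule table in code. The sensor pairings and light sets below are transcribed from the table; the dictionary encoding and the fallback rule are our own assumptions.

```python
# One way to encode Table 1: observed seat pairs (F0-F3 on the couch)
# map to a set of contact lights (CL1-CL4) and a duration bonus. The
# pairings and light sets are transcribed from the table; the encoding
# itself is an illustrative assumption.

RULES = [
    # (sensor pairs, lights to switch on, duration change in minutes)
    ({("F0", "F3"), ("F1", "F3"), ("F2", "F0"), ("F3", "F0")},
     ["CL1", "CL2"], 10),                      # another person sits down
    ({("F0", "F2"), ("F1", "F0"), ("F3", "F1")},
     ["CL1", "CL2", "CL3"], 20),               # one approaches the other
    ({("F0", "F1"), ("F1", "F2"), ("F2", "F1"), ("F2", "F3"), ("F3", "F2")},
     ["CL1", "CL2", "CL3", "CL4"], 30),        # sitting closely together
]

def visualize(pair):
    for pairs, lights, bonus in RULES:
        if pair in pairs:
            return lights, bonus
    return ["CL1"], 0                          # single user: baseline light

print(visualize(("F0", "F1")))   # -> (['CL1', 'CL2', 'CL3', 'CL4'], 30)
```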
6 Lessons Learned

Two lessons were learned from this research: first, a plant display in interaction design may require different plants for different purposes; second, prototyping is an important process and technique for contextual interaction design problems, since more problems are discovered and resolved during the prototyping process. They are described as follows.

(1) A plant display depends mainly on the kind of plant used. Different types of plants offer different representations and can represent different information. In the case studies, grass and foliage plants were applied for the plant display. Generally speaking, using a single plant species for the whole interface simplifies the interface, but if the plant's changes are not easily noticeable, the result will not be clear enough for users to understand. Therefore, finding another plant with noticeable changes over time is desirable, which suggests a hybrid interface with multiple plants. For example, Zebrina pendula changes its color according to the sunlight (as shown in Fig. 7): when there is plenty of sunlight, its leaves turn red; conversely, with little sunlight, the leaves turn silver.
Fig. 7. The color changes of the leaves of Zebrina pendula: (a) plenty of sunlight: red; (b) lack of sunlight: silver
(2) Prototyping is an important process and technique for contextual interaction design problems. Using the living room in SOFTLab as the experimental platform, the activity analysis of family members could be studied and the corresponding sensing spaces designed. Many details would not have been spotted without actually working with the prototypes; for instance, chatting among members and the tone of conversation were discovered to be factors in friendliness among members. However, due to time constraints, only the sofa and coffee table were selected as sensing furniture in the prototype. Other systems can still be applied and developed on this experimental platform.
7 Conclusion

Starting from a study of family communication, this research implements an ambient environment (Spatial Ambient environment, SAM) utilizing sensible-space technology and a calm interface. A prototype and its framework are proposed and developed. The key calm interface is a plant interface connecting both contact light and ambient light. This interface is located in the hallway of a family context and can reflect the conditions in the two adjacent rooms. In addition, the SAM platform combines a server and sensing spaces into a framework that can be used for further studies on actual family communication. The idea of this research is simple (building an augmented ambient environment), but once the implementation involves the physical environment, the problems increase tremendously. With this platform, however, further research can be conducted.
References 1. Weiser, M., Brown, J.S.: The Coming Age of Calm Technology. Springer-Verlag New York, Inc., New York (1997) 2. Khan, V.-J., Markopoulos, P., de Ruyter, B., IJsselsteijn, W.A.: Expected Information Needs of Parents for Pervasive Awareness Systems. In: Schiele, B., Dey, A.K., Gellersen, H., de Ruyter, B., Tscheligi, M., Wichert, R., Aarts, E., Buchmann, A. (eds.) AmI 2007. LNCS, vol. 4794, pp. 332–339. Springer, Heidelberg (2007) 3. Argyle, M.: Social Interaction. Transaction Pub. (2007) 4. Huang, H.-I.: A Study Of The Generic Family Living Style And The Communication Pattern Among Family Members. Master of Design, Institute of Industrial Design, National Yunlin University of Science & Technology (2008) 5. Wisneski, C., Ishii, H., Dahley, A., Gorbet, M., Brave, S., Ullmer, B., Yarin, P.: Ambient displays: Turning architectural space into an interface between people and digital information. In: Yuan, F., Konomi, S., Burkhardt, H.-J. (eds.) CoBuild 1998. LNCS, vol. 1370, p. 22. Springer, Heidelberg (1998) 6. Chen, C.-W.: The study of Context-Oriented Family Inter-Relationship Platform. Master of Design in Computational Design, Graduate School of Computational Design, National Yunlin University of Science & Technology (2010) 7. Buxton, B.: Integrating the Periphery and Context: A New Taxonomy of Telematics. In: Graphics Interface 1995, pp. 239–246 (1995)
8. Easterly, D., Kenyon, M.: Bio-Fi: Inverse Biotelemetry Projects. In: MM 2004, New York, USA (2004) 9. Kuribayashi, S., Wakita, A.: PlantDisplay: Turning Houseplants into Ambient Display. In: ACE 2006, Hollywood, California, USA (2006) 10. Chen, T.-H.: Ambient Trigger: An Interface Framework for Evoking Ambient Reconfiguration in Personal Design Environment. Graduate Institute of Architecture College of Humanities and Social Science. National Chiao Tung University (2006) 11. Huang, I.-C., Chang, T.-W.: A Study of Using Oversized Display in Supporting Design Communication. In: 8th International DDSS Conference, Eindhoven University of Technology, pp. 289–301 (2006) 12. Wan, D.H., Kembel, J., Hurst, A., Forlizzi, J.: User Awareness and User Behavior in a Shared Space. In: CAADRIA (2006) 13. Shih, J.-H., Chang, T.-W., Hong, H.-M., Li, T.-C.: Physical representation social presence with interactive grass. In: HCI (2007)
The Effects of Visual Feedback on Social Behavior during Decision Making Meetings

Merel Brandon1,2, Simon Epskamp1, Thomas de Groot1,2, Tim Franssen1, Bart van Gennep1, and Thomas Visser1

1
University of Twente, Human Media Interaction, P.O. Box 217, 7500 AE Enschede, The Netherlands 2 T-Xchange, P.O. Box 1123, 7500 BC Enschede, The Netherlands {merelbrandon,simoneskamp,timfranssen,bartvangennep, thomasvisser}@student.utwente.nl,
[email protected]
Abstract. This paper describes the design and evaluation of a visualization that provides meeting participants with feedback on their social behavior (a Social Mirror). Our Social Mirror provides feedback on participation level, interactivity level, and level of agreement. For the evaluation, we conducted an experiment in which two groups of four participants each took part in a meeting with and in a meeting without the Social Mirror. The results showed that the participants could easily extract information from the Social Mirror without being distracted from the topic of discussion during the meeting. Our results further suggest that the Social Mirror leads to changes in the social behavior of the participants, particularly due to the agreement visualization. Moreover, most participants preferred meetings with the Social Mirror present.

Keywords: Meeting, collaboration, decision making process, social mirror, social visualization, social behavior, social feedback.
1 Introduction

Finding solutions for complex problems requires the sharing of knowledge between stakeholders in the decision-making process [1]. Meetings in which the stakeholders come together are therefore very important. Unfortunately, meetings are often ineffective and inefficient. One of the reasons lies in the social signals and social behavior between discussants that facilitate communication. People sometimes behave in an undesirable manner during meetings (e.g., aggressively dominating, not participating because of shyness, expressing negative feelings, or being indecisive) [2]. Dominant alternations of speaker turns between two meeting participants often occur during group meetings, leaving little opportunity for other participants to take the turn [3]. One of the pitfalls of meetings is social loafing: "people expend less effort when working in groups than working alone" [4]. Another risk of meetings is groupthink: the tendency of groups to strive for unanimity, leaving no space for non-consensus thinking [5].
This paper describes the design and evaluation of a Social Mirror: a near real-time visualization that shows discussants information about the social behavior of themselves and others, and allows them to recognize patterns in their social behavior over time [6]. Social signals and social behavior are expressions of a person's affective, attitudinal, or cognitive state towards social situations or interplay, expressed through a multiplicity of non-verbal behavioral cues. Social signals are short, intentional non-verbal expressions (e.g., turn taking or mirroring); social behavior lasts longer (e.g., agreement, politeness, empathy) [7]. Interestingly, previous research shows that feedback on social behavior can lead to improved group performance and overall effectiveness [8]. For example, visualizing the speaking time of participants in a group meeting has a balancing effect on their participation levels [3,6,9,10].

We developed a visualization that provides feedback on three social aspects: participation level, interactivity level, and level of agreement. These aspects are considered determinants of social loafing, of undesirable social behavior (e.g., dominance, under-participation, and indecisiveness), and of dominant speaker alternation. In addition, the agreement visualization is an indicator of groupthink. The goal of our Social Mirror is to make the social behavior of participants explicit; we expect that this will motivate people to change their social behavior during a meeting. Through a user study we evaluated the effect of our Social Mirror on the social behavior of participants, and we explored the usability, perceived usefulness, and participant satisfaction with the Social Mirror.

The outline of the paper is as follows. We start by describing related work on social visualizations. Next, we portray our Social Mirror system design and the evaluation method. Sections four, five, and six then describe the research method, the results, and the discussion of the results, respectively. Finally, we conclude with our ideas for future research.
2 Related Work

As early as 1959, Smith and Kight [8] showed that giving meeting participants feedback on their social behavior has a positive effect on productivity. Their experimental study showed that groups that received social behavior feedback between two meetings were significantly more productive during the second meeting than groups that were given no feedback. Over the last decade, several research projects have focused on applications that provide automatic visual feedback on social behavior during meetings. The findings of these projects [3,6,9,10] indicate that visualizing speaking length has a balancing effect on the level of participation: over-participators speak less, and under-participators speak more. The Meeting Mediator of Kim et al. [9] used Social Metric Badges to collect information about the participation level and the interactivity level. This information was provided to each participant on a mobile phone. The Meeting Mediator had a significant balancing effect on both the participation level and the interactivity level.
The term Social Mirror comes from a paper by Karahalios and Bergstrom [6]. They describe two social mirrors that we briefly discuss here: the Conversation Clock and the Conversational Votes. The Conversation Clock is a shared display that shows the participants the turn-taking history of the whole conversation. Participants can see who has been speaking when, for how long, and how loudly. After testing this application with a small group, they noticed that participants discovered patterns in the conversational flow; what is more, participants made inferences from these patterns, for example recognizing leader-follower roles. The Conversational Votes provide participants with a red and a green button to indicate their agreement with the speaker. The buttons were placed underneath the table to make anonymous voting possible. The speaker could see in the visualization the amount of agreement or disagreement among the listeners. This resulted in speakers lengthening their turn when they saw that others disagreed. Despite this undesired result, Karahalios and Bergstrom described some preliminary benefits of displaying agreement, namely an increased feeling of inclusion and satisfaction that all could express their opinion. To strengthen this effect, they suggested a more detailed expression of the level of disagreement and agreement. We have followed this suggestion by supporting gradual agreement visualization.
3 Social Mirror System

The goal of our Social Mirror is to motivate meeting participants to behave in a socially desirable manner during meetings (e.g., actively participating and not dominating the discussion). We believed that making participants more aware of their behavior, and making it more explicit to them and to the group, would motivate them to change their social behavior in a desirable direction. The system should not make high-level inferences (e.g., about dominance or participation level), because we believed this would lead to false inferences and to participants rejecting the system. Therefore, the system visualizes the relative speaking duration of each participant as an indicator of participation level and dominance. Speaking less than other participants does not necessarily mean that a participant is not actively participating; whether behavioral change is desirable when the system shows that a participant has rarely spoken is up to the participant. The system gives participants insight into dominant speaker alternations by visualizing how long and how often pairs of participants have spoken following each other. Furthermore, we wanted to increase the feeling of inclusion and satisfaction by visualizing the level of agreement, and, by providing participants with an interface to actively indicate their opinion about statements of others or about meeting outcomes, we wanted to decrease indecisiveness.

The Social Mirror we designed uses two methods to collect information on the social behavior of the discussants: 1) agreement through a user interface, and 2) interactivity level and participation level through automatic speaker detection.
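The excerpt does not detail how the speaker detection works, so the sketch below shows only one common, simple approach (a per-microphone energy threshold) to make the kind of signal concrete; it should not be read as the authors' method.

```python
# Hypothetical speaker-detection sketch. The paper states that speaking
# data is collected automatically but does not specify the algorithm;
# a per-microphone RMS-energy threshold is one common, simple approach.

import math

def active_speaker(frames, threshold=0.02):
    """frames: participant -> list of audio samples for one short window.
    Returns the participant with the highest RMS energy, or None if all
    microphones are below the threshold (i.e., silence)."""
    def rms(samples):
        return math.sqrt(sum(s * s for s in samples) / len(samples))
    energies = {p: rms(s) for p, s in frames.items()}
    speaker, energy = max(energies.items(), key=lambda kv: kv[1])
    return speaker if energy > threshold else None

# Speaking totals would feed the participation display, and consecutive
# (previous, current) speaker pairs the interactivity display below.
print(active_speaker({"A": [0.1, -0.1, 0.1], "B": [0.01, 0.0, 0.0]}))  # -> A
```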
Fig. 1. The setup of the Social Mirror for eight participants
Fig. 1 shows the design of the system. Participants each have their own user interface on a tablet PC. This interface allows them to draw their own avatar before the discussion starts. These avatars represent the participants' alter egos on the shared screens. We reasoned that it would be easier for participants to recognize a visual representation of themselves if they had created it themselves than if it had been created for them. The visualization on the shared table-screen shows which avatar belongs to whom; the avatars are positioned in front of the participants. The interactive visualization on the shared vertical screen provides information about the social behavior of the participants. Fig. 2 shows a snapshot of the shared vertical screen during a group discussion; the participants have a similar view on their tablet PCs.

The participants can express their agreement on the current discussion topic by controlling the position of their avatar via their tablet PC. A smaller distance between avatars signals more agreement, while a larger distance indicates disagreement. The background image of the Social Mirror is normally the galaxy map shown in Fig. 2, but the facilitator (e.g., the moderator of the group discussion) can decide to change it. We created two extra types of background image. The first has four different colors in the corners and can be used when the meeting has four possible outcomes; participants can place their avatar on, or in between, the colors representing the outcome they prefer. The second is a gradient image that is red on one side and green on the other, which can be used to indicate satisfaction with a solution or meeting outcome, or agreement with a statement.

The size of the circle around each avatar indicates how much that participant has spoken during the discussion: participants represented with bigger circles have spoken more during the meeting.
The lines between the circles show how much the participants have spoken following one another. At short intervals a new line is drawn from the previous speaker to the current speaker.
Fig. 2. The Social Mirror of four participants as displayed on the shared vertical screen and on the tablet PC. Distance between avatars signals level of agreement, size of avatars signals participation level, and lines between avatars signal interactivity level between pairs of participants.
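Putting the three mappings together, the sketch below keeps the minimal state needed to drive such a display: user-controlled avatar positions for agreement, accumulated speaking time for circle size, and counted speaker transitions for the interactivity lines. The data structures are our own assumptions; the paper does not expose its internals.

```python
# Compact sketch of the Social Mirror's visual mapping as described
# above. Data structures and parameters are illustrative assumptions.

from collections import defaultdict

class SocialMirrorState:
    def __init__(self):
        self.positions = {}                 # avatar -> (x, y), set by users;
                                            # smaller distances = more agreement
        self.speaking = defaultdict(float)  # participant -> seconds spoken
        self.lines = defaultdict(int)       # (prev, curr) speaker pair -> count
        self.previous = None

    def move_avatar(self, who, x, y):
        self.positions[who] = (x, y)

    def tick(self, speaker, interval=1.0):
        """Called at short intervals with the currently detected speaker."""
        if speaker is None:
            return
        self.speaking[speaker] += interval  # grows that avatar's circle
        if self.previous and self.previous != speaker:
            self.lines[(self.previous, speaker)] += 1  # interactivity line
        self.previous = speaker

    def circle_radius(self, who, base=20.0, scale=0.5):
        return base + scale * self.speaking[who]

state = SocialMirrorState()
for s in ["Ann", "Ben", "Ann"]:
    state.tick(s)
print(dict(state.lines))   # {('Ann', 'Ben'): 1, ('Ben', 'Ann'): 1}
```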
4 Evaluation

The evaluation goal of this project was to explore the effects of the Social Mirror on the social behavior of participants. We wanted insights into the usability, perceived usefulness, and user satisfaction with the system. We conducted a within-subject user study with two groups of four participants each; qualitative data was collected through questionnaires. Both groups had a discussion with the Social Mirror (experimental condition) and a similar discussion without the Social Mirror (control condition). To control for learning effects, the order of the conditions was counterbalanced across the two groups. In both conditions participants were given similar tasks. Eight information engineering students participated (7 male, 1 female, average age 23). The experiment took place in a smart meeting room at the University of Twente. During both sessions a facilitator was present, who led the meeting, together with a second person who captured the content of the meeting in a mind map.

The participants were told that the goal of the experiment was to test a newly designed tool called the Social Mirror. After this introduction, a first questionnaire with personal data questions (e.g., age, gender, and personality statements related to meeting behavior) was administered, and a consent form was signed. These personal data were used to check the similarity of the groups. In each condition the participants were given a different (though similar) decision task (fictional but based on reality). During both tasks the group had to
decide between four concepts for the site planning of a former airport terrain. All participants had been assigned a stakeholder role (nature lover, local government, resident, or businessman), and the roles were not changed between conditions. The interface of the Social Mirror was briefly explained to the participants; however, they were not instructed on how to interpret or react to the Social Mirror (e.g., which changes in their 'social' behavior would be desirable). In the experimental condition, participants were asked to draw the avatar that would represent them in the Social Mirror.

As a starter, the facilitator asked the participants to indicate their position towards the four site-planning concepts (e.g., which concept(s) they preferred). In the experimental condition the participants not only verbalized their position but also visualized it using the Social Mirror with the four background colors that symbolized the four concepts. During the discussion with the Social Mirror the participants could freely move their avatar to signal their agreement with the other participants. Five minutes before the end of the meeting, the participants were asked to state their position towards the four concepts again. After the 15-minute discussion they had to decide in favor of one of the four site-planning concepts. The session ended with an assessment of the participants' satisfaction with the final decision; in the experimental condition the participants visualized their satisfaction using the green-to-red gradient background of the Social Mirror.

After each discussion (experimental and control) the participants completed a second questionnaire, which asked them to score the social behavior of themselves and the others. The questionnaire after the experimental condition had additional questions about the Social Mirror. After the participants had experienced both conditions, they were asked to indicate their preference for a meeting with or without the Social Mirror. The experiment ended with an informal debriefing with all participants.
5 Evaluation Results

5.1 Social Behavior

We asked the participants to score their own and each other's social behavior on five-point Likert scales with semantic labels after both the discussion with and the discussion without the Social Mirror. The participants scored how often each participant took the turn (5 = very frequently, 1 = never), for how long they spoke on average (5 = very long, 1 = very short), and how dominant they were (5 = very dominant, 1 = very submissive). For each participant we thus have a self score and a group score (the mean of the scores given by the others). Fig. 3 shows the means and standard deviations of the self scores and group scores for social behavior. The differences between the mean scores for the two conditions are very small; all scores lie between three and four. The standard deviations are larger for the scores given after the discussion with the Social Mirror, except for the self-assessment scores for dominance.
Fig. 3. Means and standard deviations of the scores that the participants gave themselves (self) and each other (group = mean group score of the social behavior of a participant) on the social behaviors (turn frequency, speaking length, and dominance) after the discussion with (with) and without (without) the Social Mirror
After the meeting with the Social Mirror the participants were asked the following question: Has this Social Mirror influenced your behavior, and if so, how and why? Seven out of eight participants felt that their social behavior was influenced by the Social Mirror. Most reported changes were related to the visualization of agreement; one answer was related to the participation level and one to the distribution of attention among the participants.
• "Yes, because you continuously have to indicate your position in relation to the other participants. You are forced to think about who you agree with."
• "It invited me to enter the discussion; you think more about your opinion."
• "I could clearly see to which people I should defend myself."
• "You are nudged to make more compromises (bad for your own view, good for the decision making)."
• "I felt more confident when someone agreed with me and dared to defend my view better."
• "I felt forced to talk when I saw my avatar being the smallest."
• "It made it clear who I was focusing on, so I could determine if I needed to give one of the participants more attention."
Only one participant said that the Social Mirror did not have an effect on his behavior during the meeting: "Without this tool I would not behave differently, I would use the same arguments."

5.2 Usability

The participants were asked to rate statements about the usability of the Social Mirror on a five-point Likert scale with semantic labels. Fig. 4 shows a diagram of the mean agreement scores for these statements (1 = strongly disagree, 5 = strongly agree).
Fig. 4. Mean scores of participants' agreement with statements about the ease of understanding or using diverse aspects of the Social Mirror, and about how distracting the Social Mirror was perceived to be. Higher scores mean less distracting, easier to understand, or easier to use.
The answers to the open question "What information did you get from the Social Mirror?" showed that the participants could read some additional information from the Social Mirror: they could see towards which solution the discussion was heading, and the effect of arguments on other participants. Six participants judged the appearance of the system as pretty, good, and useful, and they liked drawing their own avatars. The background image that visualized the four concepts as colors in the corners was perceived as a little boring and unclear. The galaxy map was perceived as pretty, but not useful.

5.3 Preferences

After both discussions the participants were asked the following question: What do you prefer, a discussion with or without the Social Mirror? Seven out of eight participants preferred the meeting with the Social Mirror. All the motivations for this preference were related to the visualization of agreement. "It makes the views of participants much more clear." "When you weren't paying attention, you can quickly see what everyone's position is." "Consensus would become clear sooner visually." "You can see what moves people to change their view." The participants also stated that the Social Mirror would be even more valuable during meetings with six or more participants. The participant who preferred the meeting without the Social Mirror provided the following motivation: "I feel judged by the system. It gives me the feeling that I cannot take an extreme viewpoint." The facilitator also preferred the meeting with the Social Mirror: "It provided the facilitator more insights into the preference or view of participants…. This makes it easier to work towards a decision. Participants see their own extreme views, and the facilitator does not have to point it out to them."
6 Discussion

Our preliminary results show that the Social Mirror did not have an effect on the speaking frequency, speaking length, or dominance of the participants. We measured these social behaviors by asking the participants to score themselves and each other on these aspects; this is a subjective, qualitative measurement of social behavior and not fully reliable. In contrast to the related work, we found more balance in the participation level during the meeting without the Social Mirror (the standard deviations of the scores in the control condition are smaller).

Most participants said that they changed their behavior during the meeting due to the presence of the Social Mirror, and most of these changes were related to the visualization of agreement. Participants were motivated to form an opinion, to reach consensus, to participate actively, and to divide their attention equally over all participants. Some of the changes that the participants described are behavioral changes but not social behavioral changes. In this explorative evaluation we used qualitative, subjective measures of social behavioral change; naturally, there is always a bias between how people judge their own or each other's social behavior and their actual social behavior.

The usability of the system was in general good. Most participants could easily read the Social Mirror without being distracted from the content of the meeting, although participants forgot the meaning of the different colors of the gradient background image. We told the participants that we had created the Social Mirror, so participants could have given socially desirable answers. We doubt whether this happened in this case, since the questionnaires were anonymous and the answers to some other questions were sometimes quite negative. The level of distraction the system causes was also assessed only through subjective measurement.

Both the participants and the facilitator preferred the meeting with the Social Mirror. Especially the visualization of agreement was perceived as useful: it provided insights into the opinions of the participants, which allowed them to determine a persuasion strategy, to see when consensus was reached, and even to predict the meeting outcome. The answers did not provide insight into the perceived usefulness of the visualizations of participation level and interactivity level. The participant who did not want the Social Mirror present disliked that it made his extreme opinion explicit; he indicated that he felt forced to reach consensus. This finding can be explained in two ways: first, the Social Mirror could lead to discomfort and groupthink (consensus-only thinking); second, visualizing agreement motivates participants to reach consensus. These interpretations are not mutually exclusive. The main shortcoming of this study is the small number of participants; therefore, we cannot formulate any strong conclusions.
7 Future Work

Our findings suggest that visualizing the level of agreement between participants is perceived as a useful feature during decision-making meetings. We therefore suggest that future research explore the effect of a Social Mirror that visualizes agreement on
the social behavior of participants and on meeting performance. The effects of agreement visualization on participation level, groupthink, and consensus reaching should be investigated further in future work; these effects might be more salient during meetings with six to eight participants. In contrast to related work, we did not find a balancing effect of the Social Mirror on the participation level; on the contrary, we found more balance in the participation level during the meeting without the Social Mirror. We therefore suggest a more extensive study of which aspects of a Social Mirror motivate people to change their social behavior, using a larger-scale experiment with objective, quantitative measurements. Finally, we suggest exploring the effect of the Social Mirror on real-life meetings, where outcomes have far-reaching consequences; we expect different effects of the Social Mirror in those kinds of meetings.

Acknowledgments. We would like to thank M. Poel and F.W. Fikkert of the HMI group of the University of Twente, and J. de Heer of T-Xchange for supervising the project.
References 1. Conklin, J.: Dialogue Mapping: Building Shared Understanding of Wicked Problems (2005) 2. Wayne, D.: The IAF handbook of group facilitation. In: Facilitation. Beyond Methods, ch. 3, pp. 35–55 (2005) 3. Sturm, J., van Herwijnen, O.H., Eyck, A., Terken, J.: Influencing social dynamics in meetings through a peripheral display. In: Proceedings of the ninth international conference on Multimodal interfaces - ICMI 2007, p. 263 (2007) 4. Jackson, J.M., Harkins, S.G.: Equity in Effort: An Explanation of the Social Loafing Effect. Journal of Personality 49, 1199–1206 (1985) 5. Hensley, T.R., Griffin, G.W.: Victims of Groupthink. Journal of Conflict Resolution 30, 497–531 (2010) 6. Karahalios, K.G., Bergstrom, T.: Social mirrors as social signals: transforming audio into graphics. IEEE computer graphics and applications 29, 22–32 (2009) 7. Vinciarelli, A., Pantic, M., Bourlard, H.: Social signal processing: Survey of an emerging domain. Image and Vision Computing 27, 1743–1759 (2009) 8. Smith, E.E., Kight, S.S.: Effects of feedback on insight and problem solving efficiency in training groups. Journal of Applied Psychology 43, 209–211 (1959) 9. Kim, T., Pentland, A.S., Chang, A.: Meeting Mediator: Enhancing Group Collaboration using Sociometric Feedback. In: Proceedings of the 2008 ACM conference on Computer supported cooperative work- CSCW 2008, pp. 457–466 (2008) 10. DiMicco, J.M., Pandolfo, A., Bender, W.: Influencing group participation with a shared display. In: Proceedings of the 2004 ACM conference on Computer supported cooperative work - CSCW 2004, p. 614 (2004)
Co-Creation of Value through Social Network Marketing: A Field Experiment Using a Facebook Campaign to Increase Conversion Rate

Asle Fagerstrøm 1 and Gheorghita Ghinea 2

1 The Norwegian School of Information Technology, Schweigaardsgt. 14, 0185 Oslo, Norway
[email protected]
2 School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge UB8 3PH, London, United Kingdom
[email protected]
Abstract. The concept of social network marketing has gained much interest in both applied and academic marketing. While several studies have demonstrated the use of social network marketing, research on its actual effect on business value is scarce. A field experiment was prepared in which applicants for IT bachelor studies were invited to join a Facebook group related to their subject of interest. Each Facebook group was assigned a contact person who received training in answering questions from the applicants and in creating activities on the social network site. The results showed that the conversion rate for applicants who joined a Facebook group was 88.8 %, which is significantly higher than for those who did not (43.3 %). We suggest that social network sites, such as Facebook, can be used as an arena for the co-creation of value.

Keywords: Social Network Marketing, Co-Creation of Value, Facebook Campaign, Field Experiment.
1 Introduction

There is much hype about social network marketing and its potential impact on business, and many companies are diligently establishing presences on Facebook, YouTube, Second Life, and other social platforms. According to an article in McKinsey Quarterly by Zeisser [1], the actual business value of social network marketing remains unclear: while common wisdom suggests that social networks should be tremendous enablers and amplifiers of word of mouth, few companies have unlocked this potential. Moran [2] defines social network marketing as "any way to get attention for your message using people connected to the Internet." In addition, he categorizes social network marketing into four types of social media: content, personality, interest and fantasy. Content-based social media marketing is built around individual messages. For example, YouTube hosts videos designed to be shared with others; other content-based social media sites do not host the content, they just link to it. Personality-based
social networking sites allow each member to create a profile description, which in turn can be linked to the profiles of colleagues and friends, forming a network. Facebook, Twitter and LinkedIn have become significant personality-based networks for targeting segments. Interest-based social media marketing consists of communities organized around specific subjects on message boards, blogs, etc. These communities give companies the opportunity to interact with consumers and, most importantly, to listen and learn from their experiences and ideas about product improvements. Specialized search sites like Twingly allow category searches for blogs on a specific topic. Finally, virtual worlds such as Second Life are also a social medium; marketing through a virtual world is denoted fantasy-based social media marketing. However, it is not easy to define exactly how these social networks can contribute to creating business value.
2 Co-Creation of Value

The marketing concept of co-creation of value represents a shift from a company-centric view to a more balanced view of a company and its clients interacting and co-creating experiences with each other [3-6]. Prahalad and Ramaswamy [7], who introduced the concept, argue that, thanks largely to the Internet, customers are fundamentally changing the dynamics of the marketplace: the marketplace has become an arena where consumers play a much more active role in creating value. Moreover, the authors [7] state that the characteristic aspect of the new marketplace is that consumers become a new source of competence for the company. The competence that customers bring is a result of the knowledge and skills they possess, their willingness to explore and learn, and their ability to engage in an active dialogue.

Co-creation of value is short for collaborative creation of value: it creates business value by employing the experience of people from both inside and outside the company. The consumers' desire for this type of activity is not new. Alvin Toffler, an American writer and futurist, wrote about the principle 30 years ago in his book "The Third Wave". Toffler [8] states that people do not want to consume passively; they would rather participate in the development and creation of products meaningful to them. The type of collaborative engagement described by Toffler is now possible, for example, through social network sites on the Internet.

Prahalad and Ramaswamy [3] delineate a perspective of co-creation of value that places the interaction between the company and consumers at the locus of value creation and value extraction. Today's consumers are increasingly active information seekers and are no longer dependent on information from the company. Furthermore, consumer-to-consumer communication and dialogue provide consumers with an alternative source of information and perspective. Prahalad and Ramaswamy [3] suggest that companies must focus on personalized interactions to co-create value with their customers. In their view, co-creation of value not only describes a trend in business of jointly creating products; it also describes a movement away from customers buying products and services as transactions, towards purchases made as part of an experience. Prahalad and Ramaswamy [3] claim that consumers seek the freedom to choose how to interact with the firm through a range of experiences.
Furthermore, consumers want to define choices in a manner that reflects their view of value, and they want to interact and transact in their preferred language and style. So what does it mean to co-create value in a world of virtual conversations enabled by social networks? Personality-based social networking sites such as Facebook, Twitter and LinkedIn allow each member to create a profile description, which in turn can be linked to the profiles of others, forming a network where companies and clients can interact. Social networks can, therefore, function as an arena where companies and customers interact and co-create value with each other [9]. The following study describes how a university college in Norway managed to co-create value as a result of interacting with applicants on Facebook.

This paper is structured as follows. In the first section we present how the Facebook campaign was organized and conducted. Second, we briefly present the results. Third, we discuss the results of the Facebook campaign in relation to co-creation of value between the company and its consumers. Finally, the last section contains concluding comments on the use of social network marketing for value creation and value extraction.
3 Study Description

The Norwegian School of Information Technology (NITH) is a private university college specializing in information technology. Like many other companies, NITH wants to adopt social network marketing as part of its marketing campaign. The recruitment period begins in January and ends when the semester starts at the end of August of the same year, with most applications being submitted between February and April. One challenge facing the marketing manager of NITH is the relatively low conversion rate from applicants. The conversion rate in previous years was around 43 % (of applicants becoming registered students), and the marketing manager recognized that an increase in the conversion rate would have a considerable impact on the company's income. It was therefore decided to try social network marketing to increase conversion rates, and the experiences in this undertaking are reported in the present paper.

Deciding what to study and which college to apply to is, for most people, a high-involvement situation. It is an extensive problem-solving situation [10], and consumers in such a situation need a great deal of information to establish a set of criteria on which to judge specific study alternatives, and a correspondingly large amount of information concerning each of the alternatives to be considered. Accordingly, NITH decided to create a personal relationship with all applicants through the use of personality-based social media. The social media arena was meant to be a place where the school and the applicants could interact and hence co-create experience. The assumption was that if a personal relationship was established immediately after an applicant submitted his/her application, it would increase the likelihood that the applicant would accept the offer and become a student. Since NITH did not have any experience with social network marketing, the campaign was designed as a field experiment. The target group for the campaign was the applicants for each bachelor program: Digital Marketing, E-business, Programming, Interactive Design, Game Design and Game Programming.
A Facebook group was established for each bachelor program. Each group was assigned a contact person who received training in entering into dialogue with applicants and creating activities. In order to keep the barrier to engagement low, it was decided to use NITH students from each of the bachelor programs as contact persons; the NITH administration, marketing department and lecturers were not allowed to participate. The social network activity began in February 2009 and was completed towards the end of July 2009.
4 Results

All Facebook groups received members immediately after they were published. Some of the groups, like Game Design and Interactive Design, recruited more applicants than others. The dialogue between applicants and the contact person was related to the content of the study programs at NITH. The contact persons managed to varying degrees to facilitate interaction in their respective Facebook groups; it was obviously easier to achieve a good dialogue in groups with a certain number of members (e.g., Game Design and Interactive Design) than in groups with few members. The applicants were curious and asked questions about program-related topics and the technology and tools used in the program. Some questions were related to job opportunities after finishing a specific bachelor program, others to the social activities on campus. The dialogue in each Facebook group was totally transparent: all members of the group could take part in each other's experience. We observed dialogues between applicants and the contact person; however, in some groups we also observed interaction among the applicants themselves. Some applicants started to share information about their interests and technological skills, their experience as an applicant, and how complex it was to decide what and where to study. Some applicants were more emotional and expressed how much they looked forward to starting their studies at NITH.

But what about the conversion rate from applicant to student? Table 1 shows the conversion rate for applicants who did not join a Facebook group. Column one shows the bachelor programs (Game Programming is not included due to missing data). Column two shows the number of applicants and column three the number of applicants who became students. Column four shows the conversion rate for each bachelor program.

Table 1. Conversion rate for applicants who were not on Facebook
Bachelor program   | Applicants | Applicants who became students | Conversion rate
Digital Marketing  | 21         | 5                              | 23.8 %
E-business         | 42         | 23                             | 54.8 %
Programming        | 28         | 3                              | 10.7 %
Interactive Design | 59         | 30                             | 50.8 %
Game Design        | 81         | 39                             | 48.1 %
Total              | 231        | 100                            | 43.3 %
As indicated in Table 1, the conversion rate of applicants who did not join a Facebook group was 43.3 %, which is much the same as in previous years. Table 2 shows the conversion rate for applicants who joined a Facebook group.

Table 2. Conversion rate for applicants who were on Facebook
Bachelor program   | Applicants | Applicants who became students | Conversion rate
Digital Marketing  | 8          | 7                              | 87.5 %
E-business         | 21         | 21                             | 100 %
Programming        | 11         | 9                              | 81.8 %
Interactive Design | 24         | 21                             | 87.5 %
Game Design        | 43         | 37                             | 86.0 %
Total              | 107        | 95                             | 88.8 %
As Table 2 shows, the conversion rate for applicants who joined a Facebook group was 88.8 %, which is significantly higher than for those who did not (43.3 %). This amounts to 49 more students than if the conversion rate had remained at 43.3 %, and gives NITH an income of around 7,260,000 Norwegian kroner (approximately US$ 1,265,309) spread over three years. The cost was estimated at around 6,000 Norwegian kroner (approximately US$ 1,046), mainly covering salaries.
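The strength of this difference can be checked from the reported counts alone. The paper does not state which significance test was used, so the following is only an illustrative sketch, assuming SciPy is available; the counts (95 of 107 converted with a Facebook group versus 100 of 231 without) are taken from Tables 1 and 2.

from scipy.stats import chi2_contingency

# Rows: became a student / did not; columns: Facebook group / no Facebook group.
table = [[95, 100],
         [107 - 95, 231 - 100]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"conversion with Facebook group: {95 / 107:.1%}")      # 88.8 %
print(f"conversion without Facebook group: {100 / 231:.1%}")  # 43.3 %
print(f"chi-square = {chi2:.1f}, p = {p:.2e}")

A p-value far below 0.05 here supports the paper's claim that the difference is significant.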
5 Discussion

Prahalad and Ramaswamy [3] suggested that companies should focus on personalized interactions to co-create value with their customers. To reach this aim, the authors define four main building blocks of interaction between the company and its consumers that facilitate co-creation experiences: dialogue, access, risk-benefits, and transparency. Dialogue comprises the conversations between consumers and the company to jointly define and solve the consumer's problems, while the company at the same time acquires knowledge about the consumer. One of the success criteria behind the Facebook campaign is probably the use of students as contact persons in the dialogue between NITH and the applicants. As stated by Prahalad and Ramaswamy [3], "it is difficult to envisage a dialog between two unequal partners." Hence, to achieve an active dialogue, the company and the customer must become equal and joint problem solvers. The NITH students and applicants are equal partners and, thus, joint problem solvers, and the dialogue was centered on issues of interest to both the NITH students and the applicants.

It is hard to achieve dialogue if consumers do not have the same access and transparency to information. Companies have, according to Prahalad and Ramaswamy [3], "traditionally benefited from exploiting the information asymmetry between them and the individual consumer." Because of the ubiquitous connectivity (computer or mobile) that a personality-based social networking site offers, it was possible for an
applicant to get access to as much information as he/she needed from the other applicants in the Facebook group as well as from NITH. Prahalad and Ramaswamy [3] emphasize that both access and transparency of information are critical for a meaningful dialogue. They state the following: "For active participation in co-creation, the company's information has to be available to the consumers, including information search, configuration of products and services, fulfillment, and consumption." By using a personality-based social network site, NITH helps applicants through their extensive problem-solving situation. Dialogue, access, and transparency enhance an applicant's assessment of the risk-benefit of his/her decision (what and where to study): Should I start studying at NITH? What are the benefits and the risks? Instead of depending only on information given by the marketing department at NITH through advertisements, catalogues, and contact with sellers, applicants could now get information from students and other applicants and, in addition, share their own experience with others. This is, according to Prahalad and Ramaswamy [3], a personalized understanding of risk-benefit. A personalized co-creation experience reflects how the individual chooses to interact with the environment that the company facilitates. This is a totally different process, one that involves individual consumers on their own terms [11]. The use of co-creation experiences as the basis for value creation was possibly the key to the success of NITH's social network marketing campaign.
6 Conclusion

Traditional marketing is about "the achievement of corporate goals through meeting and exceeding customer needs better than the competition" [12, 13]. In recent years, however, marketing has moved from a goods-dominant perspective, in which tangible outputs and discrete transactions were central, to a service-dominant perspective, in which intangibility, exchange processes, and relationships are central [4]. The emergence of social media has given companies a powerful tool to create business value. However, success is not simply a matter of setting up a Facebook page or creating and publishing a YouTube video; it remains critical to know what the end goal of the company is [14]. Why is your company considering social network marketing? What are your company's target segments? What are you going to say in the social media space? The study described in this paper demonstrates, through a field experiment, how a company can co-create value through a social network marketing campaign. NITH's campaign on Facebook demonstrates that social networks can be used as an arena for co-creation of experience as a basis for value creation. New types of social media are emerging fast. What is important, according to Moran [2], is to pay attention, so that when a new type of social media appears, a company can recognize it and consider whether it could work for its next marketing campaign. To realize the potential of social media marketing, your company has to make these media an arena for the co-creation of experience.
References

1. Zeisser, M.: Unlocking the elusive potential of social networks. McKinsey Quarterly (3), 28–30 (2010)
2. Moran, M.: Do It Wrong Quickly: How the Web Changes the Old Marketing Rules. IBM Press, Upper Saddle River (2008)
3. Prahalad, C.K., Ramaswamy, V.: Co-creation experiences: The next practice in value creation. Journal of Interactive Marketing 18, 5–14 (2004)
4. Vargo, S.L., Lusch, R.F.: Evolving to a New Dominant Logic of Marketing. Journal of Marketing 68, 1–17 (2004)
5. Vargo, S.L., Maglio, P., Akaka, M.: On value and value co-creation: A service systems and service logic perspective. European Management Journal 26, 145–152 (2008)
6. Grönroos, C.: Service logic revisited: who creates value? And who co-creates? European Business Review 20, 298–314 (2008)
7. Prahalad, C.K., Ramaswamy, V.: Co-opting Customer Competence. Harvard Business Review 78, 79–87 (2000)
8. Toffler, A.: The Third Wave. Bantam Books (1980)
9. Hoffman, D.L., Novak, T.P., Chatterjee, P.: Commercial scenarios for the Web: Opportunities and challenges. Journal of Computer-Mediated Communication 1(3) (1995)
10. Schiffman, L.G., Kanuk, L.L.: Consumer Behavior, 9th edn. Prentice Hall, Upper Saddle River (2007)
11. Prahalad, C.K., Ramaswamy, V.: The New Frontier of Experience Innovation. Sloan Management Review, 12–18 (2003)
12. Jobber, D.: Principles and Practice of Marketing, 4th edn. McGraw-Hill International (UK), Berkshire (2004)
13. Kotler, P., Keller, K.L.: Marketing Management, 12th edn. Prentice Hall, Upper Saddle River (2005)
14. Dwyer, P.: Measuring the value of electronic word of mouth and its impact in consumer communities. Journal of Interactive Marketing 21(2), 63–79 (2007)
Towards Argument Representational Tools for Hybrid Argumentation Systems

María Paula González 1,2, Sebastian Gottifredi 1,2, Alejandro J. García 1,2, and Guillermo R. Simari 2
1 National Council of Scientific and Technical Research (CONICET), Argentina
2 Computer Science Department, Universidad Nacional del Sur, Av. Alem 1253, 8000 Bahía Blanca, Argentina
{mpg,sg,ajg,grs}@cs.uns.edu.ar
Abstract. Argumentation Systems are reasoning systems that provide automatic computation of arguments. "Argument Assistant Systems" are graphics-oriented tools that support end users in manipulating arguments. Recently, the novel family of "Hybrid Argumentation Systems" (HAS) has emerged, combining these two approaches. Even though some HAS have been presented, they either show in the interface only the final results of the computation of the dispute under consideration, or they have not explicitly considered usability features focused on real end users. Besides, the current flow goes from the definition of theoretical considerations to the graphical representation of the dispute under consideration, without allowing the direct manipulation of arguments in a graphical fashion. This paper discusses lessons learned during the development of DeLP Client, a particular HAS oriented towards end users whose main goals include going beyond the above limitations. To achieve these usability goals, some usability-oriented design guidelines recently proposed for the argumentation systems domain are considered.

Keywords: Knowledge Representation, Defeasible Argumentation, Hybrid Argumentation Systems, Usability Guidelines.
1 Introduction and Motivation

Argumentation is an important aspect of human decision making. In any argumentation process, first, arguments supporting and attacking a statement are constructed; second, the set of warranted (acceptable) arguments is determined; and finally, it is decided whether the statement can ultimately be accepted or not. Over the past decade, theoretical advances in the area have consolidated different computational models, ranging from pure Argumentation Systems (AS) [1], which provide automatic computation of arguments, to more user-oriented tools called Argument Assistant Systems (AAS) [2] [3], whose goal is to assist the user in the process of arguing rather than to perform complex reasoning tasks.

In between, a novel family of argumentation systems called Hybrid Argumentation Systems (HAS) has emerged, combining the above two approaches. User-oriented tools which provide an aid for drafting and generating arguments are complemented
with capabilities for the automatic computation of arguments that help the user come to a decision when complex scenarios of arguments have to be considered. General abstract HAS frameworks have to be distinguished from the more concrete models that we call "concrete HAS" (HASC), where a particular logical language is involved.

Although different HAS have been presented in recent years, they suffer from a number of limitations: either they show in the interface only the final results of the computation of the dispute under consideration, instead of depicting the most significant intermediate steps that lead to a given argument status (e.g., warranted, undecided, etc.), or they are not intended for end users. Besides, in most cases usability is not explicitly considered. However, providing end users with truly interactive interfaces is essential to ensure HAS penetration beyond academia. In addition, achieving an appropriate degree of quality (in particular with respect to usability) is crucial to promote good practices and acceptance of this kind of tool. In that respect, note that a set of usability-oriented guidelines has recently been proposed for the development of AAS [4], advancing towards a standardization of the way arguments can be sensibly and clearly presented to users, especially when they are defeasible.

This paper discusses lessons learned during the development of the concrete HASC DeLP Client, oriented towards end users. The tool is based on Defeasible Logic Programming (DeLP) [5], which has been successfully embedded in real-world applications (e.g., recommender systems [6], decision support systems [7], and CSCW [8]). On the basis of the above usability-oriented guidelines, a fully implemented prototype is presented and a preliminary usability inspection is sketched. Our final goal focuses on the design, implementation and evaluation of quality HAS software tools for creating, drafting, calculating, and analyzing arguments.
2 Characterizing Hybrid Argumentation Systems

Argumentation Systems (AS) are increasingly being considered for applications, constituting an important component of multi-agent systems for negotiation, problem solving, and the fusion of data and knowledge [9]. In this context, Hybrid Argumentation Systems (HAS) combine the power of AS with user-oriented facilities inherited from Argument Assistant Systems (AAS) [10] [11] [12] [13]. As pointed out in [2], AAS often provide a realization of a formal argumentation theory, offering a good test bed for analyzing the advantages and disadvantages of the actual application of the theory. They have to be distinguished from AS, which are automated reasoning systems: the latter can perform complex reasoning tasks for the user, whereas the goal of AAS is not to replace the user's reasoning but to assist the user in this process.

The term HAS was coined by Hunter in [14], where an early discussion of the necessity of combining "formal" and "informal" argumentation models (AS and AAS, respectively) was presented, together with a first attempt to characterize common HAS features. Going beyond [14], alternative modellings of arguments lead to the distinction between what we call "concrete HAS models" (HASC), where a specific logical language underlies the definition of arguments and the notion of attack, and general HAS. Examples of HASC systems include [15] [16] [17] [18] and [19], among others. On the other hand, general HAS (such as [20], [21] or [22]) have an abstract structure, including a generic argument representation and a binary relation
between them called the "attack relation". As noted in [23], abstracting away from the structure and meaning of arguments and attacks enables the study of properties which are independent of any specific aspect, but limits expressiveness and applicability.

Despite their differences, all HAS include an embedded AS that makes it possible to determine when a given argument can be considered ultimately acceptable with respect to the available knowledge by means of some recursive analysis, which takes the form of a tree-like structure called a dialectical tree in the particular case of defeasible-logic-based HASC such as [5] or [24]. Intuitively, in a dispute scenario conflicting arguments may emerge: an argument A attacks another argument B whenever both of them cannot be accepted at the same time, as that would lead to contradictory conclusions. The notion of defeat then comes into play to decide which argument should be preferred: an argument A defeats an argument B whenever A attacks B and, besides, A is preferred over the attacked part of B (with respect to some preference criterion). The criterion for defeat can be defined in many ways; as a generic criterion, it is common to prefer those arguments which are more direct or more informed, which is known as the specificity principle [1]. The notion of defeat among arguments may lead to complex "cascade" situations: an argument A may be defeated by an argument B, which in turn may be defeated by an argument C, and so on. Besides, every argument may in turn have more than one defeater. Arguments must additionally satisfy the requirements of consistency (not including contradictory propositions) and minimality (not including repeated or unnecessary information).

In addition, HAS should provide all the facilities included in AAS for drafting and generating arguments, assisting users in their reasoning process. This assistance involves several aspects of the argumentation process, e.g., keeping track of the issues that have been raised and the assumptions that have been made, evaluating the justification status of the statements involved in the argumentation process, etc. Indeed, some features have to be included in HAS interfaces, as they constitute common elements of AAS [4] [14]. First, they convey the representation of some user mental model (i.e., all the cultural and personal-biased users' perceptions and assumptions, as well as their preconceptions about how the tasks performed by the system are solved in the real world and, consequently, how the system is expected to react), together with the interaction style (including both physical and mental actions). Additionally, feedback and support are usually included (explicit current system status; prevention of and recovery from errors and misuse, e.g., by means of help and documentation, undo options, etc.), as well as diverse interoperability facilities (such as links to multimedia elements). Besides, there are some common features in AAS interfaces typically associated with the argumentation process itself. Three central ones are the visual argument representation (including the recognition of different types of arguments, their statuses, etc.), the modelling of conflict among arguments, which allows the user to recognize the argumentation situation under consideration, and the preference criteria, associated with the possibility of visualizing or deducing how the conflict among arguments is resolved.
As mentioned above, some implemented HAS were designed to show in the interface only the final results of the computation of the dispute under consideration. Others are intended for agents rather than for end users [19]. Besides, in most cases usability is not explicitly considered. Note that these limitations do not invalidate the qualitative advances achieved in recent years. However, from the
end-user perspective, visualizing a comprehensible, graphics-oriented representation of the intermediate steps used to determine the status of an argument (e.g., warranted, undecided, etc.) enhances the understanding of the final results, giving more adequate support and feedback. Besides, note that users' acceptance of HAS will be directly proportional to the quality in use of their interfaces. As we describe in the next section, to cope with this situation a novel HASC system is being developed, which includes the AS DeLP and a user-oriented interface that follows usability-oriented guidelines focused on AAS.
3 The Proposal

As stated in [11], a significant strand of AS research nowadays focuses on the design, implementation and evaluation of practical software tools for creating and analyzing arguments. Consequently, we started by comparing the existing AS and AAS to characterize common features of their interfaces, plus the minimal set of requirements for designing and developing usable HAS. Then, based on DeLP [5] and a previous visualization tool developed for DeLP answers [25], we developed the HASC prototype DeLP Client, incorporating the usability-oriented guidelines presented in [4].

3.1 Designing and Developing Usable HASC Systems

Developing high-quality HASC systems is a challenge. Over the last 10 years, the recommendations proposed early on in [14] have remained valid; however, a more complex scenario justifies a revision of these pioneering ideas. First, a minimal set of operations has to be covered. Taking [3] and [14] as a starting point, all the HAS features described in Section 2 have to be covered (e.g., the user mental model, a visual argument representation, etc.). In addition, some requirements to ensure the visualization of the automatic computation of arguments have to be present, including the representation of the intermediate steps that lead to particular conclusions (argument statuses).

Second, note that the quality of HAS interfaces plays a key role with respect to the user experience. In particular, usability, formally defined by ISO 9241-11 as "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use", will play a major role. In that respect, our lessons learned include detecting the need for standardization of HAS with respect to usability. As a first step, a minimal set of usability-oriented guidelines has been proposed for AAS [4], where each guideline is instantiated by means of questions and recommendations, and different usability principles were identified to evaluate guideline quality in use.

Third, current HAS implementations provide a one-way interactive style, in the sense that the graphical representation is static, merely mirroring the output of the automatic computations. On the contrary, some AAS, such as Araucaria [10] or Compendium [12], are flexible enough to admit the manipulation of arguments in a direct fashion: it is possible to move arguments, resize them, and link them to each other (e.g., attack or counter-attack, support, add more specificity, etc.). Consequently, it would be desirable to define a novel double-way interactive style for HAS that includes direct manipulation of arguments. In the case of HASC, the challenge will be to link user actions at the
system interface with the dynamic modification of the dialectical trees that compute the current status of all arguments under consideration. In addition, if direct manipulation includes the possibility of "dragging and dropping" information from external sources (as in [10]), then the knowledge base should be updated.

3.2 The HASC DeLP Client: A Prototype

As stated above, our proposal builds on the general-purpose AS Defeasible Logic Programming (DeLP) [5], which uses a representational language for writing sentences that follows Prolog syntax. DeLP includes two kinds of knowledge. Strict knowledge, or KS (denoted head <- body), corresponds to knowledge which is certain, such as statements of undisputable facts about the world or mathematical truths (e.g., implications of the form ∀x P(x) → Q(x)); the strict knowledge is consistent, i.e., no contradictory conclusions can be derived from it. Defeasible knowledge, or KD (denoted head -< body), corresponds to knowledge which is tentative, modelled through "rules with exceptions" (defeasible rules) of the form "if P then usually Q" (e.g., "if something is a bird, it usually flies"). Such rules model our incomplete knowledge about the world, as they can have exceptions (e.g., a penguin, a dead bird, etc.). An argument A for a claim c is basically a "tentative proof" or proof tree (formally, a ground instance of a subset of KD) for concluding c from A ∪ KS. Figures 1 and 2 (left) show DeLP syntax, including KS and KD examples.

Given an argument A1 for some conclusion C, DeLP builds all the sequences of arguments [A1, A2, ..., Ak] such that every argument Ai (except the first) defeats the previous argument in the sequence. These sequences are known as argumentation lines (or dialogue lines) and model dispute dialogues between two parties called Proponent and Opponent. An argument A is considered finally acceptable (warranted, or undefeated) if every argumentation line starting at A has an odd number of arguments. This amounts to saying that A has "survived" all possible attacks (i.e., every attack on A is successfully defended in every argumentation line). Typically, when there are several argumentation lines, they are represented in a tree structure (a dialectical tree) which shows all the possible argumentation lines that start with a given argument. Queries are solved by computing these dialectical trees, answering "yes" (if there is a warranted argument supporting the query), "no" (if there is a warranted argument supporting the contrary of the query), or "undecided" (if neither of the above cases holds).

Figures 1 and 2 show different screenshots of the DeLP Client HASC prototype.1 While Figure 1 includes the toy "Bird" example related to the information "if something is a bird, it usually flies", Figure 2 is associated with a more complex example called "Many Trees", where the value of visualizing intermediate steps when many arguments are jointly considered can be appreciated. The AAS part (both figures, left) includes two main horizontal panels: on the top, a Query Manager where queries can be posted by the user; on the bottom, the editable DeLP Program Manager for writing statements (or opening an existing file) that describe the dispute under consideration, including undo options (e.g., a clear button) and the facility to load the statements into the DeLP engine, the embedded AS described above.
1 Current version deployed during 2010 at the LIDIA Lab (http://lidia.cs.uns.edu.ar/).
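The warrant criterion just described can be made concrete with a short sketch. This is not the DeLP engine, only an illustrative recursion over a hypothetical tree, assuming the usual marking of a dialectical tree in which an argument is undefeated exactly when every one of its defeaters is defeated; for a single argumentation line, this matches the odd-length condition described above.

# Illustrative sketch only (not the DeLP implementation). In DeLP syntax the
# "Bird" example would contain, e.g., the strict fact bird(coco) and the
# defeasible rule fly(X) -< bird(X) ("birds usually fly").
from dataclasses import dataclass, field
from typing import List

@dataclass
class Argument:
    label: str
    defeaters: List["Argument"] = field(default_factory=list)

def undefeated(arg: Argument) -> bool:
    # A leaf has no defeaters and is therefore trivially undefeated.
    return all(not undefeated(d) for d in arg.defeaters)

# The "cascade" from Section 2: argument C defeats B, which defeats A.
a = Argument("A", [Argument("B", [Argument("C")])])
print(undefeated(a))  # True: B is defeated by C, so A is warranted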
Fig. 1. DeLP Client screenshots for the "Bird" example. AAS part (left). DeLP Tree Drawer showing the intermediate steps for the claim "fly(coco)?", with a zoom on the bottom argument; the top and bottom triangles are green, the intermediate triangle is red (right).
Fig. 2. DeLP Client Screenshots for the “Many Trees” example. AAS part (left). DeLP Tree Drawer showing evidence for the claim “a?” (right).
Once the sentences are introduced, users can post queries in the Query Manager to ask about any claim, relying on DeLP to compute the status of the arguments involved. To respond to a query, the DeLP Client opens the DeLP Tree Drawer (Figures 1 and 2, right), where two panels show the final argument status plus the most significant intermediate steps that lead to it. Dialectical trees with triangular nodes and arrows model the arguing situation: each triangle represents an argument (triangles are a common drawing for proof trees), and arrows indicate relationships and the inference direction. The left panel shows all the possible dialogue lines that can be computed for the current query, each of them identified by its root. The best explanation is detailed in the right panel, where green nodes depict undefeated arguments and red nodes the defeated ones. As in [25], if any node is clicked, the dialectical tree expands itself, detailing the internal structure of the argument. Besides, both the statements and the related trees can be exported.
4 Usability Inspection: Discussion

Once the DeLP Client prototype was deployed, a preliminary usability inspection was carried out to assess the real scope of the main interface features (see Section 2) that account for the usability-oriented guidelines (UG) underlying the design. The inspection was performed at the LIDIA Lab by usability experts experienced in the AS area; four different computers were used to simulate user scenarios. Further usability studies involving real users are needed to validate the conclusions discussed here.

The interface seems easy to learn. With respect to the user mental model (UG UM#1 to UM#3), the DeLP Client matches both the underlying theory and the real world, since the fundamental elements of the theory (arguments, attacks, dialectical trees, automatic computation of arguments, etc.) are represented. Acceptable levels of predictability seem to be achieved, as the node-oriented graphical representation of AAS was followed (covering UG UM#3), avoiding the inclusion of Prolog-like syntax. The domain is explicit enough, and no facilities beyond those associated with arguing were included. Perhaps the positions of the Program Manager and Query Manager panels should be interchanged, showing the Program Manager at the top, since in the real world at least one sentence should be posted before arguing; note that the top panel is not usable before information has been introduced in the bottom one.

The interaction style (UG IS#1 to IS#7) sketches a future WIMP (Windows, Icons, Menus, Pointers) fashion. It combines form filling for the input of sentences with a still simple menu selection (currently the number of available options does not warrant pull-down menus, and options are displayed as boxes in horizontal lines). Main options are activated by means of mouse clicks and, as discussed before, direct manipulation of arguments associated with a novel "double-way" interactive style has been characterized but remains undeveloped. Covering UG IS#1, when the DeLP FX Visualizer is activated the user's control is slightly reduced, in coherence with the underlying theory (the automatic computation of arguments in HASC). Besides, a minimalist style was adopted (UG IS#3), even though the absence of all the tool's final facilities makes it difficult to ensure that the current information architecture (simple and clear) will be preserved and that unnecessary interaction styles will not be included (UG IS#5). Perhaps the Prolog-like syntax of the Program Manager should be replaced (or complemented) with a more user-oriented language to better minimize technical considerations (UG IS#4). The prototype does not provide configurable features (UG IS#7). However, with respect to UG IS#7, different elements can easily be included in a single view: e.g., statements can be added to the bottom of the DeLP FX Visualizer to reinforce the user's perception of the discussion under consideration.

The visual argument representation plus the modelling of conflict among arguments (UG VR#1 to VR#3 and CA#1 to CA#3) are remarkable. Indeed, the main goals of the proposal include visualizing a comprehensible, graphics-oriented representation of the intermediate steps that lead to a given argument status. In that respect, the double-panel screen of the DeLP FX Visualizer enriches the user experience by providing a comprehensible and intuitive representation of both the final result and the alternative answers to a particular query.
In particular, the expert opinions suggested that arguments are clearly represented by the tree nodes (UG VR#1). The selection of red and green to distinguish the final status of arguments helps to achieve the affordance and predictability of guideline VR#2, as red is usually associated with "no" symbols
(unsupported arguments) and green with the opposite (as in traffic lights). Besides, argument relevance (UG VR#3) is visualized in an easy-to-learn and consistent way, since the tree-like structure positions the argument under consideration at the root and then alternates attacks and supports going down, following users' prior knowledge of tree hierarchies. Alternative modelling of the arguing situation currently under consideration is not provided (UG CA#2). Different views that allow zooming in and out are nicely included in the DeLP FX Visualizer (UG CA#3), in a flexible, easy and consistent way; however, the prototype under evaluation cannot compete with the flexibility provided by some AAS interfaces (as in [12]). With respect to the preference criteria (UG PC#1 and PC#2), the usability inspection could not reach any conclusion; indeed, real end-user opinions are irreplaceable for assessing the achievement of these guidelines. The interoperability-related guidelines (UG IO#1 and IO#2) are still insufficiently covered, as only a set of pre-written sentences saved as a .delp file can be imported; note, however, that other HASC do not cover these guidelines either, and the definition of an Argument Interchange Format (AIF) is still under discussion. On the contrary, the guidelines regarding feedback and support (UG FS#1 to FS#5) are reasonably accomplished: undo options are included, and error prevention is achieved by means of the described interaction style. Finally, the guidelines associated with collaboration were omitted, as the current prototype is designed for individual users.
5 Related Work

To the best of our knowledge, there is no other approach similar to the one discussed here. A relevant implemented HASC is the agent-oriented MARGO [19], devoted to practical reasoning about service composition. MARGO embeds the CaSAPI system [24], a general-purpose AS engine for assumption-based argumentation implemented in SICStus Prolog. MARGO differs from our approach not only in the underlying logical structure of arguments ([5] versus assumption-based argumentation) but also in its information visualization: while we provide a node-oriented graphical representation, MARGO depicts Prolog-like syntax, with statements listed as in command-line interfaces. An example of a HASC that does not yet include an end-user interface is the system of [26]. Recently, some interesting abstract HAS frameworks have been implemented in real-world applications. An example is the Dungine Java reasoner by South et al. [21], successfully integrated with Araucaria [10] using the ArgKit library.2 Another relevant approach is presented in [munoz09], where the authors propose the materialization of a complete argumentation system ready to be built into conventional agent software platforms. From a more general perspective, some multi-semantics argumentation engines have also been developed with the capability of handling many semantics in the same module at the same time; interesting examples are the MoDiSo environment3 and the web-based LASAD system.4 Finally, note that the term Hybrid Argumentation was used in [17], but with a different meaning than the one introduced
2 http://www.argkit.org
3 http://www.cs.ait.ac.th/~dung/modiso/About.html
4 http://cscwlab.in.tu-clausthal.de/lasad/
here: while our goal is to characterize the particular family of HAS, [17] refers to a concrete computational model for a form of argumentation that is a hybrid between abstract and assumption-based argumentation.
6 Conclusion

Hybrid Argumentation Systems (HAS) have emerged in recent years as a natural way of coping with knowledge management and decision making in dispute scenarios where incomplete or contradictory information has to be handled. They offer an interface with user-oriented facilities for visualizing, creating, drafting, and analyzing arguments derived from Argument Assistant Systems [2] [3], plus an underlying pure Argumentation System (AS) [1] for calculating attack and other relationships between arguments. This paper discusses the lessons learned during the development process of a HASC aimed at mirroring not only the final results of the computation but also the most significant intermediate steps that lead to a given argument status (e.g., warranted, undecided, etc.). The main contributions are related to the visualization of the intermediate steps that support argument computation in a natural and intuitive way, as well as the consideration of the usability-oriented guidelines presented in [4]. The AS DeLP [5] underlies the proposal.

Future work includes performing a full usability evaluation involving real end users. On the basis of the obtained results, an incremental, iterative development process based on a Usability Engineering approach will be carried out over the current implementation. In future cycles of that process, direct manipulation of arguments has to be considered, leading to a revision of the questions associated with every usability design guideline in [4] in order to cover it.

Acknowledgments. This paper was funded by Projects PIP-CONICET 112-20080102798 and UNS PGI 24/ZN18 (Argentina), and TIN2008-06596-C02-01 (Spain).
References

1. Chesñevar, C.I., Maguitman, A., Loui, R.: Logical Models of Argument. ACM Computing Surveys 32(4), 337–383 (2000)
2. Verheij, B.: Artificial argument assistants for defeasible argumentation. Artificial Intelligence 150(1-2), 291–324 (2003)
3. Verheij, B.: Argumentation Support Software: Boxes-and-Arrows and Beyond. Law, Probability & Risk 6, 187–208 (2007)
4. González, M.P., Chesñevar, C., Pinkwart, N., Gomez Lucero, M.: Developing Argument Assistant Systems from a Usability Viewpoint. In: Proc. KMIS 2010, pp. 157–163 (2010)
5. García, A., Simari, G.: Defeasible Logic Programming: An Argumentative Approach. Theory and Practice of Logic Programming 4(1), 95–138 (2004)
6. Brena, R., Chesñevar, C.: Information Distribution Decisions Supported by Argumentation. In: Encyclopaedia of Decision Making and Decision Support Technology, vol. II, pp. 489–495. Information Science Reference (2008)
7. Williams, M., Hunter, A.: Harnessing ontologies for argument-based decision-making in breast cancer. In: Proceedings of ICTAI 2007, pp. 254–261 (2007)
8. González, M.P., Penichet, V.M.R., Simari, G.R., Tesoriero, R.: Development of CSCW interfaces from a user-centered viewpoint: Extending the TOUCHE process model through defeasible argumentation. In: Kurosu, M. (ed.) HCD 2009. LNCS, vol. 5619, pp. 955–964. Springer, Heidelberg (2009)
9. Bench-Capon, T., Dunne, P.E.: Argumentation in Artificial Intelligence. Artificial Intelligence 171, 619–641 (2007)
10. Reed, C., Rowe, G.: Araucaria: Software for Argument Analysis, Diagramming and Representation. Int. Journal on Artificial Intelligence Tools 14, 961–980 (2004)
11. Buckingham Shum, S.: Cohere: Towards Web 2.0 Argumentation. In: Proc. Int. Conf. COMMA 2008, pp. 97–108. IOS Press, Amsterdam (2008)
12. Okada, A., Buckingham Shum, S., Sherborne, T. (eds.): Knowledge Cartography: Software Tools and Mapping Techniques. Advanced Information and Knowledge Processing Series. Springer, Heidelberg (2008)
13. Van den Braak, S., Vreeswijk, G., Prakken, H.: AVERs: an argument visualization tool for representing stories about evidence. In: Proc. of the 11th ICAIL, pp. 11–15 (2007)
14. Hunter, A.: Hybrid argumentation systems for structured news reports. The Knowledge Engineering Review 16(4), 295–329 (2001)
15. Besnard, P., Hunter, A.: A logic-based theory of deductive arguments. Artificial Intelligence 128, 203–235 (2001)
16. Rahwan, I., Amgoud, L.: An Argumentation-based Approach for Practical Reasoning. In: 5th AAMAS 2006, pp. 347–354. ACM Press, New York (2006)
17. Gaertner, D., Toni, F.: Hybrid argumentation and its properties. In: Proc. 2nd Int. Conf. COMMA, pp. 183–195. IOS Press, Amsterdam (2008)
18. Kakas, A.C., Toni, F.: Computing Argumentation in Logic Programming. Journal of Logic and Computation 9, 515–562 (1999)
19. Morge, M.: The hedgehog and the fox. In: Rahwan, I., Parsons, S., Reed, C. (eds.) Argumentation in Multi-Agent Systems. LNCS (LNAI), vol. 4946, pp. 114–131. Springer, Heidelberg (2008)
20. Prakken, H.: An abstract framework for argumentation with structured arguments. Argument and Computation 1, 93–124 (2010)
21. South, M., Vreeswijk, G., Fox, J.: A Java Dung Reasoner. In: Proc. Int. Conf. COMMA 2008, pp. 360–368. IOS Press, Amsterdam (2008)
22. Brewka, G., Gordon, T.F.: Carneades and Abstract Dialectical Frameworks: A Reconstruction. In: Proc. COMMA 2010, pp. 3–12. IOS Press, Amsterdam (2010)
23. Baroni, P., Giacomin, M.: Semantics of Abstract Argument Systems. In: Argumentation in Artificial Intelligence, pp. 25–44. Springer, Heidelberg (2009)
24. Gaertner, D., Toni, F.: Computing Arguments and Attacks in Assumption-Based Argumentation. IEEE Intelligent Systems 22(6), 24–33 (2007)
25. Escarza, S., Castro, S., Martig, S.: DeLP Viewer: a Defeasible Logic Programming Visualization Tool. In: Proc. XV CACIC, pp. 556–565 (2009)
26. Williams, M., Hunter, A.: Harnessing ontologies for argument-based decision-making in breast cancer. In: Proceedings of ICTAI 2007, pp. 254–261 (2007)
Development of a Price Promotion Model for Online Store Selection

Shintaro Hotta, Syohei Ishizu, and Yoshimitsu Nagai

Department of Science and Engineering, Aoyama Gakuin University, Japan
[email protected]
Abstract. There are many customer concerns related to online shopping, such as the inability to view actual products and the possibility of dishonesty. Online shopping nevertheless has the advantage of generally low prices. Effective price promotion that considers both customer concerns and price advantage is important for online stores. We developed a store selection model for both online stores and brick-and-mortar stores. We also conducted a survey to test the store selection model. Finally, we propose an effective price promotion method for each type of store. Keywords: store selection model, price promotion, brand selection model, maximum likelihood estimation, multinomial logit model.
1 Introduction

Online stores have an advantage over brick-and-mortar stores in that they have reduced personnel expenses and land costs. This allows them to keep prices low even without the advantage of economies of scale. Although online shopping has the advantage of generally low prices and the convenience of buying goods from home, there are many other customer concerns, such as the inability to view actual products and the possibility of dishonesty. An unknown online store that offers extremely low prices might give rise to customer concerns. In contrast, if an online store offers the same prices as a brick-and-mortar store, then the online store is at a disadvantage due to the many potential customer concerns, in spite of the convenience of buying goods from home. Accordingly, an effective price promotion strategy is needed for online stores, one that considers both the possibility of customer concerns and the advantages of discounted prices.

Price Sensitivity Measurement (PSM) analysis investigates responses to product pricing. The following four questions allow us to derive five indices (Table 1) for a given product:
− What price would be too cheap for this product?
− At what price would you be concerned about product quality?
− What price would be too expensive for this product?
− At what price would you consider this product too expensive to buy?
Table 1. Indices constituting the price range

Index                  | Definition
Upper price            | The upper price at which the product would not be purchased
Lower price            | The lower price at which the product would not be purchased
Most suitable price    | The price most accepted by consumers
Compromise price       | A price that is not ideal, but a compromise
Acceptable price range | The range from the upper price to the lower price
Fig. 1. Representation of PSM analysis
Figure 1 shows a cumulative graph of the number of respondents against price on the horizontal axis. The points of intersection of the four curves give the indices listed in Table 1. We can use PSM analysis to find the price that causes consumers to feel uneasy about the quality of the product. Each index is a single point, however, so the comparison of multiple objects is difficult. On the other hand, a brand choice model can show the price promotion effect of a product through the magnitude of utility. McFadden's [1] multinomial logit model formulates choice probability on the assumption that consumers choose the brand whose utility is highest. Although many brand selection models assume that utility simply increases as price decreases, Kiuchi [2] plots utility on two graphs by using a
Weibull distribution function for the partial utility of price. In one graph, utility decreases if the price is too low, because of concerns about quality; in the other, utility increases as price decreases. In this model, utility can be plotted against the ratio between the wholesale price and the retail price for each good, and we can gain insight into effective price promotion methods for each product. This model considers brand selection among products, but consumer behavior related to selecting the store at which to purchase a product must also be considered. In this study, we analyze the effectiveness of price promotions for different stores using Kiuchi's model, shifting the model target from brand selection to store selection. When doing so, store utility must incorporate the concept of the product, as a store's utility is formed through the sale of many products. The aim of this study is to build a store selection model that considers price promotion for each store, and to propose an effective price promotion method for each store. We use a Weibull distribution-type function for the price part of the utility function to capture reactions to the two types of prices mentioned above. Furthermore, we test the proposed price promotion method by conducting a questionnaire survey on purchases and applying the results to the proposed model.
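The intersection points in Figure 1 can be located directly from raw survey answers. The following is a minimal sketch under hypothetical data (not data from this study): it builds a falling "too cheap" curve and a rising "too expensive" curve on a shared price grid and reports the price at which they cross, i.e., one of the indices of Table 1.

import numpy as np

# Hypothetical price thresholds given by ten respondents (illustrative only).
too_cheap = np.array([300, 400, 450, 500, 500, 600, 650, 700, 800, 900])
too_expensive = np.array([900, 1000, 1100, 1200, 1200, 1300, 1400, 1500, 1600, 1800])

grid = np.linspace(200, 2000, 1000)
# Share of respondents who would still call each grid price "too cheap" (falls as price rises).
cheap_curve = (too_cheap[None, :] >= grid[:, None]).mean(axis=1)
# Share of respondents who call each grid price "too expensive" (rises with price).
expensive_curve = (too_expensive[None, :] <= grid[:, None]).mean(axis=1)

cross = grid[np.argmin(np.abs(cheap_curve - expensive_curve))]
print(f"curves intersect near {cross:.0f}")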
2 Conventional Study

We shall first explain the utility function of Kiuchi's study [2], a brand choice model that considers price receptivity. A feature of this study is that utility is plotted on two graphs using a Weibull distribution function for the partial utility of price: in one graph, utility decreases if the price is too low because of concerns about quality; in the other graph, utility increases as price decreases. The following is the product utility equation for a consumer h buying a product i in term t:
$$V_{it}^{h} = \alpha_i + \beta_i X_{it} + f(AP_i) \qquad (1)$$

$$f(AP_i) = \eta_i \delta_i AP_i^{\delta_i - 1} \exp\!\left(-\gamma_i AP_i^{\delta_i}\right) - C_i \qquad (2)$$
Here, α_i is a constant specific to product i; β_i is a response parameter; X_it is a binary variable indicating whether product i has POP advertisement in term t; AP_i is the price rate of product i; δ_i and γ_i are shape parameters; η_i is a parameter adjusted so that the Weibull distribution function integrates to 1; and C_i is a constant that shifts the function into the negative domain.
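To make the shape of this partial utility concrete, the following sketch evaluates Equation (2) numerically. It is illustrative only: the parameter values are chosen arbitrarily and are not estimates from the paper.

```python
import numpy as np

def partial_utility(ap, eta, delta, gamma, c=0.0):
    """Weibull-type partial utility of the price rate (Equation (2)).

    ap    : price rate AP_i (must be positive)
    eta   : normalization parameter of the Weibull density
    delta : shape parameter; delta > 1 pulls utility down at very low prices
    gamma : shape parameter controlling the decay at high prices
    c     : constant shifting part of the curve into the negative domain
    """
    return eta * delta * ap ** (delta - 1.0) * np.exp(-gamma * ap ** delta) - c

# Arbitrary illustrative parameters, not estimates from the paper:
ap = np.linspace(0.05, 1.0, 10)
print(partial_utility(ap, eta=10.0, delta=3.0, gamma=10.0, c=0.5))
```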
3 Store Selection Model

3.1 Choice Probability

Here we develop a store selection model using a multinomial logit model and consider the utility of each store. The following is the equation for the utility when a consumer n buys a product j from a store i.
$$U_{ij}^{n} = V_{ij}^{n} + \varepsilon_{ij}^{n} \qquad (3)$$
V_ij^n is the deterministic part of U_ij^n, and ε_ij^n is the stochastic part. We assume that the ε_ij^n are independent and identically distributed according to a double exponential (Gumbel) distribution. The probability that consumer n buys product j from store i is then
$$P_j^n(i) = \frac{\exp\!\left(V_{ij}^{n}\right)}{\sum_k \exp\!\left(V_{kj}^{n}\right)} \qquad (4)$$
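Equation (4) is the standard multinomial logit form, so the computation reduces to a softmax over store utilities. A minimal sketch follows; the utility values are hypothetical.

```python
import numpy as np

def choice_probabilities(v):
    """Multinomial logit choice probabilities (Equation (4)).

    v : array of deterministic utilities V_ij^n, one entry per store.
    Subtracting the maximum before exponentiating avoids overflow
    without changing the resulting probabilities.
    """
    z = np.exp(v - np.max(v))
    return z / z.sum()

# Hypothetical utilities for three stores:
print(choice_probabilities(np.array([0.8, -0.1, 0.0])))
```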
3.2 Utility Function

We developed the store selection model with reference to Kiuchi's model. This allows us to include consumer concerns about online stores in the model by using a Weibull distribution function for the partial utility of price. The following is the equation for the utility when a consumer n buys a product j from a store i.
$$V_{ij}^{n} = \alpha_i + f(AP_{ij}) \qquad (5)$$

$$f(AP_{ij}) = \eta_i \delta_i AP_{ij}^{\delta_i - 1} \exp\!\left(-\gamma_i AP_{ij}^{\delta_i}\right) \qquad (6)$$
Here, α_i is a constant specific to store i; AP_ij is the price rate of product j at store i; δ_i and γ_i are shape parameters; and η_i is a parameter adjusted so that the Weibull distribution function integrates to 1.

3.3 Estimation Method

We used maximum likelihood estimation. The following is the likelihood function L.
$$L = \prod_{i=1}^{I} \prod_{j=1}^{J} \prod_{n=1}^{N} P_j^n(i)^{\,y_{ij}^{n}} \qquad (7)$$
y_ij^n is a dummy variable for store selection: if consumer n buys product j at store i, then y_ij^n is 1; otherwise it is 0. The corresponding log-likelihood function is

$$\ln L = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{n=1}^{N} y_{ij}^{n} \left[ V_{ij}^{n} - \ln \sum_{k} \exp\!\left(V_{kj}^{n}\right) \right] \qquad (8)$$

We find α_i, η_i, δ_i, and γ_i by maximizing this log-likelihood. When estimating, we set α_3 to 0, because the α_i capture only relative differences among the three stores.
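A compact sketch of this estimation step is shown below, assuming the parameterization above with alpha_3 fixed to 0 for identification. The data, starting values, and the generic Nelder-Mead optimizer are placeholders; positivity constraints on eta, delta, and gamma are omitted for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, ap, y, n_stores=3):
    """Negative log-likelihood of the store-selection logit (Equation (8)).

    theta : [alpha_1, alpha_2, eta_1..3, delta_1..3, gamma_1..3]
            (alpha_3 is fixed to 0 for identification)
    ap    : (obs, n_stores) array of price rates AP_ij
    y     : (obs,) index of the store chosen for each purchase
    """
    alpha = np.append(theta[:n_stores - 1], 0.0)
    eta, delta, gamma = np.split(theta[n_stores - 1:], 3)
    v = alpha + eta * delta * ap ** (delta - 1) * np.exp(-gamma * ap ** delta)
    log_p = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(y)), y].sum()

# Hypothetical data: 100 purchases across 3 stores.
rng = np.random.default_rng(0)
ap = rng.uniform(0.2, 1.0, size=(100, 3))
y = rng.integers(0, 3, size=100)
theta0 = np.concatenate([[0.0, 0.0], np.full(9, 3.0)])
result = minimize(neg_log_likelihood, theta0, args=(ap, y),
                  method="Nelder-Mead", options={"maxiter": 20000})
print(result.x)
```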
4 Application of the Store Selection Model

4.1 Questionnaire

We sent out questionnaires on purchasing to test the store selection model. The questionnaire concerned home appliances purchased in January 2010. The data covered 1,000 purchases; we received 100 valid responses covering 10 types of products. The three stores examined were Yodobashi Camera, Amazon.com, and Good Price. Yodobashi Camera is a famous Japanese general merchandising store for electronics. Amazon.com is a well-known online shopping site. Good Price is a relatively unknown online shopping site. We used two online stores in the questionnaire because we assumed different results would be obtained from comparing a well-known online store with an unknown one.

4.2 Estimation

Table 2 shows the estimated parameters for each store.

Table 2. Estimated parameters
Store             | α_i   | η_i   | δ_i  | γ_i
Yodobashi Camera  | 0.83  | 11.79 | 3.25 | 11.80
Amazon.com        | -0.14 | 20.16 | 2.37 | 11.05
Good Price        | 0.00  | 11.55 | 5.62 | 27.75
Next, we consider the estimated parameters and utility functions. Developing a store selection model lets us analyze the change in utility for a given price rate, and we
can calculate choice probabilities using Equation (4). We can therefore assess the effect of price promotion for each store by comparing utilities. First, we analyze the constants α_i. In Table 2, α_1 is the highest of the three stores, reflecting Yodobashi Camera's popularity with consumers. α_2 is lower than α_3; we assume consumers' attitude toward Amazon.com as a source for buying home appliances lies between their attitudes toward Yodobashi Camera and Good Price.

4.3 Price Promotion

Figure 2 shows the utility function graph for each store, and Table 3 lists the price rate at maximum utility for each store. As hypothesized, utility decreases if the price is too low because of concerns about quality at online stores; the same phenomenon is confirmed for the brick-and-mortar store as well. Utility for each store changes little over the price-rate range from 0.80 to 1.00, showing that price promotion is not effective there. Yodobashi Camera's utility is higher than the others' over the range from 0.45 to 1.00. Yodobashi Camera's market share is high, and it has a large number of customers compared with the online stores; thus Yodobashi Camera does not need price promotions in the range from 0.45 to 1.00.
Fig. 2. Utility function graphs

Table 3. Price rates at maximum utility

Store             | AP*
Yodobashi Camera  | 0.42
Amazon.com        | 0.29
Good Price        | 0.53
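The values in Table 3 follow directly from Equation (6): setting the derivative of f with respect to the price rate to zero gives a closed form for the maximizer, and substituting the Table 2 parameters reproduces the tabled values (e.g., Yodobashi Camera: ((3.25 - 1)/(11.80 × 3.25))^(1/3.25) ≈ 0.42).

$$\frac{\partial f}{\partial AP} = \eta_i \delta_i AP^{\delta_i - 2} e^{-\gamma_i AP^{\delta_i}} \left[ (\delta_i - 1) - \gamma_i \delta_i AP^{\delta_i} \right] = 0 \;\Longrightarrow\; AP^{*} = \left( \frac{\delta_i - 1}{\gamma_i \delta_i} \right)^{1/\delta_i}$$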
Next, we consider the situation of the online stores. Good Price has the lowest acceptable price range of the three stores; we assume this is because its popularity is very low and
consumers therefore trust it less. When the price rate is about 0.55, Amazon.com's utility equals Yodobashi Camera's; when the price rate is less than 0.40, Amazon.com's utility is the highest. When Yodobashi Camera's price rate falls below 0.80, the market share of the online stores decreases, so it is ideal for each online store to approximate the price rates shown in Table 3.
5 Conclusion

We developed a store selection model that considers price promotion and sent out questionnaires on purchasing to test it. The questionnaire results support the following conclusions. First, utility showed little change in the price-rate range from 0.80 to 1.00; price promotion is therefore not effective in that range. Second, when price rates are less than 0.80, it is ideal for each store to approximate the price rates shown in Table 3. In particular, if Yodobashi Camera's price rate falls below 0.80, online stores should reduce prices lest they be deprived of market share. Third, online stores must implement measures other than price promotion to gain greater market share. In this way, we evaluated price promotion by comparing each store's utility. We hope this knowledge contributes to the practice of price promotion for stores.
References

1. McFadden, D.: Conditional Logit Analysis of Qualitative Choice Behavior. In: Zarembka, P. (ed.) Frontiers in Econometrics, pp. 105–142. Academic Press, London (1974)
2. Takuya, K.: A Brand Choice Model with the Price Acceptability. In: JAMS Proceedings of the National Conference, vol. 39, pp. 212–213 (2007) (in Japanese)
3. Hiroyuki, T.: Analysis of Price Promotion for Variety-Seeking Behavior. JIMS 14(1), 61–73 (2005) (in Japanese)
4. Shintaro, H.: Development of Price Promotion Model for Web Store Selection. In: JSQC Proceedings of the Conference, vol. 40, pp. 73–76 (2010) (in Japanese)
5. Hiroshi, N.: A Brand Choice Model Considering Consumers' Reference Price. The Behaviormetric Society 26(2), 78–88 (1999) (in Japanese)
Design Effective Voluntary Medical Incident Reporting Systems: A Literature Review

Lei Hua1 and Yang Gong1,2

1 MU Informatics Institute and 2 Health Management and Informatics, University of Missouri, Columbia, MO 65203
Abstract. Voluntary medical incident reporting systems (VMIR) apply information technology to support medical error reporting by health professionals and thus ultimately improve healthcare quality and patient safety. The overall goal of this paper was to investigate the usage and effective design of VMIR through a literature review. We expected to uncover design potentials from prior studies by examining both incident report analysis and system design, and thereby to establish a user-centered design framework that integrates the identified factors for advancing VMIR effectiveness and efficiency. Papers on voluntary reporting systems were identified through systematic electronic database searches. Three eligibility criteria were applied: 1) voluntary programs; 2) information system; 3) medical incident/error reporting. Across the 8 eligible articles identified, the main themes concern current systems' shortcomings in underreporting, report quality, standardized nomenclature/taxonomy, communication, and usability, as well as reporting culture and environment. All of the concerns identified in this study will be addressed in a VMIR system prototyping process to attack the aforementioned shortcomings. Keywords: Medical Incident Reporting, User-centered Design, Information System.
1 Introduction

Following the recommendation of the Institute of Medicine (IOM) [1], an increasing number of US hospitals have implemented voluntary reporting systems to learn from errors and prevent their recurrence. Since 2000, a growing number of quantitative and qualitative studies have been published on voluntary medical incident reporting system usage and design. Like many other clinical information systems, VMIR struggles with effective design to overcome barriers to system acceptance and usage. Legislation, leadership support, blame culture and punitive environment, clinician involvement, and system usability all influence the data quantity and quality of medical incident reporting systems. In this study, we investigated the usage and design concerns of
previous VMIR systems to identify technical contributing factors, which researchers can coordinate in a user-centered VMIR framework to remove barriers to voluntary medical incident reporting in a technical manner.
2 Method

2.1 Data Source

Databases selected for the literature search were (1) Medline (1950–2010); (2) Compendex (1969–2010); (3) PsycINFO (1987–2010). Terms and keywords fell into three categories (voluntary participation, computer system, medical errors):

• Voluntary programs (MeSH & "explode"), voluntary (Ei controlled vocabulary);
• Information system (MeSH & "explode", Ei controlled vocabulary), system analysis (MeSH & "explode"), system design, reporting system;
• Medical errors (MeSH & "explode"), medical incident.

The "explode" box of the search tool was ticked, which includes all narrower terms under the MeSH terms listed above. The authors also searched the reference lists to ensure that all relevant articles were properly reviewed.

2.2 Inclusion and Exclusion Criteria

The article inclusion criteria were:
• Voluntary system
• Medical incident/error reporting pertinent
• Computer-based system
• Practical studies regarding VMIR use and design
• Studies detailed with reported data statistics and system design discussions
Medical incident/error reporting is not a brand-new territory: many reporting systems have been designed as paper forms, call-center-supported forms, and computerized applications, and usage and design concerns can manifest differently across these forms. The authors therefore excluded literature on mandatory medical incident reporting systems and non-electronic systems, focusing on computer-based platforms and voluntary use. Unlike the comprehensive review of Holden & Karsh [2], we are more interested in the potential for system design improvement on the basis of analyzed reports. Therefore, the authors further excluded papers that discussed only data analysis without system design.
Table 1. Eligible VMIR Studies

[3] Pediatric chemotherapy field in a hospital (2004).
Reporting no. and ratio: 97 (Feb. 8, 2002 – Mar. 9, 2003).
Report statistics: Severity: 13% reached patients, 1% increased patient monitoring, 2% temporary harm. Reporters: chemotherapy pharmacists (69%), floor nurses (31%). Others: no significant difference in age, gender, race, or residence between hospitalized incident and non-incident patient populations.
Terms in use (TIU): National Coordinating Council for Medication Error Reporting and Prevention. System acceptance factors (SAF): leadership; project ownership; standard data definition; human factors; team dynamics; data and performance feedback; security and privacy.

[4] Academic and general field, Ohio State University Health System (2004).
Reporting no. and ratio: 1,230 (Oct. 4, 2000 – Mar. 7, 2002, 17 months).
Report statistics: Reporters: physicians (10%), nurses (>50%). Average time expense: 7 minutes 40 seconds. Others: statistically significant reductions in both event open time and management completion time demonstrate efficiency improvement.
TIU: already-familiar house language. SAF: usability enhancement; user classification and centering; access and security control; facilitating event follow-up.

[5] Neonatal intensive care field, Vermont Oxford Network (2004).
Reporting no. and ratio: 676 (28 weeks starting Oct. 22, 2001); ratio: 14.6–16.2 events/week (122 beds); 15.1/week (207 beds).
Report statistics: Severity: 25% minor harm, 1.9% serious harm, 0.15% death (673 reported harm). Others: contributory factors were failure to follow policy or protocol (47%), inattention (27%), communications problem (22%), error in charting or documentation (13%), distraction (12%), inexperience (10%), labeling error (10%), and poor teamwork (9%); 581 (47%) reports related to medications, nutritional agents (breast milk, formula, and parenteral nutrition), or blood products.
TIU: Leape [6], Nadzam [7] and Kaushal [8]. SAF: specialty-based system; anonymous reporting.

[9] Intensive care field, Johns Hopkins Hospital (2005).
Reporting no. and ratio: 854 (July 1, 2002 – June 30, 2003).
Report statistics: Severity: 21% led to physical injury, 14% increased ICU length of stay; most caused no harm. Average time expense: 12 minutes 45 seconds.
TIU: home-made taxonomy for coding. SAF: usability, e.g., reduced free-text entry and a print option; feedback to individuals and the organization.
Table 1. (continued)

[10] General field, Osaka University Hospital (2005).
Reporting no. and ratio: 6,041 (June 1, 2001 – Mar. 31, 2004); ratio: 177 reports/month (1,076 beds).
Report statistics: Reporters: nurses (84.7%), physicians (10.2%), pharmacists (2.3%). Others: report analysis uncovered problems in computer prescription, intravenous administration of a high-risk drug, manipulation of syringe pumps, and blood transfusion.
TIU: N/A. SAF: anonymous and blame-free; new organizational structure; education, system improvement, and feedback.

[11] Cardiothoracic intensive care and post-anesthesia care, Barnes-Jewish Hospital (2005).
Reporting no. and ratio: 157 in total, 112 from ICU (Jan. 6, 2003 – Dec. 31, 2003); ratio: 25.3 reported events/1,000 patient-days (ICU).
Report statistics: Severity: 54% reached the patient without harm; test/treatment/procedure-related and medication events were the two types most frequently contributing to patient harm. Reporters: nurses (69%), physicians (19%), other staff (6%), anonymous (4%). Others: 20 patients (19%) had more than 1 event; the median number of days from hospital admission to the first event was 3; a 3-fold increase in reporting ratio; cause and classification of events identified.
TIU: home-made taxonomy via coding. SAF: voluntary, accessible, anonymous, and non-punitive; time pressure and uncertainty about what to report; classification and coding of events.

[12] Anesthetic field (via mobile devices), Geelong Hospital (2006).
Reporting no. and ratio: 156 (Aug. 2001 – Feb. 2004); ratio: 35 reports/1,000 anesthetic procedures.
Report statistics: Severity: 46.2% near misses, 53.8% serious outcomes (anesthetic trainees). Average time expense: 5 seconds. Others: summarized categories and sub-classifications for incident reporting with numbers of incidents and outcomes.
TIU: 8 anesthetic incident categories from the literature by 1999; Patient Safety International terms [13]. SAF: nomenclature for critical incidents in health care; supportive and blame-free environment; timely and efficient feedback.

[14] General field, Brigham and Women's Hospital (2009).
Reporting no. and ratio: 14,179 (May 2004 – Nov. 2006, 31 months); ratio: 20 reports/1,000 inpatient days.
Report statistics: Severity: 24% near misses, 61% adverse events with no harm, 14% temporary harm, 0.4% permanent harm, 0.1% death. Reporters: physicians submitted only 2.9% of the reports; most were submitted by nurses, pharmacists, and technicians. Average time expense: 14 minutes, varying by incident type.
TIU: home-made categories of incident types. SAF: immediate response and reassurance; lack of time; ease of use.
2.3 Study Selection and Information Extraction

The authors reviewed the titles and abstracts of the identified citations and applied a screening algorithm based on the inclusion and exclusion criteria described above. The two investigators rated each paper as "potentially relevant" or "potentially not relevant." The authors collected the following information from each "potentially relevant" article: year of publication, clinical field, reporting amount and ratio, reported data statistics, controlled vocabulary/terminology/taxonomy in use, and discussed factors contributing to system acceptance.
3 Results

The comprehensive literature searches identified 80 articles: 69 in Medline, 6 in Compendex, and 5 in PsycINFO. After reading the full papers, 72 articles were excluded. Eight articles met the eligibility criteria, as shown in Table 1 [3-5, 9-12, 14]. These studies took place in the United States [3-5, 9, 11, 14], Australia [12], and Japan [10]; research on VMIR is not limited to a particular local or regional area but is an international topic. Most of the studies share a similar structure: they elaborate on importance and difficulty at the beginning, follow with statistics on the amount and ratio of reporting and the distribution of reporters and event severities, and end with a discussion of VMIR design trends. This partially reflects the homogeneity and limitations of current VMIR research. Moreover, all qualified articles were published after 1999, the year the IOM report was released. They illustrate the following facts across the investigated studies:

• VMIR still encounters underreporting but performs better than paper-based reporting systems. The reporting ratio ranges from 0.5% to 3.5%, compared with a prevalence of adverse events ranging from 2.9% to 16.6% [15]. Specialty-based systems seemingly received higher rates than comprehensive systems.
• Five were specialty-based systems, including three ICU-based reporting systems; the other three were hospital-wide comprehensive incident reporting systems.
• The pyramids of severity of harm across the studies are similar: the majority of reported incidents caused no harm, and severity increases as rates decrease.
• Except for a few studies [3, 12] designed for specialists, the majority of users are nurses.
• The reduction in reporting time expense is unclear. A comprehensive report with multiple sections combining coded fields and free-text entries often requires around 10 minutes [4, 9, 14], varying largely by incident type. Comparatively, a single paper form or call center for incident reporting might even save a few minutes [16].
• Seven of the eight studies explain their preference in choosing a terminology or taxonomy. Three of them employ established works; the rest produce their own coding systems of incident categories and terms.
• All articles discuss VMIR design concerns, suggesting designs providing a blame-free environment, usability enhancement, feedback, etc.
4 Discussion

Overall, the eight articles exhibited a variety of difficulties in designing and adopting VMIR for high-quality incident reports, including voluntariness, terminology/taxonomy/nomenclature [12, 17, 18], blame-free environment and reporting culture [19], usability and utility concerns [20-22], feedback [23], and administrative issues.

Voluntariness is viewed controversially in medical incident reporting system design. Several technology acceptance studies [21, 22, 24] identified it as a negative factor for system use: when perceived voluntariness is low, i.e., users feel that use of the system is mandatory, the system is used more often [21]. However, voluntary systems are still dominant and more acceptable in the incident reporting area than mandatory ones. Mandatory systems are often adopted in the military and are typically designed to identify "bad" practitioners and facilities, with an emphasis on individuals and on the error itself, but not its correction [25].

Controlled vocabulary/terminology/taxonomy is a prevalent challenge, because computerization in all domains requires semantic interoperability among humans and computer systems. In fact, a number of medical incident taxonomies or concept frameworks are available as candidates for the development of medical incident reporting systems, e.g., the NCC MERP Taxonomy of Medication Errors (NCCMERP), JCAHO Patient Safety Event Taxonomy (PSET), JCAHO Sentinel Events Reporting (JSER), Taxonomy of Nursing Errors (TNE), a Preliminary Taxonomy of medical errors in Family Practice (PTFP), Cognitive Taxonomy of Medical Errors (COG), Taxonomy of Medical Errors for Neonatal Intensive Care (NIC), MedWatch Index (MEDWATCH), and the International Classification for Patient Safety (ICPS). These taxonomies and conceptual frameworks not only guide what to report but can also provide an agreed-upon structure for error report data. Unfortunately, they lack consistency in practice, which may impede interoperability among different medical incident systems at a larger scope.

Utility and usability are major technical issues influencing system acceptance. They apply not only to VMIR systems but also to aviation error reporting [20], building management [24], knowledge management [21], and other health information technology [22], and are highlighted in Davis' Technology Acceptance Model (TAM) [26] and Nielsen's System Acceptability Model [27]. For VMIR, for example, users might ask for data entry tools that are easy to use and produce reusable data at the usefulness level. Conversely, if the system design fails to deliver periodic progress or achievements that satisfy users' evolving requirements and expectations in a timely manner, users might become frustrated and even abandon the system to seek alternatives.

Feedback between reporters and expert reviewers is expected to encourage reporting, educate clinicians, and notify users of corrective actions taken [2]. Discussed in all investigated articles, it was initially shown to reduce report open and completion times [4]. From the perspective of communication science, feedback that meets users' expectations or provides perceived benefits will bridge sense-making and sense-giving gaps to encourage incident reporting by target users.

Considering the above concerns, a computer-based prototype of VMIR has been under development since 2009 [28]. The authors reviewed the latest design
suggestions in the medical incident reporting area that build on and go beyond Holden & Karsh's 2007 work [2]. Only three additional papers were identified; they are organized together with the prior work in Table 2 to complement system prototyping based on our previous studies [28-30]. By synthesizing these works, we set up several design objectives for our target VMIR system, through which the two major barriers of underreporting and low-quality reports could be properly addressed:

• Consider specialty-based incident reporting design for VMIR
• Provide feedback at various levels to a variety of stakeholders, especially to report submitters as encouragement
• Increase the usage of mobile devices in incident reporting
• Incorporate data sharing functions
• Encourage reporting more details on the incident process rather than the outcome
• Check the validity of values in data fields prone to typos
• Add functional aids (e.g., shortcut buttons) for data field entry where statistically feasible
• Set up prompts reminding reporters of incident details that are important but were often overlooked in previous reports

However, the order in which to address these issues has not been determined, because we still know little about information behavior models in medical incident reporting and whether they will be sense-making to real users in practice. Nevertheless, we believe an iterative system prototyping process can resolve this problem step by step.

Table 2. Recent design suggestions for VMIR
Design suggestion for VMIR | Literature
Specialty-based; feedback to encourage reporting, educate clinicians, and notify corrective actions taken | Holden & Karsh, 2007 [2]
Handheld computer application narrowing down participation biases | Dollarhide, Rutledge, Weinger, & Dresselhaus, 2008 [31]
Reinforce process-oriented rather than outcome-oriented reporting | Nuckols, Bell, Paddock, & Hilborne, 2009 [32]
Group-level data sharing might increase the error reporting rate significantly | Anderson, Ramanujam, Hensel, & Sirio, 2010 [33]
5 Current Efforts

The computer-based VMIR prototype has gone through initial usability inspection [28] and testing of its system interface. The usability violations identified by heuristic evaluation were partially fixed according to severity. The latest prototype is undergoing think-aloud user testing by five human subjects who are target and
real users, to identify cognitive difficulties in using the prototype to report patient fall incidents. Simultaneously, an unobtrusive data analysis of historical reports is in progress; it uses patient falls as a demonstrative incident category to extract representative features and an indexing vocabulary for (semi-)structuring free-text entries in reports. The initial work of this process was accepted and has been in press since the summer of 2010 [30]. Furthermore, we are adapting theories from information science and communication science to work with technical solutions for bridging the sense-making gaps of organizations and individual stakeholders.

Acknowledgements. This project was supported in part by the Richard Wallace Research Incentive Awards at the University of Missouri.
References

1. Kohn, L.T., Donaldson, M.S., Corrigan, J.: To Err Is Human: Building a Safer Health System. Report of the Committee on Quality of Healthcare in America. Institute of Medicine, National Academy of Science (1999)
2. Holden, R.J., Karsh, B.T.: A review of medical error reporting system design considerations and a proposed cross-level systems research framework. Human Factors 49(2), 257–276 (2007)
3. France, D.J., et al.: Improving pediatric chemotherapy safety through voluntary incident reporting: lessons from the field. Journal of Pediatric Oncology Nursing 21(4), 200–206 (2004)
4. Mekhjian, H.S., et al.: Development of a Web-based event reporting system in an academic environment. Journal of the American Medical Informatics Association 11(1), 11–18 (2004)
5. Suresh, G., et al.: Voluntary anonymous reporting of medical errors for neonatal intensive care. Pediatrics 113(6), 1609–1618 (2004)
6. Leape, L.L., et al.: Preventing medical injury. QRB Qual. Rev. Bull. 19(5), 144–149 (1993)
7. Nadzam, D.M.: Development of medication-use indicators by the Joint Commission on Accreditation of Healthcare Organizations. Am. J. Hosp. Pharm. 48(9), 1925–1930 (1991)
8. Kaushal, R., et al.: Medication errors and adverse drug events in pediatric inpatients. JAMA 285(16), 2114–2120 (2001)
9. Holzmueller, C.G., et al.: Creating the web-based intensive care unit safety reporting system. Journal of the American Medical Informatics Association 12(2), 130–139 (2005)
10. Nakajima, K., Kurata, Y., Takeda, H.: A web-based incident reporting system and multidisciplinary collaborative projects for patient safety in a Japanese hospital. Quality & Safety in Health Care 14(2), 123–129 (2005)
11. Nast, P.A., et al.: Reporting and classification of patient safety events in a cardiothoracic intensive care unit and cardiothoracic postoperative care unit. Journal of Thoracic & Cardiovascular Surgery 130(4) (2005)
12. Freestone, L., et al.: Voluntary incident reporting by anaesthetic trainees in an Australian hospital. International Journal for Quality in Health Care 18(6), 452–457 (2006)
13. Glossary of Terms: Patient Safety International (2004), http://www.patientsafetyint.com/Glossary.aspx (accessed August 2006)
14. Levtzion-Korach, O., et al.: Evaluation of the contributions of an electronic web-based reporting system: enabling action. Journal of Patient Safety 5(1), 9–15 (2009)
15. Murff, H.J., et al.: Detecting adverse events for patient safety research: a review of current methodologies. J. Biomed. Inform. 36(1-2), 131–143 (2003)
16. Evans, S.M., et al.: Evaluation of an intervention aimed at improving voluntary incident reporting in hospitals. Quality & Safety in Health Care 16(3), 169–175 (2007)
17. Nagamatsu, S., Kami, M., Nakata, Y.: Healthcare safety committee in Japan: mandatory accountability reporting system and punishment. Current Opinion in Anaesthesiology 22(2), 199–206 (2009)
18. Vozikis, A.: Information management of medical errors in Greece: The MERIS proposal. International Journal of Information Management 29, 15–26 (2009)
19. Waring, J.J.: Beyond blame: cultural barriers to medical incident reporting. Social Science & Medicine 60(9), 1927–1935 (2005)
20. Barach, P., Small, S.D.: Reporting and preventing medical mishaps: lessons from non-medical near miss reporting systems. BMJ 320(7237), 759–763 (2000)
21. Clay, P.F., Dennis, A.R., Ko, D.-G.: Factors affecting the loyal use of knowledge management systems. In: 38th Annual Hawaii International Conference on System Sciences, January 3-6, 2005. IEEE Computer Society, Big Island, HI, United States (2005)
22. Kijsanayotin, B., Pannarunothai, S., Speedie, S.M.: Factors influencing health information technology adoption in Thailand's community health centers: Applying the UTAUT model. International Journal of Medical Informatics 78, 404–416 (2009)
23. World Alliance for Patient Safety: WHO draft guidelines for adverse event reporting and learning systems (2005)
24. Lowry, G.: Modelling user acceptance of building management systems. Automation in Construction 11, 695–705 (2002)
25. Cohen, M.R.: Why error reporting systems should be voluntary. BMJ 320(7237), 728–729 (2000)
26. Davis, F.D.: Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly 13(3), 319–340 (1989)
27. Nielsen, J.: Usability Engineering. Morgan Kaufmann Publishers, San Francisco (1994)
28. Hua, L., Gong, Y.: Developing a User-centered Voluntary Medical Incident Reporting System. Stud. Health Technol. Inform. 160, 203–207 (2010)
29. Gong, Y.: Data Consistency in a Voluntary Medical Incident Reporting System. Journal of Medical Systems (2009)
30. Gong, Y.: Terminology in a Voluntary Medical Incident Reporting System: a Human-Centered Perspective. In: ACM International Health Informatics Symposium (2010)
31. Dollarhide, A.W., et al.: Use of a handheld computer application for voluntary medication event reporting by inpatient nurses and physicians. Journal of General Internal Medicine 23(4), 418–422 (2008)
32. Nuckols, T.K., et al.: Comparing process- and outcome-oriented approaches to voluntary incident reporting in two hospitals. Joint Commission Journal on Quality & Patient Safety 35(3), 139–145 (2009)
33. Anderson, J.G., et al.: Reporting trends in a regional medication error data-sharing system. Health Care Management Science 13(1), 74–83 (2010)
Technology-Based Decision-Making Support System

Hanmin Jung, Mikyoung Lee, Pyung Kim, and Won-Kyung Sung

KISTI, 52-11 Eueon-dong, Yuseong-gu, Daejeon, Korea 305-806
[email protected]
Abstract. This paper describes a decision-making support system focused on technologies, R&D agents, and R&D results. To deal with heterogeneous literature and metadata, we introduce text mining and Semantic Web-based service platforms. InSciTe, the decision-making support system we developed, provides a thorough process covering analysis as well as ETL, verifies search and analysis results, connects its information with Semantic Web open sources at the RDF level, and generates automatic summary reports. The system is significant in that it was implemented about a year earlier than similar projects such as CUBIST and FUSE. Keywords: Decision-Making Support, Text Mining, Semantic Web, Technology Intelligence Service.
1 Introduction

Thanks to the dramatic development of the Web, the range of use of information in services based on technological information is changing gradually. Information is usually represented as metadata, which can be defined as data about data. There are two kinds of metadata: explicit metadata, the formal documentation of resources, and implicit metadata, automatically created descriptions of resources. Whereas in the past explicit metadata were loaded into information analysis tools such as VantagePoint and Thomson Data Analyzer [1] [2] and human analysts provided insights to support decision-making processes under a clear goal, recent projects such as CUBIST, CTI, and FUSE have tended toward the additional use of implicit metadata hidden in documents, using automatic language processing techniques such as text mining, information extraction, and semantic annotation [3] [4] [5]. This tendency stems partly from the dramatic increase in Web documents, but neither explicit nor implicit metadata alone is sufficient to support decision-making. Although we know implicit metadata is a very useful resource for gaining insights, it is difficult to discover meaningful data, transfer it into structured information, and further merge it with explicit metadata in a uniform way. A case study on the ACRC1 shows this difficulty clearly: only 23% of civil complaints are currently analyzed by human analysts, and the volume is too large to be interpreted by humans alone. Thus
1 The Anti-Corruption & Civil Rights Commission.
the results produced by the analysts fall far short of a comprehensive and systematic analysis. This also often leads to mistrust of the results, owing to large variation in their quality, which is mainly caused by differences in the analysts' experience levels.
Fig. 1. Process for Transferring Implicit Metadata into Structured Information2
We would like to describe how explicit and implicit metadata are merged in an information infrastructure to implement a decision-making support system. Semantic Web technologies such as ontology models and reasoning were introduced to provide this infrastructure, and text mining technology to acquire structured information from textual documents. The system includes many service components that interact with search and reasoning engines.
2 System Description

The growth and availability of data and, therefore, our need to consider it in decision-making and planning is growing exponentially [6]. The purpose of this study is thus to support decision-making, especially the establishment of R&D strategies, by adopting ontologies and implicit and explicit metadata based on technology-oriented information acquired from patents and academic papers, and to use a Semantic Web-based service platform to implement them as an integrated system called InSciTe3 [7]. To achieve this goal, the following preconditions are necessary:

• Bring relevant information and analytics together and present them so as to help human decision-makers improve the quality of their decisions.
• Require ETL (extract, transform, and load) on different kinds of literature, such as patents and academic papers, and further resolve semantic as well as syntactic differences.

2 http://www.monrai.com/products/cypher/img/ad-framework.gif
3 Intelligent Science and Technology Service.
• Use semantic annotations and a Semantic Web-based service platform to perform the ETL process effectively.

In the decision-making process for establishing R&D strategies, the key elements are technologies, R&D agents (researchers, institutes, and nations), and R&D results (patents and academic papers). In particular, as the medium linking R&D agents and R&D results, technologies serve to determine the direction of R&D strategy. This study focuses on effectively supporting decision-making by extracting the relationships among technologies from academic papers and patents and, with technologies at the center, analyzing the activities of R&D agents from the perspectives of cooperation and competition. InSciTe, the technology intelligence4 service we developed, differs considerably from VantagePoint, a representative analyzer of information such as patents and academic papers, in data size, analysis cost, user expertise, and focus strength. A detailed comparison of the two is shown in Table 1. As shown in Fig. 2, InSciTe lies at the 'Decision Support' level. This level requires a text mining platform for extracting meaningful information from scientific literature as a preceding task, and a service platform that can deal with both explicit and implicit metadata. We use SINDI5 (see Fig. 3) as a tool to acquire technical terms and the relations among them. The major functions of InSciTe can be summarized as follows:

• Providing a thorough process covering analysis as well as ETL6 by combining text mining and Semantic Web technologies
• Verifying search and analysis results by applying reasoning verification to show how the results were induced
• Connecting with Semantic Web open sources at the RDF level for enhanced information accessibility (e.g., Linked Data, Open Calais)
• Blending heterogeneous sources for multi-faceted explanation (e.g., academic view vs. business view)
• Generating summary reports through automatic processing with hierarchical condition checks (e.g., technology reports and institution reports)

The data set used to implement InSciTe encompasses about 336,000 academic papers, over 400,000 patents, and about 70,000 technological terms in the field of green technology. They are loaded into the following ontology model. It includes a Topic class representing the technical domain; Person, Institution, and Nation classes
4 Technology intelligence can be defined as an activity that enables companies to identify the technological opportunities and threats that could affect the future growth and survival of their business (http://en.wikipedia.org/wiki/Technology_intelligence). It slightly differs from competitive intelligence, which widely covers the action of defining, gathering, analyzing, and distributing intelligence about products, customers, competitors, and any aspect of the environment needed to support executives and managers in making strategic decisions for an organization (http://en.wikipedia.org/wiki/Competitive_intelligence), in that technology intelligence focuses on technology-oriented analysis rather than overall analysis.
5 Scientific INformation DIscovery Platform.
6 Extract, Transform, and Load.
denoting research agents; and Patent and Article classes representing research results. It consists of 17 classes, 57 datatype properties, and 37 object properties.

Table 1. Detailed Comparison of InSciTe and VantagePoint

Criterion | InSciTe | VantagePoint
Data size | tens of millions of records | 20,000 records
Target users | planners, experts, chief officers, et al. | analysts, consultants
DB | metadata, full text (DB2OWL) | bibliographic database (import filter)
Dimension of analysis | multi-dimensional | 2-dimensional (co-occurrence matrices, maps, and networks)
Text mining level | entity/relation extraction | keyword extraction
Service type/method | canned services; pull and push services | DIY, scripting; pull services
Type | Web-based | stand-alone
Others | ontology model | expectancy value using the Bernoulli process

Fig. 2. Value Pyramid with the Difficulty Level of Implementation (Modified from [8])
Fig. 3. SINDI Platform
Fig. 4. Ontology Model for InSciTe
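To illustrate the flavor of such an ontology-backed store of explicit and implicit metadata, the sketch below builds a few RDF triples with rdflib and queries them with SPARQL. The namespace, class, and property names are hypothetical; the actual InSciTe vocabulary is not published in this paper.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical namespace and terms, for illustration only.
EX = Namespace("http://example.org/inscite#")
g = Graph()

# A technology topic linked to a research result and an R&D agent.
g.add((EX.GreenTech01, RDF.type, EX.Topic))
g.add((EX.Patent42, RDF.type, EX.Patent))
g.add((EX.Patent42, EX.hasTopic, EX.GreenTech01))
g.add((EX.InstituteA, RDF.type, EX.Institution))
g.add((EX.Patent42, EX.assignedTo, EX.InstituteA))
g.add((EX.Patent42, EX.filingYear, Literal(2009)))

# Which institutions hold patents on a given topic?
query = """
SELECT ?inst WHERE {
    ?p <http://example.org/inscite#hasTopic> <http://example.org/inscite#GreenTech01> .
    ?p <http://example.org/inscite#assignedTo> ?inst .
}
"""
for row in g.query(query):
    print(row.inst)
```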
As representative services that support decision-making, there are: the Agent/Technology Map, which shows the correlation between R&D agents and related technologies for a given technology from the perspectives of cooperation and competition (see Fig. 5); the Technology Trends Map, which shows the R&D direction of representative agents for a given technology; the Agent Network, which shows R&D agents' cooperation behaviors at both abstract and concrete
levels; and the Trends/Institute Summary Report, which automatically summarizes technological trends and research institutes' R&D information.
Fig. 5. InSciTe Service Screen (Left: Agent/Technology Map, right: Agent Network)
3 Conclusion

This paper reviewed trends in information services and emphasized the importance of using both explicit and implicit metadata to support decision-making. Our decision-making support system consists of two platforms to achieve this goal: a text mining platform and a Semantic Web-based service platform. Future work includes constructing and obtaining test suites for a sharable test environment and evaluating the usefulness of the system with human decision makers and research scientists.
References

1. http://www.thevantagepoint.com/
2. http://thomsonreuters.com/products_services/legal/legal_products/intellectual_property/Thomson_Data_Analyzer
3. http://kn.theiet.org/news/sep10/cubist.cfm
4. http://cisti-icist.nrc-cnrc.gc.ca/eng/ibp/cisti/newsletters/cisti-news/2009april.html
5. http://www.iarpa.gov/solicitations_fuse.html
6. McComb, D.: The CIO's Guide to Semantics v3 (2009), http://semanticarts.com/wordpress/wp-content/uploads/2011/01/The%20CIO's%20Guide%20to%20Semantics%20v3.pdf
7. Lee, S., Lee, M., Kim, P., Jung, H., Sung, W.-K.: OntoFrame S3: Semantic Web-Based Academic Research Information Portal Service Empowered by STAR-WIN. In: Aroyo, L., Antoniou, G., Hyvönen, E., ten Teije, A., Stuckenschmidt, H., Cabral, L., Tudorache, T., et al. (eds.) ESWC 2010. LNCS, vol. 6089, pp. 401–405. Springer, Heidelberg (2010)
8. Bousfield, D., Fooladi, P.: Scientific, Technical & Medical Information: 2009 Final Market Size and Share Report. Outsell Inc. (2010)
Economic Analysis of SON-Enabled Mobile WiMAX

Seungjin Kwack1, Jahwan Koo2, and Jinwook Chung3

1 NMS Lab, Network Department, Telecommunication Division, Samsung, 416 Maetan-3Dong, Yeongtong-Gu, Suwon-Si, Gyeonggi-Do, Korea
[email protected]
2 R&D Center of Mirai Huson, 2FL, Dasung Bldg, 9-3 Jamwon-dong, Seocho-gu, Seoul, Korea
[email protected]
3 Dept. of Information and Communication Engineering, Sungkyunkwan University, 300 ChunChun-Dong, JangAn-Gu, Suwon, Korea
[email protected]
Abstract. Even though mobile communication traffic continues to grow rapidly, revenue does not increase commensurately. Hence, mobile communication network operators are seeking technologies and strategies to maintain qualitative network services while reducing Operational Expenditure (OpEx). Self Organization Network (SON) technology is one effective way to reduce OpEx. This paper presents an economic analysis of SON-enabled Mobile WiMAX. We define Network Propagation Models (NPM), introduce feasible SON use cases that can reduce OpEx efficiently, select the OpEx factors that these use cases can reduce, and propose mathematical expressions for revenue, CapEx, OpEx, CF, DCF, and NPV. For the analysis, we consider a sample site and perform cost and financial analyses comparing the situation before and after SON deployment. As a result, total OpEx decreases by 69% at newly added sites and 89% at traditional sites, and profits arrive earlier. SON technology can thus achieve substantial OpEx reductions in network operational tasks. Keywords: Mobile WiMAX, Capital Expenditure (CapEx), Operational Expenditure (OpEx), Net Present Value (NPV), Self Organization Network (SON).
1 Introduction

The volume of mobile data traffic is increasing rapidly, not only because consumption of operators' richer mobile-service offerings, such as mobile entertainment, multimedia services, and enterprise services, is growing quickly, but also because mobile communication networks are being deployed widely. Figure 1 shows that revenue does not increase as much as data traffic, and that the mobile communication trend is changing from voice-dominant to data-dominant. This indicates that mobile communication
network service providers must spend more and more Capital Expenditure (CapEx) and Operational Expenditure (OpEx) to guarantee compelling QoS and meet operators' richer requirements. In mobile communication networks, revenue depends highly on operational efficiency, since OpEx generally accounts for 24% of a typical wireless operator's revenue [1]. Mobile communication network service providers are therefore seeking technologies and strategies to increase their revenue by substantially reducing OpEx. Self Organization Network (SON) technology is introduced not only to reduce OpEx by diminishing human involvement in network operational tasks but also to optimize network efficiency and service quality [2].
Fig. 1. Mobile traffic volume and revenue (Light Reading, 2010)
This paper describes a cost and financial analysis of SON-enabled Mobile WiMAX using the Network Propagation Model (NPM) defined in Section 3. The cost analysis covers revenue, CapEx, and OpEx; the financial analysis covers cash flow (CF), Discounted Cash Flow (DCF), and Net Present Value (NPV).
2 Related Work

2.1 End-to-End Mobile WiMAX Network Architecture

Mobile WiMAX services require the installation of Access Service Network (ASN) and Connectivity Service Network (CSN) facilities to cover the cell sites. Figure 2(a) illustrates the physical topology of Mobile WiMAX, which mainly includes user terminals (MS), the ASN, and the CSN. The ASN provides the means to connect mobile subscribers using the OFDMA air link to the IP backbone with session continuity [3]. The CSN represents a set of network functions providing IP connectivity services to WiMAX subscribers. Figure 2(b) illustrates the IP-based Mobile WiMAX architecture [4]. Once installed, the CSN rarely changes over time and needs only some core network devices for connectivity services. The ASN, in contrast, needs a large number of network access devices, which are more movable and changeable than the CSN's, to guarantee
outdoor coverage. The ASN therefore incurs more operational cost than the CSN, and if SON technologies are deployed in the ASN, sizable OpEx can be saved. We therefore include the ASN and MS parts in the financial analysis of SON-enabled Mobile WiMAX.
(a) Physical topology
(b) IP–based architecture
Fig. 2. Mobile WiMAX architecture
2.2 Introduction to SON

The main objective of SON is to achieve substantial OpEx reductions by diminishing human involvement in network operational tasks, and to optimize network capacity, coverage, and service quality. Following the SOCRATES (Self-Optimisation and self-ConfiguRATion in wirelEss networkS) project, SON comprises self-configuration (SC), self-optimization (SO), and self-healing (SH). The general idea is to integrate network planning, configuration, and optimization into a single, mostly automated process requiring minimal manual intervention, as shown in Fig. 3 [5]. First, the SC function enables fast installation and deployment of a Mobile WiMAX system and reduces human involvement and deployment time, e.g., through SC mechanisms without dedicated backhaul interfaces [5]. Second, the SO function is crucial for the operational state of mobile networks: it optimizes operational algorithms, e.g., the MLB algorithm [5], and parameters for antennas, resource management, power settings, etc. SO scenarios are introduced in response to changes in network, traffic, and environmental conditions [6]. Third, the SH function assists network operators in recovering the network when it collapses or suffers sudden failures for unexpected reasons; it reduces OpEx and makes the Mobile WiMAX system stable. Mobile communication networks are, however, complex systems with a multitude of vendor control mechanisms and time-varying parameters, with intricate interdependencies among them, so designing effective and dependable SON functions remains a major challenge.
(a) Without SON
(b) With SON
Fig. 3. Network operations
This paper analyzes the SON-enabled Mobile WiMAX network from cost and financial perspectives.

2.3 Cost Analysis

Cost analysis covers revenue, CapEx, and OpEx. These factors are well known in accounting and economics; we describe the factors included in our calculations as follows [12]. Revenue is the income that a company receives from its network business activities during a period t; it includes call termination charges, the number of users, etc. CapEx is the most closely watched metric for determining the direction and level of investment that telecommunications carriers are making in network equipment and services; it includes costs for towers, network equipment, spectrum licensing, etc. OpEx is the operational expense of maintenance in time period t; it includes employee salaries, the cost of power, etc.

2.4 Financial Analysis

CF is a basic financial measure of a company's financial health that equals cash receipts minus cash payments over a given period of time. In this paper, it is measured over a specified, finite period of time using CapEx, OpEx, and revenue. NPV is widely used in capital budgeting to analyze the profitability of an investment or project; NPV analysis compares the value of money today to the value of money in the future. If the NPV of a prospective project is positive, it should be accepted; if the NPV is negative, the project should be rejected, because cash flows
will also be negative, i.e., the project will lose money. An NPV equal to zero indicates that the project would provide no overall profit and no loss.
3 Financial Analysis of Mobile WiMAX

3.1 Definition of NPM

To obtain realistic figures, we define a scenario model, the NPM, for Mobile WiMAX. The NPM has three types. The first is the Fixed Expansion Network Operation Type (FENOT): operation of Mobile WiMAX in a specific geographic area that does not grow during the period under study. The second is the Linear Expansion Network Operation Type (LENOT): operation of Mobile WiMAX in a geographic area that grows at a linear rate over time. The third is the Exponential Expansion Network Operation Type (EENOT): operation of Mobile WiMAX in a geographic area that grows at an exponential rate over time. Figure 4 illustrates LENOT: in the initial year, the network provider operates Mobile WiMAX service in one Operational Model (OM), i.e., a single geographic service area; from the second year onward, Mobile WiMAX services are produced in geographic areas growing at a linear rate. Table 1 defines the symbols of the NPM types, where OM denotes a unit network type corresponding to a single geographic service area. Table 2 defines the propagation types of the NPM, describing the geographic service area of each type over a period of time n.
Fig. 4. Linear expansion network operational type model

Table 1. Symbol definitions of network expansion types

OM: Operational Model
FENOT: Fixed Expansion Network Operation Type
LENOT: Linear Expansion Network Operation Type
EENOT: Exponential Expansion Network Operation Type

Table 2. Definition of network expansion types

FENOT = OM
LENOT = n · OM
EENOT = 2^n · OM
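The three expansion types translate directly into site counts per period. The sketch below makes the convention explicit; whether LENOT starts counting at one area or zero in year 0 is an assumption on our part.

```python
def sites(npm_type, t):
    """Number of operational-model (OM) areas in year t (Table 2).

    FENOT: one fixed area; LENOT: linear growth (n * OM);
    EENOT: exponential growth (2**n * OM).
    """
    if npm_type == "FENOT":
        return 1
    if npm_type == "LENOT":
        return t + 1          # assumed: one area in year 0, one added per year
    if npm_type == "EENOT":
        return 2 ** t
    raise ValueError(npm_type)

for t in range(5):
    print(t, sites("FENOT", t), sites("LENOT", t), sites("EENOT", t))
```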
3.2 Selection of SON Use Cases

SON Use Cases and Standardization. We need to know the SON use cases and processes to evaluate the effectiveness of OpEx reduction. According to the Network Working Group (NWG) in WiMAX Forum Release 1.6 and 802.16m, the SON technology of Mobile WiMAX is defined only for femtocells [7]. We can expect the SON technologies of Mobile WiMAX to follow LTE SON standardization. In this paper, we analyze the SON technology of 3GPP standardization: 3GPP TR 36.902 describes the SON concepts, and TS 32.500 the SON use cases [8].

Introduction to SON Use Cases. In 3GPP TR 36.902, the SON use cases are (1) coverage and capacity optimization, (2) energy saving, (3) interference reduction, (4) automated configuration of physical cell identity, (5) mobility robustness optimization, (6) mobility load balancing optimization, (7) RACH load optimization, (8) automatic neighbor relation function, and (9) inter-cell interference coordination [9]. Detailed SON usages are not yet fully defined in 3GPP, and there are few deployments of SON technology in commercial network products. We therefore chose the SON use cases expected to reduce OpEx most efficiently: (1) coverage and capacity optimization, (2) energy saving, (3) mobility load balancing optimization, and (4) inter-cell interference coordination. These use cases efficiently reduce managed services, which make up roughly half of OpEx. To obtain realistic figures, we defined a business case for a Mobile WiMAX rollout in Iran: we chose a sample site in Iran and analyzed the major OpEx components, which consist of (1) managed service with field operation and maintenance and radio network management, (2) optimization service during the service period, (3) site survey and design for the ACR, (4) network optimization service for the RAS, and (5) RF planning service.

3.3 Analysis Methods

Revenue. We define the revenue R(t) over a period of time and calculate it for each NPM type. For FENOT, we assume an initial revenue and a Revenue Growth Rate (RGR), and calculate revenue by multiplying the previous revenue by the RGR:
R(0) = InitialRevenue    (1)
R(t) = R(t - 1) × RGR(t)    (2)
For LENOT, revenue is separated into two parts: the revenue of sites that already provide Mobile WiMAX services, and the revenue of newly added sites. The former is calculated by multiplying the previous year's revenue by the RGR; the latter by multiplying the initial revenue by the number of linearly added sites:
R(0) = InitialRevenue    (3)
R(t) = R(t - 1) × RGR(t) + R(0) × LinearExpandedSites    (4)
For EENOT, revenue is likewise separated into two parts: already-operated sites, calculated as in LENOT, and newly added sites, calculated as follows:
R(0) = InitialRevenue    (5)
R(t) = R(t - 1) × RGR(t) + R(0) × ExponentialExpandedSites    (6)
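As a sketch of how Equations (3)-(4) unfold year by year under LENOT, the function below grows existing revenue by the RGR and credits each newly added site with the initial revenue. The initial revenue and RGR series are placeholders, not the case-study figures.

```python
def revenue_lenot(initial_revenue, rgr, years):
    """Yearly revenue under LENOT (Equations (3)-(4)).

    initial_revenue : revenue R(0) of a single site
    rgr             : revenue growth rates RGR(t), indexed by year (rgr[0] unused)
    years           : number of years to simulate (one site added per year)
    """
    r = [initial_revenue]
    for t in range(1, years):
        # existing revenue grows by RGR(t); one new site adds R(0)
        r.append(r[-1] * rgr[t] + initial_revenue)
    return r

print(revenue_lenot(100.0, [1.0, 1.05, 1.10, 1.08, 0.95], 5))
```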
CapEx. CapEx takes into account installation fees, the number of users, and the costs of the ASN G/W, BTS, installation materials, M/W and transmission system, power supply unit, CPE, engineering services, and civil work [9]:

CapEx = CapEx_Equipment + CapEx_CPE + CapEx_CivilWork    (7)
CapEx_Equipment = Σ CapEx(ACR + BTS + PowerSupply + Microwave + Transmission)    (8)
CapEx_CPE = Σ CapEx(Outdoor + Indoor + MultiIndoor + USBDongle)    (9)
CapEx_CivilWork = Σ CapEx(Rooftop + Greenfield + Shelter + Installation)    (10)
OpEx takes into account the costs of the project management service, training service, managed service and technical support, ASN G/W, NMS, and engineering services. It is calculated as follows:

OpEx = OpEx_Management + OpEx_Training + OpEx_Engineering                  (11)

OpEx_Management = Σ OpEx(Project + NMS + Shelter)                          (12)

OpEx_Engineering = Σ OpEx(TestCost + SiteSurvey + NetOptimize + RFPlanning) (13)
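Since Eqs. (7)–(13) are plain sums of component costs, they translate directly into code. A minimal sketch follows; every cost figure is an invented placeholder.

def total(components):
    return sum(components.values())

capex = {
    "Equipment": total({"ACR": 120.0, "BTS": 450.0, "PowerSupply": 60.0,
                        "Microwave": 80.0, "Transmission": 90.0}),       # Eq. (8)
    "CPE": total({"Outdoor": 30.0, "Indoor": 25.0,
                  "MultiIndoor": 20.0, "USBDongle": 15.0}),              # Eq. (9)
    "CivilWork": total({"Rooftop": 40.0, "Greenfield": 70.0,
                        "Shelter": 35.0, "Installation": 50.0}),         # Eq. (10)
}
total_capex = total(capex)                                               # Eq. (7)

opex = {
    "Management": total({"Project": 55.0, "NMS": 20.0, "Shelter": 10.0}),  # Eq. (12)
    "Training": 15.0,
    "Engineering": total({"TestCost": 12.0, "SiteSurvey": 8.0,
                          "NetOptimize": 18.0, "RFPlanning": 9.0}),      # Eq. (13)
}
total_opex = total(opex)                                                 # Eq. (11)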
CF equals cash receipts minus cash payments over a given period of time. In this paper, CF is calculated for three cases: FENOT, LENOT, and EENOT. In the FENOT case, OpEx is unchanged because the operating sites are simply maintained, and we assume that CapEx is fixed at the initial cost [10]:

CF(t) = Revenue(t) − CapEx(1) − OpEx(t)    (14)

In the LENOT and EENOT cases, CapEx and OpEx increase with the newly added sites, so we assume:

CF(t) = Revenue(t) − CapEx(t) − OpEx(t)    (15)
DCF is the major factor in calculating the NPV. After calculating the NPV with discount rates of 3%, 6%, 9%, and 12%, we found that the discount rate has almost no effect on the NPV value, so we assume a discount rate of 10% and calculate the DCF as follows [11]:

DCF(t) = P(t, p) / (1 + r)^t    (16)

where P(t, p) denotes the cash flow in period t and r is the discount rate. NPV is the future stream of benefits and costs converted into equivalent values today. Economic measures based on the NPV are defined to assess the financial viability of potential network designs [13]. The NPV is used within the mathematical optimization framework to produce cost-effective deployments that maximize economic performance while maintaining the technical constraints on the network. We calculate the NPV as follows [11]:

NPV = Σ_{t=1}^{NetworkPeriods} DCF(t)    (17)
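A minimal Python sketch of the CF/DCF/NPV chain of Eqs. (14)–(17), assuming a 15-year network period and the 10% discount rate adopted above; the revenue and cost streams are placeholders.

def npv(revenue, capex, opex, r=0.10):
    # CF(t) = Revenue(t) - CapEx(t) - OpEx(t)            (Eqs. 14-15)
    cf = [rev - cap - op for rev, cap, op in zip(revenue, capex, opex)]
    # DCF(t) = CF(t) / (1 + r)^t, summed over the period (Eqs. 16-17)
    return sum(c / (1.0 + r) ** (t + 1) for t, c in enumerate(cf))

years = 15
revenue = [1.0e6 * 1.05 ** t for t in range(years)]  # placeholder revenue stream
capex = [5.0e6] + [0.0] * (years - 1)                # FENOT: initial CapEx only (Eq. 14)
opex = [1.2e5] * years                               # flat OpEx placeholder
print(npv(revenue, capex, opex))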
4 Financial Analysis Results of Mobile WiMAX

We choose a sample site in Iran, apply the NPM to it, and analyze the financial factors: revenue, CapEx, OpEx, NPV, etc. We then apply the SON algorithms to the same sample site and compare the results before and after SON.

4.1 Cost Analysis

We assume the cost analysis factors to be revenue, CapEx, and OpEx, defined as follows.

The Analysis of Total CapEx Result. The analysis of the total CapEx shows that equipment cost and installation fees account for 70% of the total CapEx. In this paper, we treat CapEx as an initial cost, because we are only interested in OpEx reduction.

The Analysis of Total OpEx Result. The OpEx calculation shows that 51.67% of the total OpEx is spent on ‘Managed Service’ and ‘Technical Service’, as shown in Fig. 5. ‘Managed Service’ includes ‘Field Operation and Maintenance’, ‘Project Management’, ‘Radio Network Management’, and ‘Optimization Management’. We find that network service providers spend most of their expenses on network management and optimization. In this paper, we assume that the ACE S/W and WSM S/W costs are included in the initial OpEx; the others are included in the maintenance expense.

RGR. To calculate the revenue, we assume that the RGR follows a Gaussian distribution with a standard deviation of 64%. The revenue declines steadily during the first five years, increases rapidly from years 5 to 10, and decreases from years 10 to 15.
4.2 Cost Analysis of SON-Enabled Algorithms
After the analysis, we find that the initial OpEx of a newly added site decreases by about 69% of the total OpEx after SON deployment, and the OpEx of the traditional network decreases by about 84%.

4.3 The Financial Analysis of SON-Enabled Algorithms
We find that the CF and revenue curves of both FENOT and LENOT (Fig. 5) follow the Gompertz model. FENOT provides a positive NPV and payback in less than 5 years, and LENOT provides a positive NPV and payback in fewer than 8 years. In other words, a network service provider turns a profit after 5 years using FENOT and after 8 years using LENOT. With EENOT, however, the network is not profitable over the 15-year period, because excessive CapEx and OpEx exceed revenue in the initial period and these losses accumulate over the total period.
Fig. 5. Comparison of the financial analysis with SON enabled and disabled (panels: FENOT, LENOT)
5 Conclusion

Network operators manage multi-vendor equipment to provide flexible network services, and they spend considerable OpEx on the various vendors, heterogeneous networks, and large numbers of network elements. SON promises to reduce OpEx by diminishing human involvement in network operational tasks. In this paper, we identified the factors that reduce OpEx through SON algorithms, selected the feasible SON use cases, and analyzed the cost and financial benefits. As a result, 69% and 89% of the total OpEx are saved at newly added sites and traditional sites, respectively. Moreover, profits arrive earlier: network service providers reach profitability 3 years earlier with FENOT than with the other network expansion types. If network service providers use SON technologies, deploying Mobile WiMAX services should be beneficial. However, there are still few SON-enabled commercial network deployments, so further research on the realistic deployment of SON is needed.
References

1. NEC Corporation: Self Organizing Network: NEC's Proposals for Next-Generation Radio Network Management. NEC Corporation (2009)
2. SOCRATES: Self-Optimisation and self-ConfiguRATion in wirelESs Networks. SOCRATES project consortium, Netherlands (2008)
3. Ergen, M.: The Access Service Network in WiMAX: The Role of ASN-GW (2008), http://www.mustafaergen.com/asn-gw.pdf
4. Lin, L., Chen, K.-C.: Introduction to Mobile WiMAX. In: Chen, K.-C., de Marca, J.R.B. (eds.) Mobile WiMAX, p. 10. John Wiley & Sons, Chichester (2008)
5. Hu, H., Zhang, J., Zheng, X., Yang, Y., Wu, P.: Self-configuration and Self-optimization for LTE Networks. IEEE Communications Magazine (2010)
6. Xu, L., Sun, C., Li, X., Lim, C., He, H.: The Methods to Implement Self Optimisation in LTE System. In: ICCTA, IEEE Conferences (2009)
7. NGMN: Use Cases Related to Self Organising Network, Overall Description (May 2007)
8. 3GPP: Self-Organizing Networks (SON): Concepts and Requirements. TS 32.500 v8.0.0 (2008)
9. 3GPP: Self-configuring and Self-optimizing Network (SON): Use Cases and Solutions. TR 36.902 v9.1.0, Release 9 (2010)
10. Celentano, J.M.: Carrier Capital Expenditures. IEEE Communications Magazine (2008)
11. Hurley, S., Allen, S., Ryan, D., Taplin, R.: Modelling and Planning Fixed Wireless Networks. Wireless Networks. Springer, Heidelberg (2010)
12. Mishra, S.M., Hwang, J., Filippini, D., Moazzami, R., Subramanian, L., Du, T.: Economic Analysis of Networking Technologies for Rural Developing Regions. Technical Report UCB/CSD-05-1411, EECS, UC Berkeley (2008)
13. Riggs, H.E.: Financial and Economic Analysis for Engineering and Technology Management, 2nd edn. Wiley Series in Engineering and Technology Management. Wiley-Interscience, Hoboken (2004)
ICT-Enabled Business Process Re-engineering: International Comparison

Ya-Ching Lee 1, Pin-Yu Chu 2, and Hsien-Lee Tseng 3

1 Institute of Communications Management, National Sun Yat-sen University, 70, Lieng-Hai Rd., Kaohsiung, Taiwan
[email protected]
2 Department of Public Administration, National Chengchi University, 64, Sec. 2, Zhi-Nan Rd., Taipei, Taiwan
[email protected]
3 Institute of Public Affairs Management, National Sun Yat-sen University, 70, Lieng-Hai Rd., Kaohsiung, Taiwan
[email protected]
Abstract. The purpose of this study is to investigate ICT impacts on BPR. By comparing data from the United States and Chile, it is found that ICT adoption affects BPR and that BPR influences business performance. The impacts of ICT adoption on BPR and the influence of BPR on profit differ among countries.

Keywords: Business process reengineering, Structural equation modeling.
1 Introduction

There are close relationships between the adoption of information and communication technologies (ICTs) and business process reengineering (BPR). The literature has shown that ICTs change business practices to re-optimize business processes and improve efficiency and performance (for example, Sarkar & Jagjit, 2006; Ziaul, Faizul, & Ken, 2006). However, most research focuses on only one or two enterprise applications, such as electronic commerce or enterprise resource planning. Lee, Chu, and Tseng (2009) argue that previous studies do not offer a whole picture of the mixed effects of various ICTs on BPR. Therefore, this paper explores the national differences in ICT adoption and its impacts on BPR and performance.
2 Conceptual Background

Business process reengineering is an approach to re-optimizing business processes for corporations, obtaining competitive advantage, and enhancing business performance, in terms of cost savings, quality breakthroughs, better customer services, time reduction, and revenue increases (Morris & Brandon, 1993). ICTs are usually influential in reforming business practices. Researchers find that ICT is an important enabler of
BPR because ICTs enable the distribution of power, function, and control (Morton, 1996) and make information collection and analysis, the development of strategic vision, and teamwork efficient (Attaran, 2004; Akhavan, Jafari, & Ali-Ahmadi, 2006; Freeman, 2000; Venkatraman, 1991). Lee et al. (2009) suggest that ICTs change BPR in three dimensions: workplace, workforce, and structure. Time and space limitations disappear because of ICTs: through telecommuting from home or other places, ICTs provide opportunities for direct, cross-unit collaboration among geographically dispersed business units (Sarkar & Jagjit, 2006). The second dimension in which ICT impacts BPR is changes in the workforce. ICTs make a growing spread of automation in companies available, and automation eventually reduces the need for human resources (Wymer & Regan, 2005; Sarkar & Jagjit, 2006; Ziaul et al., 2006; Lee et al., 2009). Organizational structural reforms become necessary to align with ICT adoption (Venkatraman, 1991). Business units can thus reduce mediation processes and increase cross-unit collaboration because ICTs allow information to be shared and exchanged (Teng, Grover & Fiedler, 1994; Sarkar & Jagjit, 2006; Ziaul et al., 2006). As a result, organizational hierarchies become flatter, and the degree of centralization of decision making changes (Orman, 1998). Past research mostly focuses on IT performance (Santhanam & Hartono, 2003; Jin, 2006; Hendricks, Singhal, & Stratman, 2007; Francalanci & Morabito, 2008). This research seldom links IT-driven performance with BPR, and thus a consensus on performance measurement standards is still far from being reached. In addition, the performance of IT-driven and ICT-enabled BPR is not consistent. A great number of firms implementing multi-year, multi-million dollar ERP projects do not reap benefits from ICTs (Dryden, 1998) or gain small benefits relative to their ICT investments (Lyytinen & Robey, 1999; Carr, 2003; Na et al., 2004). It is therefore important to investigate the business performance resulting from ICT-enabled BPR. In this paper, we investigate the relationships among ICT adoption, ICT-enabled BPR, and performance.
3 Hypotheses and Research Questions

• Hypotheses 1a–1c: Resource planning infrastructure (RPI) has positive and significant impacts on workplace reforms (WP) (1a), workforce reforms (WF) (1b), and organizational structure reforms (OS) (1c).
• Hypotheses 2a–2c: E-commerce infrastructure (ECI) has positive and significant impacts on workplace reforms (WP) (2a), workforce reforms (WF) (2b), and organizational structure reforms (OS) (2c).
• Hypotheses 3a–3c: Workplace reforms (WP) (3a), workforce reforms (WF) (3b), and organizational structure reforms (OS) (3c) positively affect business profit (Profit).
• Hypothesis 4: Workplace reforms (WP) positively affect workforce reforms (WF).
• Hypothesis 5: Workforce reforms (WF) positively affect organizational structure reforms (OS).
Fig. 1. The ICT-enabled BPR model: Resource Planning Infrastructure (RPI) and E-commerce Infrastructure (ECI) drive Workplace Reform (WP), Workforce Reform (WF), and Organizational Structure Reform (OS), which in turn affect Profit
4 Results

The research responses are from chief information officers or senior information systems managers in the United States and Chile. The sample sizes are 248 from the United States and 301 from Chile. The SPSS 12.0 and AMOS 16.0 software packages were used for the statistical analysis. Scale reliability and construct validity were tested with confirmatory factor analysis (CFA). Based on the CFA results, we discarded items that load on multiple constructs or have low item-to-construct loadings (Anderson & Gerbing, 1988). The loadings of items on their respective factors are highly significant (p < 0.01). The goodness-of-fit index (GFI), comparative fit index (CFI), and Bollen's fit index (IFI) range between 0.82 and 0.87 for the two countries (Bollen, 1989). The root mean square error of approximation (RMSEA) values are 0.061 and 0.067. The results show that the data converge and that the fit of the CFA model is appropriate (Bentler, 1995; Bollen, 1989).

4.1 Results for Hypotheses Testing

The path coefficients for the two countries are shown in Table 1. Several notable and unanticipated results are discovered. In the United States, there are two negative significant path coefficients: one from RPI to OS and the other from OS to Profit, with standardized path coefficients of -0.19 and -0.09, respectively. A possible explanation for why RPI does not have a positive impact on OS in the US might be company size. US companies are much bigger than Chilean ones: twenty-five percent of US firms have more than 2,000 employees, while companies with over 1,000 employees account for 18.7% and 8.6% of all companies in the US and Chilean samples, respectively. Large firms, which spend proportionally more on human resources, might see their profits affected.
Table 1. Results of structural equation model analysis of individual country models

Path coefficient   USA       Chile
RPI-WP             0.19*     0.09*
RPI-WF             0.34*     -0.01
RPI-OS             -0.19*    0.09
ECI-WP             0.39*     0.26*
ECI-WF             -0.15     -0.02
ECI-OS             0.24*     0.08
WP-WF              0.37*     0.34*
WF-OS              0.63*     0.43*
WP-PRO             0.16*     0.10*
WF-PRO             0.13*     -0.01
OS-PRO             -0.09*    0.04
χ2                 554.47    530.03
DF                 202       202
p-value            0.00      0.00
GFI                0.84      0.86
RMSEA              0.084     0.074
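Because the model is recursive, indirect effects can be read off Table 1 as products of standardized path coefficients. The short Python sketch below applies this standard decomposition to the US column; the products are derived here only for illustration and are not reported by the authors.

us = {"WP-WF": 0.37, "WF-OS": 0.63, "WP-PRO": 0.16,
      "WF-PRO": 0.13, "OS-PRO": -0.09}

# WP -> Profit: direct path plus the indirect routes via WF and via WF -> OS.
indirect_via_wf = us["WP-WF"] * us["WF-PRO"]                 # 0.048
indirect_via_os = us["WP-WF"] * us["WF-OS"] * us["OS-PRO"]   # -0.021
total_effect = us["WP-PRO"] + indirect_via_wf + indirect_via_os
print(round(total_effect, 3))                                # ~0.187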
5 Conclusion and Discussion

ICTs play a key role in business activities and improve business performance (Brynjolfsson & Hitt, 1996; Kohli & Devaraj, 2003). The literature falls short in examining the moderating effects of BPR and in cross-country comparisons. This paper applies multiple-group structural equation modeling to test the ICT-enabled BPR model and clarify the relationships among ICT adoption, BPR, and performance. The results confirm the findings of previous research that ICT adoption affects BPR and that BPR influences business performance. However, the impacts are not always positive in this research, and both the impacts of ICT adoption on BPR and the influence of BPR on profit differ among countries. In the case of the United States, the resource planning infrastructure and e-commerce infrastructure positively affect workplace reforms. The resource planning infrastructure negatively affects organizational structure reforms despite improving companies' workforce. E-commerce infrastructure positively impacts organizational structure. Profits are positively influenced by workplace reforms and workforce reforms. For Chile, workplace reforms are positively affected by the resource planning infrastructure and e-commerce infrastructure, leading to profit improvements. Though enterprises in
Chile adopt new communication and information exchange tools to accomplish tasks without space constraints, they avoid dramatic workforce and organizational structure reforms, which are more central and evolutionary to business processes (Bhatt, 2000; Venkatraman, 1991; Ziaul et al., 2006). The finding in this study that profits are indirectly affected by workplace reforms also suggests that workplace reforms are more peripheral to business operations. However, the results of the current research demonstrate that firms can take advantage of workplace reforms to make management processes more efficient and to reduce travel and communication costs (Hammer & Champy, 1993; Tippins & Sohi, 2003). The requirement of ICT usage to telecommute and teleconference in interrelated work activities (Hammer & Champy, 1993; Tippins & Sohi, 2003) forces firms to simplify and automate management processes and make them more efficient (Hammer, 1990). As a result, the organizational structure flattens and the involvement of employees in empowerment activities increases. This research also points to the IT paradox: ICT-enabled BPR does not always lead to positive performance (higher profits). This is especially true when the organizational structure or workforce does not align with the adoption of the resource planning infrastructure. The pay-off of ERP implementation may take time to become clear (Sarkar & Jagjit, 2006; Ziaul et al., 2006). Therefore, further investigation is encouraged to examine the impacts of the ERP infrastructure on BPR and performance in the long run. This paper makes several contributions:

1. Empirical evidence is offered that extends the impacts of ICTs on business process reengineering and performance from the firm level studied in past research to the country level.
2. The associations among ICT adoption, BPR, and performance have not previously been strongly connected. The findings in this paper help to explain the complexity of ICT adoption and the prediction of BPR outcomes.
3. The national differences in ICT adoption and its impacts on financial performance are explored.

It is worth noting that ICT-enabled BPR does not necessarily lead to profits. The explanation for this inconsistency with the prior literature might be that the effects of ICT investment take time to appear and that the financial index is not the only indicator for performance estimation. Santhanam and Hartono (2003) believe that future tests of ICT capability have to consider the prior financial performance of firms.
References

[1] Sarkar, N., Jagjit, S.: E-enabled BPR applications in industries: banking and cooperative sector. Journal of Management Research 6, 18–34 (2006)
[2] Ziaul, H., Faizul, H., Ken, C.: BPR through ERP: Avoiding change management pitfalls. Journal of Change Management 6, 67–85 (2006)
[3] Lee, Y.C., Chu, P.Y., Tseng, H.L.: Exploring the relationships between information technology adoption and business process reengineering. Journal of Management & Organization 15, 179–185 (2009)
[4] Morris, D., Brandon, J.: Re-engineering Your Business. McGraw-Hill, New York (1993)
[5] Morton, M.S.: How information technologies can transform organizations. In: Kling, R. (ed.) Computerization and Controversy: Value Conflicts and Social Choices, pp. 148–160. Academic Press, San Diego (1996)
[6] Attaran, M.: Exploring the relationship between information technology and business process reengineering. Information Management 41, 585–596 (2004)
[7] Akhavan, P., Jafari, M., Ali-Ahmadi, A.R.: Exploring the interdependency between reengineering and information technology by developing a conceptual model. Business Process Management Journal 12, 517–534 (2006)
[8] Freeman, R.: The IT Consultant: A Commonsense Framework for Managing the Client Relationship. Jossey-Bass Publications, San Francisco (2000)
[9] Venkatraman, N.: IT-induced business reconfiguration. In: The Corporation of the 1990s: Information Technology and Organization Transformation. Oxford University Press, New York (1991)
[10] Wymer, S., Regan, E.: Factors influencing e-commerce adoption and use by small and medium businesses. Electronic Markets 15, 438–453 (2005)
[11] Teng, J.T.C., Grover, V., Fiedler, K.D.: Business process reengineering: charting a strategic path for the information age. California Management Review 36, 9–31 (1994)
[12] Orman, L.V.: A model management approach to business process reengineering. Journal of Management Information Systems 15, 187–212 (1998)
[13] Santhanam, R., Hartono, E.: Issues in linking information technology capability to firm performance. MIS Quarterly 27, 125–153 (2003)
[14] Jin, H.: Performance implications of information technology implementation in an apparel supply chain. Supply Chain Management: An International Journal 11, 309–316 (2006)
[15] Hendricks, K.B., Singhal, V.R., Stratman, J.K.: The impact of enterprise systems on corporate performance: A study of ERP, SCM, and CRM system implementations. Journal of Operations Management 25, 65–82 (2007)
[16] Francalanci, C., Morabito, V.: IS integration and business performance: The mediation effect of organizational absorptive capacity in SMEs. Journal of Information Technology 23, 297–312 (2008)
[17] Dryden, P.: ERP failures exact high price. Computerworld 32, 16–17 (1998)
[18] Lyytinen, K., Robey, D.: Learning failure in information systems development. Information Systems Journal 9, 85–101 (1999)
[19] Carr, N.G.: IT doesn't matter. Harvard Business Review 81, 41–49 (2003)
[20] Na, K.S., Li, X.T., Simpson, J.T., Kim, K.Y.: Uncertainty profile and software project performance: A cross-national comparison. Journal of Systems and Software 70, 155–163 (2004)
[21] Anderson, J.C., Gerbing, D.W.: Structural equation modeling in practice: a review and recommended two-step approach. Psychological Bulletin 103, 411–423 (1988)
[22] Bollen, K.A.: Structural Equations with Latent Variables. Wiley, New York (1989)
[23] Bentler, P.M.: EQS Structural Equations Program Manual. Multivariate Software, Encino (1995)
[24] Brynjolfsson, E., Hitt, L.: Paradox lost? Firm-level evidence on the returns to information systems spending. Management Science 42, 541–558 (1996)
[25] Kohli, R., Devaraj, S.: Measuring information technology payoff: a meta-analysis of structural variables in firm-level empirical research. Information Systems Research 14, 127–145 (2003)
[26] Bhatt, G.: Exploring the relationship between information technology (IT) and business process reengineering (BPR). Business Process Management Journal 6, 139–163 (2000)
[27] Hammer, M., Champy, J.: Reengineering the Corporation: A Manifesto for Business Revolution. HarperCollins, London (1993)
[28] Tippins, M.J., Sohi, R.S.: IT competency and firm performance: Is organizational learning a missing link? Strategic Management Journal 24, 745–761 (2003)
A Methodology to Develop a Clinical Ontology for Healthcare Business

Mario Macedo 1 and Pedro Isaías 2

1 Escola Superior Tecnologia Abrantes, IPT, Rua Principal, 10-C, Peralva, 2305-516 Paialvo, Portugal
2 Universidade Aberta, Portugal
[email protected], [email protected]
Abstract. The development of clinical ontologies from common clinical data is very important for recording healthcare patient histories, using medical guidelines, and accounting for services. Using terminologies that are already developed and available, such as SNOMED, is a benefit. However, many doctors argue that they prefer to continue using natural language and unstructured text to record patient data; their point of view is that natural language is much more complete and flexible than standardized terminologies. This study intends to show that it is possible to recognize patterns in natural language and identify clinical procedures as they would be written in a normalized language. Another outcome of this study would be precise accountability of healthcare services.

Keywords: Ontology, Medical Guideline, Clinical Natural Language, SNOMED.
1 Introduction

Healthcare organizations are complex structures shaped by two vectors: economic sustainability and quality of services. Economic sustainability covers some main aspects, such as:
• Clinical services;
• Clinical support services;
• Administrative services;
• Auxiliary support services.
The auxiliary services have characteristics identical to those of other organizations and do not pose particular management issues. They can be modeled with BPM (Business Process Management) and BRM (Business Rules Management) tools and quantified with metrics of time, amount, volume, or area. The clinical services, in contrast, are difficult to model and quantify; in general, they have a large intangible component. Several models exist to define the price of healthcare services. One of them, the well-known DRG (Diagnosis Related Group), was developed in the USA by the Health Care Financing Administration.
The DRG establishes a relation between a code, a diagnosis, and a set of procedures for each treated patient. Each DRG code has a price and a case mix index that depends on the variety of the services provided by the hospital. It is important to know whether the procedure workflow is the most efficient and effective way to treat each patient, and clinical benchmarks exist to compare best practices. The Joint Commission on Accreditation of Healthcare Organizations has developed guidelines and quality indicators to evaluate healthcare services. Many software applications use codifications and normalized information. However, much information is scattered across unstructured text and clinical diaries. This unstructured information is a barrier to developing workflows and automatic tools to analyze the data. Some doctors argue that it is not possible to codify all data, so unstructured text is still an important source of information.
2 Ontology Concept and Its Development

According to Gruber (1992), an ontology is a specification of a conceptualization. It is possible to say that an ontology is a representation of knowledge formed by a collection of concepts that represent entities of real life and the associations between them. The types of these associations classify how concepts relate; that is, an ontology can serve as a stereotype of real entities and their behavior. We can also use ontologies to represent knowledge and to conceptualize facts of real life. The representation of an ontology is based on two types of languages: one to represent the concepts and their relationships, and another to represent the constraints on the instances of the ontology. An instance of an ontology is a collection of concepts and associations that represents a fact of real life. An ontology has properties that can be reused, and it can inherit properties from other ontologies. The W3C ontology working group (W3C, 2004) developed the Web Ontology Language to share ontologies on the Web. It is an XML (Extensible Markup Language)-based language that establishes a definition of the classes of objects and their relationships, and it is very suitable for defining structured information across the World Wide Web. The ISO (International Organization for Standardization) formed the TC215 Health Informatics Committee Working Group. So far this group has published 88 standards that cover areas such as data communications, guidelines for terminology development, electronic health record formats, and medical device data communication. Clinical terminologies are collections of concepts associated with certain types of rules and classifiers, so it is possible to say that a clinical terminology is an ontology. However, there are many important data records in the clinical diaries that cannot be interpreted by a terminology. So, some authors argue that the unstructured text of the diaries is a very important complement to the terminologies, despite the fact that many clinical diaries are
made from many different concepts and acronyms. According to Ruffolo et al. (n.d.), there are some methodologies to identify taxonomies. The use of techniques such as data mining tools to find patterns can be complicated, and success depends on the words and abbreviations used. There are many reasons why ontologies are so important in healthcare services. Ontologies are a structured way to describe facts, drugs, assets, procedures, and entities relevant to the business. When there is a hierarchy, such as an anatomic structure, a disease concept, or a specific drug, we can design taxonomies to define the hierarchy of concepts. Using ontologies in medical records makes it possible to achieve the following goals:
• Write understandable records with less effort and more efficiency;
• Share more reliable information with other professionals and people in general;
• Avoid data errors;
• Improve knowledge construction;
• Decrease clinical risk;
• Improve clinical benchmarking;
• Make hospital balanced scorecards feasible.
2.1 Steps to Develop an Ontology

According to Noy and McGuinness (2001), there are several steps in developing an ontology. First of all, the domain must be specified. After that, the reuse of other ontologies should be considered, since ontologies have properties such as isomorphism and inheritance. If the decision to develop a new ontology is made, the terms and significant classes of terms should be found. Finally, the properties of the classes and the taxonomy of concepts must be defined. It is not a simple job, and it is never finished: the development of an ontology is an evolutionary task. There are some normalized languages to describe the metadata of ontologies. OWL (Web Ontology Language) from the W3C (2004) aims to describe general ontology metadata for different domains. An XML-based language named GLIF (Guideline Interchange Format) is described by Peleg and Wang (2010); this language is used to develop guidelines using any kind of medical ontology, and these structures can be transmitted between different systems.
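As a small illustration of the OWL representation discussed above, the sketch below uses the Python rdflib package to declare a toy clinical class hierarchy and serialize it in OWL's XML-based form; the namespace and the concept names are invented for the example.

from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

# Invented namespace and class names, for illustration only.
EX = Namespace("http://example.org/clinical#")
g = Graph()
g.bind("ex", EX)

for cls in (EX.ClinicalProcedure, EX.Psychotherapy, EX.Desensitization):
    g.add((cls, RDF.type, OWL.Class))

# Taxonomy: Desensitization is a kind of Psychotherapy, which is a
# kind of ClinicalProcedure.
g.add((EX.Psychotherapy, RDFS.subClassOf, EX.ClinicalProcedure))
g.add((EX.Desensitization, RDFS.subClassOf, EX.Psychotherapy))

print(g.serialize(format="xml"))  # XML-based form, exchangeable on the Web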
2.2 Modular Ontologies

One of the constraints on modeling clinical ontologies is their size. There are so many concepts that it is impossible to join all of them into a single ontology, so different domain ontologies have to be split into sets of concepts. Modularization is a technique for splitting ontologies in a way that preserves the performance of services, the coherence of concepts, and understandability. According to Parent and Spaccapietra (2008), the modularization strategies are:

• Split overlapping modules;
• Disjoint by semantic means;
• Disjoint by structure strategies.

Each ontology should have a signature, that is, a set of symbols that uniquely identifies it. Ontologies are dynamic, and they should scale with knowledge acquisition.
3 The Data Mining Techniques and Rule Extraction from Support Vector Machines

Rule extraction from common data is a very important way to build ontologies. Inside hospitals there are many sources of data, but most of them are only partially used. According to Diederich (2008), there are three different methods:

• Explanation from available information;
• Hypotheses that explain the facts;
• Evaluation of different explanations.

Also according to the same author, there are four major approaches: deductive, schematic, probabilistic, and neural. The classical approach is the deductive one, which requires a language to describe the solution of the problem. The schematic approach uses a representation of the solution; UML (Unified Modeling Language) is an example of this methodology. The probabilistic approach uses a Bayesian tree to describe the solutions. The neural approach uses rule extraction from support vector machines for knowledge acquisition. Another application of rule extraction is transfer learning, that is, using knowledge acquired in one task to improve learning in another task. According to Torrey et al. (2008), it is possible to create RL (Reinforcement Learning) algorithms based on a set of features. This theory is based on the thesis that each system is formed by a set of states. The transitions between states depend on the rules and on an advising function. This advising function is dynamic and is based on calculations over the successful options taken by the system in the past; these successful options are accounted for with rewards given for good actions. The advising function is a matrix of estimated weights. It is possible to create a vector space model with the words present in the ontology; this matrix represents the words found in all concepts. Figure 1 represents a collection of words selected from a subset of the psychology ontology of the SNOMED terminology.
With these words it is possible to identify classes of concepts for future analysis. It is also possible to build a Bayesian tree of concepts to help the doctor write with a normalized terminology (Figure 1).
Fig. 1. A sample decision tree
REPTree is a fast decision/regression tree learner algorithm based on information gain and variance reduction. The output obtained with desensitization as the root is shown in Figure 2. In order to analyze the written text, it is possible to create a vector space model in a multidimensional Euclidean space, considering a matrix of classes of words as well as the classes formed by the concepts picked from the SNOMED terminology.

desensitization < 0.5
|   Psychoanalytic < 0.5
|   |   language < 0.5
|   |   |   /socio < 0.5
|   |   |   |   Detoxication < 0.5
|   |   |   |   |   Formal < 0.5 : 0 (1685/0) [843/0]
|   |   |   |   |   Formal >= 0.5 : 1 (2/0) [0/0]
|   |   |   |   Detoxication >= 0.5 : 1 (2/0) [0/0]
|   |   |   /socio >= 0.5 : 1 (2/0) [1/0]
|   |   language >= 0.5 : 0.67 (2/0) [1/1]
|   Psychoanalytic >= 0.5 : 1 (2/0) [0/0]
desensitization >= 0.5 : 0.64 (5/0.16) [6/0.34]

Fig. 2. Desensitization tree
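REPTree itself ships with the Weka toolkit; a rough Python analogue, using scikit-learn's regression tree (which also splits by variance reduction), is sketched below on randomly generated binary word-occurrence data rather than the SNOMED subset used here.

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
words = ["desensitization", "Psychoanalytic", "language", "Detoxication"]
X = rng.integers(0, 2, size=(200, len(words))).astype(float)  # word present/absent
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, 200)  # toy target

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)  # splits minimize variance
print(export_text(tree, feature_names=words))        # textual tree, as in Fig. 2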
If there are n lines, a matrix of n vectors can be represented. Each value in the matrix is 0 if the frequency of the word is zero and 1 if the frequency is greater than zero. To identify patterns, it is possible to define vectors in which some concepts are marked as true. Using RL algorithms, it is possible to verify whether certain patterns exist in a text by measuring the Euclidean distance between the text vector
and the pattern. This methodology is very important for analyzing clinical diaries and finding procedures, sequences, and diagnoses.
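The binary vector-space matching described above can be sketched in a few lines; the vocabulary and the sample diary lines are invented for the example.

import numpy as np

vocab = ["desensitization", "psychotherapy", "formal", "detoxication"]

def to_vector(text):
    # 1 if the concept word occurs in the line, 0 otherwise.
    tokens = text.lower().split()
    return np.array([1.0 if w in tokens else 0.0 for w in vocab])

pattern = to_vector("desensitization psychotherapy")   # concepts marked as true
lines = ["patient started desensitization psychotherapy today",
         "routine detoxication follow-up"]

for line in lines:
    dist = np.linalg.norm(to_vector(line) - pattern)   # Euclidean distance
    print(f"{dist:.2f}  {line}")  # smaller distance = closer to the pattern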
4 The Proposed Framework

Using this methodology it is possible to identify sequences of procedures and build the treatment workflow of each patient. The first benefit of using ontologies in healthcare is a clear understanding of each patient's history. With patient histories it is possible to build knowledge and to identify the best practices that lead to the best results. The price of healthcare services is established by a reimbursement policy. One well-known model is the DRG (Diagnosis Related Group) (Kahn et al., 1990). This methodology assumes that each disease has a diagnosis code with some treatment procedures. The codified procedures and the aggregation of the codes for each patient treatment workflow define the DRG code. The price of the service depends on the DRG and on another variable named the case mix. The case mix is constant for a defined period of time and depends on the variability of the treated patients with regard to their diseases (Tsai, 2009). However, the DRG price alone is not enough to develop the hospital budget; it is also necessary to know the production costs. Some costs can be based on direct consumptions, such as drugs and assets, but others depend on procedures that are very difficult to account for. Each section or department provides healthcare services that are related to treatment procedures, so it is possible to define a standard cost for each procedure, calculated with ABC (Activity-Based Costing) methodologies. In the end, the cost of each patient's treatment is a sum of costs from different sources: some are direct costs, such as drugs, and others are related to procedures. The accounting system requires information about each patient's consumptions and the procedures used to treat the diseases. The procedures are captured by the clinical ontologies, which again makes it very important to identify the ontology concepts in the text of the diaries.
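A minimal sketch of this costing rule: a patient's treatment cost is the sum of direct consumptions plus ABC-style standard costs of the procedures recovered from the diary. All names and prices are invented placeholders.

# ABC standard costs per procedure (invented figures).
procedure_cost = {"desensitization": 85.0, "detoxication": 140.0}

def treatment_cost(direct_costs, procedures):
    # Direct consumptions (drugs, assets, ...) plus one standard cost
    # per identified procedure occurrence.
    return sum(direct_costs.values()) + sum(procedure_cost[p] for p in procedures)

cost = treatment_cost({"drugs": 32.5, "assets": 12.0},
                      ["desensitization", "desensitization", "detoxication"])
print(cost)  # 32.5 + 12.0 + 85.0 + 85.0 + 140.0 = 354.5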
Fig. 3. Proposed Framework (Authors’ Proposal)
5 Conclusion

This framework will promote the development of medical guidelines and accountability models. At the same time, it will improve the quality of healthcare services. Future work will be to build an Activity-Based Costing model to establish the value of each procedure.
References

1. OpenClinical: Methods and tools for the development of computer-interpretable guidelines (2010), http://www.openclinical.org/gmm_glif.html
2. Diederich, J. (ed.): Rule Extraction from Support Vector Machines: An Introduction. Springer, Heidelberg (2008)
3. Gruber, T.: What is an Ontology? Knowledge Systems AI Laboratory (1992)
4. Kahn, K.L., Keeler, E.B., Sherwood, M.J., Rogers, W.H., Draper, D., Bentow, S.S., et al.: Comparing Outcomes of Care Before and After Implementation of the DRG-Based Prospective Payment System (1990)
5. Noy, N.F., McGuinness, D.L.: Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Medical Informatics Technical Report SMI-2001-0880 (2001), http://www.ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness-abstract.html
6. Parent, C., Spaccapietra, S.: An Overview of Modularity. In: Stuckenschmidt, H., Parent, C., Spaccapietra, S. (eds.) Modular Ontologies. LNCS, vol. 5445, pp. 5–23. Springer, Heidelberg (2009)
7. Peleg, M., Wang, D.: GLIF, the Guideline Interchange Format (2010), http://www.openclinical.org/gmm_glif.html
8. Ruffolo, M., Cozza, V., Gallucci, L., Manna, M., Pizzonia, M.: Semantic Information Elicitation from Unstructured Medical Records, http://cilc2006.di.uniba.it/download/camera/12_Ruffolo_CILC06.pdf
9. Torrey, L., Shavlik, J., Walker, T., Maclin, R.: Rule Extraction for Transfer Learning. In: Diederich, J. (ed.) Rule Extraction from Support Vector Machines (2008)
10. Tsai, C.-L.: Analyzing Patient Case Mix and Hospital Rankings (2009)
11. W3C: OWL Web Ontology Language (2004), http://www.w3.org/TR/owl-ref/ (retrieved February 4, 2011)
Advances in E-commerce User Interface Design

Lawrence J. Najjar

TandemSeven, Austin, TX, USA
[email protected]
Abstract. To remain competitive, e-commerce user interfaces need to evolve as customer behaviors and technologies change. This paper describes several new user interface features that designers may want to add to their e-commerce offerings. The features include social media connections, storefronts on social media sites, automated product recommendations, dynamic product customization, dynamic product contextual simulation, flash sales, and mobile commerce. Keywords: E-commerce, user interface design, social media, automated product recommendations, dynamic product customization, dynamic product contextual simulation, flash sales, mobile commerce, m-commerce.
1 Introduction

E-commerce continues to expand in popularity around the world. In 2007, over 85% of Internet users worldwide had made an online purchase [1]. In the United States, by 2014, 8% of retail sales will be via e-commerce and over half of all retail sales will be performed online or affected by online research [2]. User interface design is critical for e-commerce sites, and there are detailed suggestions for successful e-commerce user interface design (cf. [3, 4, 5]). But what are the latest trends in e-commerce design? What are the new features that designers may want to add? This paper describes several user interface design features that e-commerce user interface designers may want to include in their current e-commerce sites or other shopping opportunities. These design features include social media connections, storefronts on social media sites, automated product recommendations, dynamic product customization, dynamic product contextual simulation, flash sales, and mobile commerce.
2 Social Media

Social media sites are exploding in growth. Seventy-two percent of Internet users in the world are active on a social media site [6]. Even though it is blocked in China [7], Facebook is now the most popular social media site in the world with over 583 million members [8, 7, 9]. Facebook is used by more than half of all people with Internet access [6]. In 2010, Facebook passed Google as the most visited Web site in the United States and got 9% of all site visits [9]. Twitter has 190 million users and 65
million tweets per day [10]. Other popular social media sites include MySpace, Hi5, Flickr, YouTube, Renren, and Orkut [6, 11]. People want to talk about their potential purchases and the brands they like. People do a lot of their talking using their social networks. In the United States, people spend more time using social media than using e-mail [12], and this trend is growing worldwide [6]. Also, social networking users spend 50% more online than people who do not use social networking sites [8]. So, create a social media presence for your e-commerce site on Facebook, Twitter, YouTube, and MySpace. Exploit the popularity of Facebook. Include a Facebook “Share” button so users can post a link to your site on their Facebook pages. Display a Facebook “Like” button on product pages, as Levi's does. When a visitor with a Facebook page clicks on the “Like” button, the number of people who “Like” your site increases and a link to your e-commerce site appears in the user's activity stream. Then, when the visitor's friends click on the link to get to your e-commerce site, the “Like” button shows which of their friends clicked on the “Like” button. Identify which type of page you are (e.g., book, drink, food, product) by adding semantic markup to the pages with “Like” buttons. That way, the page gets added to the correct category in the user's Facebook profile [13]. Assign a staff member to update and maintain your Facebook page with fresh and inviting information, photos, videos, questions, and polls. When a Facebook user leaves a message on your “Wall” or in a discussion, respond within 24 hours. Mention that you are on Facebook in your marketing print materials [14]. One innovative use of Facebook to increase sales is to allow friends to give gifts to each other via Facebook [15]. The gift-giver buys a voucher or gift card on your company's Facebook page. Your company posts a notification on the gift recipient's wall. When the recipient clicks on the Facebook wall post, the recipient gets a code that can be redeemed for the gift. Sears and Amazon allow gift givers to buy gifts in various amounts. Starbucks lets the gift giver refill the recipient's Starbucks card. Finally, eBay allows a group of friends to go in together on a gift. Since 22% of Twitter users follow a business on Twitter to learn about promotions [16], tweet once a week about short-term deals. In each tweet, include a Tiny URL (http://tinyurl.com/) that links to the deal. Dell gained over US$6.5 million in sales by tweeting about discounted products [17]. Although corporate blogs are the least trusted medium of communication [18], create a blog with fun, witty, or informative entries that are written by easy-to-relate-to store staff rather than a high-level corporate executive. Since they are the second most trusted medium [18], be sure to include customer reviews of your products. If you have very technical products, such as cameras, consider asking staff experts to write reviews. To maintain credibility, identify the staffers as employees and give them simple titles such as “editor.” Product videos are a great way to engage shoppers and increase sales. US online shoppers increased their video views on retail sites by 40% from 2009 to 2010 [16]. One study found that fashion sites adding product videos increased their look-to-buy conversion rate by 134% [19]. Fashion shoppers who viewed videos had double the conversion rate of shoppers who did not view videos [19].
Another study found that retail site visitors who viewed videos spent two minutes longer on the site and were
64% more likely to make a purchase [16]. So, make videos of staff members demonstrating features of the products that bring in the top 20% of your revenue. Add tags to facilitate search and post them on your site and on popular video sites such as YouTube, Metacafe, Yahoo Video, and Video.qq (if you have customers in China). For example, WineLibrary.com has hundreds of fun videos of the owner describing the wines sold on his site. Since they may own and use your products for years, people in social networks may know more about your products than you do [20]. So, write social media guidelines such as “follow current employee guidelines,” “be transparent,” “respect the audience and co-workers,” “add value,” and “be polite” [21]. Then, track social media comments about your site using tools such as Radian6 and offer answers or suggestions. Measure success by tracking traffic generated to the e-commerce site, percentage of positive comments, and the number of members who “Like” your page [21]. Use what you learn in social media to improve your products. Near the bottom of every page of your e-commerce site, display small icon links to the social media pages you created. Add “Share with friends” links that let users tell their friends about your site or a specific product page via Facebook, Twitter, Blogger, and MySpace. Since more than half of Internet users prefer to share via e-mail versus their social networks [22], and e-mail from friends is the most trusted medium of communication [18], include a link to share your site via e-mail. Also, content shared via e-mail is more likely to lead to purchases than content shared via Facebook or Twitter [23]. REVOLVEclothing.com uses a wide variety of social media. Their Home page includes links to a blog, their product videos, their Facebook page, Twitter page, YouTube videos, and a way to sign up for their e-mail distribution list.
3 Social Commerce

Most e-commerce users interact with social media and follow a brand on Facebook [24]. Brand fans on Facebook spend on average US$136.38 a year more on the brand than people who are not fans [25]. Support this interest by letting fans buy from a storefront on your social media page. Use tools from Aggregate Markets, Payvment, Storefront Social, and Usablenet to create a storefront tab on your Facebook page. Present a limited set of special products (e.g., products for Mother's Day). Features vary, but shoppers may search for products in your store, review and comment on products, put selections in a shopping cart on Facebook, and pay on Facebook using credit cards or PayPal [26]. Retailers with Facebook storefronts include 1-800-Flowers.com, Adult Swim UK, Athehof, Brooks Brothers, Delta Air Lines, Drugstore.com, Gap, JCPenney, Madame Bridal Gowns, Old Navy, and Peek…Aren't You Curious. You can also use tools like Cartfly to embed an Amazon storefront in your Facebook or MySpace social networking page or your Blogger or WordPress blog [27]. Shoppers check out using the Amazon payment system and need an Amazon account to complete a purchase. Probably the most well-known Amazon storefront on Facebook is the Procter & Gamble Pampers page. It uses a “Shop Now” tab, allows users to add products to a shopping cart on Facebook, check out using their Amazon ID and password, and get the products delivered to their homes [28]. Pampers offered
a new line of products to fans on its Facebook page and sold out the 1,000 packages in less than an hour [14]. BestBuy also has an Amazon storefront on Facebook. Here is an even simpler idea. Use a tool like Fluid Fan Shop to create a “Shop” tab on your social networking page. Then list about 15 products with images, prices, and “More Details” links that take social network users to your existing e-commerce site, shopping cart, and checkout function, as Guitar Syndicate does on its Facebook page. Strive to create a social selling experience and community rather than a simple purchase transaction. Provide product information, discounts, contests, polls, questions, and discussions to encourage fans to engage and talk about your brand. On each social commerce storefront product page display a “Like” button so shoppers can express themselves and other shoppers can see how many other people liked the product. Provide a way for users to review and comment on products or the store. Offer special deals to fans.
4 Automated Product Recommendations

To personalize the user interface, display products that are likely to interest shoppers, and encourage additional purchases, use automated recommendation tools such as Baynote or Certona that suggest products that may interest the user. Base the recommendations on a combination of the user's past behavior, the past behavior of similar users, or the item itself [29]. Amazon.com and Netflix.com are examples of well-known e-commerce sites that very successfully use recommendation tools to increase sales. A simple way to participate in one of the most popular product recommendation systems in the world is to list some of your products on Amazon.com [30]. Your products get added automatically to some other product pages as “related products.”
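A minimal item-based collaborative filtering sketch in the spirit of these tools (not the algorithm of any named vendor); the purchase matrix is an invented placeholder.

import numpy as np

# Rows are users, columns are products; 1 = purchased.
purchases = np.array([[1, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 1, 1],
                      [1, 1, 0, 1]], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(purchases, axis=0)
sim = (purchases.T @ purchases) / np.outer(norms, norms)
np.fill_diagonal(sim, 0.0)

def recommend(user, k=2):
    scores = sim @ purchases[user]           # items similar to past purchases
    scores[purchases[user] > 0] = -np.inf    # drop items already bought
    return np.argsort(scores)[::-1][:k]

print(recommend(0))  # top-2 suggestions for user 0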
5 Dynamic Product Customization

To engage shoppers and spur social networking discussions of your e-commerce site, allow users to dynamically customize some of your products. Allow shoppers to select the product, make choices to personalize it, view the results as they make their choices, and purchase the customized product. NikeID.Nike.com lets users design and order customized running shoes. On LaudiVidni.com users create and buy tailored handbags (including style, design options, lining, and zipper), then zoom in to see details, see the handbag held by a virtual model, send the image to their social network, and purchase the customized handbag. Mattel Shop My Design allows girls to design and purchase their own customized Barbie – doll, clothes, accessories, even a doll-sized t-shirt with the girl's name on it.
6 Dynamic Product Contextual Simulation

Dynamic product contextual simulation allows online shoppers to see customized products on simulations of themselves, their group, or their living space. The idea is
to make it fun for users to choose from many product options to personalize a product, then simulate the product in a realistic environment that makes it easy for the shopper to make a purchase decision. Seventeen.com and JCPenney created an innovative virtual dressing room that allows teen girls to use their laptop cameras and hand gestures to “try on” various clothes that are overlaid on their live camera images, identify the items they like so they see similar clothes, e-mail an image of themselves in the clothes, post the image on their Facebook pages, and link to JCPenney.com to purchase the items. StilaCosmetics.com is a makeup site that allows users to upload a straight-on photograph of themselves (or select a model's photo); select a variety of products for face, eyes, lips, and cheeks; adjust the amount of the makeup; print, e-mail, or share the made-up image on Facebook or Twitter; and view and purchase the products used. On the Davidsbridal.com site, brides simulate the appearance of each member of the wedding party (bride, groom, bridesmaids, groomsmen, mother of the bride – even children) to create a wedding party “look.” The simulation allows the bride to select the physical attributes of wedding party members (sizes, facial features, skin tones, hair styles, hair lengths, hair colors), select from a huge assortment of dresses (styles, colors), tuxedos, and accessories (shoes, headbands, veils, veil lengths). The application allows the bride to see the entire party together in a simulated wedding party photo (including selectable backgrounds), save and e-mail the “wedding photo,” and print the selected products for purchase during an appointment at the local store. On Art.com you can upload a photograph of your room (or use a model room); use a ruler to specify dimensions and see the size of the art change automatically to fit in with the room proportions; select and customize the frame, mat, and glass of wall art; place the customized art in the room; change the wall colors if you chose a model room; drag the art around on the wall; and buy the customized art. Dynamic product simulations appear to improve sales and reduce returns. The HawkesandCurtis.com men's shirt simulation allows users to enter their measurements and “try on” various shirts and shirt sizes on a matching “robotic mannequin.” The retailer got a 57% increase in the look-to-buy conversion rate for shoppers who used their Virtual Fitting Room compared to shoppers who used their traditional style and size guide [31]. Quelle.com's virtual dressing room increased clothing sales an average of 3.1 times and decreased the number of packages returned by 28% [31, 32].
7 Flash Sales

Flash sales are very limited-time offers on very highly discounted products. To build excitement, increase members, and encourage users to visit your site frequently and make purchases there, occasionally offer a specific item at a steep discount (e.g., 70% to 85% off) for a short period of time (e.g., 72 hours). Post the flash sale information on your site, but also notify members via Twitter, Facebook, and e-mail. The limited-time offer creates excitement, encourages shoppers to tell their friends about the sale, and promotes immediate purchase decisions. Display the flash sale on your “Home” page and include a time-remaining countdown clock on the flash sale product page. Make sure your systems can support a huge, short surge in activity. E-commerce sites holding flash sales include eBay's FashionVault.com, HauteLook.com, Kmart (bluelight
specials on Saturday), NeimanMarcus.com, Vinfolio.com, and WineShopper.com. The Outnet.com puts the flash in flash sales by occasionally hosting one-hour-long sales. People who visit flash sales sites buy two to five times more than other online shoppers [33]. In the travel industry, 27% of active travelers took at least one leisure flash sale trip during the past year [34].
8 Mobile Commerce

Mobile commerce (or m-commerce) refers to purchases made by users over their mobile phones. The popularity of m-commerce is growing rapidly. China's m-commerce market was estimated at US$163 million in 2006 and $953 million in 2010 [35]. In the United States, m-commerce was about US$396 million in 2008 and US$1.2 billion in 2009 [36], and was expected to reach US$2.42 billion in 2010 [37]. Japan is leading the m-commerce sales revolution with over US$10 billion in 2009 [36], a stunning 17% of all Japanese e-commerce purchases [38]. By 2015, US$199 billion of goods and services will be purchased worldwide via smartphones – 8% of all e-commerce purchases [39]. In the United States, nearly half (48%) of US mobile phone users surveyed have made a purchase via their mobile phones [40]. M-commerce revenue will reach US$23.8 billion – 8.5% of all US e-commerce revenue – by 2015 [41]. From mid-2009 to mid-2010, Amazon had sales of US$1 billion from customers purchasing via mobile devices [42]. In 2009, eBay had US$600 million in m-commerce and expects US$1.5 billion in 2010 [43]. There are several reasons for the growth of m-commerce. The biggest reason is probably access: over three-quarters of the people in the world have a mobile phone subscription [42]. By 2011, over 85% of new handsets will include a Web browser, and by 2015 over one billion people will access the mobile Web via their handsets [42]. More of the high-volume data plans needed to shop on the Web are becoming available, the speed of cellular networks is increasing, and retailers are increasing the number of mobile commerce Web sites and apps [36]. Finally, the mobile phone is handy when and where shoppers need it, even when shoppers are away from home and their laptop computers. There are tradeoffs when choosing whether to develop a mobile phone app or a Web-based mobile commerce site. Since they reside on the mobile phone, apps are a better choice for weaker bandwidth networks, slower phones, or very loyal shoppers who will find and download your app. However, it can be challenging to get users to download your app. Since the operating systems and runtime environments of mobile phones can be quite different, you may have to create and maintain several different apps (e.g., for Apple iPhone, Google Android, or RIM Blackberry). A cleverly built Web-based mobile commerce site (e.g., focused on the WebKit rendering engine for browsers and using HTML5 and mobile phone-specific stylesheets) can run on many mobile phones [44]. So, skip the apps and build a Web site that is optimized for mobile phones. The m-commerce site will be easier for shoppers to find using popular mobile search (Google found a 5X increase in mobile search from 2008 to 2009 [45]). The site will be cheaper to build and manage (for example, the application and data are stored in the cloud vs. on each user's phone) and you can exploit the powerful new functions of HTML5 (such as accessing the user's
camera as a 2D bar-code reader). If a user is accessing your e-commerce site with a mobile device, use an automatic redirect to load the optimized mobile commerce site (see the sketch at the end of this section). Retailers with mobile commerce sites include 1-800-Flowers, Amazon, Carrefour, Crocs, eBay, Marks & Spencer, Saks, Target, Victoria’s Secret, and Sears. Provide the functions m-commerce shoppers use most often, such as locating a physical store (80%), comparison shopping (70%), often done from within a brick-and-mortar store [36, 46], reading product reviews (65%), and getting product information (56%) [40]. M-commerce shoppers also use their mobile phones to determine whether a specific product is in stock at a local store and to check the status of an order [46]. To encourage mobile shoppers, allow them to sign up to receive notifications of special sales online and at local stores and to Tweet their purchases. For your mobile site, design a very lean, scaled-down version of your e-commerce site. Since mobile phone screens are small, cellular networks can make browsing painfully slow, and users expect it, display a prominent search field at the top of each mobile site page. Display another search entry field at the bottom of search results so users don’t have to scroll to the top to search again [37]. Since there are more typos on tiny mobile phone keyboards, account for typing errors in search entries. To speed browsing, show helpful category names (e.g., “Men’s”) and product filters (e.g., “Button down shirts” within “Men’s shirts”). Display short, textual product descriptions. Since mobile users often check for product reviews while in a physical store, display product reviews on each product page. To minimize touch errors, leave white space around links that users need to touch to select [37]. To accommodate slow page loads, use very few images except for product photos [31]. If you have local brick-and-mortar stores, use the phone’s global positioning system function to display localized products, prices, store locations, and the availability of a product at the local store. By 2014, almost half of all mobile subscribers will make m-payments [42], so create your own secure mobile payment system, link to your e-commerce site checkout page, or team up with Amazon, Google, Tania Solutions, or PayPal [47]. Exciting new user interface features are available to engage your shoppers and increase sales. Keep your e-commerce site interesting and up-to-date by adding the features your users will appreciate most.
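To illustrate the automatic-redirect advice, here is a minimal server-side sketch in Python; the user-agent patterns and the m.example-store.com host are hypothetical placeholders, and a production site should also let users opt back into the full site:

import re

# Common mobile browser markers circa 2011 (an illustrative, incomplete list).
MOBILE_UA = re.compile(r"iPhone|iPod|Android|BlackBerry|Windows Phone|Opera Mini", re.I)

def mobile_redirect_target(user_agent, requested_path):
    """Return the m-commerce URL to redirect a mobile browser to,
    or None to serve the full desktop site."""
    if user_agent and MOBILE_UA.search(user_agent):
        return "http://m.example-store.com" + requested_path
    return None

# Usage
ua = "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X) AppleWebKit/532.9"
print(mobile_redirect_target(ua, "/products/123"))
# -> http://m.example-store.com/products/123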
References

1. Nielsen Company, http://th.nielsen.com/site/documents/GlobalOnlineShoppingReportFeb08.pdf
2. Schonfeld, E.: http://techcrunch.com/2010/03/08/forrester-forecast-online-retail-sales-will-grow-to-250-billion-by-2014/
3. Najjar, L.J.: E-commerce user interface design for the Web. In: Smith, M.J., Salvendy, G., Harris, D., Koubek, R.J. (eds.) Usability evaluation and interface design: Proceedings of HCI International 2001, vol. 1, pp. 843–847. Lawrence Erlbaum, Mahwah (2001)
4. Najjar, L.J.: Designing e-commerce user interfaces. In: Proctor, R.W., Vu, K.-P.L. (eds.) Handbook of human factors in Web design, pp. 514–527. Lawrence Erlbaum, Mahwah (2005)
5. Najjar, L.J.: Designing e-commerce user interfaces. In: Vu, K.-P.L., Proctor, R.W. (eds.) Handbook of human factors in Web design, 2nd edn. Lawrence Erlbaum, Mahwah (in press)
6. Van Belleghem, S.: http://www.slideshare.net/stevenvanbelleghem/social-networks-around-the-world-2010
7. Tabuchi, H.: http://www.nytimes.com/2011/01/10/technology/10facebook.html?_r=1&adxnnl=1&ref=todayspaper&adxnnlx=1294664570-ldhwCd9KwwXiaJzPdQpnAg
8. comScore, http://www.slideshare.net/yesonline/state-of-the-us-online-retail-economy-in-q1-2010
9. Saba, J.: http://www.msnbc.msn.com/id/40855800/ns/technology_and_science-tech_and_gadgets/
10. Wauters, R.: http://techcrunch.com/2010/06/23/twitter-international-growth/
11. Worswick, H.: http://www.suite101.com/content/the-most-popular-social-networking-sites-worldwide-a292097
12. Nielsen Company, http://blog.nielsen.com/nielsenwire/online_mobile/what-americans-do-online-social-media-and-games-dominate-activity/
13. Calore, M.: http://www.webmonkey.com/2010/04/adding-facebook-like-buttons-to-your-site-is-damn-easy/
14. Boatman, K.: http://www.inc.com/internet/articles/201004/facebook.html
15. Linn, A.: http://www.msnbc.msn.com/id/40533821/ns/business-holiday_retail/
16. comScore, http://www.slideshare.net/comScoreInc/state-of-the-us-online-retail-economy-in-q2-2010
17. Ionescu, D.: http://www.pcworld.com/article/184076/dell_makes_money_from_twitter.html
18. Bernoff, J.: http://www.forrester.com/imagesV2/uplmisc/Josh_blogging.pdf
19. Internet Retailer Magazine, http://www.internetretailer.com/2010/07/22/fashion-retailers-using-treepodia-video-report-conversion-gains
20. Bucholtz, C.: http://www.ecommercetimes.com/story/70835.html
21. Forum One, http://www.onlinecommunityreport.com/2010/09/social-media-strategy-monitoring-research/
22. eMarketer, http://www.emarketer.com/Article.aspx?R=1007998
23. eMarketer, http://www.emarketer.com/Article.aspx?R=1007434
24. Adgregate Markets, http://www.adgregate.com/press/pr-20100420-Adgregate-Markets-ShopFans.html
25. Taylor, V.: http://blogs.forbes.com/marketshare/2010/06/11/a-brand’s-facebook-fans-are-valuable-consumers/
26. Chaney, P.: http://www.practicalecommerce.com/articles/1971-Social-Commerce-Spotlight-Payvment-a-Facebook-Storefront-Provider
27. MacManus, R.: http://www.readwriteweb.com/archives/current_e-commerce_trends.php
28. Perez, S.: http://www.readwriteweb.com/archives/amazon_launches_facebook_e-commerce_store.php
29. MacManus, R.: http://www.readwriteweb.com/archives/e-commerce_top_internet_trends_of_2000-2009.php
30. Titlow, J.P.: http://www.readwriteweb.com/archives/how_small_businesses_can_take_advantage_of_recommendation_engines.php
31. Demery, P.: http://www.internetretailer.com/2010/07/01/virtual-model-cinches-higher-conversions-retailer-hawes-c
32. Fits.me, http://fits.me/content/benefits-retailers
33. Keane, M.: http://econsultancy.com/us/blog/5998-comscore-flash-buying-sites-are-making-people-spend-more-money-online
34. Yesawich, M.: http://blog.ypartnership.com/?p=301
35. Paul Budde Communication, http://www.telecomsmarketresearch.com/research/TMAAATXC-Buddecom-Global-Digital-Economy-ECommerce-MCommerce-Trends-Statistics.shtml
36. ABI Research, http://www.abiresearch.com/press/3373-Shopping+by+Mobile+Will+Grow+to+$119+Billion+in+2015
37. Ryan, S.: http://www.ecommercetimes.com/story/71015.html
38. Dusan, http://www.intomobile.com/2010/12/21/abi-research-us-mobile-commerce/
39. Ferrante, A.: http://www.retailtouchpoints.com/cross-channel-strategies/548-new-m-commerce-guide-suggests-strategies-to-leverage-smartphones-for-conversion.html
40. Lightspeed Research, http://www.lightspeedresearch.com/press-releases/the-mobile-shopping-revolution/
41. Deatsch, K.: http://www.internetretailer.com/2010/07/23/3-out-4-retailers-have-mobile-strategy-place-study-find
42. mobiThinking, http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats#internet-phones
43. Deatsch, K.: http://www.internetretailer.com/2010/11/17/mobile-moving
44. Warren, C.: http://mashable.com/2010/07/13/mobile-web-optimization/
45. Wojcicki, S., Gundotra, V.: http://googleblog.blogspot.com/2009/11/investing-in-mobile-future-with-admob.html
46. Tonti, A.: http://ssb.mofusepremium.com/blog/mobile-web-tools
47. Travlos, D.: http://www.forbes.com/2009/11/10/travlos-apple-amazon-intelligent-investing-mobile.html
Information Technology Services Industry and Job Design

Yoshihiko Saitoh

Yamato Software Development Lab, IBM Japan, Ltd.
1623-14, Shimotsuruma, Yamato City, Kanagawa, Japan
[email protected]
Abstract. Service businesses produce over 70 percent of the gross domestic product in developed nations, and there is an increasing dependency on information technology to deliver services. Over the last few decades, many large companies that provide IT-based products have transformed their businesses into service-oriented businesses. For those organizational changes, job design must also be considered, because the distinctive characteristics of service businesses require different skills than those found in manufacturing. In reality, however, while the proportion of service business is increasing in such work organizations, service businesses produce job stress that can lead to health problems. Although stress perception appears to be the major factor of discomfort in the IT services industry, there are few studies on effective ways to design jobs or avoid job stress. In this paper, several factors that produce job stress are discussed based on a conceptual model.

Keywords: Information Technology, Services Industry, Job Stress, Job Design.
1 Introduction

In many work organizations, several strategies have been adopted to cope with a harsh business environment. Of these, especially for manufacturing companies, shifting to service businesses is one way to redesign the organization. IBM, for example, is one of the largest IT companies and is generally thought of as a systems and software company, but its business model was drastically changed to service businesses. As a result, IBM recently generated more than 50% of its revenue from its services division (Verma, 2008). Services in the IT services industry include various kinds of IT-related jobs ranging from business consulting to system construction. Especially in the service delivery phase, employees at customers’ sites are required to have a wide variety of skills to help clients understand their businesses and then to help clients build their desired systems. Thus, since shifting to service business requires workers with different skills than manufacturing workers, organizations must prepare education or training programs. However, it is not easy to educate employees because most organizations cannot afford such costly programs. As a result, employees who do not have suitable skills are assigned to a work project and exposed to a stressful work environment. Without careful consideration of their stress perception, any organization in the IT services industry will fail to obtain the desired business success.
As one of the conceptual models of job design and job stress, Smith and Carayon-Sainfort (1989) proposed a balance model. The model can be applied not only to manufacturing organizations but also to service businesses. In the following sections, issues in the IT services industry are examined, and several elements that constitute the work environment are discussed based on the model.
2 Issues in the Information Technology Services Industry

Over the last decade, demand for customization has forced manufacturing companies to bundle more services with their products, and service providers to rely more on personal interactions between customers and employees. Such business demands have inexorably forced transformations of job characteristics. However, one of the problems is that many organizations do not provide employees with enough opportunities to adapt to the changes. Such organizational changes will certainly produce psychological oppression in the workforce, which can lead to negative outcomes such as reduced loyalty, increased turnover, and job stress. As one of the main issues in the IT services industry, an increase in the number of employees who have job-related mental health problems has become critical for work organizations. According to the Japan Productivity Center (2006), for instance, the number of workers who have mental health problems caused by job stress is increasing in 60% of major Japanese work organizations. In IBM Japan, about 90% of long-term absentees have mental health problems, and the total number of employees who have such symptoms is increasing. Employees are a valuable human resource that may contribute in several different ways to a company’s business activities. To avoid the negative outcomes, therefore, any organization needs to find the elements that can produce negative influences and implement preventive measures.
3 Job Stress Model

Many job stress models have been applied to simple production systems such as assembly lines in a plant. In those models, several stressors that can directly affect workers are examined, such as repetitive simple tasks, noisy workplaces, poor air quality, tight work schedules, and so on. As one of the job stress models, Smith and Carayon-Sainfort (1989) proposed a way of conceptualizing job design and job stress based on the balance among job elements. It addresses how organization and job design can influence worker health. In this model, five elements (person, task, technology, organization, and environment) are defined to assess the impact of job stress. Since these elements can be assumed to be important even in service business organizations, each element is reviewed in the following sections.
4 Person

In a service-oriented business, the role of the individual has become more important because direct customer interaction is one of the important factors to successfully
complete a work project. In general, customer satisfaction is often influenced by the quality of the interpersonal interaction between the customer and the contact employee. In other words, the success of a service business is highly dependent on employees’ characteristics such as personality, skills, ability, and motives. In particular, employees who excel at communicating with customers have a crucial advantage in running a project smoothly. In contrast, when an employee who does not have such characteristics is assigned to a work project, its service quality may decline. In such a case, the employee will feel psychological pressure.
5 Task

In general, employees have their own responsibilities and job boundaries. Compared to manufacturing workers, employees in the IT services industry have wide job boundaries. In recent years, for example, the project has become a basic work unit, and no single task is accomplished by one member alone. That is, employees must cooperate with other project members to do tasks. In such a work environment, role ambiguity in job boundaries or uncertainty of job demands becomes a source of job stress, and it easily produces physical work overload and psychological work pressure. These negative influences can gradually, or sometimes rapidly, produce job stress in employees and reduce their work motivation.
6 Technology

In constructing a system, employees must have enough knowledge and skills to complete it within a short period of time, because current sophisticated systems consist of combined high technologies. Furthermore, since the total workload of an employee tends to increase, each employee must obtain a wide variety of knowledge about the latest technologies. Although the best way to obtain such knowledge is to attend training programs, such time-consuming effort can be a big burden for IT specialists who construct business systems at customers’ sites.
7 Organization

Effective organizations are those which produce excellent results by any measure of cost, quality, or efficiency (Pasmore, 1988). When we think of the role of organizations, one of the most important factors is the relationship between an employee and his/her managers. In order to enhance employees’ performance, it is important that managers recognize individual needs and provide opportunities for satisfaction. That is, human resource management based on employees’ needs and abilities is extremely important for deriving better performance from employees.
8 Environment

In the balance model, factors that produce sensory disruption, such as noise or air quality, are mainly discussed. In modern work offices, on the other hand, the stress
level of those factors has become low. Instead, for employees in the IT services industry, different environmental stressors must be considered. In the service delivery phase, for example, employees must work at a client site with closed networks, restricted access authority, or insufficient software/hardware resources. These environmental factors depend heavily on the workplace at customers’ sites, and each of them can be a source of job stress.
9 Discussion

Any organizational change can impinge upon each employee with a different impact, and the resultant employee’s perception is also unique. If it negatively affects employees, it will produce negative responses such as reduced loyalty and continuous psychological pressure. Because of the complex interactions of the human mind, it is not easy to measure the impact on workers. In general, when a worker is given a job which does not fit his/her ability or sense of values, he/she cannot perform at full capacity. In terms of job design, the biggest limitation of existing job design research and theory is its narrow focus, and traditional theory does not consider an adequate range of work characteristics (Parker and Wall, 1998). As mentioned earlier, services in the IT services industry include various kinds of IT-related jobs, and direct customer interaction has become an important factor in successfully completing a work project. This means that the mechanism of job-related mental health problems in service business can be different from the former mechanisms. It is unquestionable that highly motivated persons tend to work harder and perform more effectively in their jobs than less motivated individuals. Therefore, the main question for all managers in work organizations must be how to motivate employees to work. The answer is not simple, especially in service businesses, because there are many situational contexts such as workplace environment, social interactions, and workers’ characteristics. Given the characteristics of the IT service business, a flexible resource management system is highly recommended to place the right employees in the right jobs and encourage them to work.
References

1. Parker, S., Wall, T.: Job and Work Design. SAGE Publications Inc., Thousand Oaks (1998)
2. Pasmore, W.A.: Designing effective organization. John Wiley & Sons Inc., Chichester (1988)
3. Smith, M.J., Carayon-Sainfort, P.: A balance theory of job design for stress reduction. International Journal of Industrial Ergonomics 4, 67–79 (1989)
4. Verma, R.: Predicting customer choice in services using discrete choice analysis. IBM Systems Journal 47(1) (2008)
Dodging Window Interference to Freely Share Any Off-the-Shelf Application among Multiple Users in Co-located Collaboration

Shinichiro Sakamoto, Makoto Nakashima, and Tetsuro Ito

Department of Computer Science and Intelligent Systems, Oita University
700 Dannoharu, Oita-shi, Oita-ken, 870-1192, Japan
{v0753035,nakasima,ito}@oita-u.ac.jp
Abstract. A method of dodging window interference is described for allowing multiple users to freely share any off-the-shelf single-user application in co-located collaboration utilizing a shared device. This method is indispensable for transparently realizing application sharing with little effort under a centralized architecture that uses surrogate windows, mimics of the original application’s window. Although the original application should process any event on a surrogate window, window interference can arise when the original window and a surrogate window overlap at the location of an event, in which case the event cannot be processed. To avoid window interference, we formulate a method based on quadrant-based window positioning, in which the original application’s window is dynamically repositioned so that only one quadrant of the window is displayed in one corner of the screen area. The applicability of the proposed method was verified, and its usability clarified, in co-located collaboration in a university laboratory.

Keywords: Dodging window interference, window positioning, collaboration, application sharing, CSCW, centralized architecture, screen-sharing system.
1 Introduction

Many ways of sharing any off-the-shelf single-user application among multiple users are well documented for computer-supported cooperative work (CSCW). A centralized architecture is employed in most available screen-sharing systems (e.g., [5] and [7]). Application sharing is achieved by centralizing an original off-the-shelf single-user application (an ‘original application’ in short) and event occurrences onto one PC, and by copying the window image of the original application onto each user’s PC. This architecture can transparently realize application sharing in distributed collaboration with no specific effort, i.e., without changing the source code of the original application to replicate it. However, the utilization of this architecture in co-located collaboration, where multiple users gather around a shared device, e.g., a tabletop display, has not been studied in depth. A centralized architecture has the benefit of supporting co-located collaboration by allowing each user to utilize the original application via its surrogate window in
his/her preferred location and orientation [1]. Here, each of the surrogate windows works as a mimic of the original application’s window (an ‘original window’ in short) while displaying its copied window image. Each user initiates any event, such as clicking a mouse button, dragging the mouse, pressing a key, etc., on his/her own surrogate window, not on the original window. The original window should, however, be on top of the other windows in order to receive every event on its surrogate window. Events on a surrogate window can then be interfered with by the original window on top. In order to freely share the original application among multiple users without causing such interference, it is crucial that the original window be repositioned to an appropriate place in the screen area according to the location of the event. We propose a novel method of dodging window interference to freely share any off-the-shelf single-user application among multiple users in co-located collaboration. This method dynamically repositions the original window so that it dodges interference with its surrogate windows, allowing the original application to receive the events on the surrogate windows at any given time. To achieve this, the method realizes quadrant-based window repositioning, which draws only one of the four quadrants of the original window that have the corresponding location of the event as their common origin. Since the original window has a rectangular shape, we can prove that at least one quadrant avoiding window interference exists, even if the event occurs at any location on any surrogate window, as long as the original window is smaller than the screen area. It is also possible to minimize the repositioning effort caused by later events, since the displayed quadrant is kept far away from the event location on the screen area. The rest of this paper is organized as follows: The problems of previous application-sharing systems in co-located collaboration are discussed in Section 2. The requirements and the quadrant-based window repositioning of the proposed method are described in Section 3. We estimate the applicability of our method in Section 4 and clarify the usability of the method in co-located collaborative work in Section 5, where we implemented our method in an application-sharing system, CollaboTray [1], employing a centralized architecture.
2 Application Sharing in Collaboration

This section describes previous screen-sharing systems and an advanced application-sharing system based on a centralized architecture. We also discuss the problems they have in co-located collaboration.

2.1 Application-Sharing Systems

For application sharing, screen-sharing systems have been used in practice for almost twenty years, e.g., pcAnywhere [7], NX [4], and VNC [5]. Among them, the open-source screen-sharing system VNC is utilized in many systems for application sharing (e.g., [2], [8], and [9]). Those systems allow the users to share any off-the-shelf application via its original window and surrogate windows without sharing the whole screen of one PC. A VNC-based toolkit for window management on the X Window System,
Ametista [6], allows a user to rotate the window image as users need in co-located collaboration [3]. Those VNC systems, however, have the problem that the users of the surrogate windows are not able to initiate any event while the user of the original window is operating the original application. An application-sharing system, CollaboTray [1], deals with the above problem by having each user utilize a surrogate window while the original window is made invisible, disallowing its use by any user. A CollaboTray centralizes an original application on only one PC and decouples the drawing of the surrogate window of an original window from the processing of events on the surrogate window. Any original CollaboTray, which is loaded with an original application, can yield clone CollaboTrays, each of which manages inherently the same surrogate window as the original CollaboTray. The CollaboTray uses the original window in a different way from previous screen-sharing systems and has the advantage of allowing users to share the original application in any orientation and at any time. Figure 1 illustrates the basic approach to realizing application sharing in co-located collaboration with a centralized architecture, where an original window and its two surrogate windows A and B exist. In the utilization of VNC, if ownership of the original window is disallowed for every user, as with CollaboTrays, the users can initiate events by taking turns among themselves when utilizing the original application via their surrogate windows. However, a common problem arises in application sharing by VNC and CollaboTrays when the original window overlaps with its surrogate windows, as shown in the figure. Even if the original window is invisible, it needs to be on top of the surrogate windows to receive any event on them. The problem is that the surrogate window owned by a user should also be on top of the original window to allow him/her to operate the original application. This contradiction causes window interference between the original window and its surrogate window. Note that the surrogate windows do not interfere with each other, since neither is required to be on top of the other surrogate window when its user initiates an event.
Fig. 1. Application sharing in co-located collaboration
2.2 Window Interference in Co-located Collaboration

The two cases of window interference are illustrated in Fig. 1. In the figure, the location of each event on the surrogate windows is represented by a filled circular or
triangular mark. The unfilled marks correspond to the locations of the events on the original window. For surrogate window A, the area including the corresponding location of the event on the original window overlaps with this surrogate window. When the user initiates the event on surrogate window A, the original window is interfered with by surrogate window A: while the user is initiating an event on the surrogate, the original window is unable to get on top, and thus the event cannot be sent to the original application. Conversely, for surrogate window B, the area that includes the location of the event on this surrogate window overlaps with the original window. When the event is sent to the original application, the original window gets on top of surrogate window B, and thus the user of surrogate window B cannot initiate his/her next event. In addition to the above, there is another concern regarding mouse movement. If a user uses a standard USB mouse, the location of the mouse cursor on the screen area is updated every 8 msec. For any event, the location of the next event may jump to a place on the original window, causing the kind of interference seen for surrogate window B in Fig. 1. This can occur even if the original window is placed where it can avoid window interference.
3 Dodging Window Interference

This section describes the method of dodging window interference using quadrant-based window repositioning. We first specify the requirements to avoid window interference and then formulate the method to meet these requirements.

3.1 Requirements

There are two requirements to avoid the window interference mentioned in Section 2.2: (a) to avoid physical overlap between the original window and its surrogate window, allowing each to get on top when needed, and (b) to avoid window interference caused by fast movement of the mouse cursor. The former leads to the following conditions:

Ca1: The corresponding location of the event on the original window is outside the surrogate window on which the event occurs.
Ca2: The location of the event on a surrogate window is outside the original window.

The latter requirement can be met by satisfying the following condition:

Cb: The original window stays as far away from the location of an event on its surrogate window as possible.

Conditions Ca1 and Ca2 lead us to understand that only the smallest possible area of the original window, including the corresponding location of the event, has to be displayed. For condition Cb, since the screen area is rectangular, one of its four corners is the furthest point from the location of any event. Given these facts, we devise a way of quadrant-based window repositioning, which selects a quadrant of the original window with the corresponding location of the event as its
origin, and displays this quadrant in one corner of the screen area, thus satisfying the above conditions.

3.2 Quadrant-Based Window Positioning

Fig. 2 shows an example of dodging window interference for the case of surrogate window A in Fig. 1, in which the original window is repositioned to the top right corner of the screen area. Only the third quadrant, Q3, of the original window is selected to be displayed, with low opacity, on the screen area; the four quadrants have the corresponding location of the event as their common origin. If either of the two quadrants Q2 or Q4 is selected, the original window is repositioned to the bottom right or top left corner, respectively, as shown by the dashed squares in the figure. The original window is, however, furthest from the location of the event when Q3 is selected; selecting Q3 satisfies Ca1, Ca2, and Cb. If Q1 is selected, the original window is repositioned to the bottom left corner of the screen area, but condition Ca2 is not satisfied.

Fig. 2. Dodging window interference (the figure labels the quadrants Q1–Q4, the window distances WL, WR, WT, WB, and the screen distances SL, SR, ST, SB)
We here call a quadrant Qi (i = 1, 2, 3, 4) of the original window an available quadrant if Qi can be displayed so as to satisfy conditions Ca1 and Ca2 by positioning its origin on the corner of the screen area in the direction opposite to Qi. The overall process of dodging window interference between the original window and its surrogate window is summarized as follows (a code sketch is given after the steps):

Step 1: Divide the original window into four quadrants with the corresponding location of the event as their common origin.
Step 2: Find all available quadrants among those four quadrants.
Step 3: From the available quadrants, select the quadrant Qi that satisfies condition Cb.
Step 4: Reposition the original window so as to display only Qi on the screen area.

If the above process can find an available quadrant, the original window can avoid window interference with its surrogate window.
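The selection in Steps 1–4 is easy to express in code. The following Python fragment is a hypothetical sketch, not the authors’ implementation: it tests condition Ca2 geometrically and uses the corner distance for condition Cb, while condition Ca1 is assumed to hold for the chosen corner, as the lemmas below guarantee for windows smaller than the screen.

from math import hypot

def reposition_original(event, win_rel, win_size, screen):
    """Quadrant-based window repositioning (Steps 1-4, sketched).

    event    -- (ex, ey): location of the event on the screen
    win_rel  -- (wx, wy): corresponding location of the event inside the
                original window (wx = WL, wy = WT in the paper's notation)
    win_size -- (Ww, Wh): width and height of the original window
    screen   -- (Sw, Sh): width and height of the screen area
    Returns the screen position of the original window's top-left corner,
    or None when no available quadrant exists.
    """
    ex, ey = event
    wx, wy = win_rel
    Ww, Wh = win_size
    Sw, Sh = screen
    best = None
    # Steps 1 and 4: each quadrant is shown by pinning its origin (the
    # corresponding event location) to the opposite screen corner.
    for cx, cy in [(0, 0), (Sw, 0), (0, Sh), (Sw, Sh)]:
        left, top = cx - wx, cy - wy          # window top-left at this corner
        # Visible part of the original window, clipped to the screen.
        x0, y0 = max(left, 0), max(top, 0)
        x1, y1 = min(left + Ww, Sw), min(top + Wh, Sh)
        # Step 2: condition Ca2 -- the event must lie outside the visible part.
        if x0 <= ex <= x1 and y0 <= ey <= y1:
            continue
        # Step 3: condition Cb -- prefer the corner farthest from the event.
        d = hypot(cx - ex, cy - ey)
        if best is None or d > best[0]:
            best = (d, (left, top))
    return None if best is None else best[1]

# Example: a 960x540 window on a 1920x1080 screen, event at (600, 400),
# corresponding window-relative location (480, 270).
print(reposition_original((600, 400), (480, 270), (960, 540), (1920, 1080)))
# -> (1440, 810): the window is pinned to the bottom right corner.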
Although the available quadrant can overlap with the surrogate window in Step 4, the utilization of the surrogate window does not interfere with the quadrant, since the quadrant is made invisible. Let us prove the robustness of the above process in finding available quadrants. As shown in Fig. 2, let WL, WR, WT, and WB denote the distances of the corresponding location of the event on the original window from the left, right, top, and bottom edges of the window, respectively. Also let SL, SR, ST, and SB denote the distances of the location of the event on the surrogate window from the left, right, top, and bottom edges of the screen area, respectively. Suppose that both the original and surrogate windows are smaller than the screen area in width and in height, i.e., WL + WR < SL + SR and WT + WB < ST + SB. We now have two key lemmas concerning available quadrants, for the situations in which the surrogate window is not rotated and in which it is freely rotated.

Lemma 1: Suppose the size of the original window (and surrogate window) is smaller than that of the screen area. Then at least two available quadrants exist, even if the event occurs at any location on the surrogate window.

Proof: Since the screen area is rectangular, the surrogate window can overlap with at most one corner of the screen area. This implies that there are at least three quadrants of the original window that satisfy condition Ca1. If two of these quadrants, Qi and Qj, do not satisfy condition Ca2, then the size of each of Qi and Qj is greater than or equal to that of the quadrant of the screen area which has the location of the event as its origin, in the direction opposite to Qi and Qj, respectively. Thus WL + WR ≥ SL + SR or WT + WB ≥ ST + SB would have to hold. This contradicts the supposition about the size of the original window, so we can conclude that there are at least two available quadrants.

What is the influence of rotating the surrogate window on the number of available quadrants? With regard to this question we have the following lemma.

Lemma 2: Suppose the surrogate window is rotated. Then at most two corners of the screen area are overlapped by this surrogate window.

Proof: Regardless of the supposition about the size of the surrogate window, the diagonal of this window can be longer than the height and width of the screen area. The surrogate window can thus overlap two corners at the same time when rotated. This can only occur when one corner of the surrogate window is positioned at one corner of the screen area and the window is rotated about this corner.

We are now ready to formulate the following theorem on the number of available quadrants.

Theorem: Suppose the size of the original window (and surrogate window) is smaller than that of the screen area in width and in height. Then there is at least one available quadrant, even if the event occurs at any location on any surrogate window.

Proof: From the proof of Lemma 2, the worst case in rotating the surrogate window occurs when one corner of the surrogate window is positioned on one corner of the screen area and the window is rotated about this corner. Even if this occurs, by Lemma 1, at least two quadrants of the original window are available quadrants when the surrogate window
is not rotated. Thus it can be said that even if the surrogate window is rotated by a user, at least one of those two quadrants is still an available quadrant.
4 Estimation of Applicability

We verified the applicability of the method in Section 3.2 by estimating the number of available quadrants found in Step 2 and the distance of the original window from the location of the event after Step 4. For the estimation, we set the screen area to the standard size of 1920 pixels wide by 1080 pixels high. The ratio of the original and surrogate windows to the screen area in width and height was varied between 10% and 90%. The surrogate window was displaced vertically and horizontally in 60-pixel steps across the screen area. First, we estimated the number of available quadrants found by our method at every location of the surrogate window. Fig. 3(a) shows the distribution of the minimal number of available quadrants on the screen area when the ratio of the original (and surrogate) window was 90%. The number in each cell indicates the minimal number of available quadrants when the center of the surrogate window was located in the cell, the window was rotated from 0 to 360 degrees at intervals of 15 degrees about its center, and the event was initiated at each location on the surrogate window. As a result of rotating the surrogate window, the cells with only one available quadrant tended to converge on the left- and right-hand parts of the screen area. Fig. 3(b) shows the relative frequency of the cells having each minimal number of available quadrants among the total cells on the screen area, for varying ratios of the original (and surrogate) window. While the ratio was under 90%, multiple available quadrants were found at almost every location on the screen area. Since no user intends to monopolize the screen area in co-located collaboration, there is a high possibility of having multiple available quadrants, so that a place far away from the location of the event can be selected for repositioning the original window.
Fig. 3. A distribution map of the minimal numbers (a) and the relative frequency of the cells having the minimal number of available quadrants (b)
Secondly, we estimated the distance of the original window from the location of each event on the surrogate window when the original window was repositioned by
our method. Note that the maximal distance equals the diagonal of the screen area, i.e., about 2202 pixels. Fig. 4 shows the probability of each occurring distance when the ratio of the original (and surrogate) window to the screen area in width and height was 30%, 50%, or 90%. Observing the curves for 30% and 50%, as long as a user does not move the mouse cursor too fast (about 500 pixels, half the height of the screen area, per update), our method avoids window interference caused by mouse movement at any given time. Even when the ratio was 90%, the probability of avoiding window interference at such mouse cursor speeds was 77.6%. In the next section, we clarify that our method works well in a real working environment without concern for mouse cursor speed or surrogate window size.
Fig. 4. The probabilities of occurring distances
5 Usability Case Study

The mouse cursor speed and the size of the surrogate window can be changed by the users, so there remains a possibility of window interference. To clarify the usability, we prepared a working environment utilizing CollaboTrays in which this method was implemented. The PC used in the environment had an Intel Xeon processor running MS Windows 7.

5.1 The Environment

The implementation of our method in a CollaboTray was easily done by attaching its mechanism to the mechanism for drawing a surrogate window. Fig. 5(a) shows an example screenshot of the new version of a CollaboTray, which has a circular shape, loaded with the surrogate window of MS PowerPoint. Here, the original window, made highly visible in order to highlight its position, was repositioned to the top right corner of the screen area to avoid window interference. We prepared a working environment in which a university student polished up his research presentation slides through face-to-face interaction with the members of his research group, across a tabletop display connected to a PC with the same screen size as in Section 4. Each of four groups had 3 members, and a total of 12 subjects
used the CollaboTrays. Each member of every group used his/her own mouse, and two of them shared one keyboard. Note that each subject used a standard USB mouse and adjusted the mouse cursor speed as he/she liked before the work. Fig. 5(b) shows a female subject and two male subjects in one research group collaboratively polishing up the presentation belonging to the male on the right. This subject loaded a CollaboTray with an original MS PowerPoint to display his prepared presentation slides, and handed its clone CollaboTrays over to the other subjects. Everyone freely changed the content, magnified his/her surrogate window, rearranged the slide configuration, etc., during their roughly 60-minute collaborative work.
Fig. 5. CollaboTrays: (a) an example screenshot; (b) a research group at work
5.2 Results

Although many subjects magnified the surrogate windows on their CollaboTrays, the average ratio to the screen area in width and height was about 50%. No one magnified his/her surrogate window to the size of the screen area. We also logged the moving distance of the mouse cursor per update (the moving distance, for short) and the distance of the original window from the location of every event (the window distance, for short). Table 1 shows the average distances during the collaborative work performed by each research group. For each group, the average window distance was statistically superior to the average moving distance (p < 0.05). As a result, window interference never happened. From these observations, we deduce that our method is a practical approach which satisfies the requirements for dodging window interference.

Table 1. The moving distance of the mouse cursor and the distance of the original window from the location of the event

Group                        1                2                3                4
Moving distance (pixels)     19.6 ± 83.9      19.1 ± 83.0      19.1 ± 88.9      16.3 ± 72.8
Window distance (pixels)     1070.0 ± 258.9   1029.3 ± 237.9   1205.3 ± 233.0   1234.3 ± 169.6
After the work each subject filled in a questionnaire which included two questions: “Were you able to do collaborative work smoothly?” and “Did you feel comfortable
with the mouse movement?” Each subject answered on a 5-point Likert scale ranging from very acceptable (scale = 1) to very unacceptable (scale = 5). For each question, over 75 percent of the subjects gave positive answers (scale = 1 or 2), and this percentage was statistically superior to that of negative answers (scale = 4 or 5). From the result for the former question, we can say that the repositioning of the original window has the added advantage of letting users freely share the original application and helping boost collaboration. As for the latter question, we can also say that with our method the mouse movement is not problematic.
6 Conclusions

The method described in this paper allows users who are sharing off-the-shelf single-user applications in co-located collaboration to “dodge window interference.” This method is indispensable for developing application-sharing systems based on a centralized architecture. The applicability of the proposed method was verified both theoretically and through practical co-located collaborative work in a university research laboratory. The users of the CollaboTrays employing the proposed method were helped to boost their co-located collaboration. From this point on, we need to improve our method for utilization in distributed collaboration. At present, users of an original application will suffer interference from other users when each of them operates another original application through his/her surrogate window on a distributed PC. The applicability of our method should be clarified to realize application sharing in such distributed collaboration.
References

1. Abe, Y., Matsusako, K., Kirimura, K., Tamura, M., Nakashima, M., Ito, T.: Tolerant sharing of a single-user application among multiple users in collaborative work. In: Companion Proceedings of the ACM Conference on Computer-Supported Cooperative Work (CSCW 2010), pp. 555–556. ACM Press, New York (2010)
2. Hank, B.: Empirical evaluation of distributed pair programming. International Journal of Human-Computer Studies 66, 530–544 (2008)
3. Kruger, R., Carpendale, S., Scott, S.D., Greenberg, S.: Roles of orientation in tabletop collaboration: Comprehension, coordination and communication. Computer Supported Cooperative Work 13(5-6), 501–537 (2004)
4. NoMachine, NX, http://www.nomachine.com/documents/getting-started.php
5. Richardson, T., Stafford-Fraser, Q., Wood, K.R., Hopper, A.: Virtual network computing. IEEE Internet Computing 2(1), 33–38 (1998)
6. Roussel, N.: Ametista: a mini-toolkit for exploring new window management techniques. In: Proceedings of the Latin American Conference on Human-Computer Interaction, pp. 117–124. ACM, New York (2003)
7. Symantec, http://www.anyplace-control.com/pcanywhere.shtml
8. Tee, K., Greenberg, S., Gutwin, C.: Artifact awareness through screen sharing for distributed groups. International Journal of Human-Computer Studies 67, 677–702 (2009)
9. UltraVNC, http://www.uvnc.com:8080/
Process in Establishing Communication in Collaborative Creation

Mamiko Sakata and Keita Miyamoto

Faculty of Culture and Information Science, Doshisha University
1-3 Tatara Miyakodani, Kyotanabe City, 610-0394 Japan
[email protected]
Abstract. We try to quantify the communication in collaborative activities in terms of verbal and non-verbal processes, using collaborators in a creative activity as the study subjects. This study set up a production task using LEGO® blocks. Our study subjects consisted of 5 groups of 3 males. They were asked to use their imagination freely to build a “castle” using the LEGO blocks. We recorded their activities with video cameras, while measuring their bodily movements three-dimensionally using a motion capture system. Our experiment showed that the works created by groups with many illustrators (gestures for spatially expressing inner ideas and images) rated high both in perfection level and favorability rating. In a collaborative creation, it was shown that direct visual expressions of mental representations through nonverbal, rather than verbal, communication among the collaborators increased the perfection level of the end product.

Keywords: Collaborative Creation, Nonverbal Behavior, Bodily Movements.
1 Introduction

When a plural number of people engage in a collaborative creation, what kind of communication will occur in its process, and how will it affect the end results, i.e., the works? Suzuki et al. (2007) revealed that verbal and nonverbal behavior such as eye contact and finger-pointing were critical in the success and/or failure of a task of building a box-like structure. Matsuda et al. (2007) showed that nonverbal behavior exchanged among highly intimate workers contributed to promoting task achievement. These studies prove the importance of nonverbal behavior among workers in a collaborative work. Each study, however, used a “task-fulfilling type” collaboration in which the conditions for achieving the task were clearly shown to the collaborators. Some collaborative activities, such as music composition, developing stage settings, etc., lack an overall finished-product vision or image that all the collaborators share. It is each collaborator’s creative talent that will form the final product. In a non-task-oriented joint activity, collaborators must present their thoughts and images accurately and understand what is presented by other collaborators. In doing so, how do they
collaborate to create their final product, the end product of their constantly changing images and ideas? With these questions in mind, we try to quantify the communication in collaborative activities in terms of verbal and non-verbal processes, using collaborators of a creative activity as the study subjects.
2 Experimentation

2.1 Experiment Outline

This study set up a production task using LEGO® blocks in order to analyze how communication is established in a collaborative creation.

2.2 Procedure

Our study subjects consisted of 5 groups of 3 males without prior acquaintance. They were seated at a round table and asked to use their imagination freely to build a “castle” using the LEGO blocks. The time allowed for this task was 30 minutes. We recorded their activities with a video camera, while measuring their bodily movements three-dimensionally using a motion capture system. Each collaborator was asked to take the Big Five character trait test and then answer a questionnaire before and after the experiment about the impressions of their collaborators and about the experiment itself.

2.3 Emotional Evaluation of Final Product

Pictorial images of each work (see Fig. 1) were used for emotional evaluation using a pair comparison method (Thurstone method; see the sketch after Section 2.4), with 68 evaluators who were all university students (29 males and 39 females, aged 19.53 ± 1.04). All the finished works were paired up, and the pairs were shown to the evaluators, who were asked to select which work showed “greater creativity,” “greater perfection,” and “higher favorability rating.”

2.4 Analytical Indicators

Using the results of the questionnaire, we calculated the character traits of each group, the evaluation of the experiment, and the changes in the collaborators’ impressions before and after the experiment. Using the recorded data, we extracted “speech” (5 types: suggestions, approvals, confirmations, other, and general chat; Table 1), “illustrators” (gestures for spatially expressing inner ideas and images; Fig. 2), and the “eye contact” and “nodding” the collaborators directed at one another. The ELAN application software was used for data tagging (http://www.lat-mpi.eu/tools/elan). ELAN supports event extraction in visual observations, allowing unlimited annotations to be added to visual and audio data. It enables the researcher to record, in a time-series manner, the frequency and duration of each event occurring in visual data.
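The Thurstone (Case V) scaling mentioned in Section 2.3 reduces pairwise preference counts to one scale value per work. The Python sketch below is illustrative only; the win counts in the example are invented, not the study’s data:

import numpy as np
from scipy.stats import norm

def thurstone_case_v(wins):
    """Thurstone Case V scale values from a pair-comparison win matrix.

    wins[i][j] = number of evaluators who preferred work i over work j.
    Returns one scale value per work (higher = preferred more often).
    """
    wins = np.asarray(wins, dtype=float)
    n = wins + wins.T                          # judgments per pair
    p = np.where(n > 0, wins / np.where(n > 0, n, 1), 0.5)
    np.fill_diagonal(p, 0.5)                   # a work vs. itself is neutral
    p = np.clip(p, 0.01, 0.99)                 # avoid infinite z-scores
    z = norm.ppf(p)                            # standard-normal transform
    return z.mean(axis=1)                      # row means = scale values

# Example with 5 works and 68 evaluators per pair (structure only).
wins = [[ 0, 30, 10, 25, 20],
        [38,  0, 15, 40, 30],
        [58, 53,  0, 60, 55],
        [43, 28,  8,  0, 22],
        [48, 38, 13, 46,  0]]
print(thurstone_case_v(wins))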
Fig. 1. Final Product (panels show the works of group1–group5)

Table 1. Speech Category

Speech Category   Classification Criteria
Suggestion        Presentation of ideas about castle structure to the collaborators.
Approval          Responses implying consent to the suggestions and/or confirmations.
Confirmation      Reconfirming the preceding suggestions or the images shared by the collaborators.
Other             All other speech concerning the experiment but not included in the above three categories.
General Chat      All speech without any relevance to the purpose of the project.
Fig. 2. Sample of Illustrator
Using the motion capture data, we calculated the movements of the upper body and the distance between the table and each collaborator’s forehead during upper body movements. We used the obtained results as analytical indicators. The movement of the upper body was defined as the time-series distance of travel of the marker attached to the 7th cervical vertebra, using its relative coordinates, with the marker attached to the sacral bone kept as the origin (Fig. 3). Okubo et al. (2009) showed the relationship between the level of concentration and a forward-leaning posture. In
Fig. 3. The movements of the upper body
Fig. 4. The distance between the table and each collaborator’s forehead
this study, we defined an index to show the worker’s level of concentration by calculating the distance between the table’s center and the worker’s forehead (Fig. 4). In order to observe the time-series changes in the communication-generating process over a series of collaborative works, this study divided the experiment into three 10-minute phases, from the start signal to the finish signal. These three stages are called Phase 1, Phase 2, and Phase 3 herein.
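Both motion-capture indicators can be computed directly from marker trajectories. The following Python sketch assumes hypothetical (frames × 3) coordinate arrays and illustrates the definitions above; it is not the authors’ code:

import numpy as np

def upper_body_movement(c7, sacrum):
    """Per-frame travel distance of the C7 (7th cervical vertebra) marker
    in sacrum-relative coordinates; c7 and sacrum are (frames, 3) arrays."""
    rel = np.asarray(c7, float) - np.asarray(sacrum, float)
    return np.linalg.norm(np.diff(rel, axis=0), axis=1)

def concentration_index(forehead, table_center):
    """Per-frame distance between the forehead marker and the table center;
    smaller values suggest a forward-leaning, more concentrated posture."""
    diff = np.asarray(forehead, float) - np.asarray(table_center, float)
    return np.linalg.norm(diff, axis=1)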
3 Results and Discussion

3.1 Work Score

As a result of the pair comparison method, Group 3 received the highest evaluation in creativity, while Group 5 was evaluated highest in terms of perfection and favorability rating. Fig. 5 shows a plot chart of the creativity and perfection scale scores obtained by the pair comparison method.
Fig. 5. Plot Chart of the Creativity and Perfection (scale scores — Creativity: g1 = -0.612, g5 = -0.254, g2 = -0.089, g4 = 0.101, g3 = 0.854; Perfection: g3 = -0.323, g4 = -0.189, g1 = -0.099, g2 = 0.255, g5 = 0.356)
A correlation analysis showed a negative correlation between perfection and creativity (-0.546), but a positive correlation between favorability and perfection (0.846). In other words, greater perfection increased favorability but reduced creativity. It was also found that a work’s creativity was not related to its becoming favorable.

3.2 Subjective Evaluation by the Participants

We asked the participants to answer ten questions about their subjective evaluation, by allocating a maximum of 10 points to each question. By conducting principal component analysis using the score-based correlation matrix, we extracted two principal components with eigenvalues of 1 or greater (Table 2; a computational sketch follows the table). From the component matrix, we can call the first principal component the “Communication Component” and the second principal component the “Sense of Achievement Component.” In other words, the 10 subjective evaluation questions listed in the table can be summarized into two viewpoints: “smooth communication with the others” and “sense of achievement doing the task.” By plotting the first component on the x-axis and the second component on the y-axis, we obtained the result shown in Fig. 6. Except for the three collaborators in Group 4, the participants’ evaluations differed in no small way, even among those belonging to the same group.

Table 2. Result of PCA
Question                                                        Communication   Achievement
Did you enjoy doing this task?                                  0.739           0.491
How do you like the finished work?                              0.840           -0.019
Did you feel a sense of achievement doing this task?            0.454           0.818
Did your group divide the work load among the collaborators?    0.385           -0.484
Did your group achieve a good communication?                    0.842           -0.165
Did your group enjoy conversations?                             0.887           0.025
Did you willingly bring forth your ideas?                       0.803           0.176
Did you willingly consent to other collaborator's ideas?        0.696           -0.523
Were your ideas adopted?                                        0.736           -0.104
Do you think you had a good team work overall?                  0.881           -0.133
eigenvalue                                                      5.549           1.505
cumulative contribution ratio (%)                               55.486          70.534
3.3 Speech

The total time of the different speech categories for each group is shown in Fig. 7. From this figure, one can see that the total speech time varied greatly from group to group and that Group 3 showed an exceptionally long total General Chat time. We then conducted Ward's cluster analysis of the total speech time for the different phases and obtained the results graphed in Fig. 8. From this, we learned that Phase 1 (opening phase) contained suggestions and confirmations about the castle design, Phase 2 (mid-point phase) contained design approvals and general chat, and Phase 3 contained all other categories of speech.
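Ward's clustering of the per-phase speech times can be reproduced as sketched below. The matrix values here are placeholders, since the raw per-category durations are not tabulated in the paper.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

categories = ["Suggestion", "Confirmation", "Approval", "General Chat", "Other"]
# Placeholder data: total speech time (s) per category in Phases 1-3.
speech_time = np.array([
    [420.0,  60.0,  30.0],   # Suggestion
    [380.0,  90.0,  45.0],   # Confirmation
    [120.0, 310.0,  70.0],   # Approval
    [ 90.0, 280.0, 110.0],   # General Chat
    [ 40.0,  70.0, 260.0],   # Other
])

Z = linkage(speech_time, method="ward")   # Ward's minimum-variance criterion
dendrogram(Z, labels=categories)          # tree diagram corresponding to Fig. 8
plt.show()
```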
Fig. 6. Plot of the PCA score
Fig. 7. The total time of the different speech categories (Suggestion, Approval, Confirmation, Other, General Chat) for each group g1-g5
A correlation analysis showed that the length of general chat was closely tied to creativity, while the frequencies of confirmation and approval contributed to favorability and perfection (|r| > 0.7). In order to "build a castle with LEGO blocks", each collaborator must first visualize the final product and communicate it to the others, in an encoding process. Speech is, needless to say, one of the most important modes of communication for relaying a message. In this report, messages initiated by a collaborator are classified into the "Suggestion" category. It is also important to accurately understand the messages given by the others, in a decoding process; these processes are
included in the "Approval" and "Confirmation" categories. This experiment confirmed the changes in speech over the series of collaborative activities. There was also a strong relationship between each collaborator's personality and the content of his or her speech, so verbal communication developed naturally, in accordance with each collaborator's character traits. As a result, the speech exchanged during the collaborative activity was shown to affect the creativity of the finished product.
Fig. 8. Tree diagram: result of cluster analysis of the total speech time (Phase 1: Suggestion, Confirmation; Phase 2: General Chat, Approval; Phase 3: Other)
3.4 Nonverbal Behavior

Fig. 9 shows the frequency of occurrence of illustrators. Illustrators occurred most frequently in Group 5, followed by Groups 2, 3, 4, and 1. This order is the same as the order of the works' perfection shown in Section 3.1, so the illustrator frequency appears to be related to a work's perfection. Fig. 10 shows the timing of illustrator occurrences for Groups 2 and 3, plotted as time series. From this figure, illustrators occurred among multiple collaborators. In order to find out how nonverbal behavior affects the work score, we conducted a correlation analysis (Table 3). It was clear that the frequency of nonverbal behavior by the collaborators contributed to the work's creativity, perfection and favorability. In sum, the groups that frequently used illustrators during collaborative work achieved high creativity, perfection and favorability for their work.
As described earlier, each collaborator must visualize "what kind of castle he wants to build" and relay this message to the others (by encoding) in order to "build a castle with LEGO blocks". So far, language has been described as an important medium for achieving this, but it is by no means the only one. McNeill (1987), a foremost researcher on gestures, describes how mental representations in humans first exist in a deep layer as undifferentiated imagistic and syntactic thoughts, which become differentiated when they appear on the surface: when a thought emerges through the audio channel it becomes speech, and when it emerges through the bodily-movement channel it is represented as gesture. Our "illustrator" was a gesture that appeared accompanying speech. The mental representation originally formed deep inside each collaborator appeared on the surface partly as speech (language) and partly as illustrators. An illustrator, therefore, was an outward expression of a mental representation showing itself through the collaborator's bodily-movement channel. Our experiment showed that the works created by groups with many illustrators rated high in both perfection and favorability. In a "collaborative creation" such as ours, as opposed to a task-oriented study, direct visual expression of mental representations through nonverbal, rather than verbal, communication among the collaborators was shown to increase the perfection of the end product.
Fig. 9. Frequency of the occurrence of illustrators
Fig. 10. The timing of illustrator occurrences (Groups 2 and 3)
Table 3. Correlations between the work scores and nonverbal behavior
               Phase      Creativity   Perfection   Favorability
Illustrator    Phase 1         0.629        0.095          0.522
               Phase 2        -0.425        0.986          0.903
               Phase 3        -0.188        0.846          0.907
               Sum             0.080        0.749          0.952
Nodding        Phase 1         0.832       -0.531         -0.086
               Phase 2         0.872       -0.529         -0.059
               Phase 3         0.834       -0.524         -0.075
               Sum             0.852       -0.530         -0.072
Eye Contact    Phase 1         0.615       -0.183          0.184
               Phase 2         0.778       -0.442         -0.010
               Phase 3        -0.167       -0.722         -0.961
               Sum             0.795       -0.562         -0.142
3.5 Bodily Movements

After eliminating noise from the motion data obtained by three-dimensional motion measurement, we output the x, y and z coordinates of each marker as time series. As bodily movements were recorded at 120 fps, we extracted one frame from every 12 and used these for analysis, after standardizing them to the same data volume as the behavior analysis. In order to quantify how much the collaborators' upper-body movements were synchronized, we calculated cross-correlation coefficients for the upper-body movements. For each group, the data for each pair of collaborators were divided into 60-second windows, shifted by 30 seconds, and the cross-correlation coefficient was computed up to the maximum lag order of 300 (30 seconds). Fig. 11 shows the time-series change in the cross-correlation coefficient for Group 1. From this graph, we learned that the upper-body movements of different collaborators were synchronized to a considerable degree. The sum of the correlation coefficients whose absolute value was 0.3 or greater is shown for each group in Table 4; as consecutive windows overlapped by 30 seconds, we divided the sums by two.
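The windowed cross-correlation described above can be sketched as follows. This is a plain reading of the stated procedure (60-second windows at the 10 Hz downsampled rate, 30-second shifts, lags up to 300 samples), not the authors' implementation.

```python
import numpy as np

FS = 10                # effective rate after keeping 1 of every 12 frames (120 fps / 12)
WIN = 60 * FS          # 60-second window
STEP = 30 * FS         # windows shifted by 30 seconds
MAX_LAG = 300          # maximum lag order (30 seconds)

def windowed_xcorr(a, b):
    """For each window, return the cross-correlation coefficient of largest
    magnitude between two collaborators' movement series a and b."""
    peaks = []
    for start in range(0, len(a) - WIN + 1, STEP):
        wa, wb = a[start:start + WIN], b[start:start + WIN]
        best = 0.0
        for lag in range(-MAX_LAG, MAX_LAG + 1):
            x = wa[max(lag, 0):WIN + min(lag, 0)]    # shift one series against the other
            y = wb[max(-lag, 0):WIN + min(-lag, 0)]
            r = np.corrcoef(x, y)[0, 1]
            if abs(r) > abs(best):
                best = r
        peaks.append(best)
    return np.array(peaks)

def synchrony_score(peaks):
    """Sum of coefficients with |r| >= 0.3, halved for the 30-s window overlap."""
    return np.sum(np.abs(peaks)[np.abs(peaks) >= 0.3]) / 2
```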
Fig. 11. Time-series change in the cross-correlation coefficient of the collaborators' upper-body movements (Group 1; pairs a-b, b-c, c-a)
Table 4. The sum of the correlation coefficients for each group

            Group 1   Group 2   Group 3   Group 4   Group 5
Phase 1         6.0       1.5       5.0       1.0       9.5
Phase 2         4.5       3.0       1.5       2.5       3.5
Phase 3         5.0       4.5       7.0       3.5       3.0
Sum            15.5       9.0      13.5       7.0      16.0
It is known that "entrainment", the time-based mutual synchronization of biological rhythms, occurs in physical communication (Condon, 1974). Our study also showed quantitatively that the collaborators' bodily movements synchronized and entrained one another. Entrainment is known to play an important role in the smooth exchange of information in communication (Watanabe, 1998). Our study showed that entrainment among the collaborators helped them share each other's thoughts and images in their search for the goal.
4 Conclusion

In the collaborative activity of "building a castle", we were able to confirm, from multiple perspectives, the process of creating a final product through a verbal and nonverbal fusion of one another's mental representations. Overall data analysis showed that no single rule existed that was common to all collaborators and groups, which is why the final products showed different levels of creativity. The different mental representations fusing within the work space created new representations, which then converged into the work's creativity, so it seemed only natural that creativity varied widely. However, the design of the final product clearly embodied elements such as each collaborator's personality, the group's structure, and the communication process. Being able to demonstrate this using experimental data was a significant achievement.
References

1. Condon, W.S., Sander, L.W.: Neonate Movement is Synchronized with Adult Speech: Interactional Participation and Language Acquisition. Science 183, 99–101 (1974)
2. Matsuda, M., Matsushita, M., Naemura, T.: Group Task Achievement under a Social Distributed Cognitive Environment: An Effect of Closeness between Group. The Institute of Electronics, Information and Communication Engineers Transactions on Information and Systems J90-D(4), 1043–1054 (2007)
3. McNeill, D.: Psycholinguistics: A New Approach. Harper & Row, New York (1987)
4. McNeill, D.: Hand and Mind. University of Chicago Press, Chicago (1992)
5. Okubo, M., Fujimura, A.: Development of Estimation System for Concentrate Situation Using Acceleration Sensor. In: Jacko, J.A. (ed.) HCI International 2009. LNCS, vol. 5610, pp. 131–140. Springer, Heidelberg (2009)
6. Suzuki, N., Kamiya, T., Yoshida, S., Yano, S.: A Basic Study of Sensory Characteristics toward Interaction with a Box-Shaped Interface. In: Jacko, J.A. (ed.) HCI International 2009. LNCS, vol. 5611, pp. 513–522. Springer, Heidelberg (2009)
7. Watanabe, T., Okubo, M.: Physiological Analysis of Entrainment in Communication. Journal of Information Processing Society of Japan 39(5), 1225–1231 (1998)
Real-World User-Centered Design: The Michigan Workforce Background Check System

Sarah J. Swierenga1, Fuad Abujarad2, Toni A. Dennis3, and Lori A. Post2

1 Usability/Accessibility Research & Consulting, Michigan State University, East Lansing, MI, USA
2 Michigan Department of Community Health, Lansing, MI, USA
3 Department of Emergency Medicine, Yale University, New Haven, CT, USA
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. The Michigan Workforce Background Check system demonstrates how an iterative user-centered design (UCD) process enhances organizational-level communication practices and efficiency. Well-designed information communication technology is an essential component of effective public health management. Usability and accessibility testing informed subsequent design and development. The iterative improvement in the background check application demonstrates that UCD should be a component of public health management projects in particular, and of online project development in general.

Keywords: User-centered design, usability, accessibility, information technology, criminal background checks.
1 Introduction

With increased demand for government services concomitant with funding deficits, states are seeking innovative methods to deliver services more efficiently. In many states, this has led to an increasing number of government self-service websites for renewing vehicle registrations, buying fishing and hunting licenses, registering to vote, filing taxes, etc. [1], [2]. The effectiveness of self-service websites, however, depends on how easy they are to use, which can vary significantly with their design. User-centered design (UCD) is well known within the field of human–computer interaction and is critical to successful system development [3], [4]. However, governments rarely consider this process and are therefore unlikely to devote resources to fund such efforts, especially given their economic constraints. The Michigan Workforce Background Check (MWBC) system identified UCD as a critical factor in system development, differentiating it from other applicants (see Acknowledgements). Michigan was one of seven states included in a pilot study from January 2005 to September 2007 to develop a statewide background check system for direct access workers in long-term care and hospice facilities. The Michigan system is now
considered a national model, winning out over six other systems developed as part of the initial pilot program, in part due to the emphasis on applying the UCD process. Its success is especially noteworthy given the audience that uses the MWBC system and the geographic challenges in Michigan. Most users in adult foster care group homes have little technological expertise, while others, such as human resource professionals in nursing homes, have moderate to advanced computer skills. The statewide catchment of the system presented unique challenges: most of the Michigan population is concentrated in the southern lower peninsula, with vast rural areas in the northern lower and upper peninsulas where users operate over dial-up Internet connections. The ability of the highly complex MWBC system to prosper in this environment is remarkable. The system requires users to parse large and complicated tables and databases, yet it receives relatively few support requests and has a high rate of user success. The program processes 60,000 to 70,000 background checks per year. The Michigan program for background checks demonstrates how health information technology positively impacts organizational communication practices, which is essential for effective public health management [5]. This study focuses on how user-centered design (UCD) methodologies improved the design and implementation of an effective background check system in a complex organizational environment under challenging time constraints. We hypothesize that this approach will result in better care for seniors and other vulnerable adults and a more qualified workforce, and will ultimately prevent incidents of abuse, mistreatment, and exploitation.

1.1 Program Background

Michigan was one of seven states that received funding from the United States Department of Health and Human Services, Centers for Medicare and Medicaid Services (CMS) to participate in the legislatively mandated background check pilot program aimed at preventing abuse. Section 307 of the United States Medicare Prescription Drug, Improvement, and Modernization Act (MMA) of 2003 (PL 108-173) directed the Secretary of Health and Human Services to establish a program to identify efficient, effective and economical procedures for conducting background checks on prospective employees of long-term care facilities or providers with direct access to patients, thereby increasing the safety of Michigan's elders and persons with disabilities. At the time, there was no systematic process across the multiple health and human service agencies to conduct background checks on prospective employees, manage the process, disseminate findings, or make consistent employment decisions. The complexity of the issue, as well as dated mandates, created voids that potentially put vulnerable populations at risk. To exacerbate the problem, Michigan's aging population and the shortage of healthcare workers increased pressure on long-term care facilities and providers to lower standards when hiring new employees [6], [7]. An improved, standardized system of background checks was necessary to keep unqualified persons from being employed.
The new system was an improvement in that it:
• Increased the scope of workers given background checks and the thoroughness of the checks
• Harmonized procedures across state agencies
• Established cost containment measures for background checks

The Michigan Workforce Background Check system, launched in 2006, has remained active since the pilot phase with support from the Michigan Department of Community Health (MDCH) and in collaboration with Michigan State University. Development has continued in response to state policy and legislative changes. Due to its success, the program was used as the model for the national demonstration program to screen healthcare applicants authorized by the Affordable Care Act of 2010.

1.2 Michigan Workforce Background Check System Overview

The Michigan Workforce Background Check (MWBC) system is a Web-based application that centralizes the screening process for prospective employees in long-term care facilities. The system integrates abuse and neglect registries, the Office of Inspector General's Medicare/Medicaid exclusion database and state criminal records archives, while providing secure communication between the system and the fingerprint vendor, the Michigan State Police (MSP), the Department of Community Health (MDCH), and the Department of Human Services (MDHS). (See Figure 1.) The system incorporated UCD methodologies such as focus groups, expert reviews, usability testing, and accessibility compliance inspections into the design and development, which was critical for deploying a usable, cost-effective system. The two-tiered screening process begins with name-based searches in relevant databases, such as the HHS Medicare/Medicaid Exclusion List (US Office of Inspector General), Michigan Nurse Aide Registry (NAR), Offender Tracking Information System (OTIS), Michigan Public Sex Offender Registry (PSOR), and others. Checking applicant names allows an initial assessment that identifies persons with disqualifying convictions in Michigan before a more comprehensive and costly national background check is requested. Provided the applicant successfully completes the initial assessment, the employer then requests fingerprint-based state and federal (FBI) criminal history checks. Once a record is established in the system, a compliance officer can track and monitor it through RAPback system alerts, in which the state regulatory agency is immediately notified when an employee's criminal history record is updated, including any subsequent violations. The system can also create customized reports and manage an appeals process. The system provides a single data entry point for Michigan employers to check registries for potentially disqualifying information, request fingerprint appointments, automatically import the results of federal and state fingerprint checks, and download and print system-generated employment authorization letters from the regulatory agencies.
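The two-tiered flow can be summarized in pseudocode. The sketch below is written in Python for illustration (the production system is C#/.NET, as described in Section 2), and the registry identifiers and helper callables are hypothetical names, not the system's actual API.

```python
# Hypothetical registry identifiers; the real system integrates these
# sources behind a single web interface.
NAME_BASED_REGISTRIES = ["hhs_exclusion_list", "nurse_aide_registry",
                         "offender_tracking", "sex_offender_registry"]

def screen_applicant(applicant, check_registry, request_fingerprint_check):
    """Tier 1: cheap name-based checks; tier 2: the costly fingerprint-based
    state/FBI check, run only if the applicant clears tier 1."""
    for registry in NAME_BASED_REGISTRIES:
        hit = check_registry(registry, applicant)
        if hit is not None and hit.is_disqualifying:
            return ("disqualified", registry)        # stop before paying for tier 2
    result = request_fingerprint_check(applicant)    # state + federal (FBI) history
    if result.has_exclusionary_findings:
        return ("disqualified", "criminal_history")
    return ("authorized", None)
```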
Fig. 1. MWBC system overview
2 Applying User-Centered Design Techniques to System Development

The implementation of the MWBC program is the result of a successful collaboration among multiple constituents. Advisory groups and work groups were convened early in the process in order to achieve consensus on the legislative effort and overall system design strategy. Representatives from AARP, the Michigan Assisted Living Association, the Hospice and Palliative Care Association, the Health Care Association of Michigan, Legal Aid of Western Michigan, the Michigan Quality Community Care Council, the Michigan Departments of Community Health, Human Services, State Police and Information Technology, the United Auto Workers, the Service Employees International Union and the Michigan Home Health Association met regularly and often with Michigan State University researchers to address program and system concerns in the year preceding the launch of the online system. The development team investigated platform and software tools and created the initial development environment while usability specialists met with users to gather user interface requirements. Given the disparities among potential end-users in skill with Web-based systems or computers in general, these preliminary design discussions were critical in determining realistic task scenarios. The challenge was to design a system that manages, maintains, and simplifies the background checking process in an automated and efficient way. Due to a compressed timeline for system deployment, the development team elected to use Microsoft Visual Studio and the .NET Framework, and used Crystal Reports to generate dynamic, user-customized reports. The programming language was C#, and Microsoft SQL Server 2005 was the database management system. Some of these choices complicated usability and accessibility compliance during the initial system design phase, but the programmers were able to devise successful solutions. The resulting system has multiple user interfaces for providers, state analysts and State Police to complement their tasks and workflows.
3 User Experience Evaluations during the System Design Phase

3.1 Focus Groups and Concept Testing with Providers (2005-2006)

Early in the system design phase, usability experts from Usability/Accessibility Research and Consulting (UARC) at Michigan State University conducted a series of usability focus groups with providers (i.e., facilities tasked with conducting the background checks) to identify their current manual processes and review the proposed Web-based approach. Participants discussed user interface concepts and features during the sessions, and they interacted with a concept prototype of the system design (see Figure 2). Participants provided valuable insights on each step of the proposed process, e.g., handling manual registry hits and completing the manual registry status and hiring step pages. Comments were used to create user interface requirements for the Home page and the applicant demographic form.
Fig. 2. MWBC home page concept design
3.2 Usability Evaluation with Providers (2006) As is typical of most real-world development efforts, the highest priority was delivering a stable product that met the functional requirements on time. In Michigan’s case, development could not begin until the legislature passed laws authorizing the program. Although the development schedule was quite compressed (about eight weeks), the team was still able to conduct usability testing on a detailed user interface design in tandem with implementation. Six providers from MDCH and MDHS facilities participated. The findings indicated that participants had significant difficulty understanding how to perform the registry checks and make correct determinations based on the registry results. Participants also had trouble with aspects of the site’s layout and presentation, including locating fingerprint results on the Home page, and finding the key sentence in the criminal history summary letter that stated whether the applicant had exclusionary findings from the fingerprint checks.
Usability recommendations based on the results were included as much as possible in the first release. (Figure 3 shows the Home page for release 1.)
Fig. 3. MWBC home page – first release
Developers were able to achieve this critical milestone by putting the system into production by the end of March. Approximately 6,600 providers were set up in the system and, of those, 3,787 providers initiated 4,797 applications within the first month. Providers included nursing homes, homes for the aged, county medical care facilities, adult foster care facilities, intermediate care facilities for people with mental retardation (ICF/MRs), psychiatric hospitals, hospices, and home health agencies.

3.3 Accessibility Evaluation (2007)

While accessibility compliance with Section 508 standards was a system design goal of the project, achieving compliance was more of a challenge. Usability specialists performed an accessibility compliance inspection against Section 508 accessibility standards a few months after the first release. Unfortunately, although the site met some requirements, each page included in the analysis failed at least one Section 508 requirement. This is not uncommon when the development timeline is too compressed to fully employ standards-compliant coding strategies. We encountered a number of issues when we reviewed the pages with adaptive technology (i.e., a screen reader), including missing descriptive text for images, non-redundant color coding (exempt personnel were identified only through yellow highlighting without another, non-color
identifier), and an interactive new-application form with CSS and JavaScript issues. The evaluators recommended re-examining the site after the accessibility enhancements were made. Remediation for accessibility was scheduled for future releases.

3.4 Usability Evaluation with Experienced Providers (2007)

About a year after the first release, the website was evaluated again in one-on-one usability sessions with eight experienced participants responsible for conducting background checks at provider agencies on a regular basis. Overall, these experienced participants had little difficulty understanding how to perform registry checks. However, they had some difficulty interpreting the results, tending to incorrectly mark findings as non-disqualifying and then continue with the next registry check. As a result, some applicants were sent for fingerprints when they might have been excluded based on the registry findings alone. Recommendations from the second review included providing column-sorting capabilities; providing more information in the online guide about scenarios users might encounter, e.g., interpreting criminal history results; enabling users to correct errors on the applicant form; saving partially completed applicant forms; and requiring fewer fields on the applicant entry form, especially with respect to personal characteristics.

3.5 Accessibility Evaluation (2008)

As in usability testing, the 2008 accessibility evaluation indicated that the site had improved, although it still did not meet Section 508 requirements. Key areas for improvement included ensuring sufficient color contrast, creating consistent navigation that worked properly with adaptive technologies, and cross-browser compatibility.
4 User Experience Evaluations in Phase 2

In 2007-2008, significant functionality was added to the system. The integration of RAPback enabled immediate notification to the department when an employee's criminal history record was updated and decreased the turnaround time significantly. During the same period an appeal module was added, which consolidated the process for flagging a disqualified individual. This feature resolved delays in reporting exclusionary findings to the employer and in reporting favorable outcomes of the appeal process.

4.1 User Interface Redesign Effort

Following the integration of the new functionality, a major redesign took place during 2008-2009 to address the remaining accessibility issues in the provider user interface and to add comprehensive online help. The entire user interface layer was recoded to incorporate CSS design best practices and to ensure user interface compliance with State of Michigan design standards. In particular, the effort involved separating the presentation layer from the system architecture layers. (See Figure 4.)
Fig. 4. MWBC home page redesign
4.2 Usability Evaluation with Providers (2008)

In 2008, a final evaluation was conducted in one-on-one usability sessions with 10 providers from MDCH and MDHS facilities. Users were quite successful in conducting background checks, but they had some difficulty working with potentially exclusionary hits in the registries and with newer functionality. A few participants experienced significant difficulty with the new RAPback feature. Recommendations for improving the user interface included making the consent checkbox more noticeable, adding drop-down defaults for place of birth and country of citizenship, and adding instructions for the RAPback process.

4.3 Accessibility Evaluation (2009)

A final accessibility compliance inspection was performed in early 2009, before the release of the new interface into production. The user interface was found to meet Section 508 accessibility standards.

4.4 Focus Group with State Analysts (2009)

In 2009, after completing the redesign of the user interface for providers, the team started to concentrate on streamlining the user interface for state analysts. We held a usability focus group with analysts from MDCH and MDHS to identify areas for improvement in their user interface design and process. In general, the MWBC system worked well for the analysts, but they also wanted a comprehensive case management tool within the system. Additional discussion sessions were held to review the process in detail for all of the major areas of the product. The developers enhanced the system to streamline the overall process, combine functionality where appropriate, and add the case management tool functionality.
5 System Effectiveness – Highlights

To date, more than 400,000 applicants have been checked, with over 13,000 eliminated from consideration as a result of background checks. In addition to creating a custom screening tool, the Michigan Workforce Background Check team
helped establish policy and procedures for using background checks across state agencies. Highlights include:
• An employer interface that provides "one-stop shopping" for comprehensive background checks, with real-time responses for name-based checks
• Fingerprint-based checks that virtually eliminate false positive and false negative results
• Turnaround time reduced from 6-8 weeks to 48 hours
• A standardized background check process and improved communication between agencies, preventing disqualified individuals from moving across facility types
• Immediate automatic RAPback notification
• A high-security environment that protects the privacy of applicants and employees
• Turnaround time for appeals reduced to one business day
• Built using best practices for user interface design and software development
• Compliance with Section 508 accessibility standards

In summary, the design and implementation of this background check system benefited greatly from conducting user evaluations throughout an iterative development process. Focusing on applying UCD techniques resulted in a product that (1) facilitates communication among state and federal agencies, (2) serves as an interactive decision support system, and (3) provides a better tool for public health practice.

Acknowledgements. We acknowledge the Michigan Department of Community Health for supporting the system development and user experience research under the Health Care Worker Background Check Project (MDCH Grant # 20110499-00, Swierenga (PI), 10/1/10-9/30/11; MDCH Grant # 20101041, Swierenga (PI), 10/1/09-9/30/10; MDCH Grant # 11-P-93042, Swierenga (PI), 10/1/08-9/30/09; MDCH Grant # 20080000, Post (PI), 10/1/07-9/30/08). We also want to recognize the prior project funding received from the U.S. Dept. of Health and Human Services under the Michigan Program for Background Checks for Employees with Direct Access to Individuals Who Require Long-Term Care project (Grant # 11-P-93042/5, Post, Swierenga, and Oehmke (PIs), 1/1/05-12/31/07).
References

1. Bélanger, F., Carter, L.: The Effects of the Digital Divide on E-government: An Empirical Evaluation. In: Proceedings of the 39th Hawaii International Conference on System Sciences, pp. 1–7. IEEE Computer Society, Washington (2006)
2. Dawes, S.S.: The Evolution and Continuing Challenges of E-governance. Public Administration Review 68, S86–S102 (2008)
3. Bias, R.G., Mayhew, D.J. (eds.): Cost-justifying Usability: An Update for the Internet Age, 2nd edn. Morgan Kaufmann, San Francisco (2005)
4. Swierenga, S.J.: Incorporating Web Accessibility into the Design Process. In: Righi, C., James, J. (eds.) The User-Centered Design Casebook, pp. 355–380. Morgan Kaufmann, San Francisco (2007)
5. Swierenga, S.J., Choi, J.H., Post, L.A., Coursaris, C.: Public Health Communication Technology: A Case Study in Michigan Long-term Care Settings. International Journal of Interdisciplinary Social Sciences 1(5), 115–124 (2007)
6. Post, L.A., Swierenga, S.J., Oehmke, J., Salmon, C., Prokhorov, A., Meyer, E., Joshi, V.: The Implications of an Aging Population Structure. International Journal of Interdisciplinary Social Sciences 1(2), 47–58 (2006)
7. Post, L.A., Salmon, C.T., Prokhorov, A., Oehmke, J.F., Swierenga, S.J.: Aging and Elder Abuse: Projections for Michigan. In: Murdock, S.H., Swanson, D.A. (eds.) Applied Demography in the 21st Century, ch. 6, pp. 103–112. Springer Science and Business Media B.V., Heidelberg (2008)
What Kinds of Human Negotiation Skill Can Be Acquired by Changing Negotiation Order of Bargaining Agents?

Keiki Takadama, Atsushi Otaki, Keiji Sato, Hiroyasu Matsushima, Masayuki Otani, Yoshihiro Ichikawa, Kiyohiko Hattori, and Hiroyoki Sato

The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8558, Japan
{keiki@,atsushi.otaki@cas.,keiji@cas.,matsushima@cas.,masa-o@cas.,yio@cas.,hattori@,sato@}hc.uec.ac.jp
Abstract. This paper focuses on developing human negotiation skills through interactions between a human player and a computer agent, and explores a strategic method for human skill improvement in enterprises. For this purpose, we investigate negotiation skill development through a bargaining game played by the human player and an agent. Since the negotiation strategy the players acquire is affected by the order in which they face different types of agents, this paper aims at investigating what kinds of negotiation strategies can be learned by negotiating with different kinds of agents in a given order. Through an intensive human subject experiment, the following implications were revealed: (1) human players negotiating with the human-like behavior agent first and the strong/weak attitude agent second can neither obtain a large payoff nor win many games, while (2) human players negotiating with the strong/weak attitude agent first and the human-like behavior agent second can obtain a large payoff and win many games.

Keywords: human skill development, agents, interaction, subject experiment, bargaining game.
1 Introduction

Recently, human skill development has received much attention and has been regarded as a very important issue for enterprise growth. To achieve it, many approaches for training staff, such as e-learning, have been explored as information technology (IT) has progressed. According to Oshima, human skill development in enterprises that employ IT widely can be categorized into the following four stages [5]: (1) in stage 1, enterprises attempt to improve training efficiency by using IT (e.g., e-learning), which is the most basic approach that enterprises take; (2) in stage 2, enterprises optimize training courses by excluding useless courses and including useful ones; (3) in stage 3, enterprises attempt to connect training
courses with actual work, which enables the enterprises to use IT as communication and knowledge management tools in actual work; and (4) in stage 4, enterprises develop human skills for management by using IT to support employee career development. What should be noted here is that most enterprises have reached stage 2 but have not yet reached stages 3 and 4. This is because practical human skills depend on human experience or intuition, which makes it difficult to train such employees strategically, in comparison with training new employees in general knowledge, such as compliance, through e-learning. Reflecting the stage most enterprises have reached, most studies on human skill development using IT relate to e-learning [1][7], i.e., the approaches of stages 1 and 2, while systematic approaches to stages 3 and 4 that do not depend on human intuition and experience have not yet been fully studied. To overcome this problem, our previous research [6] focused on negotiation skill as one important skill and explored its development through human-agent interaction. We take this approach because the skills trained by interaction with agents have the potential to be a highly important ability for employees in administrative or managerial positions (in particular, such interaction is useful for enterprise managers with little management experience to gain that experience, which contributes to fostering leaders in organizations). From this perspective, our previous research addressed negotiation skill development through a bargaining game as the first step toward our goal. Specifically, we conducted subject experiments in which human players negotiated with the following two kinds of agents: (a) strong/weak attitude agents, making aggressive/defensive proposals in advantageous/disadvantageous situations; and (b) human-like behavior agents, making mutually agreeable proposals as the number of games increases. However, the human subjects in those experiments always negotiated with the same agents. Since the acquired negotiation strategy may be affected by the negotiation order of different types of agents, this paper aims at investigating what kinds of negotiation strategies can be learned by negotiating with different kinds of agents in a given order. This paper is organized as follows. Section 2 explains the bargaining game as an example for negotiation skill development. An agent implementation of the bargaining game is described in Section 3. The subject experiments and their results are described in Section 4. Finally, our conclusions are presented in Section 5.
2 Bargaining Game

2.1 Bargaining Game

The bargaining game [8], in which two players aim to divide money through negotiation, has been studied in the context of bargaining theory [3] as one of the major subjects of game theory [4]. The bargaining game was proposed for investigating when, and what kinds of, offers from an individual player are accepted by the other player. The game therefore requires the skill of determining when and what kind
of offers are needed to win the game. Concretely, the bargaining game can be used to investigate (1) whether the players learn the situational advantages and disadvantages in negotiation and (2) whether they also learn appropriate strategies based on these situations. The game is generally a one-shot game, a typical example of negotiation. However, this is not realistic as a social activity, since two or more rounds of negotiation are usually conducted. We therefore employ the sequential bargaining game shown in Figure 1. In this example, player 1 starts by offering 30% of reward R (R = 10 in the example); player 2 then counter-offers 20%, refusing player 1's offer. Through such negotiations, player 2 finally offers 40%, which player 1 accepts, meaning that player 1 acquires 40% and player 2 acquires 60%. If the game is not completed within the maximum number of negotiation steps (MAX_STEP), which is predetermined and known to both players, neither player acquires any reward, meaning that the negotiation fails. Figure 2 shows the final negotiation step, in which player 2 offers 10% to player 1. If player 1 refuses, both players acquire none of the reward, as shown in the upper figure, but if player 1 accepts, players 1 and 2 respectively acquire 10% and 90% of the reward, as shown in the lower figure. If the players are rational, player 1 accepts even a small offer from player 2, because a 10% reward is larger than a 0% reward: anything is better than nothing. This indicates that the player making the last offer has an advantage over the other player, because the last offerer can acquire a large reward by making a small offer if the other player is rational. In this example, player 2 stands in the advantageous situation, while player 1 stands in the disadvantageous situation. Note that the game result can be calculated beforehand through game-theoretic analysis if both players are rational. However, humans sometimes behave irrationally, which makes the result unpredictable, i.e., subject experiments are needed to understand the results.
Fig. 1. Example of the sequential bargaining game (offers: "I offer 30%", "No, 20%", ..., "No, 40%")
Fig. 2. Final negotiation
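The rules above can be condensed into a short simulation loop. The sketch below is an illustrative reading of the protocol, with offer and accepts as hypothetical player callables rather than anything defined in the paper.

```python
MAX_STEP = 6   # maximum number of negotiation steps, known to both players
R = 10         # total reward to be divided

def play_game(p1, p2):
    """One sequential bargaining game. offer(step) returns the share (in %)
    proposed to the opponent; accepts(share) is the opponent's decision."""
    players = (p1, p2)
    for step in range(MAX_STEP):
        proposer = players[step % 2]
        responder = players[(step + 1) % 2]
        share = proposer.offer(step)              # % of R offered to the responder
        if responder.accepts(share):
            return {responder: R * share / 100,
                    proposer: R * (100 - share) / 100}
    return {p1: 0, p2: 0}                         # negotiation failed: no reward
```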
2.2 Previous Subject Experiment

Our previous research [2] conducted subject experiments with the sequential bargaining game and obtained the results shown in Figure 3. In the left graph, the vertical and horizontal axes indicate the acquired reward and the iterations (i.e., the number of games), respectively; in the right graph, they indicate the negotiation size (i.e., the number of negotiation rounds until an offer is accepted by the other player) and the iterations. The left graph suggests that the rewards of both human players are close to 5:5 (50%:50%), while the right graph suggests that the negotiation size increases in the early iterations and decreases after several iterations, for the following reasons: (1) the negotiation size increases in the first several iterations because the players do not yet know each other's strategies, which prompts them to explore the possibility of obtaining a larger reward by competing with each other, which in turn requires further negotiation (i.e., a larger negotiation size is required to explore a larger reward); and (2) the negotiation size decreases in the later iterations because the players find a mutually agreeable payoff once they know each other's strategies, which decreases their motivation to keep negotiating (i.e., a small negotiation size is enough to determine their rewards) [10]. We call this tendency the decreasing trend. Hereafter, we use the terms "payoff" and "agent" instead of "reward" and "player" for their more general meanings in the bargaining game.
Fig. 3. Results of previous subject experiments [2]: (a) reward; (b) negotiation size
3 Modeling Agents

This section explains the agent proposed in our previous research [2], which can replicate the tendencies of the payoff and negotiation size shown in Figure 3. The agent is based on Q-learning [11] in the reinforcement learning framework [9].

3.1 Knowledge

The agent's knowledge of the negotiation strategy is represented by Q-tables composed of Q-values Q(s, a), each of which indicates the expected payoff (i.e., reward) when the agent executes action "a" in situation "s". For example, Q(3, 2), circled in Figure 4, means that the expected payoff of counter-offering 2 (20%) when 3 (30%) has been offered by the opponent agent is 7.7.
Fig. 4. Q-table
The Q-value is used to determine the agent's action (i.e., an action with a high Q-value is selected with high probability), and its value is updated by Eq. (1), where α, r, γ, s', and a' indicate the learning rate, payoff, discount factor, next state, and next action, respectively.
Q(s, a) ← Q(s, a) + α ( r + γ max_{a'∈A(s')} Q(s', a') − Q(s, a) )    (1)
Each agent has a number of Q-tables determined by the maximum negotiation size, as shown in Figure 5, i.e., the next state s' is looked up in the next Q-table (not in the same Q-table). This is a unique feature of our agents compared to conventional Q-learning agents. Specifically, the number of Q-tables each agent has is MAX_STEP/2 (e.g., three when MAX_STEP is six). Note that t represents the negotiation turn (i.e., the number of negotiations).
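A minimal sketch of this per-turn Q-table structure and the Eq. (1) update is given below. Discretizing offers into 10% steps and the α/γ values are our assumptions for illustration, not details stated in the paper.

```python
import numpy as np

ALPHA, GAMMA = 0.1, 0.9      # assumed learning rate and discount factor
N_OFFERS = 10                # offers discretized as 10%, 20%, ..., 100%
MAX_STEP = 6

# One Q-table per negotiation turn of this agent (MAX_STEP/2 tables);
# Q[t][s, a] = expected payoff of counter-offering a when offered s at turn t.
Q = [np.zeros((N_OFFERS, N_OFFERS)) for _ in range(MAX_STEP // 2)]

def update(t, s, a, r, s_next):
    """Eq. (1), with the max taken over the *next* Q-table, as in Fig. 5."""
    if t + 1 < len(Q):
        target = r + GAMMA * Q[t + 1][s_next].max()
    else:
        target = r                                # last turn: no successor table
    Q[t][s, a] += ALPHA * (target - Q[t][s, a])
```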
3.2 Action Selection

The following action-selection methods are employed in the agent:
• ε-greedy selection: This method selects an action randomly with probability ε (0 ≤ ε ≤ 1) and selects the action with the largest Q-value with probability 1 − ε.
• Boltzmann distribution selection: This method selects an action stochastically based on the Q-values. The probability of selecting action "a" is calculated by Eq. (2), where T is the temperature parameter adjusting the randomness of action selection. When T is low, the agent's behavior becomes rational, i.e., the agent selects the action with the maximum Q-value.
p(a | s) = e^{Q(s,a)/T} / Σ_{a_i∈A} e^{Q(s,a_i)/T}    (2)
To replicate the tendency of the human-like behavior (i.e., the decreasing trend described in Section 2.2), our previous research introduced a randomness-decreasing parameter, changeRate (0 < changeRate < 1), which gradually reduces the temperature as shown in Eq. (3).

T ← T × (1 − changeRate)  in each iteration    (3)
Fig. 5. Q-tables in agents
4 Subject Experiments

4.1 Outline

The subject experiments involved two phases: the learning phase (Figure 6 (a)) and the evaluation phase (Figure 6 (b)). The learning phase aims at having the subjects learn strategies against their agents, while the evaluation phase aims at evaluating the effectiveness of the acquired strategies. In the learning phase, six human subjects who are unfamiliar with the bargaining game enter six different rooms, and each subject plays the bargaining game with one kind of agent. In detail, subject A negotiates with a human-like behavior agent (described in the next section), subject C negotiates with a strong/weak attitude agent (described in the next section), and subject AC (or CA) negotiates with the human-like behavior agent first (second) and the strong/weak attitude agent second (first), as shown in Figure 6 (a). In the evaluation phase, on the other hand, each subject plays the game with another subject
who has negotiated with a different kind of agent, via the computer network. Specifically, subject A negotiates with subjects C and AC (or CA), subject C negotiates with subjects A and AC (or CA), and subject AC (or CA) negotiates with subjects A and C, as shown in Figure 6 (b).
Fig. 6. Outline of the subject experiment: (a) learning phase; (b) evaluation phase
In the experiment, the human subjects are only informed that negotiation opponents change after 40 games (i.e., 20 games starting as the first offerer and 20 games starting as the second offerer). The total number of games per subject is 160: each subject negotiates with the same agent in two sets of 40 games (80 games in total) and negotiates with the other two subjects, who trained against different types of agents, for 40 games each (80 games in total). The point of this experiment is that the opponents are changed from computer agents to human subjects without the change being noticed, because the subjects are in separate rooms and negotiate with the opponent via the
computer network. After all the games, the subjects answer a questionnaire so that we can investigate the negotiation skills trained by the different kinds of agents. Subjects are categorized into groups A, C, and AC (or CA) based on agent type. We conducted this subject experiment twice with six different subjects each time, i.e., the first experiment used two subjects each of types A, C, and AC, and the second used two each of types A, C, and CA.

4.2 Two Bargaining Agents
To investigate the effectiveness of training through negotiation with an agent, the following agents are employed, as in our previous research [2]:
(1) Strong/weak attitude agent: This agent makes aggressive/defensive proposals in advantageous/disadvantageous situations, using ε-greedy action selection with ε = 0.1. For example, it obtains a large payoff (around 70%) in an advantageous situation and a small payoff (around 30%) in a disadvantageous situation, as shown on the left of Figure 7.
(2) Human-like behavior agent: This agent makes mutually agreeable proposals as the iterations increase, using Boltzmann distribution selection with decreasing randomness, with T = 1000 and changeRate = 0.00001. For example, it acquires around a 50% payoff in the early iterations and around 70%/30% in advantageous/disadvantageous situations in the later iterations, as shown on the right of Figure 7; this reproduces the decreasing trend of the negotiation size, like human behavior.
Fig. 7. Characteristics of the two kinds of bargaining agents
4.3 Results
Figure 8 shows the subject experiment results for subjects A, C, AC, and CA in the evaluation phase. In this figure, the vertical axis indicates the average ranking of wins and payoff, while the horizontal axis indicates the subjects A, C, AC, and CA; each value is averaged over the four subjects of types A and C and the two subjects of types AC and CA. The gray, white and black bars indicate the win ranking, the payoff ranking, and the average of the win and payoff rankings over (a) and (b), respectively. In detail, Figure 8 (a) shows the win and payoff rankings over the total games, Figure 8 (b) shows the win and payoff rankings in each set of games, and Figure 8
(c) shows the average rankings of wins and payoff, averaged over (a) and (b). Note that the win and payoff rankings over the total games are determined by the order of the total number of wins and the total acquired payoff, while those in each set of games are determined by the order of the number of wins and the payoff acquired in one set. Figure 8 shows the same tendency for subjects A and C as found in our previous research [6], i.e., (1) the C subjects acquired the largest payoff from the total-game viewpoint; and (2) the A subjects won more games than the other subjects from the per-set viewpoint. These results suggest that (1) the C subjects learn a strategy for acquiring a large payoff through negotiation with the strong/weak attitude agent; and (2) the A subjects learn a strategy for winning games through negotiation with the human-like behavior agent, because the number of draws, i.e., negotiation failures (exceeding the maximum number of negotiation steps), is larger for the A subjects than for the C subjects, meaning that the A subjects win or draw their games while the C subjects mostly win but sometimes lose, with few draws. In comparison with the A and C subjects, (1) the AC subjects, negotiating with the human-like behavior agent first and the strong/weak attitude agent second, can neither acquire a large payoff nor win many games, while (2) the CA subjects, negotiating with the strong/weak attitude agent first and the human-like behavior agent second, win more games than the C subjects and acquire a larger payoff than the A subjects, which yields the highest average ranking of wins and payoff from both the total-game and per-set viewpoints, as shown in Figure 8 (c). This indicates that the negotiation order of the different types of agents has a large influence on the learning of negotiation strategies.
Fig. 8. Results of the subject experiments
5 Conclusions

This paper focused on developing human negotiation skills through interactions between a human player and a computer agent, and explored a strategic method for human skill improvement in enterprises. For this purpose, we investigated
negotiation skill development through a bargaining game played by a human player and an agent. In detail, we focused on the influence of the negotiation order of different types of agents, and investigated what kinds of negotiation strategies can be learned by negotiating with different kinds of agents in a given order. Through an intensive human subject experiment, the following implications were revealed: (1) human players negotiating with the human-like behavior agent first and the strong/weak attitude agent second can neither obtain a large payoff nor win many games, while (2) human players negotiating with the strong/weak attitude agent first and the human-like behavior agent second can obtain a large payoff and win many games. Since this work is the first stage toward strategic human skill development, the following issues must be addressed in the near future: (1) conducting more subject experiments to improve the reliability of these implications; and (2) employing other agents with different features to generalize them.
References

1. CIPD: People Management and Technology: Progress and Potential, pp. 14–28 (2005)
2. Kawai, T., Koyama, Y., Takadama, K.: Modeling Sequential Bargaining Game Agents Towards Human-like Behaviors: Comparing Experimental and Simulation Results. In: The First World Congress of the International Federation for Systems Research (IFSR 2005), pp. 164–166 (2005)
3. Muthoo, A.: Bargaining Theory with Applications. Cambridge University Press, Cambridge (1999)
4. Osborne, M.J., Rubinstein, A.: A Course in Game Theory. MIT Press, Cambridge (1994)
5. Oshima, A.: The Progress and Issues of IT-based Human Resource Development in Japanese Corporations. In: The 14th Symposium on Social Information Systems, pp. 99–104 (2008) (in Japanese)
6. Otaki, A., Hattori, K., Takadama, K.: Towards Strategic Human Skill Development Through Human and Agent Interaction: Improving Negotiation Skill by Interacting with Bargaining Agent. Journal of Advanced Computational Intelligence and Intelligent Informatics (JACIII) 14(7), 831–839 (2010)
7. Parry, E., Tyson, S.: Technology in HRM: The Means to Become a Strategic Business Partner? In: Storey, J. (ed.) Human Resource Management, pp. 235–249. Thomson Learning (2007)
8. Rubinstein, A.: Perfect Equilibrium in a Bargaining Model. Econometrica 50(1), 97–109 (1982)
9. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
10. Takadama, K., Kawai, T., Koyama, T.: Micro- and Macro-Level Validation in Agent-Based Simulation: Reproduction of Human-Like Behaviors and Thinking in a Sequential Bargaining Game. Journal of Artificial Societies and Social Simulation (JASSS) 11(2) (2008)
11. Watkins, C.J.C.H., Dayan, P.: Technical Note: Q-learning. Machine Learning 8, 55–68 (1992)
An Efficient and Scalable Meeting Minutes Generation and Presentation Technique Berk Taner, Can Yildizli, Ahmet Ozcan Nergiz, and Selim Balcisoy Sabanci University, Tuzla, Istanbul, Turkey {berktaner,canyildizli,ahmetn,balcisoy}@sabanciuniv.edu
Abstract. Meetings are essential for a group of individuals to work together. An important output of meetings is minutes. Taking and distributing minutes is a time-consuming task. Moreover, a new member of a meeting series cannot easily refer to old minutes if they exist only in written or e-mail form. Our contribution to this problem is a new approach to taking meeting minutes that allows dynamic and cooperative note taking. In addition, the resulting minutes allow any new participant to get up to speed in less time.

Keywords: dynamic meeting minutes, storytelling interfaces.
1 Introduction

Meetings are essential for a group of individuals to work together. They are a platform where ideas are transferred and debated. Techniques to make meetings more efficient are constantly researched, and different technologies for telepresence and integrated workspaces are being developed. An important output of meetings is minutes. Minutes should be taken by one person and should be distributed to, and approved by, every participant; their content is generally at the discretion of the minute taker. This whole process takes up time, lowering utilization. In addition, the content of minutes does not always capture the process of decision-making. This creates another problem: any new participant, and any participant absent over a series of meetings, will not be able to fully grasp the content of previous meetings. In order to address these issues we propose an interactive note-taking medium. This medium allows tracking the authors of ideas as well as taking snapshots of the medium in order to create visual meeting minutes dynamically. With this approach, minute takers spend less time and the resulting minutes can be approved much faster. The visual nature allows any newcomer to the group to grasp the concepts and see ideas evolve from the beginning. The rest of the paper is organized as follows: Section 2 is about related work in the field, Section 3 is about our approach to the problem, Section 4 is about the usability studies and Section 5 contains results and discussion.
2 Approach In order to validate the proposed approach, web-based software was developed and used in usability studies. The task was designed to allow us to compare timings between two experiments: one using the developed software and one using the classical whiteboard and note-taking approach.
Fig. 1. Blank GUI at the start of a meeting series
For the usability studies we limited the number of participants to four, denoted by four different colors. Participants can either draw or enter text into the medium. There are two parts: one is a dynamic area that is cleared at the start of each meeting; the other is a static area which contains time-stamped snapshots of the medium. The static area is persistent throughout a series of meetings, acting as a constant reminder and as visual minutes of previous meetings. Capturing important information in the static area helps preserve the state of the meeting at the moment each snapshot is taken. This allows all participants to follow the evolution of an idea. It also shows links between ideas and how a meeting played out in the end. This is significant with respect to traditional minutes, which only contain the final decisions and actions. This way, if snapshots are taken regularly, another person can reconstruct the progress of the meeting just by looking at the
snapshots. This feature is especially beneficial to any new participant in a series, as can be seen from our usability study.
Fig. 2. Screenshot of the medium during a study session
Another benefit of this approach is the persistence of important information. If a project spans a long time interval, some ideas and information will be forgotten and lost in the meeting minutes. It is a common practice to make a list of important ideas and go through them regularly. With this approach, every piece of important information can be marked dynamically and will be present in the static part of the medium in a reminding but unobtrusive manner. The last benefit of our approach concerns the dynamic creation of meeting minutes. In the traditional text-based approach, one participant must take notes and get approval from the involved parties. This also gives an implied authority over the minutes to the minute taker, who may take conscious or unconscious liberties with the minute content, leading to approval problems and lost time for a business. In our approach, all participants take part in note creation, and everybody can see the visual notes before leaving the room and approve them for distribution. This way, less communication and effort is spent on approving the meeting minutes.
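To make the dynamic/static-area mechanism above concrete, the following is a minimal sketch of one possible data model in Python. All names (Item, Snapshot, Medium) and the in-memory representation are illustrative assumptions, not taken from the authors' software.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Item:
    author: str          # participant identity (colour) is tracked per item
    kind: str            # "drawing" or "text"
    payload: str

@dataclass
class Snapshot:
    taken_at: datetime   # time stamp shown in the static area
    items: List[Item]

@dataclass
class Medium:
    dynamic: List[Item] = field(default_factory=list)
    static: List[Snapshot] = field(default_factory=list)  # persists across meetings

    def add(self, item: Item) -> None:
        self.dynamic.append(item)

    def snapshot(self) -> None:
        """Copy the current dynamic area into the persistent static area."""
        self.static.append(Snapshot(datetime.now(), list(self.dynamic)))

    def start_meeting(self) -> None:
        """New meeting in the series: clear the dynamic area, keep the static one."""
        self.dynamic.clear()
```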
3 Related Work Although they are very important and serve as a common practice for discussing things among people, meetings are currently not considered as adequate as they are expected to be. Estimates of meeting productivity by different types of managers range between 33% and 47% [11]. One of the main reasons for this inefficiency is information loss, i.e. the failure to record important information, decisions and actions and how this affects future actions [4]. Thus, capturing information during meetings becomes a crucial practice which has to be done efficiently to prevent loss. The conventional method for this is note taking, done manually either by a person responsible for it or by everybody individually. However, this is not an easy process, and studies show that people experience problems such as failing to note facts which turn out to be vital later, insufficient notes since there is not enough time to write everything, reduced ability to participate, and difficulties in paying close attention [5, 12, 13, 14]. While the predominant tools for note taking used today are pen, paper, whiteboards and laptops (for private note taking) [4], there is an interest in more powerful capture tools [3, 13, 15, 16, 17] that enhance the capturing process by making it automated, creating environments which can capture using more than one medium (i.e. audio, video, text) and which can link captures to each other contextually or by time. Systems that include all or part of these capturing abilities together are mainly called smart meeting systems. They can also be incorporated into shared workspaces [6] that improve the group interaction process and facilitate collaborative work within the group, providing an environment to share information among the members. Moreover, since face-to-face meetings are time consuming and there are distributed working groups in which people are not able to gather physically, these systems use the Internet to make the groups virtually collocated. Many smart systems have been developed recently by researchers [18, 19, 20, 21, 22, 23, 24, 25]. Yu et al. [10] proposed a three-layered generic architecture to model a smart meeting system. These layers are meeting capturing, meeting recognition and semantic processing. The first is the physical level that includes the capturing environment, devices and methods. The second is the structural level, responsible for low-level analysis of the recorded media content. The last handles high-level manipulations of the semantics, such as meeting annotation, indexing and browsing. This structure is based on the system requirements, which can be listed as multimodal sensing, multimodal recognition, semantic representation and an interactive user interface.
4 Usability Study The task was to discuss and debate a given problem. The groups were told to come to a conclusion after two sessions. Each meeting was limited to 25 minutes. We had 10 participants divided into two subgroups of 5. Both groups consisted of graduate students from different universities. We had two different tasks for the subgroups. Task 1 was to determine whether there should be a final exam or a term project for graduate courses. Task 2 was to
determine whether migration from old software to new software should take place or not. The groups were constructed such that two participants were in favor of the idea and the others were in opposition. The replacement member was selected from the opposition group, since opposing an idea requires more background knowledge about the topic. This way we were able to observe whether a newcomer with no prior knowledge is able to integrate into the debate and make contributions to it. One group was taken into a room where a classical whiteboard and a controller were present to help with the minute taking and compiling process. The controller did not take part in the debate, but tried to compile as much information as possible into meeting minutes. The other group was taken into a room, the same as described above, with the addition of a projector allowing participants to use the developed software to create minutes on the fly. The web-based software was operated by a controller based on directives from the experiment group. The experiment starts with 4 participants from a subgroup being taken into the test room and given Task 1. After 25 minutes of discussion and debate, the meeting adjourns. At the end of the meeting, the controller passes the minutes to all participants and gets approval before meeting 2. In parallel, the other subgroup is taken into another test room and given Task 2. After 25 minutes of discussion and dynamic note taking using the developed software, the meeting is adjourned. Since the notes are dynamically created and projected on the wall, all participants approved the generated notes quickly before leaving the room.
Fig. 3. Meeting setting and integration time for two different groups. After meeting 1, participant number 4 is replaced with participant number 5 in both groups. One of the aims was to measure how quickly participant 5 is integrated into the discussion. In group 1 (traditional approach) this time is 15 minutes; in group 2 (our approach) it is reduced to 2 minutes.
The second round of sessions was done the next day to simulate the time gap between meetings. In the second round, one participant from each subgroup was removed and another participant without prior knowledge of the task was added (Fig. 3). No communication between the replaced participant and the newcomer was allowed during the time gap between meetings. In one group the newcomer was given printed meeting minutes and spent 15 minutes at the beginning reading and grasping the discussion from the previous session. The other new participant was supplied with dynamically created minutes projected on the board using the developed software. It took only 2 minutes before he stated that he was ready to proceed.
5 Results and Discussion This data supports our two claims. In the traditional minute-taking approach, the minute taker (in this case the controller) has to pass the meeting minutes to each participant and get approval. Since participants do not see the minutes until they are written down, they treat them as new content and have to read them from the beginning. With the interactive whiteboard approach, minutes are created by all participants during the meeting and are visible throughout the meeting, so the approval process takes little time. Any newcomer to a meeting must spend some time reviewing previous meeting notes and processing the information. We call this time “integration time”. In the traditional minute-taking approach, a newcomer is presented with a stack of paper or e-mails to read before attending the meeting. In our usability study, the traditional method produced 3 A4 pages of minutes. Even with this small amount of minutes, the newcomer's integration time was 15 minutes. With the whiteboard approach, the newcomer is only presented with the time-stamped information in the static area. Integration took only 2 minutes before the newcomer said the meeting could begin. This shows a significant improvement over the traditional approach. Our study shows that dynamically created visual meeting notes are beneficial in the aspects mentioned above.
6 Future Work We used web-based software to test our approach. One major limitation of this software is the assumption that participants are present in the same room. The software only provides a digital medium where participants can describe ideas. In order to test this approach in a more complex environment, software that supports visual and audio communication is needed. This way, participants can be distributed to remote locations, a series of meetings can be conducted using the new software, and timings can be analyzed. Physical presence is an important aspect of communication, and testing our approach with telepresence would be an interesting challenge.
References 1. Mark, G., Grudin, J., Poltrock, S.E.: Meeting at the Desktop: An Empirical Study of Virtually Collocated Teams. In: Proceedings of ECSCW 1999 (1999) 2. Anson, R., Bostrom, R., Wynne, B.: An Experiment Assessing Group Support System and Facilitator Effects on Meeting Outcomes. Management Sci. 41(2), 189–208 (1995) 3. Chiu, P., Boreczky, J., Girgensohn, A., Kimber, D.: LiteMinutes: An Internet-Based System for Multimedia Meeting Minutes. In: Proceedings of the 10th WWW Conference, Hong Kong, pp. 140–149 (2001) 4. Whittaker, S., Laban, R., Tucker, S.: Analysing Meeting Records: An Ethnographic Study and Technological Implications. In: Renals, S., Bengio, S. (eds.) MLMI 2005. LNCS, vol. 3869, pp. 101–113. Springer, Heidelberg (2006) 5. Whittaker, S., Hyland, P., Wiley, M.: Filochat: Handwritten Notes Provide Access to Recorded Conversations. In: Proceedings of CHI 1994, Boston, MA (1994) 6. Richter, H., Abowd, G., Geyer, W., Fuchs, L., Daijavad, S., Poltrock, S.: Integrating Meeting Capture within a Collaborative Environment. In: Proc. Ubicomp 2001, ACM Conference on Ubiquitous Computing, Atlanta, GA, USA (2001) 7. Huber, G.P.: Issues in the Design of Group Decision Support Systems. MIS Quarterly 8(3), 195–205 (1984) 8. Davis, R.C., Landay, J.A., Chen, V., Huang, J., Lee, R.B., Li, F.C., Lin, J., Morrey III, C.B., Schleimer, B., Price, M.N., Schilit, B.N.: NotePals: Lightweight Note Sharing by the Group, for the Group. In: Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 1999), May 1999, pp. 338–345. ACM Press, New York (1999) 9. Yu, Z., Nakamura, Y.: Smart Meeting Systems: A Survey of State of the Art and Open Issues. ACM Computing Surveys 42(2) (2010) 10. Yu, Z., Ozeki, M., Fujii, Y., Nakamura, Y.: Towards Smart Meeting: Enabling Technologies and a Real World Application. In: Proc. ICMI 2007, pp. 86–93 (2007) 11. Green, W.A., Lazarus, H.: Are today's executives meeting with success? Journal of Management Development 1(10), 14–25 (1991) 12. Moran, T.P., Palen, L., Harrison, S., Chiu, P., Kimber, D., Minneman, S., Melle, W., Zellweger, P.: “I'll get that off the audio”: A case study of salvaging multimedia meeting records. In: Proceedings of CHI 1997, Atlanta, Georgia (1997) 13. Wilcox, L.D., Schilit, B.N., Sawhney, N.: Dynomite: A Dynamically Organized Ink and Audio Notebook. In: Proceedings of CHI 1997, pp. 186–193. ACM Press, NY (1997) 14. Davis, R.C., Landay, J.A., Chen, V., Huang, J., Lee, R.B., Li, F.C., Lin, J., Morrey, C.B., Schleimer, B., Price, M.N., Schilit, B.N.: NotePals: Lightweight Sharing by the Group, for the Group. In: Proceedings of CHI 1999, pp. 338–345 (1999) 15. Chiu, P., Foote, J., Girgensohn, A., Boreczky, J.: Automatically linking multimedia meeting documents by image matching. In: Proceedings of Hypertext 2000, pp. 244–245. ACM Press, New York (2000) 16. Chiu, P., Kapuskar, A., Reitmeier, S., Wilcox, L.: NoteLook: Taking notes in meetings with digital video and ink. In: Proceedings of ACM Multimedia 1999, pp. 149–158. ACM Press, New York (1999) 17. Cutler, R., Rui, Y., Gupta, A., Cadiz, J.J., Tashev, I., He, L., Colburn, A., Zhang, Z., Liu, Z., Silverberg, S.: Distributed Meetings: A Meeting Capture and Broadcasting System. In: Proc. of the 10th ACM International Conference on Multimedia, pp. 503–512 (2002) 18. Waibel, A., Bett, M., Finke, M.: Meeting browser: Tracking and summarizing meetings. In: Proceedings of the Broadcast News Transcription and Understanding Workshop, pp. 281–286 (1998)
19. Mikic, I., Huang, K., Trivedi, M.: Activity monitoring and summarization for an intelligent meeting room. In: Proceedings of the IEEE Workshop on Human Motion, pp. 107–112. IEEE, Los Alamitos (2000) 20. Chiu, P., Kapuskar, A., Reitmeier, S., Wilcox, L.: Room with a rear view: Meeting capture in a multimedia conference room. IEEE Multimedia 7(4), 48–54 (2000) 21. Rui, Y., Gupta, A., Cadiz, J.J.: Viewing meetings captured by an omnidirectional camera. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 450–457. ACM, New York (2001) 22. Lee, D., Erol, B., Graham, J., Hull, J.J., Murata, N.: Portable meeting recorder. In: Proceedings of the 10th ACM Conference on Multimedia, pp. 493–502. ACM, New York (2002) 23. Jain, R., Kim, P., Li, Z.: Experiential meeting systems. In: Proceedings of the ACM Workshop on Experiential TelePresence, pp. 1–12. ACM, New York (2003) 24. Stanford, V., Garofolo, J., Galibert, O., Michel, M., Laprun, C.: The NIST smart space and meeting room projects: Signals, acquisition, annotation and metrics. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 6–10. IEEE, Los Alamitos (2003) 25. Wellner, P., Flynn, M., Guillemot, M.: Browsing recorded meetings with Ferret. In: Proceedings of the First International Workshop on Machine Learning for Multimodal Interaction (MLMI), pp. 12–21 (2004)
Object and Scene Recognition Using Color Descriptors and Adaptive Color KLT
Volkan H. Bagci1, Mariofanna Milanova1, Roumen Kountchev2, Roumiana Kountcheva3, and Vladimir Todorov3
1 Computer Science Department, UALR, 2801 S. University Ave., Little Rock, Arkansas 72204, USA
2 Department of Radio Communications, Technical University of Sofia, Bul. Kl. Ohridsky 8, Sofia 1000, Bulgaria
3 T&K Engineering, Mladost 3, Sofia 1712, Pob. 12, Bulgaria
{Vhbagci,mgmilanova}@ualr.edu, [email protected], {todorov_vl,kountcheva_r}@yahoo.com
Abstract. With the emergence and explosion of huge image databases, there is an increasing need for effective methods to assess visual information at the level of objects and scene types. A wide variety of Content-Based Image Retrieval (CBIR) systems already exists. As a key issue in CBIR, the similarity measure quantifies the resemblance in content between a pair of images. Depending on the type of features, the formulation of the similarity measure varies greatly. The primary goal of our study is to reduce computation time and user interaction. The secondary goal is to reduce the semantic gap between high-level concepts and low-level features. A third goal is to evaluate system performance with regard to speed and accuracy. In the proposed study, the color representation obtained after a statistical transform, the Adaptive Color Karhunen-Loeve Transform (ACKLT), is used as a color descriptor. The results show the advantage of the new ACKLT algorithm in comparison with the YCrCb color model. Based on the experimental results, we conclude that correct selection of descriptors invariant to light intensity and light color changes affects object and scene category recognition. Keywords: content-based image retrieval, Adaptive Color Karhunen-Loeve Transform.
1 Introduction All current Content-Based Image Retrieval (CBIR) systems retrieve stored images from a collection by comparing features automatically extracted from the images themselves. The system identifies those stored images whose feature values match those of the query most closely and displays the result on the screen. A CBIR system uses machine learning techniques and image descriptions to distinguish objects and scene categories. For real-world scenes, there can be large variations in viewing and lighting conditions, and this complicates the image descriptions. Van de Sande et al. [1] discussed the invariance properties and the distinctiveness of color descriptors.
There are two basic types of color spaces: deterministic and statistical. The deterministic transformations, such as YCrCb, YUV, YIQ and CMYK [2, 3, 4], are calculated using fixed coefficients and require fewer computations, but their disadvantage is that they are not adapted to the individual image being processed. In the statistical transforms, such as the Adaptive Color Karhunen-Loeve Transform (KLT) [2], the generated color space is adapted to the statistical properties of each image or group of images being transformed; the disadvantage is that this requires more computation than the deterministic color systems. In return, it gives better quality of the restored image, less correlation between the components, etc. [5, 6]. In [9], a method for color segmentation of human faces uses a KLT approximation with a matrix of fixed coefficients; in this case the KLT has relatively low computational complexity, but it is not fully adapted to the local statistics of the face colors. In the proposed system, texture features and color features are used for computing the similarity between query and database images, and the invariance properties of the color descriptors are explored. In statistical transforms (in our case, the Adaptive KLT), the components are adapted to each image being transformed; therefore, the transform color descriptor is adapted to the statistical information of the image. The results show the advantages of the new algorithm for the Adaptive Color KLT in comparison with the YCrCb color model. The most important feature of the Adaptive Color KLT is that it ensures strong decorrelation of the components. Based on the experimental results, the conclusion is that correct selection of descriptors invariant to light intensity changes and light color changes affects object and scene category recognition. Fig. 1 shows the module block diagram of the proposed system. The new approach includes, first, the extraction of color descriptors based on the Adaptive Color Karhunen-Loeve Transform (KLT); second, matching and classification are implemented. The system calculates the similarity distance between the query image and the images stored in the database. As output, the system reports the accuracy of object recognition and displays the sorted images in ascending order of similarity distance. This paper is organized as follows: in Section 2 the algorithm for the Adaptive Color KLT is presented; in Section 3 the recognition module is given; the experimental setup is presented in Section 4; finally, in Section 5, conclusions are drawn.
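For contrast with the adaptive transform presented in Section 2, a deterministic transform applies the same fixed matrix to every image. The sketch below uses the ITU-R BT.601 YCbCr coefficients; it is our illustrative code, not part of the authors' system.

```python
import numpy as np

# Fixed coefficients: identical for every image, unlike the adaptive KLT,
# which recomputes its transform matrix from each image's statistics.
RGB_TO_YCBCR = np.array([
    [ 0.299,     0.587,     0.114    ],   # Y
    [-0.168736, -0.331264,  0.5      ],   # Cb
    [ 0.5,      -0.418688, -0.081312 ]])  # Cr

def rgb_to_ycbcr(img: np.ndarray) -> np.ndarray:
    """img: H x W x 3 float RGB in [0, 255] -> YCbCr with the usual chroma offset."""
    out = img @ RGB_TO_YCBCR.T
    out[..., 1:] += 128.0    # centre the chroma components
    return out
```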
[Fig. 1 block diagram: the query image and the image repository each pass through color feature extraction using KLT; the resulting features feed a similarity computation using quadratic distance, followed by sorting and display of the results.]
Fig. 1. Color Feature Extraction Block Diagram
2 Color Transformation Based on KLT The proposed algorithm is a complete analytical solution to the problem of the color transform based on the KLT. It is based on the method presented in [2]. The algorithm is simplified so as to reduce the computations necessary for the color transform. Transforming an RGB image into the new color format follows the algorithm presented in Fig. 2, blocks (1)-(10), which is the forward algorithm for the Adaptive Color KLT:
Step 1: Determination of the primary color vectors $\vec{C}_s = [R_s, G_s, B_s]^t$ for each pixel of the original RGB image, where s is the current pixel and S the total number of pixels in the image; therefore $S = M \times N$, where M and N are the image height and width.
Step 2: Calculation of the mean values of the colors R, G and B (Fig. 2, block (4)). The mean values are necessary for the computation of the covariance matrix in the next step.
Step 3: Calculation of the image covariance matrix:

$$[K_C] = \left[\frac{1}{S}\sum_{s=1}^{S}\vec{C}_s\vec{C}_s^{\,t}\right] - \vec{m}_c\vec{m}_c^{\,t} = \begin{bmatrix} k_{11} & k_{12} & k_{13} \\ k_{21} & k_{22} & k_{23} \\ k_{31} & k_{32} & k_{33} \end{bmatrix} \qquad (1)$$

where the coefficients $k_{ij}$ are calculated using Fig. 2, block (4). The covariance matrix is symmetric, so its eigenvalues are always real numbers.
Step 4: Calculation of the coefficients (a, b, c) of the characteristic equation of the covariance matrix,

$$\det\left|k_{ij} - \lambda\delta_{ij}\right| = \lambda^3 + a\lambda^2 + b\lambda + c = 0 \qquad (2)$$

using the equations of Fig. 2, block (5).
Step 5: Calculation of the eigenvalues of the characteristic equation defined in the previous step. Since $[K_C]$ is symmetric, the eigenvalues can be obtained from the "Cardano" relations, i.e. the trigonometric equations [10] (Fig. 2, blocks (6) and (7)), under the condition $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge 0$.
Step 6: Calculation of the eigenvectors of the covariance matrix $[K_C]$ (Fig. 2, block (8)). Here $A_m, B_m, D_m, P_m$ are coefficients used in the computations, for $m = 1, 2, 3$. From the eigenvectors we form the transformation matrix $[\Phi]$:

$$[\Phi] = \begin{bmatrix} \vec{\Phi}_1^{\,t} \\ \vec{\Phi}_2^{\,t} \\ \vec{\Phi}_3^{\,t} \end{bmatrix} = \begin{bmatrix} \Phi_{11} & \Phi_{12} & \Phi_{13} \\ \Phi_{21} & \Phi_{22} & \Phi_{23} \\ \Phi_{31} & \Phi_{32} & \Phi_{33} \end{bmatrix} \qquad (3)$$
The blocks of Fig. 2 compute, in order:
(1) the input matrices [R], [G], [B] of size $M \times N = S$;
(2) $\vec{C}_s = [R_s, G_s, B_s]^t$ for $s = 1, 2, \ldots, S$;
(3) the means $\bar{R} = \frac{1}{S}\sum_{s=1}^{S} R_s$, $\bar{G} = \frac{1}{S}\sum_{s=1}^{S} G_s$, $\bar{B} = \frac{1}{S}\sum_{s=1}^{S} B_s$;
(4) the covariance coefficients $k_1 = \frac{1}{S}\sum_s (R_s-\bar{R})^2$, $k_2 = \frac{1}{S}\sum_s (G_s-\bar{G})^2$, $k_3 = \frac{1}{S}\sum_s (B_s-\bar{B})^2$, $k_4 = \frac{1}{S}\sum_s (R_s-\bar{R})(G_s-\bar{G})$, $k_5 = \frac{1}{S}\sum_s (R_s-\bar{R})(B_s-\bar{B})$, $k_6 = \frac{1}{S}\sum_s (G_s-\bar{G})(B_s-\bar{B})$;
(5) $a = -k_1 - k_2 - k_3$; $b = k_1k_2 + k_1k_3 + k_2k_3 - k_4^2 - k_5^2 - k_6^2$; $c = k_1k_6^2 + k_2k_5^2 + k_3k_4^2 - k_1k_2k_3 - 2k_4k_5k_6$;
(6) $p = -a^2/3 + b < 0$, $q = 2(a/3)^3 - ab/3 + c$, $\varphi = \arccos\!\left[-q \Big/ \left(2\sqrt{(-p/3)^3}\right)\right]$;
(7) $\lambda_1 = 2\sqrt{-p/3}\,\cos(\varphi/3) - a/3$; $\lambda_2 = -2\sqrt{-p/3}\,\cos\!\left(\frac{\varphi-\pi}{3}\right) - a/3$; $\lambda_3 = -2\sqrt{-p/3}\,\cos\!\left(\frac{\varphi+\pi}{3}\right) - a/3$;
(8) $A_m = (k_3-\lambda_m)[k_5(k_2-\lambda_m) - k_4k_6]$, $B_m = (k_3-\lambda_m)[k_6(k_1-\lambda_m) - k_4k_5]$, $D_m = k_6[2k_4k_5 - k_6(k_1-\lambda_m)] - k_5^2(k_2-\lambda_m)$, $P_m = \sqrt{A_m^2 + B_m^2 + D_m^2}$; $\Phi_{m1} = A_m/P_m$, $\Phi_{m2} = B_m/P_m$, $\Phi_{m3} = D_m/P_m$ for $m = 1, 2, 3$;
(9) $[L_{1s}, L_{2s}, L_{3s}]^t = [\Phi]\,[R_s, G_s, B_s]^t$ for $s = 1, 2, \ldots, S$;
(10) the output vectors $\vec{L}_s = [L_{1s}, L_{2s}, L_{3s}]^t$ for $s = 1, 2, \ldots, S$.
Fig. 2. Block diagram of the algorithm for Adaptive Color KLT
Step 7: Performing the color transform using the already generated transformation matrix $[\Phi]$ to obtain the transformed color vectors $\vec{L}_s = [L_{1s}, L_{2s}, L_{3s}]^t$ using the equation of Fig. 2, block (9), where again s is the current pixel being transformed and S is the total number of pixels in the image.
Step 8: Performing an adaptive quantization of the obtained matrix $[L_1]$ to comply with the limit of 8 bits per pixel, i.e. 256 unique values in the matrix, using the equations of Fig. 2, block (9). Here $h_{L_1}(t)$ is the histogram calculated for the first component of the Adaptive Color KLT, and $t_{ck}$ is the center of gravity of the part of the histogram $h_{L_1}(t)$ between levels $t_k$ and $t_{k+1}$ of the component $L_1$ ($k = 1, 2, \ldots, K$, for K quantization levels). After adaptive quantization we have three matrices $[\hat{L}_1], [\hat{L}_2], [\hat{L}_3]$ that comply with the limit of 8 bits per pixel, i.e. 24 bpp in total for each pixel in the image.
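As an illustration of Steps 1-7, the forward transform can be sketched in a few lines of numpy. This sketch uses a generic symmetric eigensolver in place of the explicit Cardano/trigonometric formulas of Fig. 2, blocks (5)-(8), and omits the adaptive quantization of Step 8, so it approximates the described algorithm rather than reproducing the authors' implementation.

```python
import numpy as np

def acklt_forward(img):
    """img: M x N x 3 RGB array -> (L, Phi, mean), where L holds the M x N x 3
    transformed components ordered by decreasing eigenvalue."""
    C = img.reshape(-1, 3).astype(np.float64)          # one colour vector per pixel
    mean = C.mean(axis=0)                              # Step 2: mean colour
    K = (C.T @ C) / C.shape[0] - np.outer(mean, mean)  # Step 3: covariance, eq. (1)
    lam, vecs = np.linalg.eigh(K)                      # Steps 4-6: real eigenpairs
    order = np.argsort(lam)[::-1]                      # enforce lambda1 >= lambda2 >= lambda3
    Phi = vecs[:, order].T                             # rows are eigenvectors, eq. (3)
    L = C @ Phi.T                                      # Step 7: L_s = Phi * C_s per pixel
    return L.reshape(img.shape), Phi, mean
```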
3 Object and Scene Recognition The proposed system comprises two basic steps of image processing, described below. 3.1 Feature Extraction
The color feature vectors of the query image and of the database images are computed. These features are then compared to find the smallest distance or the largest similarity. In this work, the direct ACKLT is applied to the color vectors $\vec{C}_s = [R_s, G_s, B_s]^t$ for $s = 1, 2, \ldots, T$ using the equation of Fig. 2, block (9). As a result, they are transformed into vectors $\vec{L}_s = [L_{1s}, L_{2s}, L_{3s}]^t$, which are then normalized with respect to their modules. 3.2 Image Classification
The relevant images are sorted by dissimilarity and displayed in ascending order: if distance measures are used, they are displayed from shorter to longer distances; if a similarity measure is used, they are displayed from higher to lower similarity. The main aim of classification is to identify the characteristics that indicate the group to which each case belongs. It can also be used to understand the existing data and to predict how new instances will behave. For example, it can be used to predict whether individuals can be classified as likely to respond to a direct mail
solicitation, vulnerable to switching to a competing long-distance phone service, or a good candidate for a surgical procedure. Classification models are created by examining already classified data (cases) and inductively finding a predictive pattern. The existing cases may come from a historical database, such as images that belong to a particular class. They may also come from an experiment in which a sample of the entire database is tested in the real world and the results are used to create a classifier. For example, a sample of a mailing list would be sent an offer, and the results of the mailing used to develop a classification model to be applied to the entire database. Sometimes an expert classifies a sample of the database, and this classification is then used to create the model which is applied to the entire database. In this study, the well-known Nearest-Neighbor algorithm is implemented. In this algorithm, a given image is compared to every image in the database using a distance measure. The "unknown" input image is classified as belonging to the class to which its distance is shortest. A similarity measure can also be used; in that case, the largest similarity value is considered. To compare the distance between two manifolds we use a distance measure that is a variant of the Hausdorff metric. The proposed distance measure can handle changes in duration and is invariant to temporal shifts. Given two manifolds from two images, $A = [a_1, a_2, \ldots, a_n]$ and $B = [b_1, b_2, \ldots, b_m]$, we define

$$d(A, B) = \frac{1}{n}\sum_{j=1}^{n}\min_{b_i}\left\|\frac{a_j}{\|a\|} - \frac{b_i}{\|b\|}\right\| \qquad (4)$$
To ensure symmetry, the following distance measure, presented in [8], is used:

$$D(A, B) = d(A, B) + d(B, A) \qquad (5)$$
For the final classification, the Nearest Neighbor (NN) classifier is adapted. Let T represent a test image and $R_i$ the reference image of class i. The test image is classified into the class c that minimizes the similarity distance between the test image and the reference image,

$$c = \arg\min_i D(T, R_i) \qquad (6)$$

where D is the similarity measure described in (5).
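A direct reading of eqs. (4)-(6) can be sketched as follows, assuming A and B are module-normalized feature manifolds stored as n x d and m x d arrays. The exact form of eq. (4) used here (Euclidean nearest-neighbour term over normalized points) is our reading and should be treated as an assumption.

```python
import numpy as np

def d_asym(A: np.ndarray, B: np.ndarray) -> float:
    """Asymmetric distance, eq. (4): mean over a_j of the nearest b_i."""
    diffs = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # n x m pairwise norms
    return diffs.min(axis=1).mean()

def D_sym(A: np.ndarray, B: np.ndarray) -> float:
    """Symmetric similarity distance, eq. (5)."""
    return d_asym(A, B) + d_asym(B, A)

def classify(T: np.ndarray, refs: dict) -> str:
    """Nearest-neighbour rule, eq. (6): pick the class minimizing D(T, R_i)."""
    return min(refs, key=lambda c: D_sym(T, refs[c]))
```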
4 Experiments and Results We used the Wide-Baseline Stereo Full Color collection of the Amsterdam Library of Object Images (ALOI) dataset [7] for the experiments. This collection includes 2250
images and 750 objects. We use 30 different objects under various illumination conditions. Different lighting conditions are present in ALOI, such as objects lit by different numbers of white lights, object rotation images, and images with different levels of JPEG compression.
[Fig. 3 presents the confusion matrix: rows list the true object types 0-9 (codes 257, 284, 337, 406, 503, 602, 706, 836, 909, 979) and X; columns list the recognized types 0-9, X, and the row totals. Each class contributes 3 test images (30 in total); the entries are concentrated on the diagonal, with 4 misclassifications off the diagonal.]
Fig. 3. Confusion Matrix
Fig. 3 shows the confusion matrix. The error rate is R = E / S = 4 / 30 = 0.13, where E is the number of wrong classifications and S is the total number of objects. The accuracy is therefore A = 1 - R = 1 - 0.13 = 0.87 (87%). The confusion matrix uses test data including the 3 closest images for each test image, without excluding the test image itself from the results. According to the test results, the system performs well on this database. When the system misses an image, we observed that the missed image is close to the location of the correct neighbor among the other 2200+ neighbors. In the future we aim to test the accuracy of the method under different lighting conditions, different temperatures and wide rotation angles. Another test would use multiple levels of image resolution to understand the effect of resolution on accuracy.
Fig. 4 shows randomly chosen test images.
Fig. 4. Randomly chosen test images
5 Conclusions There are two types of color transforms: deterministic (RGB, YCrCb, YUV, YIQ, CMYK) and statistical (ACT – the Adaptive Color KLT Transform). The deterministic transforms are defined by fixed equations with fixed coefficients that do not change from image to image. In the statistical transforms (ACT), the components are adapted to each image being transformed; the transform matrix generated by the algorithm is therefore adapted to the statistical information of the image being transformed. The new approach permits reliable object detection and identification in various positions, lighting conditions and viewpoints. Acknowledgments. This paper was supported by the System Research and Application (SRA) Contract No. 0619069. This work was also supported in part by the Joint Research Project Bulgaria-Romania (2010-2012): “Electronic Health Records for the
Next Generation Medical Decision Support in Romanian and Bulgarian National Healthcare Systems”.
References 1. van de Sande, K.E.A., Gevers, T., Snoek, C.G.M.: Evaluating Color Descriptors for Object and Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9), 1582–1596 (2010) 2. Kountchev, R., Kountcheva, R.: New Method for Adaptive Karhunen-Loeve Color Transform. In: Proc. of 9th Intern. Conf. on Telecommunications in Modern Satellite, Cable and Broadcasting Services (TELSIKS 2009), Nish, Serbia, October 7–9, pp. 209–216 (2009) 3. Ivanov, P., Kountchev, R.: Comparative analysis of Adaptive Color KLT and YCrCb for representation of color images. In: Proc. of ICEST (2010) 4. Pratt, W.: Digital Image Processing. Wiley Interscience, New York (2007) 5. Fleury, M., Downton, A., Clark, A.: Karhunen–Loeve Transform – Image Processing. University of Essex, Wivenhoe Park (1997) 6. Dony, R.: Karhunen-Loève Transform. In: Rao, K., Yip, P. (eds.) The Transform and Data Compression Handbook. CRC Press, Boca Raton (2001) 7. Geusebroek, M., Burghouts, G., Smeulders, A.: The Amsterdam library of object images. Int. Journal of Computer Vision 61(1), 103–112 (2005), http://staff.science.uva.nl/~aloi/ 8. Masoud, O., Papanikolopoulos, N.: A method for human action recognition. Image and Vision Computing 21(8), 729–743 (2001) 9. Ionita, M., Corcoran, P.: Benefits of using decorrelated color information for face segmentation/tracking. Advances in Optical Technologies. Hindawi Publishing Corporation, ID 583687 (2008)
What Maps and What Displays for Remote Situation Awareness and ROV Localization?
Ryad Chellali1 and Khelifa Baizid1,2
1 Italian Institute of Technology, Via Morego, 30, Genova 16163, Italy
2 University of Genova, Genova, Italy
[email protected], [email protected]
Abstract. When exploring environments remotely, knowledge of the teleoperated vehicle's location is a key element of operators' situational awareness. Given the visual information provided by the ROV, we aimed at finding the best combination of the maps used by operators, the visual feedback provided by the ROV, and the displays used to show this information. In our system, teleoperators use a 2D top-view or a 3D immersive representation of the remote world as maps. From the remote site, they receive a live video stream provided by a remotely controlled pan-tilt camera. Maps and video streams are displayed on PC screens or HMDs. We present and discuss the results of the performed experiments. As expected, 3D maps give more accurate estimation but are time consuming. On the other hand, we found that the use of simple PC screens leads to better results than HMDs. Keywords: Degrees of Autonomy and Teleoperation, Human Factors and Ergonomics, Motion Planning and Navigation in Human-Centered Environments.
1 Introduction To interact with objects in our peri-personal space, our visual system gives a 3D representation of this space, our tactile and/or haptic systems inform us about contacts with objects and their weights, our proprioception produces the body schema allowing us to coordinate arm and hand movements, etc. To perform the same tasks within a remote environment, one uses ROVs: the operator controls remote robots and relies on remotely sensed data to explore, handle or push objects, for instance. These data are supposed to make operators aware of the remote world and let them build a mental representation of it. Unfortunately, these feedbacks are inherently distorted and incomplete, leading to poor and partial descriptions. Consequently, operators make additional mental efforts to compensate for the distortions and to recreate the missing parts. Fatigue on one side and weak situational awareness on the other are the two main shortcomings tele-operation systems suffer from. In general, to take any motor-based decision, operators use only limited information (a monocular, reduced-field-of-view video stream is typical) and complete it by inferring the missing parts. One key issue for such tasks is the ROV
localisation, i.e. the operator's estimate of the position and orientation of the remote robot. Indeed, for vital tasks like navigation, exploration, transport, search operations, etc., knowledge of the remote robot's location is mandatory. This is usually achieved through self-localisation techniques (SLAM, for instance, or landmark positioning). Unfortunately, these techniques fail when the remote environment is subject to changes. In this work, we investigate the localisation of remote robots performed by naïve operators. Namely, we wanted to see what type of information is suitable and how this information may be conveyed to operators to let them determine as accurately as possible the position and orientation of remotely controlled robots. 3D and 2D maps were used as representations of the explored remote environment. A video stream captured by an on-board camera and displayed respectively on a PC screen and on an HMD (head-mounted display) was the only feedback used. In the next section, we review related work. Section 3 describes the experimental setup we built and the method we followed to perform the experiments. Section 4 presents the results of these experiments, and we conclude in the last section.
2 Related Works In direct self-localisation, humans match surrounding visual features with those present on a map or any synthetic representation of the space. They also use their past movement history to reduce the search space. In tele-operation, visual feedback is not available directly, nor are other proprioceptive cues (such as kinaesthetic information). For vision, the feedback is captured by a remote camera and displayed to operators. Given this, a first question arises: how do operators perform the localization task? Do they perform the task as if they were there? In other words, what spatial abilities and cognitive functions does one deploy to achieve spatial mapping (for instance self-orientation, object localization, etc.)? A more specific issue concerns the effects of using indirect or mediated information. The central problem of tele-operation is to design tools and develop concepts that simplify the operators' mental effort and let them feel present within the remote world, i.e., able to achieve naturally physical and distant interactions such as goal-directed or exploratory navigation, moving objects, placing sensors, etc. [1], [2], [3], as if they were within the remote environment. Among the senses (kinaesthetic, auditory, haptic, etc.), the visual channel, and thus spatial information, is affected by distortions. In [4] and [5], for instance, the authors demonstrated that operators have difficulties localizing the remote video camera and keeping a cap (maintaining a constant orientation) with visual feedback alone. Moreover, they also showed that even if this feedback has good quality (e.g. high bandwidth and high resolution), the malfunctions remain. In [6] and [7], the authors vary the conditions under which the visual information is displayed. They show that orientation performance, for instance, is a function of the display method: [8] showed that the best performances are reached for direct viewing, while the worst are obtained when subjects wear head-mounted displays. Moreover, imagining the robot's location (position and orientation) was the first thing
subjects attended to in order to achieve a given mission in several research works, such as [24]. More precisely, studies have discussed the effect of changing the point of view of the environment on user perception [9]; the provided views were based on a virtual 3D environment. Tele-operation usually requires a reliable localization process to command the remote robot. Several studies have considered interface features as an important factor in tele-operation: information resources [10], [11], user workload [12] and information feedback quantity [13], [14] have been discussed. The nature of the interface, such as 2D versus 3D maps, has also been investigated [2]. Results for 2D and 3D maps coupled with a video stream were compared with a partially abstracted virtual representation view [2] and with a full abstraction of the remote environment [8]. A number of aids for the tele-operation process became available through Virtual Reality (VR) technology [15]. On the implementation side, several research works have used VR as an information feedback resource for robot visualization and simulation [16], [17], [18]. From another point of view, VR has been considered as an interaction interface which allows the user to send commands to the real-world situation [19]. The virtual abstraction concept has been used in many research areas to represent the current state of the real robot's environment in a virtual environment [16], [20]. Moreover, VR has been involved in Human-Robot Interaction (HRI) as a Collaborative Virtual Environment (CVE), which allows multiple users to interact together with multiple robots [15]. Our contribution concerns the comparison of several self-localization methods. Our aim was to study the effects of the most commonly used technologies on the performance of remote robot localization.
3 Experimental Setup Our experiments were performed on the ViRAT platform developed in our lab [16]. This platform allows tele-operators to control a set of mobile robots (both wheeled and legged) through a collaborative, mixed reality-based environment. For this experiment, only one operator is acting, and only the video stream provided by the embedded camera is considered; the camera is controlled in pan and tilt. The experiment took place in our laboratory, and the test area is an office-like space (see Fig. 1) of 12 m × 10 m. Twenty participants performed the experiments. The selected subjects were from different IIT laboratories (engineers, technicians, PhD students and post-docs). The age of the subjects ranged from 23 to 45 years. The male-female split was 56%–44%. 3.1 The Visual Information Participants visualize a live video stream acquired by the remote camera (640×480 pixels) and control its pan and tilt in two modes: with a joystick or by head movements, respectively, when the stream is displayed on a screen (DELL M1730, 1920×1200) or on a head-mounted display (VUZIX VR920, 640×480 pixels).
3.2 Maps To enable users to indicate the position and orientation of the remote robot, a top-view 2D map and a 3D map are used. Notice that the 3D map is in fact an interactive 3D environment that can be explored with a joystick. Both maps are exact replicas of the test area except for mobile or moving objects: chairs and persons, for instance, were not represented (Fig. 1).
Fig. 1. 3D and 2D maps used for robot localization task
3.3 The Localization Task The experimental task consists in finding the location of the remote robot in minimal time. Before starting, each participant was informed about the goal of the experiment. They performed some training sessions in order to become familiar with the interface tools. Once the participants felt confident, they performed 20 trials:
1. 5 using the HMD and 2D maps,
2. 5 using the screen and 2D maps,
3. 5 using the HMD and 3D maps,
4. 5 using the screen and 3D maps.
To switch between the observation and decision phases, subjects ask verbally for the live video stream or the working map to be displayed. This verbal control spares subjects from handling an additional motor-based input tool. The visual information is displayed on a computer screen or on an HMD. Pointing to indicate the location of the remote robot is done with a joystick in the 3D map condition and with a mouse in the 2D one. The two conditions (Display and Map) are randomly chosen among the 4 possibilities in order to avoid any bias due to learning. When the subject is ready, the localization process starts, and the subject decides when the task ends: for 2D maps, by giving a first point representing the position and a second one to derive the orientation; for 3D maps, by pointing a first point (position) and fixing the horizontal line of sight of the camera (orientation). Time, position and orientation are then registered as the result of the trial.
4 Results and Analysis We organize the results into three parts:
1. influence of the reference map,
2. influence of the feedback,
3. influence of the display technology.
4.1 Comparing 2D and 3D Map Based Localization Results for 3D and 2D map based localization are presented in Figs. 2-5. The analysis shows that errors when using 3D maps are around 1 m, while for 2D maps errors are around 1.5 m (Fig. 2). On the other hand, and as expected, subjects took twice as much time to achieve the task in the 3D case (Fig. 3). One can therefore suggest that the additional dimension provided by 3D maps leads to a greater space to explore and thus takes more time. Likewise, the additional dimension is an additional degree of freedom that helps add constraints and information to disambiguate the robot location. Another possible explanation concerns habits: people are more used to 2D maps and rely more on top views than on navigating 3D environments. The analysis of these results gives us interesting hints and trends. Indeed, there is no significant difference between subjects concerning position errors on the 2D map (p > 0.052). On the contrary, for 3D there are differences (p < 0.0015) in execution times: subjects do not behave the same way, suggesting the existence of different skills in experiencing 3D environments. Regarding what subjects were asked to achieve, one has to take into account time and position errors simultaneously. Indeed, subjects were asked to perform a good localization as quickly as possible, so the evaluation must combine both parameters, time and errors. Conceptually, this is equivalent to estimating the effort developed by subjects to achieve the task. In other words, a metric t·Δp, built by multiplying the execution time by the error made by the subject, is more suitable and more representative of the effort spent to achieve the task: a good performer is the one with the smallest error obtained in minimal time; conversely, a bad performer takes a lot of time to provide a bad position-orientation estimate.
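As a small worked illustration of the t·Δp metric, consider the sketch below; the trial values are hypothetical, not the study's data.

```python
def effort(time_s: float, err_m: float) -> float:
    """Combined effort metric t * delta_p: lower is better (small error, quickly)."""
    return time_s * err_m

trials_2d = [(30.0, 1.5), (28.0, 1.6)]   # hypothetical (time, position error) pairs
trials_3d = [(60.0, 1.0), (55.0, 1.1)]   # hypothetical 3D-map trials

mean_2d = sum(effort(t, e) for t, e in trials_2d) / len(trials_2d)
mean_3d = sum(effort(t, e) for t, e in trials_3d) / len(trials_3d)
print(f"mean effort 2D: {mean_2d:.1f}, 3D: {mean_3d:.1f}")
# With these illustrative numbers, 3D requires more effort despite lower error,
# matching the trade-off discussed in the text.
```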
Fig. 2. Comparison between 2D and 3D maps with respect to positioning task errors
Fig. 3. Comparison between 2D and 3D maps with respect to task time
As expected, people made a larger effort in the 3D map conditions, both for positioning and orienting. Likewise, behavioural differences exist for 3D maps, while with 2D maps people perform similarly. 4.2 Information Feedback We tested the effects of the field of view in order to see how subjects deploy motor activity in gathering visual information (Figs. 4 and 5). Namely, subjects performed the localization task using a video stream and a panoramic interactive view, covering 36° and 180° of the FOV respectively. We found that subjects are more accurate with the larger FOV. This confirms the geometrical intuition about triangulation: the larger the angle between two lines, the better the accuracy of estimating their intersection. On the other hand, people spent more or less the same time with both views.
Fig. 4. Comparison between video stream and panoramic views w.r.t. position error: (a) 2D map interaction, (b) 3D map interaction
Fig. 5. Comparison between video stream and panoramic views w.r.t. orientation error: (a) 2D map interaction, (b) 3D map interaction
4.3 Influence of the Display Technology In this part, we present the results obtained for two display technologies (HMD and PC screen). The results show that the HMD, in several cases, deteriorated the subjects' performance compared to the PC screen (Fig. 6). These results were obtained when subjects interacted with both the video stream and panoramic view feedbacks, for both 2D and 3D maps. We found that the use of the HMD has a negative effect and increases position errors. Considering execution times, we have the opposite effect: one can observe that people achieve the task faster with the HMD. This suggests that people integrate visual information more easily when it is correlated with head movement than when visual navigation is done through hands and joystick.
Fig. 6. Comparison between PC-screen and HMD interaction with video stream w.r.t. position error: (a) 2D map interaction, (b) 3D map interaction
5 Conclusion and Future Work In this work we presented a study concerning the factors that influence remote localization tasks. Subjects estimated the position and orientation of a remote robot as fast as possible. The variations concerned the reference map (a 3D and a 2D map), the remote camera's field of view and the tools to control it. Results show that 3D maps are more effective even if they require more interaction time. On the other hand, a wider view leads to better results. Finally, the usefulness of HMDs versus PC screens seems to depend on which of the two map techniques (2D or 3D) is used. In this study, only static conditions were considered: subjects did not perform any robot control action, and this limits the localization capabilities. In real tele-operation conditions, users can use the robot's mobility as new degrees of freedom to find the solution. The effect of enlarging the field of view suggests this, and our next steps will focus on this aspect. Another issue we will tackle is multi-robot systems. For such systems, the complexity is higher but, on the other hand, one has more information (more cameras) to rely on to find individual localizations.
References 1. Halme, A., Suomela, J., Savela, M.: Applying telepresence and augmented reality to teleoperate field robots. Robotics and Autonomous Systems 26, 117–125 (1998) 2. Nielsen, C.W., Goodrich, M.A., Ricks, R.W.: Ecological Interfaces for Improving Mobile Robot Teleoperation. IEEE Transactions on Robotics 23, 927–941 (2007) 3. Darken, R.P., Kempster, K., Peterson, B.: Effects of streaming video quality of service on spatial comprehension in a reconnaissance task. In: Proc. Interservice/Industry Training Simulation Education Conf., Orlando, FL (2001) 4. Casper, J., Murphy, R.R.: Human–robot interactions during the robot-assisted urban search and rescue response at the World Trade Center. IEEE Transactions on Systems, Man, and Cybernetics 33, 367–385 (2003) 5. Burke, J.L., Murphy, R.R., Covert, M.D., Riddle, D.L.: Moonlight in Miami: A field study of human–robot interaction in the context of an urban search and rescue disaster response training exercise. Hum.–Comput. Interact. 19, 85–116 (2004) 6. Alfano, P.L., Michel, G.F.: Restricting the field of view: Perceptual and performance effects. Percept. Mot. Skills 70(1), 35–45 (1990) 7. Arthur, K.: Effects of field of view on performance with head-mounted displays. Ph.D. dissertation, Dept. Comput. Sci., Univ. North Carolina, Chapel Hill (2000) 8. Klatzky, R.L., Loomis, J.M., Beall, A.C., Chance, S.S., Golledge, R.G.: Spatial updating of self-position and orientation during real, imagined, and virtual locomotion. Psychological Science, 293–298 (September 1998) 9. Schmidt, D., et al.: Visuospatial working memory and changes of the point of view in 3D space, pp. 955–968. Elsevier, Amsterdam (2007) 10. Zaatri, A., Oussalah, M.: Integration and design of multi-modal interfaces for supervisory control systems. Information Fusion 4, 135–150 (2003) 11. Villella, P., Avizzano, C.A., Bergamasco, M.: An overview on portable Human Machine Interfaces (HMI) for teleoperation control of robotic swarms. In: AVT-SCI Joint Symposium on Platform Innovations and System Integration for Unmanned Air, Land and Sea Vehicles (May 2007) 12. Squire, P., Trafton, G., Parasuraman, R.: Human control of multiple unmanned vehicles: effects of interface type on execution and task switching times. In: Proceedings of the Conference on Human-Robot Interaction, pp. 26–32. ACM, New York (2006) 13. Nguyen, L.A., Bualat, M., Edwards, L.J., Flueckiger, L., Neveu, C., Schwehr, K., Wagner, M.D., Zbinden, E.: Virtual reality interfaces for visualization and control of remote vehicles. Auton. Robots 11(1), 59–68 (2001) 14. Yanco, H.A., Drury, J.L., Scholtz, J.: Beyond usability evaluation: Analysis of human–robot interaction at a major robotics competition. J. Hum.–Comput. Interact. 19(1-2), 117–149 (2004) 15. Mollet, N., Brayda, L.G., Chellali, R., Fontaine, J.-G.: Virtual Environments and Scenario Languages for Advanced Teleoperation of Groups of Real Robots: Real Case Application. In: 2nd International Conference on Advances in Computer-Human Interactions, pp. 310–316 (2009) 16. Baizid, K., Li, Z., Mollet, N., Chellali, R.: Human multi-robots interaction with high virtual reality abstraction level. In: Xie, M., Xiong, Y., Xiong, C., Liu, H., Hu, Z. (eds.) ICIRA 2009. LNCS (LNAI), vol. 5928, pp. 23–32. Springer, Heidelberg (2009)
17. Castillo-Effen, M., Alvis, W., Castillo, C.: Modeling and visualization of multiple autonomous heterogeneous vehicles. In: IEEE International Conference on Systems, Man and Cybernetics, vol. 3, pp. 2001–2007 (2007) 18. Crison, F., d'Huart, D., Burkhardt, J.M., Dautin, J.L.: Virtual technical trainer: Learning how to use milling machines with multi-sensory feedback in virtual reality. IEEE Virtual Reality, 139–145 (2005) 19. Arrue, B.C., Cuesta, F., Braunstingl, R., Ollero, A.: Fuzzy behaviors combination to control a nonholonomic mobile robot using virtual perception memory. In: 6th IEEE International Conference on Fuzzy Systems, pp. 1239–1244 (1997) 20. Waller, D., Hunt, E., Knapp, D.: The Transfer of Spatial Knowledge in Virtual Environment Training. Presence 7(2), 129–143 (1998)
Evaluation of Disaster Information Management System Using Tabletop User Interfaces
Hidemi Fukada1, Kazue Kobayashi2, Aki Katsuki2, and Naotake Hirasawa1
1 Department of Information and Management Science, Otaru University of Commerce, 3-5-21 Midori, Otaru, Hokkaido 047-8501, Japan
{fukada,hirasawa}@res.otaru-uc.ac.jp
2 NTT COMWARE CORPORATION, 1-9-1 Kounan, Minato, Tokyo 108-8019, Japan
{kobayashi.kazue,aki.katsuki}@nttcom.co.jp
Abstract. Most traditional disaster management systems in Japan employ input devices such as keyboards or mice, so expert staff with high computer literacy had to be posted to operate the system quickly and correctly in the tense situation when a disaster occurred. In this research, a disaster information management system is proposed that can be easily operated, even under the disorderly conditions of a disaster, by the local government's personnel in charge of disaster management. By using a digital pen and a tabletop user interface, the system achieves usability that enables easy input of damage information even by local government staff without expertise. An evaluation was conducted by prospective users using a prototype, and the evaluation results are satisfactory with regard to the functionality and operability of the proposed system. Keywords: disaster information management system, tabletop user interfaces, geographic information system (GIS), digital pen.
1 Introduction In the past, the national government and local governments in Japan have introduced various disaster information systems to ascertain and manage disaster information accurately and quickly [1]. Most traditional disaster information systems require expert staff with high computer literacy to be posted so that the system can be operated quickly and correctly in the tense situation when a disaster occurs. However, under the current disaster response framework, it is not easy for local governments to post such expert staff, because they are struggling with staff cuts due to administrative and fiscal reform. In this research, we propose a disaster information management system that can be easily operated, even under the disorderly conditions of a disaster, by municipal personnel in charge of disaster management. The system uses a digital pen and a tabletop user interface, allowing municipal personnel without specialized expertise to operate the system with a highly familiar action, i.e. using a pen to write on paper.
2 Proposal of Disaster Information Management System
2.1 Issues with Disaster Information Systems
The user interfaces of the disaster information management systems that are widely used today are primarily graphical user interfaces (GUIs), and almost all personal computers have GUIs based on input devices such as a keyboard and mouse. Therefore, specialized personnel trained in computers must be assigned to the department responsible for disaster management. If a disaster occurs and the municipal personnel lack the specialized expertise to operate a disaster management system, those personnel must operate a system with input devices that they are unfamiliar with. That situation will hamper the collection of precise disaster information and impede the rapid sharing of that information. For this reason, tangible user interfaces (TUIs), which link physical objects and digital information, have attracted attention, because data can be entered more easily and intuitively with such interfaces [2]. Reports have described systems for urban planning and for use in the collaborative design process that feature an environment with a digital pen and a tabletop user interface based on the concept of a TUI [3],[4]. In the area of disaster management, an information system using a TUI has been proposed in the form of a disaster simulator available for use when managing a disaster [5], but systems to assist disaster response headquarters have yet to be proposed. The disaster information system proposed in this study incorporates a tabletop user interface and a digital pen to provide users, including local government staff with no computer expertise, with a collaborative and easy-to-use environment. Users can directly manipulate a digital map projected on the table and use a digital pen as an input device to enter disaster information on the map. This setup is intended to consolidate accurate disaster information and allow users to quickly share that information with all concerned.
2.2 Concept of the Proposed System
There are three design concepts for the proposed disaster information management system:
Concept 1: To allow even non-specialized personnel to readily understand and master operation of the system, and to provide a user interface that remains usable even under chaotic conditions.
Concept 2: To convert disaster information coming into a disaster response headquarters into digital form without complicated manual data entry and conversion work, and to make it possible to manage the information in conjunction with positional information (maps).
Concept 3: To retain logs of the course of emergency response efforts and facilitate their use in reviewing emergency disaster responses by managing disaster information that changes over time.
2.3 Design Policy of System Configuration Elements
The architecture of the proposed system is shown in Fig. 1. The system consists of two subsystems. One is a paper-based input system that would presumably be used by supervisors located at evacuation sites and the disaster site to report site conditions. The other is a tabletop system intended for use by the disaster response headquarters, allowing users to consolidate and manage entered disaster information and view it on digital maps. Once disaster site supervisors fill in the input forms (e.g., a damage report form) with a digital pen, the entered information is converted into digital format and registered on an information management server at the disaster response headquarters via a network. Personnel at the headquarters can then view the consolidated information on digital maps and examine what measures to take. At the disaster response headquarters, the system projects a digital map onto a large sheet of paper laid out on a desktop. Information can be written directly onto the screen with the digital pen. We expect that the tabletop user interface will give users the feeling of working with "analog tools," i.e., conventional paper maps, so that personnel who are not well versed in computer use can easily operate the system. Moreover, with the "time-line" feature, all input information is managed on a time-line basis, so that the stored disaster information can be reviewed as time passes.
Fig. 1. System architecture (input forms and digital pens with digital paper at the sites, a data sending server, and, at headquarters, an information management server with disaster information and map DBs whose main screen is displayed by a projector)
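To make the intended data flow concrete, the following is a minimal sketch in Python (all class and method names here are hypothetical; the actual prototype is built on a GIS server and digital-pen middleware, as described in Sect. 3) of how a digitized report might be registered with positional information and later retrieved by the "time-line" feature:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

# Hypothetical sketch of the information management server's data model:
# each digital-pen report carries positional information (map coordinates)
# and a timestamp, so the "time-line" feature can replay the response.

@dataclass
class DisasterReport:
    kind: str                      # e.g., "Casualties", "Road Closure"
    position: Tuple[float, float]  # (latitude, longitude) on the digital map
    registered_at: datetime
    note: str = ""

@dataclass
class InformationManagementServer:
    reports: List[DisasterReport] = field(default_factory=list)

    def register(self, report: DisasterReport) -> None:
        """Called when the data sending server forwards a digitized form."""
        self.reports.append(report)

    def timeline(self, start: datetime, end: datetime) -> List[DisasterReport]:
        """'Time-line' feature: show only reports registered in the period."""
        return [r for r in self.reports if start <= r.registered_at <= end]

# Usage: a site supervisor's form arrives via the network and is registered;
# headquarters then reviews only the first hour of the response.
server = InformationManagementServer()
server.register(DisasterReport("Road Closure", (43.19, 141.00),
                               datetime(2011, 7, 9, 10, 15), "Route 5 blocked"))
first_hour = server.timeline(datetime(2011, 7, 9, 10, 0),
                             datetime(2011, 7, 9, 11, 0))
```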
3 Implementation and Evaluation of a Prototype
3.1 Prototype Implementation
We implemented a prototype to evaluate the effectiveness of the proposed system. The system configuration and features of this prototype are shown in Fig. 2.
The information management server was built on a geographic information system (GIS) with commercially available digital map data. The digital pen sends handwritten data to a computer via Bluetooth. The table is covered by paper, called digital paper, printed with special dot patterns that provide the positional coordinates of digital pen input. Information that the user enters on the table with the digital pen is consolidated by the information management server and managed on the GIS. The information management server and the server that transmits form data were connected over a LAN. Each server was a readily portable laptop PC, in light of the conditions that would actually be encountered when the system is set up in a disaster response headquarters.
Fig. 2. System architecture of the prototype (data sending server and information management server: HP Pavilion Notebook PC tx1000/CT, Windows 2003 Server, AMD Turion 64 X2 TL-60, 2 GB memory; GIS engine: GeoPLATS; projector: Hitachi CP-A100J, 1024×768 resolution; digital pens: Anoto Maxell K.K., with digital paper, connected via Bluetooth)
3.2 Composition of the Prototype
In terms of system features, a digital map is projected on the table as the main screen. At the right-hand side of the main screen, there is a menu icon area. The digital map is manipulated by touching icons with the digital pen (Fig. 3). Information is entered onto the map by directly selecting icons displayed on the map with the digital pen. The icons indicating casualties or damage include Casualties, Building Damage, Roadway Damage, and Road Closure, and were designed beforehand (Fig. 4). In addition, a "time-line" feature is provided to filter and display the information in an easy-to-view format.
Fig. 3. System in use
Fig. 4. Menu icon sheet (categories: hazard type / area icons, human damage icons, building / road damage icons, measures, and system tools)
When a date and time are specified, the feature displays only the information registered in the corresponding period. Information is sent to the screen at fixed intervals, so the course of the emergency disaster response can be viewed.
3.3 Evaluation
We conducted experiments to evaluate the usability of the system with prospective users, namely municipal employees. The evaluation setting is shown in Fig. 5. There were 16 participants in total: 7 personnel in charge of disaster management in the Integrated Disaster Management Office, General Affairs Division, Iwate Prefecture, and 9 fire department personnel of the Fire Department Headquarters of the City of
Ichinoseki. All the participants had experience in disaster management with analog tools (paper maps), but most of them were not used to computers. After a short demonstration of the system, the participants were asked to input and view fictitious disaster information using the prototype and then answer a questionnaire on a 5-point scale, with 5 indicating the most favorable assessment and 1 indicating the least.
Fig. 5. Scenes from the evaluation
4 Results and Discussion
Results of the evaluation experiment using the prototype are shown in Fig. 6. To determine whether Concept 1 had been successfully realized, users were asked about the ease of operating the system. Results indicated that the ease of icon selection received a favorable assessment, i.e., an answer of 4 or 5 on the 5-point scale, from 100% of respondents. A major reason for the system's favorable reception was its use of a digital pen, which allows entry of information with a familiar feel not experienced with conventional input devices. With regard to the realization of Concept 2, the usefulness of GIS-based information management was assessed favorably by 94% of respondents, with an average score of 4.4 points. This is because the proposed system can respond flexibly to situations that change as a disaster unfolds. That is, the proposed system can immediately rewrite existing information as events progress; this updated information is then displayed on the digital map so that it better reflects conditions at the disaster site. With regard to the realization of Concept 3, the usefulness of the "time-line" feature was assessed favorably by 94% of respondents, with an average score of 4.7 points; these results were satisfactory. This is because the feature provides a record of disaster management and allows those efforts to be verified later, which is one capability a disaster information management system must have. That is, the status of emergency response efforts must be recorded and correctly catalogued in order to verify what response efforts have taken place.
For a disaster information management system to perform reliably, it must operate effectively in a disaster management office in the midst of the confusion stemming from a disaster. The proposed system provides a visual record of emergency response efforts, allowing tacit knowledge to be transformed into explicit knowledge. The proposed system did have one flaw: the visibility of icons displayed on the digital map was assessed favorably by only 69% of respondents, with an average score of 4.0 points. The reason icon visibility was assessed less favorably may be that the table itself was a bit too bright. The opinions of personnel in the disaster management office will subsequently be sought in order to resolve this problem.
Fig. 6. Evaluation results: respondents' ratings (very good / good / usually / slightly bad / bad) of the operability of the digital pen, the usefulness of GIS-based information management, the usefulness of the "time-line" feature, and the visibility of the icons
5 Conclusion
A disaster information management system has been proposed here. With a digital pen and a tabletop user interface, the system allows municipal personnel who do not have specialized expertise to operate it easily even in tense situations. An evaluation was conducted with municipal employees using a prototype, and the proposed system produced satisfactory results with regard to its functionality and operability. Major factors behind these favorable results were the system's use of a digital pen and GIS-based information management. However, the tabletop's brightness must be adjusted in order to make the entry of disaster information easier. Topics for the future include ensuring that the response time and installation requirements for the system are adequate for its use at a disaster site. In addition, points for improvement should be ascertained through long-term assessments in
instances like disaster drills conducted by municipalities. Moreover, what features are needed should be examined with an eye toward use of the proposed system under ordinary conditions as well.
Acknowledgements. The authors wish to thank the staff responsible for disaster management in the Integrated Disaster Management Office, General Affairs Division, Iwate Prefecture, and the fire department personnel of the Fire Department Headquarters of the City of Ichinoseki for their help during this investigation and system evaluation.
References
1. Phoenix: Disaster Information Management System of Hyogo Prefecture, http://web.pref.hyogo.lg.jp/pa17/pa17_000000059.html (in Japanese)
2. Ishii, H., Ullmer, B.: Tangible Bits: Towards Seamless Interfaces between People, Bits, and Atoms. In: Proc. Conf. on Human Factors in Computing Systems (CHI 1997), pp. 234–241 (1997)
3. Underkoffler, J., Ishii, H.: Urp: a luminous-tangible workbench for urban planning and design. In: Proc. Conf. on Human Factors in Computing Systems (CHI 1999), pp. 386–393 (1999)
4. Haller, M., Brandl, P., Leithinger, D., Leitner, J., Seifried, T., Billinghurst, M.: Shared design space: Sketching ideas using digital pens and a large augmented tabletop setup. In: Pan, Z., Cheok, D.A.D., Haller, M., Lau, R., Saito, H., Liang, R. (eds.) ICAT 2006. LNCS, vol. 4282, pp. 185–196. Springer, Heidelberg (2006)
5. Kobayashi, K., Katada, T., Kuwasawa, N., Narita, A., Hirano, M., Kase, I.: Development of a Disaster Simulation System Using Tangible User Interfaces. Journal of Social Safety Science (9), 103–109 (2007) (in Japanese)
Relationality-Oriented Systems Design for Emergence, Growth, and Operation of Relationality
Takuya Kajio, Manami Watanabe, Ivan Tanev, and Katsunori Shimohara
Doshisha University, 1-3 Tatara-Miyakodani, Kyo-Tanabe, 610-0321 Kyoto, Japan {kajio2010,watanabe2010,itanev,kshimoha}@sil.doshisha.ac.jp
Abstract. Relationality-oriented systems science, introduced here, is a new research field in which we try to understand and grasp systems as substance in which humans and tangible and intangible artifacts are interdependent and function together. This paper proposes a research framework for a social network system that elicits relationality from people's daily life, grows relationality with self-propagation and self-proliferation mechanisms, and makes it possible to promote, manage, and operate the reproduction of relationality.
1 Introduction
While the progress of information communication technologies (ICT) has brought about the so-called advanced information society, some people in Japan seem to be left behind, isolated, and/or alienated from a community and/or society. In other words, the up-coming society may be rich in information and channels for communication, but it might be lacking in actual linkage, relations, and communication between people. The role that ICT and social information infrastructure should play is not small, in the senses that they could drive a revolution of corporate culture and that they have the potential to create a society where people can feel mental affluence and linkage with others. It is a fact that nowadays all of our activities, from daily life to our social and economic lives, deeply depend on various kinds of social systems, whether or not we are conscious of the fact. Considering such a trend, it would be difficult for purely technology-oriented systems to attain and complete that role. In order to effectively operate and manage such systems, it is indispensable to make much of and utilize human relations, such as linkage between individuals and the creativity and/or cooperativeness in people, a community, and/or a society. Relationality-oriented systems science is a new research field in which we try to understand and grasp systems as substance in which humans and tangible and intangible artifacts are interdependent and function together [1]. The view of such relationality-oriented systems can be applied to all systems in which humans are concerned and involved. We have been seeking the meaning and significance of relationality-oriented systems, devising methodologies to create and facilitate such systems, and pursuing social information infrastructure where people can feel mental affluence and linkage
with other people. Our research objectives are to promote the utilization of human relations by constructing social information infrastructure as a relationality-oriented system, to achieve affluent linkage with others that activates a local community, and hopefully to provide a new sense of value for the up-coming society. This paper proposes a research framework for a social network system that elicits relationality from people's daily life, grows relationality with self-propagation and self-proliferation mechanisms, and makes it possible to promote, manage, and operate the reproduction of relationality.
2 Social Network System for Emergence, Growth and Operation of Relationality
Relationality here denotes interactions through which two entities mutually influence each other, linkage over time and space, and context as a result of accumulated interactions and linkage. Relationality is not limited to physical and spatial relations; rather, it is basically invisible and information-driven, and sometimes ecological and environmental. Social and economic systems, culture, religion, and senses of value are therefore included in relationality [2]-[5]. In this research, we envisage social information infrastructure, as a relationality-oriented system, that would naturally promote the emergence of relationality between people, autonomously propagate and grow relationality, and eventually form a network of relationality, so that people could be naturally and spontaneously involved in communications and activities in a local community. An image of the system's behavior is as follows: people's small actions concerning communication and/or community in their daily life would generate some value, naturally circulate that value between people, and gradually and spontaneously weave a sort of linkage network. For example, suppose someone made a phone call to a friend and enjoyed the conversation. They might want to express a sort of goodwill, a smile, thanks, and/or a feeling of empathy, sympathy, or a sense of linkage, and they can naturally and easily express that feeling with a very simple action, like the "Like!" on Facebook or a smile sign. In addition, if those whom a person has some linkage with are vigorous in communications and activities in a local community, he/she could share in the bounty of that vigor. Conversely, if he/she behaves actively in a community, he/she could make a contribution to them and/or the community. From the viewpoint of relationality-oriented systems science, the system should autonomously change its functionality and behavior and manage itself so that it can promote and maintain the reproduction of relationality, depending on the emergence, propagation, and growth of relationality. Aiming to create such a social network system, we are working on the following three basic mechanisms, which should be devised and implemented: (1) quantification of relationality and a mechanism for autonomous propagation and proliferation of relationality, (2) a mechanism to monitor and control the emergence and growth of relationality as well as the dynamic situation of the relationality network, and (3) a mechanism to promote, manage and operate the reproduction of relationality.
2.1 Quantification of Relationality and Its Self-propagation/-proliferation Mechanism
For the quantification of relationality, we introduce an index of relationality that is generated along with the emergence of relationality between people and tangible and intangible artifacts. People would be able to express a sort of goodwill, smile, thanks, and/or a feeling of empathy, sympathy, or a sense of linkage with a very simple action, for example, when they communicated and enjoyed a conversation. Such small actions concerning communication and/or community in daily life generate the index of relationality. The index should circulate between people, person by person and/or event by event, every time someone makes a small action related to communication, such as expressing the above-mentioned will and/or feeling to others in a local community. Thus, the index self-propagates and self-proliferates through its circulation. We can find some similarity between the relationality index and LETS (Local Exchange Trading Systems) or local currencies, in the sense that they all play a role in connecting person to person and activating intercommunication, and in the sense that they all work as media for people to share and transmit a sense of value. The big difference we dare to draw between them is that the relationality index stems from the concept of "gift and circulation," whereas virtual currency complements the concept of "give and take." Some virtual currencies have no interest, or even negative interest, so as to promote and activate the exchange of something for something between people. In contrast, the relationality index focuses on its circulation rather than its exchange, and it can thus be regarded as a sort of investment in human relations and/or human capital. In terms of technologies, we employ a multi-agent-based approach to model the relationality network and build such autonomous self-propagation and self-proliferation mechanisms for increasing the liquidity of relationality. More concretely, an individual personal agent corresponding to an individual person should log the person's communication and interaction with artifacts, keep the index of relationality, and propagate and circulate the index through agent-to-agent communications based on the log information.
2.2 Monitoring and Controlling Mechanism for Relationality Network
This mechanism is for monitoring and controlling the dynamic situation of the relationality network, such as its emergence, growth, and the stock and flow of the index. While the above mechanism for self-propagation and self-proliferation works as micro-level control, this mechanism works as macro-level control. Specifically, using this mechanism, we build a simulation model based on system dynamics modeling, focusing on the stock and flow of the relationality index. The range and speed of self-propagation, the ratio of proliferation in circulation, and the ratio of attenuation in stagnation should influence the dynamics of the relationality index. We therefore investigate how such parameters affect the dynamics, and the effective range of the parameters should be clarified through simulations [6].
2.3 Mechanism for Reproduction of Relationality
The direct purpose of this mechanism is to promote, manage and operate the reproduction of relationality. Its significant implication, however, lies in how to make the bottom-up mechanism at the micro-level
and the top-down mechanism at the macro-level, as mentioned above, interact and function together effectively. It is the essence of a relationality-oriented system that the system itself can autonomously regulate how both mechanisms interact and function together so as to maintain its performance in a dynamic and fluctuating environment. That is, the system should conduct automatic tuning of the above-mentioned parameters by using evolutionary computation, especially genetic programming, which imitates the mechanism of biological evolution on computers [7]. More concretely, the system should autonomously change its function and behavior and manage itself so that it can promote and maintain the reproduction of relationality, depending on the emergence, propagation, and growth of relationality. Another challenge is how to elicit interest in this sort of system from people who are not so positive about communication, and to put them in the mood to take part in and be involved in such a system. For that purpose, we are investigating the possibilities of agent-mediated communication, a narrative approach based on automatic story generation, and linkage mediated through a virtual common in the Internet. Agent-mediated communication enables a shift from a human-triggered passive system to a proactive system in which individual personal agents observe the situation of their host users and sometimes proactively connect their host users for communication, depending on the situation. It should be useful and effective to introduce a narrative approach, based on automatic story generation, into such agent-mediated communication and community. Postulating that the significance of communication is to stimulate people's imagination and creativity, introducing a narrative approach and an automatic story generation system into so-called human-agent interactions might have great potential to involve people in communication.
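To illustrate the kind of micro-level dynamics these mechanisms are meant to tune, the following is a minimal sketch (all names and parameter values are hypothetical, not taken from our design) of a relationality index that proliferates while circulating and attenuates while stagnating; sweeping the parameters corresponds to the simulation study of Sect. 2.2 and the automatic tuning of Sect. 2.3:

```python
# Hypothetical sketch: each agent holds a stock of the relationality index.
# The index proliferates while circulating between agents and attenuates
# while stagnating. Parameter values are illustrative only.
import random

def simulate(agents=20, steps=100, circulate_p=0.3,
             proliferation=1.05, attenuation=0.98, seed=0):
    rng = random.Random(seed)
    stock = [1.0] * agents
    for _ in range(steps):
        for i in range(agents):
            if rng.random() < circulate_p:   # a small action: index circulates
                j = (i + 1 + rng.randrange(agents - 1)) % agents  # another agent
                gift = 0.5 * stock[i]
                stock[i] -= gift
                stock[j] += gift * proliferation  # proliferation in circulation
            else:
                stock[i] *= attenuation           # attenuation in stagnation
    return sum(stock)

# Sweeping the circulation probability shows how the liquidity of
# relationality affects the total stock of the index.
for p in (0.1, 0.3, 0.6):
    print(p, round(simulate(circulate_p=p), 2))
```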
3 Linkage Mediated through Virtual Common
As one of our research topics on relationality design [8][9], we propose here a new social system, based on the Internet, through which people can become aware of heart-to-heart linkage with other people mediated through a virtual common in the Internet, and through which people can feel and establish their presence in the real world. Along with the progress and spread of the Internet, as a dark side of this trend, some people seem to deepen their feelings of disconnectedness from and alienation from society. In this research, we target such people, who cannot help but shut their hearts to a community and/or society even under the so-called ubiquitous network environment.
3.1 System Model
People generate information naturally and unconsciously just by living and coping with life through interactions with everyday things, even if they stay alone at home. So, we provide a user with a real entity (RE) in the real world, which the user interacts with and
generates information through interactions with; a virtual entity (VE), which is correlated with the RE; and a virtual space, which consists of the user's and others' VEs. In the sense that the virtual space is composed of VEs correlated with other users, it works as a sort of virtual common (VC). In other words, the VC is shared by people; users' VEs exist and interact mutually within it. The mechanisms and/or behaviors in this system model are as follows (Fig. 1):
1. A user's behaviors, generated by the interactions the user conducts with the RE in daily life, influence the VE as his/her possession in the virtual world, for example, through growth of the VE and/or changes of its color and/or shape.
2. In the same way, other users' behaviors through interactions with their REs in their daily lives influence the corresponding VEs that they individually possess, and the user's and others' VEs interact with each other in the VC where those VEs exist.
3. Every user can monitor and/or observe the situation of the VC, and especially the status of his/her VE among the VEs, through a PC or TV screen at home.
4. There is a mechanism by which VEs themselves form a sort of community and can observe changes in other VEs. Based on such observations, they can exchange simple messages, for example, "you look fine" when a given VE is active, and "what's wrong?" when the VE is inactive.
5. Such messages exchanged within the VE community are eventually transmitted to the user when the user approaches his/her RE corresponding to the VE in the real world.
Fig. 1. System model for linkage mediated through virtual common
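As a concrete reading of this model, the following minimal sketch (hypothetical class and method names throughout; the system itself is still at the conceptual level, as noted in Sect. 4) illustrates items 1, 4, and 5 above:

```python
# Hypothetical sketch of the RE/VE model: interactions with a real entity
# (RE) update the user's virtual entity (VE); VEs in the virtual common
# observe each other and exchange simple messages, which are delivered
# when the user next approaches the RE (items 1, 4, and 5).
class VirtualEntity:
    def __init__(self, owner: str):
        self.owner = owner
        self.activity = 0.0          # grows with the owner's interactions
        self.inbox = []              # messages from other VEs

    def on_interaction(self):        # item 1: RE interaction influences VE
        self.activity += 1.0

    def decay(self):
        self.activity *= 0.9         # VE fades if the owner is inactive

class VirtualCommon:                 # shared space where VEs interact
    def __init__(self, entities):
        self.entities = entities

    def exchange(self):              # item 4: VEs observe and message peers
        for ve in self.entities:
            for other in self.entities:
                if other is not ve:
                    msg = ("you look fine" if other.activity > 1.0
                           else "what's wrong?")
                    other.inbox.append((ve.owner, msg))

def approach_re(ve: VirtualEntity):  # item 5: messages reach the user at the RE
    delivered, ve.inbox = ve.inbox, []
    return delivered
```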
3.2 Research Issues
Item 1) above means that every VE depends on the corresponding RE and thus, eventually, on the user. In the sense that the VE represents and/or interprets the interactions between the user and the RE, we can regard the VE as the user's self-expression. Items 2) and 3) mean that every user can view the whole world of the VC and see his/her own VE and its position in that world. In a sense, the user can cast an eye over other users' VEs as well as his/her own, and, vice versa, the user can feel other users' eyes on his/her VE. In other words, we expect this to be a mechanism by which the user becomes aware of his/her own "self-existence." It implies that even if a user lives alone without any explicit connection to society, he/she can feel and have an implicit connection to a sort of community or society. The research issues are whether this works as a mechanism a) to motivate a user to act for self-expression, b) to let the user feel his/her own presence through the virtual entity in a virtual space, and c) to let the user naturally devote him/herself to continuous cycles of self-expression and proof of self-presence, whether or not he/she is conscious of it. Items 4) and 5), in addition, mean that VEs acquire a sort of autonomy and interactivity. Specifically, the VEs should be composed of and work as a multi-agent system, and they have their own dynamics in the virtual space. This means that VEs might behave beyond a user's control as well as reflecting the user's will. The research issues concerning this feature are d) whether a user becomes aware of other users' existence through the virtual space, and e) whether a user feels and becomes conscious of "linkage" with other people.
4 Conclusion
We have introduced relationality-oriented systems science as a new research field in which we try to understand and grasp systems as substance in which humans and tangible and intangible artifacts are interdependent and function together. From the viewpoint of relationality-oriented systems science, we proposed a research framework for a social network system that elicits relationality from people's daily life, grows relationality with self-propagation and self-proliferation mechanisms, and makes it possible to promote, manage and operate the reproduction of relationality. In particular, we discussed research issues on the basic mechanisms for the social network system, that is, 1) quantification of relationality and its self-propagation/-proliferation mechanism, 2) a monitoring and controlling mechanism for the relationality network, and 3) a mechanism for the reproduction of relationality as the essence of relationality-oriented system design. As one of our research topics on relationality design intended to elicit people's interest, we also proposed a new social system in which people can find and become aware of linkage with others mediated through a virtual common in the Internet. This research and the system itself are still at the conceptual level, and we have to tackle many research issues, as mentioned in this paper. Relationality mediated by information and by tangible and intangible artifacts influences human behaviors, thoughts, and consciousness. Sooner or later, intangible artifacts are formalized as social systems, tangible artifacts are embodied as objects, and then such
social systems, objects, and new information, in turn, mediate new relationality that influences human behaviors, thoughts, and consciousness. We believe it is essential to repeat such circulating generation and reciprocal interaction of human consciousness and senses of value with relationality.
Acknowledgements. The authors would like to thank Ms. Yurika Shiraishi, a master's student at Doshisha University, for discussions about research on linkage mediated through virtual common.
References
1. Shimohara, K.: Designing Relationality: Towards Relationality-Oriented Systems Design. In: Proceedings of the Int. Conf. on Humanized Systems 2010 (ICHS 2010), pp. 24–29 (September 2010)
2. Thinking Through Relationality: Relations, Connectedness and Social Distance. Morgan Centre for the Study of Relationships and Personal Life, http://www.socialsciences.manchester.ac.uk/morancetre/events/2007/relationality/
3. Clarke, S., Hahn, H., Hoggett, P. (eds.): Object Relations and Social Relations: The Implications of the Relational Turn in Psychoanalysis. Karnac Books (2008)
4. Mitchell, S.: Relationality: From Attachment to Intersubjectivity. Relational Perspectives Book Series, vol. 20, p. 200. The Analytic Press (2000)
5. Wetherell, M. (ed.): Identities, Groups, and Social Issues. SAGE and The Open University (1996)
6. Shimohara, K.: Network simulations for relationality design - an approach toward complex systems. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds.) KES 2008, Part II. LNCS (LNAI), vol. 5178, pp. 434–441. Springer, Heidelberg (2008)
7. Tanev, I.: DOM/XML-based portable genetic representation of the morphology, behavior and communication abilities of evolvable agents. Artificial Life and Robotics 8(1), 52–56 (2004)
8. Nakano, Y., Morizane, M., Tanev, I., Shimohara, K.: Relationality design toward enriched communications. In: Jacko, J.A. (ed.) HCI International 2009. LNCS, vol. 5612, pp. 492–500. Springer, Heidelberg (2009)
9. Shimohara, K.: Relationality Design. In: Proceedings of 2008 Int. Conf. on Humanized Systems, pp. 365–369 (2008)
Real-Time and Interactive Rendering for Translucent Materials Such as Human Skin
Hiroyuki Kubo1, Yoshinori Dobashi2, and Shigeo Morishima1
1 Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo, Japan
2 Hokkaido University, Kita 14, Nishi 9, Kita-ku, Hokkaido, Japan
[email protected], [email protected], [email protected]
Abstract. To synthesize realistic human animation using computer graphics, it is necessary to simulate subsurface scattering inside human skin. We have developed a curvature-dependent reflectance function (CDRF) which mimics the presence of a subsurface scattering effect. In this approach, we provide only a single parameter that represents the intensity of light scattering in a translucent material. We implemented our algorithm as a hardware-accelerated real-time renderer with an HLSL pixel shader. This approach is easily implementable on the GPU and does not require any complicated preprocessing or multi-pass rendering, as is often the case in this area of research.
Keywords: computer graphics, real-time rendering, subsurface scattering.
1 Introduction
Simulating subsurface scattering is one of the most important issues in realistically synthesizing translucent materials and has received a great deal of attention from the rendering research community for over a decade. We propose a curvature-dependent reflectance function (CDRF) that mimics the appearance of a translucent material dominated by higher-order scattering events, such as human skin, marble, jade, and so on. Since the proposed function is a local illumination model, the method synthesizes realistic translucent materials in real-time. The features of our method are as follows:
─ computationally lightweight (almost the same cost as Lambert shading)
─ single-pass rendering (saving computation cost and graphics memory)
─ no mesh parameterization required (can be applied to various objects, unlike [1])
─ can be easily used to stylize subsurface scattering effects
We focus on relatively local-scale scattering, in which incident light hits the surface and bleeds into a narrow area. Although our method does not compute exact scattering over the whole shape of an object, we propose a plausible approach using a local illumination approximation.
Fast approximation of subsurface scattering has been receiving a great deal of attention from game developers and others. Our method is straightforward for artists to control and much simpler than Kolchin's [2], described below.
2 Previous Work
Rendering translucent materials has been an active research topic for over a decade. There are currently two types of approaches to synthesizing these objects: one for off-line rendering and the other for real-time rendering.
2.1 Offline Rendering
Photon mapping [3, 4, 5] is capable of simulating light transport inside a translucent material accurately; however, it is also known that the computational cost is notably high. Since it typically requires several hours to synthesize a single image, photon mapping is not practical for real-time rendering. The method developed by Jensen et al. [6, 7] improved significantly on the speed of the simulation. Using a bidirectional scattering-surface reflectance distribution function (BSSRDF) model, they combine a dipole diffusion approximation with a single-scattering computation, yet still cannot produce real-time rendering.
2.2 Real-Time Rendering
In 2007, d'Eon et al. [1, 8] developed a subsurface scattering rendering method using a texture-space diffusion technique. They approximate the profile of the multipole model with a sum-of-Gaussians formulation, then apply Gaussian filtering with varying blur radii to an irradiance map in texture space. They realize a high-quality, GPU-accelerated method; however, it has two issues that cannot easily be avoided. One is the calculation cost and the size of the required memory. Compared with the latest CPUs and graphics processing units, current consumer gaming devices cannot compute very fast and are not equipped with enough memory. Their method requires more than ten rendering passes (1 pass for rendering the irradiance texture, 12 passes for 6 Gaussian convolutions, 1 pass for stretching, and 1 pass for compositing); thus, the method is computationally expensive and requires a large amount of graphics memory. The other problem is the necessity of mesh parameterization. Since their technique approximates subsurface scattering by blurring irradiance in texture space, every vertex of the model must be parameterized. Mesh parameterization of arbitrary shapes is still a challenging issue. Therefore, their technique is not very practical for current consumer gaming devices. Kolchin [2] developed a curvature-based shading method. The basic idea behind our paper is similar to Kolchin's, in that the effects of subsurface scattering depend on curvature: the curvature of a surface can be used to derive a local-illumination approximation of subsurface scattering. In this paper, we show that our method is significantly simpler and easier to control than Kolchin's method. Compared with these works, we propose a technique that is much faster than Jensen's dipole model and d'Eon's texture-space diffusion, and significantly simpler and easier to control than Kolchin's.
3 Our Approach
From the previous works [6, 7], we observe that the effects of subsurface scattering tend to be more noticeable on small, intricate objects than on simpler, flatter ones, which indicates that surface complexity largely determines these effects. For the purposes of our research, we decided to use curvature to represent surface complexity, combined with a simple local illumination model. In this paper, we describe a curvature-dependent reflectance function (CDRF) for synthesizing translucent materials. The CDRF is not a global illumination model such as the BSSRDF, but a local illumination model. Compared with a BRDF, which is the typical local illumination model, the CDRF depends not only on the incident direction but also on the surface curvature κ. In most cases in this kind of research, there are too many parameters for artists to control subsurface scattering and stylize the appearance of translucent materials. Instead, to compute the radiance on a surface using the CDRF, there is only one parameter, which represents the degree of light scattering inside a material. This parameter is not only obtained by curve-fitting to a simulated data set, but can also be manipulated by an artist. Furthermore, the method is easily implementable on the GPU and does not require any complicated pre-processing or multi-pass rendering, as is often the case in this area of research [9].
Fig. 1. Workflow
To synthesize an image of a translucent material, we use the procedure shown in Fig. 1. The input is a 3D mesh object, and the output is a synthesized image. Prior to rendering, we compute the surface curvature of the mesh on the CPU using the methods proposed in [10, 11]. The value of the CDRF depends on the curvature κ, the scattering parameter σ,
and the dot product N·L of the surface normal and the light direction. We pre-compute it for all combinations of these parameters and store it in a 2-dimensional look-up table (LUT). Then, using this LUT for faster computation, we evaluate the CDRF in the pixel shader to render translucent materials in real-time.
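As an illustration of this pre-computation, the following is a minimal sketch in Python/NumPy rather than HLSL (the table resolutions, parameter ranges, and the value of sigma0 are assumptions for illustration only); it tabulates the CDRF of Eqs. (1)-(4) in Sect. 4 over curvature and N·L for a fixed sigma0:

```python
import numpy as np

# Minimal sketch of pre-computing the 2D look-up table for the CDRF
# (Eqs. (1)-(4) in Sect. 4). Resolutions, ranges, and sigma0 are
# hypothetical; the real renderer stores the table as a texture and
# samples it in an HLSL pixel shader.
def build_cdrf_lut(sigma0=0.3, n_kappa=64, n_ndotl=64, n_sample=1024):
    thetas = np.linspace(-np.pi, np.pi, n_sample)   # integration samples
    incident = np.maximum(np.cos(thetas), 0.0)      # I(theta), Eq. (2)
    dtheta = thetas[1] - thetas[0]
    kappas = np.linspace(0.01, 4.0, n_kappa)        # curvature axis
    ndotls = np.linspace(-1.0, 1.0, n_ndotl)        # N.L axis
    lut = np.empty((n_kappa, n_ndotl))
    for i, kappa in enumerate(kappas):
        sigma = sigma0 * kappa                      # Eq. (4)
        for j, ndotl in enumerate(ndotls):
            theta = np.arccos(ndotl)
            gauss = (np.exp(-(theta - thetas) ** 2 / (2 * sigma ** 2))
                     / (np.sqrt(2 * np.pi) * sigma))   # G(theta), Eq. (3)
            lut[i, j] = np.sum(incident * gauss) * dtheta  # Eq. (1)
    return kappas, ndotls, lut

# At render time, the pixel shader fetches lut[kappa, N.L] (with bilinear
# interpolation) in place of the Lambert term max(N.L, 0).
```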
4 Curvature-Dependent Reflectance Function
The effects of subsurface scattering tend to be more noticeable on smaller, more intricate objects than on simpler, flatter ones. This seems to indicate that surface complexity largely determines these effects. For the purposes of our research, we decided to use curvature to represent surface complexity, combined with a simple local illumination model. To represent the subsurface scattering effect, we propose the curvature-dependent reflectance function (CDRF) R(θ; κ). The CDRF is defined as the convolution of the incident light energy I(θ) and a Gaussian function G(θ; σ), which provides the blurring effect:

$$R(\theta;\kappa) = \int_{-\pi}^{\pi} I(\theta')\, G(\theta - \theta';\sigma)\, d\theta' \qquad (1)$$

where

$$I(\theta) = \max(\cos\theta,\, 0) \qquad (2)$$

$$G(\theta;\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\theta^2}{2\sigma^2}\right) \qquad (3)$$

Accordingly, observing along the θ axis, σ is supposed to be roughly in inverse proportion to the sphere radius r; since the curvature is κ = 1/r, we assume that

$$\sigma = \sigma_0\, \kappa \qquad (4)$$
Fig. 2. Curve fitting
Here, σ₀ corresponds to the scattering intensity in the unit sphere (κ = 1). Using this reflectance function, the diffuse radiance on a surface is calculated by the following equation:

$$L(x) = \rho_d \int_{\Omega} L_i(\omega)\, R(\theta;\kappa)\, d\omega \qquad (5)$$

Note that L_i is the intensity of the incident light from direction ω, ρ_d is the diffuse albedo, and Ω covers all incident light directions. To confirm the validity of the formula, Eq. (1), we rendered several spheres of varying radii using photon tracing to reveal the relationship between curvature and radiance. The spheres are illuminated by a directional light from the left side, as shown in Figure 2-(a). The calculated color of each pixel on the equatorial line represents the relation between light angle and radiance for the sphere's particular curvature (Figure 2-(b)). Then, we fit a curve to the obtained data using the CDRF formula, Eq. (1). The curves are not exactly the same, but they are similar and fit well. This means that the CDRF approximation formula we propose is reasonably accurate. As for how the scattering parameter σ₀ is determined, it is not only obtained by curve-fitting to a simulated data set, but can also be manipulated by an artist.
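The fitting step could be sketched as follows (a hypothetical illustration; it assumes the photon-traced samples along the equatorial line are available as arrays of angles and radiance values):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical sketch of the fitting step: `angles` (radians from the light
# direction) and `radiance` are assumed to hold the photon-traced samples
# from the equatorial line of a sphere, as in Fig. 2-(a).
def cdrf(theta, sigma):
    # Numerically evaluate Eq. (1) for a given Gaussian width sigma.
    ts = np.linspace(-np.pi, np.pi, 1024)
    incident = np.maximum(np.cos(ts), 0.0)                    # Eq. (2)
    dt = ts[1] - ts[0]
    theta = np.atleast_1d(theta)
    gauss = (np.exp(-(theta[:, None] - ts[None, :]) ** 2 / (2 * sigma ** 2))
             / (np.sqrt(2 * np.pi) * sigma))                  # Eq. (3)
    return np.sum(incident[None, :] * gauss, axis=1) * dt

def fit_sigma(angles, radiance):
    # Least-squares fit of the single scattering parameter, as in Fig. 2-(b).
    (sigma,), _ = curve_fit(cdrf, angles, radiance, p0=[0.3],
                            bounds=(1e-3, np.pi))
    return sigma
```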
5 Results
The results are summarized in Figs. 3 and 4. These images were obtained from our implementation running on a 3.06 GHz Intel® Core™ 2 Duo CPU with an NVIDIA® GeForce® 9300M GS GPU. We implemented our algorithm as a hardware-accelerated real-time renderer, written as a pixel shader in HLSL.
Fig. 3. An example of a human head: (a) curvature, (b) Lambert, (c) texture blur method, (d) our method (©2009 TECMO KOEI GAMES CO., LTD. Team NINJA All rights reserved.)
Fig. 4. An example of human skin: (a) Lambert, (b) our method (©2009 TECMO KOEI GAMES CO., LTD. Team NINJA All rights reserved.)
To validate our algorithm, we tested it using a human head model. Figure 3-(a) displays the surface curvature of the model; hot colors represent higher curvature, and vice versa. For comparison, Figures 3-(b) and (e) are rendered using Lambertian shading, and Figures 3-(c) and (f) are rendered using the texture blurring method [1]. Figures 3-(d) and (g) are images synthesized using our CDRF. Compared with the Lambertian results, we are able to synthesize more natural, softer shading effects, especially around the nose and cheeks. The appearance of our results (Figures 3-(d), (g)) may be less pronounced than that of the texture blurring method (Figures 3-(c), (f)); however, the advantages of our method are its low calculation cost and the fact that mesh parameterization is unnecessary. Compared with Lambert shading, the additional costs of computing the CDRF are only three texture samplings and a few arithmetic operations, so the frame rate of our method is not significantly reduced. Since our method is implemented as a single-pass shader, in our implementation and computational environment it is over 10 times faster than the texture blurring method, which requires more than 10 passes. Furthermore, we rendered another example of human skin, as shown in Fig. 4. Compared with the hard appearance of the image on the left, our method is able to synthesize the soft appearance of the skin.
6 Conclusion and Discussion
We have developed a method for approximating the effects of subsurface scattering using a curvature-dependent reflectance function. Since the function is a local illumination model, we are able to synthesize realistic translucent materials in real-time. Furthermore, our system can easily be used to stylize subsurface scattering effects because only one parameter, σ₀, is required. We approximate an object's surface locally as the surface of a sphere with the corresponding radius according to the curvature. Thus, we cannot take global object shape, such as object thickness, into account. Therefore, compared with a real object, the difference in appearance tends to be more noticeable when the object is especially thin, such as leaves and paper. However, our method can be applied to almost all practical scenes, and it can synthesize translucent materials plausibly. In future work, we will apply our technique to deformable objects by computing curvature on the GPU.
Acknowledgement. We appreciate the feedback offered by TECMO KOEI GAMES CO., LTD.
References
1. d'Eon, E., Luebke, D., Enderton, E.: Efficient rendering of human skin. In: Rendering Techniques 2007: 18th Eurographics Workshop on Rendering, pp. 147–158 (June 2007)
2. Kolchin, K.: Curvature-based shading of translucent materials, such as human skin. In: Proceedings of the 5th International Conference on Computer Graphics and Interactive Techniques in Australia and Southeast Asia, GRAPHITE 2007, pp. 239–242. ACM, New York (2007)
3. Kajiya, J.T.: The rendering equation. In: Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1986, pp. 143–150. ACM, New York (1986)
4. Dorsey, J., Edelman, A., Jensen, H.W., Legakis, J., Pedersen, H.K.: Modeling and rendering of weathered stone. In: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1999, pp. 225–234. ACM Press/Addison-Wesley Publishing Co., New York (1999)
5. Jensen, H.W., Legakis, J., Dorsey, J.: Rendering of wet materials. In: Rendering Techniques 1999, pp. 273–282 (1999)
6. Jensen, H.W., Marschner, S.R., Levoy, M., Hanrahan, P.: A practical model for subsurface light transport. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 2001, pp. 511–518. ACM, New York (2001)
7. Jensen, H.W., Buhler, J.: A rapid hierarchical rendering technique for translucent materials. ACM Trans. Graph. 21, 576–581 (2002)
8. Nguyen, H.: GPU Gems 3. Addison-Wesley Professional, Reading (2007)
9. Mertens, T., Kautz, J., Bekaert, P., Seidel, H.-P., Van Reeth, F.: Interactive rendering of translucent deformable objects. In: Proceedings of the 14th Eurographics Workshop on Rendering, EGRW 2003, pp. 130–140. Aire-la-Ville, Switzerland (2003)
10. Meyer, M., Desbrun, M., Schröder, P., Barr, A.H.: Discrete differential-geometry operators for triangulated 2-manifolds. In: Visualization and Mathematics III, pp. 35–57 (2003)
11. Goldfeather, J., Interrante, V.: A novel cubic-order algorithm for approximating principal direction vectors. ACM Trans. Graph. 23, 45–63 (2004)
Local Communication Media Based on Concept of Media Biotope
Hidetsugu Suto1 and Makiba Sakamoto2
1 College of Design and Manufacturing Technology, Graduate School of Engineering, Muroran Institute of Technology
2 Division of Production and Information Systems Engineering, Graduate School of Engineering, Muroran Institute of Technology
[email protected], [email protected]
Abstract. The media biotope concept considers media communication structures to be analogous to an eco biotope. Communities created by local media are connected and mutually influence each other. First, the properties of communication media suitable for creating media biotopes are discussed in order to define the concept of media biotope. Two novel communication mediums based on the media biotope concept are then presented: one would strengthen communication among residents in a region, and the other would help travellers learn about sightseeing spots and communicate with residents. These mediums are designed to increase residents' enthusiasm for their region by promoting awareness among others.
Keywords: Media Biotope, communication medium, community, society.
1 Introduction
Our lives are easily influenced by information media, and the appearance of new media technologies throughout history has continuously changed society. For instance, public opinion arose with the appearance of newspapers, and the popularization of television accelerated our desire to purchase. Now, transmitting information at high speed has become possible due to new communication media, e.g., the Internet. While these new media connect strangers in distant places, they also weaken the connections between residents of local communities [2]. As a result, local residents often become indifferent to one another. In Japan, the major television stations are based in Tokyo and broadcast programs from a Tokyo perspective. As a result, diversity in the Japanese sense of values has been lost, and people across the country aim to be "like people in Tokyo." This has led to a loss of enthusiasm for their own regions, which is seen as a reason for decreasing civil participation in suburban cities [6]. Moreover, many young people have moved to the "real" Tokyo, causing the population to be excessively concentrated there. Against this background, the concept of "media biotope" [5] is attracting attention. This concept focuses on "small" media: cable television, free papers, community FM radio, etc. The aim is for the communities created by these media to form networks, so that the communities can prosper through their interactions.
Our goal is to clarify the methods that can be used to create media biotopes. First, we review the concepts of communication medium and community. We then define media biotope, discuss its properties, and consider the autonomy of media biotope communities. Finally, we present examples of communities created with local media.
Fig. 1. Schemes of communication with a medium
2 Media and Community
2.1 Communication Medium
The word "communication" has various meanings, and even the transmission of data might be considered communication in certain contexts. Here, communication is defined in accordance with the concept of communicative action as put forth by J. Habermas [3]: actions taken with the aim of understanding each other. Generally, "communication medium" means a single-direction medium, e.g., television, newspapers, and radio, or an interactive medium, e.g., cellular phones and e-mail. However, we can also gather a great deal of information from chatting and rumours in daily life and make decisions in accordance with such information. Furthermore, we can see the actions of others even in such things as graffiti on a wall or a can left on a bench. Consequently, a medium can be thought of as something that can be perceived by the five senses and that affects our actions through the information received. The particular technologies or techniques used in the medium are irrelevant. Schemes of communication with a medium are illustrated in Fig. 1. The traditional model for communication media proposed by Shannon and Weaver is shown in
Fig. 1 (A). In this framework, an information sender and a receiver are required. Information held by the sender is encoded into codes and transmitted to the receiver. The codes are then decoded, and the receiver can understand the information. In contrast, Fig. 1 (B) shows a novel model for communication media. In this framework, the receiver obtains signs from the environment and gets information by recognizing those signs. The receiver understands the environment around him/her through the medium. Hence, an information sender is not necessarily required. For example, assume a situation in which two people are talking. The speaker's voice and gestures affect the environment. The receiver gets sounds and images as signs through the face-to-face communication field, which acts as a medium. The receiver then recognizes the signs, and the information becomes known to him/her.
2.2 Community
The word "community" also has various meanings. It could be a neighbourhood association or a homeowner's association. Communities are also formed through intangible connections, like those among people who went to school together or who use the public areas of an apartment building. They can arise among people who share the same hobby or the same sense of values. In these cases, there is no physical aspect such as a school or apartment building, and the individual connections are conceptual. That is, communities can be formed by people who share an identity, e.g., values, ideology, or history. Hence, there are two kinds of communities: those based on something physical and those based on relationships. Here, we discuss the second kind because it can also include the first kind.
3 Media Biotope
3.1 Definition
The term "media biotope" conceptualizes the idea that media communication structures are analogous to an eco biotope. "Biotope," a word coined from "bio" and "topos," means a small area that is suitable for living things. In many cases, it indicates a small region with a uniform ecosystem, e.g., a pond, square, ruined house, or field. However, if such a region is isolated from other regions, its biotope characteristics are weakened. When small living things, e.g., insects and birds, travel between such small regions and a network system is constructed between the regions, the regions and the system together are also called a "biotope." Furthermore, the activities that maintain such regions are also called "biotopes" in Japanese. A media biotope can be thought of as an ecology of information media, and this concept suggests that we should focus on small local media, e.g., cable television, free papers, and community FM radio, because we can easily use these media. If local communities are formed with these small local media, and the communities construct a network system by interacting with each other, the communities and the system are a "media biotope." Furthermore, a medium that generates a media biotope is a "media biotope oriented medium."
3.2 Properties
"Small," "connectivity," "generality," and "design" have been mentioned as properties of a media biotope [5]. "Connectivity" means a situation in which each ecosystem is connected to and mutually affects the others. "Generality" means that the regions are not sealed off and behave as part of a life space. "Design" means that the regions are designed so that they continue to perform as a biotope. We follow these definitions here.
Scale. In the media biotope concept, the range of the medium must be comparatively small, as with the range of an eco biotope. National newspapers and nationwide television networks can be called "big media," while notices circulated in a region and bulletin boards at a train station can be called "small media." But what about a CATV channel that reports local news? The categorization is case dependent. The important factor is not the size of the area reached by the medium but the number of residents who can take part in the medium. The motivation of residents to act for the community decreases as the possibility of each member affecting the community through the medium decreases. This means that the influence of each person should be assured in order for a media biotope to form. However, if a specific person has substantial influence, there is a risk that the medium will reflect only the values and interests of that person. As a result, each person's influence must be evident, and as many people as possible must be able to act through the medium.
Connectivity. If a community created through a medium of the above-mentioned scale becomes isolated from other communities, it might become ruled by a specific sense of values. This means that communities must be able to influence each other so that various senses of values are allowed.
Generality. The ability of many people to participate freely is also an important factor. Communities created through network games or social network services (SNSs) that require invitations cannot be called media biotopes because not everyone can participate freely. For instance, a community in a "gated city" [1] cannot be called a media biotope because it is closed, even though it satisfies the conditions of scale and design.
Design. The meaning of information transmitted through a medium is understood not only from the contents but also from the structures of the medium [4]. It is difficult to form a community with the properties of a biotope if the medium is designed simply to transmit semantic information speedily and easily. When designing a new medium, the characteristics of the community and of its members must be analyzed, and the medium must be designed in accordance with the results.
3.3 Autonomy of Communities
Communities created with a media biotope must be autonomous and stable, like a biosystem created with an eco biotope. The autonomy and stability of a system should be maintained not by effects from outside the system but by the results of the system's own behaviours. That is, information should be recycled instead of being simply broadcast. This flow mechanism should be incorporated when designing communication media.
Furthermore, a community that appears completely stable is not desirable, because it could become isolated. Communities should be able to adjust to changes outside the community while maintaining their autonomy.
4 Examples of Media Biotope
4.1 Traditional Events

In Japan, there are many traditional events that function as media biotopes. A good example is the Feast of Lanterns ("Obon"). During this time, many people return to their home towns to visit the graves of their ancestors. It is a good opportunity for them to see relatives and catch up on the news back home. Local festivals, and the new year holidays in other countries, function in the same way. However, interest in these events has waned, so communication media that take their place should be designed. Formally, the coming-of-age ceremony is carried out on the second Sunday in January. However, several municipalities have moved it to Obon, because more people can then take part in the ceremony in their home town. This can be said to be a good example of action taken to create a media biotope.
Fig. 2. Images of community activities on a pedestrian bridge
4.2 Community Pedestrian Bridges

Figure 2 shows images of community activities on a pedestrian bridge designed using communication media. We call such a bridge a "community pedestrian bridge." There are flower beds on the bridge, and people in the neighbourhood can freely plant flowers there (Fig. 2 (A)). In this system, the residents are encouraged to take care of the flowers together, to view the flowers, to visit the bridge, and to relax there. The pedestrian bridge is used by the residents not only as a place to cross the street but also as a place where they can plant flowers and get community news. Several such bridges could be placed in a rural area, and residents could find their locations by checking a web site. Visitors to the web site could see photographs
taken on the bridges (Fig. 2 (B)), and these photographs could be regularly refreshed. The residents using the bridges could share photographs and news about the bridges with the public by posting to the web site (Fig. 2 (C)). Publicizing photographs and news about the bridges on the web would enhance communication among the bridge communities; for example, someone viewing the flowers on other bridges could meet and talk with people from other communities about common topics such as taking care of and observing plants (Fig. 2 (C)). Such interaction corresponds to an "exchange of local information" in the media biotope concept, which should mutually activate the communities. Moreover, fixed cameras on the bridges would help to prevent crime on or around the bridges. In addition, residents could naturally participate in the community because the bridges would be set up in common areas of passage. Generally, pedestrian bridges, which enable disadvantaged people to cross the road, are set up near hospitals, elementary schools, etc. Making these community bridges would thus promote communication between the elderly and grade schoolers. Consequently, these two generations could be brought closer together.
Fig. 3. Images of community activities on a bus
Setting up such bridges could also make the scenery in the region more beautiful. The external appearance of such bridges would be greatly influenced by the climate and culture of the local area. Therefore, the features of the local culture and climate would be made more visible through these bridges. The long-term benefits include an increase in residents' respect for their home towns and a stronger bond between generations.

4.3 Sightseeing Buses

Figure 3 shows images of community activities on a bus designed using communication media. We call such a bus a "communicative bus." Two functions are incorporated: transportation for residents and travellers, and enhanced communication between residents and travellers.
Local products, e.g., sweets and special lunch boxes, would be sold to the travellers at each bus stop. The name of the next stop and the special product sold there would be announced before each stop. The passengers could buy the local products they like as they use the bus for transportation. As a result, the travellers would naturally improve their understanding of the region while enjoying their trip. Electronic message displays would be set up 1) on the front of the bus, 2) on the side near the back doorway, and 3) above the driver (see Fig. 3 (C)). Passengers could freely send brief, Twitter-like messages about their impressions and memories of the trip and about local products to these displays by using their cellular phones. A message would be displayed until another user sends a message or until a fixed time has passed (a minimal sketch of this display policy follows at the end of this section). These message displays would allow passengers to share their thoughts and feelings and entertain people outside the bus. Furthermore, because travellers are typically interested in local products and historical places, they would get a chance to communicate with the local community by buying its products. As a result, the memories of their trip would become more vivid. Travellers who ride together could use these messages as a means to communicate. In addition, communities and industry in the region would be invigorated, and the attractiveness of the region would increase as a result.
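The display policy described above can be illustrated with a minimal Java sketch (the class name, field names, and the one-minute timeout are our assumptions, not part of the proposed system):

public class MessageDisplay {
    private static final long TIMEOUT_MS = 60_000; // assumed display timeout
    private String current = "";
    private long shownAt;

    // Called when a passenger sends a message from a cellular phone.
    public synchronized void post(String message) {
        current = message;
        shownAt = System.currentTimeMillis();
    }

    // Polled periodically by the display hardware.
    public synchronized String render() {
        if (!current.isEmpty()
                && System.currentTimeMillis() - shownAt > TIMEOUT_MS) {
            current = ""; // timeout elapsed and no newer message arrived
        }
        return current;
    }
}

A newly posted message simply overwrites the current one, so the display always shows the most recent contribution.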
5 Conclusion

We have redefined the concept of the media biotope in order to discuss it from the viewpoint of system informatics. The properties of a community created by a media biotope were discussed in accordance with the definition of a biotope. We identified two novel communication media that can be thought of as media biotope oriented media. The first would strengthen communication among residents in a region. The second would help travellers learn about sightseeing spots and communicate with residents. These media would not be designed simply for communication; they would be designed as public infrastructures. Consequently, residents could use them without conscious effort, so media biotopes should form naturally.

Acknowledgements. This work was supported by Grants-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (No. 20500220 and No. 21360191).
References

1. Blakely, E.J., Snyder, M.G.: Fortress America: Gated Communities in the United States. Brookings Institution Press (1999)
2. Dreyfus, H.L.: On the Internet. Routledge (2001)
3. Habermas, J.: The Theory of Communicative Action I. Heinemann (1984)
4. McLuhan, M.: Understanding Media: The Extensions of Man. McGraw-Hill, New York (1964)
5. Mizukoshi, S.: Media Biotope (in Japanese). Kinokuniya Publication (2005)
6. Suto, H., Katai, O., Okita, M., Kawakami, H.: A Medium Design for Sharing Empathetic Memories. In: Proc. of the 8th Workshop on Social Intelligence Design, pp. 199–205 (2009)
Big Fat Wand: A Laser Projection System for Information Sharing in a Workspace Toru Takahashi and Takao Terano Tokyo Institute of Technology, J2-52, 4259, Nagatsuda-cho, Midori-ku, Yokohama City, Kanagawa, Japan {toru@trn.,terano@}dis.titech.ac.jp
Abstract. This paper proposes a method for solving the problem of the triad relation in cooperative work through an augmented reality (AR) system. The method shortens explanations because it utilizes spatial information directly, without translating it into verbal information. We realize the method with a laser projection AR system, Big Fat Wand (BFW), which meets the requirements of a real workspace. Experimental results show that the AR method with BFW is effective in decreasing explanation time. Keywords: Augmented Reality, Cooperative Work, Laser Projection System.
1 Introduction

This paper proposes a new cooperative work support method based on augmented reality (AR) technology. When people work together in the field, a "triad relation" may exist: a relation among a distant target object, an explainer, and an explainee. In such a relation, the explainer and the explainee tend to have misunderstandings and/or difficulties in their communication, because they have difficulty sharing common reference points and/or concepts about the target object. Figure 1 depicts the triad relation in such a communication task. To cope with this difficulty, we utilize AR to shorten explanations about the target object and thereby support visual communication. For this purpose, we report experiments using the Big Fat Wand (BFW) system: a portable laser projection system that is a handy version of a conventional laser show device. The triad relation often raises the difficulty that spatial information about the target object must be translated into lengthy verbal explanations. Such communication can cause accidents when the triad relation arises in a manufacturing factory or on a construction site, which are usually noisy, dirty, dangerous, and hard to approach. To support communication tasks involving the triad relation, shorter and simpler explanations are desirable in order to reach a common understanding about the target space. Therefore, the following three points are required: 1. Accurate and simple display of the target information: this decreases misunderstandings within the triad relation; 2. Ease of use around the target object: this means using AR to support ad-hoc communication; 3. Flexibility of use in both dark and bright environments: this means we do not assume any special experiment room for the triad relation.
To meet the above requirements, we propose using AR to display the explainer's visual information directly onto the target object for the explainee. This decreases the difficulty of the explanation task because it is not necessary to translate the spatial information into verbal information. Figure 2 shows the situation. The objective of this paper is to experimentally demonstrate the effectiveness of the communication scheme shown in Figure 2 using our laser projection system, BFW. The rest of the paper is organized as follows: in Section 2, we briefly survey related work in the literature; Section 3 explains the features of BFW; Section 4 describes experiments on the triad relation problem; and concluding remarks follow in Section 5.
Fig. 1. Triad Relation in a Communication Task
Fig. 2. Desirable Communication Scheme for the Triad Relation
2 Related Work

There have been various works in the literature on supporting field work with AR systems [1]. These studies have usually focused on a very specific task in a restricted space. For example, [2] reported experimental work on the maintenance of a laser printer, and in [3], a system was developed for a medical education task.
Many such AR systems require special-purpose tracking techniques both to identify the location of a target object and to display the necessary information on it. Typical systems include ARToolKit [4] and PTAM [5]; PTAM is characterized by being markerless. These systems are certainly useful; however, they have the following drawbacks: a specific environment must be prepared beforehand, and they are inflexible to environmental changes.
3 Big Fat Wand: A Handy Laser Projection System

The objective of BFW is to enhance human-computer interaction activities [6, 7]. BFW is a handy, smaller version of a conventional laser show device. Compared with a conventional laser show device [8, 9], it is small enough to be held in one hand. Compared with a conventional laser pointer, it is connected to a laptop PC and is programmable, allowing the user to display any characters and/or symbolic patterns on the target object [6, 7, 9]. The name BFW stands for a very big magic wand. The system displays various information onto any kind of target object, even in a bright place. Therefore, BFW satisfies the AR system requirements described in the previous section. That is: 1) BFW can display accurate information at the desired place by aiming the handy part at the target object; 2) BFW can easily change the contents using content-editing toolkits on a PC, so we are able to show even hand-written information and animations on the spot; and 3) the laser light is sufficiently clear in both very bright and fully dark places. The Big Fat Wand system has the following components: i) a laptop PC with line-drawing image generation, image display, and editing software; ii) a one-board micro-computer that converts the digital information of the drawings into analog signals to control the device; and iii) a laser show device with a laser light generator, small dynamic mirror devices to control the display of the drawings, and power supplies. The system is also equipped with special-purpose authoring tools that let novice users prepare explanation materials. The unique points of BFW are summarized as follows: i) a one-board 16-bit micro-computer manages D/A conversion of the explanations and control of the images; ii) the portable cylinder part is carefully designed to avoid heat damage to the laser devices; and iii) the components are packaged in two separate parts so that the system is easy to use. Figure 3 shows the architecture, and Figure 4 shows a photograph of the integrated device. A user points the cylinder part to show the desired information. The box behind the cylinder contains devices such as the one-board micro-computer, power supply, and laser light generator. The information is prepared beforehand and/or on site using the special-purpose authoring tools on a laptop PC. The configuration of the authoring tools is shown in Figure 5. Using the tools, we are able i) to process static bitmap images (both characters and symbols), which are converted into line drawings, and ii) to generate simple animations combined with the
bitmap images; the tools also process line drawings written in the PostScript language. This means that the user is able both to design and generate the necessary information from bitmaps beforehand and to draw pictures and characters in real time.
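As a rough illustration of the bitmap-to-laser conversion pipeline described above, the following Java sketch (all names and the 12-bit DAC range are our assumptions; the actual BFW firmware is not published here) turns a monochrome bitmap into XY samples that could steer the dynamic mirrors:

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: convert a monochrome bitmap into a list of XY
// samples for a galvanometer-driven laser. Real line-drawing vectorization
// is more elaborate; this only demonstrates the coordinate mapping.
public class BitmapToLaser {
    static final int DAC_MAX = 4095; // assumed 12-bit DAC range

    static List<int[]> toPoints(boolean[][] bitmap) {
        List<int[]> points = new ArrayList<>();
        int h = bitmap.length, w = bitmap[0].length;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (bitmap[y][x]) {
                    // Scale pixel coordinates to the DAC range that
                    // steers the mirrors.
                    points.add(new int[] {
                        x * DAC_MAX / Math.max(1, w - 1),
                        y * DAC_MAX / Math.max(1, h - 1)
                    });
                }
            }
        }
        return points;
    }
}

The resulting point list would then be streamed to the one-board micro-computer for D/A conversion.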
Fig. 3. Architecture of BFW (PC with information editor, Big Fat Wand, displayed images)
Fig. 4. Appearance of the Integrated Device
4 Effectiveness Validation

To validate the effectiveness of the proposed method with BFW, we conducted a series of route explanation experiments to test whether participants could communicate smoothly in a triad relation. The route explanation experiments are designed to simulate material-picking tasks in a manufacturing factory. Each experiment consists of the following steps: first, the route along which the materials move is shown to one subject, the explainer; second, the subject is required to place
specified materials at specified places and in specified orientations; third, the explainer is required to have another subject, the explainee, execute the material movement task. Through the experiments, we had them carry out the task with/without BFW and in noisy/silent environments.
Fig. 5. Software Configuration of BFW
4.1 Subjects and Experimental Set-Ups

We prepared three pairs of subjects, each consisting of an explainer and an explainee, who exchanged roles in turn within the triad relation. The subjects were graduate students in our engineering department who communicate with each other in their daily lives. An explainer explains the movement instructions to an explainee using either paper or BFW (Figure 6). To make the environment noisy or silent, the subjects were or were not required to wear headphones playing white noise, which prevents verbal communication.
Fig. 6. Scene of Experiment
We evaluated effectiveness by measuring the explanation time and the accuracy of completing the tasks, and through a questionnaire survey after the experimental tasks. The nominal explanation time was 5 minutes; however, we allowed up to 15 minutes. The questionnaire survey contained the following items: Q1. Is BFW easy to use for transferring information to the other person, from the accuracy and ease-of-use viewpoints? Q2. Is the BFW explanation easy for the explainee to understand? Q3. Is the paper explanation easy to use for transferring information to the other person, from the accuracy and ease-of-use viewpoints? Q4. Is the paper explanation easy for the explainee to understand? Q5. Which explanation is easier to understand, by BFW or on paper? Q6. Which explanation is easier to make, by BFW or on paper? Answers to Q1–Q4 were given on a scale of 1 (negative), 2, 3, and 4 (positive). Answers to Q5 and Q6 were either paper or BFW.

4.2 Experimental Results

Figure 7 displays the explanation time. We observed statistically significant differences in explanation time between BFW and paper, both with and without noise. Figure 8 summarizes the questionnaire results, which suggest the superiority of BFW explanations. Regarding accuracy and ease of understanding, explanation by BFW was usually rated better than explanation on paper, whereas the ease of use of BFW was rated worse than that of the paper explanation.
Fig. 7. Explanation Time (mean time in seconds, 0–1000, for paper and BFW under non-noise and noise conditions; asterisks mark significant differences)

Fig. 8. Questionnaire summarization (number of subjects giving each rating, 1–4, for BFW and paper)
4.3 Discussion

From the experiments, we observed that the use of AR with BFW decreases explanation time; it is especially effective in a noisy environment where oral communication is difficult. This means the AR method can omit the verbal translation of spatial information about the tasks. Also, projecting the explanations directly onto the target object decreased the ambiguity of the indications and increased ease of understanding. In contrast, in a noisy environment, paper-based communication required much more time because everything had to be written down in detail. However, operating the PC toolkit sometimes hindered communication, and operating the handy part of BFW may have made detailed explanation difficult because of the shaking of the images produced by the laser system.
5 Concluding Remarks

This paper has proposed a cooperative work support method for triad relation environments based on augmented reality (AR) technology. Communication in the
triad relation environment is difficult because of the hard task of translating spatial information about the target object into verbal information. To cope with this issue, we have utilized Big Fat Wand, a handy laser projection system that enables us to project the necessary information directly onto the target object. Our experimental results suggest that the AR method with BFW is effective in decreasing explanation time, especially in a noisy environment, which is usually the case in a real manufacturing factory. Our future work includes further field investigations of the AR method at very long range and improvement of BFW in both its software and its hardware.
References

1. Azuma, R.: A Survey of Augmented Reality. Presence: Teleoperators and Virtual Environments 6(4), 355–385 (1997)
2. Feiner, S., MacIntyre, B., Seligmann, D.: Knowledge-based Augmented Reality. Communications of the ACM 36(7), 52–62 (1993)
3. Kondo, D., Goto, T., Kono, M., Kijima, R., Takahashi, Y.: A Virtual Anatomical Torso for Medical Education Using Free Form Image Projection. In: VSMM, pp. 678–685 (2004)
4. Kato, H., Billinghurst, M., Poupyrev, I., Imamoto, K., Tachibana, K.: Virtual Object Manipulation on a Table-Top AR Environment. In: Proc. of the IEEE and ACM International Symposium on Augmented Reality 2000, pp. 111–119 (2000)
5. Klein, G., Murray, D.: Parallel Tracking and Mapping for Small AR Workspaces. In: Proc. of the International Symposium on Mixed and Augmented Reality (2007)
6. Takahashi, T., Namatame, M., Kusunoki, F., Terano, T.: Big Fat Wand: A Pointing Device for Open Space Edutainment. In: Nijholt, A., Reidsma, D., Hondorp, H. (eds.) INTETAIN 2009. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 9, pp. 240–245. Springer, Heidelberg (2009)
7. Takahashi, T., Namatame, M., Kusunoki, F., Ono, I., Terano, T.: A Handy Laser Show System for Open Space Entertainment. In: Natkin, S., Dupire, J. (eds.) ICEC 2009. LNCS, vol. 5709, pp. 311–312. Springer, Heidelberg (2009)
8. Qijie, G., Yanyan, L.: Development of an Intelligent Laser Show System - A Novel Application of Mixed Reality Technology. In: 2005 ASEAN Virtual Instrumentation Applications Contest (2005)
9. http://www.laserfx.com/
Disaster Information Collecting/Providing Service for Local Residents Yuichi Takahashi1, Daiji Kobayashi2, and Sakae Yamamoto1 1
Department of Management Science, Tokyo University of Science, 1-3 Kagurazaka Shinjuku Tokyo Japan {yt,sakae}@hci.ms.kagu.tus.ac.jp 2 Faculty of Photonics Science Department of Global System Design, Chitose Institute of Science and Technology, 758-65 Bibi Chitose Hokkaido Japan [email protected]
Abstract. It has been pointed out that when people lack the information needed in the event of a disaster, such as a major earthquake, this can lead to social chaos, including unwanted rumors and outrages, or can disrupt rescue and relief activities [1, 2]. In Japan, by law and in principle, self-help or mutual assistance is required immediately after a disaster, and local residents must make judgments about their actions on their own. Although disaster information systems are gradually being organized at the municipal level, actual emergency evacuation areas and essential information for local citizens are still not sufficiently ready for provision at this stage [3]. In this study, we established and evaluated a service infrastructure with an autonomous wireless network, aiming to provide services that collect and deliver the disaster information required by local residents. Keywords: earthquake, disaster victims, distributed autonomous system, wireless network.
1 Introduction

In the event of a disaster, such as a major earthquake, information provision is effective in preventing chaos at the scene. Therefore, timely and accurate information collection and delivery services are essential: they allow prompt rescue and relief activities and appropriate information delivery to local residents. Thus, it is urgent to establish a system that enables these services. The systems proposed so far rely on Internet or mobile phone connections or on ad-hoc wireless LAN networks [4, 5]. We call such systems communication channel dependent systems; they require communication channels or establish channels between clients and servers via an ad-hoc network. From the perspective of an information service, systems that accumulate information in PDAs and send it via an ad-hoc network once a communication channel is established are also regarded as communication channel dependent systems. There are two issues of concern regarding such systems: 1) the system is not available until a communication channel is established, and 2) as users access the server to gain information, the intense access may lower server performance or cause
communication channel congestion. In this study, we propose an approach to resolve these issues. As it is difficult to generalize the situations of earthquake disasters, we set the following assumptions: (a) a strong earthquake of approximately magnitude 7 strikes a residential area; (b) all lifelines, including electricity and communication channels, stop functioning; (c) lines for land phones and mobile phones are congested and not working; and (d) the proposed system (hereafter called "the system") can be placed in advance.

2 Method

To resolve the above issues, in this study we placed the servers that store information close to the users. In this way, the system can run without an established communication channel, and the service can be provided continuously even over a narrow bandwidth. Since disaster information system users, such as local residents, shelter authorities, rescue and relief crews, and municipal employees, are diverse and geographically dispersed, multiple servers are required to keep servers close to users. Each server must hold the information and stay synchronized with the others. The servers must also run independently, communicate with each other, and dynamically detect one another in case some are damaged in a disaster. The service infrastructure consists of many small sub-systems, each of which independently provides information collection and delivery services. These sub-systems can also work together autonomously to exchange information with other sub-systems. By appropriately allocating sub-systems within a region, regional information is shared continuously, which can solve the issue of information shortages in a disaster. We adopted general consumer hardware products that can work for approximately 72 hours on batteries. Though each hardware product itself is not robust, the products are all independent, so even if some of them are damaged, they do not affect the others. Also, since sub-systems can be added dynamically, damaged ones can easily be replaced to immediately restore the entire service.
2 Method In order to resolve the above issues, we placed servers, which store information, closed to users in this study. By doing this, the system can run without a communication channel established, and the service can be continuously provided even with a narrow bandwidth. Since disaster information system users, such as local residents, shelter authorities, rescue and relief crews, and municipal employees are diverse and geographically dispersed, multiple servers are required to meet the condition in which servers must be placed close to users. Each server must hold information and be synchronized with each other. Also, they must independently run, communicate with each other, and dynamically detect others in case some are damaged in a disaster. The service infrastructure consists of many small sub-systems. Each of these systems independently provides information collection and delivery services. Also, these sub-systems can autonomously work together with other sub-systems to exchange information. By appropriately allocating sub-systems within a region, regional information is continuously shared, which can solve the issue of information shortages in a disaster. We adopted general consumer hardware products that are supposed to work approximately 72 hours with batteries. Though each hardware product itself is not robust, they are all independent so even if some of them are damaged, they do not affect others. Also, as it is allowed to dynamically add subsystems, damaged ones can be easily replaced to immediately recover the entire service. The network configuration of this system is illustrated in the Fig. 1. The system is formed by a group of small servers (hereafter called nodes) with server abilities and dynamic communication functions. Each node can function as a web server and allows clients (e.g. PCs and PDA) to connect to register or browse information. Nodes also detect others within the communication range to synchronize information. With these functions, the system can still work as an integrated unit even when part of the hardware is damaged in a disaster. Fig. 2 illustrates the hardware configuration of the nodes. We selected only devices that can function approximately 72 hours with either dry-cell or rechargeable batteries. In addition, dedicated PCs can be equipped with server functions for information entries. This means that users can initiate the registration of information even if no communication channels are established.
Fig. 1. Overall image of the system: each node searches for other sub-systems and then communicates with them in order to exchange the information they hold
Fig. 2. Elements of the nodes
3 Evaluation Experiments

We conducted a field test and a software simulation as evaluation experiments. In the field test, we developed a prototype and tested its operational capabilities by connecting the network between evacuation areas. The software simulation was performed to check how information is delivered when the number of nodes is multiplied to the number actually needed.

3.1 Field Testing

Apparatus and Materials. The experiment was conducted with the notebook computers and the access point with antenna shown in Fig. 3, laid out as shown in Fig. 4. The interval for executing the node-discovery and communication process was set to 60 seconds.

• Node A: IBM ThinkPad X61, Windows Vista Ultimate (32-bit), Japanese Edition
• Node B: IBM ThinkPad X61s, Windows 7 Ultimate (64-bit), English Edition
• Node C: IBM ThinkPad X40, Windows XP Professional (32-bit), English Edition
• Node D: Access point with antenna only
Fig. 3. Apparatus of field testing
Fig. 4. The layout of the nodes (Nodes A, B, C, and D)
Procedure. First, nodes A and B were installed so that they could not communicate with each other, and ten data items were entered into each node. Next, node C was installed, and the nodes started communicating in order to synchronize the information they held. After the synchronization, ten more data items were entered into each node. The nodes logged the registration timestamp and the reception timestamp of each data item. These trials were repeated three times.

3.2 Software Simulation

Apparatus and Materials. The experiment was conducted on a notebook computer (IBM ThinkPad X61s, Windows Vista Ultimate (64-bit), Japanese Edition, Intel Core 2 Duo CPU L7500, 4 GB RAM), and the simulator was written in Java (JDK 1.6.0_11). The simulator generated the defined node objects, entered the defined data, and made them
communicate with each other. Each node object recorded the time at which it had received all of the data. The parameters (the times required to search, connect, and transfer data) were taken from actual measurements.

Procedure. First, the node objects and data were defined in comma-separated-value text files, and the parameters were defined in a Java properties file. The simulator was then executed ten times.
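A toy version of such a simulator core is sketched below in Java (the real simulator's structure, topology, and measured parameters are not reproduced here; the line topology, the 60-second tick, and one-hop-per-tick propagation are our assumptions for illustration):

// Discrete-time sketch: a datum registered at node 0 spreads hop by hop
// between adjacent nodes at each synchronization tick; we record when each
// node first receives it.
public class SyncSimulator {
    public static void main(String[] args) {
        int n = 10;               // nodes arranged in a line; node 0 is the source
        int tickSeconds = 60;     // assumed synchronization interval
        boolean[] has = new boolean[n];
        int[] firstArrival = new int[n];
        has[0] = true;

        for (int t = tickSeconds, remaining = n - 1; remaining > 0; t += tickSeconds) {
            boolean[] next = has.clone();
            for (int i = 0; i + 1 < n; i++) { // adjacent nodes exchange data
                if (has[i] && !next[i + 1]) { next[i + 1] = true; firstArrival[i + 1] = t; remaining--; }
                if (has[i + 1] && !next[i]) { next[i] = true; firstArrival[i] = t; remaining--; }
            }
            has = next;
        }
        for (int i = 0; i < n; i++)
            System.out.printf("node %d received the datum after %d s%n", i, firstArrival[i]);
    }
}

Replacing the line topology with the planned node layout and the fixed tick with the measured search/connect/transfer times would give the kind of distribution-time estimates reported below.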
4 Results and Discussion

4.1 Field Testing

First, nodes A and B accepted user input without any external network connection, which shows that the sub-systems possess autonomous controllability. All of the earlier data were synchronized after node C was installed: node C was dynamically detected by nodes A and B, which then communicated with it. This shows that the system possesses autonomous coordinability. These two features are essential to our system. Table 1 shows the average time required to synchronize the latter 10 data items. Logically, synchronization takes at most 240 seconds: with 3 nodes in the network, node C receives the data within 120 seconds and then forwards them within another 120 seconds, the exact time depending on the delay between the input timing and the invocation timing of the synchronization process. The measured results were nearly half of this maximum, so the information synchronization periods are reasonable (a short derivation of the bound follows Table 1).

Table 1. The average time required to transfer 10 data items
                1st 10 data [sec]    next 10 data [sec]
Node A->C             96.3                  90.7
Node C->B            102.4                  95.9
Node A->B            138.2                 125.3
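The 240-second bound quoted above can be derived as follows (our own reading of the argument, using the 60-second invocation interval T from Section 3.1: on each hop, a datum may wait up to one interval before the next synchronization invocation and up to one further interval for the transfer round):

\[
  t_{\mathrm{hop}} \le 2T = 120\ \mathrm{s}, \qquad
  t_{A \to B} \le 2\, t_{\mathrm{hop}} = 240\ \mathrm{s}.
\]

With uniformly random input timing, the expected delay is roughly half the worst case, which matches the measured averages of about 90–140 s in Table 1.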
4.2 Software Simulation

Fig. 5 shows the time required to distribute one hundred data items from the center node. It indicates that performance is good for distant nodes; however, performance for the nodes near the center node is unsatisfactory. The cause is that communication was serialized: the center node could not communicate with the others when the number of its nearby nodes was three or more. This point of the system should be improved. Nevertheless, our target city requires two hundred nodes for full coverage, and the result indicates that information would be shared across the city within thirty minutes.
Fig. 5. The time required to distribute one hundred data items
5 Conclusion

We developed an information system for disaster victims as a distributed autonomous system using a wireless network, and evaluated it. We confirmed two important features of our system. First, autonomous controllability increases the availability of information collection in the early stage of a disaster. Second, autonomous coordinability helps the gradual recovery of the whole system. The data transfer (synchronization) times were reasonable for using our system in our target city. The results of this study will be of service in constructing disaster information systems for residents.

5.1 Future Work

More comprehensive and multifactorial field testing is required. To actually install and use our system, a more complex geographical layout and more nodes are needed; thus, we should evaluate the system under conditions close to real ones. Investigation of the information needed at the time of a disaster is also required, since it differs from the information gathered autonomously. Furthermore, the user interfaces for gathering and providing the information should be considered.
References

1. Osamu, H., et al.: Disaster Information and Social Psychology. Hokuju Shuppan, 177 (2004)
2. Osamu, H.: Hanshin-Awaji (Kobe) Earthquake Investigation Report in 1995-1. Institute of Socio-Information and Communication Studies, the University of Tokyo (1996)
3. Sakae, Y.: The Providing Disaster Information Services in Ubiquitous Days. Journal of the Society of Instrument and Control Engineers 47(2), 125–131 (2008)
4. Nobuo, F., et al.: Intercommunications System "AnSHIn-system" and Mobile Disaster Information Unit "AnSHIn-Kun". AIJ J. Technol. Des. (12), 227–232 (2001)
5. Yusuke, T., et al.: A Wireless Mesh Network Testbed in Rural Mountain Areas. In: The Second ACM International Workshop on Wireless Network Testbeds, Experimental Evaluation and Characterization, pp. 91–92 (2007)
6. The Headquarters for Earthquake Research Promotion, http://www.jishin.go.jp
7. Cabinet Office, Government of Japan, http://www.bousai.go.jp
Comfortable Design of Task-Related Information Displayed Using Optical See-Through Head-Mounted Display Kazuhiro Tanuma1, Tomohiro Sato2, Makoto Nomura2, and Miwa Nakanishi1 1
Keio University, Fac. of Science & Technology, Dept. of Administration Engineering Hiyoshi 3-14-1, Kohoku, Yokohama 223-8522, Japan [email protected], [email protected] 2 Brother Industries, Ltd., 3-8, Momozono-tyo, Mizuho, Nagoya 467-0855, Japan [email protected], [email protected]
Abstract. Optical see-through head-mounted displays (OSDs) enable users to view digital images overlaid on the real world. Their most promising application is as media that display instruction manuals in industrial fields. This study elucidates the requirements for comfortable design depending on the complexity of workers' sight, focusing in particular on the OSD's perfect see-through feature. Our goal is to provide design guidelines for task-related information displayed using OSDs. Based on experimental results, requirements for the comfortable design of the elements of task-related information provided by OSDs are summarized, and we suggest how the content should be designed when OSDs are put to practical use. Overall examination revealed that users who repeatedly shifted their gaze between the OSD and the real object felt more comfortable when their eyes were subjected to less variation in brightness. Accordingly, we suggest that the design elements of the information items should be chosen to control the brightness difference between the information displayed on the OSD and the real sight. Keywords: Optical see-through head-mounted display, Information design, Task-related information.
1 Introduction

An optical see-through head-mounted display (OSD) is a new type of display that enables users to view digital images overlaid on the real world. It has improved greatly in terms of image resolution, size, and weight, and will soon be available on the market. OSDs can be utilized in many ways; their most promising application is as media that display instruction manuals in industrial fields. Most recent sophisticated industrial machinery involves a fixed display that gives workers task-related information such as the present operation status (Fig. 1). If such information were presented in front of workers' eyes using OSDs instead of fixed displays, workers could be expected to refer to it easily and work more efficiently and comfortably. In fact, our
previous research found that when workers performed wiring tasks by referring to an OSD manual, human error decreased remarkably and task efficiency increased by more than 15% compared to when the task was performed using a conventional paper manual [1]. An OSD is useful because a worker wearing it can simultaneously view both the real object and the related information. However, whether it supports workers effectively and comfortably depends on its content. Considering that the information presented by an OSD is overlaid on the real world, it must be designed in a manner different from that for a normal display. Thus, this study elucidates the requirements for comfortable design depending on the complexity of workers' sight, focusing in particular on the perfect see-through feature. Our goal is to provide design guidelines for displaying task-related information using an OSD.

Fig. 1. Example of an industrial machine with a fixed display (side display and operation space)
2 Method

In this study, assuming that workers operate real objects and refer to task-related information whenever necessary, we carried out an experiment in which participants referred to information displayed on an OSD when triggered by a simulated real-sight image displayed on a large monitor. We determined what should be examined in this experiment as follows. First, although the patterns of real sight are innumerable in practice, we classified them into the five patterns shown in Table 1, based on the intended application of this study to industrial fields. Second, based on a pilot survey of real images presented on fixed displays of the latest industrial machinery, we chose the six items shown in Fig. 2 as the design elements composing task-related information. We aimed to clarify how these design elements should be set and combined, and to organize the findings into a design guideline for displaying task-related information using an OSD.

2.1 Experimental Process

We prepared different values for each of the six design elements, as shown in Table 2. In order to provide recommendations for each design element, we adopted a three-step approach.

Table 1. The real sight patterns
Fig. 2. Six design elements composing task-related information

Table 2. Six design elements composing task-related information
2.2 Participants Twelve students participated in process 1, ten students participated in process 2, and eleven students participated in process 3 of the experiment. All of them had normal vision. We obtained the informed consent of the individuals who agreed to participate in this experiment.
2.3 Experimental Environment and Apparatus

The participants wore the OSD (prototype, Brother Industries, Ltd.) (Fig. 3) on their non-dominant eye [2] and sat 750 mm away from a 42-inch wide monitor (TH42PX300, Panasonic), as shown in Fig. 4. The OSD is a type of retinal imaging display that lets users view a full-color image equivalent to approximately 16 inches at a visual distance of approximately 100 cm. Fig. 5 shows an image of the participant's field of view. In addition, a keyboard used for performing the tasks was placed in front of the participants.
Fig. 3. A participant wearing an OSD

Fig. 4. Positions of the large monitor and the participant

Fig. 5. An image of the participant's field of view
2.4 Experimental Task

The following task was given to the participants in every experimental process. The large monitor randomly displayed an alphabetic character at an arbitrary time and position. On recognizing it, the participant pushed the enter key, and an information item comprising 14 random alphabetic characters was displayed on the OSD. The participant read the information item and typed it using the keyboard. Information items with altered design elements were then presented, and the task was repeated. Even if participants mistyped, they were told to continue the task without correcting the error, so that the probability of mistyping remained essentially constant. The participants were required to perform this task correctly and quickly. The outlines of the experimental processes were as follows. • Process 1: The real sight pattern was fixed at pattern A (Black). Each participant performed the task under a total of 135 combined conditions of three background colors, five item sizes, and nine item positions. • Process 2: Each participant performed the task under a total of 210 combined conditions of five real sight patterns, three background colors, and 14 item colors. • Process 3: To complete typing all the information items that the OSD displayed at each of nine positions, each participant performed the task under a total of 60 combined conditions of five real sight patterns, four item allocations, and three item color codes. In the allocation conditions SG (single), HL (horizontal), and VT (vertical), the participants advanced to the next information item by pushing a key.
The participants evaluated task comfort on a visual analog scale from 0 to 100 after each task.

2.5 Data

First, the participants' typing performance was recorded automatically, and task accuracy was determined by calculating the percentage of the typed character sequence that matched the given character sequence. Second, based on the scores the participants provided after each task, the comfort level was defined as the percentage of scores exceeding 50; both measures are formalized below.
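Expressed as formulas (our own formalization of the definitions above):

\[
  \text{task accuracy} = \frac{\#\{\text{correctly typed characters}\}}{\#\{\text{given characters}\}} \times 100\,\%,
  \qquad
  \text{comfort level} = \frac{\#\{\text{ratings} > 50\}}{\#\{\text{ratings}\}} \times 100\,\%.
\]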
3 Results

We analyzed and examined the data for each process.

3.1 Item Size and Item Position

We found no interaction between item size and item position in the data for either task accuracy or comfort level. First, we focused on how differences in item size affected task accuracy and comfort level. Task accuracy was around 60% for every item size. Taking into consideration that the task accuracy data include simple mistyping, this result indicates that the smallest information item (0°32'25'' * 0°16'12'') could be read as correctly as the largest (1°30'03'' * 0°45'01''). On the other hand, the comfort level increased with item size (Fig. 6). From these results, we set the recommended range as any item size above the minimum at which the comfort level exceeds 80%, i.e., larger than 1°01'14'' * 0°30'37''; users can easily recognize items of this size on OSDs. Next, we focused on how differences in item position affected task accuracy and comfort level when the item size was within the recommended range (over 1°01'14'' * 0°30'37''). In this experiment, because the right eye was the dominant eye of all participants, they all wore the OSD on their left eye. Task accuracy was more than 60% at every item position. As described above, because the task accuracy data include a constant percentage of mistyping, this result indicates that information items at any position could be read correctly as long as the item met the above size requirement. On the other hand, we did see a difference in the comfort level (Fig. 7). This result demonstrates that an information item presented on the participant's nose side could be recognized more comfortably than one presented on the ear side. Moreover, it also shows that upper-side items were rather difficult to recognize compared with lower-side items. These results indicate that the more important or frequently checked information should be positioned in the order of middle, lower-middle, nose-side, upper-middle, and ear-side; we adopt this as the position requirement for information items presented on the OSD.
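For intuition about these angular sizes, the physical extent on the OSD's virtual image plane can be worked out from the standard visual-angle relation (the viewing distance of roughly 100 cm comes from Section 2.3; the millimetre figures are our own arithmetic):

\[
  s = 2d \tan\!\left(\frac{\theta}{2}\right), \qquad d \approx 1000\ \mathrm{mm},
\]
\[
  \theta = 1^{\circ}01'14'' \approx 1.0206^{\circ} \Rightarrow s \approx 17.8\ \mathrm{mm}, \qquad
  \theta = 0^{\circ}30'37'' \approx 0.5103^{\circ} \Rightarrow s \approx 8.9\ \mathrm{mm}.
\]

Thus, the recommended minimum item corresponds to a patch of roughly 17.8 mm by 8.9 mm on the virtual image plane.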
3.2 Background and Item Colors The colors of the information items and the background were examined after fixing the item size within the recommended range. We focused on how differences in the background color of the information items affected the task accuracy and the comfort level. Fig. 8a shows the task accuracy, and Fig. 8b shows the comfort level in each of the real sight patterns. The task accuracy was highest when the background type was BB (Black/Black) for three of five real sight patterns. Moreover, the comfort level was higher when the background type was BB in every real sight pattern. Accordingly, the BB background type was recommended as the background color for the information items displayed by the OSD in any real sight pattern.
Fig. 6. The comfort level of each item size
Fig. 7. The comfort level of each item position
Next, we focused on how each item color affected task accuracy and comfort level. Fig. 9 shows all the item colors tested in this experiment, ordered by comfort level for each real sight pattern. First, the color RGB = (0, 255, 0) was evaluated highest for three of the five real sight patterns. Conversely, the colors RGB = (0, 0, 128), (128, 0, 128), and (129, 0, 126) were evaluated low for every real sight pattern. The ranking by task accuracy was similar to that by comfort level. Paying attention to the brightness level and the absolute luminance level of each item color (Fig. 10), we observed that item colors far from the background color in the plane composed of these two dimensions tended to increase the comfort level. Second, when the real sight pattern was A (Black), the comfort level was under 50% for only one color. However, in the other real sight patterns, the comfort level was under 50% for more than one color, particularly when the real sight pattern was E (Full-color mosaic). These results suggest that colors with high brightness and luminance, such as RGB = (0, 255, 0), should be used with high priority as item colors, and that colors such as RGB = (0, 0, 128), (128, 0, 128), and (129, 0, 126) should be avoided, particularly when the luminance of the real sight is rather high.
3.3 Item Allocation and Color Code

The allocation of the information items and their color code were examined after fixing the item size and the background color at the recommended conditions. Moreover, the information items were displayed using nine colors for each of the real sight patterns (see Fig. 9).
Fig. 8a. The task accuracy of each background in each real sight pattern
Fig. 8b. The comfort level of each background in each real sight pattern
First, we focused on how each allocation pattern of the information items affected task accuracy and comfort level. Fig. 11a shows the task accuracy and Fig. 11b shows the comfort level. Task accuracy was highest for every real sight pattern with allocation type AL (All), in which all information items were presented at the same time and it was unnecessary to call up the next item. Moreover, in the comparatively complex real sight patterns D (Monochrome mosaic) and E (Full-color mosaic), task accuracy tended to be higher with allocation types HL (Horizontal) and VT (Vertical), in which the information items were presented piece by piece. On the other hand, we did not find any clear pattern in the comfort level, except that it was lowest with allocation type SG (Single), in which the information items were presented serially and it was necessary to call up each next item. From the above, we conclude that it is primarily desirable to present the information items at the same time as much as possible and to display every item using the recommended size. However, if the real sight is so complex that it cannot be regarded as a simple black or white pattern, it is also effective to present the information items piece by piece.
Fig. 9. All item colors tested in this experiment, ordered by comfort level for each real sight pattern
Fig. 10. Brightness and absolute luminance of each item color
Next, we focused on how differences in the color code used to distinguish the information items affected task accuracy and comfort level. There was no significant tendency in task accuracy for any real sight pattern. On the other hand, we did find a difference in the comfort level (Fig. 12) between the real sight patterns. When every information item was presented in a different color, the comfort level was low in the four real sight patterns other than pattern A (Black). Moreover, unifying the color of the information items was generally preferred when the real sight pattern was comparatively simple, whereas distinguishing the information items using three colors was preferred when the real sight pattern was comparatively complex. From these results, although it is not a problem to distinguish the information items by many colors as long as the real sight can be regarded as simple black, it is recommended that the color of the information items be unified
using one of the recommended colors when the real sight cannot be regarded as simple black. In particular, the information items should be distinguished using three colors when the real sight is so complex that it cannot be regarded as monochrome.
Fig. 11a. The task accuracy of each allocation type in each real sight pattern
Fig. 11b. The comfort level of each allocation type in each real sight pattern.
Fig. 12. The comfort level of each color code pattern in each real sight pattern
4 Requirements for Comfortable Design of Task-Related Information Displayed Using OSDs

The design guidelines for task-related information displayed using an OSD are summarized in Table 3. Further, from an overall examination of the experimental results, we found that the participants tended to feel comfortable when the luminance their eyes received did not change much over time. Because the transparency to the real sight was high when the background of the information items was simply black, the luminance the participants' eyes received did not change much even when their viewpoint repeatedly moved between the real sight and the image displayed by the OSD, and they therefore felt more comfortable than in the other cases. This tendency is also supported by the finding that allocation type AL (All), which did not involve switching the images displayed by the OSD, was preferred, and by the finding that distinguishing the information items using no more than three colors was highly preferred. From this discussion, mitigating the change in luminance received by the users' eyes can be suggested as a basic requirement, in addition to the recommendations for each design element summarized in Table 3.

Table 3. Design guidelines for displaying task-related information using an OSD
Real sight pattern:        A (Black) | B (White) | C (Monochrome mosaic) | D (Monochrome+2 mosaic) | E (Full-color mosaic)
Item size:                 Over 1°01'14'' * 0°30'37'' (all patterns)
Item position:             More important or often checked information should be positioned in the order of middle, lower-middle, nose-side, upper-middle, and ear-side (all patterns)
Background (Whole/Part):   BB (Black/Black) (all patterns)
Item color (R, G, B):      A: Luminance > 0.04, Brightness > 35; B-E: Luminance > 0.16, Brightness > 70
Item allocation:           A, B: AL (All), without often switching images; C-E: HL (Horizontal) or VT (Vertical) as well as AL (All)
Color code:                3 colors or less
5 Conclusion

In this study, considering the remarkable hardware-side progress of OSDs and their near-future practicality, we organized software-side knowledge to support their effective introduction and utilization. In particular, we assumed application to industrial fields and focused on how the information content displayed by OSDs should be designed so that workers can use it comfortably. The results of the three-process experiment indicate that workers can work more comfortably not merely by using an OSD but by receiving well-designed content from it. Guidelines for determining the design of each element were summarized based on the data.
The OSD is a type of display that is still developing, and there is large potential for its application. Naturally, the information content will differ depending on how and where the OSD is used. Thus, to allow the content design to diversify, we provided the guidelines from macroscopic viewpoints, without detailed restrictions for each design element. In the future, additional research will be necessary so that packages covering both hardware and software enable workers to use this new technology positively and pleasantly.
References

1. Nakanishi, M., Okada, Y.: Practicability of Using Active Guidance with Retinal Scanning Display in Sequential Operations. In: Proceedings of the IEA (International Ergonomics Association) 2006 Triennial Congress, Maastricht, Netherlands, on CD-ROM (2006)
2. Nakanishi, M., Ozeki, M., Akasaka, T., Okada, Y.: What Conditions are Required to Effectively Use Augmented Reality for Manuals in Actual Work. Journal of Multimedia (JMM) 3(3), 34–43 (2008)
Usability Issues in Introducing Capacitive Interaction into Mobile Navigation Shuang Xu1 and Keith Bradburn2 1 Lexmark International, Inc. 740 West Circle Road, Lexington, KY 40550, USA [email protected] 2 Hyundai America Technical Center, Inc. 6800 Geddes Road, Superior Township, MI 48198, USA [email protected]
Abstract. Capacitive sensing technology has become a promising solution for alleviating the hardware constraints of mobile interaction methods. However, little is known about users' perception and satisfaction regarding the integration of capacitive touch interaction with conventional inputs on mobile devices. This study examined users' performance when using a traditional 4-way navigation control enhanced with capacitive touch interaction. Findings from this investigation indicate that the additional capacitive interaction mode does not necessarily improve users' performance or perception of mobile navigation tasks. Although users welcome innovative interaction techniques supported by the traditional cell phone keypad, it is crucial that the touch-based interaction be easy to discover, easy to maneuver, and not impede users' conventional interactivity on mobile devices. Keywords: Mobile interaction, capacitive, touch-based input, navigation, target selection.
1 Introduction

Equipped with wireless technologies, mobile devices are becoming increasingly common in people's everyday lives for the convenience of accessing information and staying socially connected anytime, anywhere. However, the inherent hardware constraints of mobile devices, such as the small screen and keypad, make information input difficult for mobile users. With the introduction of the Apple iPhone, more and more mobile devices are now fitted with touch screens that are designed for direct finger touch input. While touch is a compelling input modality for mobile interaction, there are three fundamental problems with direct finger touch input. Occlusion happens when the selected target is smaller than the size of the finger contact area, which further prevents users from receiving visual feedback or confirmation. This problem is more pronounced for one-handed operation because the thumb pivots around the joint and can hide half of a mobile screen. Accuracy is also a problem commonly encountered during touch screen interactions. Parhi et al. reported that 9.2mm is the minimum size for targets to
be accurately accessible with the thumb [1]. Some mobile devices, such as the iPhone, rely on a limited set of large buttons at the price of a reduced number of interactive targets. This approach, therefore, is not always appropriate for mobile applications. Last but not least, target accessibility becomes problematic: the borders of the touch screen are difficult to reach with the thumb because the morphology of the hand constrains thumb movements [2]. Various design solutions and interaction techniques have been proposed to address the above usability issues. For example, capacitive sensing technology can be used to enhance conventional keypad input by adding a touch-sensitive layer on top of the buttons. In this way, it is possible to capture users' finger touch input on the keypad without occluding the visual presentation on the mobile screen or compromising the graphic target size for accuracy. Hinckley et al. developed and demonstrated "touch-sensing user interfaces" such as the TouchTrackball, the Scrolling TouchMouse, and the On-demand Interface [3, 4]. These input devices use unobtrusive touch sensors to detect users' touch input without requiring pressure or mechanical actuation of a switch. Clarkson et al. added a layer of piezo-resistive force sensors to support continuous measurement of the force exerted on each key [5]. This enables applications such as smoothly zooming into an image in proportion to the force on a button, but the sensors cannot distinguish pressure received in a pocket from the touch of fingers. Rekimoto et al. presented a series of touch-sensor-enhanced input devices such as SmartSkin [6], ThumbSense [7], SmartPad [8], and PreSense [9]. The authors discussed various application ideas, such as providing preview information before execution, enabling text input with a touch-sensitive keypad, and recognizing finger motions or gestures as input commands in addition to conventional keypad inputs. However, no user evaluations of these prototypes have been reported, and no working version is available on a real mobile device. Little is known from users' perspectives. For example, can a user discover the different interaction layers embedded in the same key? Is it acceptable that a user must learn and remember pre-defined gestures in order to enter information with the touch sensors? Will an additional interaction mode hinder a user's current activities on the conventional keypad? Many questions remain unanswered regarding mobile users' performance and perception of their experiences with capacitive sensing input methods.
2 Proposed Design and Research Questions

This study examines the efficiency and accuracy of users' performance, and their perceived cognitive workload, on mobile navigation tasks with capacitive touch control. With a traditional 4-way navigation key set, users can only control the scrolling speed via either press-and-release (also known as "single click") or press-and-hold (also known as "hard press"). An additional input mode was introduced in this study by implementing a capacitive sensor on the navigation key set, so that continuous physical contact (referred to as "Light Touch" hereafter) triggered automatic scrolling at a different speed. Implementing this added dimension of navigation control comes with substantial usability risks, which are investigated in this study.
We compared users' performance on mobile navigation tasks using demo unit A, which supports traditional 4-way navigation keys, to their performance on demo unit B, which adds capacitive touch-sensitive navigation keys. Two methods were available
on unit A: (1) "Single Click," which moves the highlight one line per click; and (2) "Hard Press," which automatically scrolls the highlight at a speed of 24 lines per second. Besides these two methods, unit B also supported (3) "Light Touch," which scrolls the highlight automatically at a speed of 6 lines per second (the mapping from key state to scrolling behavior is sketched after the research questions below). The main research question is broken down into the following questions:
Q1: [Discoverability] Is Light Touch on unit B easily discoverable for participants who are not aware of the existence of this additional interaction mode?
Q2: [Efficiency] Will using Light Touch improve the efficiency of participants' navigation performance on unit B?
Q3: [Accuracy] Will using Light Touch improve the accuracy of participants' navigation performance on unit B?
Q4: [Perception] Will using Light Touch reduce participants' perceived cognitive workload during their navigation tasks on unit B?
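As an illustration of how these three modes relate, the following minimal sketch maps key state to scrolling behavior on unit B; the function names and the polling model are our own assumptions, since the paper does not describe the device firmware.

# Sketch of the three navigation modes on unit B (hypothetical API;
# the study does not publish the actual implementation).

LIGHT_TOUCH_RATE = 6.0   # lines per second, automatic scrolling
HARD_PRESS_RATE = 24.0   # lines per second, automatic scrolling

def scroll_rate(capacitive_contact: bool, key_depressed: bool) -> float:
    """Return the automatic scrolling rate implied by the key state."""
    if key_depressed:
        return HARD_PRESS_RATE   # "Hard Press": key mechanically actuated
    if capacitive_contact:
        return LIGHT_TOUCH_RATE  # "Light Touch": finger resting on the key
    return 0.0                   # no contact: the highlight does not move

def single_click(highlight_line: int) -> int:
    """A press-and-release ("Single Click") moves the highlight one line."""
    return highlight_line + 1

# On unit A there is no capacitive layer, so capacitive_contact is always False.
print(scroll_rate(True, False), scroll_rate(True, True), single_click(0))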
3 Methodology

Two cell phone units with conventional 4-way navigation keys were used in this study to collect participants' performance on icon navigation tasks and list scrolling tasks. Pressing the four direction keys moves the highlight up, down, left, or right, respectively.
Twelve people (6 female), between 25 and 64 years old, were recruited to participate in this one-hour study. All of them were right-handed, with no visual disabilities. All participants owned a cell phone and were familiar with the use of navigation keys on mobile devices. None of them had used capacitive sensing technology previously. They were compensated at the end of the study for their participation.
This study used a within-subject 2 (demo unit) x 2 (task session) experimental design, with task orders counterbalanced across participants. On each demo unit, the participant completed 2 task sessions, with 10 tasks in each session. One task session focused on icon navigation, while the other focused on list-item scrolling. The three dependent variables were defined and measured as follows: (1) Task completion time was defined as the time elapsed from the starting point to clicking on the target. (2) Error rate was calculated as the total number of overall errors divided by the total number of tasks. Either a Path Error (if the participant navigated off the pre-defined path) or a Pre-selection Error (if the participant selected an incorrect target) was counted as an Overall Error (1 or 0 per task); a short sketch of this scoring is given after Figure 1. (3) Subjective rating of task experience was collected after each session using the NASA Task Load Index (TLX) questionnaire, on a scale of 1 to 21, measuring participants' perceived mental demand, physical demand, temporal demand, performance, effort, and frustration.
Before starting the tasks on each cell phone unit, the participant was asked to play with the device for about two minutes to get familiar with its navigation controls. Participants were not made aware of the existence of the Light Touch mode on
unit B, nor did they know whether there was any difference between the navigation controls on the two units. After two minutes, a 7-point Likert questionnaire was given to the participant, with questions such as "How many different methods were available to navigate the highlight on this device?", and, for each identified navigation method, "How easy was it to discover this method?" and "How easy was it to use or control this method?" The experimenter then explained and demonstrated the available interaction methods before the participant started the tasks.
For each of the 10 icon navigation tasks, the participant was asked to use the navigation keys to move the highlight (up, down, left, and right) to the target icon following a pre-defined path on a 3 x 4 menu icon layout, as illustrated in Figure 1a. Participants were encouraged to accomplish each task as quickly and accurately as possible, using any interaction methods available on the navigation keys. For each of the 10 list scrolling tasks, the participant was asked to use the navigation keys to move the highlight (up and down) to the target located in a list of 300 items, as illustrated in Figure 1b. Similarly, participants were told to accomplish the scrolling task as quickly and accurately as possible, using any interaction method available on the navigation keys. The scrollbar was not displayed, to ensure that all participants would concentrate on the list items without anticipating the total number of items in the list.
Fig. 1a. Icon Navigation Task
Fig. 1b. List Scrolling Task
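Before turning to the results, the error scoring defined in Section 3 can be summarized in a short sketch; the per-task log structure below is illustrative, not the study's actual data format.

# Sketch of the error scoring from Section 3 (Python; field names invented).

tasks = [
    {"path_error": False, "preselection_error": False},
    {"path_error": True,  "preselection_error": False},
    {"path_error": False, "preselection_error": True},
]

def overall_error(task: dict) -> int:
    # A task counts one Overall Error (1 or 0) if either error type occurred.
    return int(task["path_error"] or task["preselection_error"])

error_rate = sum(overall_error(t) for t in tasks) / len(tasks)
print(f"Error rate: {error_rate:.2f}")  # 2 of the 3 example tasks -> 0.67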
4 Results and Discussion

One-way Analysis of Variance (ANOVA) was used to analyze the quantitative data collected in this study. Findings are discussed in the following sections.

4.1 Discoverability

Discoverability of each navigation method was measured by participants' answers and subjective ratings on the discoverability questionnaire. Within the two minutes of practice, all 12 participants successfully discovered the Single Click mode on both units. Only one participant failed to discover the Light Touch mode on unit B, and another participant was not able to identify the Hard Press mode on unit A.
However, 7 of the 12 participants did not discover Hard Press on unit B. This result was also confirmed by participants' subjective ratings in response to the questions "How easy was it to discover this navigation method?" and "How easy was it to use this navigation method?", as shown in Figure 2.
Fig. 2. Perceived Ease of Discovery and Ease of Use (pre-task)
While all participants considered Single Click easy to discover on both units (F(1, 22) = 1.00, p = 0.328), they found it slightly easier to use Single Click with the traditional navigation key set on unit A than on unit B (F(1, 22) = 8.04, p = 0.010). The participants who discovered Light Touch rated it acceptably easy to find (M = 2.75) and to use (M = 3.58) on the 7-point Likert scale, with 1 being the easiest and 7 the most difficult. Significant differences were found in participants' perceptions of whether Hard Press was easy to discover on unit A versus unit B (F(1, 14) = 19.04, p = 0.001), and whether it was easy to use (F(1, 14) = 55.69, p < 0.001).
In this study, no delay was implemented in Light Touch on cell phone unit B. This means that as soon as a user touched, but did not press down, one of the four direction keys, she would see the highlight on the display move automatically in the corresponding direction. This design encouraged easy discovery of the Light Touch mode by first-time users. However, our findings indicate two problems. First, some users may have considered Light Touch the "next level" of navigation mode and thus failed to discover the Hard Press mode on unit B. Second, because Light Touch was enabled immediately, the participant could not initiate a Hard Press on unit B without going through the Light Touch mode. This second problem resulted in significantly lower ratings of the perceived ease of use of Hard Press on unit B. It further degraded participants' task performance on unit B, as discussed in the following sections.
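For readers reproducing this style of analysis, the sketch below runs the kind of one-way ANOVA used throughout this section; the ratings are invented for illustration, since the raw study data are not published.

# One-way ANOVA on 7-point ratings from two groups of 12 participants
# (illustrative data only).
from scipy.stats import f_oneway

unit_a_ratings = [1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1]
unit_b_ratings = [2, 3, 2, 2, 3, 2, 3, 2, 2, 3, 2, 2]

f_stat, p_value = f_oneway(unit_a_ratings, unit_b_ratings)
# Two groups of 12 give 1 and 22 degrees of freedom, matching the
# F(1, 22) statistics reported in the text.
print(f"F(1, 22) = {f_stat:.2f}, p = {p_value:.3f}")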
4.2 Efficiency

For the icon navigation and list scrolling tasks, participants were asked to accomplish each pre-defined task as quickly and accurately as possible. Participants were also encouraged to achieve the goal with any interaction method available on each unit, in order to mimic real mobile navigation usage. For example, a participant could use Single Click exclusively on unit B for an icon navigation task on the 3 x 4 grid layout if Light Touch was not perceived as an optimal interaction mode for the task. Therefore, participants' performance of icon navigation using only Single Click on unit B was collected as a precaution. A similar measurement was not taken for the list scrolling tasks, because participants would typically use a combination of interaction methods, with Light Touch and/or Hard Press used to improve speed and Single Click used to improve accuracy. Forcing them to use Single Click for scrolling tasks on a list of 300 items would have increased task duration and fatigue, which could further result in poorer performance and perception.
Fig. 3. Task Performance (Completion Time)
Figure 3 shows participants' completion times for the icon navigation and list scrolling tasks, respectively. The results indicate that while participants did use Light Touch on unit B, the icon navigation tasks were accomplished faster using the traditional Single Click and Hard Press on unit A (F(2, 357) = 17.92, p < 0.001). Similarly, participants spent more time on the list scrolling tasks using the three interaction methods available on unit B (F(1, 238) = 18.22, p < 0.001).
Another interesting finding emerged from a close look at participants' scrolling movement time as a function of scrolling distance. The 10 pre-defined list scrolling tasks ranged from 5-line to 275-line scrolling and were randomized before assignment. The power-law fits plotted in Figure 4 confirm that, on average, scrolling task completion was slower on unit B than on unit A (unit A: Time = 0.7402 × distance^0.5252, R² = 0.9327; unit B: Time = 0.9803 × distance^0.5558, R² = 0.7456).
Meanwhile, with the additional Light Touch mode, a higher variance of task completion time was observed (unit A: M = 9.01 s, SD = 7.080; unit B: M = 12.25 s, SD = 4.336), especially for scrolling tasks over longer distances.
Fig. 4. Movement Time over Scrolling Distance
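Fits of this form can be reproduced with a simple log-log regression, as in the sketch below; the distance and time values here are made up for illustration and are not the study's data.

# Fit Time = a * distance^b by linear regression in log-log space.
import numpy as np

distance = np.array([5, 25, 60, 120, 180, 275])       # lines scrolled
time = np.array([1.8, 4.1, 6.5, 9.4, 11.2, 14.0])     # seconds (invented)

b, log_a = np.polyfit(np.log(distance), np.log(time), 1)
a = np.exp(log_a)

pred = a * distance ** b
r2 = 1 - np.sum((time - pred) ** 2) / np.sum((time - time.mean()) ** 2)
print(f"Time = {a:.4f} * distance^{b:.4f}, R^2 = {r2:.4f}")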
It is not surprising that participants showed more consistent performance using the traditional Single Click and Hard Press on unit A. Furthermore, participants' feedback indicated that the limitations of the current interaction design (as discussed in Section 4.1, Discoverability) made it difficult to initiate or control Hard Press on unit B. Since Hard Press provided the fastest navigation speed in this study, the constrained accessibility of Hard Press on unit B ultimately degraded the efficiency of task performance.

4.3 Accuracy

The accuracy of participants' task performance was measured by the error rate, defined as the total number of overall errors divided by the total number of tasks. As mentioned previously, a Path Error occurred if the participant navigated the highlight off the pre-defined path; a Pre-selection Error occurred if the participant selected an incorrect target. Either a Path Error or a Pre-selection Error was counted as an Overall Error. Each task was scored as either having (1) or not having (0) a Path, Pre-selection, and Overall error.
For the icon navigation tasks, participants made significantly more errors using Single Click, Light Touch, and Hard Press on unit B, in terms of Path Errors (F(2, 357) = 4.43, p = 0.013), Pre-selection Errors (F(2, 357) = 3.96, p = 0.020), and Overall Errors (F(2, 357) = 7.00, p = 0.001). For the list scrolling tasks, more path errors than pre-selection errors were anticipated, since over-shooting or under-shooting happens more often than
selecting the wrong target. For list scrolling accuracy, unit A similarly outperformed unit B in Path Errors (F(1, 238) = 6.83, p = 0.010), Pre-selection Errors (F(1, 238) = 8.34, p = 0.004), and Overall Errors (F(1, 238) = 7.56, p = 0.006).
Fig. 5. Task Performance (Error Rates)
Many participants commented that Light Touch was extremely sensitive on unit B. In this study, we did not implement any delay in the responsiveness of Light Touch. As a result, the high sensitivity caused unintentional movement of the highlight. While some over-shooting and under-shooting happened because the participant was trying to achieve the fastest possible scrolling speed, many path errors, as well as some pre-selection errors, happened when the participant was clicking on the center select key and barely touched one of the adjacent direction keys. Since Light Touch responded immediately upon finger touch on unit B, it moved the current highlight and caused a wrong selection or a deviation from the correct navigation path. Participants' frustration at not being able to control the movement of the highlight accurately on unit B had a great impact on their subjective ratings on the task load questionnaire, as discussed below.

4.4 Subjective Ratings of Task Load

Participants' subjective ratings of their task experience were collected using the NASA-TLX questionnaire [10]. NASA-TLX is an assessment tool for subjective workload assessments of operators working with various human-machine systems [11]. It is considered the strongest tool available for reporting perceived workload and has been widely used in research areas such as aircraft cockpits, mobile interaction, and communication systems [12, 13, 14].
As summarized in Figure 6, participants rated their overall task experience significantly more favorably on unit A (F(1, 22) = 31.06, p < 0.001). They were particularly dissatisfied with the required mental demand (F(1, 22) = 9.70, p = 0.005),
required physical demand (F(1, 22) = 29.72, p < 0.001), self-evaluated performance (F(1, 22) = 13.06, p = 0.002), required effort (F(1, 22) = 24.61, p < 0.001), and perceived frustration (F(1, 22) = 51.06, p < 0.001) of their experience on unit B.
Fig. 6. Subjective Ratings of Task Load (NASA-TLX)
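To make the aggregation concrete, the sketch below combines the six TLX subscales with the unweighted ("raw TLX") average; the ratings are invented, and it is an assumption on our part that the unweighted variant was used in this study.

# Unweighted ("raw TLX") aggregation of the six subscales (1-21 scale).
subscales = {
    "mental_demand": 14, "physical_demand": 12, "temporal_demand": 9,
    "performance": 11, "effort": 13, "frustration": 16,  # invented ratings
}

overall_workload = sum(subscales.values()) / len(subscales)
print(f"Overall TLX: {overall_workload:.1f} of 21")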
Although the interaction constraints of Hard Press and the high sensitivity of Light Touch on demo unit B degraded performance, participants were not against the idea of enabling additional interaction methods on the conventional mobile keypad. Representative comments include: "It'd be cool to have this feature (Light Touch) on my cell phone, if the control was as dependable as my old 4-way navigation." "I expect that it takes trial and error to pick up a new way of interaction, but it was very frustrating that I had to get to Hard Press via Light Touch on this phone (unit B)." Therefore, one should not be discouraged by the results of this study from integrating capacitive touch-based interaction with conventional mobile interaction techniques. However, how to appropriately design and introduce this additional layer of touch interaction without impeding conventional interaction activities remains a critical mobile usability issue.
5 Conclusion and Future Work

In conclusion, capacitive interaction techniques provide a promising means of alleviating mobile interaction constraints. Results from this investigation suggest the following design guidelines to ensure that mobile users feel in control when using integrated capacitive and conventional interaction methods: (1) An optimized or customizable delay in responsiveness should be provided so that users can easily discover the capacitive input mode without being hindered from immediate access to press-and-hold; (2) Various methods should be provided to encourage users to discover available capacitive interaction gestures; (3) Visual, auditory, or tactile feedback should be provided to confirm users' capacitive inputs; and
(4) Designers should ensure that additional capacitive interaction gestures are easy to maneuver and do not interfere with users' routine activities using conventional interaction methods.
This study did not explore the range of design solutions for integrating capacitive touch interaction with current mobile navigation methods. In the near future, we would like to investigate the responsiveness of capacitive touch interaction: fine-tuning the delay may allow the user to easily discover this touch-based input mode without it interfering with physical key presses. Other research directions include (1) understanding the proper use of capacitive touch-based interaction from users' perspectives, such as identifying the optimal speed of automation and preferred finger motions in different mobile applications; and (2) enabling personal customization, such as adjustable sensitivity levels and delay times to optimize interaction performance for individual users.
References
1. Parhi, P., Karlson, A., Bederson, B.: Target Size Study for One-Handed Thumb Use on Small Touchscreen Devices. In: MobileHCI 2006, Espoo, Finland, pp. 203–210. ACM Press, New York (2006)
2. Roudaut, A., Huot, S., Lecolinet, E.: TapTap and MagStick: Improving One-Hand Target Acquisition on Small Touch-Screens. In: AVI 2008, Napoli, Italy, pp. 146–153 (2008)
3. Hinckley, K., Sinclair, M.: Touch-Sensing Input Devices. In: CHI 1999, Pittsburgh, Pennsylvania, pp. 223–230. ACM Press, New York (1999)
4. Hinckley, K., Pierce, J., Sinclair, M., Horvitz, E.: Sensing Techniques for Mobile Interaction. In: UIST 2000, San Diego, California, pp. 91–100. ACM Press, New York (2000)
5. Clarkson, E., Patel, S., Pierce, J., Abowd, G.: Exploring Continuous Pressure Input for Mobile Phones. GVU Technical Report (2006), http://hdl.handle.net/1853/13138
6. Rekimoto, J.: SmartSkin: An Infrastructure for Freehand Manipulation on Interactive Surfaces. In: CHI 2002, Minneapolis, Minnesota, pp. 113–120. ACM Press, New York (2002)
7. Rekimoto, J.: ThumbSense: Automatic Input Mode Sensing for Touchpad-Based Interactions. In: CHI 2003, Fort Lauderdale, Florida, pp. 852–853. ACM Press, New York (2003)
8. Rekimoto, J., Ishizawa, T., Oba, H.: SmartPad: A Finger-Sensing Keypad for Mobile Interaction. In: CHI 2003, Fort Lauderdale, Florida, pp. 850–851. ACM Press, New York (2003)
9. Rekimoto, J., Ishizawa, T., Schwesig, C., Oba, H.: PreSense: Interaction Techniques for Finger Sensing Input Devices. In: UIST 2003, Vancouver, British Columbia, Canada, pp. 203–212. ACM Press, New York (2003)
10. NASA TLX subjective workload assessment scale, http://iac.dtic.mil/hsiac/docs/TLX-UserManual.pdf
11. Hart, S., Staveland, L.: Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload. Elsevier, North-Holland (1988)
12. Mustonen, T., Olkkonen, M., Hakkinen, J.: Examining Mobile Phone Text Legibility while Walking. In: CHI 2004, Vienna, Austria, pp. 1243–1246. ACM Press, New York (2004)
13. Park, S., Harada, A., Igarashi, H.: Influences of Personal Preference on Product Usability. In: CHI 2006, Montréal, Québec, Canada, pp. 87–92. ACM Press, New York (2006)
14. Öquist, G., Lundin, K.: Eye Movement Study of Reading Text on a Mobile Phone Using Paging, Scrolling, Leading, and RSVP. In: MUM 2007, Oulu, Finland, pp. 176–183. ACM Press, New York (2007)
Balance Ball Interface for Performing Arts
Tomoyuki Yamaguchi, Tsukasa Kobayashi, and Shuji Hashimoto
Department of Applied Physics, Waseda University, 3-4-1 Okubo, Shinjuku, Tokyo 169-8555, Japan
[email protected], [email protected], [email protected]
Abstract. This paper introduces a novel balance-ball-shaped interface that translates human body expression into sound. Although a variety of mobile interfaces have been introduced for musical performance, most of them are small so as not to disturb the performer's actions. The proposed interface, in contrast, is visible and large enough to act together with a human performer. All the sensors are housed in a large balance ball, which moves, rolls, and deforms according to the performer's actions, such as pushing and kicking.
Keywords: Balance ball interface, musical performance, body expression.
1 Introduction

Digital musical instruments are devices that translate human body motion into sound for musical performance. Recently, many digital musical devices have been proposed that generate sound based on human actions. In particular, the iPhone has been used as a musical device; it has become the most popular such device because it carries several sensors that can detect human actions, and many applications have been developed for music production and editing [1][2]. Beyond iPhone-based applications, a number of new music systems have been reported that build original devices based on haptic interfaces, video, and motion capture [3-11]. These systems can also create music according to human gestures and body movement. We have also proposed musical interfaces [12-15]. GrapMIDI [13] and TwinkleBall [14, 15] are ball-shaped interfaces designed for easy handling. Sound is generated from grasping force and human motion: the performer controls the note and tempo by varying grasping force and motion, respectively. TwinkleBall can also generate sound from body expression and supports free-style performance.
Most existing interfaces are small so as not to disturb the performer's actions. In contrast, we present a large ball-shaped musical interface in this paper. The proposed interface detects the user's actions through luminance intensity, acceleration, and angular velocity, and is large enough to interact with a human performer. It moves, rolls, and deforms according to the performer's actions, such as pushing and kicking. Our goal is to achieve a new form of body expression for creating sound in large stage performances.
2 Structure of the Proposed Balance Ball Interface

Figs. 1 and 2 show a prototype of the proposed balance ball interface and an overview of the proposed system, respectively. The main body of the interface consists of a rubber ball, a Bluetooth wireless module, a 3-axis accelerometer, a 3-axis gyro sensor, a photo diode, high-intensity LEDs, a PIC (Peripheral Interface Controller), and two 9.0 V batteries. All electronic devices are enclosed in the rubber ball, which is translucent and hollow. The Bluetooth wireless module, photo sensor, 3-axis accelerometer, 3-axis gyro sensor, PIC, and batteries are placed on an electronic circuit board inside the core, which is fixed to the rubber ball using rubber tubes. The LEDs are placed in the interior of the rubber ball. The specifications of the rubber ball are as follows: diameter 900 mm, weight 2.7 kg, material silicone rubber.
As shown in Figure 2, the signals output from the photo sensor, accelerometer, and gyro sensor are digitized and sent to an external computer via the internal Bluetooth wireless module. On the computer, the velocity, note, and tempo are calculated from the received values for controlling the MIDI sounds. The computer's internal speaker, or a connected external speaker, outputs the sounds according to the calculated velocity, note, and tempo.
Fig. 1. Prototype of the balance ball interface
Fig. 2. Overview of the proposed system
3 Performing Styles Depending on Human Body Expression

One of the features of TwinkleBall is its hand-held size. Although throwing and bounding performances are possible, its typical performance style is grasping [15]. The performing style of the proposed interface is different, because it is much bigger than hand-held size. Figure 3 illustrates various styles of performance using the proposed interface. Fig. 3(a) shows the rolling performance: users can perform music through hand or arm actions. Fig. 3(b) shows the lifting performance, in which the user lifts the interface with both hands. The typical performance style of the proposed interface is the riding performance, shown in Fig. 3(c): the user performs music by riding on the interface, as if exercising on a balance ball.
Fig. 3. Styles of musical performance: (a) rolling, (b) lifting, (c) riding
In the system, MIDI sound is used as the output. In this implementation, the velocity is changed by the pushing force, which is sensed via luminance intensity; the tempo is changed by the angular velocity, as detected by the gyro sensor; and the note is changed by the gradient (tilt) direction, as measured from the outputs of the gyro sensor and accelerometer. These settings were determined through preliminary experiments.
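The mapping just described might be implemented along the following lines; the sensor ranges, scaling constants, and musical scale are illustrative assumptions, as the paper specifies only which sensor drives which MIDI parameter.

# Sketch of the sensor-to-MIDI mapping (all constants are assumptions).
import math

def to_midi_velocity(luminance: float, lum_max: float = 1023.0) -> int:
    # Pushing deforms the ball and changes the light reaching the photo
    # diode; map the reading onto the MIDI velocity range 0-127.
    return max(0, min(127, int(127 * luminance / lum_max)))

def to_tempo_bpm(angular_rate_dps: float, rate_max: float = 500.0) -> float:
    # Faster rolling (gyro magnitude, deg/s) gives a faster tempo.
    return 60.0 + 120.0 * min(abs(angular_rate_dps), rate_max) / rate_max

def to_note(ax: float, ay: float,
            scale=(60, 62, 64, 65, 67, 69, 71, 72)) -> int:
    # Quantize the tilt direction, estimated from the accelerometer's
    # horizontal components, onto a scale of MIDI note numbers.
    heading = math.atan2(ay, ax)  # tilt direction, -pi..pi
    index = int((heading + math.pi) / (2 * math.pi) * len(scale)) % len(scale)
    return scale[index]

print(to_midi_velocity(512), to_tempo_bpm(250.0), to_note(0.3, -0.2))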
4 System Configuration

To confirm the validity of the proposed balance ball interface, we performed a system confirmation, verifying four performance patterns. The typical performance styles using the proposed interface are the rolling, lifting, and riding performances. Figure 4 shows images captured during the rolling, lifting, and riding performances.
Fig. 4. Results of performances: (a) rolling, (b) lifting, (c) riding, (d) diving
In Fig. 4(a), the velocity is almost constant, but the tempo and note are changed by human motion: the performer controls the tempo by changing the rolling speed with the arms, and the note by the rotation angle of the interface. In Fig. 4(b), the velocity does not change during the performance, but the tempo and note are changed by human action: the performer controls the tempo by changing the speed of the interface using both arms, while the note changes with the gradient of the interface. In Fig. 4(c), the tempo does not change, but the velocity and note change depending on the riding motion and direction, respectively. Moreover, Fig. 4(d) shows the diving performance; because the silicone rubber makes the interface robust, this performance style is possible, and the output sound is the same as in the riding performance.
During the system confirmation, the proposed interface was able to translate body movement into sound. The riding and diving performances, in particular, are new performance styles that are not possible with mobile interfaces. The robustness of the proposed interface is an important feature for such a human interface.
5 Conclusions

This paper introduced a novel balance ball interface. Through the system confirmation, the proposed interface was shown to express larger-scale human performance, such as rolling, lifting, riding, and diving. In the future, we plan to apply this interface to performing arts. Moreover, we will conduct experiments on the co-creation of performance and sound by multiple users.
Acknowledgments. This work was supported in part by the "Global Robot Academia," Grant-in-Aid for Global COE Program by the Ministry of Education, Culture, Sports, Science and Technology, Japan; the CREST project "Foundation of technology supporting the creation of digital media contents" of JST; and a Waseda University Grant for Special Research Projects (B) No. 2010B-164.
References 1. Oh, J., Herrera, J., Bryan, N.J., Dahl, L., Wang, G.: Evolving The Mobile Phone Orchestra. In: Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), pp. 82–87 (2010) 2. Essl, G., Müller, A.: Designing Mobile Musical Instruments and Environments with urMus. In: Proceedings of the International Conference on New Interfaces for Musical Expression, NIME (2010) 3. Tanaka, A.: Musical technical issue in using interactive instrument technology with application to the BioMuse. In: Proc. ICMC, pp. 124–126 (1993) 4. Siegel, W., Jacobsen, J.: Composing for the Digital Dance Interface. In: Proc. of ICMC, pp. 276–277 (1999) 5. Paradiso, J.: The Brain Opera Technology: New instruments and gestural sensors for musical interaction and performance. Journal of New Music Research 28, 130–149 (1999)
6. Camurri, A., Hashimoto, S., Ricchetti, M., Ricci, A., Suzuki, K., Trocca, R., Volpe, G.: EyesWeb - Toward Gesture and Affect Recognition in Dance/Music Interactive Systems. Computer Music Journal 24(1), 57–69 (2000)
7. Kaltenbrunner, M., Geiger, G., Jordà, S.: Dynamic Patches for Live Musical Performance. In: Proc. of the 4th Conference on New Interfaces for Musical Expression (NIME 2004), pp. 19–22 (2004)
8. Morita, H., Hashimoto, S., et al.: A Computer Music System that Follows a Human Conductor. IEEE Computer 24(7), 44–53 (1991)
9. Paradiso, J., Hsiao, K.Y., Hu, E.: Interactive Music for Instrumented Dancing Shoes. In: Proc. of ICMC, pp. 453–456 (1999)
10. Lyons, M., Tetsutani, N.: Facing the Music: A Facial Action Controlled Musical Interface. In: Proc. of CHI, pp. 309–310 (2001)
11. Weinberg, G., Aimi, R., Jennings, K.: The Beatbug Network - A Rhythmic System for Interdependent Group Collaboration. In: Proc. of NIME 2002, pp. 106–111 (2002)
12. Sawada, H., Onoe, N., Hashimoto, S.: Sounds in Hands - A Sound Modifier Using Datagloves and Twiddle Interface. In: Proc. of ICMC 1997, pp. 309–312 (1997)
13. Hashimoto, S., Sawada, H.: A Grasping Device to Sense Hand Gesture for Expressive Sound Generation. J. of New Music Research 34(1), 115–123 (2005)
14. Yamaguchi, T., Hashimoto, S.: Grasping Interface with Photo Sensor for a Musical Instrument. In: Proc. of the 13th International Conference on Human-Computer Interaction (HCI International 2009), USA, pp. 542–547 (July 2009)
15. Yamaguchi, T., Kobayashi, T., Ariga, A., Hashimoto, S.: TwinkleBall: A Wireless Musical Interface for Embodied Sound Media. In: Proc. of New Interfaces for Musical Expression (NIME 2010), Australia, pp. 116–119 (June 2010)
Study on Accessibility of Urgent Message Transmission Service in a Disaster
Shunichi Yonemura¹ and Kazuo Kamata²
¹ NTT Cyber Solutions Laboratories
² Department of Information Science, Utsunomiya University
Abstract. In this paper, a layer model of the urgent message transmission service is proposed, and the semantic-level communication channels that must be considered when assessing the accessibility of an urgent message are discussed.
Keywords: accessibility, urgent message, sign language.
1 Introduction

When natural disasters such as earthquakes or floods occur, an urgent message transmission service is indispensable for minimizing damage. Urgent messages are needed to transmit the situation, evacuation plans, etc., correctly to residents. They can guide residents to suitable evacuation sites and contribute to safe movement patterns.
To transmit urgent messages to residents, we need more than just a communications network; we must have assurance that the messages will be received by the residents and understood, even in an emergency. The communication channels for urgent message transmission therefore have a layered structure consisting of a media layer, with its physical channels of communication, and a semantic layer, which provides logical channels of communication. The channels in the media layer are formed by physical links and so can be affected by physical disasters, such as the loss of a wireless relay tower. The semantic layer channel, on the other hand, is not established until the meaning of the message is correctly understood by the recipient; this is influenced by the recipient's physical and mental condition, knowledge, and so on. A foreigner unfamiliar with the official language of a disaster area, for example, will not understand the meaning of an urgent text-based message. In this case, although the physical channel is reliable, the logical channel is not.
Previous research on urgent message transmission services has focused on the media layer and ignored the semantic layer. Comprehensive research on the accessibility of urgent messages, which must consider the recipient's physical and mental condition, is impossible to find. This paper proposes a layered model of the urgent message transmission service and discusses the semantic layer communication essential to assuring the accessibility of urgent messages. Urgent message transmission to deaf people is taken as an example of semantic layer communication, and its transfer characteristics have been
investigated experimentally. The use of visual displays is introduced as an example of a semantic layer channel. An experiment showed that differences in the display format of a message influence the transfer characteristics of the semantic layer channel.
2 Layer Model of Urgent Message Transmission

The layer model of urgent message transmission is shown in Figure 1. From left to right, the sender transfers disaster information to the receivers. When a disaster occurs, one or more local autonomous organizations collect damage information and transmit it to disaster victims through various physical channels, i.e., the media layer. Each channel consists of terminal units and a communication line. The person in charge at a local autonomous organization (the sender of a report) transmits, for example, an evacuation order based on the collected disaster condition reports to the victims. The order, input via the organization's terminal, is relayed to a server near the disaster area through an existing communications network (cable or radio), and then transmitted to the victims' terminals. The receiving terminal displays the transmitted signal as text, sounds, images, or various combinations thereof. Materializing this media layer channel is the first step. In an actual disaster, various factors (for example, cut communication lines, congestion, and server or terminal failure) can impede the establishment of media layer channels, so path redundancy is necessary. Note that here we ignore communication channels that permit message distortion, such as oral messages and handwritten memos; sociological factors, such as the dissemination of false rumors, would need to be considered for such person-to-person communication.
The second layer in Figure 1 is the semantic layer. In this layer, the message sender's intention must be correctly understood by the (human) receiver. An evacuation order has several grades: for example, a recommendation to prepare for evacuation, a direction to begin evacuation, and a direction to evacuate immediately. However, the correspondence between the grade of an evacuation order and the words representing that grade is not fully understood by residents in general, because evacuation messages contain words that are unfamiliar to residents. In such cases, the transmission path of the semantic layer channel is not established. Likewise, when the linguistic backgrounds of the message sender and receiver differ, the semantic layer channel is not always formed appropriately. When transmitting a Japanese message to deaf people who use sign language in everyday life, a suitable semantic layer channel may not be established, because Japanese is not a native language for deaf people. Examination of the semantic layer channel in consideration of deaf people's linguistic skills is therefore needed.
3 Universal Design of Urgent Messages

To illuminate the semantic layer channel, this chapter discusses the universal design underlying urgent message transmission for deaf people. For most deaf people, the first language is signing; Japanese is a second language. Since mental workload is high in an emergency, an urgent message must be easy to understand. In public
spaces, the method created for deaf people displays sign language and Japanese text simultaneously. This is far from optimal, and many deaf people note that understanding messages in public spaces is difficult. Since sign language is premised on the conversationalists being together, its effectiveness in one-way information presentation (where information flows one way, from sender to receiver) is low. Furthermore, since Japanese is a second language for deaf people, it is difficult for them to understand a message reliably and quickly in an emergency. Moreover, since the two types of visual media compete for the receiver's attention, overall information transfer is poor. For these reasons, when deaf people read an urgent message, they adopt a strategy of comparing the sign language against the Japanese text. The universal design of information for deaf people therefore requires a presentation format that makes this comparison of sign language and Japanese easier.
We proposed listed sign language as the optimal presentation method for passing urgent messages to deaf people. Listed sign language is a presentation format that displays itemized Japanese text and sign language fragments side by side. A comparison of sentence-based message presentation (the conventional method) with listed sign language showed that the latter was ranked significantly more successful by deaf people. That is, the success of the semantic layer channel strongly depends on the information presentation format.
4 Urgent Message Reading Experiment with Listed Sign Language

Figure 2 shows an example of a message presented in the listed sign language format. The written expressions are displayed in table form on the left side of the screen, with the corresponding sign language movie on the right side. Differences in sign language skill and variation in Japanese sentence comprehension can be overcome by the listed sign language technique.
An experiment was conducted with 25 subjects ranging in age from 20 to 60 (average age 34, SD = 11.7). Based on gaze measurements, the subjects' reading strategies fell into three types (a small classification sketch follows the questionnaire results below):
- JSL dominant: the JSL movie received the most attention.
- JT dominant: the Japanese text received the most attention.
- Neutral: neither form predominated.
The recorded data are shown in Figure 3. It was found that subjects compared JSL with JT. This comparison improved the comprehension of expressions that were hard to understand when expressed in only one form. For example, a numerical value or proper noun expressed by finger spelling in sign language is more intelligible to read as text. In interviews after the experiment, the subjects noted that while listed sign language initially felt incongruous, urgent messages in the listed sign language format were easy to understand.
The subjects' impressions of the three methods were collected at the end of the experiment using a questionnaire with the following five items:
(1) Accuracy: I think that the message was transmitted correctly.
(2) Quickness: I think that the message was transmitted quickly.
(3) Easy understanding: I think that it was easy to understand the message.
(4) Conformity with emergency: I think that this display method is suitable for urgent messages like those used this time.
(5) Sense of security: I think that this display method provides a sense of security.
Figure 4 shows the results of an analysis of the responses. The long-sentence sign language results (black bars) are topped by those of listed sign language (white bars). Listed sign language messages were well accepted by deaf people.
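The three-way strategy classification above can be made concrete with a small sketch; the 60% dwell-time threshold is a hypothetical choice, not a parameter reported in the paper.

# Classify a subject's reading strategy from gaze dwell times (seconds).
def reading_strategy(jsl_dwell: float, jt_dwell: float,
                     dominance: float = 0.6) -> str:
    # Label by where the gaze mostly rested: the sign language movie
    # (JSL), the Japanese text (JT), or neither.
    total = jsl_dwell + jt_dwell
    if total == 0:
        return "Neutral"
    share = jsl_dwell / total
    if share >= dominance:
        return "JSL dominant"
    if share <= 1 - dominance:
        return "JT dominant"
    return "Neutral"

print(reading_strategy(42.0, 18.0))  # -> "JSL dominant" (70% on the movie)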
5 Discussion

In listed sign language, since the sign language is fragmented to match the itemized Japanese statements, deaf people feel some sense of incongruity. Interviews with deaf people showed that although long-sentence presentation was preferred in ordinary situations, listed sign language was preferred in an emergency. In an emergency, a message needs to be understood reliably and quickly; therefore, a message needs to be read by comparing the sign language with the Japanese. That is, the universal design of an urgent message must take into consideration that comparison of sign language and Japanese will certainly occur. Thus, in an urgent message presentation service, it is important to consider the establishment of the semantic layer transmission channel in addition to the media layer.
Is ACARS and FANS-1A Just Another Data Link to the Controller?
Vernol Battiste¹, Joel Lachter¹, Sarah V. Ligda¹, Jimmy H. Nguyen³, L. Paige Bacon³, Robert W. Koteskey¹, and Walter W. Johnson²
¹ San Jose State University, Flight Deck Display Research Lab, USA
² NASA Ames Research Center, Flight Deck Display Research Lab, USA
³ California State University, Long Beach, USA
{vernol.battiste-1,joel.lachter,sarah.v.ligda,walter.johnson}@nasa.gov, {mrjimnguyen,paigebacon86,rob.koteskey}@gmail.com
Abstract. This report investigates issues surrounding TBO procedures for the current aircraft fleet when pilots request deviations around weather. Air and ground procedures were developed to stringently follow TBO principles using three types of communication: Voice, ACARS, and FANS. ACARS and FANS are both text-based communication systems, but FANS allows uplinked flight plans to be loaded automatically into the FMS, while ACARS does not. From the controller's perspective, however, all flight plan modifications were completed using a trial planner and delivered via voice or data comm, making FANS and ACARS similar. The controller processed pilots' requests and approved or modified them based on traffic management constraints. In this context, the rate of non-conformance across all conditions was higher than anticipated, with off-path errors in excess of 20%. Controllers did not differentiate between the ACARS and FANS data comm, and showed mixed preferences for Voice versus data comm (ACARS and FANS).
1 Introduction

The U.S. government, through the Joint Planning and Development Office (JPDO), has developed a plan to overhaul the nation's air traffic control system, called the Next Generation Air Transportation System (NextGen) [1, 2]. The FAA's NextGen Implementation Plan calls for a shift to automation-supported trajectory based operations (TBO), in which all aircraft remain on their ground-approved trajectories [3]. The advantage of TBO is that the trajectories can be vetted by conflict detection and resolution systems to aid the controller in maintaining safe and efficient aircraft separation and scheduling [4]. Successful implementation of TBO will require sound concepts of operations and new tools [5]. Often neglected, however, is an essential element of the infrastructure behind these concepts: an efficient and reliable air/ground communication system.
The present research addresses the issue of air/ground communication in the context of TBO. In today's clearance-based operations, controllers frequently issue vectors of indeterminate duration via voice in order to resolve conflicts or allow aircraft to avoid weather. In future TBO operations, however, the controller will need
to handle flight path modifications by updating the full trajectory in a "Host" computer, and then ensure that the aircraft conforms to the new trajectory. These new tasks could have an adverse effect on controller workload; therefore, tools that aid the controller in creating and updating trajectories are being developed (e.g., trial planners and conflict detection and auto-resolution tools) [6, 7]. These tools should aid the controller in creating trajectories; however, the new trajectories must still be delivered to, and flown by, the aircraft.
TBO poses a challenge for pilot-controller communications because significantly more information must be conveyed to all parties involved: ATC, the flight deck, and automation (the ground host computer). Compounding the issue is the fact that, at present, only about 20% of transport aircraft are equipped with the Future Air Navigation System (FANS). FANS includes data link communications (data comm) integrated with the aircraft's flight management system (FMS). The controller station is equipped with tools that support the creation and exchange of trajectory modifications in a digital form loadable into the Host and the aircraft's FMS. FANS has traditionally been used only in oceanic ATM environments; it is currently being tested on a limited basis by the European ATM Programme at the Maastricht Upper Area Control Centre, and was tested on a limited basis at Miami Center [8]. Thus, for early implementation of TBO, we developed procedures for non-FANS-equipped aircraft. One option is to take advantage of the much more prevalent Aircraft Communications Addressing and Reporting System (ACARS). This system is predominantly used for communication between an Airline Operations Center (AOC) and its aircraft, and is widely available on today's transport aircraft. The main drawback of ACARS messages is that they are text-only and, therefore, not loadable into the FMS.
2 Current Study

The present research had three goals and one critical assumption, all related to near-term implementation of TBO. The first goal was to develop flight deck TBO procedures for aircraft with different communication equipage levels, and to evaluate their effectiveness and usability in negotiations with the controller. The second goal was to evaluate and compare controller performance and workload in handling TBO operations across different percentages of flight deck equipage. The assumption was that advanced ground-side tools supporting TBO will become available well before NAS-wide changes to flight deck equipage, as indicated in the FAA's NextGen Implementation Plan. Three communication equipage levels were examined: 1) FANS, providing data comm integrated with the FMS; 2) ACARS, providing non-integrated data comm; and 3) voice only.
Although several studies have assessed communications with different FANS and ACARS protocols, none have examined off-nominal situations [9, 10, 11], and none were aimed at full TBO operations and the associated problem of keeping all aircraft on trajectory during operations in the presence of convective weather. Thus, the third goal of this study was to examine controller/pilot data comm communication during TBO with convective weather, where there is a greater likelihood that pilot goals (e.g., weather avoidance) and controller goals (e.g., efficient traffic management) will
diverge, resulting in negotiation. These negotiations may be more difficult because most transport flight decks use onboard weather radar, while the ground uses NEXRAD, resulting in very different views of the weather.
In this simulation, the controller was responsible for managing approximately 1-1.2X traffic in a high altitude sector in Kansas City Center's airspace. The sector is on the eastern boundary, adjacent to Indianapolis Center. The primary sector traffic was modeled after normal daytime traffic flows, along with UPS arrivals into UPS's hub at Louisville International Airport. The major impediment to normal sector traffic flow was significant convective weather near the eastern border of the sector. Controllers were responsible for aircraft separation and traffic management, while pilots were responsible for weather avoidance. Additionally, controllers were responsible for maintaining trajectory-based operations, if at all possible. To accomplish this they were instructed to minimize vectoring of aircraft by creating flight plan modifications with a trial planner and delivering the modified trajectories via voice or data comm. Although, from the controller's perspective, sending and receiving information from FANS and ACARS aircraft required the same actions, controllers were briefed on the flight deck procedures for loading and executing data comm clearances on each flight deck, and were aware of possible differences in pilot response time when using FANS versus ACARS.
3 Communication Procedure Development

Initial procedures for the flight deck and controllers were developed by the authors, one of whom is a current airline pilot and another a former air traffic controller. The procedures were then vetted by two retired controllers and two pilots (one current and one retired airline pilot). The goal of all procedures was to keep an updated flight plan trajectory in a ground-based "host" computer, and to make it possible for the aircraft to closely adhere to those flight plan trajectories. The assumptions behind, and justification for, our procedures are provided in Lachter (in press), and are too numerous to describe in this paper. In general, the procedures were designed to reduce workload and complexity on the controller's DSR, and to aid both controllers and crews in coordinating and approving flight path deviations.
4 Method

4.1 Participants

Four controllers (two per week) and sixteen commercial airline pilots (eight per week) were paid participants in this human-in-the-loop (HITL) simulation. Because the focus of this paper is on controller performance, we limit our description to the controller participants. Controllers were retired TRACON controllers with at least 19 years of service. All were trained on, and familiar with, advanced ATC tools. The controllers received a week of training on en route and arrival operations in the ZKC and ZID centers.
4.2 Controller DSR

The controller display was presented on a 32" high resolution monitor and was configured to support traffic management in ZKC 90 (see Figure 1). Two aircraft symbols and two colors were used to display the three equipage types. Dynamic NEXRAD weather was displayed near the eastern sector boundary. The controller DSR was equipped with advanced conflict detection, out to 8 minutes, and auto- and manual-resolution tools coupled with a trial planner. See Prevot [6] for a complete description of the MACS DSR.
Fig. 1. DSR display of equipage type and data comm portal: FANS, green chevron and data tag; ACARS, gray chevron, green data tag; Voice, gray aircraft symbol and yellow data tag. Note the data link pending brackets around callsign UPS589.
The controllers also had a separate touch-screen computer used to measure real-time workload and flight plan acceptability. Controllers were asked to rate their workload on a 1-5 scale once every minute throughout the trial. Following a flight plan change, they were asked to rate the quality of the flight plan. An additional paper-and-pencil workload questionnaire was administered after each trial; a post-simulation questionnaire was administered after the final trial.

4.3 General Procedures

The HITL simulation spanned two consecutive weeks, with different controllers and pilots each week. Each week began with one day devoted to training, followed by three days scheduled for data collection, and a final day for make-up trials and debriefing. Participants were briefed on both flight and ground operations near convective weather. Additionally, controllers were briefed on managing optimized profile descents with merging and spacing during arrival operations into Louisville.
4.4 Scenarios

Thirty-two 20-minute scenarios were run. Experimental controllers managed traffic in a high altitude sector (ZKC 50) west of Louisville, KY. In each scenario, two experimental flight decks (dual pilot) flew west to east through this sector, reaching a storm front on its eastern side. The experimental flight decks were arrivals headed for Louisville International Airport (SDF); their top of descent (TOD), which varied by flight level, was near the eastern edge of the sector. Pseudo-pilots managed all additional traffic to keep the sector count between 16 and 20 aircraft at all times, approximately 1 to 1.25X the normal traffic load. There were four starting conditions, defined by the location of the weather and traffic. The weather for each of these four starting conditions evolved in one of four ways, so that neither the controller nor the pilots could make assumptions about the optimal path through the weather until they had monitored the storm system's development.

4.5 Experimental Design

The experimental design consists of two fixed factors (Airspace Mixture and Aircraft Equipage) and three random factors (scenario, crew, and controller). Airspace Mixture was the proportion of equipped aircraft in the sector, set at three levels: predominantly Voice (80% Voice, 20% FANS), predominantly FANS (80% FANS, 20% Voice), and predominantly ACARS (60% ACARS, 20% Voice, and 20% FANS). These three conditions were intended to reflect three possible ways of managing the current majority of aircraft in the NAS, which are equipped with ACARS. The Airspace Mixture factor should affect the controller's workload, while the Aircraft Equipage factor affects individual experimental aircraft; see Brandt [12].

4.6 Communications Procedures

To maximize adherence to TBO objectives, communications procedures were designed to keep aircraft on the flight plans in the host. Thus flight plans were communicated as closed-loop trajectories with a specified point at which to depart from, and return to, the original flight path. Procedures were developed in which proposed trial plan amendments included a push point two minutes ahead of the aircraft, allowing time for negotiation, implementation, and possibly rejection of the proposal before any maneuver began. During training it became apparent that, for Voice aircraft, communicating this added waypoint increased controller workload disproportionately. Thus, the procedures were modified so that maneuvers were issued off the nose for Voice aircraft, and controllers amended the flight path in the host after the maneuver if the displayed symbology showed the aircraft to be off path.

4.7 Controller Procedures

Data comm messages were logged and ordered on the DSR display based on when they were received. They were also coded in the aircraft's data tag. However, the
However, the controller had discretion as to when each message was handled. To reply to a message, the controller normally selected the portal in the data tag, which showed the route for FANS aircraft, and highlighted the ACARS or Voice aircraft in the list. For Voice aircraft, controllers had to handle the request immediately, or copy or remember the request and ask the aircraft to stand by. The procedure for modifying the host flight plan using the trial planner was generally the same for all aircraft. The current path was modified by selecting the portal in the aircraft's data tag, then selecting a point on the original path and dragging that point to a location that was clear of weather and conflict-free. The path automatically snapped to a named fix if one was proximal to the desired location. The controller then uplinked the new trajectory to the aircraft if it was data comm equipped, or delivered the clearance via voice if not.
4.8 Dependent Variables
In addition to the workload and route quality ratings described earlier, there were two dependent performance variables: the miles added to the original trajectory by the path modification (path stretch), and the percent of time the aircraft was not in conformance with the host trajectory (non-conformance was defined as the aircraft being at least 1 mile off path, or off track by 15 deg from the nominal trajectory). For this paper we present only the controller data (see Brandt, in press, for the flight deck data).
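This non-conformance criterion is straightforward to operationalize. The sketch below is a minimal Python illustration, not the simulation's actual code; the per-sample input arrays (cross-track distance and track-angle error) are assumptions about how the recorded track logs might be sampled.

```python
import numpy as np

def percent_non_conforming(cross_track_nm, track_error_deg):
    """Percent of track samples off path under the paper's criterion:
    at least 1 NM of cross-track error OR at least 15 deg off the
    nominal track. Input arrays are illustrative assumptions about
    how the track logs might be sampled."""
    cross_track_nm = np.abs(np.asarray(cross_track_nm, dtype=float))
    track_error_deg = np.abs(np.asarray(track_error_deg, dtype=float))
    off_path = (cross_track_nm >= 1.0) | (track_error_deg >= 15.0)
    return 100.0 * off_path.mean()

# Example: two of six samples violate the criterion -> ~33.3%
print(percent_non_conforming([0.2, 0.4, 1.3, 0.8, 0.1, 0.5],
                             [3.0, 5.0, 2.0, 20.0, 4.0, 6.0]))
```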
5 Results and Discussion
5.1 Performance
Our initial analyses showed little difference in performance between conditions. ANOVAs showed no significant differences between flight deck equipage levels or equipage mixtures in terms of the trajectories flown, the path stretch to avoid weather, or the time off path. The lack of effects on path stretch and amendments was not too surprising, since the factors that drive flight path changes (e.g., weather, conflicting traffic, distance to top of descent) were built into the scenario and may have overwhelmed any influence of communication method. However, the absence of any effect on non-conformance was somewhat surprising, since it was expected that the FANS condition would perform best and the Voice condition worst. Instead, a surprisingly high overall level of non-conformance was found. When non-conformance was examined as a function of individual controller and Aircraft Equipage, the mean non-conformance rate was at or above 20%, cresting at 45% in one case. These were very high numbers. The controllers also differed both in overall performance and in how Aircraft Equipage affected their ability to keep aircraft on their trajectories (Fig. 2). For two controllers, Voice aircraft were off path much more often than FANS or ACARS aircraft, while for the other two, Voice aircraft were off path less often than FANS, nearly as little as ACARS.
Fig. 2. Proportion of time off path by controller
For all four controllers, ACARS aircraft were off path less frequently than either Voice or FANS aircraft. The higher non-conformance of the Voice aircraft may be explained by the requirement for controllers to create modifications "off the nose" of the Voice aircraft (immediate turns), or by controllers giving an OK to a request before actually entering it into the host. Either of these procedures introduces delays between the creation of the trajectory in the host and its implementation on the flight deck. For the ACARS aircraft, the controller was instructed to use the trial planner to create trajectories. The trial planner automatically inserted a push point located two minutes ahead of the aircraft. When the amendment was uplinked, the crew needed to enter it manually, with the instruction to simply turn the aircraft onto its first leg when the push point was reached if the amendment had not yet been fully entered. The crews reported that this procedure was cumbersome (see Brandt, in press), and it may be responsible for the 20% non-conformance. Finally, the significant amount of non-conformance with the FANS procedure may have been due to the downlinked FANS routes also being off the nose; when the controller approved them, they were subject to the same delays between creation and implementation that were present with Voice aircraft.
5.2 Workload Ratings
At the end of each trial, all participants rated their workload (1 low to 5 high) on four criteria: overall and peak workload associated with maintaining separation, and overall and peak workload associated with handling weather avoidance requests. For controllers, the four post-trial workload ratings were obtained for each level of Equipage Mixture. The resulting 12 mean workload ratings clustered in a fairly restricted range (from 3.28 to 3.85).
For the four measures, controllers' mean workload was highest in the Predominantly FANS condition and lowest in the Predominantly Voice condition, with the exception of overall workload associated with separation, for which Voice and ACARS were nearly identical. These trends were significant for "overall workload associated with separation" (F(2,6) = 13.91, p < .05) and "overall workload associated with weather avoidance requests" (F(2,6) = 5.966, p < .05). Controllers were also asked to rate their workload every minute during the trial, also on a 1 low to 5 high scale. Interestingly, controllers rated their workload higher on this online measure than they did on the post-trial questionnaire. Again taking the means for each controller for each level of Equipage Mixture, the range ran from 4.02 to 4.97, with two of the controllers rating all three conditions over 4.9. Differences between Equipage Mixtures were not significant, presumably because of the ceiling effect. Similarly, there were no differences in response times to the workload probes.
5.3 Post Simulation Ratings
After the simulation, both pilots and controllers rated how much they agreed with 15 statements (1 complete disagreement to 5 complete agreement) about each of the different communication procedures (e.g., "I felt adequately aware of what the pilots of ACARS/FANS/Voice aircraft were doing," and "Trajectory operations using solely ACARS/FANS/Voice is, in principle, a workable concept."). No ANOVAs were conducted on the controller ratings due to the very limited sample size (4), but the pattern is quite clear. Controllers generally rated the two data comm procedures identically. Two of the four rated ACARS and FANS identically on all 15 criteria; that is, they saw no difference between the two procedures. One controller gave FANS better ratings on three criteria, and the remaining controller gave ACARS a better rating on one criterion. In addition, only one of the four controllers agreed with the statement "I was very aware of whether an aircraft I was handling was integrated data comm (FANS-1A) or ACARS." Thus, it appears that our procedures were successful in allowing ACARS-equipped aircraft to be managed similarly to FANS aircraft from the controller's perspective. While ACARS and FANS appeared very similar to the controllers, Voice, naturally, was quite different. Yet the four controllers differed on whether Voice was preferable to the two data comm conditions (FANS and ACARS). Because FANS and ACARS were considered so similar, we averaged them into one "data comm" rating for the following discussion. One controller rated the data comm conditions better than Voice on eight of the 15 criteria, while rating Voice better on none. A second controller rated data comm better on six and Voice better on one. However, a third controller rated Voice better on eight and data comm better on none, and the final controller rated Voice better on one and data comm better on one. Thus, controllers varied in their relative preferences for Voice and data comm. However, it should be noted that they were relatively unfamiliar with the FANS and ACARS procedures. Thus, it is possible that any residual preference for Voice occurred because it was familiar and well practiced.
6 Conclusions
In our simulation environment, where aircraft could make requests, all equipage levels were off trajectory more than expected. Despite our emphasis in training that aircraft be kept on host trajectories, we occasionally observed that a controller appeared unconcerned if one or two flights were not conforming. This was probably due to the fact that there were no ATM penalties for allowing aircraft to be off their trajectories: there was no additional coordination with the next sector, there were no downstream schedules that needed to be met, and only losses of separation in the sector were of concern. Because no class of aircraft could easily create flight plans that contained push points, controllers could not simply approve requests but had to either create a new flight plan and send it back to the flight deck or adjust the flight plan in the host to match what the aircraft was actually flying. Either way, this additional step added significantly to controller workload, as reflected in the data. As to the question of making FANS and ACARS similar to the controller, all data from the study suggest that we did just that: two of the four controllers reported no difference between the two data comm procedures, one controller gave FANS better ratings on three criteria, and the remaining controller gave ACARS a better rating on one criterion. In addition, only one of the four controllers agreed with the statement "I was very aware of whether an aircraft I was handling was integrated data comm (FANS-1A) or ACARS." It is possible that a mixture of data comm and voice could result in more acceptable response times while accruing many of the benefits of data comm (such as reduced transmission error, the ability to transmit more complex clearances, and a reduction in voice traffic). Several pilots in our study stated during the debriefing that their concerns about data comm would be greatly ameliorated if requests were acknowledged more promptly, even if there was a delay in the actual response.
Acknowledgement. This study was supported by the NASA Airspace Program, Concepts and Development Project. We would like to thank George Lawton, Dominic Wong, Riva Canton, Tom Quinonez and John Luk for software programming support. We would particularly like to thank Dr. Kim Vu and Dr. Tom Strybel of California State University, Long Beach, as well as their students, for their aid and support in running this study.
References
1. United States Department of Transportation, Research and Innovative Technology Administration: Air Carrier Traffic Statistics (2011), http://www.bts.gov/programs/airline_information/air_carrier_traffic_statistics/airtraffic/annual/1981_present.html
2. Joint Planning and Development Office: Next Generation Air Transportation System: Concept of Operations V 3.0. Government Printing Office, Washington D.C. (2010)
3. Federal Aviation Administration: NextGen Implementation Plan. NextGen Integration and Implementation Office, Washington D.C. (2010)
4. Erzberger, H.: Transforming the NAS: The Next Generation Air Traffic Control System. In: 24th International Congress of the Aeronautical Sciences, Yokohama, Japan (2004)
5. McNally, D., Mueller, E., Thipphavong, D., Paielli, R., Cheng, J., Lee, C., Sahlman, S., Walton, J.: A Near-Term Concept for Trajectory-Based Operations with Air/Ground Data Link Communication. In: Proceedings of the 27th International Congress of the Aeronautical Sciences, Nice, France (2010)
6. Prevot, T., Callantine, T., Lee, P., Mercer, J., Battiste, V., Johnson, W., Palmer, E., Smith, N.: Co-Operative Air Traffic Management: A Technology Enabled Concept for the Next Generation Air Transportation System. In: 6th USA/Europe Air Traffic Management Research and Development Seminar, Baltimore, MD (June 2005)
7. Prevot, T.: Exploring the Many Perspectives of Distributed Air Traffic Management: The Multi Aircraft Control System MACS. In: Proc. of the Int. Conf. on Human-Computer Interaction in Aeronautics, pp. 23–25 (2002)
8. Gonda, J.C.: Miami Controller-Pilot Data Link Communications Summary and Assessment. In: 3rd USA/Europe Air Traffic Management R&D Seminar, Napoli, Italy (June 2000)
9. Morrow, D., Lee, A., Rodvold, M.: Analysis of Problems in Routine Controller-Pilot Communication. Int. J. of Av. Psych. 3, 285–302 (1993)
10. Smith, N., Lee, P., Prevot, T., Mercer, J., Palmer, E., Battiste, V., Johnson, W.: A Human-in-the-Loop Evaluation of Air-Ground Trajectory Negotiation. In: Proc. of the 4th Amer. Inst. of Aero. and Astro. Av. Tech., Integ., and Oper. Conf., Chicago, IL (2004)
11. Lozito, S., Martin, L., Dunbar, M., McGann, A., Verma, S.: The Impact of Voice, Data Link, and Mixed Air Traffic Control Environments on Flight Deck Procedures. In: Proc. of ATM 2003, the 5th USA/Eur. R&D Sem., Budapest, Hungary (2003)
12. Brandt, S., Lachter, J., Dao, Q., Battiste, V., Johnson, W.: Flight Deck Workload and Acceptability of Verbal and Digital Communication Protocols. In: Proceedings of HCI International 2011, Orlando, FL (2011)
Flight Deck Workload and Acceptability of Verbal and Digital Communication Protocols
Summer L. Brandt1, Joel Lachter1, Arik-Quang V. Dao1, Vernol Battiste1, and Walter W. Johnson2
1 San Jose State University, Moffett Field, CA 94035, USA
2 NASA Ames Research Center, Moffett Field, CA 94035, USA
{summer.l.brandt,joel.lachter,quang.v.dao,vernol.battiste-1,walter.w.johnson}@nasa.gov
Abstract. The Federal Aviation Administration hopes to convert air traffic management to Trajectory Based Operations (TBO), under which aircraft flight plans are known to computer systems that aid in scheduling and separation. However, few aircraft flying today have equipment designed to support TBO. We conducted a human-in-the-loop simulation of TBO using current fleet equipage. Three aircraft equipage levels were explored: Voice (the equipment currently used), FANS (the Future Air Navigation System datacom designed for use in TBO), and ACARS (a datacom system widely used for communication with Airline Operations Centers). FANS-uplinked flight plans can be automatically loaded into the Flight Management System, while ACARS delivers text that must be entered manually. Pilots rated various aspects of the procedures. Voice was preferred to FANS, with ACARS rated worst, apparently because of slow response times for requests made via datacom. Using a mixture of Voice and datacom may provide the benefits of both.
Keywords: Trajectory Based Operations (TBO), Datacom, NextGen, Human-in-the-Loop simulation.
1 Introduction
Air traffic is increasing beyond the capacity of the National Air Space (NAS). To fix this, the United States government developed a plan referred to as the Next Generation Air Transportation System, or NextGen [1]. One focus of NextGen is to shift workload away from air traffic control (ATC) by providing automated tools to aid controllers in scheduling and maintaining separation between aircraft. However, in order to do this, the automation must know the aircraft's future plans, not just their current speed and direction. Under current day operations, if an aircraft encounters weather or traffic, ATC will typically send it on a vector, a change in course in which only a new direction is specified. For NextGen automation to work, a new flight plan will have to be entered into a computer on the ground (typically referred to as the "host") that specifies a complete path reconnecting with the original flight path. Such procedures are part of an operational concept known as Trajectory Based Operations (TBO). TBO poses a challenge for pilot-controller communications because significantly more information must be exchanged between ATC and the flight deck.
The flight deck needs not only a direction to fly but also the locations of all subsequent waypoints until the aircraft reconnects with the original flight plan (e.g., a precise location to turn off the current flight plan and a precise location to turn back). In current day operations, updated flight plan information is conveyed by voice communications. Voice communication provides pilots with speed and flexibility in negotiations with controllers. However, because voice information is inherently ephemeral, pilots and controllers frequently request clarifications [2]. Message blocking occurs when more than one message is transmitted simultaneously on the same channel, resulting in all but one of the messages being blocked or interrupted when they reach the intended recipient. Message blocking occurs particularly when radio frequencies are congested in areas of the airspace where traffic density is highest. Further, communication errors, as well as workload, increase with the complexity of the message being transmitted [3]. The more complex TBO messages will likely increase workload, frequency congestion and communication errors. To examine solutions to these communication issues, researchers have implemented and compared digital communication protocols with voice [2]. In addition to voice, most transport category aircraft are also equipped with a datacom technology called the Aircraft Communications Addressing and Reporting System (ACARS). This system is currently used for communication between aircraft and their Airline Operations Centers (AOCs) but can be used for the transmission of text-based messages between pilots and controllers. ACARS is not prone to intelligibility errors, and it aids memory because, once transmitted, a message remains perpetually available to the human operator [3]. One of the main advantages of ACARS over voice is the ability to transmit complex messages accurately. However, with ACARS, data would need to be entered manually into the Flight Management System (FMS), making it cumbersome to use and, in some cases, making transaction times and workload comparable to voice protocols [4]. Similarly, for pilots to make requests of ATC using ACARS, they would have to type the message into the Control Display Unit (CDU), which is noticeably slower than making a voice request. Integrating datacom and the FMS could help reduce these costs. Such an integrated datacom system is the Future Air Navigation System (FANS). FANS allows flight plans in the FMS to be downlinked as requests and uplinked clearances to be loaded into the FMS. Unfortunately, relatively few planes in the NAS are currently FANS equipped, largely because procedures for FANS-equipped aircraft are used only in oceanic environments, and then only for arrivals. Datacom, whether FANS or ACARS, cannot be exploited until procedures are developed and put in place. Research into using datacom for TBO has focused on the implementation of controller clearances. In weather-impacted airspace, however, aircraft must also communicate requests to the ground. Further, previous research in TBO environments has focused on aircraft with FANS equipage, which (as noted above) is still uncommon in the NAS. Mueller [5], for example, examined the use of FANS for implementing controller clearances and found that this system was satisfactory for implementing trajectory-based clearances from controllers. In a follow-up study, Mueller and Lozito examined flight deck procedures for trajectory-based clearance negotiations [6].
They compared procedures for handling uplinked strategic trajectory clearances that varied the responsibility distribution between the pilot flying and the pilot monitoring (sharing versus non-sharing) in one simulation study, and whether to print out datacom clearances in another. All of the procedures evaluated were rated as generally suitable.
2 Current Study
The current research examines issues related to a near-term implementation of TBO in which aircraft are expected to have more or less the same equipage as today. Data were collected on both air traffic controller and pilot participants; however, here we report only the pilot data. This study specifically examined the ability of aircraft, as currently equipped, to maintain TBO while modifying trajectories during weather-impacted flights. Weather-impacted operations were selected in order to require two-way communication between pilots and controllers: pilots make requests to avoid weather and may reject controller clearances if they take them into weather; controllers issue clearances for traffic and may reject aircraft requests if they take them into traffic. We examined trajectory change requests and clearances using three types of aircraft equipage: Voice only, ACARS and FANS.
3 Method
3.1 Participants
Participants were sixteen commercial airline pilots with glass cockpit experience (eight per week) and four retired TRACON controllers (two per week), recruited from similar simulations, who were familiar with the prototype NextGen ATC tools used in this simulation. Pilots were divided into two-person crews, with the more experienced pilot serving as captain. Crews remained together for the duration of their participation.
3.2 Equipment
Four dual-pilot stations and two controller stations were operated by study participants. Confederates operated a simulation manager station, two pseudo-pilot stations that handled additional non-experimental aircraft, and two ghost controller stations that handled aircraft after they left the sector.
Dual Pilot Stations. Each dual-pilot station consisted of three monitors, on which the controls and displays necessary for operating a generic Boeing transport category aircraft were simulated. An autopilot Mode Control Panel (MCP) allowed direct manipulation of the heading, speed and altitude of the aircraft, and dual Control Display Units (CDUs) allowed manipulation of the flight plan in the FMS. Flight path and navigation information was presented on dual Cockpit Situation Displays (CSDs) and dual Primary Flight Displays (PFDs). Controls for hand flying the aircraft (e.g., a yoke) were not available. The autopilot MCP, CDUs, PFDs and alerting display were driven by MACS, the Multi-Aircraft Control System developed by the Airspace Operations Lab at NASA Ames [7]. These operated as on most Boeing transport category aircraft.
The CSD was developed by the NASA Ames Flight Deck Display Research Laboratory [8] and presented a 2-D display of navigation information, weather radar targets and TCAS-like traffic.
3.3 General Procedures
The study was run in two one-week sessions. Each week began with one day devoted to training, three days scheduled for data collection and a final day for make-up trials and debriefing. The entire simulation consisted of thirty-two 20-minute scenarios. During each trial two simulation worlds were run simultaneously, each world containing one participant controller and two participant dual-pilot flight decks. Background aircraft flown by pseudo-pilots provided a realistic traffic load of approximately 16 aircraft at any time. Once a minute, pilots used a touch-screen computer to rate their workload on a 1 to 5 scale. After each trial pilots gave overall workload and flight acceptability ratings, again using a 1 to 5 scale. A post-simulation questionnaire asked pilots to use a 5-point Likert scale to rate items related to concept acceptability, safety and procedures, and simulation realism. Pilots were provided space to add comments.
3.4 Scenarios
All experimental flight decks were arrivals into Louisville International Airport (SDF), reaching top of descent near the eastern edge of the sector. Pilots flew west to east through the sector and negotiated a storm front on the eastern side of the sector. There were four starting conditions at the beginning of the scenario, as defined by the location of the weather and traffic. The weather for each of these four starting conditions evolved in various ways, creating 16 total scenarios. Because the weather evolved differently from trial to trial, neither the controllers nor the pilots could create a path through the weather until they had watched it develop.
3.5 Experimental Design
The experimental design had three fixed factors: Airspace Mixture, Aircraft Equipage and Pilot Role (Pilot Flying vs. Pilot Monitoring). There were also three random factors: Scenario, Crew and Controller. Aircraft Equipage specified the datacom capability of the individual participant aircraft: FANS, ACARS or Voice. The Airspace Mixture factor, which specified the proportion of FANS, ACARS and Voice aircraft in the sector, was relevant only to controller workload and performance and will not be considered further in this report. Each aircrew was run in a total of 32 scenarios: 12 FANS equipped, 12 Voice equipped and 8 ACARS equipped. Each aircrew saw each of the 16 weather system/traffic patterns twice. In the trials using these repeated scenarios, the aircrew always flew a different aircraft (approaching the weather system from a different angle) and had a different Aircraft Equipage type.
3.6 Communications Procedures
Procedures had to be developed for pilots to request and implement trajectory modifications. Procedures were initially developed in which proposed flight plan amendments developed by ATC included an initial waypoint on the current route two minutes ahead of the aircraft. This waypoint was added to allow time to negotiate, implement, and possibly reject the proposal before any modification began. However, during training and initial runs it became apparent that, for Voice aircraft, communicating this added waypoint increased controller workload beyond tolerable limits. As a result, the procedures were modified so that, for Voice aircraft, pilots proposed and controllers constructed maneuvers that were off the nose; if necessary, controllers then amended the flight plan in the host to be consistent with the path the aircraft was flying (e.g., when there was a significant delay between the time the controller entered the flight plan into the host and when the flight deck implemented the initial turn). In addition, the initial maneuver for downlinked requests from FANS-equipped aircraft was also always off the nose of the aircraft. This was the result of a limitation in the FANS automation: the proposed ground automation pushes an initial maneuver point along, two minutes in front of ownship, during the development of a trajectory modification, whereas the current FANS implementation does not. Manually inserting an initial waypoint that does not move in this manner encounters a number of problems (e.g., the aircraft is not equipped to calculate the location of such a point).
Flight Deck Procedures
Voice Aircraft. Procedures on the flight deck for Voice aircraft were similar to those followed today, except that trajectory-based requests and clearances were required. Voice clearances typically took the form of adding a waypoint (e.g., "ABC123, direct HILLS then PRINC, remainder of the route unchanged"). Pilots then entered this new routing into their FMS. Pilots encountering weather could request a deviation direct to a named waypoint or a new track. Controllers then entered a new flight plan based on this request into the host computer on the ground and issued the appropriate clearance.
FANS Equipped Aircraft. When an uplinked clearance was received on a FANS-equipped aircraft, a message appeared on the alerting display on the flight deck. The procedure for the pilot-not-flying was to navigate to the ATC Messages page on the CDU, load the clearance into the FMS, and then, if acceptable, send an accept message via the CDU and execute the flight plan. Otherwise, the pilot would reject the clearance and follow up with a revised request to ATC via datacom or voice. For requests, pilots developed a modified flight plan in the CDU using standard tools, downlinked it to ATC, and then monitored the message status to see if it was accepted or rejected. Accepted requests could be executed. Rejected requests were typically followed up with a proposed clearance from ATC. As noted above, ATC uplinks to FANS-equipped aircraft always included an initial modified waypoint approximately two minutes in front of the aircraft, while FANS trajectory request downlinks were always for initial turns off the nose of the aircraft.
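For concreteness, the initial "push" waypoint described above lies on the current track at the distance covered in two minutes at the present ground speed. The following is a toy flat-earth sketch of that projection, with assumed inputs; it is not the MACS or ground-automation implementation.

```python
import math

def push_point(lat_deg, lon_deg, track_deg, groundspeed_kt, lead_time_min=2.0):
    """Project a point lead_time_min ahead of the aircraft along its current
    track. Flat-earth approximation, adequate only for the short distances
    involved here (a simplified illustration, not the study's software)."""
    dist_nm = groundspeed_kt * lead_time_min / 60.0            # NM covered in lead time
    dlat = dist_nm * math.cos(math.radians(track_deg)) / 60.0  # 1 deg lat ~ 60 NM
    dlon = (dist_nm * math.sin(math.radians(track_deg))
            / (60.0 * math.cos(math.radians(lat_deg))))
    return lat_deg + dlat, lon_deg + dlon

# Example: at 460 kt the push point sits ~15.3 NM ahead of the aircraft
print(push_point(38.5, -89.0, track_deg=90.0, groundspeed_kt=460.0))
```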
ACARS Equipped Aircraft. Automation translated ATC clearances for ACARS aircraft into an uplinked free-text message that appeared on the ACARS menu in the CDU. As with FANS messages, these were announced on the message alerting display. The pilot-not-flying would then navigate to the ACARS messages page on the CDU, read the clearance, and confirm it with the pilot flying before entering it into the FMS. Clearances were presented in two parts: first, the clearance itself, which contained the path the aircraft was to fly, listed as a series of waypoints (possibly including latitude-longitude coordinates); second, an "FMS contingency" procedure to be executed if the crew could not enter the clearance before arriving at the first modified waypoint, initially located approximately two minutes ahead of the aircraft. The FMS contingency provided the predicted time at which the aircraft would reach this first waypoint, and a track the pilots should fly from the first to the second waypoint. Pilots were taught to enter the second waypoint into the CDU first. They could then execute the new flight plan at the contingency time and remain on their flight plan even if they did not have time to enter the first waypoint. In practice this became the standard procedure, as the first waypoint was almost always a latitude-longitude coordinate and onerous to enter. ACARS aircraft accepted or rejected clearances by free-texting back WWW for Wilco or UNA for Unable, which were interpreted appropriately by the automation. When making trajectory requests by ACARS, pilots were trained to use the free-text function of the ACARS ATC message page; controllers would read these requests and turn them into amendments that were then uplinked back to the aircraft.
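To make the two-part ACARS clearance concrete, a hypothetical rendering is sketched below. The message wording, layout, and waypoint names are invented for illustration; they are not a real ACARS message format or the text used in the study.

```python
def acars_clearance(callsign, waypoints, contingency_time_z, contingency_track_deg):
    """Build a free-text uplink with the two parts described above: the
    cleared route, then an FMS contingency giving the predicted time at the
    first (often lat/long) waypoint and the track to the second waypoint.
    Entirely hypothetical formatting, for illustration only."""
    route = " ".join(waypoints)
    return (f"{callsign} CLEARED VIA {route}\n"
            f"FMS CONTINGENCY: AT {contingency_time_z}Z "
            f"FLY TRACK {contingency_track_deg:03d} TO {waypoints[1]}")

# Waypoint names and values below are arbitrary examples.
print(acars_clearance("ABC123",
                      ["3830N08915W", "PXV", "CHERI"],  # first fix a lat/long
                      contingency_time_z="1504",
                      contingency_track_deg=78))
```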
4 Results
Repeated-measures ANOVAs were used to analyze the pilot ratings. The p-values were adjusted using the Greenhouse-Geisser correction for violations of sphericity where appropriate. Unless otherwise stated, all ratings were averaged by crew.
4.1 Crew Workload Ratings
At the end of each trial, pilots rated their overall and peak workload (1 none to 5 unmanageable) associated with their flight and with weather (see Table 1). Across all four workload measures, crews consistently rated their workload highest in the ACARS condition and lowest in the Voice condition. These differences were significant for overall workload associated with the flight (F(2, 14) = 9.75, p < .01) and peak workload associated with the flight (F(2, 14) = 5.83, p < .05). Overall and peak workload associated with avoiding weather (F(2, 14) = 4.40, p = .07; F(2, 14) = 3.13, p = .08, respectively) were not significant but followed a similar pattern. Workload probes were also collected at one-minute intervals throughout each trial. Missing data (due to computer error) occurred 1.3% of the time; there were no significant differences in missing data by Aircraft Equipage (F < 1). Individual pilot response latencies to the workload probes were then analyzed as a function of Pilot Role and Aircraft Equipage. The effect of Aircraft Equipage on response latencies approached significance, F(2, 30) = 3.22, p = .054.
Pilots took more time to respond to workload probes in the ACARS condition (M = 4.08 sec) than in the Voice condition (M = 3.74 sec, p = .058). The FANS condition (M = 3.87 sec) was not different from the other two conditions. There was no effect of Pilot Role on response latencies (F < 1). Timeouts, in which a participant failed to respond to a workload probe within the 30-second response window (presumably because workload was too high to attend to the probe), occurred 8.4% of the time. Pilots timed out 9.8% of the time in the ACARS condition compared to 7.7% in both the Voice and FANS conditions. A repeated-measures ANOVA of timeouts as a function of Pilot Role and Aircraft Equipage found a significant effect of Aircraft Equipage (F(1.27, 18.98) = 4.19, p < .05) but no effect of Pilot Role (F(1, 15) = 2.92, p = .11). Pilot probe workload ratings (1 bored to 5 busy) were then averaged by crew. Probe workload ratings as a function of Aircraft Equipage showed no significant effects, F(2, 14) = 1.57, p = .24. Overall, pilots found the workload of the simulated flights low (M = 2.17), one rating point above bored.

Table 1. Means and Standard Deviations of Post-Trial Crew Workload and Acceptability Ratings by Aircraft Equipage

Question                                            FANS          ACARS         Voice
Overall workload associated with your flight        2.24 (.21)    2.47 (.24)a   2.12 (.17)a
Peak workload associated with your flight           2.51 (.37)    2.79 (.48)*   2.36 (.35)*
Overall workload associated with avoiding weather   2.16 (.16)    2.30 (.14)*   2.13 (.16)*
Peak workload associated with avoiding weather      2.27 (.26)    2.47 (.33)    2.20 (.22)
Ability to communicate with ATC                     3.95 (.35)a   3.50 (.42)a   4.43 (.35)a
Ability of ATC to communicate with you              3.94 (.37)a*  3.59 (.46)b*  4.34 (.38)ab
Ability to understand what ATC proposed             4.02 (.39)a   3.66 (.49)a   4.42 (.37)a
Safety of the flight path you flew                  4.20 (.38)    4.05 (.49)*   4.41 (.35)*
Efficiency of the flight path you flew              3.90 (.41)    3.76 (.51)    4.03 (.53)

Note. Different subscripts represent means that are significantly different at p < .05. Asterisks represent means that are different at p < .07, but not at p < .05.
4.2 Ratings
At the end of each trial, pilots provided five acceptability ratings (1 not acceptable to 5 excellent) related to communication with ATC and the safety and efficiency of their flight path (see Table 1). Voice was rated highest, FANS next highest and ACARS lowest in ability to communicate with ATC (F(2, 14) = 38.67, p < .01), ability of ATC to communicate with them (F(2, 14) = 18.98, p < .01) and ability to understand what ATC proposed (F(2, 14) = 34.51, p < .01). A similar pattern also emerged when crews rated the safety (F(2, 14) = 6.18, p < .05) and the efficiency (F(2, 14) = 2.35, p = .13) of the flight path flown. Crews found that, across all conditions, flight path safety was good (M = 4.22) and efficiency was marginally good (M = 3.90).
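For readers who wish to reproduce this style of analysis, a minimal sketch follows. It assumes the third-party pingouin package and uses invented column names and simulated data; it is not the authors' analysis code.

```python
import numpy as np
import pandas as pd
import pingouin as pg  # assumed third-party package providing rm_anova

rng = np.random.default_rng(0)
# Long format: one crew-averaged rating per crew per equipage level.
# Column names and simulated values are illustrative assumptions.
df = pd.DataFrame({
    "crew":     np.repeat(np.arange(8), 3),
    "equipage": np.tile(["FANS", "ACARS", "Voice"], 8),
    "rating":   rng.normal(loc=[2.5, 2.8, 2.4] * 8, scale=0.3),
})

# Repeated-measures ANOVA; correction=True applies the Greenhouse-Geisser
# adjustment for sphericity violations, as in the analyses reported above.
aov = pg.rm_anova(data=df, dv="rating", within="equipage",
                  subject="crew", correction=True, detailed=True)
print(aov)
```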
4.3 Post Simulation Ratings and Comments
After the simulation, pilots rated their agreement with 15 statements (1 strongly disagree to 5 strongly agree) about each of the different communication procedures. These questions pertained to crew coordination, air-ground communication, safety and acceptability of trajectory based operations, time required for condition-specific procedures, workload, and overall concept acceptance. Data were analyzed by pilot with multiple ANOVAs. Twelve of the 15 statements directly compared the three communication conditions. For all 12 of these statements, the mean preference ratings were lowest for ACARS and highest for Voice, with FANS falling in the middle (all statistically significant at the p < .05 level). Given that the ACARS and FANS procedures were the least familiar to the pilots, this finding is unsurprising: pilots prefer familiar procedures. Additionally, ACARS flight requests required significantly more work and thus were the least well received. Pilots also provided open-ended feedback regarding each communication condition. Pilots had three major concerns about the datacom procedures. The first concern was ATC response time (mentioned by 7 of 16 pilots for FANS and 13 of 16 for ACARS). In current-day operations with voice, the controller responds to messages as they are received. However, in a datacom environment, controllers seemed to respond to messages based on traffic management needs. For example, an aircraft close to being handed off may have received a response before an aircraft in the middle of the sector that had made an earlier request. A second concern was the time and effort needed to create and manage messages on the CDU (mentioned by 7 of 16 pilots for both FANS and ACARS). Finally, there were concerns related to message format, such as how the ACARS clearance was formatted and how the ACARS and FANS messages spread across two CDU pages, thereby requiring additional effort to integrate.
5 Conclusion
The move to TBO, central to NextGen, has been hampered by low FANS equipage levels in the current fleet. Air carriers are reluctant to invest in FANS equipage while such equipage is of little use in the continental United States. At the same time, service providers see little benefit in TBO as long as equipage levels prevent the vast majority of users from operating under those rules. The proposed procedures point to a potential way out of this impasse by providing a path to TBO that does not require substantial levels of FANS equipage. While other studies have shown clear benefits of datacom procedures, our results show a more mixed picture. In particular, pilots showed a strong preference for Voice. Why the difference? Our study stressed pilot requests, while previous work has looked almost entirely at clearances initiated by the controller.
The datacom equipment found on current-day transport category aircraft makes it difficult to formulate requests and lacks the immediate feedback of traditional voice protocols. The generally slow response time for datacom has been noted in other studies [9, 10]. Groce and Boucek [11] specifically note that pilots found datacom unacceptable for weather avoidance clearances because of the time-critical nature of such clearances. It is possible that a mix of datacom and voice could result in more acceptable response times while accruing many of the benefits of datacom (such as reduced transmission error, the ability to transmit more complex clearances and a reduction in voice traffic). For example, pilots could make requests by voice and receive an acknowledgement by voice, which would then be followed up by a datacom clearance. Several pilots in our study stated during the debriefing that their concerns about datacom would be greatly ameliorated if requests were acknowledged more promptly, even if there was a delay in the actual response.
Acknowledgement. This study was supported by the NASA Airspace Program, Concepts and Development Project. We would like to thank George Lawton, Dominic Wong, Riva Canton, Tom Quinonez and John Luk for software programming support, and Sarah Ligda and Patrick Cravalho for assistance with recruiting participants. We would particularly like to thank Dr. Kim Vu and Dr. Tom Strybel of California State University, Long Beach, as well as their students, for their aid and support in running this study.
References
1. Joint Planning and Development Office: Concepts of Operations for the Next Generation Air Transportation System, Version 3.0, Washington, DC (2010)
2. Kerns, K.: Air-Traffic Control/Flight Deck Integration. In: Wise, J.A., Hopkin, V.D., Garland, D.J. (eds.) Handbook of Aviation Human Factors, 2nd edn., pp. 23-1–23-17. CRC Press, Boca Raton (2010)
3. McGann, A., Morrow, D., Rodvold, M., Mackintosh, M.-A.: Mixed-Media Communications on the Flight Deck: A Comparison of Voice, Data Link, and Mixed ATC Environments. Int. J. Aviat. Psychol. 8, 137–156 (1998)
4. Tallota, N.J.: Controller Evaluation of Initial Data Link Terminal Air Traffic Control Services: Mini-Study 3, vol. I, Report No. DOT/FAA/CT-92/18, I. US Department of Transportation, Federal Aviation Administration, Washington, DC (1992)
5. Mueller, E.: Experimental Evaluation of an Integrated Datalink and Automation-Based Strategic Trajectory Concept. In: Proceedings of the 7th American Institute of Aeronautics and Astronautics (AIAA) Aviation Technology, Integration, and Operations (ATIO) Conference, Belfast, Northern Ireland (2007)
6. Mueller, E., Lozito, S.: Flight Deck Procedural Guidelines for Datalink Trajectory Negotiation. In: Proceedings of the 8th American Institute of Aeronautics and Astronautics (AIAA) Aviation Technology, Integration, and Operations (ATIO) Conference, Anchorage, AK (2008)
7. Prevot, T.: Exploring the Many Perspectives of Distributed Air Traffic Management: The Multi Aircraft Control System MACS. In: International Conference on Human-Computer Interaction in Aeronautics, pp. 23–25 (2002)
8. Granada, S., Dao, A.Q., Wong, D., Johnson, W.W., Battiste, V.: Development and Integration of a Human Centered Volumetric Cockpit Display for Distributed Air Ground Operations. In: Proceedings of the 12th International Symposium on Aviation Psychology, Oklahoma City, OK, pp. 229–284 (2005)
9. Kerns, K.: Data-Link Communication Between Controllers and Pilots: A Review and Synthesis of the Simulation Literature. Int. J. of Av. Psych. 1, 181–204 (1991)
10. Smith, N.M., Lee, P.U., Prevot, T., Mercer, J., Palmer, E., Battiste, V., Johnson, W.: A Human-in-the-Loop Evaluation of Air-Ground Trajectory Negotiation. In: Proceedings of the 4th American Institute of Aeronautics and Astronautics Aviation Technology, Integration and Operations Conference, Chicago (2004)
11. Groce, J., Boucek, G.: Air Transport Crew Tasking in an ATC Data Link Environment. SAE Technical Paper Series, Warrendale, PA (1987), ISSN 0148-7191
Conflict Resolution Automation and Pilot Situation Awareness
Arik-Quang V. Dao1, Summer L. Brandt1, L. Paige Bacon2, Joshua M. Kraut2, Jimmy Nguyen2, Katsumi Minakata2, Hamzah Raza2, and Walter W. Johnson3
1 San Jose State University, Moffett Field, CA 94035, USA
2 California State University Long Beach, Dept. of Psychology, Long Beach, CA 90840, USA
3 NASA Ames Research Center, Moffett Field, CA 94035, USA
{quang.v.dao,summer.l.brandt}@nasa.gov, {lauren.bacon,josh.kraut,jimmy.nguyen,hamzah.raza}@student.csulb.edu, [email protected], [email protected]
Abstract. This study compared pilot situation awareness across three traffic management concepts that varied the distribution of traffic separation responsibility between pilots, air traffic controllers, and an automation system. In Concept 1, the flight deck was equipped with conflict resolution tools that enabled pilots to perform the tasks of weather avoidance and self-separation from surrounding traffic. In Concept 2, air traffic controllers were responsible for traffic separation, but pilots were provided tools for weather and traffic avoidance. In Concept 3, ground-based automation was used for conflict detection and resolution, and the flight deck tools allowed pilots to deviate for weather, but not to detect conflicts. Results showed that pilot situation awareness was highest in Concept 1, where the pilots were most engaged, and lowest in Concept 3, where automation was most heavily used. These findings suggest that pilot situation awareness on conflict resolution tasks can be improved by keeping pilots in the decision-making loop.
Keywords: situation awareness, flight deck, automation, NextGen, SAGAT, SPAM.
1 Introduction
The rapid increase in air traffic density will exceed the ability of human controllers to successfully manage operations in the national air space using existing traffic management concepts and technology [1]. To meet the capacity demands of the future air transportation system, as well as to meet or improve safety and efficiency standards, human controller tasks such as air traffic conflict detection and resolution must be supported by, or shared with, humans on the flight deck and/or new automation technologies. Studies conducted at NASA Ames Research Center have shown that controller performance on conflict avoidance tasks decreases when traffic load increases, but this decrement can lessen when the controller is assisted by automation [2].
However, there may be trade-offs related to situation awareness when deploying automation [3]. The focus of this paper is to assess these trade-offs with respect to pilot situation awareness under conditions where traffic separation responsibility is shared between the flight crew, controllers, and automation.
2 Situation Awareness
Situation awareness (SA) has many definitions. The most widely used is "the operator's understanding of the state of the relevant environment and his or her ability to anticipate future changes and developments in the environment" [4]. This definition implies that SA is mostly internal. An alternative definition suggests that SA can be distributed between the operator and his or her task environment (e.g., information located on a display of traffic) [5]. Endsley [6] developed an off-line probe technique, called the Situation Awareness Global Assessment Technique (SAGAT), to assess SA in experimental contexts and simulations. With this technique, a task analysis is first used to identify critical information requirements. Probe questions are then developed to capture the operator's awareness of this information. During a simulation, the scenario is paused, the screen blanked, and the operator is presented with the probe questions. Higher accuracy scores on the questions indicate higher SA. However, SAGAT has been criticized for relying too heavily on working memory, and the process of freezing and resuming a scenario interrupts the operator's primary task [7]. Alternatively, Durso, Bleckley, and Dattel [8] proposed that SA can be measured based on the operator's understanding of the task environment. That is, the operator may not have the information needed to answer a probe question in working memory, but may know the location of SA-relevant information in the surrounding environment. Knowing where to find critical information should yield better SA, so allowing operators access to the display while being probed can improve their accuracy. Therefore, SA information normally available from the display should remain available when the operator is being probed. In order for the operator to be probed without stopping the task, SA probes are administered in a two-step process. First, a ready prompt is presented, informing the operator that a probe question is ready. If the operator's workload is not too high, and he or she has the resources to answer the probe question, the ready prompt is accepted by pressing a button or by saying "ready." The probe question is then administered immediately after the ready prompt has been accepted. This procedure yields three measures: a ready latency (the latency between the appearance of the ready prompt and when the operator indicates readiness), a probe response latency (the latency between the presentation of the question and the operator's response), and a probe accuracy score. The ready latency is considered a measure of workload, because the operator should be able to indicate readiness more quickly when not busy: the lower the workload, the shorter the ready latency. The probe response latency can be used as an indicator of SA, because the operator should take less time to answer a question when the needed information is easily accessible (either in working memory or at a known location on the display). In other words, shorter response times suggest better SA.
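In software terms, this two-step procedure reduces to logging two timestamps and a scored answer per probe. The sketch below shows how the three measures could be computed; the record fields and example values are assumptions for illustration, not the experiment's software.

```python
from dataclasses import dataclass

@dataclass
class ProbeRecord:
    """Timestamps (seconds from trial start) and outcome for one probe.
    Field names are illustrative assumptions."""
    prompt_shown: float    # ready prompt appears
    ready_pressed: float   # operator accepts the prompt
    question_shown: float  # probe question appears
    answered: float        # operator responds
    correct: bool

def spam_measures(records):
    """Return (mean ready latency, mean probe latency, accuracy).
    Ready latency indexes workload; probe latency and accuracy index SA."""
    ready = [r.ready_pressed - r.prompt_shown for r in records]
    probe = [r.answered - r.question_shown for r in records]
    accuracy = sum(r.correct for r in records) / len(records)
    return sum(ready) / len(ready), sum(probe) / len(probe), accuracy

recs = [ProbeRecord(0.0, 2.1, 2.2, 16.0, True),
        ProbeRecord(180.0, 185.4, 185.5, 204.7, False)]
print(spam_measures(recs))  # -> (3.75, 16.5, 0.5)
```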
3 Automation Affects Situation Awareness
The implementation of automation can vary in degree, with each increasing level of automation having an impact on situation awareness. In cases where automation is completely responsible for a task, humans may be thrown completely out of the loop, leading to complacency [3]. When complacent, the operator no longer proactively seeks to maintain awareness of task-relevant information in the environment, leading to diminished SA. SA can also be diminished when the level of automation provided does not adequately support the task or imposes high workload. When workload is high, cognitive tunneling can occur, in which the operator is forced to attend selectively and narrowly to the primary task, reducing the cognitive resources available to monitor or process other task-relevant components [9]. However, a performance benefit can be gained from reduced workload without trading off SA if the human operator is kept "in the loop" by interacting with automation to complete tasks [10]. Dao et al. [10] examined the impact of varying levels of automation on individual pilot SA. Pilots were asked to perform a traffic conflict avoidance task with and without the support of automation. On manual trials, pilots were given a null resolution (no change to route), which they had to modify in order to resolve the conflict. On automated trials, pilots were given a resolution proposed by an automated system, which they could evaluate to ensure that it solved the conflict but could not modify for efficiency or other preferences. On interactive trials, pilots were given an automation-proposed resolution that they could accept as is or revise based on their preferences. Pilots were probed for SA at the end of each trial, when the scenario was frozen but all displays were still active and in sight. Results showed that pilot SA was lowest in the automated condition compared with the manual and interactive conditions; there was no difference between the manual and interactive conditions. Low SA in the automated condition suggests that factors such as automation complacency had a significant impact on SA. Additionally, the comparable SA found in the interactive and manual conditions suggests that an interactive, human-in-the-loop implementation of automation provides better support for SA than a fully automated one. Because Dao et al.'s [10] study examined short, 2-minute conflict scenarios, it is not clear whether the same effects of automation would be observed when pilots must fly longer scenarios that involve different phases of flight and additional responsibilities. Thus, the present study expands on Dao et al.'s findings by examining pilot SA when separation responsibility is distributed between pilots, controllers, and automation in longer, 80-minute scenarios.
4 Current Study
Pilots and controllers engaged in real-time simulations focused on trajectory-based en route and arrival operations into Louisville International-Standiford Field Airport (SDF).
In trajectory-based operations, controllers and pilots attempt to maintain complete trajectories at all times, modifying complete trajectories rather than using temporary vectors. Although both pilot and controller SA were of interest in this study, this paper focuses only on pilot SA. Situation awareness for pilots was examined under three concepts of operations and under high en route traffic density (three times normal). In all three concepts the pilots were responsible for an interval management task (often referred to as merging and spacing). In Concept 1, experimental pilots had onboard conflict detection and resolution (CD&R) tools and were responsible for interval management, autonomous weather avoidance, and conflict resolution/separation assurance (they did not have to obtain concurrence from ATC). In Concept 2, experimental pilots again had CD&R tools and were responsible for interval management. Pilots were also responsible for generating conflict-free weather avoidance route modifications but, unlike in Concept 1, they had to downlink proposed routes for concurrence from ATC (who was responsible for separation assurance). Concept 3 was similar to Concept 2, but without flight deck CD&R. As a result, pilots often could not see traffic conflicts on their proposed routes, requiring ATC to modify them. Based on results from Dao et al. [10], it was predicted that pilot SA would be greatest when operators were involved in the decision-making process. Therefore, better pilot SA scores were predicted for Concepts 1 and 2 than for Concept 3.
5 Method
5.1 Participants
Eight commercial airline pilots with glass cockpit experience were recruited for this experiment. They were compensated $25/hr for their participation.
5.2 Apparatus
Pilots in the simulation managed a desktop simulator that included the Cockpit Situation Display (CSD), a PC-based interactive 3-D volumetric display developed by the NASA Ames Flight Deck Display Research Laboratory (see Fig. 1). The CSD provides pilots with the location of surrounding traffic and weather, plus the ability to view planned 4-D trajectories [11]. Although both standard airborne weather radar depictions and advanced 3-D weather depictions were examined, those results are not presented in this report. The CSD contained logic that detected and highlighted simulated conflicts and was 100% reliable. In addition, the CSD had pulse predictors that emitted synchronized pulses of light that traveled along the displayed flight plans at a speed proportional to the speeds of the associated aircraft. Using these functions (conflict detection and pulse), a prediction of up to 20 minutes into the future could be made, graphically depicting ownship proximity to traffic along the planned route.
Fig. 1. Plan View of 3-D Cockpit Situation Display (3-D CSD) with Weather Radar
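The conflict-detection and pulse-predictor functions described above both amount to projecting aircraft forward along their flight plans and testing predicted separation. The following simplified sketch assumes straight-line tracks and standard en route minima of 5 NM laterally and 1000 ft vertically; it is an illustration of the idea, not the CSD's actual logic.

```python
import numpy as np

def first_predicted_conflict(own, other, horizon_min=20.0, step_s=10.0,
                             lat_min_nm=5.0, vert_min_ft=1000.0):
    """Step two aircraft forward along straight-line tracks and return the
    first time (in minutes) at which both separation minima are violated,
    or None. Each state is (x_nm, y_nm, alt_ft, vx_kt, vy_kt, vs_fpm).
    Straight tracks and the 5 NM / 1000 ft minima are simplifying
    assumptions; the CSD projected along full flight plans."""
    for t_s in np.arange(0.0, horizon_min * 60.0 + step_s, step_s):
        h = t_s / 3600.0  # hours elapsed
        m = t_s / 60.0    # minutes elapsed
        dx = (own[0] + own[3] * h) - (other[0] + other[3] * h)
        dy = (own[1] + own[4] * h) - (other[1] + other[4] * h)
        dz = (own[2] + own[5] * m) - (other[2] + other[5] * m)
        if np.hypot(dx, dy) < lat_min_nm and abs(dz) < vert_min_ft:
            return m
    return None

# Head-on at the same altitude, 40 NM apart, 480 kt closure:
own   = (0.0, 0.0, 35000.0, 240.0, 0.0, 0.0)
other = (40.0, 0.0, 35000.0, -240.0, 0.0, 0.0)
print(first_predicted_conflict(own, other))  # -> 4.5 (minutes)
```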
Pilots modified the flight path of ownship for weather and traffic avoidance using the Route Assessment Tool (RAT) [12], a graphical tool that permitted them to move, insert and delete waypoints. In Concepts 1 and 2, the RAT was linked to conflict detection software, allowing pilots to find conflict-free paths; in Concept 3, conflict detection was disabled. The interval management task was implemented using the NASA Langley ASTAR algorithms [13]. When engaged, ASTAR calculated speed adjustments designed to achieve an assigned time-in-trail behind the leading aircraft at the runway. A spacing error time, indicating how early or late the aircraft was expected to be at the runway, was displayed in seconds. When coupled with the autothrottles, the spacing tool gradually modified the aircraft speed to achieve the assigned spacing interval.
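The displayed spacing error reduces to a comparison of estimated runway arrival times. Below is a simplified sketch of that quantity under assumed inputs; it is not the ASTAR algorithm, which additionally generates the speed commands.

```python
def spacing_error_s(own_runway_eta_s, lead_runway_eta_s, assigned_interval_s):
    """Spacing error in seconds relative to crossing the runway threshold
    assigned_interval_s behind the lead aircraft: positive = late,
    negative = early. A simplified illustration of the displayed quantity,
    not the ASTAR algorithm (which also computes the speed commands)."""
    return own_runway_eta_s - (lead_runway_eta_s + assigned_interval_s)

# Lead's ETA 3600 s, assigned 90 s in trail, ownship ETA 3705 s -> 15 s late
print(spacing_error_s(3705.0, 3600.0, 90.0))
```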
Fig. 2. Multi-Aircraft Control System (MACS)
The Multi-Aircraft Control System (MACS) [14] provided the underlying 747 aircraft simulation, plus a display of flight deck instruments and controls (Fig. 2). These included a primary flight display (PFD), a mode control panel (MCP), a data link display with controls for sending and receiving data link messages and new routes from the ground or automation, as well as flap and gear controls for landing procedures.
Uplinked route modifications from the controller appeared in the data link window, from which they were loaded, visually examined by the pilot on the CSD and, if acceptable, loaded directly into the flight management system. MACS, a highly versatile piece of software, also provided the interface for controllers and pseudo-pilots. However, since the controllers' activities are not specifically germane to the present report on pilot SA, the reader is directed to a separate book chapter [15] for details of their tasks. Workload and situation awareness probe questions were administered using a separate touch-screen tablet computer. All probes required a yes/no or multiple-choice response (with an equal number of yes/no and multiple-choice questions per pilot per scenario).
5.3 Design and Procedure
The independent variable was Concept [Concept 1: Pilot Responsible with Flight Deck CD&R; Concept 2: Ground (Controller) Responsible with Flight Deck CD&R; Concept 3: Ground (Auto-Resolver Agent) Responsible without Flight Deck CD&R], and the dependent measures were the three metrics obtained from the probes (ready latencies, probe latencies, and probe accuracy). Participants completed three blocks of four trials over four data collection days. Each block was grouped by Concept and was presented once per day. Two trials were repeated on the fourth data collection day due to software malfunctions. Each trial lasted approximately 80 minutes. Participants received classroom training prior to the data collection days and were provided three practice trials, one with each concept. Experimental pilots started each scenario in an en route phase of flight, during which they initiated an interval management operation (also known as merging and spacing) that continued through the arrival into SDF. Approximately 2 minutes into each scenario the pilots received and implemented their interval management instructions, which included the spacing interval and lead aircraft, from an air traffic control (ATC) ground scheduling station. Subsequently, pilots also needed to use the RAT to make, or request, a route modification to avoid hazardous weather. In Concept 1, pilots were responsible for avoiding and resolving all traffic conflicts, and in Concept 2 for generating weather deviation requests that were conflict free. Thus, pilots in all concepts adjusted their route relative to the weather based on their own safety criteria and, in Concepts 1 and 2, with respect to the constraints imposed by surrounding traffic. In addition to the experimental pilots, confederate "pseudo-pilots" managed the background traffic. This traffic was set at three times today's level, consistent with the traffic levels expected by the time the concepts explored in this simulation might be implemented. Pilots received a ready prompt for one SA question every 3 minutes from the start of each trial. Pilots were instructed to press the "ready" button on the touch-screen panel to reveal the question. The simulation did not stop while they were answering the questions, and they were allowed to reference information on the displays (see Table 1 for example SA questions). The display timed out after one minute of non-responsiveness for both the ready prompt and the probe question. Pilots completed a trial when they landed at SDF.
Table 1. Sample Situation Awareness Questions
In the next 5 minutes, how many aircraft will be within 10 nm and 2000 ft of ownship?
What is the heading of the aircraft closest to you?
How many times did ownship change speed by more than 5 knots in the last five minutes?
Is the difference in heading between ownship and lead less than 10 degrees?
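Scoring a probe such as the first question requires a ground-truth count from the simulation state. A minimal sketch, assuming the simulation can supply each aircraft's predicted lateral distance (nm) and altitude (ft) five minutes ahead (the data layout is hypothetical):

    def count_proximate_traffic(predicted_traffic, own_alt_ft):
        # Count aircraft within 10 nm laterally and 2000 ft vertically of ownship
        return sum(1 for dist_nm, alt_ft in predicted_traffic
                   if dist_nm <= 10.0 and abs(alt_ft - own_alt_ft) <= 2000.0)

    # Example: of three aircraft, only the first satisfies both thresholds
    print(count_proximate_traffic([(8.2, 34500), (25.0, 35000), (9.1, 39000)], 35000))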
6 Results One participant's data were removed from the analyses due to non-compliance with probe procedures. Timeouts, defined as instances in which the participant neither pressed the ready button nor selected a response (presumably because workload was too high to attend to the probe question), occurred 9% of the time. 6.1 Timeouts A repeated-measures ANOVA of timeouts as a function of Concept showed no significant effect (F(2, 12) = 1.99, p = .18). In Concept 1 (Pilot Responsible), pilots timed out on 3.8% of the ready prompts, compared with 4.6% in Concept 2 (Controller Responsible) and 7.9% in Concept 3 (Auto-Resolver Agent Responsible). Although not significant, this pattern suggests that pilots attended to the probe questions more when they were responsible for traffic separation, and it is consistent with the workload findings reported in Ligda et al. [16]. 6.2 Analyses of Ready Response Latency Given their non-normal distribution, all response latency data were natural-log transformed. A repeated-measures ANOVA was then performed for each SA probe measure with Concept as a factor; p-values were adjusted with the Greenhouse-Geisser correction where sphericity was violated.
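This transformation and sphericity-corrected ANOVA are available in standard statistics packages. A sketch in Python using pandas and pingouin (the column names and the toy data are assumptions, shown only to make the pipeline concrete):

    import numpy as np
    import pandas as pd
    import pingouin as pg

    # Long format: one row per pilot x Concept with a mean latency in seconds
    df = pd.DataFrame({
        "pilot":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "concept": ["C1", "C2", "C3"] * 3,
        "latency": [3.1, 3.4, 4.9, 2.8, 3.0, 4.1, 3.5, 3.2, 5.2],
    })
    df["log_latency"] = np.log(df["latency"])   # normalize the skewed latencies

    # Repeated-measures ANOVA; correction=True reports Greenhouse-Geisser
    # adjusted p-values when sphericity is violated
    print(pg.rm_anova(data=df, dv="log_latency", within="concept",
                      subject="pilot", correction=True))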
Fig. 3. Response Latencies by Concept
A repeated-measures ANOVA on ready response latencies (in seconds) was performed with Concept as a factor. There were no significant differences in ready prompt latencies by Concept (F(1.15, 6.87) = 2.36, p = .17). Pilots took less time to respond to the ready prompt in Concept 1 (pilot primary) and Concept 2 (controller primary) than in Concept 3 (auto-resolver primary); see Fig. 3. The pattern of results was also consistent with the workload findings from the same study reported in Ligda et al. [16]. 6.3 Analyses of Probe Response Latency Probe response latencies (in seconds) were submitted to a repeated-measures ANOVA with Concept as a factor. A significant effect of Concept on probe response latency was found (F(2, 12) = 4.01, p = .046); see Fig. 3. Post-hoc comparisons indicated that pilots were faster to answer the SA questions in Concept 1 than in Concept 3, p = .05. The pattern of the means was again in the hypothesized direction: pilots who were responsible for separation had the lowest probe response latency, implying the best SA. 6.4 Analyses of Percent Correct Responses to SA Probes The percent of correct responses to the SA questions was analyzed in the same manner. There was no effect of Concept, F(1.06, 6.33) = 2.34, p = .18; however, the direction of the means was consistent with the probe response latency findings. In Concept 1, pilots correctly answered 84% of the SA probes, compared with 81% in Concept 2 and 79% in Concept 3. Again, this pattern suggests that pilots had better SA when they were responsible for separation. An additional analysis examined probe response latencies as a function of probe accuracy. There was a significant difference between probe response latencies for correct versus incorrect SA questions, t(6) = 2.95, p = .03. Overall, pilots responded more quickly to questions they answered correctly (M = 14.85 s) than to questions they answered incorrectly (M = 18.13 s). This is consistent with Durso's [8] proposition that shorter response times suggest better SA.
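The accuracy-by-latency comparison is a paired t-test over pilots. A minimal sketch with SciPy (the per-pilot means below are illustrative, not the study's data):

    from scipy import stats

    # Mean probe response latency (s) per pilot, split by probe accuracy
    correct_rt   = [13.2, 15.0, 14.1, 16.3, 14.8, 13.9, 16.6]
    incorrect_rt = [17.0, 18.9, 16.8, 19.5, 18.2, 17.4, 19.1]

    t, p = stats.ttest_rel(correct_rt, incorrect_rt)   # paired across the 7 pilots
    print(f"t({len(correct_rt) - 1}) = {t:.2f}, p = {p:.3f}")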
7 Conclusion Pilots' situation awareness in the conflict avoidance task was improved when they remained in the decision-making loop. This finding is consistent with that obtained by Dao et al. [10]. Although not significant, the pattern of results observed for timeouts and response latencies to the probe questions suggests that the intermediate level of automation introduced in Concept 1 can be implemented to help reduce workload. Furthermore, the improved SA scores in the Concept 1 condition, where pilots remained involved in the conflict resolution task, showed that reduced workload can be achieved without a high cost to SA. The diminished pilot situation awareness under conditions where the automation carried greater responsibility for air traffic separation and pilots were not involved in the decision-making suggests that automation mistrust or complacency could play a greater role in influencing pilot situation awareness
[3]. Also, under these same high-automation conditions, mistrust in the automation may have led to over-monitoring of system behavior and consequently increased workload, as shown by the higher workload observed in the Concept 3 condition [9]. SA probe latencies obtained with the online probe technique were found to be a more sensitive measure of SA than probe accuracy (see also [7]). The fact that the SA probe latencies were able to distinguish between levels of automation suggests that they are useful tools for evaluating operator SA in future ATM concepts. Findings from this study demonstrate that automated decision support tools can be introduced to the flight deck without significant loss of SA, and that it is possible to keep the operator in the decision-making loop without the burden of high workload. Future flight deck systems should therefore be designed to support interaction between the operator and the automation. In addition, future studies may implement the SA and workload probe techniques described here to examine how to optimally distribute roles and responsibilities between the human operator and automation. Acknowledgement. This study was supported by the NASA Concepts and Technology Development Project, in collaboration with researchers under NASA cooperative agreement NNA06CN30A. These researchers, located at Cal State University Long Beach, Cal State University Northridge, and Purdue University, provided pseudopilots and controllers as part of a distributed simulation network. All participant pilots were tested at NASA Ames FDDRL. We thank Kim-Phuong Vu, Vernol Battiste, and Tom Strybel for their comments on prior versions of this paper.
References
1. Joint Planning and Development Office: Concept of Operations for the Next Generation Air Transportation System, Version 2.0. Washington, DC (2007)
2. Prevot, T., Homola, J., Mercer, J., Mainini, M., Cabrall, C.: Initial Evaluation Of Air/Ground Operations With Ground-Based Automated Separation Assurance. In: Proceedings of the 8th USA/Europe Air Traffic Management Research and Development Seminar, Napa, CA (2009)
3. Parasuraman, R., Sheridan, T.B., Wickens, C.D.: A Model For Types And Levels Of Human Interaction With Automation. IEEE Transactions on Systems, Man and Cybernetics - Part A: Systems and Humans 30(3), 286–297 (2000)
4. European Air Traffic Management Programme: The Development of Situation Awareness Measures in ATM Systems. HRS/HSP-005-REP-01 (2003)
5. Durso, F., Rawson, K., Girotto, S.: Comprehension and Situation Awareness. In: Durso, F., Nickerson, R., Dumais, S., Lewandowsky, S., Perfect, T. (eds.) Handbook of Applied Cognition, 2nd edn. Wiley, Hoboken (2007)
6. Endsley, M.R.: Measurement Of Situation Awareness In Dynamic Systems. Human Factors 37(1), 65–84 (1995)
7. Pierce, R., Strybel, T., Vu, K.-P.L.: Comparing Situation Awareness Measurement Techniques In A Low Fidelity Air Traffic Control Simulation. In: Proceedings of the 26th International Congress of the Aeronautical Sciences (ICAS), Anchorage, AK (2008)
8. Durso, F.T., Bleckley, M.K., Dattel, A.R.: Does Situation Awareness Add To The Validity Of Cognitive Tests? Human Factors, 721–733 (2006)
9. Parasuraman, R., Wickens, C.D.: Humans: Still Vital After All These Years Of Automation. Human Factors 50(3), 511–520 (2008)
10. Dao, A.-Q.V., Brandt, S.L., Battiste, V., Vu, K.-P.L., Strybel, T., Johnson, W.W.: The Impact Of Automation Assisted Aircraft Separation On Situation Awareness. In: Salvendy, G., Smith, M.J. (eds.) HCI International 2009. LNCS, vol. 5618, pp. 738–747. Springer, Heidelberg (2009)
11. Granada, S., Dao, A.Q., Wong, D., Johnson, W.W., Battiste, V.: Development And Integration Of A Human-Centered Volumetric Cockpit Display For Distributed Air-Ground Operations. In: Proceedings of the 12th International Symposium on Aviation Psychology, Oklahoma City, OK (2005)
12. Canton, R., Refai, M., Johnson, W., Battiste, V.: Development And Integration Of Human-Centered Conflict Detection And Resolution Tools For Airborne Autonomous Operations. In: Proc. 12th International Symposium on Aviation Psychology, Oklahoma City, OK (2005)
13. Abbott, T.S.: Speed Control Law For Precision Terminal Area. NASA Technical Memorandum 2002-211742. National Aeronautics and Space Administration, Hampton (2002)
14. Prevot, T.: Exploring The Many Perspectives Of Distributed Air Traffic Management: The Multi Aircraft Control System MACS. In: International Conference on Human-Computer Interaction in Aeronautics, pp. 23–25 (2002)
15. Vu, K.-P.L., Strybel, T.Z., Battiste, V., Johnson, W.W.: Factors Influencing The Decisions And Actions Of Pilots And Controllers In Three Plausible NextGen Environments. In: Proctor, R.W., Nof, S., Yih, Y. (eds.) Cultural Factors in Systems Design: Decision Making and Action. CRC Press, Boca Raton (in press)
16. Ligda, S.V., Dao, A.-Q.V., Strybel, T.Z., Vu, K.-P., Battiste, V., Johnson, W.: Impact Of Phase Of Flight On Operator Workload In A Distributed Air Traffic Management System. In: 54th Annual Meeting of the Human Factors and Ergonomics Society, San Francisco, CA (2010)
Effect of ATC Training with NextGen Tools and Online Situation Awareness and Workload Probes on Operator Performance Ariana Kiken1, R. Conrad Rorie1, L. Paige Bacon1, Sabrina Billinghurst1, Joshua M. Kraut1, Thomas Z. Strybel1, Kim-Phuong L. Vu1 , and Vernol Battiste2 1 California State University, Long Beach, Center for Human Factors in Advanced Aeronautics Technologies 1250 N. Bellflower Blvd., Long Beach, CA 90840, USA 2 San Jose State University Foundation and NASA Ames Research Center Moffett Field, CA 94035, USA [email protected], [email protected], {paigebacon86,sabrinabillinghurst,krautjosh}@gmail.com, {tstrybel,kvu8}@csulb.edu, [email protected]
Abstract. The purpose of the present study was to examine (a) how controller performance changes with the introduction of NextGen tools and (b) how much training is needed for controllers to achieve a performance criterion after the tools have been introduced. Seven retired controllers were trained on an en route sector in three phases: voice, Data Comm, and online probe. The voice phase trained current-day air traffic management techniques; the Data Comm phase trained NextGen tools, including Data Comm, conflict alerting, and conflict probes; and the probe phase trained controllers on an online probing technique. Although safety was not affected by the introduction of NextGen tools, the tools disrupted sector efficiency performance. Keywords: Training, NextGen, Air Traffic Control.
1 Introduction The introduction of Next Generation Air Transportation System (NextGen) tools and technologies into the National Airspace System (NAS) over the next two decades is expected to impact the way in which air traffic controllers (ATCs) perform their duties. NextGen tools, such as conflict probes, conflict alerts, and trial planners, are expected to reduce controller workload by automating the tasks of conflict detection and flight projection. Reductions in workload, however, may be accompanied by reductions in situation awareness (SA), possibly resulting in controller errors. Moreover, in the near term, the implementation of NextGen tools will likely occur incrementally, requiring controllers to integrate NextGen tools with their current traffic management tasks. The combination of NextGen-equipped and unequipped aircraft (AC) in the same airspace may pose challenges for controllers, as the two types of AC may require different traffic management skills. Thus, successful
implementation of NextGen tools and concepts will depend on effective training of existing and future controllers regarding the purpose of the tools and how they can be used most effectively. The purpose of the present study was to examine (a) how controller performance changes with the introduction of NextGen tools and (b) how much training is needed for controllers to achieve a performance criterion after the tools have been introduced. We also evaluated the extent to which controller performance was affected by the addition of online workload and SA probes, because evaluating the effectiveness of NextGen tools requires that ATC workload and SA be assessed. In general, controller training is necessary for increasing performance and facilitating retention of newly learned information and skills [1]. Training also plays a crucial role in ensuring that rudimentary skills and knowledge can be transferred to the variety of situations in which air traffic controllers operate [2]. As such, controllers go through a rigorous training process, from basic courses in air traffic control, to FAA training in Oklahoma City, to on-the-job training at their facility. A possible result of proper training is a schema, a mental representation of a situation that allows problems to be categorized and solved based on previously learned information and experiences [3]. Controllers can develop a schema for the sector that they are managing through repeated exposure to the sector in the course of their training. Different training methodologies are indicated for different circumstances [1]. For example, part-whole training is a methodology in which subtasks are learned incrementally before they are combined and trained as a whole task [1]. This type of training can be useful for learning complex tasks, such as air traffic management (ATM) [1]. A repetitive-part variant of the part-whole training schedule gradually combines the trained subtasks, allowing each subtask to be mastered and then practiced together with previously learned subtasks before a new part is added [1]. One advantage of this training schedule over a whole-task schedule is that the incremental mastery of subtasks may keep the learner's cognitive resources from being overloaded, because each subtask is mastered before a new one is begun. High cognitive load during training is a concern because it can hinder the acquisition of a new schema, making training less efficient and more time consuming [3]. With regard to research on NextGen tools, part-whole training is useful because it allows controllers to practice and maintain their current skills before adding the new skills associated with NextGen tools. The two most important aspects of an air traffic controller's job are to ensure safety by keeping aircraft separated and to maintain sector efficiency by moving traffic through the airspace as quickly as possible without sacrificing safety [4]. Current radar-certified controllers have demonstrated the ability to reliably manage traffic at current-day levels; however, air traffic is expected to increase 2-3X in the next 15-20 years [5]. Controllers will not be able to manage such an increase in traffic density without reliance on NextGen tools [5]. As a result, current-day controllers need to be trained in the use of the new tools and in integrating the tools with their current skill sets.
In addition to evaluating controller performance with NextGen, it is important to evaluate workload and situation awareness, because both factors affect human performance [9]. Online probe methods have been identified as promising techniques for measuring operator workload and SA in NextGen environments [6]. One
such technique, the Situation Present Assessment Method (SPAM), presents questions to operators about information in their environment while the task is being performed [7]. The SPAM procedure measures workload with the latency to respond to a ready prompt, and uses a combination of response accuracy and response time to measure SA [7]. Having the controller continue to perform his or her primary task while a question is asked is both a benefit and a drawback of SPAM. Because the displays remain active, the operator may refer back to the environment for information pertinent to the question being asked; the response time is then indicative of the distribution of information between the operator's memory and the display. Some have noted, however, that the method may either interfere with the operator's primary task or create additional workload [8, 9, 10]. In the studies that use online probes, operators typically receive minimal training on the probe technique itself and on performing the primary task while answering probe questions [e.g., 9]. In order to evaluate whether training may reduce the intrusiveness of the online probe technique, we examined how operator performance changes when operators are trained to timeshare the tasks of air traffic management and responding to SA probes. The present study examines the effectiveness of ATC training on NextGen automation tools and online probe techniques. Following a repetitive-part training schedule, controllers first practiced managing traffic using current-day methods to familiarize themselves with a new sector. Subsequently, they were introduced to three potential NextGen tools: controller-pilot Data Communication (Data Comm), conflict alerting, and conflict probing. Finally, ATCs practiced responding to SA and workload questions presented using an online probe technique while managing NextGen equipped and unequipped aircraft. The training phase reported in the present study was part of a larger simulation on situation awareness in a near-term NextGen environment. Results from other parts of the simulation are reported in papers by Kraut et al. (in press) [11], Strybel et al. (in preparation) [12], and Bacon et al. (in preparation) [13].
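Because SPAM scoring (ready latency for workload; accuracy and response time for SA) recurs throughout this paper, a compact statement of it may help. The sketch below summarizes logged probes as the SPAM measures; the tuple layout and field order are assumptions for illustration:

    from statistics import mean

    def spam_summary(probes):
        # Each probe is logged as (ready_latency_s, answer_latency_s, correct),
        # with None latencies for timeouts. Ready latency indexes workload;
        # answer latency plus accuracy index SA.
        answered = [p for p in probes if p[0] is not None and p[1] is not None]
        return {
            "workload_ready_latency_s": mean(p[0] for p in answered),
            "sa_accuracy": mean(1.0 if p[2] else 0.0 for p in answered),
            "sa_latency_correct_s": mean(p[1] for p in answered if p[2]),
            "timeout_rate": 1 - len(answered) / len(probes),
        }

    print(spam_summary([(2.1, 12.4, True), (3.5, 18.0, False), (None, None, None)]))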
2 Method 2.1 Participants Seven retired, radar-certified ATCs (M = 25.8 years of ATM experience) completed both a training and an experimental phase of this simulation study. As noted earlier, the present paper reports data only from the training phase. 2.2 Apparatus The simulation was run with the Multi Aircraft Control System (MACS) and Aeronautical Datalink and Radar Simulator (ADRS) software, developed by the NASA Ames Airspace Operations Laboratory [14]. Participant controllers were trained on combined sectors in the Indianapolis Air Route Traffic Control Center (ZID 91 & ZID 81). Within these sectors are two arrival streams to Louisville International Standiford Airport (SDF) from the west, departures from SDF, and overflights. Aircraft in the simulation were flown by "pseudopilots" who responded to controller commands for both voice-only and Data Comm equipped aircraft. ATCs
communicated with pseudopilots in an adjoining room via DagVoice, a voice-over-IP software package [15]. Twenty 50-minute scenarios were used. The scenarios were populated with 25 to 58 aircraft that were voice only, NextGen equipped (allowing use of Data Comm, conflict alerting, and conflict probes), or a combination of the two. Mixed-equipage scenarios were intended to model the gradual introduction of NextGen into the NAS. 2.3 Procedure Training took place over five days, beginning with a one-hour briefing regarding the general purpose of the study, the ZID 91/81 airspace, controller roles and responsibilities, and NextGen tools. Controllers were then shown the components of the DSR station and were trained on the sector procedures using the MACS software. Training took place in three phases. In the first phase, controllers were trained to manage traffic in the sector using current-day procedures and only voice communication. In the second phase, Data Comm, conflict alerting, and conflict probes were introduced. Data Comm is a digital controller-pilot communication system. Conflict alerts appear up to 8 minutes in advance of a loss of separation, but only when both aircraft in the conflict are Data Comm equipped; the alerts appear as flashing data tags on the controller's screen. Conflict probes allow the controller to bring up a trial planner to assess route modifications for potential traffic conflicts. The second training phase began after the controllers were familiar with traffic flows in the sector, so that they could learn the NextGen tools and integrate them with the traffic management strategies they had developed in the first phase. In the third phase, controllers used voice communication and NextGen tools for managing unequipped and equipped aircraft, respectively, and practiced answering probe questions. The administration of these probes followed the SPAM technique, in which a ready prompt was given to the controllers when a probe question was available. Probe questions were presented and answered on a separate touch-screen display. Once the controller indicated being ready, an SA or workload probe question was presented. The three training phases will subsequently be referred to as the voice, Data Comm, and probe phases. Each participant completed between 18 and 21 training trials; data from any trials beyond 18 were not included in these analyses. The number of trials spent in a particular training phase was determined by the instructor, who evaluated each controller's performance after a simulation run. Trials with missing data due to recording errors were not used in the analysis; missing data comprised less than 2% of the total trials. Controllers were trained to a criterion of two consecutive 50-minute trials with no losses of separation before the online workload and SA probe procedure was introduced.
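The equipage-dependent alerting rule can be made precise in a few lines of code. A sketch, assuming a conflict detector that supplies a predicted time to loss of separation (the function and parameter names are hypothetical):

    ALERT_HORIZON_MIN = 8.0  # alerts begin up to 8 minutes before a predicted LOS

    def show_conflict_alert(ac1_equipped, ac2_equipped, minutes_to_los):
        # Flash data tags only if BOTH aircraft are Data Comm equipped and
        # the predicted loss of separation falls within the alert horizon.
        return (ac1_equipped and ac2_equipped
                and minutes_to_los is not None
                and minutes_to_los <= ALERT_HORIZON_MIN)

    print(show_conflict_alert(True, True, 6.0))    # True: alert flashes
    print(show_conflict_alert(True, False, 6.0))   # False: mixed pair, no alert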
3 Results Controller performance was measured in terms of sector safety and efficiency. Unsafe conditions exist when two aircraft lose separation, meaning that the aircraft come within both 5 nm laterally and 1000 ft vertically of each other. Consequently, safety was measured by tracking the number of losses of separation (LOS). ATCs can ensure that aircraft stay on schedule by adjusting aircraft altitude, heading, and speed in
order to get the aircraft to their destination in the most direct manner [4]. Sector efficiency was measured in a variety of ways, including the distance and time taken for aircraft to travel through the controller's sector. 3.1 Training As shown in Table 1, the number of trials completed by each controller in the three training phases differed, F(2,12) = 4.57, p < .05. On average, controllers spent more trials in the Data Comm (M = 7, SEM = .58) and probe (M = 6.71, SEM = .71) phases than in the voice (M = 4.29, SEM = .36) phase.

Table 1. Number of trials completed by each controller in the three training phases

Controller    Voice    Data Comm    Probe
1             5        9            4
2             5        6            7
3             5        9            4
4             5        5            8
5             4        6            8
6             3        7            8
7             3        7            8
The effect of the online probe technique on controller performance was determined by comparing the number of trials needed for controllers to reach criterion (2 consecutive trials with no LOS) before and after the introduction of the probe questions. Only one controller was unable to reach criterion after the probes were introduced (see Table 2). For five controllers, the criterion was reached in the minimum number of trials following the introduction of the probes. The one exception was Controller 2, who required 6 trials to reach criterion after the introduction of the probes despite reaching criterion in the first 2 trials before them. This was likely due to a software malfunction during Controller 2's first trial with the probes.

Table 2. Number of trials to reach criterion (2 consecutive trials with 0 LOS) before and after introduction of the online probing technique

Controller    Trials to Criterion Before Probe    Trials to Criterion After Probe
1             4                                   2
2             2                                   6
3             7                                   *
4             8                                   2
5             4                                   2
6             9                                   2
7             9                                   2

*Controller 3 did not reach criterion after the introduction of the probes in the allotted training time.
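Both the safety measure and the criterion rule reduce to a few lines. The sketch below pairs the LOS test (both separation minima violated at once) with the two-consecutive-clean-trials rule used in Table 2; the data shapes are assumptions:

    def is_los(lateral_nm, vertical_ft):
        # Loss of separation: closer than 5 nm laterally AND 1000 ft vertically
        return lateral_nm < 5.0 and abs(vertical_ft) < 1000.0

    def trials_to_criterion(los_counts):
        # Return the (1-based) trial on which two consecutive zero-LOS trials
        # are completed, or None if the criterion is never reached (cf. the
        # asterisk for Controller 3 in Table 2).
        clean = 0
        for i, n_los in enumerate(los_counts, start=1):
            clean = clean + 1 if n_los == 0 else 0
            if clean == 2:
                return i
        return None

    print(trials_to_criterion([1, 0, 0, 2, 0, 0]))  # -> 3
    print(trials_to_criterion([1, 2, 0, 1, 0, 1]))  # -> None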
3.2 Safety Average LOS. On average, controllers had less than 1 LOS per scenario during the voice phase (M = .8, SEM = .15). For the Data Comm and probe phases, the average number of LOS decreased and was roughly equivalent (Data Comm: M = .47, SEM = .12; probe: M = .51, SEM = .16). A one-way repeated-measures ANOVA conducted on the mean number of LOS with phase as a factor was not significant, F(2, 12) = 2.06, p = .137 (see Fig. 1), probably because the range of LOS counts (0-4) was very small and the mode was 0. Five of the 7 controllers showed an average decrease in LOS from the voice phase to the Data Comm phase, and 1 controller (Controller 2) showed no change because he committed 0 LOS during these phases. Four of the 7 controllers showed an increase in average LOS with the introduction of the probes; however, 3 of these controllers had one trial with 2-3 LOS and no more than 1 LOS per trial in the remaining trials. Three of the 7 controllers showed a decrease in the number of LOS during the probe phase. 3.3 Efficiency Repeated-measures ANOVAs were conducted on two efficiency measures as a function of training phase: the average distance traveled through the sector and the average time taken by an aircraft to pass through the sector.
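Given the recorded track points for each aircraft while it was inside the sector, both efficiency measures fall out directly. A sketch, assuming fixes logged as (time_s, x_nm, y_nm) tuples in sector coordinates (the track format is hypothetical):

    import math

    def sector_efficiency(track):
        # Path length (nm) and transit time (s) from consecutive radar fixes
        dist = sum(math.hypot(x2 - x1, y2 - y1)
                   for (_, x1, y1), (_, x2, y2) in zip(track, track[1:]))
        return dist, track[-1][0] - track[0][0]

    # Example: three fixes, 12 s apart -> (5.0 nm, 24 s)
    print(sector_efficiency([(0, 0.0, 0.0), (12, 1.5, 2.0), (24, 3.0, 4.0)]))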
Fig. 1. Average number of LOS by controller and training phase
Average Distance through the Sector. A significant effect of training phase was found, F(2, 12) = 25.36, p < .001. Figure 2 shows the average distance for each phase by controller. Bonferroni post-hoc tests indicated that the average distance through the sector for the voice phase of training (M = 77.24 nm; SEM = .81 nm) was significantly lower than for the Data Comm phase (M = 81.61 nm; SEM = .83 nm), p = .002, as well as the probe phase (M = 79.83 nm; SEM = .80 nm), p = .006. A marginally significant difference was found between the Data Comm and probe phases, p = .08.
Fig. 2. Average AC distance traveled through the sector by controller and training phase
Fig. 3. Average aircraft time through the sector by controller and training phase.
Average Time through the Sector. The average time spent in the sector is related to the distance traveled, so it was not surprising that similar results were found. There was a significant effect of training phase on average time through the sector, F(2, 12) = 28.82, p < .001. Bonferroni post-hoc tests indicated that the average time through the sector for the voice phase (M = 611.08 s; SEM = .81 s) was significantly lower than for the Data Comm phase (M = 649.36 s; SEM = .83 s), p = .001, and the probe phase (M = 635.28 s; SEM = .80 s), p = .003 (see Fig. 3). The difference between the Data Comm and probe phases was not significant, p = .133.
4 Discussion Individual differences between controllers were seen in the number of trials spent in each training phase. The additional trials required in the Data Comm phase as compared with the voice phase suggest that learning the sector characteristics was easier than learning the NextGen tools, because controllers had extensive experience with current-day ATM techniques. As described by Sweller [3], the high cognitive load of learning new Data Comm commands and how to use the trial planner may have hindered schema acquisition for handling a sector with aircraft of mixed equipage, necessitating additional trials to learn the sector. As assessed by the number of trials to reach criterion, the introduction of the online probe technique did not disrupt the performance of most controllers with respect to safety. For 5 of the 7 controllers, criterion was reached in the minimum number of trials after the probes were introduced. Moreover, no effect of phase on LOS was found: neither the addition of Data Comm and the other NextGen tools nor the addition of the probe task significantly affected the ability of controllers to maintain aircraft separation. This finding is not surprising considering that the primary role of an ATC is to ensure safe separation between aircraft, and the controllers in the present study were highly experienced (M = 25.8 years of experience) with current-day air traffic management techniques. The training phase did, however, have an effect on both efficiency measures (average distance and average time through the sector), with controllers tending to move aircraft more efficiently through the sector in the voice phase than in the Data Comm or probe phases. Part of the benefit observed for the voice phase could be related to the ease of the scenarios in the first phase of training. For both efficiency measures, the least efficient phase was the one in which Data Comm and the trial planner were introduced. With the introduction of NextGen tools, efficiency initially declined but gradually improved with training. The reduction in efficiency seen in the Data Comm condition may be due in part to the cognitive demands of learning the Data Comm commands as well as learning how to use the trial planner tool to adjust aircraft routes in response to conflict probes. The introduction of probe questions also resulted in an initial decline in efficiency that subsequently improved with practice. However, several additional training trials appeared to be necessary for controllers to reach the same levels of efficiency
seen in the voice training phase. It should be noted that the marginally significant increase in efficiency (see Fig. 2) experienced by controllers in the probe phase was most likely indicative of a practice effect and not necessarily an effect of the online probe technique improving controller sector management skills. In summary, the present preliminary examination of the effectiveness of training on NextGen tools and online probe techniques showed that safety was not affected by the introduction of NextGen tools in the task environment examined, but sector efficiency may have been adversely affected. Note, however, that our results were obtained with controllers who were highly experienced in current-day air traffic management. For novice controllers, developing training programs that integrate NextGen training with basic ATM skills may be critical for the success of NextGen, both in the near and far term. Acknowledgements. This study was supported by NASA cooperative agreement NNX09AU66A, Group 5 University Research Center: Center for the Human Factors in Advanced Aeronautics Technologies (Brenda Collins, Technical Monitor).
References
1. Proctor, R.W., Van Zandt, T.: Human Factors in Simple and Complex Systems. CRC, Boca Raton (2008)
2. Salden, R.J.C.M., Paas, F., Broders, N.J., Van Merriënboer, J.J.M.: Instr. Sci. 32, 153–172 (2004)
3. Sweller, J.: Cognitive Load During Problem Solving: Effects on Learning. Cognitive Sci. 12, 257–285 (1988)
4. Wickens, C.D., Mavor, A.S., McGee, J.P.: Flight to the Future: Human Factors in Air Traffic Control. National Academy Press, Washington (1997)
5. Joint Planning and Development Office: Concept of Operations for the Next Generation Air Transportation System Version 1.2 (2007), http://www.jpdo.gov/library/NectGenOpsv12.pdf
6. Chiappe, D.L., Strybel, T.Z., Vu, K.-P.L.: Review of Situation Awareness Theories from the Perspective of Cognitive Science (2010) (submitted for publication)
7. Durso, F.T., Dattel, A.R.: SPAM: The Real-Time Assessment of SA. In: Banbury, S., Tremblay, S. (eds.) A Cognitive Approach to Situation Awareness: Theory and Application, pp. 137–154. Ashgate, Aldershot (2004)
8. Endsley, M.R.: Measurement of Situation Awareness in Dynamic Systems. Hum. Factors 37(1), 65–84 (1995)
9. Strybel, T.Z., Vu, K.-P.L., Kraft, J.: Assessing the Situation Awareness of Pilots Engaged in Self Spacing. In: Proceedings of the Annual Meeting of the Human Factors and Ergonomics Society, pp. 11–15. HFES, NY (2008)
10. Pierce, R.S., Strybel, T.Z., Vu, K.-P.L.: Comparing Situation Awareness Measurement Techniques in a Low Fidelity Air Traffic Control Simulation. In: 26th International Congress of the Aeronautical Sciences, pp. 1–8. Anchorage, AK (2008)
11. Kraut, J.M., Kiken, A., Billinghurst, S., Morgan, C.A., Strybel, T.Z., Chiappe, D., Vu, K.-P.L.: Effects of Data Communications Failure on Air Traffic Controller Sector Management Effectiveness, Situation Awareness, and Workload. To be presented at Human Computer Interaction International (2011)
12. Strybel, T.Z., Vu, K.-P.L., Bacon, L.P., Kiken, A., Billinghurst, S.S., Rorie, R.C., Morgan, C.A., Battiste, V., Johnson, W.: Situation Awareness, Workload, and Performance in Midterm NextGen: Effect of Dynamic Variations in Aircraft Equipage Levels. To be presented at the International Symposium on Aviation Psychology, Dayton, OH (2011)
13. Bacon, L.P., Strybel, T.Z., Vu, K.-P.L., Kraut, J.M., Nguyen, J.H., Battiste, V., Johnson, W.: Situation Awareness, Workload, and Performance in Midterm NextGen: Effect of Variations in Aircraft Equipage Levels between Scenarios. To be presented at the International Symposium on Aviation Psychology, Dayton, OH (2011)
14. Prevot, T.: Exploring the Many Perspectives of Distributed Air Traffic Management: The Multi Aircraft Control System MACS. In: Proceedings of the HCI-Aero, pp. 149–254 (2002)
15. Canton, R., Refai, M., Johnson, W.W., Battiste, V.: Development and Integration of Human-Centered Conflict Detection and Resolution Tools for Airborne Autonomous Operations. In: Proceedings of the 15th International Symposium of Aviation Psychology, Oklahoma State University, Columbus (2005)
Effects of Data Communications Failure on Air Traffic Controller Sector Management Effectiveness, Situation Awareness, and Workload Joshua M. Kraut, Ariana Kiken, Sabrina Billinghurst, Corey A. Morgan, Thomas Z. Strybel, Dan Chiappe, and Kim-Phuong L. Vu California State University Long Beach, Center for Human Factors in Advanced Aeronautics Technologies 1250 N Bellflower Blvd. Long Beach, CA 90840, USA {krautjosh,aegkiken,sbillinghurst,coreyandrewmorgan}@gmail.com, {tstrybel,dchiappe,kvu8}@csulb.edu
Abstract. Data communications (Data Comm) is a tool needed to implement future concepts of air traffic management envisioned by NextGen. A combination of voice and pilot-controller data communications will allow the National Airspace System to handle 2-3X current-day traffic by 2025. The performance, situation awareness, and workload of seven air traffic controllers were analyzed in a medium-fidelity, human-in-the-loop simulation in which a discrete Data Comm failure occurred after several trials with completely reliable Data Comm tools. We found that the Data Comm failure resulted in increased controller workload and decreased sector efficiency performance. However, the controllers were able to maintain safety in their sectors despite the Data Comm failure. Keywords: data communications failure, aviation, air traffic controllers, NextGen.
1 Introduction The Next Generation Air Transportation System (NextGen) is a modernization program for air traffic management. It will change the U.S. National Airspace System (NAS) to meet the needs of an increasingly populated airspace while maintaining safety and increasing efficiency. According to the Joint Planning and Development Office (JPDO) [1], NextGen's goal is to allow the NAS to handle traffic levels that are 2-3X current-day levels by 2025. To meet the goals of NextGen, new tools and technologies need to be developed to support NextGen's concepts of operation. One such tool, data communications (Data Comm), is a communications tool used by air traffic controllers (ATCs) and flight crews. Data Comm allows operators to share real-time spatial, weather, and security information, as well as the operational status of AC. Data Comm also enables ATCs and flight crews to upload or negotiate entire flight-plan changes in real time. Although voice communication has been shown to be an efficient form of controller-pilot communication [2], it can cause information transfer problems [3]. ATC clearances
may be executed by the wrong flight crew if that crew misidentifies the callsign, because voice transmissions are broadcast to all AC on the frequency. If the error is detected by the ATC during the pilot readback of the clearance, the ATC must correct the flight crew and retransmit the original clearance. Data Comm can reduce this miscommunication problem because Data Comm transmissions are received and observed only by the intended recipient. Furthermore, Data Comm transmissions can easily be referenced after they are received because they are stored as text messages. Data Comm messages should therefore reduce pilot workload, because the pilot does not have to hold the clearance in working memory. Despite the aforementioned benefits, Data Comm has some drawbacks as a method for ATC-pilot communication. In particular, the average transaction time for Data Comm is about twice as long as that for voice transmissions [4]. Furthermore, anecdotal evidence suggests that ATCs may incorrectly believe that a flight crew is complying with a Data Comm clearance when the ATC display merely indicates the message was transmitted [2]. ATCs may be unaware that a flight crew is not executing a clearance when the crew does not notice a Data Comm transmission, or when there is a failure in the Data Comm system. The combination of voice and Data Comm may allow each communication method to compensate for the other's drawbacks. In support of this, studies have shown that workload and the total number of communication actions executed are reduced when the two forms of communication are used in combination [2], [5]. However, this benefit may be lost when Data Comm fails because, under these conditions, operators will need to devote additional resources to verbally communicating with all flight crews. An increase in workload during this failure period could also take resources away from attaining and maintaining sector situation awareness (SA) and thus decrease operator performance. The goal of the present study was to determine the effect of Data Comm failure on operator performance, SA, and workload. In the nominal condition, Data Comm was 100% reliable. In the failure condition, Data Comm failed 10 minutes into the scenario, without controller knowledge of the failure. It was hypothesized that controllers would experience higher workload, lower SA, and lower sector efficiency performance in the failure scenario than in the nominal scenario. Controllers were still expected to maintain sector safety, though, because all of the controllers were highly experienced and should be able to prioritize their workload so that their primary goal of preventing loss of human life is achieved.
2 Method 2.1 Participants Seven retired, radar-certified ATCs participated in the simulation and were monetarily compensated for their participation. All participants were male, with an average of 25.4 years of civilian experience and 2.4 years of military experience as professional ATCs. The least civilian experience reported by any of the controllers was 21 years, and the most was 35 years. Three controllers reported no military experience, and one controller reported 9 years of military experience.
2.2 Apparatus A simulation "world" was created using the Multi Aircraft Control System (MACS) software to mimic ground-side and air-side operations in the NAS. MACS was created and distributed by the Airspace Operations Laboratory at NASA Ames Research Center [6]. The MACS software was configured to simulate the NAS and to allow participants to control the sector using current-day ATC tools and displays, as well as Data Comm, conflict alerting, and conflict probing. MACS was also configured to record ATC performance data. Four simulation worlds were run in parallel. A voice communications server was shared by all worlds, enabling voice communication between the pseudo pilots and their participant controllers. Each simulation world required seven dedicated workstations, which were used to run:
1. The Aeronautical Datalink and Radar Simulator (ADRS), which acts as a radar emulator and communications hub for all of the other workstations.
2. The simulation manager, which plays the scenario and controls all AC not owned by the pseudo pilot's station.
3. A pseudo pilot station, which controls all AC owned by the participant controller.
4. A ghost ATC station, which manages AC not owned by the participant controller.
5. The Creative Media Player software, which records voice communications between the participant controller and the pseudo pilot.
6. An ATC station with the participant controller's display of ZID sectors 91 and 81, a combined center sector used in the simulation to allow the ATC to handle arrival AC heading east to Louisville/Standiford Airport (SDF).
7. A touch-screen probe panel placed adjacent to the participant's primary ATC display. The touch-screen panel ran a custom Visual Basic program that displayed the ready prompt, the probe question, and the response buttons the participant touched to register responses.
The participant ATC workstations consisted of a standard QWERTY keyboard, a mouse, a 30'' monitor, and the probe panel. The participant controllers also used a push-to-talk headset to communicate via voice with pseudo pilots. All AC in the simulation were piloted by pseudo pilots (experimental confederates) who responded to, and initiated, voice transmissions with ATCs. 2.3 SPAM Probe Question Development and Implementation Two question categories, "status" and "conflict," were developed to assess participants' SA in the present study. Questions from the status category required knowledge of the traffic within the participant's sector. For example, a status question could ask which quadrant of the sector has the most Data Comm equipped AC, or from which direction most AC will enter the sector in the next 5 minutes. The conflict questions were related to conflicts or potential conflicts between AC in the controller's sector. For example, "Will any southbound AC create a conflict in the next 3 minutes (20 miles)?" All response options for the SA probe questions were
provided on the touch-screen probe panel in either true/false or multiple-choice format. Workload queries were also included to measure subjective workload during the simulation. For the workload probe questions, the ATWIT scale was used, where 1 = "Low" and 7 = "High." The probe queries were administered using the SPAM technique. The first probe query of each trial was delivered 4 minutes after the start of each scenario, and subsequent probe queries were delivered every 3 minutes. An audible tone sounded in the participant controllers' headphones simultaneously with the display of a ready prompt on the touch screen. If the ready prompt was not responded to within 1 minute, it was removed and no probe question was administered. If the participant responded affirmatively, a question immediately appeared on the touch screen with large response buttons visible under the question for each response option. Participants did not know in advance whether the probe question would be a workload probe or an SA probe. 2.4 SART and NASA-TLX SART questionnaires were administered post-trial to capture participants' subjective SA assessments. The SART consists of ten statements categorized into three dimensions: Demand, Supply, and Understanding. Responses for each statement were on a 7-point scale, where 1 = "Low" and 7 = "High." A Combined SART score, derived from the combination of the three subscales, was used to assess overall SA: Combined = Mean Understanding - (Mean Demand - Mean Supply). The NASA-TLX was used to assess the participant controllers' subjective experience of overall workload at the end of each trial. The NASA-TLX consisted of six subscales: Mental Demand, Physical Demand, Temporal Demand, Effort, Frustration, and Performance. Each subscale uses a 15-point scale, where 1 = "Low/Good" and 15 = "High/Poor." Part 2 of the TLX was not used to determine weights for each scale, because weighted and unweighted scores have been shown to be highly correlated [7]. Instead, a combined TLX score was derived by summing the six subscales and multiplying the sum by 1.11 to place it on a 100-point scale. 2.5 Procedure All controllers were given extensive training for one week prior to the experiment. They gained familiarity with sector 91/81 traffic patterns, rules, and requirements, and then with the NextGen Data Comm and conflict resolution tools. For the first phase of training, no AC were equipped with Data Comm. For the second phase, the ATCs were instructed in the use of Data Comm. All scenarios for the second phase, and for the rest of the experiment, used a mixed fleet of AC that were either Data Comm equipped or able to communicate only via voice. When two conflicting AC were both Data Comm equipped, the two AC visually indicated on the controller's screen that they were in conflict with one another. When one of the AC in a conflict pair was not Data Comm equipped, no indication of the conflict was displayed. One-third of the six conflicting AC pairs in each scenario were both Data Comm equipped, resulting in notification to the ATCs that the AC pair was in conflict. The third phase introduced probe questions
to familiarize the controllers with the procedure for answering probe questions. The controllers' progress was tracked over the week of training to ensure that all controllers had no loss of separation (LOS) on two consecutive trials in phase 2 before they were allowed to continue on to the experimental trials. Each of the simulation trials during the experimental phase lasted approximately 55 minutes. After each experimental trial, the participants completed a NASA-TLX and a SART. The first 6 trials of the larger study were completed under nominal Data Comm conditions; the failure scenario was then run. After finishing all 7 trials, the participants completed a post-experiment questionnaire, were debriefed, and were asked to give feedback about their experience.
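The two post-trial composites described in Sect. 2.4 are simple arithmetic. A sketch (the item groupings follow the standard 10-item SART; the ratings are illustrative):

    from statistics import mean

    def sart_combined(demand, supply, understanding):
        # Combined SART = mean(U) - (mean(D) - mean(S)); items rated 1-7
        return mean(understanding) - (mean(demand) - mean(supply))

    def tlx_combined(subscales):
        # Sum of the six unweighted 15-point subscales, rescaled toward 100
        assert len(subscales) == 6
        return sum(subscales) * 1.11

    print(sart_combined(demand=[5, 6, 4], supply=[4, 5, 4, 3],
                        understanding=[6, 5, 6]))
    print(tlx_combined([9, 4, 8, 10, 7, 10]))  # 53.28, near the nominal-scenario mean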
3 Results 3.1 Performance Separate repeated-measures analyses of variance (ANOVAs) were performed on the number of LOSs, average handoff accept time, average distance through the sector, and average time through the sector for managed AC as a function of scenario type (nominal vs. failure). Performance data from one of the participants were not analyzed due to an error recording the data in one of the scenarios. The handoff accept time, a sector efficiency indicator, was significantly lower for the nominal scenario (M = 39 s) than for the failure scenario (M = 53 s), F(1,5) = 6.90, p < .05. The average distance that an AC traveled through the sector, another efficiency measure, showed a marginal effect of scenario type, F(1,5) = 4.28, p < .10: the average distance tended to be lower in the nominal scenario (M = 73.6 nm) than in the failure scenario (M = 75.5 nm). There were no significant differences in the number of LOSs, a sector safety measure, or in the average time through the sector, a sector efficiency measure, between the nominal and failure scenarios, ps > .10. 3.2 Situation Awareness A repeated-measures ANOVA was performed on response time and accuracy of responses to the SA probe questions as a function of scenario type. Accuracy on the probe questions was significantly higher for the nominal scenario (M = 80%) than for the failure scenario (M = 62%), F(1,6) = 10.49, p < .05. There was no significant difference in response times to correctly answered probe questions between the two conditions, F < 1.0. A repeated-measures ANOVA was also performed on SART scores as a function of scenario type. The SART subscale measuring demand on resources was significantly lower in the nominal scenario (M = 4.62) than in the failure scenario (M = 6.29), F(1,6) = 15.91, p < .01. The supply subscale, the understanding subscale, and the Combined SART scores were not influenced by scenario type, ps > .10.
3.3 Workload A repeated-measures ANOVA was performed on the ATWIT workload probe ratings and ready prompt response times as a function of scenario type. The ATWIT workload probe ratings were significantly lower in the nominal scenario (M = 3.76) than in the failure scenario (M = 4.46), F(1,6) = 5.22, p < .05. However, there were no significant differences between the two conditions for ready prompt response time, p > .10. Another one-way repeated-measures ANOVA was performed on the combined NASA-TLX scores as a function of scenario type. Results showed that workload was lower in the nominal (M = 53) scenario than in the failure (M = 73) scenario, F(1,6) = 7.57, p < .05.
4 Discussion As hypothesized, controllers showed higher workload levels when Data Comm failed than when it was reliable in the nominal condition. This difference in workload was evident in the NASA-TLX administered post-trial and in the ATWIT workload ratings administered during the trial, but not in the ready latencies. A possible reason why the ready latencies did not show a workload difference is that a ready latency is only obtained when workload is low enough for a participant to read and attempt to answer a probe question. In the failure scenario, some of the ready prompts were ignored, so the ready latencies obtained came from periods of reasonable workload that were probably similar across the failure and nominal scenarios. The overall increase in workload associated with the Data Comm failure was therefore captured by the rating measures but not by the online probe procedure. It was hypothesized that the increase in workload associated with the Data Comm failure would reduce operator SA and performance. Both the SA probe accuracy scores and the demand subscale of the SART showed that the controllers had lower SA in the failure scenario than in the nominal one. The increase in workload and decrease in SA with the Data Comm failure reduced operator performance on two of three sector efficiency variables, but it did not affect sector safety, as measured by the number of LOSs. Thus, the controllers were able to maintain performance on their highest-priority task, ensuring safety, at a cost to the efficiency with which they managed the sector. It should be noted that the controllers in the present simulation were highly experienced, so this ability to maintain safety during a Data Comm failure may not generalize to novice controllers. At the end of the simulation, the controllers were also interviewed during the debriefing to obtain comments regarding the Data Comm failure. The controllers reported that because the failure in the system was not obvious, there was initial anxiety and confusion. Although controllers were frustrated by pilots not responding promptly to Data Comm clearances after the failure occurred (in fact, pilots did not respond at all), this lack of responsiveness prompted the controllers to realize that Data Comm was not working. Although aware of the failure, controllers continued to try to initiate Data Comm to determine whether the failure was temporary or permanent.
The controllers also remarked that trying to troubleshoot the problem while managing traffic increased their workload and the demands on working memory. In fact, we did not inform the first two controllers in the study about the Data Comm failure at all; these controllers indicated that it is essential that controllers be notified as soon as a failure is detected. As a result, we informed the remaining 5 controllers of the Data Comm failure 10 minutes after it occurred. All controllers reported that once they figured out Data Comm was not working, or were informed of the failure, their workload reverted to near-normal levels, but the demands placed on their working memory were still higher than normal. All controllers later indicated that they could adjust to the Data Comm failure, but that they needed to be informed immediately when a failure occurs. Otherwise, the initial confusion caused by the system's non-responsiveness, and the time needed to readjust strategies and procedures for managing traffic, can negatively affect their performance. Finally, the controllers indicated that there should be training for Data Comm failure and that controllers should know the procedures to employ when a failure occurs to ensure continued safe and efficient management of sector traffic. Controllers noted that their experience in managing traffic compensated for the automation failure, but that new operators must be trained on current traffic management techniques in case a failure does occur. Acknowledgments. This study was supported by NASA cooperative agreement NNX09AU66A, Group 5 University Research Center: Center for the Human Factors in Advanced Aeronautics Technologies (Brenda Collins, Technical Monitor).
References
1. Joint Planning and Development Office: Concept of Operations for the Next Generation Air Transportation System Version 1.2 (2007), http://www.jpdo.gov/library/NextGenConOpsv12.pdf
2. Talotta, N.J., et al.: Operational Evaluation of Initial Data Link En Route Services, vol. 1. Technical report, DOT/FAA/CT-90/1, I (1990)
3. Lee, A.T.: Display-Based Communications for Advanced Transportation Aircraft. Technical Memorandum 102187, NASA Ames Research Center, Moffett Field (1989)
4. Waller, M.C., Lohr, G.W.: A Piloted Simulation of Data Link ATC Message Exchange. Technical Paper 2859, NASA Langley Research Center, Hampton (1989)
5. Hinton, D.A., Lohr, G.W.: Simulator Investigation of Digital Data Link ATC Communications in Single-Pilot Operations. Technical Paper 2837, NASA Langley Research Center, Hampton (1988)
6. Prevot, T.: Exploring the Many Perspectives of Distributed Air Traffic Management: The Multi Aircraft Control System MACS. In: Proceedings of the HCI-Aero, pp. 149–254 (2002)
7. Nygren, T.: Psychometric Properties of Subjective Workload Measurement Techniques: Implications for their Use in the Assessment of Perceived Mental Workload. Hum. Factors 33(1), 17–33 (1991)
Pilot Information Presentation on the Flight Deck: An Application of Synthetic Speech and Visual Digital Displays Nickolas D. Macchiarella, Jason P. Kring, Michael S. Coman, Tom Haritos, and Zoubair Entezari College of Aviation, Embry-Riddle Aeronautical University 600 S Clyde Morris Blvd, Daytona Beach, Florida 32114 {dan.macchiarella,jason.kring,michael.coman,tom.haritos}@erau.edu, [email protected]
Abstract. Integration of synthetic speech for Next Generation Air Transportation System (NextGen) communicative purposes is in its infancy. Integration of synthetic speech on the flight deck has the potential to improve air traffic control (ATC) and pilot communications through a multimodal presentation of critical information. In a synthesized speech system, digital-data traffic from ATC is converted into a synthetic, or computer-generated, voice for presentation to the pilot. Parameters to implement a synthetic speech system on the flight deck, as a means of optimizing communications between ATC and pilots, are under study at Embry-Riddle Aeronautical University in conjunction with the FAA Human Factors Research and Engineering Group for NextGen (AJP-61) and the John A. Volpe National Transportation Center. Keywords: Synthetic Speech, NextGen, Data Comm, Air Traffic Control, Flight Training Device.
1 Introduction
The national air transportation system is undergoing an evolutionary process to become more efficient in order to keep pace with growing air traffic and new technologies. Commercial and general aviation flight deck systems generate a variety of sounds used for auditory cueing. Synthesized voice communication (i.e., synthetic speech) is one of the tools aircraft designers can integrate into the flight deck. Synthetic speech can present data communication (Data Comm) type messaging from Air Traffic Control (ATC) to pilots. The aim of applying synthetic speech systems on the flight deck is to assist the pilot by reducing memory load, decreasing interruptions, decreasing frequency congestion, and reducing call-sign confusion (Bürki-Cohen & Macchiarella, 2010). Researchers at Embry-Riddle Aeronautical University (ERAU), in conjunction with the FAA Human Factors Research and Engineering Group for NextGen (AJP-61) and the John A. Volpe National Transportation Systems Center, are conducting a study using the Frasca 172 Flight Training Device (FTD). The FTD is an FAA-qualified, Level 6 device with a 220-degree wraparound visual system and enhanced aerodynamic modeling.
The research team capitalizes on its access to a large population of experienced professional pilots, advanced flight simulators, and virtual air traffic / air traffic control simulation that can replicate real-world communications. The Data Comm message set proposed by the Requirements and Technical Concepts for Aviation Sub-Committee 214 (FAA, 2011) is in use for the research. Synthetic speech can be adapted to deliver NextGen-type digital communications. As a result of NextGen transformations, pilots will be challenged by the presentation of greater levels of visual information on the flight deck.
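The SC-214 message set itself is not reproduced in this paper. As a rough illustration of the kind of structured clearance a Data Comm uplink carries, the sketch below defines a hypothetical message type in Python; the field names are assumptions for illustration, not the SC-214 schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataCommUplink:
    """Hypothetical ATC-to-pilot Data Comm message (not the SC-214 schema)."""
    callsign: str            # addressed aircraft, e.g. "ERU123"
    message_id: int          # sequence number for pairing replies to uplinks
    text: str                # clearance text to display and/or annunciate
    requires_response: bool = True
    sent_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Example: an altitude clearance to be shown on a visual display and, in a
# synthetic speech condition, also converted to audio.
uplink = DataCommUplink("ERU123", 42, "CLIMB TO AND MAINTAIN 7000")
print(f"{uplink.callsign}: {uplink.text}")
```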
2 Flight Deck Presentations for Data Comm
As the amount of information presented on the flight deck grows, pilots will increasingly be constrained by the amount of information that can be perceptually processed at any given time. NextGen will enable pilots to access higher levels of information, in a more timely fashion and with more precision. Pilots will attempt to adapt by spreading attention across multiple informational inputs (Eriksen & St. James, 1986). If an item of interest is detected, attention can be re-allocated so that the phenomenon can be investigated. Without integrating new approaches to the human-machine interface on the flight deck, pilots could increasingly focus their attention inside the aircraft.
A synthetic speech system has the potential to reduce pilot-ATC voice traffic and decrease miscommunication. Synthetic speech used in conjunction with visually presented Data Comm may decrease workload, decrease memory errors, and reduce misunderstanding when compared to traditional spoken communications. As part of this effort, the researchers at ERAU developed a Visual Digital Display designed for single-pilot use in a confined cockpit (see Figure 1). Cardosi, Lennertz, and Donohoe (2010) reviewed various related studies that led to several constructs addressing the possible effects of synthetic speech during the performance of piloting tasks. By reducing workload in the visual domain, and reducing the need to focus attention heads-down while piloting, pilots are afforded more opportunity to perform the primary task: flying the aircraft. This is of particular concern for single-pilot flight operations, which are often experienced by general aviation pilots and corporate pilots under instrument meteorological conditions (IMC).
Using visually presented Data Comm messages in conjunction with computer-synthesized speech (i.e., text annunciations) could shift workload from the auditory domain to the visual domain. However, it is possible that using visually presented Data Comm for communication could add more load to the visual task. Heads-down time is an important determinant in understanding the task load affecting a pilot, and it is significantly reduced by reducing the visual workload focused inside the cockpit; this is particularly important during single-pilot operations in IMC. The manual task of accessing and responding to visually presented Data Comm messages might be performed more quickly than listening to and understanding an annunciated data link message. Several studies addressing such messages have produced mixed outcomes (Cardosi, Lennertz, & Donohoe, 2010). In one study, synthetic speech elicited faster response times than the presentation of plain text alone on a digital display. Another study, using two-pilot crews flying an airline-style flight profile,
Fig. 1. Visual Digital Display™ Human-Machine Interface for Single Pilot Use
found that the pilot-not-flying had increased heads-down time. These observations occurred during all instances of digital display use. Pilots' preferences for auditory-only, text-only, or redundant displays were also mixed. A study by Lancaster and Casali (2008) examined single-pilot flight operations while communicating via a data link. Pilots received ATC instructions in four formats: text only, digitized speech (i.e., recorded voices) only, synthesized speech (i.e., text converted to speech) only, and synthesized speech with text (i.e., a mixed-modality presentation). Pilot performance varied based upon the presentation format. Pilots more frequently rated the workload for text-only communications as dangerous. The workload ratings for digitized speech and for synthesized speech with text did not differ. The text-only group produced the greatest amount of heads-down time; there was no difference in heads-down time among the other groups. These results could suggest that the use of a textual data link alone in a GA
environment leads to lower levels of performance. While no significant performance benefits were observed in the mixed-modality condition (i.e., synthesized speech with text) when compared to the speech-only condition, the researchers concluded that the mixed-modality condition, which provides a semi-permanent record of the ATC instruction, may be the safest.
3 Apparatus
The Synthetic Automated Flight Training Environment with Virtual Air Traffic (SAFTE-VAT) serves as a means of creating realistic communications while flying FAA-certified flight simulations. SAFTE-VAT engages pilots with virtual ATC and virtual air traffic communications. Designers developed SAFTE-VAT for use during simulation-based flight training, with the aim of decreasing instructor pilot workload and increasing behavioral fidelity when pilots are immersed in a virtual training environment (Macchiarella & Doherty, 2007). Embry-Riddle Aeronautical University and Frasca International Inc. jointly developed SAFTE-VAT to create virtual air traffic control and virtual air traffic that instructional designers can apply during scenario-based training.
The SAFTE-VAT system uses triggers to initiate synthetic speech or voices recorded in a digital format. The triggers include speech recognition, the location of the training aircraft in the virtual environment, time, or a specific event in the simulation (see Figure 2). Student pilots interact with SAFTE-VAT while they are flying scenarios; additionally, the pilot using the FTD has access to the normal functionality afforded by the simulation. SAFTE-VAT receives the pilot's speech via the FTD's radio functionality. The speech is parsed into text and compared to anticipated text via a relational database. A scripted text-based response is selected and synthesized into an audio format (see Figure 3). The audio is presented to the pilot through the FTD's radio functionality. In essence, the pilot perceives that his or her speech was understood by ATC and/or air traffic. SAFTE-VAT allows the simulation to communicate with the pilot as if the flight were occurring in the real world. The communication is half-duplex and occurs only through the radio functionality after the pilot selects and implements the correct frequency (e.g., the pilot selects and tunes a frequency, then presses the transmission switch to begin communication) (Macchiarella, 2008).
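The communication loop just described (recognize the pilot's speech, match it against anticipated text, and synthesize a scripted reply) can be sketched compactly. The following Python sketch is a reconstruction under assumed function names and a toy response table; it is not the SAFTE-VAT implementation, which matches parsed speech against a relational database.

```python
# Minimal sketch of a recognize-match-respond loop for a virtual ATC.
SCRIPTED_RESPONSES = {
    "daytona tower cessna one two three ready for departure":
        "Cessna one two three, Daytona Tower, cleared for takeoff.",
}

def recognize_speech(audio: str) -> str:
    # Placeholder: in this sketch the "audio" is already a transcript.
    return audio

def synthesize(text: str) -> str:
    # Placeholder for the text-to-speech stage.
    return f"<synthetic audio: {text}>"

def handle_transmission(audio: str, on_tuned_frequency: bool):
    # Half-duplex: only transmissions on the tuned frequency are processed.
    if not on_tuned_frequency:
        return None
    text = " ".join(recognize_speech(audio).lower().split())
    reply = SCRIPTED_RESPONSES.get(text)     # anticipated-phrase lookup
    return synthesize(reply) if reply else None

print(handle_transmission(
    "Daytona Tower Cessna one two three ready for departure", True))
```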
Fig. 2. Virtual Air Traffic Trigger Selection
Fig. 3. Scripting Interface for Synthetic Speech Annunciations and ATC Specifications
4 Method
The researchers are addressing the question, "Does supplementing text-only communications with synthetic speech mitigate some of the text-only presentation risks in a single-pilot environment?" NextGen applications of synthetic speech could occur in airspace ranging from virtually unstructured, low-density airspace with aircraft operating under visual flight rules to highly structured, densely packed airspace with traffic operating under instrument flight rules. An experiment planned for the third quarter of 2011 will examine the central research question: Does supplementing text Data Comm with synthetic-speech Data Comm mitigate some of the Data Comm risks while maintaining its benefits? The study uses 24 pilots experiencing visually presented Data Comm (see Figure 1) and visually presented Data Comm with a synthetic speech annunciation. The independent variable is the presentation of a synthetic speech annunciation. The dependent variables include:
• Message Response Time: from the arrival of the visual message until a response is sent with the Visual Digital Display
• Control Input Response Time: upon receipt of an ATC message, the pilot's response time to comply, as measured by throttle, aileron, elevator, and rudder deflections
• Pilot Accuracy: airspeed, altitude, heading, turn rate, climb rate, coordinated flight, pitch, staying on course, proper frequency
• ATC Verbal Queries: number of queries to ATC
• Heads-Down Time: time pilots spend looking down at displays inside the flight deck
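As a concrete illustration of how the first of these measures could be derived from simulator logs, the sketch below computes message response time from timestamped events; the event format and names are assumptions, not the study's logging format.

```python
# Hypothetical event log: (time_in_seconds, event_name, message_id)
events = [
    (120.0, "uplink_displayed", 7),   # visual message arrives on the display
    (133.5, "response_sent", 7),      # pilot answers via the Visual Digital Display
    (300.0, "uplink_displayed", 8),
    (311.2, "response_sent", 8),
]

arrivals = {mid: t for t, name, mid in events if name == "uplink_displayed"}
responses = {mid: t for t, name, mid in events if name == "response_sent"}

message_response_times = {
    mid: responses[mid] - arrivals[mid] for mid in arrivals if mid in responses
}
print(message_response_times)   # {7: 13.5, 8: 11.2}
```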
The Visual Digital Display is a Samsung Q1EX-71g UMPC tablet PC with a 7" touch screen, integrated with an FAA-qualified Level 6 Flight Training Device. Bidirectional Data Comm messages move between the Visual Digital Display and the computer hosting SAFTE-VAT and the instructor's workstation (see Figure 4).
Fig. 4. Instructor’s Workstation
5 Discussion
A synthesized speech system may be advantageous for enhancing information presentation and reducing heads-down time on the flight deck during certain phases of flight. In a synthesized speech system, digital data traffic from Data Comm is converted into a synthetic, or computer-generated, voice for auditory presentation to the pilot. The pilot can respond to the information via a digital display on the flight deck. Presently, commercial and general aviation flight deck systems generate sounds and automated speech statements that are used for auditory cueing of equipment malfunctions or hazards, such as those from ground proximity warning systems (GPWS). For NextGen, expanding the application of synthetic speech systems to encompass data and instructions from ATC will require the creation of design standards to facilitate safe operations and serve as the first step toward widespread adoption in commercial aviation.
Modern synthetic speech possesses several weaknesses in comparison to natural speech. Although there have been improvements in recent years, synthetic voices today are still limited in flexibility, in the ability to convey subtleties or emotion, and in quality. In the context of a Data Comm system, there are several important considerations regarding the characteristics of a given synthetic speech system. The goal of this study is to clarify the effect of synthetic speech in a single-pilot setting.
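For readers who want to hear the kind of voice quality at issue, an off-the-shelf text-to-speech engine can annunciate a sample clearance. The snippet below uses the pyttsx3 Python library purely as one convenient example; it is not the synthesis engine used in this study.

```python
# Annunciating a sample clearance with an off-the-shelf TTS engine (pyttsx3).
# This only demonstrates typical synthetic-voice quality; it is not the
# study's synthesis pipeline.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)  # words per minute; slower aids intelligibility
engine.say("Cessna one two three, climb and maintain seven thousand.")
engine.runAndWait()
```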
6 Conclusion
As the national air transportation system undergoes an evolutionary process to meet the demands of growing commercial and general aviation infrastructure, the integration of a synthetic speech system on the flight deck, to supplement Data Comm information, has the potential to improve communications between ATC and pilots. Research efforts will help determine the effect of synthetic speech when used in conjunction with digital displays for communication during flight, and subsequently highlight a path for Data Comm-enabled communications. As a result of NextGen transformations, pilots will be exposed to a greater level of information, delivered in a timelier and more precise manner than traditional voice communications.
References
1. Bürki-Cohen, J., Macchiarella, N.D.: 06-02: DataComm in Single-Pilot Operations: Synthetic Speech to Supplement Visual Display. Area 6: Data Communications. FAA ATO-P Human Factors Research and Engineering, Data Communications Working Group Meeting (2010)
2. FAA: RTCA SC-214 / EUROCAE WG-78 Standards for Air Traffic Data Communication Services (2011), http://www.faa.gov/about/office_org/headquarters_offices/ato/service_units/techops/atc_comms_services/sc214/current_docs/
3. Eriksen, C.W., St. James, J.D.: Visual attention within and around the field of focal attention: A zoom lens model. Perception and Psychophysics 40, 225–240 (1986)
4. Cardosi, K., Lennertz, T., Donohoe, C.: Human Factors Research Plan for Flight Deck Data Communications. FAA ATO-P Human Factors Research and Engineering Group, USDOT Volpe National Transportation Systems Center, Boston (2010)
5. Lancaster, J.A., Casali, J.G.: Investigating pilot performance using mixed-modality simulated data link. Human Factors 50(2), 183–193 (2008)
6. Macchiarella, N.D., Doherty, S.M.: High Fidelity Flight Training Devices for Training Ab Initio Pilots. In: Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), Orlando, FL (2007)
7. Macchiarella, N.D.: Advancements in Flight Training Devices for Ab Initio Pilot Use, AIAA 2008-6848. In: Proceedings of the AIAA Modeling and Simulation Technologies Conference and Exhibit, Honolulu, HI, pp. 18–21 (2008)
How Data Comm Methods and Multi-dimensional Traffic Displays Influence Pilot Workload under Trajectory Based Operations
Jimmy H. Nguyen1, L. Paige Bacon1, R. Conrad Rorie1, Meghann Herron1, Kim-Phuong L. Vu1, Thomas Z. Strybel1, and Vernol Battiste2
1 California State University Long Beach, 1250 Bellflower Blvd., Long Beach, CA, USA
2 NASA Ames Research Center, Flight Deck Display Research Lab (FDDRL), Moffett Field, CA, USA
{Jimmy.Nguyen,Lauren.Bacon,Robert.Rorie}@student.csulb.com; [email protected]; {kvu8,tstrybel}@csulb.edu; [email protected]
Abstract. The goal of the present study was to examine the impact of different data-communication (Data Comm) methods and the use of multi-dimensional displays (2-D or 3-D) on pilot workload when Trajectory Based Operations (TBO) are employed. Eight pilots flew simulated enroute flights using an integrated (FANS-1A) or non-integrated (ACARS) Data Comm method. Pilots were also asked to rate the workload and acceptability of a route modification with the different Data Comm methods. Online assessments during the flight simulation showed no difference in pilot ratings of workload and route acceptability. However, in post-trial questionnaires, pilots reported an overall preference for FANS as a Data Comm method compared to ACARS. The display type did not change pilots' positive ratings for the FANS method, but 3-D displays increased the operators' ability to understand the proposed flight plan changes when they used ACARS. Keywords: Data Comm, 2-D displays, 3-D display, ACARS, FANS-1A, Trajectory Based Operations, Workload, NASA CSD.
1 Introduction
According to the FAA, by 2030 large airport hubs will see a 70.6% increase in passenger enplanements [1]. With this increase in demand, the capacity of the current radar-based, ground-centered air traffic management (ATM) system will be exceeded. Moreover, other limitations of the current ATM system, such as the number of gates, runways, and airways, will lead to delays, fuel inefficiency, and unsafe flying conditions. The Next Generation Air Transportation System (NextGen) is being developed to overcome these problems. NextGen encompasses a systematic overhaul of the air traffic control system based on new procedures and advanced technologies. These new technologies need to be evaluated to determine how they affect flight crew performance. The present study evaluates two of these technologies: Data Comm and multi-dimensional displays.
Currently, ATC-pilot communications are almost exclusively voice based, which may be inadequate for ensuring safe separation in the high-density traffic environments envisioned for the future. One solution for reducing radio communication congestion is to use pilot-controller Data Comm. Data Comm is a digital, text-based messaging system for sending trajectory changes between pilots and ATCs. In its simplest form, pilots receive a text message and input the changes into their flight management system upon evaluating the content for flight safety. However, if Data Comm is integrated into the flight management system (FMS), these flight plan changes can be uploaded directly, without the need for manual input.
Flight deck technology traditionally uses a 2-D top-down navigation display (NAV), accompanied by a primary flight display, to indicate an aircraft's horizontal and vertical position. In NextGen, multi-view, 3-D displays can be integrated into the flight deck to allow pilots to visually inspect trajectory changes and make decisions based on them. Initial studies comparing 2-D and 3-D displays on flight decks showed that 2-D displays provided a more accurate representation of spatial information, which allowed pilots to make better distance and spatial judgments. This performance benefit for 2-D displays was attributed to the fact that the display screens themselves were 2-D: with 3-D displays, the axes must be compressed into a 2-D format, which creates spatial ambiguity [3], [4], [5], [6]. However, more recent traffic displays, such as the Cockpit Situation Display (CSD) from NASA's Flight Deck Display Research Laboratory (FDDRL) [7], provide the operator with a manipulable 3-D volumetric display that has been shown to overcome the spatial ambiguity issues associated with traditional 3-D displays. Moreover, there is evidence that this type of 3-D display can yield performance equivalent to that obtained with 2-D NAV displays [8]. One study that examined the use of Data Comm with 2-D and 3-D displays showed that pilots had better judgment of their aircraft's heading and position when flight path changes provided by the controller were integrated into a 3-D display [2].
The present study extends this work by examining the impact of different Data Comm methods and a multi-view display on pilot ratings of workload and route acceptability. We examined two Data Comm methods: (a) ACARS, a commercially available text messaging system present in most current-day commercial aircraft; and (b) FANS-1A (FANS), a more advanced, integrated Data Comm method in which route clearances can be uploaded directly into the FMS. With ACARS, flight information is manually entered by pilots into the FMS. Because ACARS involves more steps for making flight path changes than FANS, it should require more time to complete a task and may increase pilot workload. We also evaluated whether the type of graphical display, 2-D or 3-D, could affect operator workload and route acceptability ratings with each Data Comm method. Given that visual integration of flight path information increases an operator's awareness of his or her position [2], it was hypothesized that, although ACARS may require more workload to implement a flight change than FANS, this workload could be mitigated by the integrated flight path visualization provided by a 3-D display.
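The operational difference between the two methods reduces to a branch in how an uplinked route reaches the FMS. The sketch below is a schematic illustration under assumed class and function names, not the FANS-1A or ACARS protocol; the waypoint names are arbitrary.

```python
# Schematic contrast between integrated (FANS-1A-like) and non-integrated
# (ACARS-like) handling of an uplinked route modification.
class FMS:
    def __init__(self):
        self.route = []

    def load(self, route):          # integrated: whole route loaded at once
        self.route = list(route)

    def enter(self, waypoint):      # non-integrated: one manual entry at a time
        self.route.append(waypoint)

    def execute(self):
        print("executing route:", self.route)

def handle_uplink(route, integrated: bool, fms: FMS, accept: bool = True):
    if not accept:                  # crew evaluates flight safety first
        return "rejected"
    if integrated:
        fms.load(route)             # FANS-1A-like: loaded directly into the FMS
    else:
        for waypoint in route:      # ACARS-like: pilot keys each waypoint in
            fms.enter(waypoint)
    fms.execute()
    return "executed"

handle_uplink(["FIM", "SYMON", "DERBB"], integrated=True, fms=FMS())
```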
2 Methods
This study was part of a larger study called Separation Assurance using FANS-1 Automation for Resolution Implementation (SAFARI). The goal of SAFARI was to examine flight crew and ATC negotiations using Trajectory-Based Operations (TBO) procedures in the enroute phase of flight. SAFARI was conducted over a two-week period. The present study reports data from the second week of the simulation, using eight specific trials in which the participant flight crews flew simulated aircraft that were FANS-1A or ACARS equipped, with traditional 2-D displays. In addition, eight trials were added to the end of the SAFARI simulation that enabled flight crews to use a 3-D display, the NASA FDDRL CSD.
2.1 Participants
Eight air transport pilots (6 Captains and 2 First Officers) participated as part of four flight crews. Five pilots reported having more than 3,000 hours in "glass" cockpits, while the remaining three reported having less than 3,000 hours of glass cockpit experience. For Data Comm exposure, six of the eight pilots reported having experience communicating with non-integrated Data Comm (ACARS). Only two pilots reported having experience communicating with integrated Data Comm (FANS-1A) through actual flight time. These eight participants will be referred to as the "participant pilots" or "experimental flight decks". Other aircraft in the sector were flown by "pseudopilots", who were researchers working in the lab. Two radar-certified controllers managed the active sectors, but the present study only reports data from the participant pilots.
2.2 Design
A 2 (Display type) x 2 (Data Comm type) factorial design was employed to examine the relationship between type of display and type of Data Comm method on pilot ratings of workload and route acceptability. For display type, pilots were presented with a fixed 2-D display similar to the current-day NAV display, with a PFD, on the first half of the trials, and the NASA FDDRL 3-D CSD on the second half of the trials. For Data Comm type, integrated and non-integrated Data Comm systems were simulated. The integrated Data Comm system was modeled after FANS-1A; this system allowed route modifications to be loaded directly into the flight management system. The non-integrated Data Comm system, modeled after ACARS, required pilots to manually input route modifications into the FMS.
2.3 Dependent Variables
Workload probes were presented in real time. Flight crews were queried individually on their current workload (1 = low and 5 = high) every two minutes; however, these workload probes were replaced by route acceptability probes based on the actions of the flight crew. If the flight crew executed a route modification, a route acceptability probe was presented instead of a workload probe at the next interval. The route acceptability probe asked the flight crews to rate the acceptability of the recently
executed route by judging the quality of the route modification on a scale of 1 to 5 (1 = poor, 3 = good, 5 = best). In addition to the workload probes, post-trial workload ratings were obtained from the participant pilots at the end of each simulation run.
2.4 Apparatus
The simulation environment was set up using the Multiple Aircraft Control System (MACS), developed by the NASA Ames Airspace Operations Laboratory [9], and the 3-D CSD, developed by NASA Ames' FDDRL [7]. Four desktop pilot stations were used to simulate multiple low-fidelity flight decks. Each station was manned by a flight crew of two pilots, one serving as Captain and the other as First Officer. Each station contained five displays:
• two standard monitors: the CSD (2-D or 3-D) and the PFD
• two individual touch screen displays: the Mode Control Panel (MCP) and a probe panel displaying the workload and route acceptability probes
• one shared touch screen monitor with a MACS display
These displays were used to mimic a flight deck environment. The two pilots shared a center touch screen display containing a simulated Boeing 777 MCP, two CDUs, and an EICAS display used to alert pilots to incoming Data Comm messages. The shared touch screen display was used to emulate the physical actions performed on an actual MCP and CDU interface. The touch screen monitor for the shared display was an ACER T230H, a 23" wide-screen TFT LCD. Additionally, each pilot had a monitor displaying a CSD and a PFD next to the shared touch screen display.
2.5 FANS-1A and ACARS Procedures
To integrate TBO with both methods of Data Comm, procedures specifying how to proceed under specific levels of flight deck equipage were developed for pilots and ATCs. For aircraft equipped with FANS-1A Data Comm, the flight crew responded to ATC uplinks of route modifications by accepting, loading, and executing the new flight plan, or rejecting it. Once an uplink was accepted, the flight crew was able to examine the route modification via the CSD (in either 2-D or 3-D format, depending on the condition) and could load this route into the FMS if it was appropriate, or reject it based on flight safety considerations. If the new route was acceptable, the flight crew could execute it. If the route was rejected, crews could request another route or propose one to the ATC via the FANS-1A procedures provided to them.
For aircraft equipped with ACARS, the procedure required the flight crew to manually input waypoints and/or lat-long coordinates uplinked by ATC into the appropriate leg of their flight plan through the use of the CDU. Once the input was complete, the crew examined the proposed route and either accepted or rejected it based on flight safety. Since ACARS route modifications could not be directly loaded into the FMS, route modifications required the crew to key in new waypoints and lat-long
coordinates into the CDU within a certain time window to make sure the aircraft stayed on course. If the flight crew could not approve and execute the proposed route modification within this time window, a contingency flight plan was provided to them.
2.6 Procedure
Before the SAFARI experimental trials began, participant pilots and ATCs were briefed on their task, the types of aircraft, Data Comm equipage, and TBO procedures. Each pilot was then assigned to a flight crew, as Captain or First Officer, based on their reported flight experience. In addition, each pilot was assigned the role of "pilot flying" or "pilot managing". The pilot managing role required the pilot to input ATC uplinks into the CDU; the pilot flying made the final decision on whether to accept or reject a proposed route. Both pilot roles were counterbalanced so that each pilot served as pilot flying or pilot managing on an equal number of trials.
After the roles were assigned, the flight crews were trained on the simulation configuration. The training modules addressed the flight deck setup, the functions of the MCP, CDU, CSD, PFD, and radio control, pilot responsibilities, communications, and answering probe questions. Each module was followed by a skills test that required each pilot to answer questions about the module on which they were trained. Training modules were not considered complete until pilots successfully answered all skills test questions. Once the training was completed, the experimental trials began.
At the beginning of each trial, flight crews were assigned a communication equipage level for their aircraft and given a reference guide for the specific Data Comm procedures to be used. The scenarios included weather that required crews to downlink a route modification request to ATC for weather avoidance. Flight crews were to follow their assigned Data Comm procedures when communicating with ATC, unless maintaining flight safety was at risk; in that case, they were to revert to voice communication.
With FANS-1A Data Comm equipage, the crews were able to insert waypoints into the CDU to create conflict-free flight paths that were downlinked to the ATC. The ATC could then accept the proposed route, reject it, or reject it and propose a new flight plan. If the ATC accepted the route, the flight crew received an acceptance message, prompting them to execute the flight plan via the CDU. If the ATC rejected the route, flight crews were able to propose another route or wait for a flight plan to be uplinked to them from the ATC. Under ACARS equipage, the flight crews constructed a flight plan by sending a free-text message to the ATC. The ATC would construct that route modification using a trial planner tool and then accept it or propose another route to uplink back to the flight deck. If the ATC uplinked the requested route, the flight crew would follow the ACARS procedure to accept the route and confirm it with ATC.
During each scenario, pilots and controllers were probed on their workload every two minutes. The availability of a workload query was signaled by an audio chime to alert the pilots to an awaiting workload question. Pilots responded to the workload queries by providing a rating of 1 to 5, with 1 being low workload and 5 being high workload. The probe station also presented event-triggered route acceptability queries instead of the workload queries. If the flight crew executed a route modification by pressing the "execute" button on the CDU, the next probe query would be a route acceptability probe instead of a workload probe. For the route
acceptability ratings, the scale was also 1 to 5 (1 = poor, 3 = good, and 5 = best route modification). After each trial, pilots were also required to complete post-trial questionnaires. Experimental observers were always on standby to assist flight crews with any procedural questions.
At the end of the SAFARI simulations, which used the CSD in 2-D display mode, pilots were briefed on the use of the 3-D features of the CSD. Pilots were told that their flight responsibilities and the Data Comm procedures remained the same as in the SAFARI study. However, instead of viewing the weather, traffic, and flight plans using the 2-D view and PFD, pilots would be able to manipulate the 3-D CSD with their computer mouse to rotate the airspace in any direction. In addition, the weather was displayed in a 3-D format as well, simulating NexRad weather. Pilots were trained on the 3-D CSD before the experimental trials began. Workload and route acceptability probes, and the post-trial questionnaires, were administered in the same manner as in the SAFARI study. Once pilots completed the supplemental trials, they completed a post-simulation questionnaire.
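The probe schedule described above (a workload query every two minutes, preempted by a route acceptability query once a route modification has been executed) can be stated compactly. The sketch below is a reconstruction of that logic, not the probe station's code.

```python
def probe_schedule(executed_since_last_probe):
    """Yield the probe type presented at each two-minute interval.

    executed_since_last_probe: one bool per interval, True if the crew
    pressed "execute" on the CDU since the previous probe.
    """
    for executed in executed_since_last_probe:
        yield "route_acceptability" if executed else "workload"

# Example: a route modification was executed before the third probe only.
print(list(probe_schedule([False, False, True, False])))
# -> ['workload', 'workload', 'route_acceptability', 'workload']
```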
3 Results
Workload ratings, route acceptability ratings, and comments on post-experimental-trial questionnaires were analyzed. Separate 2 (Display) x 2 (Data Comm) ANOVAs were conducted on the pilots' ratings of workload and route acceptability during the experimental trials. The same ANOVA was performed on other ratings provided by the participant pilots on the post-trial questionnaire relating to flight safety, efficiency, ability to communicate with ATC, and overall/peak workload for the enroute and weather avoidance phases of flight.
3.1 Workload and Route Acceptability
Pilots' ratings of workload during the experimental trials did not yield a significant main effect of Data Comm method or display type, Fs < 1.0. In addition, no interaction of the two variables was obtained, F < 1.0. Similarly, no significant effects were obtained for route acceptability ratings, Fs < 1.0. Table 1 displays mean ratings for workload and route acceptability for each Data Comm method and display type. Note that both the workload and acceptability ratings were low.
Table 1. Mean workload and acceptability ratings presented by Data Comm and display type. The standard error of the mean is in parentheses.
                    ACARS                   FANS
                2-D         3-D         2-D         3-D
Workload        1.97 (.17)  1.98 (.10)  1.90 (.12)  1.93 (.10)
Acceptability   1.78 (.24)  1.85 (.17)  1.83 (.17)  1.67 (.22)
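For reference, a 2 (Display) x 2 (Data Comm) repeated-measures analysis of this kind can be run on per-pilot cell means along the following lines; the ratings in the sketch are simulated, not the study's data.

```python
# Illustrative 2 (Display) x 2 (Data Comm) within-subjects ANOVA on mean
# workload ratings, using statsmodels. The numbers are made up.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for pilot in range(1, 9):                        # 8 participant pilots
    for display in ("2-D", "3-D"):
        for datacomm in ("ACARS", "FANS"):
            rating = 2.0 + 0.1 * (datacomm == "ACARS") + rng.normal(0, 0.2)
            rows.append({"pilot": pilot, "display": display,
                         "datacomm": datacomm, "workload": rating})
df = pd.DataFrame(rows)

result = AnovaRM(df, depvar="workload", subject="pilot",
                 within=["display", "datacomm"]).fit()
print(result)                                    # F and p for each effect
```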
3.2 Post-Trial Questionnaire
Pilots were asked to rate their ability to communicate with ATC using the different Data Comm methods on a 1-5 scale, where 1 was low and 5 was high. In terms of effectiveness, there was an effect of Data Comm type, F(1,7) = 18.36, p < .01. Pilots rated FANS (M = 4.23) significantly higher in effectiveness than ACARS (M = 3.81). For efficiency of communication, a marginally significant main effect of Data Comm was also present, F(1,7) = 5.11, p = .058, where pilots reported that ATC communications were more efficient when using FANS (M = 4.23) than ACARS (M = 4.00). For understandability, another marginal effect of Data Comm was obtained, F(1,7) = 4.65, p = .068. Pilots rated their understanding of ATC-issued flight plan clearances as slightly better with FANS (M = 4.30) than ACARS (M = 4.00). However, this main effect was qualified by a significant interaction with display type, F(1,7) = 9.32, p < .05 (see Figure 1). Display type did not influence pilot understandability ratings of ATC communication with FANS. However, with ACARS, the 3-D CSD increased the pilots' ability to understand ATC trajectory changes compared to the 2-D display. Taken together, these findings suggest that pilots using FANS Data Comm had a better understanding of ATC communication, and that the 3-D multi-view display reduced the negative effects associated with ACARS on pilots' ability to understand ATC-proposed flight path modifications.
Pilots also reported that overall workload differed between the Data Comm methods, F(1,7) = 8.41, p < .05, with less workload reported for FANS (M = 1.98) than ACARS (M = 2.31). For overall workload in avoiding weather, a main effect of Data Comm type was also present, F(1,7) = 7.33, p < .05. When using FANS, pilots again reported experiencing less workload (M = 1.97) than when using ACARS (M = 2.20).
[Figure 1: line graph of mean post-trial understandability ratings (y-axis: Mean Rating, 3.0 to 5.0) as a function of display type (x-axis: 2-D, 3-D), with one line per Data Comm method (ACARS, FANS).]
Fig. 1. Display by Data Comm interaction for ratings (1 = low; 5 = high) of understandability of ATC clearances
4 Discussion
Pilots reported higher workload levels associated with using the ACARS Data Comm method than with the FANS method when queried at the end of the trial, but not during the simulated flight. One possible explanation for the discrepancy between the online probes and the post-trial results is that, with ACARS, pilots reverted to voice communications when its use did not lead to timely actions during the simulation. As such, the pilots kept their workload at low, manageable levels throughout the scenario. The post-trial workload ratings likely reflected the pilots' preference for the FANS Data Comm method over ACARS. However, because operator preference for an interface does not necessarily lead to better performance with the interface [10], performance data with the different Data Comm methods need to be examined to determine their overall effectiveness.
Overall, pilots rated FANS higher than ACARS when queried about effectiveness, efficiency of communication, and understandability of ATC clearances. These findings suggest that pilots would be more likely to adopt Data Comm if it were implemented with FANS than with ACARS. The benefit of using ACARS for Data Comm delivery, though, is that it is already available on most commercial aircraft and can serve as a starting point for Data Comm usage. Although ACARS was rated lower in message understandability than FANS, this difference disappeared when it was used in conjunction with the 3-D CSD. The greater understandability ratings observed with 3-D displays were attributed to the display allowing pilots to visualize and inspect the flight plan change. This initial finding indicates that 3-D CSDs are promising tools for NextGen environments, and future investigations should continue to examine the use of 3-D displays in aiding pilot decision making.
Acknowledgments. This study was supported in part by NASA cooperative agreement NNX09AU66A, Group 5 University Research Center: Center for the Human Factors in Advanced Aeronautics Technologies (Brenda Collins, Technical Monitor).
References
1. Terminal Area Forecast (TAF) Summary Report Fiscal Year 2009–2030 (PDF file), http://www.faa.gov
2. Wickens, C.D., Miller, S., Mingpo, T.: The implications of data-link for representing pilot request information on 2D and 3D air traffic control displays. In: 38th Hum. Fac. Erg. Soc. P., Nashville, TN, pp. 61–65 (1994)
3. Alexander, A.L., Wickens, C.D.: 3D navigation and integrated hazard display in advanced avionics: Performance, situation awareness, and workload. Technical Report AHFD-05-10/NASA-05-2, NASA Langley Research Center, Hampton (2005)
4. Boyer, B.S., Wickens, C.D.: 3D weather displays for aircraft cockpits. Technical Report ARL-94-11/NASA-94-4, University of Illinois, Aviation Research Laboratory, Savoy (1994)
5. Boeckman, K.J., Wickens, C.D.: The resolution and performance effects of three-dimensional display rotation on local guidance and spatial awareness. Technical Report ARL-01-4/NASA-01-3, University of Illinois, Aviation Research Laboratory, Savoy (2001)
6. Alexander, A.L., Wickens, C.D., Merwin, D.H.: Perspective and coplanar cockpit displays of traffic information: Implications for maneuver choice, flight safety, and workload. Int. J. Aviat. Psychol. 15(1), 1–21 (2005)
7. Granada, S., Dao, Q., Wong, D., Johnson, W.W., Battiste, V.: Development and integration of a human-centered volumetric cockpit situation display for distributed air-ground operations. In: Proceedings of the 13th International Symposium on Aviation Psychology, Oklahoma City, OK, pp. 229–284 (2005)
8. Thomas, L.C., Wickens, C.D.: Display dimensionality, conflict geometry, and time pressure effects on conflict detection and resolution performance using cockpit displays of traffic information. Int. J. Aviat. Psychol. 16(3), 315–336 (2006)
9. Prevot, T.: Exploring the many perspectives of distributed air traffic management: The Multi Aircraft Control System: MACS. In: International Conference on Human-Computer Interaction in Aeronautics, pp. 23–25. MIT, Cambridge (2002)
10. Vu, K.-P.L., Proctor, R.W.: Naïve and experienced judgments of stimulus-response compatibility: Implications for interface design. Ergonomics 46, 169–187 (2003)
Macroergonomics in Air Traffic Control – The Approach of a New System
Luiza Helena Boueri Rebello
Universidade Federal Fluminense - UFF, Escola de Engenharia, TDT, Desenho Industrial, Rua Passos da Pátria 156, Bloco D, São Domingos, Niterói - RJ, 24210-240, Brazil
[email protected]
Abstract. This paper presents a study aimed at finding solutions for a better fit of a complex production system: the Brazilian air traffic control system, which is undergoing modification of its human-machine relationships and working procedures in light of constant innovations in information systems, with the goal of improving flight safety. Keywords: Macroergonomics, Air Traffic Control, Flight Safety.
1 Introduction
When air travel takes place, both the crew and the passengers of the aircraft depend on the full and secure integration of a series of systems, ranging from crew training, through the aircraft's manufacturer and its maintenance, to the air traffic control systems that administer landing, takeoff, and navigation procedures. With the globalization that started in the early 1990s came growth in air traffic volume, both in cargo and in the transport of people. Reflecting this growth came the need to strengthen the air traffic control system through management, which became a fundamental element in ensuring the safety of flights. This article presents a study aimed at finding solutions for a better fit of a complex production system, the air traffic control system, which is still under development and readjustment, and thus of the human-machine relationships and working procedures in light of constant innovations in information systems. This approach follows macroergonomics procedures.
2 The Problems
The Brazilian air accidents involving Gol flight 1907, on September 29, 2006, and TAM flight 3054, on July 17, 2007, brought to light various problems related to the Brazilian airline industry. The first accident caused the death of 154 people and led to the crisis popularly called the air "blackout" at the time, which originated a CPI (Parliamentary Committee of Inquiry) on the Crisis of the Air Traffic
System [1]. In July 2007, when the crisis seemed to be solved, a new accident occurred: TAM flight 3054 at Congonhas airport in São Paulo, causing the death of 199 people. The crisis broke out again, forcing ANAC, the National Civil Aviation Agency, to take urgent action [2].
The mismanagement of Brazilian civil aviation in the recent past caused managerial interference that could truncate information among the control organizations (tower-TWR / approach-APP / area control-ACC). For example, the direct connection line to the Brasília area control centre (ACC) did not work for two hours, causing a major communication problem, overloading all the controllers of that jurisdiction's area control centre, and causing an "air blackout" and chaos at various Brazilian airports. Where air traffic control is concerned, the problems will not be solved by reviewing only the design of workstations and action procedures. We must, above all, review issues relating to the work organization and the work product itself as a whole.
According to Hendrick and Kleiner [3], conceptually, macroergonomics is a top-down sociotechnical-systems approach to the design and implementation of the overall work system, down to the design of the human-job, human-machine, and human-software interfaces. In practice, however, it is much more comprehensive, consisting of a general analysis and structuring of the overall system. Macroergonomics thus deals with the human-organization interface, from the most comprehensive (macro) levels to the more specific and restricted (micro) levels of a problem. In this case, this interface must ensure the quality of human work with the air traffic control systems, from the controller's job up to the organization, in order to increase the safety of decision making and, thereby, the total safety of flights and the control of aircraft flow.
With the CPI on the air "blackout" [1], many questions appeared concerning the existing control system. A new air traffic control system was then created to increase safety and effectiveness. This system was created by ATECH (Fundação de Aplicações de Tecnologias Críticas) by looking at the real needs of controllers in the face of increasing technological innovation in air traffic control systems around the world. Not that the previous system for handling radar data and flight plans, called ESCA-4000, was bad, but there were flaws in the system software that created insecurity in some control procedures, which is unacceptable when it comes to flight safety.
3 Hypotheses
In air traffic control work, human error is a frequent factor in accidents and incidents, and failures are common in these systems. In the USA alone, in 1996 there were about 12,000 failures during control procedures caused by obsolete equipment. In response to this large number of failures, a system called URET (User Request Evaluation Tool) was created by the MITRE Corporation for the Federal Aviation Administration (FAA) [4]; it increased flight safety, increased by 30% the traffic capacity that aircraft and airports can receive, and reduced by 20% the
landing delays. The URET prototype has been tested in the ACCs (area control centres) of Memphis and Indianapolis since 1998 and is still being implemented at other control centres in the USA. The primary function of URET is to help controllers administer the arrival of aircraft and maintain a secure traffic flow, because the system indicates which runway should be used and when the aircraft will arrive [5].
In Europe, EUROCONTROL (European Organization for the Safety of Air Navigation) is an organization that aims to develop a uniform, integrated, and secure air traffic control system, and it addresses ergonomics through direct work on the human-computer interface and the controller interface. It is one of the institutions that have defined interface standards, and industry has increasingly embraced these converging standards toward a common operating interface regardless of manufacturer [6].
In Brazil, the recent lack of reliability in the air traffic control system was critical, owing to the maladministration of Brazilian civil aviation, and conflicting information could generate interference in, or inadequacy of, the control bodies (TWR/APP/ACC). The ESCA-4000 was implemented in the mid-1990s, and in less than 20 years a new change is occurring in the air traffic control systems in Brazil, with a new Brazilian system, designed by ATECH and called SAGITARIO, that will replace the ESCA-4000. There is no point in deploying a new system for capturing and receiving information via radar scope and a database with cutting-edge technology if the radar antenna and radio-telephony systems are not reliable [7].
Fig. 1. STVD overview
The air traffic control data treatment and visualization system (STVD) software, developed and maintained by the ATECH Foundation, will be deployed in multiple approach control centres (APP) and area control centres (ACC) in Brazil, starting with the
Brasília APP, which is responsible for the control of 600 daily flights and which started operation in May 2009. Figure 1 shows an overview of the STVD, the various components that make up the system, and the communication between these components [8].
The APP is responsible for controlling aircraft traffic in the arrival and departure phases around an airport's runways. The solution developed by ATECH for the APP is already present in the APPs of São Paulo (SP), Rio de Janeiro (RJ), Belo Horizonte (MG), Natal (RN), Curitiba (PR), Florianópolis (SC), Recife (PE), Cuiabá (MT), Campo Grande (MS), Belém (PA) and Pirassununga (SP). The forecast was that by 2010 the 22 major centres would be operating with revitalized systems. The APP modernization program is part of an initiative of the Brazilian Air Force. In this system, the human-computer interface is one of the main components, underscoring the importance of the connection between the controller and the transmission of air-ground messages.
According to Endo et al. [8], SAGITARIO, the air situation visualization sub-system (IDS) of the air traffic management system, was designed following "international practices and standards defined by EUROCONTROL to create agile, integrated, and secure interfaces and systems for the operation of air traffic management." The SAGITARIO system has innovations concerning HCI and information architecture. The system was developed using object-oriented programming, without neglecting the sophistication of information treatment required in air traffic control, whether or not the information is graphical.
With this new system there is a significant change in the display of flight plans. In the ESCA-4000, electronic flight plans appeared on a display next to the radar, and the controller was required to interact with at least two interfaces at the same time. Figure 2 shows a TWR control position with the ESCA-4000 system, where the controller has to deal with multi-task work.
Fig. 2. Workstation with additional display for flight plans (right)
With the new SAGITARIO system, flight plan information can be presented together with the radar summary screen, according to the demand for information about the flight under control. Figure 3 shows the basic radar summary screen.
Fig. 3. Basic summary screen
The flight plan is shown through a smart tag, which displays basic aircraft identification information. When the cursor hovers over this default label, all available information concerning the flight is shown (the extended label). Figures 4.1 and 4.2 show the new default label and the extended label.
Fig. 4.1. Default label
Fig. 4.2. Extended label
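A minimal sketch of this default/extended label behavior is given below; the flight fields are assumptions for illustration, not the SAGITARIO data model.

```python
# Default vs. extended smart label for a tracked flight (illustrative fields).
flight = {
    "callsign": "TAM3342", "level": "FL350", "speed": 450,
    "route": "SBGR-SBBR", "squawk": "4321",
}

DEFAULT_FIELDS = ("callsign", "level")      # always visible next to the blip
EXTENDED_FIELDS = tuple(flight)             # shown while the cursor hovers

def render_label(flight, hovering: bool) -> str:
    fields = EXTENDED_FIELDS if hovering else DEFAULT_FIELDS
    return " ".join(str(flight[f]) for f in fields)

print(render_label(flight, hovering=False))  # TAM3342 FL350
print(render_label(flight, hovering=True))   # all available flight data
```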
Using the mouse or trackball, the controller can interact directly with the fields of an extended label through an action menu. Information handling thus becomes more efficient, concentrated in one single interface. Figure 5 shows the action menu on the speed field of the extended label.
Fig. 5. Example of Action Menu
Fig. 6. Weather information image in SAGITARIO
In this new system the controller can work with weather images without the danger of losing radar information, and can make graphical edits to the flight plan. There is also a toolbar of secondary features, which can be closed when not in use to maximize the use of the
monitor area showing radar information. Weather images can be overlaid (as shown in Figure 6) and used as another element of support for the controller's decision making. In contrast, the previous system sent weather information to a separate monitor, forcing the controller to divert attention from the radar summary screen while carrying out the task.
4 An Effective Method in Macroergonomics
A survey and analysis of data from the production system as a whole should be done. The method of analysis is a participatory approach (participatory ergonomics). Participatory ergonomics has been considered the most appropriate and most applied approach within the context of macroergonomics. Participatory ergonomics seeks precisely to involve multiple organizational levels in the identification, analysis, and solving of problems. Thus, the strategy of participatory ergonomics is to encourage participation, because involving employees in resolving ergonomic problems can generate greater trust, interest, and experience in seeing and solving problems related to their own work, often eliminating the need for specialists. The participation of the individuals involved in the work process gives the ergonomic intervention better results, taking into account variability and reliability issues in the systems. This intervention should be made at four levels:
1. Human-Environment Interface: the relationship between the environment and the work;
2. Human-Machine Interface: the direct interaction with the system;
3. Human-Information Interface: the issue of information exchange and how it is carried out. In the case of air traffic control, this issue is crucial no matter how modern the system used, since information exchange is the central element of any action;
4. Human-Organization Interface: the organizational aspect, since it is directly related to the management of everything that occurs in the jobs, the work procedures, and the organization that may influence the system.
4.1 Variability and Reliability in Air Traffic Control
Workload changes according to variability, and reliability is closely linked to reactions to the demands of work situations. According to Vidal and Simoni [9], the concept of variability in work means changing patterns and references for the execution of tasks, here referring to the communication process on which air traffic control is based. Methodologically, there are different forms of variability. They can be distinguished as: technical variability (an equipment failure), work process variability (how the worker intervenes at work), and human variability, because people are not machines and are naturally different. The way one operator interprets information may differ from another's, since they may have different ways of interpreting job content. Ideally, tasks are structured and adapted to the variants of the productive system, i.e., the different patterns of variability, in order to achieve system reliability.
In terms of reliability, mental workload can decrease reliability, bearing in mind that the operator has a limited capacity for the mental processing of information. There are techniques for increasing reliability, which can also be means of mitigating variability. These techniques are the following:
• Terotechnology: anticipating a situation before there is a real need. Example: the controller can anticipate a procedure on the basis of radar information.
• Preventive maintenance, in its most profound meaning.
• Fault trees: relating incidents to facts that have already occurred. Example: what were the most common accidents/incidents in a terminal and/or on a route.
• Simulations of various types.
• Mapping: carried out on working conditions for a better understanding of events.
As discussed earlier, Brazil has gone through a major period of unreliability in its air traffic control system and is now restructuring, renewing, and modernizing itself through the use of new technologies in which the human-computer interface is paramount, in parallel with the evolution of current systems toward the new international air management concept of the future, CNS-ATM (communication, navigation, and surveillance / air traffic management), which uses flight management features supported by communications satellites for aircraft navigation. There is no point in deploying a new system for capturing and receiving information via radar scope and a cutting-edge database if the radio-telephony and data systems are not reliable. Their performance can also degrade in some terminal areas due to the large amount of noise that often interferes. Fortunately, the controllers are already accustomed to these types of noise and are able to work around the problem; but it would be much better if the noise did not exist, to ensure the safety of the information received and transmitted. A solution could be the creation of a communication system based on optical fibers, with the possibility of blocking interference through the use of passwords.
Much of the reliability of the system is related to the speed with which the controller can see ahead (a trivial task during the vectoring of information for each flight on radar). The new systems, which use computer assistance, help in anticipating and avoiding conflicting routes; the controller must have full knowledge of system activity so that decisions can be made at the right time, without doubts or errors. From the beginning, training should value the content of the task in terms of actual work, if possible showing the differences and similarities between the old system and the more modern one. It is also important that the controller knows the conventional (also called basic) procedure, because such a system can still be used as a stand-by in case the more modern system fails. All control centres in the world are still in an intermediate conventional stage, which works as a pro-reliability element and is natural in transition situations. There should be special training for the control of information together with the transmission of information via voice and typing, which requires a specific skill in terms of specific codes and their applications; these codes constitute an operative language, according to Falzon [10]. The greater the ease and speed with which controllers can type information using the control terminology, both in the mother tongue and in English, the more secure they will feel when transmitting information, even under time stress.
5 Participation, Research and Troubleshooting
Due importance must be given to controller activity as cooperative and participatory work, because in training these characteristics are virtually neglected. In fact, they should be highlighted and encouraged through specific training modules and collective activities. There should be an incentive to improve the training of new controllers. Older controllers are retiring, and younger ones are facing problems in current training because the training process is very fast. If necessary, monitoring support should be provided for new controllers. It would also be highly desirable, from this point of view, to have a planned exchange of information between controllers and pilots, so that each gains a greater understanding of the other's work and misunderstandings are avoided. In other words, a mutual representation of each other's activity provides adequate levels of cooperation. As a result, there is better integration of communications, guaranteeing greater flight safety.
Questions about the work of controllers in Brazil should be discussed with operators and supervisors, pilots, ANAC (the National Civil Aviation Agency), and the Ministry of Defense. Such discussions are also present in the FAA, ICAO, and EUROCONTROL. A questionnaire should be prepared to survey the actors' opinions about this scenario and the work performed. The issues to be addressed are: degree of satisfaction with the activities, pace of work, environment, degree of difficulty of learning, understanding of the information, equipment, and systems being used, and degree of comfort and of physical and mental fatigue. After the survey, an analysis should be made to find the points of mismatch. Finally, new procedures must be developed in accordance with international standards. Worldwide standardization of systems is necessary, since air traffic control is not related to only one particular country; it is a global issue.
6 Conclusion

This article presented the current changes in air traffic control in Brazil, a work in progress, and pointed out the importance of a careful review in the light of macroergonomics for improving the work of controllers, aimed at total flight safety together with better fluidity in the management of airspace and of ground movement at airports.
References

1. BRASIL – SENADO FEDERAL. Relatório Final – CPI do "Apagão Aéreo". Brasília, p. 2156 (2007)
2. BRASIL – CÂMARA DOS DEPUTADOS. CPI – Comissão Parlamentar de Inquérito – Crise do Sistema de Tráfego Aéreo. Brasília, p. 713 (2007)
3. Hendrick, H., Kleiner, B.M.: EVC – Editora Virtual Científica, Rio de Janeiro (2006)
4. MITRE CAASD – URET, http://caasd.org/work/project_details.cfm?item_id=156
5. FAA – URET, http://hf.tc.faa.gov/capabilities/uret.htm
6. EUROCONTROL, http://www.eurocontrol.int
7. Siedwerdt, E.: O Modelo de Controle do Espaço Aéreo Brasileiro e sua Integração com Outros Sistemas. Apresentação em PowerPoint, VII SITRAER – Simpósio de Transporte Aéreo, Rio de Janeiro (2008)
8. Endo, R., et al.: Interface Humano-Computador do Sub-Sistema de Visualização de Situação Aérea do Sistema de Gerenciamento de Tráfego Aéreo. In: Proceedings do VII SITRAER – Simpósio de Transporte Aéreo, Rio de Janeiro, pp. 576–584 (2008)
9. Vidal, M., de Simoni, M.: Análise da Condição Humana em Situações de Trabalho. COPPE/UFRJ, Rio de Janeiro (1990)
10. Falzon, P.: Ergonomie Cognitive du Dialogue, 1re ed., p. 173. Presses Universitaires de Grenoble, Grenoble (1989)
A Preliminary Investigation of Training Order for Introducing NextGen Tools

R. Conrad Rorie1, Ariana Kiken1, Corey Morgan1, Sabrina Billinghurst1, Gregory Morales1, Kevin Monk1, Kim-Phuong L. Vu1, Thomas Strybel1, and Vernol Battiste2

1 California State University Long Beach, Center for Human Factors in Advanced Aeronautics Technologies, 1250 N Bellflower Blvd., Long Beach, CA 90840, USA
2 San Jose State University Foundation and NASA Ames Research Center, Moffett Field, CA 94035, USA
[email protected], {aegkiken,coreyandrewmorgan,sabrinabillinghurst, gregory.morales}@gmail.com, [email protected], {kvu8,tstrybel}@csulb.edu, [email protected]
Abstract. Eleven students enrolled in a 16-week radar simulation course were trained on current-day and NextGen tools. The order of the training was manipulated so that half of the students received current-day training first, followed by the training on NextGen tools, while the remaining students received training on the NextGen tools first, followed by current-day training. This paper reports data from the debriefing sessions following the conclusion of the course, with the intent of determining students’ reaction to the training order and their comments and suggestions for future training schedules. Results indicated that future training should start with current-day procedures and delay the introduction of NextGen tools until trainees have established fundamental air traffic management skills. Keywords: ATC, training, part-task, NextGen.
1 Introduction

The Next Generation Air Transportation System (NextGen) will replace the current air traffic management (ATM) system in the U.S. NextGen is being developed to allow the national airspace to handle 2-3X current-day traffic levels in response to estimated increases in air travel in the U.S. and worldwide by 2025 [1]. One limitation of the current ATM system is the number of aircraft that can be safely handled by air traffic controllers (ATC) with the existing tools. NextGen intends to automate specific air traffic roles and responsibilities in order to reduce controller workload and increase controllers' capacity to handle more aircraft. Tasks that are under consideration for automation in NextGen include conflict detection and conflict resolution [2]. In addition, tools and technologies are being considered to provide controllers with the capability of making route modifications that can be uplinked directly to the flight deck [3].
The introduction of NextGen concepts of operations, tools, and technologies needs to be evaluated in terms of effectiveness. As such, many studies and simulations have been conducted to test potential concepts and technologies for NextGen [4, 5, 6]. One area that has not received much attention, though, is the training of NextGen tools and procedures. Currently, ATC training is typically based on Air Traffic Basics courses provided by Air Traffic Collegiate Training Initiative programs, followed by intensive training at the FAA Academy in Oklahoma City [7, 8]. After completing the basic training, controllers are then sent to a location (e.g., Tower, TRACON, Center) to receive on-site training through an apprenticeship model [7]. The apprenticeship can last from months to years. There has been some evidence that simulation training can reduce on-the-job training time [9].

In the present study, we examine whether the order in which NextGen tools are introduced into training can affect the learning of students who are pursuing careers in air traffic control. Although the effectiveness of NextGen technologies needs to be tested with experienced controllers for operational validity, students are a necessary source of research participants because they will eventually become NextGen operators. Vu et al. [10] showed that highly trained students in a specific sector can perform comparably to retired controllers who were given only a one-day training session to familiarize themselves with a new sector. Moreover, students showed more willingness to try new technologies than experienced controllers. One reason for higher acceptance of new technologies among students is that they have grown up in an era in which technology is dominant in everyday activities. Future ATC training will need to incorporate the training of NextGen concepts and technologies to fully prepare the workforce for the upcoming transformation. This study is an initial step toward that goal, examining how the order of introduction to NextGen tools affects student ATM learning. Results from this preliminary study can inform future research on how best to introduce NextGen technologies into the current-day training paradigm.

Schneider [11] described air traffic control as a "high-performance skill", defined as a task requiring more than 100 hours of training, producing high failure rates, and exhibiting qualitative differences between the performances of novices and experts. As such, training programs need to be designed to be as effective as possible in order to promote successful completion of their requirements. Sohn et al. [12] noted that, when learning complex skills, it can be helpful to break high-level tasks down into their component parts, otherwise known as part-task training. They reasoned that becoming fluent on the component tasks increases the chances of becoming fluent on the overall task. There have been many demonstrations of part-task training compared with whole-task training in different domains [13, 14, 15]. Young et al. [16] showed that the difficulty participants experience in training can also influence their later performance. In their study, an unrelated secondary task was added to increase the task's difficulty during training; the increased difficulty was found to lead to better retention of skills over time. With regard to NextGen training, these results suggest that more difficult skills should be learned first, followed by easier skills.
To test this hypothesis, we varied the order of two air traffic management training components, one involving the learning of current ATM skills and the other involving learning with NextGen tools that automate the task of conflict detection and provide support for conflict resolution.
Current-day air traffic management skills include detecting aircraft in conflict by projecting aircraft trajectories and speeds, using strategies for the separation of aircraft, techniques for ensuring safe merging and spacing of aircraft, and strategies for overall sector management, all through the exclusive use of voice communications.

Introduction of NextGen technologies into ATC training will necessitate instruction on several new, transformative tools such as Data Comm, conflict detection, and conflict probes. Data Comm, a text-based communication system between controllers and pilots, will significantly reduce the volume of voice communications between air traffic control and pilots. Although the reduction of voice communication should reduce operator workload, the use of Data Comm will require training controllers on Data Comm commands. Conflict detection tools will automatically alert controllers to potential conflicts for pairs of aircraft that are equipped with NextGen technologies. Enabling conflict detection will substantially reduce controller workload, since controllers will not need to scan for conflicts between the equipped aircraft. Conflict probes will assist controllers in the task of conflict resolution by providing them with cues regarding whether a flight plan change is conflict-free or not (a minimal sketch of this kind of trajectory-based probing appears at the end of this section). Again, conflict probes should reduce controller workload and cognitive demands. However, studies have shown that tools that decrease operator workload may take operators "out of the loop" and reduce their situation awareness [17]. Low levels of situation awareness can then lead to errors, especially when automation fails and the operator must perform all tasks manually [18].

The present paper is based on a larger study examining whether the order in which student controllers are trained with current-day and NextGen procedures affects their performance, situation awareness and workload. This paper reports data from the debriefing sessions of the larger study, with the intent of determining students' reactions to the training order, their assessment of how the training order affected their overall learning, and their comments and suggestions for future training schedules. A content analysis was performed on transcripts of the debriefing sessions and on written notes by an experimenter. Due to the small sample size and preliminary nature of this training study, no formal analyses were run. Instead, we provide a qualitative summary of the participants' responses.
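As a concrete illustration of the conflict detection and conflict probe concepts above, the sketch below projects a pair of aircraft along straight-line trajectories and reports the first predicted loss of horizontal separation. It is an assumption-laden toy, not the algorithm used in MACS or any fielded NextGen tool: the 5 nmi value is the standard en-route separation minimum, but the look-ahead window, flat-earth geometry, and all identifiers are ours.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative values: 5 nmi en-route horizontal separation minimum and a
# hypothetical 8-minute look-ahead window.
SEPARATION_NMI = 5.0
LOOKAHEAD_MIN = 8.0

@dataclass
class Track:
    x: float    # position in nmi (flat-earth approximation)
    y: float
    vx: float   # velocity in nmi per minute
    vy: float

def first_loss_of_separation(a: Track, b: Track) -> Optional[float]:
    """Project both tracks along straight lines and return the first time
    (minutes from now) at which horizontal separation falls below the
    minimum, or None if no loss is predicted within the look-ahead."""
    dx, dy = a.x - b.x, a.y - b.y           # relative position
    dvx, dvy = a.vx - b.vx, a.vy - b.vy     # relative velocity
    # Squared separation over time is qa*t^2 + qb*t + qc + R^2, so a loss
    # of separation is the first root of qa*t^2 + qb*t + qc = 0.
    qa = dvx ** 2 + dvy ** 2
    qb = 2 * (dx * dvx + dy * dvy)
    qc = dx ** 2 + dy ** 2 - SEPARATION_NMI ** 2
    if qc <= 0:
        return 0.0                          # already below the minimum
    if qa == 0:
        return None                         # no relative motion
    disc = qb ** 2 - 4 * qa * qc
    if disc < 0:
        return None                         # closest approach stays outside
    t = (-qb - disc ** 0.5) / (2 * qa)      # earlier root = first crossing
    return t if 0 <= t <= LOOKAHEAD_MIN else None

# Example: head-on geometry, 40 nmi apart, closing at 8 nmi/min -> 4.375 min.
if __name__ == "__main__":
    print(first_loss_of_separation(Track(0, 0, 8, 0), Track(40, 0, 0, 0)))
```

Run pairwise over the Data Comm-equipped aircraft in a sector, a routine like this would drive the conflict alerting described above; evaluating a trial flight-plan amendment instead of the current trajectory is, in essence, what a conflict probe does.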
2 Method

2.1 Participants

Eleven students enrolled in a 16-week radar simulation course through California State University Long Beach's Center for Human Factors in Advanced Aeronautics Technologies (CHAAT) participated. All students were enrolled in an aviation program at Mount San Antonio College (Mt SAC) and were pursuing careers in air traffic control. Students had an average of 1.5 years of study in their program and had all completed a course in the Air Traffic Control Environment at Mt SAC, which covers aircraft characteristics, air traffic procedures, and phraseology.

2.2 Training

All students received a minimum of 6 hours per week of training on managing traffic in en route sectors ZID 91 and 81.
Traffic in the sectors consisted of arrivals and departures to/from Louisville Standiford International Airport, as well as overflights. Two of the 6 training hours were dedicated to general air traffic management skills and airspace (en route) operations in a classroom setting taught by a retired, radar-certified controller. The other 4 hours were dedicated to hands-on radar simulation training in the CHAAT simulation lab using the Multi Aircraft Control System (MACS) software developed by the Airspace Operations Laboratory at NASA Ames Research Center [19]. All students received instruction on the MACS software and on procedures and strategies for managing traffic in the sector.

The students were divided into two separate training groups, differing in the timing of their exposure to advanced ATC tools. For the current-day procedures, students were taught traffic management techniques for conflict detection and strategies for judging the threat level of potential conflicts. In addition, they were instructed on point-to-point vectoring techniques for safety and efficiency, and on the proper phraseology for communicating with aircraft in their sector. For the NextGen tools procedures, students were trained on how to use Data Comm commands for issuing clearances, as well as how to use the conflict alerting and conflict probes for Data Comm-equipped aircraft to detect and resolve conflicts. All students were also taught the procedures for taking handoffs and for handing off aircraft using voice or Data Comm, depending on the training condition.

The Current Day-First training group received training on air traffic management using only voice communications during Weeks 2-7 of the laboratory component of the course and received training with NextGen tools (Data Comm and the advanced conflict detection and resolution tools) during Weeks 9-15. Students in the NextGen-First training group received hands-on training with Data Comm and the advanced ATC tools during Weeks 2-7 and received current-day instruction without the advanced tools during Weeks 9-15. Both groups received mixed training the week before each of two sets of experimental trials, which occurred at the end of Weeks 8 and 16. The experimental trials utilized a mixed-equipage environment with Data Comm-equipped and voice-only aircraft. At the end of the second test session, in Week 16, participants were debriefed on the purpose of the simulation and were asked to provide feedback regarding the training design. The conversations were recorded, with the permission of the participants, and subsequently transcribed for the present analysis.

2.3 Apparatus

Participants performed their experimental trials within the same simulation environment taught during the hands-on portion of their training. MACS is a medium-fidelity computer application simulating a radar screen of ZID Sector 91; it accommodates both air traffic control and pilot operations. All aircraft were piloted by "pseudopilots" trained specifically on MACS to provide a realistic traffic environment for controllers. Communication between controllers and pilots was provided by a voice server station, allowing communication via push-to-talk headsets [20].
3 Results and Discussion

Each of the eleven subjects was asked at the end of the 16-week simulation course to comment on the effectiveness of the training order in developing their ATM skills. There were six subjects in the Current Day-First condition (i.e., voice first) and five subjects in the NextGen-First condition. All six subjects in the Current Day-First condition claimed that receiving verbal training before Data Comm training was highly beneficial in developing their ATM skills and controller-pilot communication skills. Perhaps more telling, all five participants in the NextGen-First condition reported that their order of training was ineffective, stating that it would have been more beneficial to receive the training on manual conflict detection and voice communication first. The particular themes that developed during the debriefing sessions are discussed in more detail below.

3.1 Perceived Effectiveness of Training Order

All students indicated that the most important skills they needed for ATM, including how to scan the scope for conflicts, predict flight trajectories, make speed projections, and make effective use of the strategies taught in the classroom for modifying routes, were also the most difficult to learn. Therefore, they all reported that current-day procedures should be taught first and be given the most emphasis (about three-quarters, rather than half, of the class) during training. This finding is consistent with the training-difficulty hypothesis of Young et al. [16], in that more difficult tasks should be trained first because doing so leads to better transfer of skill and retention of learning. Training the students with current-day procedures first allows them to acquire fundamental ATM skills. In particular, the task of manually scanning the scope for conflicting aircraft was the primary area in which the students reported needing extended training. Since this critical task was practiced only during the current-day training, those in the NextGen-First condition reported feeling unprepared to manage traffic for their first test halfway through the course.

Participants also claimed that training on the NextGen tools required less attention and therefore caused trainees to become passive, as they let the computer do the majority of the work for them. Several students in that group reported that the passivity led them to "slack off" and not critically monitor their environment. However, students in the Current Day-First condition said they continued to monitor traffic and tried to detect conflicts before the conflict alerting system notified them. As such, they were not as reliant on the conflict detection technology as the NextGen-First condition.

More generally, all students felt that practice with current-day tasks was the most effective at developing a foundation for ATM skills as a whole. Participants in the NextGen-First condition claimed they felt unprepared in the mid-term testing even for routine tasks such as acknowledging aircraft check-ins and giving frequency changes. The added workload needed for them to perform these routine tasks decreased their ability to perform the more critical tasks of verbally issuing commands to aircraft and monitoring the scope for conflicting traffic.
These students also reported that the advanced tools training emphasized the computer system and interface rather than the actual traffic separation and monitoring techniques. As a result, when students in the NextGen-First condition made the switch to the current-day training in the second half of the study, they felt as if they were starting the class all over again; in other words, there was little transfer of training. This echoes the sentiment of the Current Day-First condition, which stated that current-day procedures provide a more supportive foundation for learning ATC tasks.

One student noted feeling held back by beginning his training with NextGen tools. He remarked that training with NextGen tools contradicted the classroom training, because those lessons taught strategies for detecting conflicts manually; during hands-on training with the advanced tools, however, students were told to refrain from manual conflict detection because the computer would perform that task for them. In this sense, the student was not able to put into practice what he had learned in the classroom component of the course.

Most students, irrespective of training condition, indicated that the advanced tools were easy to learn and required little focused attention to use. It is not surprising, then, that participants felt eight weeks of training on NextGen tools became redundant. It is important to note, however, that the students did agree that NextGen tools were worthy of some extended training. In their opinion, the inevitable introduction of NextGen tools into the ATC domain necessitates sufficient training to bring controllers up to a proficient level of performance. They also noted that NextGen tools did provide them with novel ways to separate traffic and offloaded some of their workload by reducing their responsibility for predicting potential conflicts.

3.2 Limitations and Suggestions for Future Training

This study was a preliminary investigation of how the order in which NextGen tools are introduced affects student learning of ATM tasks. Based on feedback from students in the course, it is recommended that current-day ATM techniques be trained prior to the introduction of NextGen tools. However, this recommendation assumes that NextGen air traffic will consist of both equipped and unequipped aircraft, a feature of near-term NextGen. As more aircraft become equipped with NextGen tools, there may be less need for learning current-day ATM techniques. The results from this study were intended only to provide an initial input to research on the training of NextGen tools. In fact, we tested the introduction of only three specific NextGen tools, so the findings may not generalize to other NextGen technologies. It should also be noted that the training with current-day and NextGen tools occurred only in an environment with all voice-only or all NextGen-equipped aircraft, respectively. It may be that introducing NextGen tools early, in a mixed-equipage environment, would increase the students' ability to acquire fundamental ATM skills while benefiting from the NextGen capabilities. In addition, other training schedules should be examined in future investigations.
Acknowledgements. This study was supported by NASA cooperative agreement NNX09AU66A, Group 5 University Research Center: Center for Human Factors in Advanced Aeronautics Technologies (Brenda Collins, Technical Monitor).
References

1. Joint Planning and Development Office: Concept of Operations for the Next Generation Air Transportation System, Version 1.2 (2007), http://www.jpdo.gov/library/NextGenConOpsv12.pdf
2. Prevot, T., Homola, J., Mercer, J.: Human-in-the-Loop Evaluation of Ground-Based Automated Separation Assurance for NextGen. In: Congress of the International Council of the Aeronautical Sciences, Anchorage, AK (2008)
3. Strybel, T.Z., Vu, K.-P.L., Bacon, L.P., Kraut, J., Battiste, V., Johnson, W.W.: Diagnosticity of an Online Query Technique for Measuring Pilot Situation Awareness in NextGen. In: Digital Avionics Systems Conference, pp. 4.B.1-1–4.B.1-12 (2010)
4. Ligda, S.V., Johnson, N., Lachter, J., Johnson, W.W.: Pilot Confidence with ATC Automation Using Cockpit Situation Display Tools in a Distributed Traffic Management Environment. In: Salvendy, G., Smith, M.J. (eds.) HCI International 2009. LNCS, vol. 5618, pp. 816–825. Springer, Heidelberg (2009)
5. Dao, A.V., Lachter, J., Battiste, V., Brandt, S.L., Vu, K.-P.L., Strybel, T.Z., Ho, N., Martin, P., Johnson, W.W.: Automated Spacing Support Tools for Interval Management Operations during Continuous Descent Approaches. In: Human Factors and Ergonomics Society Annual Meeting Proceedings, pp. 21–25 (2010)
6. Dwyer, J.P., Landry, S.: Separation Assurance and Collision Avoidance Concepts for the Next Generation Air Transportation System. In: Salvendy, G., Smith, M.J. (eds.) Proceedings of the Symposium on Human Interface. LNCS, pp. 748–757. Springer, Heidelberg (2009)
7. Morrison, J.E., Fotohui, C.H., Broach, D.: A Formative Evaluation of the Collegiate Training Initiative. Technical Paper DOT/FAA/AM-96/6, FAA Civil Aeromedical Institute, Oklahoma City (1996)
8. Cavcar, A., Cavcar, M.: New Directions for ATC Training. Int. J. Aviat. Psychol. 14, 135–150 (2004)
9. Yacef, K., Alem, L.: Intelligent Tutoring Systems: Student and Expert Modelling for Simulation-Based Training. In: Frasson, C., Gauthier, G., Lesgold, A. (eds.) ITS 1996. LNCS, vol. 1086, pp. 614–622. Springer, Heidelberg (1996)
10. Vu, K.-P.L., Minakata, K., Nguyen, J., Kraut, J., Raza, H., Battiste, V., Strybel, T.Z.: Situation Awareness and Performance of Student versus Experienced Air Traffic Controllers. In: Salvendy, G., Smith, M.J. (eds.) HCI International 2009. LNCS, vol. 5618, pp. 865–874. Springer, Heidelberg (2009)
11. Schneider, W.: Training High-Performance Skills: Fallacies and Guidelines. Hum. Factors 27(3), 285–300 (1985)
12. Sohn, M., Douglass, S.A., Chen, M., Anderson, J.R.: Characteristics of Fluent Skills in a Complex, Dynamic Problem-Solving Task. Hum. Factors 47, 742–752 (2005)
13. Carlson, R.A., Khoo, B.H., Elliot II, R.G.: Component Practice and Exposure to a Problem-Solving Context. Hum. Factors 32(3), 267–286 (1990)
14. Kurtz, S., Lee, T.D.: Part and Whole Perceptual-Motor Practice of a Polyrhythm. Neurosci. Lett. 338, 205–208 (2003)
15. Lee, F.J., Anderson, J.R.: Does Learning a Complex Task Have to Be Complex? A Study in Learning Decomposition. Cognit. Psychol. 42, 267–316 (2001)
16. Young, M.D., Healy, A.F., Gonzalez, C., Dutt, V., Bourne Jr., L.E.: Effects of Training with Added Difficulties on RADAR Detection. Appl. Cognit. Psychol. (2010)
17. Dao, A.-Q.V., Brandt, S.L., Battiste, V., Vu, K.-P.L., Strybel, T., Johnson, W.W.: The Impact of Automation Assisted Aircraft Separation on Situation Awareness. In: Salvendy, G., Smith, M.J. (eds.) HCI International 2009. LNCS, vol. 5618, pp. 738–747. Springer, Heidelberg (2009)
18. Kraut, J.M., Kiken, A., Billinghurst, S., Morgan, C.A., Strybel, T.Z., Chiappe, D., Vu, K.-P.L.: Effects of Data Communications Failure on Air Traffic Controller Sector Management Effectiveness, Situation Awareness, and Workload. To be presented at Human Computer Interaction International (2010)
19. Prevot, T.: Exploring the Many Perspectives of Distributed Air Traffic Management: The Multi Aircraft Control System MACS. In: Proceedings of HCI-Aero, pp. 149–254 (2002)
20. Canton, R., Refai, M., Johnson, W.W., Battiste, V.: Development and Integration of Human-Centered Conflict Detection and Resolution Tools for Airborne Autonomous Operations. In: Proceedings of the 15th International Symposium on Aviation Psychology, Oklahoma State University, OK (2005)
Author Index
Abe, Masanobu II-11 Abujarad, Fuad II-325 Ahmad, Rahayu I-521 Albayrak, Sahin I-585 Asahi, Toshiyuki I-180 Asahi, Yumi I-291 Asami, Yoshitaka I-443 Asao, Takafumi I-381, I-450, I-478 Bacon, L. Paige II-453, II-473, II-483, II-507 Bae, Ilju I-577 Bagci, Volkan H. II-355 Bailey, Brian I-278 Baizid, Khelifa II-364 Balcisoy, Selim II-345 Battiste, Vernol II-453, II-463, II-483, II-507, II-526 Belluco, Paolo I-391 Bergland, Mark I-20 Billinghurst, Sabrina II-483, II-493, II-526 Bordegoni, Monica I-391 Bottaro, Antonio I-3 Boueri Rebello, Luiza Helena II-516 Bradburn, Keith II-430 Brandon, Merel II-219 Brandt, Summer L. II-463, II-473 Braun, Andreas I-567 Breyer, Matthias I-239, I-528 Bryce, Renee C. I-122 Burkhardt, Dirk I-239, I-528 Cha, Jeong-Won I-558 Chan, Alan H.S. II-3 Chang, Teng-Wen II-207 Chellali, Ryad II-364 Cheok, Adrian David II-66 Chiabrando, Elisa I-538 Chiappe, Dan II-493 Cho, Hyunchul I-558 Choi, Sang-Min I-558 Choi, Taeil II-190 Chu, Pin-Yu II-278
Chung, Jinwook II-268 Coman, Michael S. II-500 Corriveau, Philip II-36 Cugini, Umberto I-391 Daly, Jason I-30 Dao, Arik-Quang V. II-463, II-473 de Groot, Thomas II-219 Dennis, Toni A. II-325 Dobashi, Yoshinori II-388 Dong, Yujie I-82 Duncker, Elke I-48 Eitoku, Shin-ichiro II-11 Elliott, Linda R. I-399 Entezari, Zoubair II-500 Eom, Hae-Sung I-558 Epskamp, Simon II-219 Faasch, Helmut I-641 Fagerstrøm, Asle II-229 Fernando, Owen Noel Newton II-66 Fields, Bob I-48 Fontaine, Matt I-30 Forsell, Camilla I-170 Forster, Jeanette I-239 Franssen, Tim II-219 Friedemann, Monika I-201 Fujita, Kinya I-152 Fukada, Hidemi II-373 Fukuzumi, Shin'ichi I-180 Furnari, Roberto I-538 Garbharran, Ameetha I-301 García, Alejandro J. II-236 Ghinea, Gheorghita II-229 Gong, Yang II-253 González, María Paula II-236 Gottifredi, Sebastian II-236 Grillo, Pierluigi I-538 Guo, Yinni I-93 Han, Yo-Sub I-558 Haritos, Tom II-500 Hashiguchi, Kyoko II-21
Hashimoto, Shuji I-659, II-440 Hattori, Kiyohiko II-335 Hattori, Masatsugu II-157 Hayashi, Masayoshi I-450 Hedström, Johan I-454 Hercegfi, Karoly I-521 Herron, Meghann II-507 Hirai, Nobuhide I-627 Hirasawa, Naotake I-13, II-373 Hirata, Yukihiro I-627 Hiroyuki, Miki I-76 Hofmann, Cristian I-567 Höger, Rainer I-641 Horiba, Yosuke I-418 Hosono, Naotsune I-231, II-123 Hotta, Shintaro II-246 Hua, Lei II-253 Huang, Xiaojing I-82 Ichikawa, Yoshihiro II-335 Iida, Koji I-408 Ikegami, Yoshikazu II-31 Ikegaya, Yusuke II-103 Inoue, Hiroaki I-627 Inoue, Hiromitsu II-123 Inui, Shigeru I-418 Isaías, Pedro II-285 Ishii, Hironaga II-31 Ishii, Ryo II-131 Ishii, Yutaka II-180 Ishizu, Syohei I-618, II-246 Itai, Shiroh I-408 Ito, Keita II-31 Ito, Teruaki I-425 Ito, Tetsuro II-305 Itou, Junko II-141, II-165 Jakub, Hranac I-366 Jeon, Woongryul I-311, I-548 John, Bonnie E. I-180 Johnson, Walter W. II-453, II-463, II-473 Jung, Hanmin II-262 Kajio, Takuya II-381 Kamata, Kazuo II-446 Kamijo, Kenichi I-636 Kaneko, Shun'ichi I-462 Kanenishi, Kazuhide I-259 Kärkkäinen, Tuula II-111
Kasamatsu, Keiko I-597 Kato, Satoshi I-627 Kato, Toshikazu I-612 Katsuki, Aki II-373 Kido, Nobuki I-450 Kiken, Ariana II-483, II-493, II-526 Kikuchi, Senichiro I-627 Kim, Jae Kwan I-577 Kim, Jeeyeon I-311, I-321, I-339 Kim, Laehyum I-558 Kim, Pyung II-262 Kinoe, Yosuke II-147 Kinoshita, Yuichiro I-211 Kitami, Kodai II-75 Klima, Martin I-435 Klyczek, Karen I-20 Ko, Sang-Ki I-558 Kobayashi, Daiji I-443, II-411 Kobayashi, Kazue II-373 Kobayashi, Tsukasa II-440 Komatsu, Tsuyoshi I-103 Komlodi, Anita I-521 Koo, Jahwan II-268 Kotani, Kentaro I-381, I-450, I-478 Koteskey, Robert W. II-453 Kountchev, Roumen II-355 Kountcheva, Roumiana II-355 Kraut, Joshua M. II-473, II-483, II-493 Kring, Jason P. II-500 Kubo, Hiroyuki II-388 Kuijper, Arjan I-239, I-528, I-567 Kumazaki, Yuta I-381 Kuriiwa, Hidetaka I-488 Kurze, Martin I-585 Kuwahara, Noriaki I-603 Kwack, Seungjin II-268 Lachter, Joel II-453, II-463 Lee, Mikyoung II-262 Lee, Soo-Hong I-577 Lee, Ya-Ching II-278 Lee, Youngsook I-311, I-321, I-339, I-548 Lif, Patrik I-454 Ligda, Sarah V. II-453 Likavec, Silvia I-538 Lim, Jae-Kwon I-577 Limberger, Carsten I-567 Lin, Chi-Cheng I-20 Lindahl, Björn I-454
Ling, Chen II-84 Lombardi, Ilaria I-538 Louden, Robert I-30 Lu, Mei II-36 Macchiarella, Nickolas D. II-500 Macedo, Mario II-285 Maeshiro, Midori I-109 Maeshiro, Tetsuya I-109 Malo, Sébastien I-40 Marinc, Alexander I-567 Marino, Enrico I-3 Martin, Glenn A. I-30 Matsumoto, Kazunori II-75 Matsushima, Hiroyasu II-335 Matsushima, Norikazu I-508 Matsuura, Norihiko II-131 Migneault, Joël I-40 Miki, Hiroyuki I-231, II-123 Milanova, Mariofanna II-355 Milde, Jan-Torsten II-55 Milicchio, Franco I-3 Minakata, Katsumi II-473 Mior Ibrahim, Emma Nuraihan I-330 Mita, Tatsuya I-603 Miwa, Yoshiyuki I-408, I-508 Miyamoto, Keita II-315 Mochizuki, Rika II-11 Monk, Kevin II-526 Morales, Gregory II-526 Morgan, Corey A. II-493, II-526 Mori, Hirohiko I-76, I-118, I-488 Mori, Yuki I-462 Morimoto, Kazunari I-603 Morishima, Shigeo II-388 Motegi, Manabu II-11 Mozaffari, Elaheh II-46 Mudur, Sudhir II-46 Mukouchi, Takafumi II-131 Munemori, Jun II-141, II-165 Murakami, Masashi I-612 Murata, Kazuyoshi II-157 Muto, Shin-yo II-11 Nagai, Yoshimitsu I-618, II-246 Nagamatsu, Takashi I-651 Nagashima, Yuji II-123 Nagata, Mizue II-199 Naito, Go I-508 Naito, Hisashi I-142 Najjar, Lawrence J. II-292 Nakagawa, Seiji I-478 Nakama, Takumi I-211 Nakanishi, Miwa I-470, I-498, II-419 Nakashima, Makoto II-305 Nakayama, Shin-ichi I-109 Nam, Junghyun I-339 Nara, Hiroyuki I-627 Nasoz, Fatma I-122 Nasu, Ayumi I-478 Nazemi, Kawa I-239, I-528 Nergiz, Ahmet Ozcan II-345 Nguyen, Jimmy H. I-349, II-453, II-473, II-507 Nishi, Hiroko I-408, I-508 Noda, Mihoko II-147 Nomura, Makoto II-419 Očenášek, Pavel I-165 Oehl, Michael I-641 Ogaki, Tomoyasu II-141, II-165 Ogata, Shinya I-13 Ogawa, Katsuhiko II-21 Ohkura, Michiko I-103, II-31 Ohori, Raita I-618 Oka, Makoto I-488 Okubo, Masashi I-132 Ooba, Yutaro I-488 Oskarsson, Per-Anders I-454 Otaki, Atsushi II-335 Otani, Masayuki II-335 Otsuka, Kazuhiro II-171 Ozawa, Shiro II-131 Paik, Juryon I-339 Palmer, Craig J. I-122 Paoluzzi, Alberto I-3 Pappas, Lisa I-249 Park, Mi Kyong I-597 Park, Myon-Woong I-577 Pavel, Očenášek I-359, I-366, I-374 Plumbaum, Till I-585 Pohl, Hans-Martin II-55 Post, Lori A. II-325 Proctor, Robert W. I-93, II-62
Nagata, Mizue II-199 Naito, Go I-508 Naito, Hisashi I-142 Najjar, Lawrence J. II-292 Nakagawa, Seiji I-478 Nakama, Takumi I-211 Nakanishi, Miwa I-470, I-498, II-419 Nakashima, Makoto II-305 Nakayama, Shin-ichi I-109 Nam, Junghyun I-339 Nara, Hiroyuki I-627 Nasoz, Fatma I-122 Nasu, Ayumi I-478 Nazemi, Kawa I-239, I-528 Nergiz, Ahmet Ozcan II-345 Nguyen, Jimmy H. I-349, II-453, II-473, II-507 Nishi, Hiroko I-408, I-508 Noda, Mihoko II-147 Nomura, Makoto II-419 Oˇcen´ aˇsek, Pavel I-165 Oehl, Michael I-641 Ogaki, Tomoyasu II-141, II-165 Ogata, Shinya I-13 Ogawa, Katsuhiko II-21 Ohkura, Michiko I-103, II-31 Ohori, Raita I-618 Oka, Makoto I-488 Okubo, Masashi I-132 Ooba, Yutaro I-488 Oskarsson, Per-Anders I-454 Otaki, Atsushi II-335 Otani, Masayuki II-335 Otsuka, Kazuhiro II-171 Ozawa, Shiro II-131 Paik, Juryon I-339 Palmer, Craig J. I-122 Paoluzzi, Alberto I-3 Pappas, Lisa I-249 Park, Mi Kyong I-597 Park, Myon-Woong I-577 Pavel, Oˇcen´ aˇsek I-359, I-366, I-374 Plumbaum, Till I-585 Pohl, Hans-Martin II-55 Post, Lori A. II-325 Proctor, Robert W. I-93, II-62
Raape, Ulrich I-201 Ranasinghe, Nimesha II-66 Raza, Hamzah II-473 Redden, Elizabeth S. I-399 Richard, Jocelyn I-40 Robert, Jean-Marc I-40 Roimela, Kimmo II-111 Romero, Mario II-190 Rorie, R. Conrad II-483, II-507, II-526 Rosina, Maurizio I-3 Rozga, Agata II-190 Rugg, David J. I-122 Saga, Ryosuke II-75 Saitoh, Yoshihiko II-301 Sakamoto, Maiko I-636 Sakamoto, Makiba II-396 Sakamoto, Shinichiro II-305 Sakata, Mamiko II-315 Sakata, Masahiko I-498 Salvendy, Gavriel I-93 Sato, Hiroshi I-651 Sato, Hiroyoki II-335 Sato, Keiji II-335 Sato, Tomohiro II-419 Schatz, Sae I-30 Schmeisser, Elmar T. I-399 Schoeckel, Thorsten I-201 Schulz, Katja I-585 Sejima, Yoshihiro II-180 Shehab, Randa II-84 Sheikh, Javed Anjum I-48 Shibuya, Yu II-157 Shimizu, Shunji I-627 Shimohara, Katsunori II-381 Shin, Grace II-190 Shinkai, Daiki I-618 Siebert, Felix W. I-641 Simari, Guillermo R. II-236 Slavik, Pavel I-435 So, Joey C.Y. II-3 Sohn, Young-Tae I-577 Spini, Federico I-3 Stab, Christian I-528 Stegman, Alex II-84 Stelovsky, Jan I-66 Stockl¨ ow, Carsten I-567 Strobl, Christian I-201 Strybel, Thomas Z. II-483, II-493, II-507, II-526
Sumi, Kaoru II-199 Sung, Won-Kyung II-262 Suto, Hidetsugu II-396 Suzuki, Michio II-123 Svensson, Jonathan I-454 Swierenga, Sarah J. II-325 Takadama, Keiki II-335 Takahashi, Noboru I-627 Takahashi, Shogo I-132 Takahashi, Toru II-403 Takahashi, Yuichi II-411 Takamizawa, Seiko II-75 Takata, Shino I-636 Takeuchi, Yugo I-142 Taki, Seiko I-597 Tanaka, Takahiro I-152 Tanaka, Takayuki I-462 Tanaka, Yasuhiro I-498 Taner, Berk II-345 Tanev, Ivan II-381 Tanuma, Kazuhiro II-419 Teng, Fumina I-161 Terano, Takao II-403 Terawaki, Yuki I-58 Tessmann, Sven I-201 Tews, Tessa-Karina I-641 Thatcher, Andrew I-301 Todorov, Vladimir II-355 Togawa, Satoshi I-259 Togo, Akiya I-132 Tokosumi, Akifumi I-161 Tomita, Yutaka II-123 Tomiyama, Ken I-190 Tsai, Pei-shiuan II-94 Tsang, Steve N.H. II-3 Tseng, Hsien-Lee II-278 Tsuji, Hiroshi II-75 Tsukanaka, Satoshi I-211 Uesaka, Makoto II-103 Ugai, Takanori I-268 Vaittinen, Tuomas II-111 van Gennep, Bart II-219 Visser, Thomas II-219 Vogel, Ivan I-165 Vrotsou, Katerina I-170 Vu, Kim-Phuong L. I-349, II-62, II-483, II-493, II-507, II-526
Author Index Wada, Ryo I-603 Wang, Jieyu I-521 Wang, Ying-Chong II-207 Watanabe, Eiju I-627 Watanabe, Manami II-381 Watanabe, Takabumi I-508 Watanabe, Tomio I-651, II-180 Whitman, Lisa I-249 Won, Dongho I-321, I-339, I-548 Wu, Jo-Han I-66 Xu, Shuang
II-430
Yagi, Takashi II-11 Yamada, Ryuta I-488 Yamada-Kawai, Kiko I-13 Yamaguchi, Tomoyuki I-659, II-440 Yamamoto, Michiya I-651 Yamamoto, Sakae I-76, I-231, I-443, I-470, II-123, II-411
Yamamoto, Tomohito II-103 Yamanoi, Takahiro I-636 Yamazaki, Toshimasa I-636 Yamazaki, Yuka I-498 Yano, Naomi I-180 Yano, Yoneo I-259 Yildizli, Can II-345 Yokoyama, Noriko I-659 Yoneyama, Akihiro I-418 Yonemura, Shunichi II-446 Yoshida, Keisuke I-651 Yoshikawa, Hidekazu I-82 You, Man-lai II-94 Yu, Guo-Jhen II-207 Zedek, František I-165 Zenkoyoh, Masaki I-190 Zhou, Yangping I-82 Zilouchian Moghaddam, Roshanak I-278