Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
5618
Gavriel Salvendy Michael J. Smith (Eds.)
Human Interface and the Management of Information Information and Interaction Symposium on Human Interface 2009 Held as Part of HCI International 2009 San Diego, CA, USA, July 19-24, 2009 Proceedings, Part II
Volume Editors

Gavriel Salvendy
Purdue University, Grissom Hall, Room 263
315 North Grant Street, West Lafayette, IN 47907-2023, USA
E-mail: [email protected]
and
Tsinghua University, Department of Industrial Engineering
Beijing 100084, P.R. China

Michael J. Smith
University of Wisconsin, Department of Industrial and Systems Engineering
1513 University Avenue, Madison, WI 53706, USA
E-mail: [email protected]
Library of Congress Control Number: Applied for
CR Subject Classification (1998): H.5, H.3, H.4, K.4.3, D.2
LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-642-02558-7 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02558-7 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12704330 06/3180 543210
Foreword
The 13th International Conference on Human–Computer Interaction, HCI International 2009, was held in San Diego, California, USA, July 19–24, 2009, jointly with the Symposium on Human Interface (Japan) 2009, the 8th International Conference on Engineering Psychology and Cognitive Ergonomics, the 5th International Conference on Universal Access in Human–Computer Interaction, the Third International Conference on Virtual and Mixed Reality, the Third International Conference on Internationalization, Design and Global Development, the Third International Conference on Online Communities and Social Computing, the 5th International Conference on Augmented Cognition, the Second International Conference on Digital Human Modeling, and the First International Conference on Human-Centered Design. A total of 4,348 individuals from academia, research institutes, industry and governmental agencies from 73 countries submitted contributions, and 1,425 papers that were judged to be of high scientific quality were included in the program. These papers address the latest research and development efforts and highlight the human aspects of the design and use of computing systems. The papers accepted for presentation thoroughly cover the entire field of human–computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas. This volume, edited by Gavriel Salvendy and Michael J. Smith, contains papers in the thematic area of Human Interface and the Management of Information, addressing the following major topics:
• Interacting with the World Wide Web
• Intelligent Techniques for Access to Information and Personalization
• Visual Interfaces, Visualization and Images
• Mobile Devices and Services
• eHealth Applications and Services
• Education, Learning and Entertainment
• Information Systems in Safety-Critical Domains
The remaining volumes of the HCI International 2009 proceedings are:
• Volume 1, LNCS 5610, Human–Computer Interaction––New Trends (Part I), edited by Julie A. Jacko
• Volume 2, LNCS 5611, Human–Computer Interaction––Novel Interaction Methods and Techniques (Part II), edited by Julie A. Jacko
• Volume 3, LNCS 5612, Human–Computer Interaction––Ambient, Ubiquitous and Intelligent Interaction (Part III), edited by Julie A. Jacko
• Volume 4, LNCS 5613, Human–Computer Interaction––Interacting in Various Application Domains (Part IV), edited by Julie A. Jacko
• Volume 5, LNCS 5614, Universal Access in Human–Computer Interaction––Addressing Diversity (Part I), edited by Constantine Stephanidis
• Volume 6, LNCS 5615, Universal Access in Human–Computer Interaction––Intelligent and Ubiquitous Interaction Environments (Part II), edited by Constantine Stephanidis
• Volume 7, LNCS 5616, Universal Access in Human–Computer Interaction––Applications and Services (Part III), edited by Constantine Stephanidis
• Volume 8, LNCS 5617, Human Interface and the Management of Information––Designing Information Environments (Part I), edited by Michael J. Smith and Gavriel Salvendy
• Volume 10, LNCS 5619, Human-Centered Design, edited by Masaaki Kurosu
• Volume 11, LNCS 5620, Digital Human Modeling, edited by Vincent G. Duffy
• Volume 12, LNCS 5621, Online Communities and Social Computing, edited by A. Ant Ozok and Panayiotis Zaphiris
• Volume 13, LNCS 5622, Virtual and Mixed Reality, edited by Randall Shumaker
• Volume 14, LNCS 5623, Internationalization, Design and Global Development, edited by Nuray Aykin
• Volume 15, LNCS 5624, Ergonomics and Health Aspects of Work with Computers, edited by Ben-Tzion Karsh
• Volume 16, LNAI 5638, The Foundations of Augmented Cognition: Neuroergonomics and Operational Neuroscience, edited by Dylan Schmorrow, Ivy Estabrooke and Marc Grootjen
• Volume 17, LNAI 5639, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris
I would like to thank the Program Chairs and the members of the Program Boards of all thematic areas, listed below, for their contribution to the highest scientific quality and the overall success of HCI International 2009.
Ergonomics and Health Aspects of Work with Computers Program Chair: Ben-Tzion Karsh Arne Aarås, Norway Pascale Carayon, USA Barbara G.F. Cohen, USA Wolfgang Friesdorf, Germany John Gosbee, USA Martin Helander, Singapore Ed Israelski, USA Waldemar Karwowski, USA Peter Kern, Germany Danuta Koradecka, Poland Kari Lindström, Finland
Holger Luczak, Germany Aura C. Matias, Philippines Kyung (Ken) Park, Korea Michelle M. Robertson, USA Michelle L. Rogers, USA Steven L. Sauter, USA Dominique L. Scapin, France Naomi Swanson, USA Peter Vink, The Netherlands John Wilson, UK Teresa Zayas-Cabán, USA
Human Interface and the Management of Information Program Chair: Michael J. Smith Gunilla Bradley, Sweden Hans-Jörg Bullinger, Germany Alan Chan, Hong Kong Klaus-Peter Fähnrich, Germany Michitaka Hirose, Japan Jhilmil Jain, USA Yasufumi Kume, Japan Mark Lehto, USA Fiona Fui-Hoon Nah, USA Shogo Nishida, Japan Robert Proctor, USA Youngho Rhee, Korea
Anxo Cereijo Roibás, UK Katsunori Shimohara, Japan Dieter Spath, Germany Tsutomu Tabe, Japan Alvaro D. Taveira, USA Kim-Phuong L. Vu, USA Tomio Watanabe, Japan Sakae Yamamoto, Japan Hidekazu Yoshikawa, Japan Li Zheng, P.R. China Bernhard Zimolong, Germany
Human–Computer Interaction Program Chair: Julie A. Jacko Sebastiano Bagnara, Italy Sherry Y. Chen, UK Marvin J. Dainoff, USA Jianming Dong, USA John Eklund, Australia Xiaowen Fang, USA Ayse Gurses, USA Vicki L. Hanson, UK Sheue-Ling Hwang, Taiwan Wonil Hwang, Korea Yong Gu Ji, Korea Steven Landry, USA
Gitte Lindgaard, Canada Chen Ling, USA Yan Liu, USA Chang S. Nam, USA Celestine A. Ntuen, USA Philippe Palanque, France P.L. Patrick Rau, P.R. China Ling Rothrock, USA Guangfeng Song, USA Steffen Staab, Germany Wan Chul Yoon, Korea Wenli Zhu, P.R. China
Engineering Psychology and Cognitive Ergonomics Program Chair: Don Harris Guy A. Boy, USA John Huddlestone, UK Kenji Itoh, Japan Hung-Sying Jing, Taiwan Ron Laughery, USA Wen-Chin Li, Taiwan James T. Luxhøj, USA
Nicolas Marmaras, Greece Sundaram Narayanan, USA Mark A. Neerincx, The Netherlands Jan M. Noyes, UK Kjell Ohlsson, Sweden Axel Schulte, Germany Sarah C. Sharples, UK
Neville A. Stanton, UK Xianghong Sun, P.R. China Andrew Thatcher, South Africa
Matthew J.W. Thomas, Australia Mark Young, UK
Universal Access in Human–Computer Interaction Program Chair: Constantine Stephanidis Julio Abascal, Spain Ray Adams, UK Elisabeth André, Germany Margherita Antona, Greece Chieko Asakawa, Japan Christian Bühler, Germany Noelle Carbonell, France Jerzy Charytonowicz, Poland Pier Luigi Emiliani, Italy Michael Fairhurst, UK Dimitris Grammenos, Greece Andreas Holzinger, Austria Arthur I. Karshmer, USA Simeon Keates, Denmark Georgios Kouroupetroglou, Greece Sri Kurniawan, USA
Patrick M. Langdon, UK Seongil Lee, Korea Zhengjie Liu, P.R. China Klaus Miesenberger, Austria Helen Petrie, UK Michael Pieper, Germany Anthony Savidis, Greece Andrew Sears, USA Christian Stary, Austria Hirotada Ueda, Japan Jean Vanderdonckt, Belgium Gregg C. Vanderheiden, USA Gerhard Weber, Germany Harald Weber, Germany Toshiki Yamaoka, Japan Panayiotis Zaphiris, UK
Virtual and Mixed Reality Program Chair: Randall Shumaker Pat Banerjee, USA Mark Billinghurst, New Zealand Charles E. Hughes, USA David Kaber, USA Hirokazu Kato, Japan Robert S. Kennedy, USA Young J. Kim, Korea Ben Lawson, USA
Gordon M. Mair, UK Miguel A. Otaduy, Switzerland David Pratt, UK Albert “Skip” Rizzo, USA Lawrence Rosenblum, USA Dieter Schmalstieg, Austria Dylan Schmorrow, USA Mark Wiederhold, USA
Internationalization, Design and Global Development Program Chair: Nuray Aykin Michael L. Best, USA Ram Bishu, USA Alan Chan, Hong Kong Andy M. Dearden, UK
Susan M. Dray, USA Vanessa Evers, The Netherlands Paul Fu, USA Emilie Gould, USA
Sung H. Han, Korea Veikko Ikonen, Finland Esin Kiris, USA Masaaki Kurosu, Japan Apala Lahiri Chavan, USA James R. Lewis, USA Ann Light, UK James J.W. Lin, USA Rungtai Lin, Taiwan Zhengjie Liu, P.R. China Aaron Marcus, USA Allen E. Milewski, USA
Elizabeth D. Mynatt, USA Oguzhan Ozcan, Turkey Girish Prabhu, India Kerstin Röse, Germany Eunice Ratna Sari, Indonesia Supriya Singh, Australia Christian Sturm, Spain Adi Tedjasaputra, Singapore Kentaro Toyama, India Alvin W. Yeo, Malaysia Chen Zhao, P.R. China Wei Zhou, P.R. China
Online Communities and Social Computing Program Chairs: A. Ant Ozok, Panayiotis Zaphiris Chadia N. Abras, USA Chee Siang Ang, UK Amy Bruckman, USA Peter Day, UK Fiorella De Cindio, Italy Michael Gurstein, Canada Tom Horan, USA Anita Komlodi, USA Piet A.M. Kommers, The Netherlands Jonathan Lazar, USA Stefanie Lindstaedt, Austria
Gabriele Meiselwitz, USA Hideyuki Nakanishi, Japan Anthony F. Norcio, USA Jennifer Preece, USA Elaine M. Raybourn, USA Douglas Schuler, USA Gilson Schwartz, Brazil Sergei Stafeev, Russia Charalambos Vrasidas, Cyprus Cheng-Yen Wang, Taiwan
Augmented Cognition Program Chair: Dylan D. Schmorrow Andy Bellenkes, USA Andrew Belyavin, UK Joseph Cohn, USA Martha E. Crosby, USA Tjerk de Greef, The Netherlands Blair Dickson, UK Traci Downs, USA Julie Drexler, USA Ivy Estabrooke, USA Cali Fidopiastis, USA Chris Forsythe, USA Wai Tat Fu, USA Henry Girolamo, USA
Marc Grootjen, The Netherlands Taro Kanno, Japan Wilhelm E. Kincses, Germany David Kobus, USA Santosh Mathan, USA Rob Matthews, Australia Dennis McBride, USA Robert McCann, USA Jeff Morrison, USA Eric Muth, USA Mark A. Neerincx, The Netherlands Denise Nicholson, USA Glenn Osga, USA
Dennis Proffitt, USA Leah Reeves, USA Mike Russo, USA Kay Stanney, USA Roy Stripling, USA Mike Swetnam, USA Rob Taylor, UK
Maria L.Thomas, USA Peter-Paul van Maanen, The Netherlands Karl van Orden, USA Roman Vilimek, Germany Glenn Wilson, USA Thorsten Zander, Germany
Digital Human Modeling Program Chair: Vincent G. Duffy Karim Abdel-Malek, USA Thomas J. Armstrong, USA Norm Badler, USA Kathryn Cormican, Ireland Afzal Godil, USA Ravindra Goonetilleke, Hong Kong Anand Gramopadhye, USA Sung H. Han, Korea Lars Hanson, Sweden Pheng Ann Heng, Hong Kong Tianzi Jiang, P.R. China
Kang Li, USA Zhizhong Li, P.R. China Timo J. Määttä, Finland Woojin Park, USA Matthew Parkinson, USA Jim Potvin, Canada Rajesh Subramanian, USA Xuguang Wang, France John F. Wiechel, USA Jingzhou (James) Yang, USA Xiu-gan Yuan, P.R. China
Human Centered Design Program Chair: Masaaki Kurosu Gerhard Fischer, USA Tom Gross, Germany Naotake Hirasawa, Japan Yasuhiro Horibe, Japan Minna Isomursu, Finland Mitsuhiko Karashima, Japan Tadashi Kobayashi, Japan
Kun-Pyo Lee, Korea Loïc Martínez-Normand, Spain Dominique L. Scapin, France Haruhiko Urokohara, Japan Gerrit C. van der Veer, The Netherlands Kazuhiko Yamazaki, Japan
In addition to the members of the Program Boards above, I also wish to thank the following volunteer external reviewers: Gavin Lew from the USA, Daniel Su from the UK, and Ilia Adami, Ioannis Basdekis, Yannis Georgalis, Panagiotis Karampelas, Iosif Klironomos, Alexandros Mourouzis, and Stavroula Ntoa from Greece. This conference could not have been possible without the continuous support and advice of the Conference Scientific Advisor, Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the Communications Chair and Editor of HCI International News, Abbas Moallem.
I would also like to thank the members of the Human–Computer Interaction Laboratory of ICS-FORTH, and in particular Margherita Antona, George Paparoulis, Maria Pitsoulaki, Stavroula Ntoa, and Maria Bouhli, for their contribution toward the organization of the HCI International 2009 conference.

Constantine Stephanidis
HCI International 2011
The 14th International Conference on Human–Computer Interaction, HCI International 2011, will be held jointly with the affiliated conferences in the summer of 2011. It will cover a broad spectrum of themes related to human–computer interaction, including theoretical issues, methods, tools, processes and case studies in HCI design, as well as novel interaction techniques, interfaces and applications. The proceedings will be published by Springer. More information about the topics, as well as the venue and dates of the conference, will be announced through the HCI International Conference series website: http://www.hci-international.org/
General Chair
Professor Constantine Stephanidis
University of Crete and ICS-FORTH
Heraklion, Crete, Greece
Email:
[email protected]
Table of Contents
Part I: Interacting with the World Wide Web Development of a Coloration Support Tool for Making Web Page Screens User-Friendly for Color Blind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michiko Anse and Tsutomu Tabe The Persuasive Effects from Web 2.0 Marketing: A Case Study Investigating the Persuasive Effect from an Online Design Competition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Asle Fagerstrøm and Gheorghita Ghinea
3
10
Formalizing Design Guidelines of Legibility on Web Pages . . . . . . . . . . . . . Fong-Ling Fu and Chiu-Hung Su
17
The Assessment of Credibility of e-Government: Users’ Perspective . . . . . Zhao Huang, Laurence Brooks, and Sherry Chen
26
Auto-complete for Improving Reliability on Semantic Web Service Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hanmin Jung, Mi-Kyoung Lee, Won-Kyung Sung, and Beom-Jong You
36
Effects of AJAX Technology on the Usability of Blogs . . . . . . . . . . . . . . . . Sumonta Kasemvilas and Daniel Firpo
45
Usability Evaluation of Dynamic RSVP Interface on Web Page . . . . . . . . Ya-Li Lin and Darcy Lin
55
“Online Legitimacy”: Defining Institutional Symbolisms for the Design of Information Artifact in the Web Mediated Information Environment (W-MIE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Emma Nuraihan Mior Ibrahim and Nor Laila Md Noor Evaluation of Web User Interfaces for the Online Retail of Apparel . . . . . Dominik Rupprecht, Rainer Blum, and Karim Khakzar
65 74
A Coauthoring Method of Keyword Dictionaries for Knowledge Combination on Corporate Discussion Web Sites . . . . . . . . . . . . . . . . . . . . . Shinji Takao, Tadashi Iijima, and Akito Sakurai
84
An Empirical Study the Effects of Language Factors on Web Site Use Intention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hui-Jen Yang and Yun-Long Lay
94
Part II: Intelligent Techniques for Access to Information and Personalization Enhancing Document Clustering through Heuristics and Summary-Based Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sri Harsha Allamraju and Robert Chun Email Reply Prediction: A Machine Learning Approach . . . . . . . . . . . . . . . Taiwo Ayodele, Shikun Zhou, and Rinat Khusainov An End-to-End Proactive TCP Based on Available Bandwidth Estimation with Congestion Level Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sangtae Bae, Doohyung Lee, Chihoon Lee, Jinwook Chung, Jahwan Koo, and Suman Banerjee Smart Privacy Management in Ubiquitous Computing Environments . . . Christian B¨ unnig A Fuzzy Multiple Criteria Decision Making Model for Selecting the Distribution Center Location in China: A Taiwanese Manufacturer’s Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chien-Chang Chou and Pei-Chann Chang A Hierarchical Data Dissemination Protocol Using Probability-Based Clustering for Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . Moonseong Kim, Matt W. Mutka, and Hyunseung Choo An OWL-Based Knowledge Model for Combined-Process-and-Location Aware Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gunhee Kim, Manchul Han, Jukyung Park, Hyunchul Park, Sehyung Park, Laehyun Kim, and Sungdo Ha Human-Biometric Sensor Interaction: Impact of Training on Biometric System and User Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eric P. Kukula and Robert W. Proctor Representing Logical Inference Steps with Digital Circuits . . . . . . . . . . . . . Erika Matsak
105
114
124
131
140
149
159
168
178
An Interactive-Content Technique Based Approach to Generating Personalized Advertisement for Privacy Protection . . . . . . . . . . . . . . . . . . . Wook-Hee Min and Yun-Gyung Cheong
185
Loopo: Integrated Text Miner for FACT-Graph-Based Trend Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ryosuke Saga, Hiroshi Tsuji, and Kuniaki Tabata
192
Using Graphical Models for an Intelligent Mixed-Initiative Dialog Management System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stefan Schwärzler, Günther Ruske, Frank Wallhoff, and Gerhard Rigoll Input Text Repairing for Multi-lingual Chat System . . . . . . . . . . . . . . . . . . Kenichi Yoshida and Fumio Hattori
201
210
Part III: Visual Interfaces, Visualization and Images Interactive Object Segmentation System from a Video Sequence . . . . . . . Guntae Bae, Sooyeong Kwak, and Hyeran Byun COBRA – A Visualization Solution to Monitor and Analyze Consumer Generated Medias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Amit Behal, Julia Grace, Linda Kato, Ying Chen, Shixia Liu, Weijia Cai, and Weihong Qian Visual String of Reformulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Arne Berger, Jens K¨ ursten, and Maximilian Eibl Industrial E-Commerce and Visualization of Products: 3D Rotation versus 2D Metamorphosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Francisco V. Cipolla Ficarra, Miguel Cipolla Ficarra, and Daniel A. Giulianelli Evaluating the Effectiveness and the Efficiency of a Vector Image Search Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Patrizia Di Marco, Tania Di Mascio, Daniele Frigioni, and Massimo Gastaldi
221
229
239
249
259
Building and Browsing Tropos Models: The AVI Design . . . . . . . . . . . . . . . Tania Di Mascio, Anna Perini, Luca Sabatucci, and Angelo Susi
269
A Multiple-Aspects Visualization Tool for Exploring Social Networks . . . Jie Gao, Kazuo Misue, and Jiro Tanaka
277
Multi-hierarchy Information Visualization Research Based on Three-Dimensional Display of Products System . . . . . . . . . . . . . . . . . . . . . . Zhou Hui and Hou WenJun Efficient Annotation Visualization Using Distinctive Features . . . . . . . . . . Seok Kyoo Kim, Sung Hyun Moon, Jun Park, and Sang Yong Han Content Based Image Retrieval Using Adaptive Inverse Pyramid Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mariofanna Milanova, Roumen Kountchev, Stuart Rubin, Vladimir Todorov, and Roumiana Kountcheva
287 295
304
Event Extraction and Visualization for Obtaining Personal Experiences from Blogs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yoko Nishihara, Keita Sato, and Wataru Sunayama
315
Minato: Integrated Visualization Environment for Embedded Systems Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yosuke Nishino and Eiichi Hayakawa
325
Batik KR Semantic Network: Visualizations of Creative Process and Design Knowledge for the Malaysian Batik Designers’ Community . . . . . Ariza Nordin, Nor Laila Md. Noor, and Ahmad Zainuddin
334
A Tool for Analyzing Categorical Data Visually with Granular Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kousuke Shiraishi, Kazuo Misue, and Jiro Tanaka
342
Part IV: Mobile Devices and Services Understanding Key Attributes in Mobile Service: Kano Model Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Seung Ik Baek, Seung Kuk Paik, and Weon Sang Yoo Discovering User Interface Requirements of Search Results for Mobile Clients by Contextual Inquiry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . David L. Chan, Robert W.P. Luk, Hong Va Leong, and Edward K.S. Ho Evaluation of Pointing Efficiency on Small Screen Touch User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ryosuke Fujioka, Takayuki Akiba, and Hidehiko Okada An Integrated Approach towards the Homogeneous Provision of Geographically Dispersed Info-Mobility Services to Mobile Users . . . . . . . Dimitrios Giakoumis, Dimitrios Tzovaras, Dionisis Kehagias, Evangelos Bekiaris, and George Hassapis Legible Character Size on Mobile Terminal Screens: Estimation Using Pinch-in/Out on the iPod Touch Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Satoshi Hasegawa, Masako Omori, Tomoyuki Watanabe, Shohei Matsunuma, and Masaru Miyao
355
365
375
385
395
Location-Based Mixed-Map Application Development for Mobile Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hyo-Haeng Lee, Kil-Ram Ha, and Kwang-Seok Hong
403
A Comparison of Artifact Reduction Methods for Real-Time Analysis of fNIRS Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Takayuki Nozawa and Toshiyuki Kondo
413
Investigation on Relation between Index of Difficulty in Fitts’ Law and Device Screen Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hidehiko Okada, Takayuki Akiba, and Ryosuke Fujioka Influence of Vertical Length of Characters on Readability in Mobile Phones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Masako Omori, Satoshi Hasegawa, Tomoyuki Watanabe, Shohei Matsunuma, and Masaru Miyao
423
430
Intelligent Photo Management System Enhancing Browsing Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuki Orii, Takayuki Nozawa, and Toshiyuki Kondo
439
Freeze TCPv2: An Enhancement of Freeze TCP for Efficient Handoff in Heterogeneous Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Minu Park, Jaehyung Lee, Jahwan Koo, and Hyunseung Choo
448
Expanding SNS Features with CE Devices: Space, Profile, Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Youngho Rhee, Hyunjoo Kang, Yeojin Kim, Juyeon Lee, and IlKu Chang Empirical Evaluation of Throwing Method to Move Object for Long Distance in 3D Information Space on Mobile Device . . . . . . . . . . . . . . . . . . Yu Shibuya, Keiichiro Nagatomo, Kazuyoshi Murata, Itaru Kuramoto, and Yoshihiro Tsujino Usefulness of Mobile Information Provision Systems Using Graphic Text -Visibility of Graphic Text on Mobile Phones . . . . . . . . . . . . . . . . . . . Tomoyuki Watanabe, Masako Omori, Satoshi Hasegawa, Shohei Matsunuma, and Masaru Miyao
458
468
476
Part V: eHealth Applications and Services The Importance of Information in the Process of Acquisition and Usage of a Medicine for Patient Safety: A Study of the Brazilian Context . . . . . Patricia Lopes Fujita and Carla Galvão Spinillo
489
A Proposal of Collection and Analysis System of Near Miss Incident in Nursing Duties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Akihisa Furukawa and Yusaku Okada
497
Effects of Information Displays for Hyperlipidemia . . . . . . . . . . . . . . . . . . . Yang Gong and Jiajie Zhang Clinical Usefulness of Human-Computer Interface for Training Targeted Facial Expression: Application to Patients with Cleft Lip and/or Palate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kyoko Ito, Ai Takami, Shumpei Hanibuchi, Shogo Nishida, Masakazu Yagi, Setsuko Uematsu, Naoko Sigenaga, and Kenji Takada
503
513
The Evaluation of Pharmaceutical Package Designs for the Elderly People . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Akira Izumiya, Michiko Ohkura, and Fumito Tsuchiya
522
Implications for Developing Information System on Nursing Administration – Case Study on Nurse Scheduling System – . . . . . . . . . . . Mitsuhiko Karashima and Naotake Hirasawa
529
Analysis on Descriptions of Dosage Regimens in Package Inserts of Medicines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Masaomi Kimura, Kazuhiro Okada, Keita Nabeta, Michiko Ohkura, and Fumito Tsuchiya Non-intrusive Human Behavior Monitoring Sensor for Health Care System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Noriyuki Kushiro, Makoto Katsukura, Masanori Nakata, and Yoshiaki Ito Impact of Healthcare Information Technology Systems on Patient Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Byung Cheol Lee and Vincent G. Duffy Patient Standardization Identification as a Healthcare Issue . . . . . . . . . . . Mario Macedo and Pedro Isa´ıas A Proposal of a Method to Extract Active Ingredient Names from Package Inserts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Keita Nabeta, Masaomi Kimura, Michiko Ohkura, and Fumito Tsuchiya Examination of Evaluation Method for Appearance Similarity of PTP Sheets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yoshitaka Ootsuki, Akira Izumiya, Michiko Ohkura, and Fumito Tsuchiya Identifying Latent Similarities among Near-Miss Incident Records Using a Text-Mining Method and a Scenario-Based Approach . . . . . . . . . Tetsuo Sawaragi, Kouichi Ito, Yukio Horiguchi, and Hiroaki Nakanishi Patient Safety: Contributions from a Task Analysis Study on Medicine Usage by Brazilians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Carla Spinillo, Stephania Padovani, and Cristine Lanzoni Remote Consultation System Using Hierarchically Structured Agents . . . Hiroshi Yajima, Jun Sawamoto, and Kazuo Matsuda
539
549
559 566
576
586
594
604 609
Part VI: Education, Learning and Entertainement How Mobile Interaction Motivates Students in a Class? . . . . . . . . . . . . . . . Akinobu Ando and Kazunari Morimoto
621
Sensation Seeking, Self Forgetfulness, and Computer Game Enjoyment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiaowen Fang and Fan Zhao
632
Development of an Annotation-Based Classroom Activities Support Environment Using Digital Appliance, Mobile Device and PC . . . . . . . . . . Yoshiaki Hada and Masanori Shinohara
642
An Empirical Investigation on the Effectiveness of Virtual Learning Environment in Supporting Collaborative Learning: A System Design Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Na Liu, Yingqin Zhong, and John Lim
650
Personalization for Specific Users: Designing Decision Support Systems to Support Stimulating Learning Environments . . . . . . . . . . . . . . . . . . . . . . Laura Măruşter, Niels R. Faber, and Rob J. van Haren
660
Construction of Systematic Learning Support System of Business Theory and Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yoshiki Nakamura and Katsuhiro Sakamoto
669
Learning by Design in a Digital World: Students’ Attitudes towards a New Pedagogical Model for Online Academic Learning . . . . . . . . . . . . . . . . Karen Precel, Yoram Eshet-Alkalai, and Yael Alberton
679
Promoting a Central Learning Management System by Encouraging Its Use for Other Purposes Than Teaching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Franz Reichl and Andreas Hruska
689
Framework for Supporting Decision Making in Learning Management System Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuki Terawaki
699
Statistics-Based Cognitive Human-Robot Interfaces for Board Games – Let’s Play! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frank Wallhoff, Alexander Bannat, Jürgen Gast, Tobias Rehrl, Moritz Dausinger, and Gerhard Rigoll The Design and Development of an Adaptive Web-Based Learning System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chian Wang
708
716
Part VII: Information Systems in Safety-Critical Domains Human-System Interface (HSI) Challenges in Nuclear Power Plant Control Rooms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jo-Ling Chang, Huafei Liao, and Liang Zeng
729
The Impact of Automation Assisted Aircraft Separation on Situation Awareness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Arik-Quang V. Dao, Summer L. Brandt, Vernol Battiste, Kim-Phuong L. Vu, Thomas Strybel, and Walter W. Johnson Separation Assurance and Collision Avoidance Concepts for the Next Generation Air Transportation System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . John P. Dwyer and Steven Landry Analysis of Team Communication and Collaboration in En-Route Air Traffic Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kazuo Furuta, Yusuke Soraji, Taro Kanno, Hisae Aoyama, Daisuke Karikawa, and Makoto Takahashi Comparison of Pilot Recovery and Response Times in Two Types of Cockpits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vishal Hiremath, Robert W. Proctor, Richard O. Fanjoy, Robert G. Feyen, and John P. Young
738
748
758
766
Information Requirements and Sharing for NGATS Function Allocation Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nhut Tan Ho, Patrick Martin, Joseph Bellissimo, and Barry Berson
776
HILAS: Human Interaction in the Lifecycle of Aviation Systems – Collaboration, Innovation and Learning . . . . . . . . . . . . . . . . . . . David Jacobson, Nick McDonald, and Bernard Musyck
786
Redefining Interoperability: Understanding Police Communication Task Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gyu H. Kwon, Tonya L. Smith-Jackson, and Charles W. Bostian
797
Unique Reporting Form: Flight Crew Auditing of Everyday Performance in an Airline Safety Management System . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maria Chiara Leva, Alison Kay, Joan Cahill, Gabriel Losa, Sharon Keating, Diogo Serradas, and Nick McDonald Pilot Confidence with ATC Automation Using Cockpit Situation Display Tools in a Distributed Traffic Management Environment . . . . . . . Sarah V. Ligda, Nancy Johnson, Joel Lachter, and Walter W. Johnson A Study of Auditory Warning Signals for the Design Guidelines of Man-Machine Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mie Nakatani, Daisuke Suzuki, Nobuchika Sakata, and Shogo Nishida Computer-Aided Collaborative Work into War Rooms: A New Approach of Collaboration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jeremy Ringard, Samuel Degrande, St´ephane Louis-dit-Picard, and Christophe Chaillou
806
816
826
835
Optimizing Online Situation Awareness Probes in Air Traffic Management Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Thomas Z. Strybel, Katsumi Minakata, Jimmy Nguyen, Russell Pierce, and Kim-Phuong L. Vu A Development of Information System for Disaster Victims with Autonomous Wireless Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuichi Takahashi, Daiji Kobayashi, and Sakae Yamamoto Situation Awareness and Performance of Student versus Experienced Air Traffic Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kim-Phuong L. Vu, Katsumi Minakata, Jimmy Nguyen, Josh Kraut, Hamzah Raza, Vernol Battiste, and Thomas Z. Strybel Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
845
855
865
875
Development of a Coloration Support Tool for Making Web Page Screens User-Friendly for Color Blind

Michiko Anse and Tsutomu Tabe

Aoyama Gakuin University, Department of Industrial and Systems Engineering, 5-10-1 Fuchinobe, Sagamihara-city, Kanagawa, Japan
{anse,tabe}@ise.aoyama.ac.jp
Abstract. Websites provide more and more information because of their convenience. Whether that information can be discriminated, however, depends on individual differences in color vision: some people cannot distinguish information that is conveyed by certain colorations, and in such cases the information is not perceived correctly. Therefore, a tool that supports the coloration of web page screens is needed.

Keywords: color blind, web page, coloration support tool.
1 Introduction

Differences in color vision between individuals are caused by the three types of cone cells, nerve cells that absorb different wavelengths of light. Persons with normal chromatic vision perceive color with all three types of cone. Color deficient observers perceive colors differently depending on which cone function is impaired. They include protanopes (the red cone does not function), deuteranopes (the green cone does not function) and tritanopes (the blue cone does not function); vision based on only two functioning cone types is called dichromatic vision. Color deficient observers account for 5% and 0.2% of East Asian males and females, respectively, and for 8% and 4% of Caucasian and black males, respectively. By this ratio, roughly one out of 20 Japanese males is color deficient, which is a considerable number [1, 2].
Information collection is indispensable in daily life, and the spread of the Internet allows websites to provide more and more information. Color is effective for emphasizing the information to be provided because it increases image discrimination, so colors are widely used in websites. However, information conveyed by colors may not be passed on correctly because color vision varies between individuals. We should therefore choose colorations that pass information on clearly and can be understood even by color deficient observers.
2 Methods Offering Support to Make Color Web Page Screens That Can Be Discriminated by the Color Blind

2.1 Methods to Simulate Color Vision

There are two main methods to simulate the vision of color deficient observers. The first method is to look through a filter that simulates imperfect color vision, or to use a
liquid crystal display monitor with a color vision simulating function. The second method is to convert normal color vision to anomalous color vision with software such as VsCheck after reading a file on a personal computer. With these methods, we can see how an image prepared for normal color vision appears when converted into abnormal color vision, allowing us to realize how color deficient observers see things and how confusing an extensive use of colors can be.

2.2 Methods for Detecting Confusing Colorations

UDing (software) can detect confusing colorations displayed by simulation software and change the coloration [3]. ColorSelector (software) determines whether or not a specified combination of colors can be discriminated by color deficient observers [4]. These tools either modify already existing files or judge whether two chosen colors can be discriminated. Their disadvantage is that they can be used only after files have been created, or they require time-consuming work to check whether every pair of colors chosen from many combinations can be discriminated.

2.3 A Method for Supporting Coloring Decisions When Making Web Page Screens

Using a color vision simulator or ColorSelector to determine coloration while creating web page screens is quite troublesome. Therefore, the author has devised a tool that supports the determination of coloration when making web page screens, and has created and verified a prototype.
3 Functions of the Web Page Screen Coloration Support Tool

The tool is provided as an add-in function of web page creation software. A user selects an area requiring coloration support during the creation of a web page and then calls up the coloration support tool. The functions of the tool consist of "Select color," "Diagnose" and "Display." By calling up this tool and reflecting the resulting coloration on the screen, the user can determine a coloration that takes color deficient observers into consideration. The tool helps a user choose colorations so that two colors on a background color can be discriminated by color deficient observers. It has two modes, one for areas containing characters and the other for illustrations only.

3.1 "Select Color" Function

A user can select a background color and two colors on the background color from color palette tabs. Each color palette has the 216 so-called web-safe colors, which appear as almost the same color on both Windows and Macintosh. A user can specify a color he/she wants to use by clicking a color on the color palette or by picking a color from the screen with a dropper. The user then selects a combination of colors, for example "background and character color," whose coloration he/she wants to evaluate. The color of any character which is inappropriate for the selected background color is
crossed out on the tab. Character colors which are appropriate for the selected background color but require some caution are marked with a triangular symbol. Color combinations which are deemed inappropriate for even one of the four color vision types are also crossed out.

3.2 "Diagnosis" Function

In the "Select color" mode, the color palette displays whether color combinations are appropriate or not. After the background color and colors 1 and 2 have been determined, visibility by the four vision types is diagnosed. The result is displayed as a circle (appropriate), a triangle (not inappropriate, but due consideration is required) or a cross (inappropriate). To diagnose colorations for the four types of color vision, the RGB value of each palette color as seen by persons with normal color vision is converted to the corresponding RGB values for the other three vision types and saved in a table; these RGB values are used in the calculations listed below.
In the diagnosis, combinations of colors are determined to be appropriate if they meet certain standards for the differences in brightness, luminance and color; other combinations are determined to be inappropriate. Color combinations with a color difference of 500 or more and a brightness difference of 96 through 125 are judged to require caution, though they are not inappropriate. In WCAG, when the RGB values of two colors are (R1, G1, B1) and (R2, G2, B2), the two colors are considered to be discriminable from each other with respect to brightness and color difference if the conditions listed below are met. However, the WCAG standard is strict and leaves too few usable colors. Thus, in this study it was ascertained in an experiment that, among combinations with a color difference of 500 or more, the brightness difference can be relaxed to 96, expanding the range of options. WCAG defines the brightness difference, contrast ratio and color difference as expressions (1), (2) and (3) [5]:

Brightness difference:
((R1 - R2) * 299 + (G1 - G2) * 587 + (B1 - B2) * 114) / 1000 ≥ 126 .    (1)

Contrast ratio:
(L1 + 0.05) / (L2 + 0.05) ≥ 5 ,    (2)
where
L1 = max(((R1 / 255) ^ 2.2 * 0.2126 + (G1 / 255) ^ 2.2 * 0.7152 + (B1 / 255) ^ 2.2 * 0.0722),
         ((R2 / 255) ^ 2.2 * 0.2126 + (G2 / 255) ^ 2.2 * 0.7152 + (B2 / 255) ^ 2.2 * 0.0722)),
L2 = min(((R1 / 255) ^ 2.2 * 0.2126 + (G1 / 255) ^ 2.2 * 0.7152 + (B1 / 255) ^ 2.2 * 0.0722),
         ((R2 / 255) ^ 2.2 * 0.2126 + (G2 / 255) ^ 2.2 * 0.7152 + (B2 / 255) ^ 2.2 * 0.0722)).

Color difference:
(max(R1, R2) – min(R1, R2)) + (max(G1, G2) – min(G1, G2)) + (max(B1, B2) – min(B1, B2)) ≥ 500 .    (3)
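To make the diagnosis concrete, the following minimal sketch shows one way expressions (1)-(3) and the relaxed brightness band of 96 through 125 could be combined into the circle/triangle/cross judgement and repeated over the four vision types. It is an illustration only: the use of Python, the function names, the exact way the three criteria are combined, and the idea of passing precomputed RGB conversion tables for the three dichromatic vision types are our assumptions, not the authors' implementation.

```python
# Illustrative sketch of the Sect. 3.2 diagnosis (assumed structure, not the authors' code).

# The 216 web-safe colors of Sect. 3.1 use six levels per channel (0x00-0xFF in steps of 0x33).
LEVELS = (0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF)
WEB_SAFE = [(r, g, b) for r in LEVELS for g in LEVELS for b in LEVELS]

def brightness(c):
    r, g, b = c
    return (r * 299 + g * 587 + b * 114) / 1000

def luminance(c):
    r, g, b = c
    return (r / 255) ** 2.2 * 0.2126 + (g / 255) ** 2.2 * 0.7152 + (b / 255) ** 2.2 * 0.0722

def contrast_ratio(c1, c2):
    l1, l2 = sorted((luminance(c1), luminance(c2)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

def color_difference(c1, c2):
    return sum(abs(a - b) for a, b in zip(c1, c2))

def diagnose_pair(c1, c2):
    """Judge one color pair for one vision type: 'o' appropriate, '^' caution, 'x' inappropriate."""
    db = abs(brightness(c1) - brightness(c2))
    dc = color_difference(c1, c2)
    if dc >= 500 and db >= 126 and contrast_ratio(c1, c2) >= 5:   # expressions (1)-(3)
        return "o"
    if dc >= 500 and 96 <= db <= 125:                             # relaxed band from the experiment
        return "^"
    return "x"

def diagnose(c1, c2, conversion_tables):
    """Worst judgement over normal vision and the three converted (dichromatic) vision types.

    conversion_tables is a hypothetical dict {vision type: {palette RGB: converted RGB}}
    standing in for the saved conversion table described in Sect. 3.2.
    """
    results = [diagnose_pair(c1, c2)]
    for table in conversion_tables.values():
        results.append(diagnose_pair(table[c1], table[c2]))
    if "x" in results:
        return "x"
    return "^" if "^" in results else "o"

def mark_palette(background, conversion_tables):
    """Mark every web-safe color against a chosen background, as on the palette tab."""
    return {c: diagnose(background, c, conversion_tables) for c in WEB_SAFE}
```

For black characters on a yellow background, for example, diagnose_pair((0x00, 0x00, 0x00), (0xFF, 0xFF, 0x00)) returns "o": the color difference is 510, the brightness difference is about 226 and the contrast ratio is about 19.6.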
Fig. 1 shows the flow of the diagnosis of appropriate combinations of colors.
Fig. 1. The flow of the diagnosis of appropriate combinations of colors
3.3 "Display" Function

First, the colors selected on the color palette are judged as appropriate or not. Second, the vision of color deficient observers is simulated. Third, the coloration as a whole is determined to be appropriate or not. Finally, the coloration of the selected area is replaced with the new one on the web page screen being created: the saved area is filled with the background color, the other two areas are filled with the new coloration, and they are put back in their original position to make an image with the new coloration.
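The replacement itself can be pictured as a pixel-for-pixel swap over the saved image area. The sketch below uses the Pillow imaging library purely for illustration; the library choice, the function name and the exact replacement rule are assumptions made for the example, not a description of the authors' implementation.

```python
# Illustrative sketch of the "Display" / "Reflect on screen" re-coloring step (Sect. 3.3).
from PIL import Image

def recolor_area(in_path, out_path, color_map):
    """Replace every pixel whose RGB value appears in color_map with its new value."""
    img = Image.open(in_path).convert("RGB")
    pixels = img.load()
    width, height = img.size
    for y in range(height):
        for x in range(width):
            old = pixels[x, y]
            if old in color_map:                      # background, color 1 or color 2
                pixels[x, y] = color_map[old]
    img.save(out_path)

# Hypothetical example: keep the background, replace the two area colors with the new coloration.
recolor_area("selected_area.png", "recolored_area.png",
             color_map={(255, 255, 255): (255, 255, 255),   # background kept
                        (255, 0, 0): (0, 51, 153),          # old color 1 -> new color 1
                        (0, 255, 0): (255, 204, 0)})        # old color 2 -> new color 2
```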
4 Tools

This study provides two types of coloration support tool: one for areas that include characters, and one for areas consisting only of images. The number of target areas is limited to three; if there are four or more areas, the tool must be used repeatedly. Each tool has a tab for each area, and each tab has a color palette which displays whether the
coloration is appropriate or not. A user can select combinations of colors which are deemed appropriate from the color palette, or select colors from the screen by using the dropper function. The user can see how the selected combinations of colors appear to color deficient observers and whether the coloration of the selected areas is appropriate for each of the four vision types. Selected areas can then be colored with the chosen coloration.

4.1 Tool for Characters

A user selects an area including a character string and the background to be colored, and then calls up the coloration support tool. The selected area is displayed on the tool. The coloration support tool has three tabs: the first is for the color of the characters, the second is for the background color which surrounds the characters, and the third is for the color which surrounds the background color. A user can specify a color from the 216 colors of the color palette on each tab, or use the dropper to select a color from the screen being created. After selecting a color, the user can display the vision of a normal color observer, a protanope, a deuteranope and a tritanope with the "Reproduce defective color vision" function. The "Diagnose" function displays the evaluation (circle, triangle or cross) of the total color combination for each color vision. The "Reflect on screen" function draws the selected color combination on the selected area of the web page screen. Fig. 2 shows the main and sub screens of the character coloration support tool.
Fig. 2. Character coloration support tool
4.2 Graphic Drawing Tool

A user selects an area for which he/she wants coloration support, and then calls up the graphic drawing coloration support tool. The user selects the background color and the first and second colors, and the visibility of the two specified colors for the four color vision types is displayed on the tool. The originally selected image is displayed in a separate window. The window has three buttons for the deficient color visions, and pressing a button displays a window with the coloration corresponding to the "visibility" of each color combination. The user selects a tab and a new color to exchange with the first color. Selecting the second color tab then displays the compatibility with the new first color using symbols (no symbol, triangle or cross), and the user selects a color other than those marked with the cross symbol. The user presses "See all" to open a window with the selected coloration, and presses the buttons in that window to check the vision of the color deficient observers. After finding a satisfactory combination, the user presses "Diagnose" to check the vision for the four types of color vision. If no color vision has a problem (all circles), the user presses "Reflect on screen" to redraw the coloration of the original graphic. Fig. 3 shows the main and sub screens of the graphic coloration support tool.
Fig. 3. Graphic coloration support tool
5 Verification of Effectiveness

A test was performed to draw graphics that can be discriminated by color deficient observers using the character coloration tool and the graphic drawing coloration tool. The subjects were 20 university students (8 males, 12 females). The subjects first learned about color vision deficiency by reading a manual and were then asked to make two screens using the tools. After the test, the vision of color deficient observers was displayed with a color vision simulator to check whether colorations that could be discriminated by color deficient observers had been made. In addition, two color deficient observers were asked to view the screens made by all the subjects and confirmed that the colorations posed no problem. Therefore, all subjects were able to make graphics which could be discriminated by color deficient observers. All subjects answered that the tool had supported them, although some answered that it was difficult to use. These results indicate that the functions selected for developing this "web page screen preparation support tool in consideration of color deficient observers" are appropriate.
6 Conclusions

We are not usually aware that color deficient observers account for approximately 5% of the total population. Normal color observers can enjoy rich expressions in various colors, while color deficient observers cannot enjoy them as readily. As the W3C proposes, we should be careful not to express information in a way that depends solely on colors. We should use easy tools to choose colorations in consideration of color deficient observers, especially when creating highly public information websites. For that purpose, tools that support coloration should be developed in combination with existing software.

Acknowledgements. The author sincerely thanks Ms. Saori Azuma and Ms. Kazuko Sato, who graduated in 2007 from the Department of Industrial and Systems Engineering, College of Engineering and Science, Aoyama Gakuin University, for their considerable contribution to this study.
References
1. Ikeda, M.: Basic Color Engineering. Asakura Shoten, Tokyo (2000)
2. Fukami, K.: Color Discrimination. Kanehara-shuppan, Tokyo (2003)
3. UDing, http://www.toyoink.co.jp/ud/index.html
4. ColorSelector, http://jp.fujitsu.com/about/design/ud/assistance/colorselector/
5. Techniques for Accessibility Evaluation and Repair Tools, W3C Working Draft, April 26 (2000), http://www.w3.org/TR/AERT
The Persuasive Effects from Web 2.0 Marketing: A Case Study Investigating the Persuasive Effect from an Online Design Competition

Asle Fagerstrøm¹ and Gheorghita Ghinea²

¹ The Norwegian School of Information Technology, Schweigaardsgt. 14, 0185 Oslo, Norway
[email protected]
² School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge UB8 3PH, London, United Kingdom
[email protected]
Abstract. This case study investigates the effect of a Web 2.0 campaign, an online design competition, carried out by a company that produces and markets feminine care products (sanitary towels). The target segment for the campaign was girls in four Nordic countries between the ages of 14 and 25. The main characteristic of this target segment is that its members are not much interested in the product category. Our interpretation is that the online design competition had a persuasive effect on the target segment. By using the Internet in an interactive and social way, companies can achieve brand awareness and create a positive attitude towards a brand in low-involvement segments. Suggestions for further research are given.

Keywords: Web 2.0, Interactivity, Persuasion, Involvement, Interactive marketing.
1 Introduction

Web 2.0 can be used to do what traditional advertising does: push information to persuade consumers to buy products or services. For example, a company may implement a blog on its web site and regularly publish information about products and their benefits. However, according to Parise et al. [1], that kind of approach misses the point of Web 2.0. Instead, companies should use Web 2.0 tools to get consumers involved. To investigate to what extent Web 2.0 marketing has the ability to influence the target segment's intentional effort, we used the elaboration likelihood model (ELM) of persuasion as a guide to data analysis and interpretation. The ELM, developed by Petty et al. [2], is based on the idea that attitudes are central in guiding the consumer's decisions and other behaviors. While attitudes can result from a number of cues in the consumer's setting, persuasion is a primary source. The ELM framework suggests that important variations in the nature of persuasion are a function of the likelihood that receivers will engage in elaboration of (thinking about) information relevant to the persuasive issue [2].
By investigating the persuasive effect of interactive and social campaigns, companies could better understand why Web 2.0 communication works and, as a result, increase the benefits of their online marketing activities. This paper is structured as follows. First, we present existing studies on the effect of online interactive and social marketing. Second, we give a short presentation of the ELM framework. Third, based on a case study, we discuss the persuasive effect that Web 2.0 marketing may have had on the target segment. Finally, the last section contains concluding comments on the use of Web 2.0 activities to influence consumers' intention to purchase by means of interactive and social marketing. Suggestions for further research are given.
2 Related Work

Web 2.0 is a term used to describe changing trends in the use of World Wide Web technology and web design. The term was introduced for the first time in 2004 by Dale Dougherty, a web pioneer and O'Reilly VP, at a conference brainstorming session between O'Reilly and MediaLive International. Web 2.0 is, according to Tim O'Reilly, the business revolution in the computer industry caused by the move to the Internet as platform, and an attempt to understand the rules for success on that new platform. The rules for success entail using the Internet in an interactive and social way [3]. Most authors focus on describing how Web 2.0 is used in interactive online campaigns in general [e.g. 4]. However, little has been done to explain the persuasive effect that Web 2.0 marketing has on consumers. Hoffman and Novak [5] point out that the Web frees customers from their traditional passive role as receivers of marketing communications, gives them much greater control over the information search and acquisition process, and allows them to become active participants in the marketing process. Two unique forms of interactivity, "machine interaction" and "personal interaction," have contributed to the rapid diffusion of the Web as a commercial medium in the last several years [5]. Interactivity can be conceptualized from various perspectives. For example, Ghose and Dou [6] conceptualize interactivity from a marketing perspective, identifying 23 functions of interactivity mainly driven by a communication-based conceptualization of interactivity. Another example is Ha and James [7], who conceptualize interactivity from an interpersonal communication perspective, focusing on interactivity as communication either through a medium or without the aid of a medium. The latter conceptualization of interactivity will be used in this paper as a basis for discussion and interpretation. Some studies have examined the effect of interactivity on companies' web sites. Coyle and Thorson [8] conducted an experiment on the effect of interactivity and vividness on commercial web sites; results show that the perception of telepresence (simulated perception of direct experience) grew stronger as the levels of interactivity and vividness in web sites increased. In a study on the attractiveness of a web site, Ghose and Dou [6] found that the greater the degree of interactivity, the more likely it is for the company's web site to be considered a top site. In addition, they find that the "customer support" component of interactivity has a significant positive
impact on the likelihood of a company's web site being included in a list of high-quality web sites. Interesting studies have thus been done on the effects of interactivity. However, none of these studies investigate the persuasive effect that interactivity has on the consumer's intention to purchase a brand.
3 The Elaboration Likelihood Model
According to O'Keefe [9], persuasion can be defined as "a successful intentional effort at influencing another's mental state through communication in a circumstance in which the persuaded has some measure of freedom." The ELM, developed by Petty et al. [2], is a theory that proposes a global view of how attitudes are formed and changed. The basic idea is that receivers (e.g. online consumers) will vary in the degree to which they are likely to engage in elaboration (thinking about) of information relevant to the persuasive issue. For example, when information becomes more personally relevant, consumers are willing to engage in extensive issue-relevant thinking. They will pay attention to a specific web site or online campaign, thoroughly evaluate the information that is presented, and recall from memory other issues that are relevant to the specific situation. However, when the information is not personally relevant, consumers will not undertake much issue-relevant thinking and will display relatively little elaboration. The degree to which consumers engage in issue-relevant thinking forms a continuum, from cases of extremely high elaboration to cases of little or no elaboration. The ELM suggests that important variations in the nature of persuasion are a function of the likelihood that receivers will engage in elaboration of information relevant to the persuasive issue. Two types of persuasion process can be engaged depending on the degree of elaboration [2]: a central route involving systematic cognitive thinking and a peripheral route involving cognitive shortcuts. The central route to persuasion represents the persuasion processes involved when elaboration is relatively high, and persuasion is achieved through the consumer's thoughtful examination of issue-relevant considerations. The peripheral route represents the persuasion processes involved when elaboration is relatively low. When persuasion is achieved through the peripheral route, it usually comes about because the consumer uses some simple decision rules to evaluate the advocated position. For example, the consumer may be guided by whether they like the color or the design of the web site. That is, the consumer may rely on peripheral cues as a guide to attitude, rather than engage in extensive issue-relevant information processing.
4 Case: An Online Design Competition
Our research takes an inductive approach, which means that we gathered empirical data without having a hypothesis in advance. Because it is difficult to separate the object to be studied from its context, we found that a case study was an appropriate method to use [see 10]. The rationale for this choice is also that the case study is suitable for investigating up-to-date processes or behaviors of others, which happen in their real-life context but are little known [11]. The key to successfully designing a case study is
to have developed beforehand a theoretical proposition to guide data collection and data analysis [10]. A company that had carried out a Web 2.0 campaign was chosen. The company produces and markets feminine care products (sanitary towels) in 85 countries worldwide. In Europe the products are sold under the brand name Libresse™. Besides demographics (gender and age), the company reported two main characteristics of the target segment for feminine care products: first, the consumer often sticks to the brand she chooses the first time, and, second, the consumer is not much interested in the product category (sanitary towels). This is a challenging situation for the marketing department of Libresse™. To achieve brand awareness and a positive attitude towards the brand, the company carried out additional research to better understand the target segment. Results from the market research (survey and focus groups) show that the target segment has varying interests. However, fashion design was one of the most dominant interests reported by the respondents. As many as 25 % of the respondents, aged 14 to 25, reported that they want to work with fashion and design. The owner of Libresse™ saw this as an opportunity for their 2007 campaign. A two-month online design competition was created as the main communication activity in the campaign toward the Nordic segment (Iceland was not included). The other communication channels that were part of the campaign were print, TV and stores. Libresse™ also created a package design exclusively for the online design competition. The target segment was invited to design a pair of underpants on the Libresse™ web site. The competition was open to everyone, but the main target segment was girls between 14 and 25 years of age. Girls who were attracted by the invitation could use a design tool to choose colors and patterns for their underpants (see figure 1). With the help of the drawing program, she could create the submission by choosing between templates and complete figures, and by freehand drawing. The Libresse™ brand name was strategically placed at the top left of the web site. The web site also presented the jury and the attractive prizes. When the consumer was satisfied with her underpants design, she could submit it on the Libresse™ web site and join the design competition.
Fig. 1. Design tool on Libresse™ web page
However, she could also invite friends via Facebook™ to vote for her underpants design. If she did not want to design a pair of underpants, she could vote for her favorite pattern and, in addition, send it as a postcard. Each week, the winner in each country went through to the final. The winner of the design competition received a sum of money and, even more importantly, her underpants were launched in 180 JC™ stores around the Nordic countries. So, the dream of being a fashion designer could be realized on the Libresse™ web site.
5 Campaign Outcomes
One goal that Libresse™ had for the campaign was to increase the number of visitors to each country's website by 25 %. In total (see figure 2), the number of visitors increased from 277 657 to 483 036, in other words an increase of 205 379 visitors (+74 %). The time consumers spent at the web site also increased by about 60 %, from approximately 12 minutes to 19 minutes.
Fig. 2. Visitors at Libresse™ web site
The response from the target segment was enormous. As many as 90 000 underpants designs were submitted to the design competition by girls in the Nordic countries (see figure 3). Sweden had the highest participation with 40 500 submitted designs. Norway was second with 17 100, followed by Denmark and Finland with 16 200 submitted designs each. The media impact was extremely strong, especially in digital media. Within the blog world, the competition was one of the major topics during the summer of 2007. In terms of sales, Libresse™ also witnessed an increase during the campaign period.
Fig. 3. Number of submissions in the Nordic segments
6 Discussion
The increase in sales during the campaign period can be explained by means of the ELM. The ELM is based on the idea that, under different conditions, receivers will vary in the degree to which they are likely to engage in elaboration of information relevant to the persuasive issue [2]. The basic idea is that consumers are more likely to carefully evaluate the attributes of a product when the purchase is of high relevance to them. Conversely, the likelihood is great that consumers will engage in a very limited information search and attribute evaluation when the product holds little relevance or importance for them. The target segment of Libresse™ reported that they are not much involved in the product category (feminine care products). Their engagement in information search and attribute evaluation will therefore most probably be limited. As presented in the introduction, the ELM makes a distinction between two routes to persuasion [2]: a central and a peripheral route. The central route to persuasion represents the persuasion processes involved when elaboration is relatively high, and the peripheral route represents the persuasion processes involved when elaboration is relatively low. According to Petty et al. [2], attitude changes that occur via the peripheral route do so because the attitude issue or object is associated with positive or negative cues. In the Libresse™ design competition campaign, brand awareness and a positive attitude towards the brand were most probably achieved because the brand Libresse™ was associated with fashion design (positive cues). The peripheral route to persuasion can thus explain the brand awareness, the positive attitude towards the brand and, finally, the increase in sales for Libresse™ during the campaign period.
7 Conclusion
This case study has demonstrated that the peripheral route to persuasion strategy can be realized with the support of Web 2.0 marketing towards low-involvement
consumers. By using the Internet in an interactive and social way, companies can achieve brand awareness, a positive attitude towards a brand and, finally, an increase in sales in the target segment. This study is not without limitations. One limitation is its lack of empirical data. Its interpretative design has obvious limitations, especially with regard to internal validity (to what extent the interactive design competition is a cause of the increase in sales). In spite of these limitations, the online marketing implication of the ELM is apparent: when planning an online campaign, companies should consider to what extent the target segments are involved in the product category. A follow-up study could be to conduct an experiment, which would increase validity regarding causal conclusions. Another follow-up study could be to investigate the persuasive effect of Web 2.0 marketing towards high-involvement consumers. To what extent can companies use the Internet in an interactive and social way toward target segments that are likely to engage in elaboration of information relevant to the persuasive issue? A third follow-up study could be to replicate the present study in different contexts and see if it gives the same results.
References
1. Parise, S., Guinan, P.J., Weinberg, B.D.: The Secrets of Marketing in a Web 2.0 World. The Wall Street Journal (2009)
2. Petty, R.E., Cacioppo, J.T., Schumann, D.: Central and Peripheral Routes to Advertising Effectiveness: The Moderating Role of Involvement. Journal of Consumer Research 10(2), 135–146 (1983)
3. O'Reilly, T.: What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. Communications & Strategies, First Quarter (1), 17–37 (2007)
4. Moran, M.: Do It Wrong Quickly: How the Web Changes the Old Marketing Rules. IBM Press, Upper Saddle River (2008)
5. Hoffman, D.L., Novak, T.P., Chatterjee, P.: Commercial Scenarios for the Web: Opportunities and Challenges. Journal of Computer-Mediated Communication 3(3) (1995)
6. Ghose, S., Dou, W.: Interactive Functions and Their Impacts on the Appeal of Internet Presence Sites. Journal of Advertising Research 38, 29–43 (1998)
7. Ha, L., James, E.L.: Interactivity Reexamined: A Baseline Analysis of Early Business Web Sites. Journal of Broadcasting & Electronic Media 42(4), 456–473 (1998)
8. Coyle, J.R., Thorson, E.: The Effects of Progressive Levels of Interactivity and Vividness in Web Marketing Sites. Journal of Advertising 30(3), 65–77 (2001)
9. O'Keefe, D.J.: Persuasion: Theory & Research, 2nd edn. Sage Publications, London (2002)
10. Yin, R.K.: Case Study Research: Design and Methods, 3rd edn. Applied Social Research Methods Series, vol. 5. Sage Publications, Thousand Oaks (2003)
11. Amaratunga, D., et al.: Quantitative and Qualitative Research in the Built Environment: Application of "Mixed" Research Approach. Work Study 51(1), 17–31 (2002)
Formalizing Design Guidelines of Legibility on Web Pages Fong-Ling Fu and Chiu-Hung Su Department of Management Information Systems, National Cheng-chi University, Taipei 11605, Taiwan
[email protected],
[email protected]
Abstract. Screen design of web pages is challenging because web pages contain a lot of icons, consisting not only of text in various fonts but also of graphics of different sizes and content. The objectives of screen design for a web page can be to provide aesthetic beauty, to convey complex information, to improve legibility, or some combination of the above. This study chooses to formalize design guidelines for legibility because information on web pages is becoming more and more complicated and is hampering the efficiency of information searching. This study proposes six measurements of screen legibility: screen ratio of navigator to content, font size variety, variety of icon types, color contrast between background and foreground, content density, and number of alignment points. These six factors were then used to measure the legibility of the startup page on four different yahoo.com sites. Combined with the results from a survey study, we conclude that all six factors are validated as attributes with a significant and measurable impact on web site legibility. Keywords: Guidelines of Web pages, Web pages design, Screen layout design, Legibility design, Complexity measurement.
1 Introduction
A Web site is like a big house: it should be firm in its basic structure (e.g. secure and stable), should be functionally convenient (e.g. easy to use), and should be a delight to use [6], [16]. Proper interface design helps all of the above needs to be satisfied. Screen design is very challenging because a web page contains a lot of icons, which consist not only of text in various forms but also of graphics of different sizes and content [13], [14]. The objectives of screen design for a web page can be to provide aesthetic beauty, to convey complex information, to improve legibility, or some combination of the above [14]. Among these, previous studies seem to have focused more on aesthetics. This study chooses to formalize design guidelines for legibility because information on web pages is becoming more and more complicated and is hampering the efficiency of information searching.
2 Experimental Design
The objectives of a Web site design are influenced by its tasks [18]. There are three different types of Web sites based on the nature of the tasks: pleasure-oriented (hedonic), productivity-oriented (utilitarian), or hybrid [21], [8], [19]. At one end of the continuum, Web sites that provide solutions to problems are typically visited out of necessity and effectively support utilitarian tasks [19]. Utilitarian tasks include many e-shopping behaviors such as information searching, analysis, comparison, and evaluation. In such situations, users want to accomplish their tasks effectively and efficiently. At the other end of the continuum, Web sites that are experiential, entertaining, and gratifying to the senses effectively support more hedonic tasks such as games [19]. Along the continuum between utilitarian and hedonic Web sites, there are also hybrid Web sites, with most e-retailers supporting both task types [19], [21]. The more emphasis is placed on hedonic importance, the more influential the aesthetic design of the interface; conversely, the more emphasis is placed on utilitarian importance, the more influential the efficiency of information searching [19]. Therefore, to validate the measurements of legibility, this study selected the startup screens of four Yahoo websites, from Japan, the USA, Taiwan, and Korea, as controls for the same task. The home page of any website can be quite large, but when entering the site, the initial content displayed is limited to whatever fits into the space of the user's screen; this is what is referred to as the "startup screen" in our experiment.
Fig. 1. Startup screen and structure model of yahoo.com.jp
Fig. 2. Startup screen and structure model of yahoo.com
Fig. 3. Startup screen and structure model of yahoo.com.tw
Fig. 4. Startup screen and structure model of yahoo.com.ka
The startup screens of Yahoo.com as used in the four different countries, together with their structure models, are shown in Fig. 1 to 4. The unit of web page measured is a 'block', which is represented as a rectangle in the structure model [4]. To formalize the design guidelines of legibility on web pages, we based our experiment on the findings and results of previous research into this topic. A summary of the key findings from previous research is provided below:
(1) Legibility of screen ratio (LR) involves the ratio of navigator to content, with the best performance achieved using a 23/77 ratio [20]. In terms of the magnitude of LR, the closer to this perfect ratio, the better the result, as shown by the following formula:

$LR = \frac{1}{|W_{nav} - 23|} \in [0,1]$    (1)

where $W_{nav}$ is the width ratio of the navigator to the total screen.
(2) Legibility of font size variety (LS) is based on the Principle of Economy: a careful and discreet use of display elements to get the message across as simply as possible [3]. LS involves the categorization of elements into groups according to actual physical size and the variation in those sizes [5]:

$LS = \frac{1}{n_{size}} \in [0,1]$    (2)

where $n_{size}$ is the number of distinct font sizes and $n$ is the total number of objects.
(3) Legibility of density (LD) is the extent to which the screen is filled with objects [4]:

$LD = 1 - \frac{2\sum_{i}^{n} a_i}{a_{frame}} \in [0,1]$    (3)

where $a_i$ and $a_{frame}$ are the areas of object $i$ and of the frame, respectively, and $n$ is the number of objects on the frame.
(4) Legibility of alignment (LA): in order to achieve simplicity on the screen, the smaller the number of alignment points, the better [17], [9]:

$LA = \frac{3}{n_{vap} + n_{hap} + n} \in [0,1]$    (4)

where $n_{vap}$ and $n_{hap}$ are the numbers of vertical and horizontal alignment points, respectively, and $n$ is the number of objects on the frame [3].
(5) Legibility of icon type (LI): graphic ingredients enable considerable presentation enhancements, making screens easier to understand and use [2]. Every icon is perceived as an information unit. Icons with the same size and font are perceived as one type of icon. Every graphic icon is different from the others because graphic icons contain different graphics and text with different sizes, colors, or fonts:

$LI = \frac{1}{n_{type}} \in [0,1]$    (5)

where $n_{type}$ is the number of icon types and $n$ is the total number of objects.
(6) Legibility of color contrast (LC): combinations of colors with higher levels of contrast between background and foreground (BW, YB) generally lead to better performance than combinations with lower contrast [7], [12], [1]. Color design is better when simple, not exceeding more than four colors per screen [15].
Fig. 5. I.R.I 116 Palette [23]
Also, the impact of color on visualization depends on the size of the area in which the color is used [10], [11]. Therefore, the formula for calculating LC is as follows:

$LC = \sum_{i=1}^{4} C_i \times A_i - \sum_{i=5}^{n} C_i \times A_i \in [0,1]$    (6)

where $C_i$ is the degree of contrast between background and foreground and $A_i$ is the percentage of icon area in the web page. According to the I.R.I 116 palette, the contrast of the hues is classified as identity, similarity, analogous, blur, contrast or complementary (Fig. 5). The researchers set the degree of contrast $C_i$ as: identity 0, similarity 0.2, analogous 0.4, blur 0.6, contrast 0.8 and complementary 1. A short computational sketch of these six measures is given below.
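To make the six measures easier to apply, the following Python sketch implements formulae (1)–(6) as reconstructed above. The function and variable names are ours, and the example calls use values taken from Tables 2–6 only as an illustrative check of the arithmetic.

```python
def lr(w_nav):
    # Formula (1): legibility of screen ratio, with 23 % as the reference navigator width.
    return 1.0 / abs(w_nav - 23)

def ls(n_size):
    # Formula (2): legibility of font size variety.
    return 1.0 / n_size

def li(n_type):
    # Formula (5): legibility of icon type.
    return 1.0 / n_type

def ld(total_object_area, frame_area):
    # Formula (3): legibility of density.
    return 1.0 - 2.0 * total_object_area / frame_area

def la(n_vap, n_hap, n_objects):
    # Formula (4): legibility of alignment.
    return 3.0 / (n_vap + n_hap + n_objects)

def lc(contrast_area_products):
    # Formula (6): contrast-weighted areas of the first four colors
    # minus those of any colors beyond the recommended four.
    return sum(contrast_area_products[:4]) - sum(contrast_area_products[4:])

# Illustrative checks against the values reported for the Yahoo pages:
print(round(ls(40), 3))                                    # 0.025 (Japan, Table 2)
print(round(li(47), 3))                                    # 0.021 (Japan, Table 3)
print(round(ld(270353, 976 * 665), 3))                     # 0.167 (Japan, Table 5)
print(round(la(63, 53, 145), 3))                           # 0.011 (Japan, Table 6)
print(round(lc([0.095, 0.025, 0.015, 0.006, 0.009]), 3))   # 0.132 (USA, Table 4)
```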
3 Results
The indicators of legibility, LR, LS, LI, LC, LD, and LA, are calculated based on formulae 1 to 6 as mentioned above, and the legibility results for each of the four web pages are shown in the following six tables.

Table 1. LR of four web pages

                      Japan    USA      Taiwan   Korea
% area of navigator   17       15       16       15
% area of content     77       79       78       79
LR                    0.203    0.142    0.167    0.142
Table 2. LS of four web pages

          Japan    USA      Taiwan   Korea
n         145      109      146      146
n_size    40       52       55       66
LS        0.025    0.019    0.018    0.015
Table 3. LI of four web pages

          Japan    USA      Taiwan   Korea
n_type    47       62       69       81
n         145      109      146      146
LI        0.021    0.016    0.014    0.012
Table 4. LC of four web pages

          Japan    USA      Taiwan   Korea
C1, W1    0.099    0.095    0.104    0.130
C2, W2    0.051    0.025    0.035    0.026
C3, W3    0.017    0.015    0.027    0.012
C4, W4    --       0.006    0.007    0.010
C5, W5    --       0.009    --       --
LC        0.167    0.132    0.173    0.178
Table 5. LD of four web pages

              Japan     USA       Taiwan    Korea
Σa_i          270353    243785    281157    288123
area width    976       976       976       976
area height   665       665       665       665
a_frame       649040    649040    649040    649040
LD            0.167     0.249     0.134     0.112
Table 6. LA of four web pages

                      Japan    USA      Taiwan   Korea
n_hap                 53       61       71       79
n_vap                 63       69       75       61
n                     145      109      146      146
n_hap + n_vap + n     261      239      292      286
LA                    0.011    0.013    0.010    0.010
A summary of the legibility values is shown in Table 7. According to the average of all the legibility values, we consider Yahoo Japan to be the best; it obtained the highest values in LR (ratio of navigator to content), LS (font size variety), and LI (icon type variety). The Yahoo USA web page came second; it obtained the highest values in LD (density) and LA (alignment). Yahoo Taiwan is a little lower than the previous two, while Yahoo Korea ranks last.
Table 7. Summary of legibility of four web pages
indicator    Japan    USA      Taiwan   Korea
LR           0.203    0.142    0.167    0.142
LS           0.025    0.019    0.018    0.015
LI           0.021    0.016    0.014    0.012
LC           0.167    0.132    0.173    0.178
LD           0.167    0.249    0.134    0.112
LA           0.011    0.013    0.010    0.010
average      0.099    0.095    0.086    0.078
rank         1        2        3        4
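The overall ranking in Table 7 is simply the mean of the six indicators for each site. The short sketch below (variable names ours) reproduces the averages and the resulting rank order from the indicator values.

```python
indicators = {
    "Japan":  {"LR": 0.203, "LS": 0.025, "LI": 0.021, "LC": 0.167, "LD": 0.167, "LA": 0.011},
    "USA":    {"LR": 0.142, "LS": 0.019, "LI": 0.016, "LC": 0.132, "LD": 0.249, "LA": 0.013},
    "Taiwan": {"LR": 0.167, "LS": 0.018, "LI": 0.014, "LC": 0.173, "LD": 0.134, "LA": 0.010},
    "Korea":  {"LR": 0.142, "LS": 0.015, "LI": 0.012, "LC": 0.178, "LD": 0.112, "LA": 0.010},
}

# Average the six indicators per site and rank the sites from most to least legible.
averages = {site: sum(values.values()) / len(values) for site, values in indicators.items()}
for rank, site in enumerate(sorted(averages, key=averages.get, reverse=True), start=1):
    print(rank, site, round(averages[site], 3))
# 1 Japan 0.099 / 2 USA 0.095 / 3 Taiwan 0.086 / 4 Korea 0.078
```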
Research into information systems has proposed that the solution to managing complexity is to include enough variety in the attributes of the system's basic elements. By manipulating attributes such as position, size, icon type and color, web elements are allowed to be distinguishable yet grouped. The other important factor is how strongly, and in how many different ways, these groups can be related to the task at hand, for example through density and alignment, as in the study performed by Xing [22]. Based on the data in Table 7, we consider the Yahoo Japan home page to be the best at providing distinctiveness to the web page elements (blocks) through variety in the position of the navigator (index), text size, and icon type. Yahoo USA provided stronger relations for grouping the blocks, through density and alignment, than the other websites included in this study.
4 Survey
A survey study was conducted to test the robustness of the formulae for legibility using model screens. The subjects who judged the legibility of the actual screens were 64 undergraduate student volunteers from a university in Taiwan, each of whom took the course "Introduction to Software". The subjects consisted of 26 males and 38 females; the average age was 29.2 years, with 35 majoring in business, 1 in computer science and 28 in other subjects. The average previous experience of the web was 7.8 years, and the average web usage per week was 20.6 hours. The questionnaire contained pictures of the web pages of Fig. 1-4 and questions asking respondents to rate the legibility of each pair of pages. For each pair, viewers assigned a numerical value between one and nine, indicating their perception of the difference in legibility between the two web pages. A value of five meant no difference within the pair. Values from one to four represented "very much distinct clear and legible", "distinct clear and legible", "more clear", and "a little clear", respectively, for the former compared to the latter. Values from six to nine represented the degree to which the latter was more legible than the former. The means and standard deviations of the viewers' judgments are listed in Table 8.
Table 8. Statistics of the pair comparisons

          Japan         Korea         Taiwan        USA
          Mean (S.D.)   Mean (S.D.)   Mean (S.D.)   Mean (S.D.)
Japan     --            6.19 (1.7)    5.68 (1.7)    3.36 (1.7)
Korea     2.81 (1.7)    --            3.02 (1.7)    1.97 (1.6)
Taiwan    3.32 (1.7)    5.98 (1.7)    --            2.71 (1.6)
USA       5.64 (1.7)    7.03 (1.6)    6.29 (1.6)    --
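One way to turn the pair-comparison means in Table 8 into an overall order is to count how many comparisons each site wins. The sketch below is only an illustration: the variable names are ours, and it assumes that a mean above the neutral value of five indicates that the row site was judged more legible than the column site.

```python
# Mean ratings from Table 8, keyed as (row site, column site).
means = {
    ("Japan", "Korea"): 6.19, ("Japan", "Taiwan"): 5.68, ("Japan", "USA"): 3.36,
    ("Korea", "Japan"): 2.81, ("Korea", "Taiwan"): 3.02, ("Korea", "USA"): 1.97,
    ("Taiwan", "Japan"): 3.32, ("Taiwan", "Korea"): 5.98, ("Taiwan", "USA"): 2.71,
    ("USA", "Japan"): 5.64, ("USA", "Korea"): 7.03, ("USA", "Taiwan"): 6.29,
}

wins = {"Japan": 0, "Korea": 0, "Taiwan": 0, "USA": 0}
for (row, col), mean in means.items():
    if mean > 5:       # assumed to favour the row site
        wins[row] += 1
    elif mean < 5:     # assumed to favour the column site
        wins[col] += 1

print(sorted(wins, key=wins.get, reverse=True))
# ['USA', 'Japan', 'Taiwan', 'Korea'] under this reading of the table
```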
The values of the comparisons in Table 8 indicate that Yahoo Korea is the worst, and that Yahoo Taiwan is worse than Yahoo USA and Yahoo Japan but better than Yahoo Korea. Yahoo USA and Yahoo Japan are better than the other two, with Yahoo USA rated slightly better than Yahoo Japan. The order of legibility from best to worst is therefore USA ≧ Japan > Taiwan > Korea. The results of the viewers' judgments on legibility closely resemble those obtained using the measures proposed by the above formulae, with the exception of Yahoo Japan. The mean legibility score of Yahoo Japan was 0.099 and that of Yahoo USA was 0.095. We can ignore the difference between the Yahoo USA and Yahoo Japan web pages because it is very small. Based on the similar results generated, we conclude that the proposed formulae are valid.
5 Conclusions
Web site designers are in need of practical guidelines on how to create and measure effective designs. This study proposed six measurements of screen legibility: ratio of navigator to content size, font size variety, variety of icon types, color contrast between background and foreground, content density, and number of alignment points. The first four are factors that help viewers to distinguish differences between the blocks on a web page, and the last two help viewers to connect groups of blocks to a function or task. All of them are useful for decreasing information complexity. Utilizing these six legibility factors to evaluate the display pages of four international yahoo.com sites, we calculated the legibility ranking, in order from most legible to least, to be: USA, Japan, Taiwan, Korea. We then conducted a survey with real human users to verify the theoretical results, finding our theoretical projections to be a valid match to recorded human opinion. Yahoo Japan was effective in providing variety in the position of the navigator, font size and icon type. Yahoo USA was effective in providing connections between groups using space and alignment. In the end, we can conclude that each of the six legibility factors has a real and significant impact on web page legibility. As legibility is more important for utilitarian and hybrid web sites, and aesthetics is more important for pleasure-oriented web sites, further studies can help advance understanding of the relationship and complementary nature of aesthetics and legibility.
References
1. Bodrogi, P.: Chromaticity Contrast in Visual Search on the Multi-colour User Interface. Displays 24, 39–48 (2003)
2. Ch'ng, E., Ngo, D.C.L.: Screen Design: A Dynamic Symmetry Grid Based Approach. Displays 24, 125–135 (2003)
3. Ngo, D.C.L., Teo, L.S., Byrne, J.G.: Formalising Guidelines for the Design of Screen Layouts. Displays 21, 3–15 (2000)
4. Dong, Y., Ling, C., Hua, L.: Effect of Glance Duration on Perceived Complexity and Segmentation of User Interfaces. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4552, pp. 605–614. Springer, Heidelberg (2007)
5. Fu, F.L., Su, C.H.: Measuring the Screen Complexity on Web Pages. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4558, pp. 720–729. Springer, Heidelberg (2007)
6. Kim, J., Lee, J., Han, K., Lee, M.: Businesses as Buildings: Metrics for the Architectural Quality of Internet Businesses. Information Systems Research 13(3), 239–254 (2002)
7. Ling, J., van Schaik, P.: The Effect of Text and Background Colour on Visual Search of Web Pages. Displays 23, 223–230 (2002)
8. Massey, A.P., Khatri, V., Montoya-Weiss, M.: Usability of Online Services: The Role of Technology Readiness and Context. Decision Sciences 38(2), 277–308 (2007)
9. Miyoshi, T., Murata, A.: A Method to Evaluate Properness of GUI Design Based on Complexity Indexes of Size, Local Density, Alignment, and Grouping. In: 2001 IEEE International Conference on Systems, Man and Cybernetics, vol. 1, pp. 221–226 (2001)
10. Moon, P., Spencer, D.E.: Aesthetic Measure Applied to Color Harmony. Journal of the Optical Society of America 34(4), 234–242 (1944)
11. Moon, P., Spencer, D.E.: Area in Color Harmony. Journal of the Optical Society of America 34(2), 93–103 (1944)
12. Näsänen, R., Ojanpäa, H.: Effect of Image Contrast and Sharpness on Visual Search for Computer Icons. Displays 24, 137–144 (2003)
13. Ngo, D.C.L., Teo, L.S., Byrne, J.G.: Modelling Interface Aesthetics. Information Sciences 152, 25–46 (2003)
14. Schenkman, B.N., Jönsson, F.U.: Aesthetics and Preferences of Web Pages. Behaviour & Information Technology 19(5), 367–377 (2000)
15. Shneiderman, B., Plaisant, C.: Designing the User Interface, 4th edn. Addison-Wesley, England (2004)
16. Palmer, J.: Web Site Usability, Design, and Performance Metrics. Information Systems Research 13(2), 151–167 (2002)
17. Parush, A., Nadir, R., Shtub, A.: Evaluating the Layout of Graphical User Interface Screens: Validation of a Numerical Computerized Model. International Journal of Human-Computer Interaction (2005)
18. van Schaik, P., Ling, J.: The Effects of Screen Ratio and Order on Information Retrieval in Web Pages. Displays 24, 187–195 (2003)
19. Valacich, J.S., Parboteeah, D.V., Wells, J.D.: The Online Consumer's Hierarchy of Needs. Communications of the ACM 50(9), 84–90 (2007)
20. van Schaik, P., Ling, J.: The Effects of Screen Ratio and Order on Information Retrieval in Web Pages. Displays 24, 187–195 (2003)
21. Van der Heijden, H.: User Acceptance of Hedonic Information Systems. MIS Quarterly 28(4), 695–704 (2004)
22. Xing, J.: Measures of Information Complexity and the Implications for Automation Design. National Technical Information Service, Springfield, Virginia (2004)
23. I.R.I 116 Palette, http://www.iricolor.com/04_colorinfo/sensetest.html
The Assessment of Credibility of e-Government: Users' Perspective Zhao Huang, Laurence Brooks, and Sherry Chen School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge, Middlesex UB8 3PH, UK {zhao.huang,Laurence.brooks,Sherry.chen}@brunel.ac.uk
Abstract. Electronic government is increasing worldwide; however, there are still some problems which influence users' interaction with it. One of these problems is trustworthiness, which appears to be affected by whether e-government websites demonstrate their credibility. This study uses an empirical approach to evaluate the credibility of e-government websites, especially at the local level in the UK. The evaluation consists of three steps: free interaction, task-based interaction and a questionnaire. The results indicate that the majority of credibility problems are related to "site ease of use", "site looks professional" and "site update". The value of the study is that it provides guidance for designers to improve the credibility of e-government websites. Keywords: e-government website, credibility, web-based online systems.
1 Introduction
With the rapid development of the Internet, users have increasingly been able to interact with Web-based online systems. Among the variety of Web-based online systems, electronic government (e-government) is becoming part of the revolution applied in the public sector. Nowadays, thousands of e-government websites are widely accessible via the Internet. Such rapid growth arises from the fact that e-government websites have the potential to change the working environment of traditional government and to enhance access to and delivery of government services [1]. However, e-government still faces a big challenge in interacting with users. Trustworthiness can be seen as the underlying catalyst for e-government adoption [2]. With higher trustworthiness, users can overcome perceptions of risk and uncertainty in the use and acceptance of online systems [3]. Trustworthiness can be affected by whether online systems demonstrate their credibility [4]. In general, credibility refers to reliability, accuracy, authority and quality [5]. Regarding e-government websites in particular, credibility can be enhanced by the site's look, information quality and readability [4]. This suggests that there is a need to consider credibility when developing e-government websites. By doing so, e-government websites can be accepted by a wider range of users. As such, credibility evaluation of e-government websites becomes crucial in order to develop user-centered e-government. However, existing research has not paid much attention to evaluating credibility. To this end, the paper aims to assess the
credibility of e-government websites, particularly focusing on the local level in the UK. To carry out the evaluation, an empirical study has been conducted based on Fogg’s credibility guidelines [6]. Accordingly, the paper is presented as follows: Section 2 presents the theoretical background from literature to demonstrate the importance of credibility to e-government website. In section 3, an empirical study is designed to evaluate credibility of e-government websites. This allows the detection of credibility problems which are discussed in section 4. Finally, conclusions are drawn and possibilities for future study are recommended in section 5.
2 Theoretical Background: e-Government Websites
Governments worldwide have caught on to the revolution in Internet and Web technologies and have made significant attempts to deliver their public-sector services to citizens, businesses and other government agencies via the Internet [9]. Generally, e-government is the use of the Internet as a tool to achieve better government, enabling richer information resources, higher quality services and greater participation [10]. All e-government services can be produced via information presentation, interaction, transaction and integration [11]. The benefits of e-government include increased transparency, improved service delivery, better civil service performance, more effective policy, strengthened citizen trust and large cost savings [10]. Equally, the number of e-government websites worldwide increased from 142 in 1995 to more than 50,000 in 2001 [8]. In the U.S., more than one million users visit federal websites every week [12]. A survey reports that 60% of respondents prefer to use e-government to deal with their demands [13]. This suggests that a large number of users have been using, and have been willing to engage with, e-government website services during the last decade. However, with the rapid growth of e-government websites, there is still a big challenge for e-government in interacting with users. Sillence et al. [14] found that trustworthiness is a key factor in user decisions about website engagement. In other words, the success of e-government websites will largely depend on reliability [15]. Further, a range of studies demonstrate and adopt credibility to explain the interaction between users and information systems. In these studies, the credibility of the website directly influences users' satisfaction [7]. The evidence indicates that credibility is an important factor which influences users' interaction with a system.
2.1 Credibility
Credibility is a highly complex concept. A simple definition is "believability" [6]. Many studies attempt to identify multiple criteria for credibility assessment. Berlo et al. [16] extend believability by adding "safety", "qualification", "dynamism" and "sociability". Robins and Holmes [17] argue that "sociability" is a strong factor relating to credibility. Within web-based systems, Rieh [5] indicates that, except for "trustworthiness" and "dynamism", "authority" can be used instead of "expertise". However, Toms and Taves [18] regard "reputation" as a comprehensive factor which includes "trustworthiness". Fogg [19] defines credibility as consisting of trustworthiness and expertise. Trustworthiness is the users' perception in terms of the goodness and the unbiased nature of
the system [19]. Expertise is seen as the users' perception regarding the knowledge and skill behind the system's resources [6]. Based on this concept, the major problems of credibility lie within "Aesthetic design", "Information structure", "Information focus", "Company motive", "Usefulness of information" and "Accuracy of information" [19]. It seems that credibility is a key factor in determining the success of e-government [4]. Users will not fully accept, be satisfied with and interact with e-government until credibility issues have been addressed in sufficient detail in e-government design. Thus, there is a need to evaluate credibility in current e-government in order to develop more user-centered e-government.
3 Methodology Design
3.1 Conceptual Framework
Having demonstrated the importance of credibility to e-government websites, this paper reports an empirical study to evaluate the credibility of e-government websites. Firstly, a set of tasks is developed for participants to perform (Section 3.2). Subsequently, a guideline-based questionnaire is designed to capture participants' perceptions of credibility (Section 3.3). Then, three local e-government websites in the UK are selected to be evaluated (Section 3.4). The evaluation procedure comprises three steps: free interaction, task-based interaction and a questionnaire (Section 3.5). Once the evaluation is completed, a score is assigned to each questionnaire question to indicate the seriousness of credibility problems. Finally, the data collected are presented and discussed (Section 4), and these findings can be used to inform credibility design in the further development of e-government websites (Section 5).
3.2 Task Design
The study aims to detect the credibility problems of e-government websites. In the evaluation, the participants are required to perform a set of practical tasks on e-government websites. These tasks are representative activities that users would be expected to perform. Generally, three characteristics are used to categorize the services on e-government websites: information dissemination, product offers and user participation [8]. Information dissemination is the presentation of information. Product offers refer to one-way delivery services, such as form downloads, registration and job searching. User participation refers to interactive services which involve users in two-way communication, for example participating in government decision making.
3.3 The Guidelines-Based Questionnaire Design
To measure credibility, a questionnaire based on Fogg's credibility guidelines [6] is used to capture participants' perceptions. The questionnaire design consists of three phases. Firstly, there is a need to extend the existing guidelines to meet the specific requirements of e-government websites. Secondly, since Fogg's guidelines are broad principles which cannot prescribe a step-by-step approach to cover specific elements, it is important to develop associated criteria for each guideline. Finally, the
questionnaire consists of a series of questions which are developed based on the associated criteria from the extended Fogg's guidelines.
Extension of Guidelines. Fogg's set of guidelines (see Table 1) is used as a benchmark for assessing credibility [7]. However, as Fogg's ten guidelines were developed 10 years ago and for general websites, it is important to extend them in order to fit the specific needs of e-government websites. As e-government serves the public, users' participation is given more attention [11]. Barnes [20] demonstrates that service interaction quality is the main factor in users' satisfaction. Furthermore, Garcia et al. [15] use privacy and transparency as criteria to evaluate e-government. Therefore, the existing guidelines are extended by adding three new guidelines: Transparency, Service agility and Privacy (see Table 2).

Table 1. Fogg's credibility guidelines [6]

Credibility Guidelines                 Explanations
G1. Site looks professional            The site considers layout, typography, consistency
G2. Information accuracy               The site shows the validation of the materials
G3. A real organization behind site    The site needs to prove it is a legitimate organization
G4. Highlight the expertise in site    The site indicates an expert team and authority services
G5. Show the trustworthy people        The site shows people who convey trust through the site
G6. Make it easy to contact you        The site provides clear contact information at any time
G7. Site ease to use and useful        Users can easily complete their tasks using the site
G8. Update site's content often        The site is up to date and reviews its content regularly
G9. Restraint promotional content      The site should clearly distinguish sponsored content
G10. Avoid errors of all types         The site prevents a problem occurring in the first place

Table 2. Extended guidelines

Extended Guidelines    Explanations
G11. Transparency      The site should keep users clearly informed about governmental operations
G12. Service agility   The site should provide flexible services to fit different user paths
G13. Privacy           The site should protect users' information and secure its services
Development Criteria. Although Fogg's guidelines have been extended, it may still be difficult to assess detailed aspects of credibility in a questionnaire. Therefore, associated criteria for each guideline need to be developed. Precise criteria for each guideline provide a step-by-step process for focusing closely on specific aspects, so that the specific questions in the questionnaire can be designed exactly on the basis of those criteria.
Credibility Questionnaire. A questionnaire ensures that all participants are asked the same questions and provides quick responses. Therefore, a questionnaire is created to assess credibility. The participants are asked to complete a set of questions using five-point Likert scales, indicating their level of agreement with the statements. The main advantage of five-point scales is that an odd number of response options with a neutral level in the middle does not force participants to choose a positive or negative option when they really do not have one. The other advantages include gathering
respondents’ opinion by summing up the participants’ responses to the same latent variable and the options are approximately equal spaces across the continuum of approval [21]. 3.4 e-Government Website Selection The evaluation is conducted using three local e-government websites in the U.K. so that evaluation results can be comparatively analyzed. The reason behind choosing the local level to evaluate is that the local e-government website is the closest level for citizens, frequently used by the general public and an important role for citizens’ participation [22]. In addition, previous studies have found the various problems in local e-government. For example, Yang and Paul [23] detect that the majority of problems in local e-government lie within security and content update issues. The three local e-government websites will be called: London Authority1, London Authority2 and London Authority3. 3.5 Experimental Evaluation Procedure To conduct the experiment, 30 participants are assigned to evaluate three e-government websites (10 participants for each site). Each evaluation follows the same three steps: free interaction; task-based interaction and questionnaire. Firstly, the participants freely look at the e-government website so that a general perception and initial interaction are developed. Subsequently, task-based interaction requires participants to complete a number of practical tasks. User scenario is used as the technique to translate and present the selected tasks to the participants. Having completed all tasks, the participants communicate their perceptions of credibility in the questionnaire.
4 Discussion of the Results
To obtain a comprehensive evaluation, both quantitative and qualitative approaches are used to analyze the results. The former uses the numeric results from the closed questions of the questionnaire to identify the overall credibility of each e-government website (Section 4.1), while the latter presents the useful features and problems from the open-ended questions to identify the strengths and weaknesses of credibility (Section 4.2).
4.1 Quantitative Measurement
In order to evaluate the results comparatively, a score has been assigned to each question in the questionnaire. This has been calculated by multiplying the number of answers at each weight of the five-point scale by that weight; the sum of these products then gives the score for each question. Lower scores indicate the most serious credibility problems (see Table 3). By assigning the scores in this way, it is easy to identify the most serious problems in each e-government website. More importantly, it comparatively highlights the specific categories of credibility that cause the most common credibility problems among the e-government websites evaluated.
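As an illustration of this scoring procedure, the sketch below weights the number of answers at each point of the five-point scale and sums the products to obtain a question's score; the response counts used here are hypothetical. Summing the question scores over the whole questionnaire gives the per-site totals reported later in this section.

```python
def question_score(response_counts, weights=(1, 2, 3, 4, 5)):
    # Multiply the number of answers at each scale point by that point's weight and sum.
    return sum(count * weight for count, weight in zip(response_counts, weights))

# Hypothetical example for one question answered by 10 participants:
# 2 strongly disagree, 3 disagree, 3 neutral, 1 agree, 1 strongly agree.
print(question_score((2, 3, 3, 1, 1)))   # 26 -> a comparatively low, i.e. serious, score
```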
Overall, the three e-government websites appear to be clear and fairly straightforward. Different parts of the information are properly displayed in various formats, so that users can easily read the information presented, which can also be quickly accessed. The main menu is suitably located on the left side of the page and quick links are always available at the top of each page. This helps users to start their tasks easily and to access shortcuts quickly whenever they need them. In addition, a clear logo and staff photos add to the credibility of the sites. Meanwhile, other reliable government websites are presented and linked with the target e-government website, which helps to improve the reputation of the site. Furthermore, the navigation descriptions are useful in helping users identify where they are in the site, and advertisements are restrained to help users' concentration. However, some serious credibility problems have been found in each e-government website evaluated (Table 3). In London Authority 1, firstly, information is not consistently presented in different colors, so users feel a little confused when they try to identify information throughout the site. Secondly, it does not offer concise instructions or messages to support users in selecting sub-options correctly. Subsequently, although advertisements are restrained, there are still some difficulties in distinguishing ads from content. Finally, the structure of the site is quite unclear; the massive number of options makes the site look less credible. The most serious problems in London Authority 2 are (see Table 3): firstly, information may not be presented in consistent colors, so users may spend extra time identifying information. Secondly, any awards earned by the website are not displayed properly. Thirdly, the site may need to improve its usability. Fourthly, information about the latest updates is not clearly indicated, especially for online forms and documents. Lastly, the site lacks a secure message to keep users informed during the data transaction process. The problems in London Authority 3 indicate a similar difficulty in identifying information by color. Moreover, some information lacks a good balance between breadth and depth. During online transactions, the site does not clearly indicate progress, which may cause users to lose patience easily. Lastly, privacy needs to be considered, such as personal information protection and the presentation of secure messages. The quantitative results indicate that the most common credibility problems of the three e-government websites are related to "site ease of use", "site looks professional" and "site update". For example, in "site ease of use", users have difficulty with the links because links already visited are not clearly marked. This suggests that more attention needs to be paid to information access and navigation [8]. There is also a lack of balance in information presentation between breadth and depth. This suggests that designers need to consider different users' tendencies in searching for information. More specifically, if users want an overview of information, more categories with fewer levels can be used. Conversely, if users search for detailed information, fewer categories with more levels are needed. In addition, users expect a clear indication of progress in a process. This suggests that user control over the process is paramount. With regard to "site looks professional", users find it hard to identify relevant information.
Table 3. Credibility problems

London Authority 1
Guidelines  Questions  Problems                                                       Top five lowest scores
G1          2          Information is not identified in the different colors          31
G1          4          Page is not labeled to show its relation to others             36
G2          6          The information is not well organized                          35
G4          14         Instructions displayed by the system are not concise           33
G6          20         The site does not show detailed contact information            35
G7          21         The system is not easy to use                                  36
G7          22         Navigating the system is not easy                              36
G7          23         It is not clear how much of the quote process is left          35
G7          24         The site is not organized in a way that makes sense to users   35
G8          25         The system does not show a latest update                       35
G9          27         It is not easy to distinguish ads from content                 34
G10         30         The site is not free from typographical errors                 36
G12         37         The site does not offer agile functions                        36

London Authority 2
Guidelines  Questions  Problems                                                       Top five lowest scores
G1          2          Information is not identified in the different colors          26
G5          15         It is not easy to find an "about us" page                      29
G5          16         The site does not display any awards it has won                25
G7          22         Navigating the system is not easy                              26
G7          23         It is not clear how much of the quote process is left          26
G7          24         The site is not organized in a way that makes sense to users   27
G8          25         The system does not show a latest update                       25
G13         39         A secure message does not appear                               28

London Authority 3
Guidelines  Questions  Problems                                                       Top five lowest scores
G1          2          Information is not identified in the different colors          31
G3          10         The site does not display photos of offices or staff           31
G5          16         The site does not display any awards it has won                26
G5          17         Not enough information about who is in charge of the site      28
G7          23         It is not clear how much of the quote process is left          31
G8          25         The system does not show a latest update                       26
G13         38         Confidential areas are not secure                              25
G13         39         A secure message does not appear                               29
The reasons may be, firstly, that layout, instructions and colors are not used consistently throughout the site and, secondly, that some information cannot be exactly matched with its categories, especially where the pictures are too vague to make sense. This suggests that a higher aesthetic treatment is needed to increase credibility [17]. Regarding "site update", users find it difficult to identify updated versions of forms, documents and the system itself. In addition, users worry about the privacy and security of their personal information. A possible solution is the widespread adoption of digital certificates and a public key infrastructure [20]. On the other hand, by summing the scores together for each e-government website, an overall assessment can be made of the credibility problems of each e-government website. In detail, London Authority 2 has the lowest score, with a total of 1305. London Authority 3 is next, with a total score of 1386. The e-government website with the fewest credibility problems is found to be London Authority 1, which has a total score of 1470.
4.2 Qualitative Assessment
During the evaluation, comments about the problems and the good features which are not covered by the questions were recorded. The most frequently cited useful features are regarded as strengths and the most frequently encountered problems are considered weaknesses (Table 4).

Table 4. Qualitative results for London Authority 1, 2 and 3
London Authority 1
      Strengths             Weaknesses
G1    Consistent colors     Too many options
G2    Content is reliable   Irrelevant pictures
G3    Democracy offered     Irrelevant info.
G4    Full info.            Too much info.
G5    Staff photos          Confused options
G6    General contact       No detailed contact
G7    Easily access         Too many options
G8    None                  No clear update
G9    Few ads               None
G10   Clear categories      Message is not concise
G11   Terms, conditions     None
G12   User's path           Search is limited
G13   A sign-in offered     None

London Authority 2
      Strengths             Weaknesses
G1    Logo, format          Flashed images
G2    None                  Unclear pictures
G3    Relevant contents     None
G4    Lots of info.         No user ID required
G5    Staff photos          No award presented
G6    Relevant contact      None
G7    Easily access         Poor category
G8    None                  No updated date
G9    No ads                None
G10   FAQs is provided      Site map is not helpful
G11   None                  No terms, conditions
G12   None                  No secure messages
G13   None                  No password required
London Authority 3
      Strengths             Weaknesses
G1    A clear logo          Color is not constant
G2    Detailed info.        Unclear subhead
G3    Right URL             None
G4    None                  No services feedback
G5    None                  No sources of news
G6    Quick contact         None
G7    Links work properly   Search engine is limited
G8    None                  No site update
G9    No ads                None
G10   None                  Unclear categories
G11   None                  No transparency
G12   None                  No languages support
G13   None                  None
34
Z. Huang, L. Brooks, and S. Chen
• Restraint with promotional content: in e-government websites, most advertisements are restrained to help users’ concentrations. Common Weaknesses • Highlight expertise in the organization: some information can not be matched to categories, especially for pictures. Users easily lose patience if they choose irrelevant categories. Moreover, the instruments for options are not concisely explained. • Easy to use: users often get confused where links already used are not clearly marked and the key links are not located in an important place. Additionally, too much information is presented at the home page so that users have to spend more time reading it carefully. In particular, users feel it is difficult to find information because the site search engine can not support advanced search. • Site update: not all information is updated regularly with a clear date, especially for forms and documents. Furthermore, the updated date and version of the website is not indicated at the main page.
5 Conclusion This paper reports an empirical study which assesses credibility in existing egovernment websites, focusing on three local e-government websites in the UK. The evaluation results indicate that the current e-government websites have a much room for improvement of their credibility. The most significant problems are found within the areas of “site easy to use”, “site looks professional” and “site update”. These evaluation results suggest that designers need to pay more attention to credibility in egovernment website design so that e-government websites may be more widely accepted and accessed, ultimately to achieve user centered e-government. Therefore, this study develops an approach to identify credibility problems of e-government websites. The value of this study not only lies within the guidance for designers to improve the credibility of e-governments, but also help designers of other web-based systems to enhance their credibility. However, this study uses a questionnaire-based approach to evaluate credibility, which emphasizes on user perception. In order to obtain a more comprehensive evaluation, both user perception and user performance are recommended. Future studies are needed to assess users performance with the tasks In addition, the results of this study indicate that usability difficulties have an important impact on credibility. Designers who enhance the usability of a web site are likely to enhance the site’s credibility [19]. This suggests that there is a need to conduct future research to evaluate usability of e-government website. The findings of such a study can be used to analyze the relationship between usability and credibility.
References
1. Basu, S.: E-government and developing countries: an overview. International Review of Law Computers & Technology 18(1), 109–132 (2004)
2. Warkentin, M., Gefen, D., Pavlou, P.A., Rose, G.M.: Encouraging citizen adoption of e-government by building trust. Electronic Markets 12(3), 157–162 (2002)
3. Pavlou, P.A., Gefen, D.: Building effective online marketplaces with institution-based trust. Information Systems Research 15(1), 37–59 (2004) 4. Araujo, M.C.R., Grande, J.I.C.: Performance in e-government: website orientation to the citizens in Spanish Municipalities. In: Proceedings of european conference on egovernment, Trinity College, Dublin (2003) 5. Rieh, S.Y.: Judgment of information quality and cognitive authority in the web. Journal of the American Society for Information Science and Technology 55(8), 743–753 (2002) 6. Fogg, B.J., Tseng, H.: The elements of computer credibility. In: Proceedings of the CHI 1999 conference on human factors and computer system, pp. 80–87 (1999) 7. Grady, L.O.: Future directions for depicting credibility in health care web sites. International Journal of Medical Informatics 75, 58–65 (2006) 8. Kumar, V., Mukerji, B., Butt, I., Persaud, A.: Factors for successful e-government adoption: a conceptual framework. Electronic Journal of E-Government 5(1), 63–76 (2007) 9. Tambouris, S.: European cities platform for online transaction services. In: Proceedings of the European Conference on E-Government (2001) 10. OECD: The E-government imperative. In: OECD E-Government Studies. OECD, Paris (2003) 11. Tapscott, D.: Blueprint to the Digital Economy. McGraw-Hill, New York (1998) 12. Eschenfelder, K.R., Beachboard, J.C., McClure, C.R., Wyman, S.K.: Assessing U.S. federal government websites. Government Information Quarterly 14(2), 173–189 (1997) 13. James, G.: Empowering bureaucrats. MC Technology Marketing Intelligence 20(12), 62– 68 (2000) 14. Sillence, E., Briggs, P., Harris, P., Fishwick, L.: A framework for understanding trust factors in web-based health advice. International Journal Human-Computer Studies 64, 697–713 (2006) 15. Garcia, A.C.B., Maciel, C., Pinto, F.B.: Electronic government: a quality inspection method to evaluate e-government sites. Springer, Heidelberg (2005) 16. Berlo, D.K., Lemert, J.B., Mertz, R.J.: Dimensions for evaluating the acceptability of message sources. The Public Opinion Quarterly 33(4), 563–576 (1969) 17. Robins, D., Holmes, J.: Aesthetics and credibility in web site design. Information Processing & Management 44(1), 386–399 (2007) 18. Toms, E.G., Taves, A.R.: Measuring user perceptions of web site reputation. Information Processing and Management 40, 291–317 (2004) 19. Fogg, B.J.: Credibility and the world wide web. Persuasive Technology, 147–181 (2003) 20. Barnes, S.J., Vidgen, R.: Interactive e-government services: modeling user perceptions with eQual. International Journal of Electronic Government 1(2), 213–228 (2004) 21. Gill, J., Johnson, P.: Research methods for managers. Paul Chapman Publishing Ltd., Boca Raton (1991) 22. Michael, C., John, F.: Capacity building: facilitating citizen participation in local governmence. Australian Journal of Public Administration 64(4), 64–80 (2005) 23. Yang, J.Q., Paul, S.: E-government application at local level: issues and challenges: an empirical study. International Journal of Electronic Government 2(1), 56–76 (2005)
Auto-complete for Improving Reliability on Semantic Web Service Framework
Hanmin Jung1, Mi-Kyoung Lee1, Won-Kyung Sung2, and Beom-Jong You1
1 Information Technology Research Lab., KISTI, Korea
2 Dept. of Policy Research, KISTI, Korea
{jhm,jerryis,wksung,ybj}@kisti.re.kr
Abstract. This paper presents two methods for enhancing auto-complete, which provides search keywords that the user wants. The first is to display only search keywords that can guarantee a successful search result in real time, regardless of document insertion, deletion, and update. The second is to display search keywords with their entity types, such as person, institution, and topic. To accomplish this, we introduce an auto-complete table that stores the entities extracted and indexed from input documents together with their document frequency (DF). An auto-complete manager checks whether each entity in the table can guarantee a successful search result by considering its DF, and provides proper entities with their types to the user. To verify the effect of the auto-complete, we are designing a comparative experiment. OntoFrame 2007 without the functions will be compared with OntoFrame 2008 with the functions to discover the effect of our auto-complete on the reliability of Semantic Web services.
Keywords: Auto-complete, Semantic Web, Semantic Web Framework, OntoFrame, Reliability, Document Indexing.
1 Introduction
Auto-complete is a feature for predicting a word or phrase that the user wants to type in, without the user actually typing it completely. Auto-complete on the Web is usually implemented with Ajax (Asynchronous JavaScript and XML), one of the key technologies associated with Web 2.0 (see fig. 1). It is widely applied across Web sites, including digital libraries, commercial portals, and enterprise applications [1], and is expected to be applied even more widely as a way of enhancing the user experience. However, most auto-completes simply display search keywords retrieved from the user's query logs and system dictionaries without considering the quality of the search results. The reliability of such auto-completes drops when the user selects a search keyword that cannot generate a successful search result. This is not an unusual case for small and medium enterprise portals, because they suffer from a relative shortage of content compared with commercial portals such as Google, Yahoo, and Amazon. Even worse, different types of search keywords are mixed in their auto-complete lists. This compels the users to look through the whole list to find a
Fig. 1. Classic Web application model and Ajax Web application model1
search keyword they want. These problems can be solved by introducing two methods that display only search keywords that can guarantee a successful search result and that provide their entity types, such as person, institution, and topic. This research deals with how the two methods can be achieved on a Semantic Web service framework.
2 Related Studies
Han and Lee examine the costs and benefits of Internet search from the customers' perspective [2]. They reveal that pains during search are closely related to user satisfaction, and thus search should be designed to minimize the time and effort required. Auto-complete is an important function in that it can reduce the pains of the user by providing a way to select a search keyword without typing it completely. Another study on the usability of auto-complete concludes that user satisfaction and search efficiency (the time needed to complete a given task) are affected by the function [3]. It also reports that many positive comments were received from the users. Even for mobile devices, auto-complete enables the users to finish their tasks with fewer errors [4]. The study has clearly proven its advantages in terms of satisfaction, efficiency, and stability. However, studies on enhancing auto-complete are comparatively scarce given its importance. A patent created by Lee and Wales proposes auto-complete using multiple dictionaries by way of lookup and merge [5]. The auto-complete introduced
1 http://www.adaptivepath.com/images/publications/essays/ajax-fig1_small.png
by Miki et al. can be classified as an advanced function in that it deals with ontology and data conversion [6]. An application form is generated by referring to the ontology data that knowledge managers have constructed manually. When the users input data in the form, the auto-complete function recognizes language types and field types in order to convert the data into appropriate values. However, that study is not closely related to ours, because we concentrate only on the construction of the search keywords in the auto-complete list. A study on auto-complete for predicting Chinese characters from partial input uses a prefix tree decoder [7]. It also adopts speech recognition for supporting bi-modal Chinese character input. Liu et al. found that the bi-modality improves the input speed, but their research scope does not match ours exactly. Bangalore et al., as a significant study on auto-complete for improving reliability, introduce a method for providing search keywords that are able to generate a successful search result in the UMLSKS interface [8]. It uses a flag to mark the success or failure of search results. In case of success, the corresponding search keyword is marked with the flag. This method ensures a successful search result by displaying only the marked search keywords when the auto-complete list is provided. However, the auto-complete list would occasionally be out of date, because the marking occurs only when the user enters a search keyword. Even after adding an input document which includes a search keyword that has previously caused a failure, the keyword will not be displayed in the auto-complete list until the user manually types the search keyword without referring to the list.
Fig. 2. Example of a search result with failure caused by selecting an improper search keyword in auto-complete list (‘Wizwid’ online shopping mall)
Many popular Web sites, including even enterprise portals, provide auto-complete to help the users find search keywords with ease. They usually use the popularity of the users' input, regardless of the success or failure of the search results, as shown in fig. 2. Because they do not consider incremental data add-up, which causes a mismatch problem between auto-complete and search results, search using auto-complete occasionally fails. As a newer technology, Google offers keyword suggestions in real time in the form of auto-complete. However, a mismatch in the number of documents occurs, as shown in fig. 3. It may be caused by incomplete incremental indexing
Fig. 3. Example of mismatch between auto-complete and search results in the number of documents (‘Google’ suggest; 7,710,000 results in auto-complete and 7,300,000 in search results)
management related to auto-complete. The following section explains how we resolve these problems with an auto-complete manager and an auto-complete table.
3 Auto-Complete in OntoFrame
OntoFrame, a Semantic Web service framework [9] [10], aims at the search and discovery of science and technology information using Semantic Web technologies. It includes a search engine and a reasoning engine. The former searches full-text documents and the latter discovers implicit knowledge by exploiting relations between instances, i.e. entities. OntoFrame gathers legacy data and transforms them into RDF (Resource Description Framework) triples by referring to a predefined ontology schema designed for a specific application domain. The reasoning engine expands the triples at idle time, i.e. forward chaining, using user-defined rules, and puts the results back into an RDF triple store. OntoFrame-based services communicate with the two engines through Web Services, SPARQL (Simple Protocol and RDF Query Language) queries, and XML documents. OntoFrame provides entity-centric unified search2. Predefined entities are managed in a URI server, which is a semantic data management tool working with the RDF triple store. The server is referred to by a document indexer for acquiring entities and their types from an input document (see fig. 4). Auto-complete is applied to OntoFrame to help the user search with convenience and efficiency. To sustain reliability, auto-complete should provide search keywords that guarantee successful search results. However, the previous version of OntoFrame (OntoFrame 2007) provided a simple auto-complete function which just shows a search keyword list matched with the user's input string. The keywords are topic keywords pre-extracted and refined from a test collection. Even if a keyword never appears in the indexed documents, it can be displayed in the auto-complete list,
Entity-centric unified search can be defined as a unified search generating a Web page which consists of service components selected dynamically according to the user’s input corresponding with an entity reserved in the system. Different kinds of search result pages would be generated in case of different entity types.
Fig. 4. Auto-complete process (document indexing, the OntoFrame service, the URI server, the auto-complete manager, and the auto-complete table)
Fig. 5. Example of changes in the auto-complete table according to document’s indexing (P: person, I: institution, T: topic)
which will cause a failed search result. Even worse, the use of all of the topic keywords increased the load of the auto-complete. We also found that the users' perceived reliability of the service is degraded when they select improper keywords in the list, i.e. keywords that cannot guarantee a successful search result. Thus, the current version of OntoFrame (OntoFrame 2008) introduces an auto-complete manager that guarantees successful search results when a search keyword is selected from the auto-complete list, and further increases usability by additionally displaying entity types. After extracting entities from an input document, the auto-complete manager updates the document frequency (DF) of the entities in an auto-complete table, i.e. the DF of these entities increases by one, as shown in fig. 5.
When the user enters a search keyword, the auto-complete manager finds the entities matched with the keyword by looking up the auto-complete table. Then, it checks whether the DF of each retrieved entity is zero or not. Entities with a DF greater than zero are displayed in the auto-complete list. Since the entities in the list are extracted from indexed documents, search results generated from them will always be successful. This method is different from that of Bangalore et al. because the DF of entities is updated instantly according to document indexing, and only proper entities are always displayed in the auto-complete list [8]. The following pseudocode explains how the auto-complete manager uses the auto-complete table to generate an auto-complete list corresponding to the user's input.

manage_auto_complete(String) {
  // Returns the auto-complete list for the user's input string
  Entity[] = get_matched_entity_list(String);
  Valid_entity[] = check_doc_frequency(Entity[]);
  Sorted_entity[] = sort_entity_by_type(Valid_entity[]);
  return Sorted_entity[];
}

sort_entity_by_type(Entity[]) {
  // Groups the valid entities by type and merges them into one sorted list
  find_entity_type(Entity[], Person_entity[], Institution_entity[], Topic_entity[]);
  sort_entity(Person_entity[]);
  sort_entity(Institution_entity[]);
  sort_entity(Topic_entity[]);
  Sorted_entity[] = merge_entity(Person_entity[], Institution_entity[], Topic_entity[]);
  return Sorted_entity[];
}

update_auto_complete_table(String[], Operation) {
  // Adjusts DF for every entity in an indexed or deleted document
  while (Entity = get_next_string(String[])) {
    switch (Operation) {
      case Insert: increase_doc_frequency(Entity); break;
      case Delete: decrease_doc_frequency(Entity); break;
    }
  }
}

The above functions look up and manage the auto-complete table. Manage_auto_complete() gets the user's input string from the OntoFrame service and finds the entities whose names match the input string. Check_doc_frequency() checks whether each entity has a DF of more than zero; entities with a document frequency of zero are excluded. By referring to the auto-complete table, sort_entity_by_type() builds a merged entity list sorted by entity type, e.g. topic and person. Then manage_auto_complete() returns the list to the service. The DF of each entity is managed by update_auto_complete_table(), which increases or decreases the DF according to the operation type, i.e. insert or delete. In the case of a document update, the insert process follows the delete process. Finally, the auto-complete manager returns a proper entity list, as shown in the following example and fig. 6.
Fig. 6. Example of auto-complete in OntoFrame (It shows topic and person entities matched with “Seman.”)
API: EntityList manage_auto_complete(String keyword)
Calling example: manage_auto_complete("sem")
Result example (see fig. 6):
[semantic activation, Topic]
[semantic association, Topic]
[semantic blocking, Topic]
…
[Semantha,Ellis, Person]
The current number of entities stored in the service is 923,449 (652,507 for topic and 270,942 for person). We found that 362,319 topic entities (55.53% when compared
Fig. 7. Example of search result generated from topic entity “neural network”
Fig. 8. Example of search result generated from person entity “Jinde Cao”
with total topic entities) acquired from indexing of about 200,000 journal papers on information technology and bioinformatics lead to successful search results. Fig. 7 and fig. 8 show examples of successful search results for topic and person entities.
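The behavior of the auto-complete manager and auto-complete table described above can be summarized in a small, runnable sketch. The TypeScript below is an illustration only, not the OntoFrame implementation: the class and method names, the in-memory Map, and the prefix-matching rule are assumptions, while the DF bookkeeping and the DF > 0 filter follow the pseudocode given earlier.

type EntityType = "Person" | "Institution" | "Topic";

interface Entry { type: EntityType; df: number; }  // df = document frequency

class AutoCompleteManager {
  private table = new Map<string, Entry>();  // the auto-complete table

  // Called whenever a document is indexed ("insert") or removed ("delete");
  // a document update is modeled as a delete followed by an insert.
  update(entities: { name: string; type: EntityType }[], op: "insert" | "delete"): void {
    for (const e of entities) {
      const entry = this.table.get(e.name) ?? { type: e.type, df: 0 };
      entry.df += op === "insert" ? 1 : -1;
      this.table.set(e.name, entry);
    }
  }

  // Only entities with df > 0 are suggested, so every suggestion is backed
  // by at least one indexed document; results are grouped by entity type.
  suggest(prefix: string): { name: string; type: EntityType }[] {
    const p = prefix.toLowerCase();
    return [...this.table.entries()]
      .filter(([name, entry]) => entry.df > 0 && name.toLowerCase().startsWith(p))
      .map(([name, entry]) => ({ name, type: entry.type }))
      .sort((a, b) => a.type === b.type
        ? a.name.localeCompare(b.name)
        : a.type.localeCompare(b.type));
  }
}

// Usage: index one document, then ask for suggestions for "sem".
const manager = new AutoCompleteManager();
manager.update([{ name: "semantic association", type: "Topic" }], "insert");
console.log(manager.suggest("sem"));  // [{ name: "semantic association", type: "Topic" }]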
4 Conclusion
We introduced an enhanced auto-complete with two functions: displaying only search keywords that can guarantee a successful search result in real time, regardless of document insertion, deletion, and update, and displaying search keywords with their types, such as person, institution, and topic. To accomplish this, an auto-complete manager and an auto-complete table are used. The manager gets the user's input string and returns a proper entity list by looking up the table. To verify the effect of the auto-complete, we are designing a comparative experiment. The task to be achieved is "Find a representative researcher and an institution for the top 5 topics in the auto-complete list corresponding with 'neural'." OntoFrame 2007 3 without the functions will be compared with OntoFrame 2008 4 with the functions. We expect to find the effect of our auto-complete on the reliability of Semantic Web services.
3 http://isrl.kisti.re.kr/wsearch/search/main.jsp
4 http://150.183.113.186:8080/OntoFrame_ISRL/2008_new/main.jsp

References
1. Beauheim, C., Wymore, F., Nitzberg, M., Zachariah, Z., Jin, H., Skene, J., Ball, C., Sherlock, G.: OntologyWidget – a Reusable, Embeddable Widget for Easily Locating Ontology Terms. J. BMC Bioinformatics 8, 338 (2007)
2. Han, D., Lee, E.: Exploring the Costs and Benefits of Internet Search from the Online Customers’ Perspective: Implications for the Consumer Adoption of the Semantic WebBased Search Engines. J. Business Education Research 11(1) (2007) (in Korean with English Abstract) 3. Kluge, J., Kargl, F., Weber, M.: The Effects of the Ajax Technology on Web Application Usability. In: International Conference on Web Information Systems and Technologies (WEBIST 2007) (2007) 4. Udyaver, S.: Experimental Comparison of Usability of Hybrid Mobile Devices. In: The 20th Computer Science Seminar (2004) 5. Lee, K., Wales, K.: Methods and Systems for Implementing Auto-complete in a Web Page. US 2004/0039988 A1 (US. Patent) (2004) 6. Miki, T., Ogawa, H., Matsuda, N., Miura, H., Taki, H., Hori, S., Abe, N.: Auto Complete Method for Web Application from Based on Term Hierarchy. In: The 20th Annual Conference of the Japanese Society for Artificial Intelligence (2006) (in Japanese with English Abstract) 7. Liu, P., Ma, L., Soong, F.: Prefix Tree Based Auto-Completion for Convenient Bi-modal Chinese Character Input. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008) (2008) 8. Bangalore, A., Browne, A., Divita, G.: UMLSKS SUGGEST: An Auto-complete Feature for the UMLSKS Interface Using AJAX. In: AMIA 2006 Annual Symposium (2006) 9. Jung, H., Lee, M., Kang, I., Lee, S., Sung, W.: Finding Topic-Centric Identified Experts Based on Full Text Analysis. In: The 2nd International Expert Finder Workshop at ISWC 2007 + ASWC 2007 (2007) 10. Sung, W., Jung, H., Kim, P., Kang, I., Lee, S., Lee, M., Park, D., Hahn, S.: A Semantic Portal for Researchers Using OntoFrame. In: The 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference (ISWC 2007 + ASWC 2007) (2007)
Effects of AJAX Technology on the Usability of Blogs
Sumonta Kasemvilas and Daniel Firpo
School of Information Systems and Technology, Claremont Graduate University, 130 E. 9th Street, Claremont, CA 91711, USA
{Sumonta.Kasemvilas,Daniel.Firpo}@cgu.edu
Abstract. AJAX can enhance Web applications by updating a part of the Web page instead of the whole page. This change of technology relates to a usability issue. We used WordPress 2.3 to create two versions of blogs: non-AJAX and AJAX. Then we conducted an experiment by giving a task scenario to eight participants. We collected performance data by recording users’ mouse movements during the experiment and collected preference data by providing a questionnaire after the tasks. Finally, we conducted post-experiment interviews to gather participants’ experiences. The quantitative results show that AJAX did not improve users’ performances the first time they used it, while qualitative interviews demonstrate participants’ satisfaction with AJAX blogs. Keywords: AJAX, Blog, Ease of Learning, Efficiency of Use, Error Frequency and Severity, Experiment, Satisfaction, Usability, WordPress.
1 Introduction
AJAX (Asynchronous JavaScript and XML) is a set of technologies such as eXtensible Markup Language (XML), XMLHttpRequest, Cascading Style Sheets (CSS), and the Document Object Model (DOM), combined with JavaScript. It is not a single technology, but several technologies that, when used together, enhance the capability of Web applications in innovative ways [1]. Many corporations use AJAX in applications, such as Google Maps, Gmail, iGoogle, Hotmail, and Yahoo Flickr. An advantage of AJAX is that Web browsers do not need to refresh the whole page. They only need to reload a portion of the page, which makes the Web site load faster and increases performance, because users can receive responses from the server faster than in classical Web applications. However, this also leads to a problem with AJAX. Users who are familiar with old-style Web applications may not be able to effectively use Web applications that contain AJAX technology, due to a lack of understanding of how AJAX works. The users may not notice what has changed on the screen, or they might be waiting for the results to display, or for a response from the browser. Users might need to get accustomed to new conventions. This change of technology relates to a usability issue [2]. Blogs, a slang form of the term "Web logs," are a Web 2.0 technology used extensively on the World Wide Web. Blogging has gained quite a bit of popularity in recent years. "Blog" was named the "word of the year" in 2004 by Merriam-Webster. The large increase in popularity of blogs has led to the rapid growth in the
number of blog users, from individuals using them for personal use, to larger entities such as companies or universities. Blogs play a vital role not only in businesses and organizations, but also in academia. For example, the Brisbane Graduate School of Business at Queensland University of Technology records students’ experiences through the use of the ‘MBA blog’ [3]. Thus, it is in our interest to study human computer interaction of AJAX technology in blogs.
2 AJAX Classical Web applications use HTTP protocol to give requests to and receive responses from the server. This process takes time and refreshes the user’s whole screen, and a sign such as a rotating globe, a spinner, or a progress bar shows the user that the browser is loading. However, now we have moved on to a JavaScript-based paradigm, which calls an AJAX engine to send asynchronous HTTP requests to the server. AJAX supposedly improves the effectiveness of the Web by updating a part of the Web page instead of the whole page. It seamlessly exchanges small quantities of data between the browser client and the server. Because refreshing the Web page is not required when updating, the page becomes more interactive. This results in higher responsiveness, speed, and functionality [2]. AJAX is a client-side technology. In the AJAX Web application model, instead of sending an HTTP request from client to server as before, the user interface of the browser client will send a JavaScript call to the AJAX engine. Then, the AJAX engine sends an asynchronous HTTP request to the Web and/or XML servers. From the server side to the client side, the server sends XML data to the AJAX engine, and the AJAX engine sends HTML and CSS data to the user interface of the client. This also decreases bandwidth usage when compared to classical Web applications. AJAX also gains the advantage of JavaScript, in which developers only need to code once, and their code can work across different platforms. This makes AJAX more adaptable. However, AJAX has several disadvantages. Because it does not require a whole page reload, users cannot push the back button to go back to a prior state. This can cause problems because most users become accustomed to this behavior when they use Web browsers. Moreover, users cannot add bookmarks to keep a certain stage of the page when the page dynamically updates only small portions on the screen [4]. These problems may be solved by adding more complicated programming. Recently, Kluge, Kargl, and Weber [5] reported on the effects of AJAX technology on usability in Web applications such as message boards and auto-completion widgets, while our study focuses specifically on blogs. They studied only user satisfaction and time for completion of tasks. They looked at time for completion of tasks as a parameter of efficiency of use. They concluded that AJAX technology dramatically improves users’ satisfactions and efficiency of use in some scenarios. However, our study not only focuses on efficiency of use and satisfaction, as in Kluge et al’s [5] study, but also on other measures of usability, such as ease of learning and error frequency and severity, which were not measured in their study.
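Before moving on, the asynchronous request/partial-update cycle described at the start of this section can be made concrete with a minimal TypeScript sketch. It is an illustration only: the /search endpoint, the "results" element id, and the HTML-fragment response are assumptions, not part of WordPress or of the blogs built for this study.

// Minimal AJAX-style partial update: only the element with id "results"
// is refreshed; the rest of the page stays as it is.
function loadResults(query: string): void {
  const xhr = new XMLHttpRequest();
  // Hypothetical endpoint that returns an HTML fragment for the query.
  xhr.open("GET", "/search?q=" + encodeURIComponent(query), true);  // true = asynchronous
  xhr.onload = () => {
    if (xhr.status === 200) {
      const target = document.getElementById("results");
      if (target) {
        target.innerHTML = xhr.responseText;  // update one region, no full page reload
      }
    }
  };
  xhr.send();
}

// The user can keep interacting with the page while the request is in flight.
loadResults("ajax usability");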
3 Research Question and Hypotheses Development Krug [6] asserts that one of the most important factors for Web usability is when users glimpse the Web page and are capable of easily interacting with it. Users should not have to take a long time to think about how to use the Web page. Oulasvirta and Salovaara [7] assert that it is important for a user interface to be invisible, as an interface must be uncomplicated (requiring no other knowledge besides the common sense of the user to operate) and interactive (ready to respond to the user on a continuous basis). Usability relates to the ability of users to learn and utilize a system or a product, such as a Web site or a computer application, to accomplish their purposes [8]. It includes the degree of satisfaction they perceive from learning and using the system [8]. Measurement of usability involves a group of factors that we need to consider for a user interface [8], [9]. First, ease of learning is measured from the ability of users who have never seen the user interface before to easily accomplish simple tasks without substantial training [8], [9], [10]. This includes effectiveness or the ability of users to successfully achieve tasks. Second, efficiency of use refers to how quickly users who have used the system before can complete their tasks easily and without frustration [8], [9], [10]. Third, memorability refers to how easily users can capably use the system again after not using the system for a period of time [8], [9]. Fourth, error frequency and severity refers to how often users make mistakes when using the system, the severity of these mistakes, and how easily they can recover from these mistakes [8], [9]. Finally, participative satisfaction refers to the extent to which users are satisfied with the system [8], [9], [10]. Currently, few studies have been conducted on the relationship between AJAX technology and Web usability when users utilize blogs. We are interested in finding out if AJAX technology within a blog leads to improvements in Web usability. We will examine whether adding AJAX plugins to a blog has a positive effect on the usability of that blog. To measure if AJAX technology affects the usability of a blog, we categorize these factors into two groups: preference and performance [10]. First, preference is measured from participative satisfaction. Second, performance is measured from ease of learning, error frequency and severity, as well as efficiency of use. We did not measure memorability because the experiment was done in one session. In addition, Kluge et al. [5] concluded that user satisfaction and efficiency of use when using AJAX technology are higher than when not using AJAX. Thus, we hypothesize that: HYPOTHESIS 1: The usability of blogs with AJAX technology is greater than in regular blogs. H1a: The ease of learning in blogs using AJAX technology is greater than in regular blogs. H1b: The error frequency and severity in blogs using AJAX technology is lower than in regular blogs. H1c: The participation satisfaction of users using blogs with AJAX technology is greater than the participation satisfaction of users using regular blogs. HYPOTHESIS 2: The efficiency of use in blogs using AJAX technology is high when users have used AJAX-enhanced blogs before.
4 Experiment Design
4.1 System Design
To test Web usability, we created two versions of a blog that had similar interfaces. The first version did not have AJAX plugins, but the second version did. We used WordPress 2.31 to create the blogs in this study. WordPress is a well-known blog application that provides easy installation and usage. The calendar, search box, and comments are basic functions of WordPress blogs, and one can find these functions in most WordPress blogs. This makes it easier to compare the usability of regular blogs and blogs using AJAX technology. In this study, we installed three AJAX plugins. First, AJAX Calendar2 was used to find articles for each day and month. This plugin allows the reader to click a special button to show all articles in a specific month, a service which the regular calendar does not offer. When the user clicks that button, AJAX will retrieve all articles in that particular month and display them at the bottom of the calendar. This plugin makes AJAX blogs faster than their non-AJAX counterpart, since it refreshes only this specific part of the page. Second, LiveSearch3 allowed users to search articles within the blog. This plugin shows the results in a pop-up style menu using AJAX technology (similar to Google Suggest: as users type each character, the software displays a set of words related to these characters to predict what the user is searching for). Finally, Inline AJAX Comment4 allowed users to add comments and auto-update those comments without a full page reload. With AJAX technology, this plugin provides a much smoother, faster commenting feature in the blog. Users can also click to show or hide comments.
4.2 Task Scenario
A task scenario was designed for readers, because most people use blogs in their daily lives to find information about specific interests and participate with other users. The set of tasks is a basic operation readers perform when they visit a blog. The readers will find interesting articles, read them, search for related information, and add comments. In the AJAX version, participants complete a set of tasks by using plugins that contain AJAX technology, while in the non-AJAX version, participants use the default plugins that come with the original version of WordPress. To test Web usability, participants needed to accomplish a set of tasks listed on the instruction sheet. First, participants needed to log in as guests. Second, they needed to use the calendar in the left sidebar to find an article named "Blog#2" in September, and post a comment on that article. Third, before writing any comments, participants needed to use the search box in the left sidebar, input the word "dog," and find the article related to this word. Then, they had to copy a paragraph from that article into the comment box of the blog article. They needed to do the same set of tasks in both the non-AJAX and AJAX
1 http://codex.wordpress.org/Version_2.3
2 http://wordpress.org/extend/plugins/ajax-calendar/
3 http://wordpress.org/extend/plugins/livesearch/
4 http://www.ditii.com/2006/07/07/wordpress-plugin-inline-ajax-comments/
versions. Then, they used the AJAX version one more time at the end of the experiment so we could measure efficiency of use.
4.3 Method and Procedure
To identify usability problems, our experiment was conducted on a small group of users, as per Nielsen's suggestion [9]. Nielsen [11] claimed that "[t]he best results come from testing no more than 5 users and running as many small tests as you can afford." More test subjects would only result in a marginal increase in the number of problems, errors, and different completion times for tasks found [11]. Moreover, a small number of participants is enough for qualitative usability testing [9]. We recruited eight participants for this study. All users had to complete all three experiments (non-AJAX, AJAX for the first time, and AJAX for the second time) within the same day, back to back to back. To decrease participant bias and avoid an order effect resulting from the participants' expectations and impressions for the next version of the blog after completing the first one, we used a counterbalanced design. Half the participants used the non-AJAX version first, then did the same tasks again with the AJAX version, while the other half used the AJAX version before the non-AJAX version. After all eight participants finished the set of tasks with these two versions (AJAX and non-AJAX), we asked them to repeat the set of tasks with the AJAX version to measure efficiency of use. We measured whether they improved performance efficiently after they had learned how to use the new kind of blog that incorporated AJAX technology the first time around [12]. At the end of the experiment, participants were asked to complete questionnaires about their background, blog experience, and questions about satisfaction with the blogs. After participants finished answering the questionnaire, we conducted post-experiment interviews to collect qualitative data to substantiate the quantitative results.
4.4 Usability Measurement
In the usability test, we collected quantitative data on both participants' performance and participants' preference (Table 1). We collected two types of data: what really happened when participants used the blog (performance data) and what participants thought when they used it (preference data) [10]. For the performance metric, we used the free trial software All In One Keylogger5 for Windows. This software can visually capture users' mouse movements and report them in visual logs and HTML reports. It records where users click on the screen, how they move the mouse, and when they perform certain actions. We used this information to calculate a performance metric for the usability test. To prevent threats to validity, this software has a "hidden mode" feature; participants were not aware that the researcher observed their movements when they used the blog. For user preference, we used a questionnaire to measure how satisfied the users were when using the blog. In the questionnaire, ratings on a 1 to 5 Likert scale were used to allow for variation in the data. For example, to find out to what extent participants preferred any version, the participants could express their feelings from 'strongly disagree' (1) to 'strongly agree' (5).
5 http://www.relytec.com/download.htm
Table 1. Usability metric (Adapted from [10])

Performance Usability Metric | Usability Measure
Total time to completion (Time) | Ease of learning: Can participants complete the task scenario quickly?
Number of steps (Step) | Ease of learning: How many steps do participants take in order to successfully complete the tasks?
Amount of confusion (Confusion) | Error frequency and severity: How many times do participants get confused?
Pathway analysis and the number of user errors (Error) | Error frequency and severity: How many errors do participants make when they use the blog?
Preference Usability Metric | Usability Measure
User satisfaction | Participative satisfaction: Do participants get pleasure from using the blog?
User comments | Participative satisfaction: Are participants confused when they use the blog?
Preference ratings | Participative satisfaction: Do participants prefer the design of the blog?
5 Results
5.1 Data Analysis
The participants were eight graduate students aged between 20 and 40 years old. Three of them were male and the other five were female. All participants were familiar with using the Internet, and half of them had their own blogs. In hypothesis H1a, we were interested in whether the ease of learning in blogs using AJAX technology is greater than in regular blogs. H1a was tested by a paired sample t-test on the Time and Step parameters of participants using non-AJAX blogs and AJAX blogs for the first time (Table 2). The results indicated that the presence of AJAX increases the time needed to complete tasks for first-time users. At the significance level of .05 (1-tailed), users spent significantly more time completing tasks in AJAX blogs than in regular blogs. However, the number of steps participants took to complete tasks in AJAX and non-AJAX blogs was not significantly different. Thus, H1a was disconfirmed. In hypothesis H1b, we were interested in whether or not the error frequency and severity in blogs using AJAX technology was lower than in regular blogs. To test H1b, a paired sample t-test was performed on the participants' Confusion and Error parameters when they used non-AJAX blogs and AJAX blogs for the first time (Table 2). At the significance level of .05 (1-tailed), contrary to our hypothesis, users had more confusion and errors in AJAX blogs than in regular blogs. Thus, H1b was also disconfirmed. In hypothesis H1c, we were interested in whether the participation satisfaction of users when they used blogs with AJAX technology was greater than their participation satisfaction when they used regular blogs. We calculated the mean and standard deviation to examine whether the average preference of users when using non-AJAX blogs was different from when they used AJAX blogs. We tested hypothesis H1c by using a paired sample t-test of preference between the two versions
Table 2. Difference of performance scores between AJAX for the first time (AJAX1) and NON-AJAX

Usability Dimension (N = 8) | AJAX1 M (SD) | NON-AJAX M (SD) | t | p
Time | 455.88 (144.21) | 322.13 (109.90) | 2.01 | .042
Step | 13.13 (4.36) | 14.75 (3.92) | -0.60 | .284
Confusion | 6.25 (5.31) | 1.50 (2.73) | 1.91 | .049
Error | 4.50 (5.10) | 0.50 (0.76) | 2.11 | .037
Table 3. Difference of preference scores between AJAX and NON-AJAX

Usability Dimension (N = 8) | AJAX M (SD) | NON-AJAX M (SD) | t | p
Satisfaction | 3.58 (0.97) | 3.13 (0.56) | 0.89 | .201
Table 4. Difference of performance scores between AJAX for the first time (AJAX1) and AJAX for the second time (AJAX2)

Usability Dimension (N = 8) | AJAX1 M (SD) | AJAX2 M (SD) | t | p
Time | 455.88 (144.21) | 176.13 (86.63) | 5.87 | .000
Step | 13.13 (4.36) | 9.13 (1.25) | 2.50 | .021
Confusion | 6.25 (5.31) | 0.13 (0.35) | 3.20 | .008
Error | 4.50 (5.10) | 0.00 (0.00) | 2.50 | .021
of blogs, non-AJAX and AJAX (Table 3). The results indicated that, at the significance level of .05 (1-tailed), users' satisfaction with AJAX blogs was not greater than their satisfaction with regular blogs. Thus, H1c was not supported. In hypothesis H2, we were interested in whether the efficiency of use in blogs using AJAX technology is high when users have used AJAX-enhanced blogs before. Thus, H2 was tested by a paired sample t-test on participants' Time, Step, Confusion, and Error parameters when they used AJAX for the first time and when they used it again for the second time (Table 4). At the significance level of .05 (1-tailed), users spent less time and took fewer steps, with less confusion and fewer errors, in AJAX blogs the second time around. Thus, H2 was confirmed. From the data, participants using AJAX for the first time took more time to finish the task (Figure 1), felt more confused, and made more errors compared to when they did the same set of tasks without AJAX. Therefore, H1 was disconfirmed. However, in H2, when using the AJAX version for the second time, participants spent less time (Figure 1), used fewer links, were less confused, and made slightly fewer errors than when they performed the set of tasks without AJAX. Thus, the data supports H2. This indicates that learning is a very important step when introducing a new technology.
[Chart for Fig. 1: "Total Time to Completion"; y-axis: Time (seconds); x-axis: Participants 1-8; series: Non-AJAX, AJAX1, AJAX2]
Fig. 1. This graph shows the time to completion of each participant when each participant used Non-AJAX blogs, AJAX blogs for the first time, and AJAX blogs for the second time. Three participants out of eight spent less time when they used the AJAX version for the first time as compared with the non-AJAX version. Seven out of eight spent less time on the AJAX version the second time as compared with the non-AJAX version. All participants spent less time when they used the AJAX version for the second time as compared with the first time.
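For readers who want to reproduce the comparisons in Tables 2, 3, and 4, the paired-samples t statistic can be computed as in the TypeScript sketch below. The two arrays are hypothetical completion times, not the study's data; with eight participants the resulting t value would be compared against a t distribution with 7 degrees of freedom.

// Paired-samples t statistic: t = mean(d) / (sd(d) / sqrt(n)),
// where d holds the within-participant differences and sd uses n - 1.
function pairedT(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length < 2) throw new Error("need paired samples");
  const d = a.map((x, i) => x - b[i]);
  const n = d.length;
  const mean = d.reduce((s, x) => s + x, 0) / n;
  const variance = d.reduce((s, x) => s + (x - mean) ** 2, 0) / (n - 1);
  return mean / Math.sqrt(variance / n);
}

// Hypothetical completion times (seconds) for 8 participants: first vs. second AJAX session.
const firstUse = [455, 520, 390, 610, 480, 350, 700, 440];
const secondUse = [180, 210, 150, 240, 160, 130, 220, 170];
console.log(pairedT(firstUse, secondUse).toFixed(2));  // compare with the t(7) distribution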
5.2 Post-experiment Interview
Although the number of participants was not large, which may cause the power of the statistical test to be low, the results of most statistical analyses reached the significance level of alpha = .05. However, most of the results surprisingly stand in direct contrast to our hypothesis. Thus, we would like to triangulate the results using both quantitative and qualitative methods. We conducted post-experiment interviews to collect the impressions of participants when they used the two versions of blogs and to see whether their satisfaction matched their performance when they worked with the blogging software. Sample questions included: Were you confused when you used the AJAX blog for the first time? Why? If you prefer the AJAX blog to the non-AJAX blog, please give your reasons why. In the study, we had eight participants, and two of them did not like the AJAX version because the interface was more complicated than the non-AJAX version. One participant explained:
I prefer non-AJAX version especially if I know exactly what I am searching for. AJAX version confuses my usage. In general, I prefer simplicity rather than sophistication when I search any web browsers.
Another participant said:
I don't like it because it is too complicated. It is not necessary.
The rest of the participants liked the AJAX version because it provided functions that help users do their jobs faster and more interactively. The following comments point out the potential benefits of the AJAX version:
It's much easier to find what I want using calendar and search box. I like search box the most due to the ease of use. It works like Google Suggest. The interface is much better than non-AJAX version. The AJAX blog has functions that help users do their jobs faster. I love the search box. AJAX blog has more functions and takes less time to refresh. It's more useful when one knows how it works.
AJAX should have increased usability performance and preference, and while the interviews showed that most of the participants liked AJAX, the statistical results did not support our first hypothesis. This may be because some of the AJAX-specific features were confusing for them when compared with regular blogs. Participants expressed that when they used the AJAX blog, they felt confused by issues such as how the AJAX Calendar only changes a small portion of the screen. They did not understand how it worked and found it difficult to use when compared with the non-AJAX version. This may be explained by the need for time to get used to a new technology. Participants' performance was low when they used the AJAX blog for the first time, and they confessed that they had not knowingly used AJAX technology before. Thus, they felt unfamiliar with the new interface conventions. However, the participants' performance when they used AJAX for the second time showed that after the users learned and understood how to use AJAX technology, their performance improved significantly.
6 Discussion and Conclusion This study attempts to measure how AJAX technology affects Web usability when users use a blog. We designed two versions of a blog, which have a similar user interface. However, WordPress provides many different theme styles. Thus, a theme chosen in this study may not have the same result as if we had used other themes. The experiment was controlled in a closed environment to record users’ mouse movements but it still depended on the speed of the Internet and CPU of the computer used during the experiment. This may cause delay times or load times that do not relate to user performance. Kluge et al’s [5] study shows that AJAX technology increases efficiency of use and satisfaction in AJAX Web application in some task scenarios, but our study showed that AJAX technology does not always increase usability. This may point out that learning for new technology is vital, especially when users are familiar with traditional technology. In the United States, the National Institute of Standards and Technology (NIST) has begun to address some of the AJAX technology issues in the need for Web administrators and IT Management to monitor usability and security of Webcode design, development, test, and maintenance [13]. Blogs are a prevalent Web 2.0 tool that has been widely adopted amongst a broad subset of society, including many users who are not particularly tech savvy. To us, it is important to investigate what happens when one applies new technology to a tool so we can learn how to improve the technology and enhance human computer interaction. From our results, we can see
that although in an easy-to-use tool with widespread adoption, such as blogs, there are many challenges to overcome in applying new technology that users are not familiar with. From this perspective, the results of this paper highlight the problems of using AJAX technology in a blog and may help Web developers use AJAX technology to improve the Web usability of a blog.
References 1. Garrett, J.J.: Ajax: A New Approach to Web Applications, http://www.adaptivepath.com/ideas/essays/archives/000385.php 2. Paulson, L.D.: Building Rich Web Applications with Ajax. Computer, 14–17 (2005) 3. Williams, J.B., Jacobs, J.: Exploring the Use of Blogs as Learning Spaces in the Higher Education Sector. Australasian Journal of Educational Technology 20(2), 232–247 (2004) 4. West, J.: Ajax: not just another acronym or is it? Searcher 14, 13–15 (2006) 5. Kluge, J., Kargl, F., Weber, M.: The Effects of the AJAX Technology on Web Application Usability. In: WEBIST 2007 International Conference on Web Information Systems and Technologies, pp. 289–294 (2007) 6. Krug, S.: Don’t Make Me Think: A Common Sense Approach to Web Usability. New Riders Press, New York (2000) 7. Oulasvirta, A., Salovaara, A.: A Cognitive Meta-analysis of Design Approaches to Interruptions in Intelligent Environments. In: Conference on Human Factors in Computing Systems, pp. 1155–1158. ACM, New York (2004) 8. Usability gov., http://www.usability.gov/basics/whatusa.html 9. Jakob Nielsen’s Alertbox: Usability 101: Introduction to Usability, http://www.useit.com/alertbox/20030825.html 10. Usability gov., http://www.usability.gov/basics/measured.html 11. Jakob Nielsen’s Alertbox: Why You Only Need to Test With 5 Users, March 19 (2000), http://www.useit.com/alertbox/20000319.html 12. Nielsen, J.: Usability Engineering. Morgan Kaufmann, San Francisco (1993) 13. National Institute of Standards and Technology, http://csrc.nist.gov/publications/nistpubs/800-28-ver2/ SP800-28v2.pdf
Usability Evaluation of Dynamic RSVP Interface on Web Page
Ya-Li Lin and Darcy Lin
Department of Statistics, Tunghai University, Taichung, Taiwan 40704
[email protected]
Abstract. The usability of rapid serial visual presentation (RSVP) interfaces was evaluated using a subjective preference questionnaire and performance measurement methods. Forty-two students voluntarily participated in this study. The results indicated that the shelf interface moving from bottom-left to upper-right along a linear trajectory with a moving speed of 20~30 frames per second (FPS) is most preferable, and that the carousel interface following a circular trajectory in the clockwise direction with a moving speed of 10~15 FPS is most preferable. "Meets user experience", "aesthetic and simple design", "effective to use", and "easy to learn" all conform to the usability goals. In addition, the results based on performance measurement showed that a logistic regression model with RSVP mode and moving speed fits very well. The probability estimate of correct recognition is highest for the carousel interface at a moving speed of 30 FPS, whereas the shelf interface at 15 FPS has the lowest probability estimate of correct recognition.
Keywords: Dynamic Interface, Subjective Preference, Rapid Serial Visual Presentation (RSVP), Usability Evaluation.
1 Introduction
Searching for stock and weather information, prices for flights and mortgages, and newly issued books is common on the Internet. A typical search session consists of: (1) formulating and entering the query, (2) browsing the search results, and (3) viewing selected result pages. Our work aims to investigate and provide a user-centered interface for browsing the search results within the search session. Once a user has launched a query, the search engine must look in a variety of databases and return a set of relevant results. There are multiple ways to deliver information to the user and multiple ways to let the user use the results. However, there must be a phased implementation of content searches, both from a consumer usability perspective and from an advertiser/merchant perspective. Performance measures will be carried out on the second phase, browsing the search results. Foster (1970) first used Rapid Serial Visual Presentation (RSVP) to mean rapidly displaying words in a sequence in the same visual location. RSVP originated as a tool for studying reading behavior [3], but lately has received more attention as a
presentation technique with a promise of optimizing reading efficiency, especially when screen space is limited [9]. The reason for the interest is that the process of reading works a little differently when RSVP is used, and that it requires a much smaller screen space than traditional text presentation [8]. RSVP is a method of displaying information using a limited space in which each piece of information is displayed briefly in sequential order [2, 4, 5, 11]. With the development of dynamic design, a fast-moving RSVP interface can emphasize its advantage of showing more image information at a time, while a slow-moving RSVP interface has the relative advantage of lower mental workload [8]. Can the user experience of search results be improved by using a dynamic RSVP interface? The images of the shelf RSVP interface shown in the fixation area are compared with the carousel RSVP interface [2, 11]. This study provides a usability evaluation of image visualization for dynamic RSVP interfaces. The objective of this study is to evaluate the usability of RSVP interfaces using a subjective preference questionnaire and performance measurement methods. A prototype of a simulated E-bookstore system was designed to collect the subjective preference ratings of predetermined design factors at the beginning of the study. To evaluate the usability for web users, usability evaluation is used to assess whether users achieve specified goals with effectiveness, efficiency, learnability, memorability, and user satisfaction [7]. Both RSVP display (carousel and shelf) and moving speed (10, 15, 20, 30, and 40 FPS) were varied in the simulated interface of the E-bookstore. We propose the following research hypotheses: (1) Do the design factors, such as RSVP mode, moving speed, and moving direction, affect the subjective preference rating of the dynamic RSVP interface? (2) Do the design factors affect the performance of recognition on the dynamic RSVP interface? (3) Do the usability goals conform to the user experience?
2 Design of the Dynamic RSVP Interface
The simulated E-bookstore interface contains the contents of a web search result. The search results are shown on the dynamic RSVP interface as the experimental Web pages. Two kinds of RSVP interfaces were considered based on their trajectory. The carousel RSVP interface is defined as a series of images that appear successively, running from the bottom of the page clockwise (Carousel I) or counterclockwise (Carousel II) along a circular trajectory. The shelf RSVP interface is defined by a linear trajectory in which the images follow the diagonal running from bottom-left to upper-right (Shelf I), from bottom-right to upper-left (Shelf II), from upper-right to bottom-left (Shelf III), or from upper-left to bottom-right (Shelf IV). Specifications of the design factors and their factor levels for the subjective preference questionnaire and performance measurement are shown in Table 1. The prototypes of the simulated E-bookstore interface used in the preference-based phase are illustrated in Table 1. The layer design and the number of frames per second (FPS) were used to produce the moving effect of the image visualization. FPS in Macromedia Flash MX determines the moving speed for each image [6]. The exposure times for one image are 20, 13.3, 10, 6.67 and 5 seconds, corresponding to 10, 15, 20, 30 and 40 FPS. The task assigned to each participant is to browse the search results after entering a query.
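The reported exposure times are consistent with each image occupying about 200 frames of the Flash timeline (for example, 200 frames / 10 FPS = 20 s). The TypeScript snippet below simply reproduces that arithmetic; the 200-frame span is inferred from the reported numbers rather than stated in the text.

// Exposure time per image implied by the frame rate, assuming each image
// spans about 200 timeline frames (inferred from the reported 20 s at 10 FPS).
const FRAMES_PER_IMAGE = 200;

function exposureSeconds(fps: number): number {
  return FRAMES_PER_IMAGE / fps;
}

for (const fps of [10, 15, 20, 30, 40]) {
  console.log(fps + " FPS -> " + exposureSeconds(fps).toFixed(2) + " s per image");
}
// Prints 20.00, 13.33, 10.00, 6.67, 5.00 s, matching the exposure times quoted above.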
Table 1. The specifications of design parameters and their factor levels for dynamic RSVP interface

Carousel I: circular trajectory; moving direction clockwise; image size growing to shrinking; image position dynamic; position of the maximum image from 9-12-3 o'clock; 5 images visible; 10 images in total.
Carousel II: circular trajectory; moving direction counterclockwise; image size growing to shrinking; image position dynamic; position of the maximum image from 3-12-9 o'clock; 5 images visible; 10 images in total.
Shelf I: linear trajectory; moving direction bottom-left to upper-right; image size constant to shrinking; image position dynamic; position of the maximum image bottom-left; 5 images visible; 10 images in total.
Shelf II: linear trajectory; moving direction bottom-right to upper-left; image size constant to shrinking; image position dynamic; position of the maximum image bottom-right; 5 images visible; 10 images in total.
Shelf III: linear trajectory; moving direction upper-right to bottom-left; image size growing to constant; image position dynamic; position of the maximum image bottom-left; 5 images visible; 10 images in total.
Shelf IV: linear trajectory; moving direction upper-left to bottom-right; image size growing to constant; image position dynamic; position of the maximum image bottom-right; 5 images visible; 10 images in total.
Example row: a screenshot of each interface.
3 Research Methods
Before the usability experiment, the preparation work included the selection of participants and experimental factors, the construction and design of the experimental Web pages and the dynamic RSVP interface, and the design of the search results.
3.1 Subjective Preference Questionnaire
The subjective preference questionnaire is a structured instrument for usability assessment. It is useful in the early stages of user-centered design development. The International Organization for Standardization (ISO) defines the usability of a product as "the extent to which the product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use." Usability is generally regarded as ensuring that interactive products are easy to learn, effective to use, and enjoyable from the user's perspective [10]. More specifically, usability is broken down into the following goals: effectiveness, efficiency, safety, utility, learnability, and memorability.
3.2 Participants
Forty-two undergraduate and graduate students (21 females and 21 males) from Tunghai University voluntarily participated in the subjective preference questionnaire. Their ages ranged from 23 to 30 years old (mean age 24.6 years, standard deviation 1.72 years). They all had experience of surfing the Internet, had normal or corrected vision of at least 20/25, and had no color-blindness.
3.3 Experimental Design

The dynamic RSVP interface consisted of ten images shown in turn on the Web page of the E-bookstore (see Figure 1). The objective function is defined as the correct recognition of a targeted image. The design factors are RSVP mode and moving speed. The RSVP modes include the clockwise and counterclockwise carousel RSVP and four types of shelf RSVP modes (Shelf I-IV) (see Table 1). The moving speeds include five levels of 10, 15, 20, 30, and 40 FPS, whose exposure times per image are 20, 13.3, 10, 6.67, and 5 seconds, respectively. Each participant continuously viewed sixteen different search results presented on Web pages of the E-bookstore, and a recognition task was assigned.
Fig. 1. Illustration of (a) shelf RSVP and (b) carousel RSVP shown on the simulated E-bookstore interface
3.4 Experimental Procedure

After the participant filled in a self-reported background document, the experimenter explained the rules for answering the subjective preference questionnaire. The varied combinations of the RSVP interface on the simulated E-bookstore Web page were shown one factor at a time. The subjective preference questionnaire was administered in association with the simulated E-bookstore interface. Each participant individually chose the favorite display type of RSVP interface on the simulated E-bookstore Web page until she/he had finished all the question items of the subjective preference questionnaire. In addition, each participant continuously viewed sixteen different search results presented on Web pages of the E-bookstore, and a browsing task was assigned. After finishing the browsing task, the participants were asked to recognize whether a targeted image had been shown or not. The questionnaire for user interface satisfaction (QUIS for short) was administered after the performance measurement experiment.

3.5 Apparatus and Materials

This study used a Pentium IV desktop computer (1.62 GHz CPU, 896 MB RAM) with Microsoft Internet Explorer 6.0 and a 17-inch TFT-LCD monitor (1280×1024 pixels). Macromedia Flash, Dreamweaver and Fireworks MX 2004 were used to design the simulated E-bookstore Web page.
3.6 Model Building of Recognition

Based on the design of the experiment, the objective function of the browsing task was collected. A logistic regression model is appropriate for fitting the probability of correct recognition [1]. The design factors include RSVP mode and moving speed. The RSVP modes comprise the carousel and shelf RSVP displays, based on the major difference between circular and linear trajectories (Table 1). The moving speeds include four levels of 10~15 (the groups of 10 and 15 FPS being combined because of sample-size considerations), 20, 30, and 40 FPS, with exposure times of 20~13.3, 10, 6.67, and 5 seconds per image, respectively. In addition, gender and college background are considered as individual-difference variables. Define Y as the recognition variable of the targeted image: Y equals 1 if the participant correctly recognized the targeted image and 0 otherwise. Let π be the probability of correct recognition; the odds of correct recognition are then π/(1 − π). The logit function log[π/(1 − π)] of π, symbolized by "logit(π)," is the log odds of correct recognition. While π is restricted to the 0-1 range, the logit can be any real number. The proposed model for fitting the probability of correct recognition initially includes the main effects of gender, college, RSVP mode, and moving speed, as well as the interaction of RSVP mode and moving speed:

\[
\pi = P(Y = 1 \mid X) = \frac{\exp(X'\beta)}{1 + \exp(X'\beta)},
\tag{1}
\]
where X denotes the design matrix containing gender (G for short), college (C for short), and the two-factor interaction of RSVP mode (RSVP for short) and moving speed (FPS for short), that is, X' = [1 : G, C, RSVP | FPS] (using "|" for interaction), and β is the parameter vector corresponding to X. The logistic regression model is expressed as follows:

\[
\mathrm{logit}[P(Y = 1)] = \mathrm{logit}(\pi) = \log\!\left(\frac{\pi}{1 - \pi}\right) = X'\beta .
\tag{2}
\]
A nominal-scale explanatory variable with k categories in Equations (1) and (2) is appropriately treated using (k − 1) indicator variables. For example, the design factor RSVP is a categorical variable with two categories and therefore uses one indicator variable. FPS may be regarded either as continuous or as categorical with four categories. If FPS is regarded as a nominal-scale explanatory variable, three indicator variables are used to describe its four categories. The recognition frequencies for the groups of 10 and 15 FPS are combined and renamed FPS15: FPS15 equals 1 if the moving speed is 10 FPS or 15 FPS and 0 otherwise. Similarly, FPS20 equals 1 if the moving speed is 20 FPS and 0 otherwise, and FPS30 equals 1 if the moving speed is 30 FPS and 0 otherwise; 40 FPS serves as the reference level. Let (G, C, RSVP, FPS15, FPS20, FPS30) each take the values 0 and 1 to represent the nominal-scale categories of the explanatory variables. The coding of the indicator variables corresponding to the nominal-scale explanatory variables is as follows:
\[
G = \begin{cases} 1, & \text{if female} \\ 0, & \text{if male} \end{cases}
\qquad
C = \begin{cases} 1, & \text{if majoring in Art and Design} \\ 0, & \text{if majoring in Management} \end{cases}
\]
\[
RSVP = \begin{cases} 1, & \text{if carousel mode} \\ 0, & \text{otherwise} \end{cases}
\qquad
FPS15 = \begin{cases} 1, & \text{if FPS} = 10 \text{ or } 15 \\ 0, & \text{otherwise} \end{cases}
\]
\[
FPS20 = \begin{cases} 1, & \text{if FPS} = 20 \\ 0, & \text{otherwise} \end{cases}
\qquad
FPS30 = \begin{cases} 1, & \text{if FPS} = 30 \\ 0, & \text{otherwise} \end{cases}
\]
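The coding scheme above can be reproduced mechanically. The following sketch is illustrative only (the function name and the trial record are hypothetical); it applies the definitions above, with male, Management, the shelf mode and 40 FPS as the reference categories.

```python
# Illustrative dummy coding of a single trial record (hypothetical data).
def code_trial(gender, college, mode, fps):
    """Return the indicator variables (G, C, RSVP, FPS15, FPS20, FPS30)."""
    return {
        "G": 1 if gender == "female" else 0,
        "C": 1 if college == "art_and_design" else 0,
        "RSVP": 1 if mode == "carousel" else 0,
        "FPS15": 1 if fps in (10, 15) else 0,  # 10 and 15 FPS pooled
        "FPS20": 1 if fps == 20 else 0,
        "FPS30": 1 if fps == 30 else 0,        # 40 FPS is the reference level
    }

print(code_trial("female", "management", "carousel", 30))
# {'G': 1, 'C': 0, 'RSVP': 1, 'FPS15': 0, 'FPS20': 0, 'FPS30': 1}
```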
Equation (2) can then be rewritten in terms of these indicator variables as

\[
\begin{aligned}
\mathrm{logit}[P(Y = 1)] = \mathrm{logit}(\pi) = \log\!\left(\frac{\pi}{1 - \pi}\right)
 ={}& \beta_0 + \beta_1 G + \beta_2 C + \beta_3 RSVP + \beta_4 FPS15 \\
 &+ \beta_5 FPS20 + \beta_6 FPS30 + \beta_7 RSVP{\times}FPS15 \\
 &+ \beta_8 RSVP{\times}FPS20 + \beta_9 RSVP{\times}FPS30 .
\end{aligned}
\tag{3}
\]
The parameter corresponding to the indicator variable RSVP in Equation (3) is β3. The value of e^β3 represents the odds ratio, defined as the ratio of the correct recognition odds of the carousel RSVP interface to those of the shelf RSVP interface. In Equation (3), the values FPS15 = 1, FPS20 = 0, FPS30 = 0 are substituted for a moving speed of 10 or 15 FPS; FPS15 = 0, FPS20 = 1, FPS30 = 0 for a moving speed of 20 FPS; and FPS15 = 0, FPS20 = 0, FPS30 = 1 for a moving speed of 30 FPS. The parameters corresponding to the indicator variables FPS15, FPS20 and FPS30 in Equation (3) are β4, β5 and β6. The value of e^β4 represents the odds ratio of correct recognition between 10-15 FPS and 40 FPS. Similarly, e^β5 and e^β6 represent the odds ratios of correct recognition between 20 FPS and 40 FPS and between 30 FPS and 40 FPS, respectively.
4 Results and Discussion

4.1 Comparison of Subjective Preference
Based on the results of the subjective preference questionnaire, the favorite percentage distribution of moving direction for the RSVP displays is shown in Figure 2. For the shelf RSVP display, 57% of users chose movement from bottom-left to upper-right as their favorite direction, whereas only 5% chose movement from bottom-right to upper-left (Figure 2(a)). The differences in subjective preference proportions among the four moving directions of the shelf RSVP display are statistically significant (χ² = 25.62, P-value < 0.001). For the carousel RSVP display, 69% of users chose the clockwise moving direction as their favorite (Figure 2(b)), and there is a significant difference between the clockwise and counterclockwise directions (χ² = 6.10, P-value = 0.0136). In addition, the disfavored percentage distribution of moving speed for the RSVP displays is shown in Figure 3. For the shelf RSVP display, 40% of users chose 40 FPS (the fastest speed) and 36% chose 10 FPS (the slowest speed) as their disfavored moving speed (Figure 3(a)). The differences in disfavored proportions among the five moving speeds of the shelf RSVP display are statistically significant (χ² = 23.71, P-value < 0.001). For the carousel RSVP display, as many as 84% of users chose 40 FPS (the fastest speed) as their disfavored moving speed (Figure 3(b)). The differences in disfavored proportions among the five moving speeds are statistically significant (χ² = 76.48, P-value < 0.001).

Fig. 2. The favorite percentage distribution of moving direction for (a) shelf and (b) carousel RSVP displays

Fig. 3. The disfavored percentage distribution of moving speed for (a) shelf and (b) carousel RSVP displays
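The reported χ² values can be reproduced from the underlying counts. In the sketch below, the counts are inferred from the reported percentages of the 42 participants (57%, 21%, 17% and 5%) and are therefore an assumption rather than figures taken from the paper.

```python
# Goodness-of-fit test for the shelf RSVP moving-direction preferences.
from scipy.stats import chisquare

# Counts inferred from 57%, 21%, 17% and 5% of n = 42 participants
# (24 = bottom-left to upper-right, ..., 2 = bottom-right to upper-left).
counts = [24, 9, 7, 2]
stat, p = chisquare(counts)  # expected: equal preference, 10.5 per direction
print(f"chi2 = {stat:.2f}, p = {p:.2g}")  # chi2 = 25.62, p < 0.001
```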
4.2 Logistic Regression Model Fitting

Models were compared iteratively and inference about the parameters was conducted in order to fit the recognition data. The best-fitting logistic regression model is

\[
\mathrm{logit}(\hat{\pi}) = \log\!\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) = 1.515 + 1.276\,RSVP - 1.084\,FPS15 - 0.023\,FPS20 + 1.323\,FPS30 .
\tag{4}
\]

The estimated correct recognition rate (ECRR) is obtained from the following estimated probability of correct recognition:

\[
\mathrm{ECRR} = \hat{\pi} = \frac{\exp(X'\hat{\beta})}{1 + \exp(X'\hat{\beta})}
= \frac{\exp(1.515 + 1.276\,RSVP - 1.084\,FPS15 - 0.023\,FPS20 + 1.323\,FPS30)}{1 + \exp(1.515 + 1.276\,RSVP - 1.084\,FPS15 - 0.023\,FPS20 + 1.323\,FPS30)} .
\tag{5}
\]
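Evaluating Equation (5) directly reproduces the ECRR values discussed below; this is a minimal sketch using the reported parameter estimates and is not code from the study.

```python
# Estimated correct recognition rate (ECRR) from Equation (5).
from math import exp

def ecrr(rsvp, fps15, fps20, fps30):
    eta = 1.515 + 1.276 * rsvp - 1.084 * fps15 - 0.023 * fps20 + 1.323 * fps30
    return exp(eta) / (1.0 + exp(eta))

print(round(ecrr(rsvp=1, fps15=0, fps20=0, fps30=1), 4))  # carousel, 30 FPS: 0.9839
print(round(ecrr(rsvp=0, fps15=1, fps20=0, fps30=0), 4))  # shelf, 10-15 FPS: ~0.606
```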
The parameter estimate corresponding to the indicator variable RSVP in Equation (5) is 1.276. The value e^1.276 ≈ 3.6 means that the estimated odds that the carousel RSVP
interface provides correct recognition in a browsing task are 3.6 times the estimated odds for the shelf RSVP interface. This indicates that the carousel RSVP interface provides a higher ECRR than the shelf interface. Similarly, a moving speed of 30 FPS provides a higher ECRR than the other speeds, since the parameter estimates corresponding to the indicator variables FPS15, FPS20, and FPS30 in Equation (5) are -1.084, -0.023, and 1.323, respectively. The highest ECRR is 0.9839, for the combination of the carousel RSVP interface and 30 FPS, whereas the combination of the shelf RSVP interface and 15 FPS has the lowest ECRR, 0.6060. Based on the recognition performance measure, the best combination for correct recognition is therefore the carousel RSVP interface at 30 FPS, and the worst is the shelf RSVP interface at 15 FPS.

4.3 Statistical Tests for QUIS

The results of the questionnaire for user interface satisfaction are illustrated in Figure 4. 86% of users agreed that the interface is easy to learn, which is statistically significant (χ² = 21.43, P-value < 0.001). This means that the RSVP interface not only provides information visualization on the Web page but is also easy to learn. 69% of users agreed that the interface is effective to use, which is also statistically significant (χ² = 6.10, P-value = 0.014). This means that the RSVP interface supports browsing tasks and is effective to use in a limited space without scrolling the Web page.
Fig. 4. Frequency (percentage) distributions of agreeing upon the heuristic principles

Table 2. Usability evaluation for user experience

| Criteria | χ² | P-value¹ |
|---|---|---|
| Easy to learn | 21.43 | <0.001* |
| Effective to use | 6.10 | 0.014* |
| Efficient to use | 1.52 | 0.217 |
| Aesthetics | 4.67 | 0.031* |
| Attraction | 9.52 | 0.002* |

Note 1: "*" denotes significance at the 0.05 level.
In addition, the dynamic RSVP interface would also reduce saccadic eye movement time, which contributes to efficiency. Over 60% of users (67%) agreed that the interface conforms to the principle of aesthetics, which is statistically significant (χ² = 4.67, P-value = 0.031). Over 70% of users (74%) agreed that the RSVP interface catches the user's attention, which is also statistically significant (χ² = 9.52, P-value = 0.002). This indicates that ease of learning, effectiveness of use, aesthetics and attracting the user's attention all conform to the usability principles, with the exception of the principle of efficient use.
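As a cross-check of Table 2, the χ² statistics can be recovered from the agree/disagree counts. The counts below are inferred from the reported agreement percentages of the 42 participants and are therefore assumptions; the "efficient to use" criterion is omitted because its counts cannot be derived from the text.

```python
# Chi-squared tests for the QUIS agree/disagree frequencies (n = 42).
from scipy.stats import chisquare

quis_counts = {            # (agree, disagree), inferred from the percentages
    "Easy to learn":    (36, 6),   # 86% agreement
    "Effective to use": (29, 13),  # 69% agreement
    "Aesthetics":       (28, 14),  # 67% agreement
    "Attraction":       (31, 11),  # 74% agreement
}
for criterion, obs in quis_counts.items():
    stat, p = chisquare(list(obs))  # expected: 21 vs. 21 under indifference
    print(f"{criterion:<16} chi2 = {stat:5.2f}, p = {p:.3f}")
# Reproduces Table 2: 21.43, 6.10, 4.67 and 9.52, all significant at 0.05.
```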
5 Conclusions

Based on the results of the subjective preference questionnaire, this study suggests a preference-based design of the search results interface for Web users. For the shelf RSVP interface, users prefer the moving direction from bottom-left to upper-right (with the image shrinking) and a moving speed between 20 and 30 FPS (exposure time of about 6.67~10 seconds). For the carousel RSVP interface, users prefer the clockwise moving direction and a moving speed between 10 and 15 FPS (exposure time of about 13.33~20 seconds). However, based on the recognition performance measure, the best combination for correct recognition is the carousel RSVP interface at 30 FPS and the worst is the shelf RSVP interface at 15 FPS. This suggests that individual differences exist between users, a possibility that needs to be investigated more deeply. Based on the questionnaire for user interface satisfaction (QUIS), ease of learning, effectiveness of use, aesthetics, and attraction all conform to the usability principles, with the exception of the principle of efficient use.

Acknowledgements. The support of the National Science Council, Taiwan (grant NSC 97-2221-E-029-011) is gratefully acknowledged.
References

1. Agresti, A.: An Introduction to Categorical Data Analysis, 2nd edn. John Wiley & Sons, Inc., Chichester (2007)
2. De Bruijn, O., Spence, R.: Patterns of eye gaze during rapid serial visual presentation. In: Proceedings of Advanced Visual Interface (AVI 2002), Trento, Italy, pp. 209–217 (2002)
3. Foster, K.L.: Visual perception of rapidly presented word sequences of varying complexity. Perception & Psychophysics 8, 215–221 (1970)
4. Lin, Y.L., Ho, C.M.: Evaluation of RSVP and display types on decoding performance of information extraction tasks. In: Stephanidis, C. (ed.) HCI 2007. LNCS, vol. 4556, pp. 380–388. Springer, Heidelberg (2007)
5. Lin, Y.L., Wang, P.C.: Effects of RSVP and Graphical Displays on the Visual Performance of Mobile Devices. In: International Conference on Applied Human Factors and Ergonomics (AHFE), Las Vegas, USA, July 14-17 (2008)
6. Macromedia Flash MX (2004), http://www.adobe.com/support/documentation/en/flash/
7. Nielsen, J.: Ten usability heuristics (2005), http://www.useit.com/paper/heuristic/heuristic_list.html
8. Öquist, G., Goldstein, M.: Towards an improved readability on mobile devices: Evaluating adaptive rapid serial visual presentation. Interacting with Computers 5, 539–558 (2003)
9. Rahman, T., Muter, P.: Designing an interface to optimize reading with small display windows. Human Factors 41, 106–117 (1999)
10. Sharp, S., Rogers, Y., Preece, J.: Interaction Design: Beyond Human-Computer Interaction, 2nd edn. John Wiley & Sons, Ltd., Chichester (2007)
11. Spence, R.: Rapid, serial and visual: a presentation technique with potential. Information Visualization 1(1), 13–19 (2002)
"Online Legitimacy": Defining Institutional Symbolisms for the Design of Information Artifact in the Web Mediated Information Environment (W-MIE)

Emma Nuraihan Mior Ibrahim and Nor Laila Md Noor

Department of System Sciences, Faculty of Information Technology and Quantitative Sciences, Universiti Teknologi MARA, 40450, Shah Alam, Selangor, Malaysia
{emma,norlaila}@uitm.edu.my
Abstract. The global nature of the Internet raises questions about the variety of ways in which trust is established, and whether and how it guides people's behavior and beliefs. This study explores the understanding of trust from a non-technical perspective in the context of the web mediated information environment (W-MIE), conceptualized within the notion of online legitimacy [1]. We take up the problem of how to enhance the trustworthiness of information on the web through design deployments that can be rationalized and understood by ordinary users. The paper highlights four dimensions of design elements that make up the Institutional Trust Inducing Features framework [2], which warrant increased attention. The paper situates our comments on designing information artifacts within a sensitive, culturally imbued context, beyond the typical security scope, towards a conceptual understanding of how users engage with information interactions.

Keywords: information artifact, culturally sensitive design, online legitimacy, web mediated information environment (W-MIE), institutional symbolisms.
1 Introduction

As the Internet becomes more pervasive in our everyday lives, our social and economic activities are increasingly governed by the 'virtual' world, where interactions are mediated by or executed with technology. Its emergence has become a critical part of the overall infrastructure of society, affecting social, community, cultural and political life. However, these interactions via the web involve different types and levels of risk, caused either by the uncertainty of using an open technological infrastructure for electronic exchange or by the conduct of the users involved in the exchange activities [3]. This is due to the fact that in an online environment the degree of uncertainty of an economic transaction is higher than in traditional settings. Today, however, consumers' interactions are situated not only within the interpersonal or inter-organizational transactions of the electronic exchange model (e.g. e-commerce) but also within the knowledge transaction and exchange of the information exchange mode [4] (e.g. providing services in terms of online advice, information, and discussion on topics like career, relationship,
matrimonial, health, financial, legal, politics and religious sermons). This is evidence that users nowadays are extending their web use to access information about matters that affect their lives and to establish personal and organizational connections. Nevertheless, issues like fraudulent behavior, forgery and pretense, questions concerning the original and the copy [5], and the evaluation of goods that are the object of commercial transactions have given rise to new problems in the electronically mediated environment. What is at stake here is the entire range of mechanisms that will facilitate interpersonal and inter-organizational transactions, given the new conditions for knowledge transactions and exchanges: increasing specialization, increasingly asymmetrical distribution of information and assessment capabilities, greater anonymity among interlocutors and more opportunities for identity forgery [4]. As more and more security breaches have occurred due to malicious or innocent attacks, public opinion on the security and trustworthiness of using the Internet for online activities has tended towards an attitude of distrust. Hence, the value of trust for a robust online world is obvious: it reduces the complexity of a chaotic situation [6]. In the HCI literature, much of the initial research on consumers' judgment of trust in information is conceptualized through credibility perceptions [7] or quality indicators [8]. Some of these works are heavily criticized because their operationalization of trust was not understandable [9]. Moreover, empirical research in this area is beset by conflicting conceptualizations of the trust constructs, inadequate understanding of the relationship between trust and its antecedents and consequents, and the frequent use of trust scales that are neither theoretically derived nor rigorously validated [9].

1.1 Trust within the W-MIE

The notion of trust in information conforms to the interpersonal model of trust, whereby it is a social attitude of the trustor towards a technological artifact (the trustee), in this case electronic information or a document such as a web page or electronic article [10]. A person may browse or retrieve information for several reasons, e.g. as evidentiary support for a decision-making process, or as reference material or facts to supplement one's knowledge. However, limitations can be seen in the questionable quality of the information provided and the risk of obtaining false information [8]. Less critical and uninformed people are more likely to accept an untruth as a truth [8]. Falsity on the web is seldom revealed because there is too much information; the more information is put on the Internet, the greater the chance of encountering misinformation. In addition, most people have neither the time to verify its accuracy nor the means to go back to the same site, because the browser may fail to find it again, e.g. due to broken links [11]. Hence, users' deliberate trusting decisions rely on their own knowledge to evaluate the information on its own terms. It is noted that trust exerts an influence on consumer behavior and decision-making processes [6]. The Internet spans the globe in all languages and cultures. Users and organizations communicate their business ideas, knowledge and information across vast distances. When people engage in knowledge and information exchanges in the virtual world, it is critical to avoid misunderstanding and misinterpretation. Users must comprehend accurately the meaning of what is said.
How things are said is as important as what actually is said. Such comprehension is inhibited by differences in value systems, attitudes, beliefs, and
communication styles. Such differences must be taken into account to ensure that the design of an interface is usable and acceptable, since the cultural background of users affects how they operate and interact with an interface. The communication style one uses for generating ideas, exchanging opinions, sharing knowledge and expressing ideas is indeed culture-centric [12]. However, these key issues, rooted in deep cultural identities represented via interface elements in an information context, have not been fully explored and understood empirically. Some researchers have done work in the area of culture and design [13], but the results have been either inconclusive or unrelated to developing an IS for information settings in a sensitive context. Moreover, there is no consensus as to how the trust constructs should be operationalized. Clearly, new methods need to be devised to "certify" the knowledge circulating on the Internet in a context where inputs are no longer subject to control [4]. Another big issue concerns regulation, social behavior and the formation of cooperation based upon trust and shared ethos/identity in a virtual context [14]. This is because trust and culture are interconnected: the meaning, antecedents, and effects of trust are determined by one's culture [14]. Indeed, there is a longstanding interest in designing information and computational systems that support enduring human values within sensitive contexts of design [15]. A deeper understanding of design can be obtained by taking an information perspective on design activities. Under this perspective the major unit of analysis is the information transaction, also known as design informatics – the specific needs and tasks associated with capturing, storing, updating, linking and accessing information, described with a technologically-neutral vocabulary. This approach offers a strategy for developing a more unified view of design, which in turn can provide insight into the requirements of design information systems and elucidate new areas of design competency and opportunity. By analyzing the information needs of design and how design teams create capacities to satisfy these needs, we may begin to recognize the invariant, technologically-neutral requirements that emerge from any design methodology. Our goal here is to position our arguments from a non-technical point of view on designing an information artifact that is culturally sensitive in context and design requirements – a theoretically driven approach.
2 Conceptualization of Online Trust in W-MIE

In the IS community and the wider human-computer interaction (HCI) field, trust has been discussed widely in the context of the e-tailing environment (e.g. e-commerce), varying in its models, dimensions and constructs [3]. The problem of trust is often conceptualized rather loosely as technical security [3, 10] and as technological mechanisms such as encryption, communication protocols, cryptography and trusted information architectures [16]. This view focuses on the tangible or hard trust dimensions, comprising formal mathematical and cognitive models of trust, and the emphasis is placed on the role of trust in e-commerce adoption and short-term transactional value. Nevertheless, we are not interested in addressing trust or trustworthiness as security or through security; we are guided instead by conceptions of trust developed in the theoretical and empirical work of social scientists and philosophers.
We contend that online trust will not be achieved through security, because that vision is founded on a misconstrued notion of trust, missing the point of why we care about trust and making mistaken assumptions about human nature along the way. This separates the view of trust from the conceptual and technical scope; because the technological realm of which we speak is so extensive and intricate, and the conceptual domain of trust so broad and varied, we must make some qualifications and simplifying assumptions. In addition, the technical point of view on trust derives from worry about the dependability of these systems, their resilience to various forms of failure and attack, and their capacity to protect the integrity of online interactions and transactions. These cases are sufficiently distinct from the W-MIE context that they deserve separate treatment in terms of the setting and veiling properties that affect the formation of, readiness for, or inclination to trust. Hence we cannot rely on traditional mechanisms for articulating and supporting trust in W-MIE, as we lack the explicit frameworks of assurance that support them. We argue that a "logical" model or factor is not sufficient to model how natural actors behave. Although some of these measures are surely useful and needed, we believe that the idea of total control and of a purely technical solution to protect against deception and to favor non-self-interested cooperation is unrealistic. Undeniably, to understand rational actors (human beings) one must take both affective (emotion, values or socialization) and cognitive views into consideration. Hence, researchers need to identify specific user requirements, identify risk problems and thereby the appropriate trust-building strategies, especially within a cultural setting. This paper therefore puts forward the notion of trust from a non-functional perspective, as an emotional, "intangible" response to computer-based stimuli [17], also known as the "non-technical mechanism" of trust or the "soft trust dimension" [3], to answer in what ways visualizing trust would be acceptable and understandable for users. How can we map users' expressions of trust onto existing security technology, or should we create a new set of security technologies from scratch that would better take into account novel uses and novel users? We believe that requirements elicitation in sensitive settings demands that we draw a line between the perception of designers, who are often seen to construct solutions and thereby design for people essentially like themselves, and the perception and understanding of "the other", which includes the novice and non-novice users whose views are excluded from design. Hence the emphasis should be on ensuring design for user sovereignty, where trust can be signaled, rationalized and understood as part of the overall interface design strategy of information systems for laymen, rather than design for user friendliness only. As more people go online to seek information, regardless of context, it becomes increasingly important to identify what makes people choose to trust some sites and reject others. In this sense, while not disregarding the importance of online security and the evolving systems that support it, we believe that security has little to do with general consumer trust. This leads to the assumption that current trust research in IS is hampered by designing for computers rather than for humans.
Hence we posit that designing trust metrics requires an understanding not only of the technical nuances of security but also of the human subtleties of trust perception. The aim of this research is to study trust more closely at the end-user level: what trust is from the user's point of view, what processes of trust decision making are based on the interaction with the website, and how the words and images are
encountered and perceived. What is needed here is a way for computers to understand what "human trust" is made of; this gives way to our initial assumptions on how to design an information artifact that is perceived as trustworthy. We look into how humans reason about trust online, triggered by the perceived trustworthy elements of an interface apparent to the users. It is necessary to have a more cognitive and affective view of trust as a complex mental structure of beliefs and goals, which implies that the trustor has a "theory of mind" of the trustee, possibly including personality, shared values, morality, goodwill, etc. In addition, we further emphasize how people perceive trustworthy information: they approach information-mediated exchange activities bringing with them previous experiences of online trust and apply these to new computer-mediated situations, rather than being tabulae rasae onto which the designers of information-based systems can write their preferred responses. We argue that trust is not reducible to rational decisions alone; it is also socially embedded in institutions, norms, culture, and so on. The focus of this paper is to reveal in greater depth an understanding of trust within W-MIE that takes into account trust determinants from the perspective of the human actor's reasoning, in a context where the sense of a community with common norms and values is significant. A website is merely an extension of human interaction, and therefore there is a need for the ability to transfer human trust-reasoning mechanisms into the design of a web interface.

2.1 Online Legitimacy – Institutional Symbolisms and Their Dimensions

Based on the above considerations, we offer the following suggestions for design to enhance the trustworthiness of information artifacts. Our work links social science theories of trust and semiotic principles to form the basis of the Institutional Symbolisms Trust Inducing Features framework, as explained in our previous works [1, 2]. The perspective of the framework is a descriptive one, since it categorizes the trust dimensions that act as communication, embedded in culturally influenced features, to support the emergence of trustworthy behavior. In the literature, institutional trust is a type of trust that is also known as system trust [10] or reputed credibility [7], and it is similar to the transference process described in [10] and to control trust [18]. Lewicki and Bunker [19] describe institution-based trust as the trust that develops when individuals generalize their personal trust to large organizations made up of individuals with whom they have low familiarity, low interdependence and low continuity of interaction. For example, we trust the government system to ensure the stability of a state, or we place our trust in a juridical system to uphold law enforcement and sanctions. We conceptualize institution-based trust as a form of symbolism, the system of representations and symbols conveyed through institutional trust inducing features on the web, which we define as institutional symbolism. The term institutional symbolism refers to a visible, physical manifestation of institutional characteristics, behavior and values. Institutional symbolisms are the trust marks and signs that depict and present a connoted message of assurance.
On the basis of these expectations, we contend that the trust marks represent the beliefs (values) and expectancies held by individuals about the overall impersonal structures and situations, construed so as to network both cognitive and affective trust-warranting properties. These beliefs imply that the
institutional symbolisms carry their own dispositional and attributional meaning, an institutional manifestation presented textually or graphically on the web site. We contend that trust in this institutional symbolism acts as a form of social trust, where trust is initiated from social mechanisms, behavior and values through the means of institutional symbolism representation. These elements represent what constitutes "online legitimacy".

a) We propose the need for content credibility, which is often referred to as believability [7], trustworthiness [3, 6, 9, 10] or reliability [10], e.g. links, navigation (the ease of navigation is frequently mentioned as a key to promoting online trust) and the sources of content. The emphasis is on information dissemination that is free from alteration, bias or falsification, which are ways to reassure users about the quality of the information and its comprehensiveness [8, 11] in terms of accuracy and currency. The information provider needs to explain and summarize its general information gathering practices over the web, perhaps what the information is used for and with whom the information may be shared. It should also consider the necessity of legislation to create civil remedies for users in the event of untrustworthy interactions with the information provider. These evoke the need for elements of truth and validity, lawfulness and evidence when certain ethics or principles associated with one's beliefs are involved, such as in a religious context, and for appearance and functionality to ensure the reliability of the content presented and written. This defines the overall organization and accessibility of the information displayed on the website. It is said that users are more confident using a clear design and format of interface elements because it reduces perceived risks, wasted time, deception and frustration [20]. For example, content that highlights a legal matter should derive from an expert in the subject matter, an authority or an affiliation appointed by the organizations, institutions or government. In addition, believability can also evolve out of past experience and prior interaction with the information provider, as trust emerges as the relationship matures [19]. For example, testimonials and user feedback pay special regard to this development, building and decline of trust, in order to identify the reliability (e.g. on-site performance) and believability (e.g. deliverables) of the information provider.

b) Emotional assurance, also known as benevolence [3, 6, 9, 10], is an important trust element, as it depicts the degree to which the information provider (trustee) is believed to want to do good to the users. In the aspect of information dissemination, it considers the essence of good intention portrayed by the organization or institution in adherence to its code of conduct. Positive intentions should lead to overall public benefit without taking advantage by other means, misleading intention or creating confusion. For example, by disclosing ownership of the information up front, the provider hopes to inform users about the nature of the relationship between the organization and its affiliations and thus prevent misconceptions or perceived inherent biases.
Well known TTP such as TRUSTe seal was perceived by users as a stamp of approval for the quality of a web site’s privacy policy and security [21, 22]. These TTP should effectively capture the process involved in demonstrating user’s satisfaction, guarantees and safety nets and information practices. For example the need for assurance that the information collected or distributed is being handled as outlined in the privacy guidelines and disclosure policies. In addition, the TTP will act as a
guarantee that the information or data transferred is well protected and secured, and as a kind of stamp validating or certifying the truth of the content displayed on the web.

d) We also believe that the design of a trustworthy information artifact should reflect or create the essence of the brand and reputation of the information provider. This is because, when a person perceives a brand name or the symbol associated with it, it is the interplay of the associations of the branded object that manifests as the image constructed by the user, which in turn influences the reputation of the organization in general. It is noted that brand image has great potential to strengthen trust [23, 24]. The design can take this into account from the creation of the domain name, trademarks or logo, which in turn highlight the trustee's ethical values, expertness (authority, knowledge) and familiarity (general business sense).
3 Discussions and Future Directions

We conclude our working suggestions with the key point that designing trustworthy information artifacts does not stand apart from other enduring human values, expressions and perceptions, especially within sensitive contexts of application. In our eagerness to create the tools and conditions to support online legitimacy in the creation of information artifacts on the web, we should not depict trust in its narrower sense. We should not address the problem of trust in its mathematical, technical or logical sense, where the truth, say, of a sentence in a formal language can be formally examined on a meta-linguistic level. People often feel threatened by these technological processes simply because they are not necessarily ideally devised and constructed to serve people, especially laymen, and often tend to dominate people by their complexity. We are interested in the nature of understanding online trust within W-MIE in the sense of its correspondence to reality, or at least its pragmatic utility in predicting real events or in solving practical problems by utilizing the right technology facilities. We are also aware that this question can be, and has been, very diversely interpreted in philosophy, resulting in the conclusion that trust has social value, is the basis of sound social interaction, and is even an elementary requirement for the evolution of social cooperation in the virtual world. It should be noted that not all users are technically well versed. Technology should be the servant, not the master, of people, yet many technology producers put their technology into a dominant position and hence force users to behave as they think he or she should behave. Therefore, we stress again that every new technology should respect the principle of user sovereignty, not only of user friendliness. Hence, in our previous works [1, 2] and in this paper, we introduce the basis of the discussion by evaluating the relative complexity of processing visual and textual information as part of the trustworthiness processes in imposing and sanctioning behavior online. We then proceed to the evolutionarily rational and irrational aspects of trust in relation to the psychological definition of trustworthy information. In future work, the validation of these design definitions will take place, along with a discussion of how their consequences enable us to understand the design deployment of an information artifact, its constitutive parts and its epistemological aspects. This will eventually provide clearer design guidance for suitable technologies that support trust and effective security- and privacy-protecting behaviors within culturally sensitive design in W-MIE.
References

1. Mior Ibrahim, E.N., Noor, N.L.M., Mehad, S.: Wisdom on the Web: On Trust, Institution and Symbolisms, A Preliminary Investigations. In: Proceedings of Enterprise Information Systems, ICEIS, Barcelona, Spain, vol. (5), pp. 13–20 (2008)
2. Ibrahim, E.N.M., Md Noor, N.L., Mehad, S.: Seeing Is Not Believing But Interpreting, Inducing Trust Through Institutional Symbolism: A Conceptual Framework for Online Trust Building in a Web Mediated Information Environment. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4558, pp. 64–73. Springer, Heidelberg (2007)
3. Grabner-Kräuter, S., Kaluscha, E.A., Fladnitzer, M.: Perspectives of Online Trust and Similar Constructs – A Conceptual Clarification. In: Proceedings of The Eighth International Conference on Electronic Commerce, pp. 235–243. ACM, New York (2006)
4. Collini, S. (ed.): Eco, Umberto, Interpretation and Overinterpretation. Tanner Lectures In Human Values. Cambridge University Press, Cambridge (1992)
5. Foray, D.: The Economics of Knowledge. MIT Press, Cambridge (2004)
6. Corritore, C.L., Wiedenbeck, S., Kracher, B.: On-line Trust: Concepts, Evolving Themes, A Model. International Journal of Human Computer Studies 58, 737–758 (2003)
7. Fogg, B., Tseng, H.: The Elements of Computer Credibility. In: Proceedings of CHI 1999. ACM Press, New York (1999)
8. Alexander, J.E., Tate, M.: Web wisdom: How to evaluate and create information quality on the Web. Lawrence Erlbaum, Mahwah (1999)
9. Grabner-Kräuter, S., Kaluscha, E.A.: Empirical research in on-line trust: a review and critical assessment. International Journal of Human-Computer Studies 58(6), 783 (2003)
10. Chopra, K., Wallace, W.A.: Trust in electronic environments. In: 36th Annual Hawaii International Conference on System Sciences, Big Island, Hawaii, January 2003, pp. 331–340 (2003)
11. Marchand, D.: Managing information quality. In: Wormell, I. (ed.) Information Quality: Definitions and Dimensions. Taylor Graham, London (1990)
12. Bonnani, C., Cyr, D.: Trust and Loyalty: A Cultural Comparison. In: Proceedings of International Conference on Business, Canada (2004)
13. Matsumoto, D.: People: Psychology from a cultural perspective. Brooks/Cole, California (1994)
14. Hofstede, G.: Culture's Consequences: comparing values, behaviors, institutions, and organizations across nations, 2nd edn. SAGE Publications, Thousand Oaks (2001)
15. Friedman, B., Kahn Jr., P.H., Borning, A.: Value Sensitive Design and information systems. In: Zhang, P., Galletta, D. (eds.) Human-computer interaction in management information systems: Foundations, Armonk, New York, pp. 348–372. M.E. Sharpe, London (2006)
16. Ratnasingam, P., Pavlou, P.A.: Technology Trust in B2B Electronic Commerce: Conceptual Foundations. In: Kangas, K. (ed.) Business Strategies for Information Technology Management, pp. 200–215. Idea Group Publishing, Hershey (2004)
17. French, T., Liu, K., Springett, M.: A Card-sorting Probe for E-Banking. In: Proceedings of British Human Computer Interaction, vol. 1. BCS Publications (2007)
18. Pavlou, P.A., Gefen, D.: Building Effective Online Marketplaces with Institution-Based Trust. Information Systems Research 15(1), 37–59 (2004)
19. Lewicki, R.J., Bunker, B.B.: Developing and maintaining trust in work relationships. In: Kramer, R., Tyler, T. (eds.) Trust in Organizations: Frontiers of Theory and Research, pp. 114–139. Sage, Newbury Park (1996)
20. Wang, Y.D., Emurian, H.H.: Inducing Consumer Trust Online: An Empirical Approach to Testing E-Commerce Interface Design Features. In: Proceedings of the International Conference of the Information Resources Management Association: Innovations Through Information Technology, New Orleans, USA, May 23-26, pp. 41–44 (2005)
21. Head, M., Hassanein, K.: Trust in e-Commerce: Evaluating the Impact of Third Party Seals. Quarterly Journal of Electronic Commerce 3(3), 307–325 (2002)
22. Hu, X., Lin, Z., Zhang, H.: Myth or reality: Effect of trust promoting seals in electronic markets. Presented at WITS 2001, New Orleans, LA, USA (2001)
23. Aaker, D.: Managing Brand Equity. The Free Press, New York (1991)
24. Einwiller, S., Will, M.: The role of reputation to engender trust in electronic markets. In: Proceedings for the 5th International Conference on Corporate Reputation, Identity and Competitive (2002)
Evaluation of Web User Interfaces for the Online Retail of Apparel

Dominik Rupprecht, Rainer Blum, and Karim Khakzar

Hochschule Fulda - University of Applied Sciences, Marquardstr. 35, 36039 Fulda, Germany
{Dominik.Rupprecht,Rainer.Blum,Karim.Khakzar}@hs-fulda.de
Abstract. In this paper we present intermediate findings of the ongoing research project SiMaKon. It presents results of user tests on how to design a web-based user interface for the online retail of apparel. The aim was to improve both usability and the shopping experience. The study focuses on the manner of product catalogue navigation and presentation in relation to different "modalities of needs". The results indicate that not just one approach for the interface should be chosen, but that several concepts should be combined.

Keywords: E-Commerce, Evaluation, Web-Based User Interface, Apparel, Rapid Prototyping, Modalities of Need.
1 Introduction

Many scientific studies have investigated online shopping service attributes. These studies have classified the important attributes of online stores into four categories: merchandise, customer service and promotions, navigation and convenience, and security [1]. Concerning the third attribute category (navigation), aspects like store layout, organization features and information presentation, as well as usability, media richness, accessibility, personalization, adaptation and experience-related traits (like design quality, entertainment, playfulness, etc.), have been identified as relevant and important. In the research project SiMaKon, "Simulation of Made-to-measure and Ready-made Clothing Online for Fit Checking" at Fulda University of Applied Sciences, the development of an e-commerce user interface for the B2C retail sector is a current subject. In doing so, the aforesaid navigation and design-related attributes are regarded as fundamental quality criteria. With a focus on these aspects, it was evident that a wide range of fundamental user interface design options already exists. These options cover not only small details like user interface controls or the structuring of individual pages, but also higher-level functional components of online shops like shopping carts and product catalogues. Due to missing standards or universally accepted guidelines on best practice for online shops, it was obvious that a controlled experiment comparing different design
alternatives might point the way for the future development of the SiMaKon system's web user interfaces. This paper describes the chosen procedure in the following sections. Section 2 depicts the method of using four different concepts and building rapid prototypes for each, and section 3 discusses how the tests were conducted and executed. Finally, the results are shown in section 4 and discussed in section 5.

1.1 Objectives and Significance

In this paper we present intermediate findings of the ongoing research project SiMaKon. The focus of its research activities is an innovative, interactive system for the support of apparel retail via the Internet. This paper presents results of user tests carried out on how to design a web-based user interface for an online shop in order to improve both usability and the shopping experience. To interpret our findings correctly and to transfer them to other research contexts, we provide details on the study's context.

1.2 The SiMaKon Project

The ongoing research project SiMaKon deals with trying on and reliably appraising clothing in online shops with the help of individual virtual bodies. It tries to find solutions for the following challenges:

• reliable fit checking and appraisal of optical characteristics in relation to a virtual representation of one's own body
• seamless integration with the product catalogue and product configurator, respectively
• comprehensive interactivity, thorough usability and an absorbing shopping/user experience
• high customer acceptance

An integrated system composed of different functional components is developed and continuously evaluated. The first component needed is an e-commerce back-end and online shop front-end. The second is a three-dimensional virtual try-on scene, consisting of three elements. The first is an avatar generator to create individual virtual human bodies (avatars) with configurable body measurements, hair style, skin tone and face characteristics. The second is a clothing pattern generator to build the virtual garment construction of all apparel available in the online shop. The most important element is the three-dimensional simulation. It renders the 3D content, with avatar, complete clothing outfit and background scenery, displayed in the Web browser and integrated in the online shop. In doing so, the sewing of the cloth is based on real garment pattern data. The result is a realistic simulation of the garments' drape and cloth surface. The simulation is capable of representing subtle details like sewing threads and knobs, as well as high-quality lighting and shading. Customers can interactively change the perspective and background scenery as well as zoom into details.
After entering the online shop, customers may start by first configuring their avatar(s) or by browsing the product catalogue. There, interesting clothing can spontaneously be selected for try-on and appraisal with the currently activated avatar. In cooperation with the simulation, customers can interactively swap pieces of clothing or made-to-measure garment components directly from the product catalogue for easy comparison. The user-related criteria are dealt with in a cyclic user-centred design process based on ISO 13407 [2]. As many fundamental user interface design options exist for the system's different functional components, comparative experiments with prototypic mock-ups are continuously conducted in order to reach well-founded design decisions. The functional component relevant for the study reported in this paper is the product catalogue navigation and presentation.

1.3 Modalities of Needs

Modality of needs is a concept describing the ways a customer usually proceeds through the decision-making process, depending on the consumer's current intention. It forms the main aspect of the evaluation of the different product catalogue navigation and presentation concepts in this study. Michele Ambayé [3] describes three different kinds of modalities; a customer may start the shopping process in any one of the three. Pre-knowledge driven customers know exactly what they are looking for. The purchasing criteria are quite clearly defined in the customer's mind, because, for example, the customer has previously bought certain brand jeans (and has been satisfied with them) and would now purchase another pair. From this it follows that the customer only needs to search on limited additional criteria like best price and delivery. In contrast, if the customer only knows what sort of product is needed, the modality is called function driven. For example, the customer is looking for a suit shirt but has not yet established specific criteria intrinsic to the product, such as its style or material. Therefore, the purchasing process is driven by the need for the function provided by the product: it must be a shirt that fits well and maybe has a certain colour. Impulse driven consumers are just browsing the Internet and have no conscious knowledge of a need. The need is established, for example, as a result of interacting with an Internet site or looking at an advertising banner. Even though the customer does not really need the found product, the purchase decision is made impulsively. One important point is that people may switch between modalities during the purchasing process (shifts in mode). This occurs when the consumer's current need changes. For e-commerce user interfaces, trying to optimally support modalities of needs, including shifts between them, may be a promising way to best serve an otherwise extensive, heterogeneous audience. Furthermore, it is likely that a particular e-commerce user interface design may serve one modality of needs better than another.
2 Method

Four different concepts, each implemented as a rapid prototype with basic functionality, were chosen for the comparative study, based on three aspects:

• support of the depicted modalities of needs
• potential for high usability and user experience for a broad spectrum of apparel consumers, and
• innovativeness.

Each prototype was assembled with 162 products for men and women, with a large array of product types like trousers, blouses, shirts, jackets, etc., and different labels and trademarks in various colours and patterns. The following order of the different approaches is the one used in the realised study; it alternates new concepts with rather well-known ones, to give the participants variety between the separate prototypes.

2.1 Associative Approach

The first approach is an associative one. In an associative navigation structure, the user does not have to think the way the designer has organized the navigation, as in an ordinary hierarchical structure, because the navigation is based on every decision the user takes. Every click on a navigation link makes the chosen link the new central topic and arranges related topics around the central topic [4]. The transfer of this idea to an online shop for the purpose of this test is described in the next paragraph.
Fig. 1. Screenshot of the associative prototype
On the front page the customer can choose between different categories, separated for men and women. Selecting one category results in a picture of one random clothing item of that category being shown in the middle of the page, surrounded by four different classes: brand, colour, type and pattern. In each of the classes, the pieces that are most similar to the selected item in terms of the class attribute are visually grouped near it (see Fig. 1). Every new selection by the user results in a new sorting of the shown catalogue, placing the selected item in the middle with a new arrangement around it. Navigation throughout the whole catalogue is achieved with navigation arrows at the edges of the screen, to reach even items that are invisible due to their distance from the center space.

2.2 Adaptive Approach

The second approach is to use adaptation in the interface, so that the interface adapts itself automatically to one of the three modalities of needs. This is done by changing part of the navigation on the web front-end according to the customer's current mode. In this test an ordinarily organized product catalogue is complemented by additional navigation components on the right side of the web page. This results in a segmentation of the page into three parts: on the left resides a hierarchical navigation structure based on categories, the content is placed in the middle, and on the right side the adaptive interface is shown.
Fig. 2. Screenshot of the adaptive prototype in impulse driven mode
This adaptive interface uses an advanced search in pre-knowledge driven mode with search filters for various product features like colour, material or pattern. In function driven mode all featured trademarks are used as root categories with the respective types of garment as subcategories. Listing different thematic, life-style
oriented areas like winter, sports or outdoor should help to browse through the catalogue in impulse driven mode (see Fig. 2). 2.3 3D Approach The third one uses a three-dimensional environment to display the catalogue like a real store. In this way, the customers can walk through the store similar to real life. Each product item is displayed in two-dimensional information spaces located on separate wall sections. Using several floors with several virtual rooms the product range is structured spatially like in a real store based on garment types. In the middle of each room outline maps in the form of a schematic presentation of the room and level structure allow the transfer between the separate floors. Navigation means, all used via mouse and keyboard, are moving forward, backward, to the right and to the left, turning around at the current position and automatically moving along a predefined path to a certain destination, while observing the crossed environment. The prototype was realised using the 3B software for creating virtual three-dimensional rooms [5].
Fig. 3. Screenshot of the 3D prototype
2.4 Classic Approach

The fourth and last prototype is a classic, hierarchical presentation included for comparative purposes, to see whether any of the other concepts is better than the traditional way of presenting a catalogue. It consists of a hierarchical category tree on the left and a selection and detail view in the middle. The detail view presents either the items belonging to the currently selected product category or detailed information about a chosen product (see Fig. 4).
Fig. 4. Screenshot of the classic prototype
3 Tests

Each prototype was built and tested under laboratory conditions with sixteen mixed-gender test candidates, who were not required to have any special prior knowledge. A one-hour interview with a questionnaire was conducted with one person at a time by two test assistants, one for the overall questions and one to observe the behaviour of the test candidate during the test. The study employed a 3 (modality of needs) by 4 (prototype approach) factorial design with both factors manipulated within subjects. All candidates had to try out each prototype in all three modalities of needs, controlled via scenario descriptions that were presented to the participants. Each description detailed a typical situation for the respective modality. For example, for the pre-knowledge driven modality it stated: "Imagine you have come across an advertisement for a certain name-branded product that you want to buy. Therefore, you are browsing the catalogue of an Internet shop for products of this brand". Prepared like this, the test subjects had to accomplish three to four given tasks with each of the prototypes. The tasks were characteristic of the current modality, e.g. "Please choose the women's trousers XY of the brand YZ." During task execution the test assistants collected usability-related performance measures such as speed, success, effort of completion, and occurrence and severity of difficulties, noted observations, and rated user experience attributes on Likert items, e.g. joy, fun or how interesting the candidate found the prototype. After having passed through all the modality scenarios, the interview concerning the prototype was conducted. First, the candidates had to rate the tasks' complexity. Then, for each prototype, general usability-related questions were raised, involving ratings, e.g. of usefulness for the given task or of comfort and orientation in the prototype, and open-ended questions, e.g. whether the participant especially liked a specific feature and why, or whether the subject missed familiar functionality. This was followed by prototype-specific questions, e.g. concerning characteristic features. Finally, the participants had to rate the same user experience attributes concerning the shopping experience as mentioned above. At the end of the whole process the test candidates had to rank the prototypes from best to worst.
4 Results

For each of the four prototypes the results of the tests are summarized in the following.

4.1 Associative Approach

This prototype was difficult to use for the test candidates, and they needed some training period to understand the part of the system that shows the product catalogue. The problem was that the participants did not recognize the sorting and re-sorting within the four different classes and perceived it as chaotic. In terms of usability, the test persons stated that the system was something completely new to them but too confusing, with too many pictures and an overloaded page. On the other hand, the prototype got relatively neutral ratings regarding shopping experience: it was perceived as very interesting but was no fun to use.

4.2 Adaptive Approach

This prototype was very easy to use, and the additional navigation on the right side was considered a very useful addition. The usability was rated very positively throughout all criteria. One thing that annoyed and confused some test candidates was the automatic change of the interface per modality of needs; they wanted to trigger the adaptive changes independently on their own. Like the usability, the shopping experience was rated positively, too. Using the prototype was perceived as pleasant.

4.3 3D Approach

This prototype caused the most problems during the testing process. Task processing was often only possible with the support of the test assistants, and two participants even abandoned the test with this prototype. Most of the problems resulted from the candidates losing the overview in the three-dimensional world and having problems with the navigation. Hence, very long processing times emerged. Therefore, this prototype had the worst result relating to usability, despite the fact that the first impression of the system was very positive and well known from PC games by some test candidates. Another critical comment was that, while the shop was three-dimensional, the products were presented only in two-dimensional pictures. It is not surprising that the user experience was classified as very unpleasant by most of the participants. On the other hand, it was the most interesting and exciting prototype for the remaining attendees.
4.4 Classic Approach

Like the adaptive prototype, this one showed very easy handling for all the participants. With regard to usability, the prototype provided good orientation and clarity in the navigation structure used, but felt ordinary and standard. Furthermore, the adaptive additions of prototype two were missed during the work with this one. The shopping experience was sensed as rather boring, not much fun and not interesting, but with a relatively pleasant usage experience.

4.5 Order of Prototypes and General Notes

The adaptive concept was rated best, ahead of the classic and the associative one. Due to the problems in the three-dimensional world, the 3D approach comes in last in this sequence, although it has to be said that half of the test candidates considered it the best, while the other half stated the opposite. In all four prototypes, search, filter and help functions were missed by the participants, owing to the fact that the shops were only realized as rapid prototypes with basic functionality.
5 Conclusion

The results of the interviews show that no clear recommendation for a single concept can be derived. A good starting point is to use a classic, hierarchical construction and improve it with a selection of the other concepts. The traditional way to present a catalogue was familiar and easy to use for the test participants, but provided no real shopping experience. In general, the concept of adaptation received overall positive feedback, but the automatic adaptation of the interface displeased the test candidates. Giving the customers the choice to select the adaptive components themselves, like an advanced search mode, would therefore improve the interface. Both the three-dimensional and the associative interface should not be used for navigation in a catalogue alone, but they can possibly be useful as additional components of the interface. In particular, the three-dimensional aspect is very important for improving the shopping experience in terms of fun or how interesting the prototype is. On the other hand, it became apparent that this concept is very controversial: half of the test candidates were amazed by the 3D prototype and the other half had severe problems navigating through the 3D world. Using this approach for only parts of a web site, such as a virtual try-on as an addition to the "normal" page, would therefore possibly increase the shopping experience. The associative prototype needed some initial training of the candidates, and the sorting of the items was not recognised but experienced as chaotic. On the other hand, using this prototype proved to be very interesting. As a consequence, a possible approach is not to take only one concept, but to combine aspects of all four ideas, as described above.
Acknowledgments. This research was financially supported by the German Federal Ministry of Education and Research within the framework of the program “Forschung an Fachhochschulen mit Unternehmen (FHprofUnd)”.
References 1. Park, C.-H., Kim, Y.-G.: Identifying key factors affecting consumer purchase behavior in an online shopping context. International Journal of Retail & Distribution Management 31, 16– 29 (2003) 2. ISO 13407, Human-centred design processes for interactive systems (1999) 3. Ambayé, M.: A Consumer Decision Process Model For The Internet. Brunel University Information Systems and Computing PhD Theses (2005) 4. Peake, R.: An Experiment in Associative Navigation (2007), http://www.robertpeake.com/archives/ 365-An-Experiment-In-Associative-Navigation.html 5. Three-B International Limited, 3B, http://3b.net/browser/newhome.html
A Coauthoring Method of Keyword Dictionaries for Knowledge Combination on Corporate Discussion Web Sites

Shinji Takao¹, Tadashi Iijima², and Akito Sakurai²

¹ NTT Advanced Technology Corporation, 12-1 Ekimaehoncho, Kawasaki-ku, Kawasaki-shi, Kanagawa 210-0007, Japan
[email protected]
² Keio University, Faculty of Science and Technology, 3-14-1 Hiyoshi, Kouhoku-ku, Yokohama-shi, Kanagawa 223-8522, Japan
{iijima,sakurai}@ae.keio.ac.jp
Abstract. This paper states the issues faced and the role played by keyword dictionaries with regard to discussion-based web sites which aim to achieve 'collective knowledge' through the voluntary participation of corporate employees, and proposes a corrective strategy. A keyword dictionary is valuable in that it helps to integrate fragmented accumulated knowledge with generalized knowledge. However, this necessitates a method that allows for coauthoring, and Wiki, BBS and other existing tools are still insufficient in this respect. As well as offering a method for expanding BBS, this paper shows a method for assessing its use within an actual corporation.

Keywords: coauthoring, collective knowledge, wiki, bulletin board system.
1 Introduction

In recent years corporate entities have moved to intensify relations and knowledge sharing between employees through the establishment of web sites where the employees can freely exchange opinions [1,2]. The main function of such corporate web sites is often as a Bulletin Board System (BBS) or Diary (Blog). These sites offer a place for discussion and the sharing of personal records through asynchronous text. Web sites based on these functions (discussion web sites) are capable of fulfilling objectives such as discussion, Q&A, and publicity. However, information entered on a daily basis is fragmented, often meaning it is only useful at that point in time. For this reason, it is quite difficult to systematize this knowledge to allow for its reuse and the further creation of new knowledge. It is thought that a keyword function is a valid method for solving this problem. With this method, registering a certain expression as a "keyword" generates a page for the expression (Figure 1). This page subsequently displays an explanation of the expression, along with a list of links to information within the site that contain the
Fig. 1. Overview of Keyword Functions
expression. Furthermore, links are automatically generated, and explanations for each expression are coauthored by the registrant of the keyword as well as other users. This enables the integration of information within the site centered on keywords, while also allowing for the creation of reusable systematic information based on the integrated information. More specifically, the combination of fragmented information accumulated on corporate discussion sites is promoted, making the creation of new knowledge possible. Services analogous to this already exist on the Internet [3]. However, while these existing services presently achieve their information integration function, they have not yet led to the creation of sufficient systematic information. This is thought to be due to the following two points.

1. Existence of competitive services: Wikipedia already exists as a systematic information source, and users have an inclination to use such services.
2. Issues with the coauthoring method: as the coauthoring method resembles that of Wiki, there are relatively few users who can make full use of its features.

Issue (1) is rarely directly related to the corporate web sites covered by this paper, due to the existence of specific corporate issues and knowledge that cannot be obtained with on-line services such as Wikipedia (conversely, an overlap of objectives between corporate sites and on-line services can lead to the inactive use of corporate sites). On the other hand, issue (2) is very directly related, and it is addressed from here on. Although Wikipedia is currently the most successful on-line service utilizing Wiki, it is thought that less than 1% of users join the authoring process on an on-going basis [4]. Despite this, it is thought that the ability to draw many on-line users means that Wikipedia obtains the assistance of enough coauthors overall. On the other hand, it is thought that relatively small-scale services and corporate sites cannot obtain a sufficient amount of coauthor assistance. While corporate BBS and blog use achieves a certain degree of popularity, Wiki has still not been extensively introduced. This paper proposes a coauthoring method that expands BBS by investigating the issues faced by the Wiki-based coauthoring method and by reviewing existing
research regarding coauthoring. Moreover, these proposals are implemented and introduced in depth, along with methods for evaluating their effectiveness.
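As a rough, hypothetical sketch (not the actual implementation described later on top of XOOPS), the automatic link generation behind such a keyword page can be thought of as a scan of all BBS/blog entries for every registered expression:

```python
import re
from collections import defaultdict

# Hypothetical store: registered keyword -> coauthored explanation text
keyword_pages = {
    "collective knowledge": "Knowledge that emerges from many contributors ...",
    "deliberative model": "A coauthoring method that expands BBS ...",
}

def build_link_lists(entries: dict[int, str]) -> dict[str, list[int]]:
    """For every registered keyword, collect the ids of BBS/blog entries that
    mention it; these lists become the automatically generated links shown on
    the keyword page, and each matching entry can link back to that page."""
    links: dict[str, list[int]] = defaultdict(list)
    for entry_id, text in entries.items():
        for keyword in keyword_pages:
            if re.search(re.escape(keyword), text, re.IGNORECASE):
                links[keyword].append(entry_id)
    return links
```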
2 Investigation of Coauthoring Methods

2.1 Wiki and BBS

Users of the current Wiki service are thought to encounter the following three problems.

1. A lack of user friendliness, as special tags called wiki markup have to be used.
2. More strenuous text creation when compared with the personal and fragmented text created with BBS and blogs.
3. Difficulty in finding reference information regarding others' opinions and past text.

While problem (1) can be solved through the development of additional functions such as a GUI featuring editing tools, (2) and (3) are inherent to the basic nature of Wiki, making any ultimate solution difficult as long as Wiki is utilized. (2) and (3) are discussed below.

It is expected that text created with Wiki would encompass documents utilized as knowledge and cross-referenced by a large number of people, in comparison to BBS and blogs. The need for direct editing of such text is a great burden on the user. In addition, such methods make it difficult to reference information including past records and discussion processes (depending on the Wiki implementation, records are sometimes not saved). Therefore, this method may increase the feeling of burden felt by users when creating text. On the other hand, BBS and blogs do not target a single document, but individually handle each user's multiple fragmented entries. The difference between BBS and blogs is that while the former focuses on mutual Q&A and discussion through entries, blogs mainly consist of entries made by a single person (although readers can sometimes comment on blog entries as with BBS, it is generally the case that the blog writer has the ability to reject such comments). These BBS and blog sites have become extremely popular as a means of communicating on the WWW and other on-line services. However, they have not been utilized with the aim of creating a single document, as Wiki is.

2.2 Coauthoring Related Research

Research on text coauthoring by multiple users in an electronically aided environment has mainly been carried out in the groupware field. Here, it is believed that communication between participants is important. As a result of investigating various jobs and organizations, Couture and Rymer (1991) state that text authoring through the direct participation of several persons (group writing) is actually not undertaken very frequently; rather, it is more often the case that a single person authors the text based on the opinions of coworkers (interactive writing) [5]. Our own experiences lead us to the same conclusion. Therefore, there are few proposals for group writing in groupware research, and a greater emphasis is placed on authoring through the division of responsibilities into
the form of multiple authors, commentators and reviewers [6-8], or on commenting on text (annotation) [9-11]. Note, however, that existing groupware research covers complex and specialized systems, and is not well suited to popularization through simple implementation via WWW technology. Moreover, the methods proposed by this research set a certain user within the group as the author. Therefore, they are well suited to use within a small group aiming to create text within a certain period of time, where responsibilities are clearly divided. However, these methods are not suited to text creation in which an unspecified number of persons contribute steadily over a long period of time. For the latter, a method where text creation rights are equally shared among a large number of users, such as Wiki, is well suited. In addition, related methods that emphasize annotation necessitate the creation of an initial document by a specified author before other participants can comment on the text. Therefore, the existence of some kind of document is a prerequisite of group work.

2.3 Authoring Method That Expands BBS

This paper applies a deliberative model [12] that expands BBS, based on the problems faced by existing systems such as Wiki and on the related research. Figure 2 is a schematic diagram showing the deliberative model. Firstly, each individual enters information as with a BBS or blog (1). Any participant can play a part in integrating this and creating a proposal (2). Proposal creation in (2) is undertaken in the same way messages are posted on a BBS. This allows the user to create a proposal based on the BBS discussion utilizing a method that is almost identical to general BBS use. This type of process is similar to that used in meetings where issues are first discussed, and conclusions are reached after all opinions have been heard and integrated. The creation of text through discussion allows users to create content while considering multiple aspects, and to share responsibility. This is why our method is called the deliberative model.
Fig. 2. Schematic Diagram of Deliberative Model
The proposal is made official through adoption by the user group (3). A vote is generally taken to decide on adopting the proposal. However, when there is no fixed set of users with voting rights, voting is not always undertaken and the document can be adopted as soon as it is proposed at (2). This is not problematic because proposals can be made any number of times and the result can be renewed. In addition, another proposal (document) is created if the proposed document is not adopted upon voting (4). Moreover, the created document is commented on once again (5) and a cycle of renewal based on this is repeated. The deliberative model is therefore an iterative process, and it is this cycle of iteration that makes it possible to increase document quality.
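To make the cycle concrete, the following minimal sketch (hypothetical data structures, not EMCOT's actual code) models steps (1)-(5): free comments, a proposal posted like any other message, adoption either immediately or by simple majority, and repeated revision of the adopted text.

```python
from dataclasses import dataclass, field

@dataclass
class Proposal:
    text: str
    author: str
    votes: dict[str, bool] = field(default_factory=dict)  # voter -> approves?

@dataclass
class KeywordThread:
    comments: list[str] = field(default_factory=list)  # step (1): discussion entries
    document: str = ""                                  # currently adopted text

    def discuss(self, comment: str) -> None:
        self.comments.append(comment)                   # steps (1) and (5)

    def decide(self, proposal: Proposal, eligible_voters: int = 0) -> bool:
        """Steps (2)-(4): adopt immediately when no fixed set of voters exists,
        otherwise require a simple majority of the eligible voters."""
        approvals = sum(proposal.votes.values())
        adopted = eligible_voters == 0 or approvals > eligible_voters / 2
        if adopted:
            self.document = proposal.text               # step (3): replace the text
        return adopted                                  # False = step (4): rejected
```

Because decide can be called any number of times, the same structure also captures the iterative renewal that distinguishes the deliberative model.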
3 BBS Expansion

The deliberative model is realized by expanding an existing BBS. An example of such an expansion is shown in Figure 3, with screens from the system used in this paper for the actual evaluation. In this figure, the left side is the main BBS screen showing a list of posted messages, and the screen on the right side is for entering messages to post. First, one clicks the "Input button [A]" on the main BBS screen and opens the message editing screen. Comments posted from here in the same fashion as on a general BBS are displayed on the main BBS screen (1). This is how comments are posted. In order to propose a document, one selects "Proposal" under the "Message selection field [B]" on the same message editing screen. The posted document proposal is displayed along with the comments, and also at the top of the main BBS screen (2). A button [C] connected to the voting function is displayed with the document proposal shown at the top. The voting function executed with this button allows for an approval rating, the inputting of reasons, and voting. Adoption of the proposed document is generally decided by a majority vote. The text of the adopted proposal is newly displayed at the top of the BBS as the current state (3).
Fig. 3. BBS Expansion
Note, however, that in the case a proposal is immediately adopted without voting at (2), the present text is immediately replaced with the proposed document. Accordingly, in such a case, the present text and the newly proposed text are not displayed alongside one another. For the implementation, standard plugins (modules) used by a XOOPS [13] powered BBS were modified. XOOPS is an open-source CMS (content management system) that can incorporate a variety of functions, including BBS. XOOPS is a relatively easy to install, widely used tool that is equipped with a broad range of functions required by an on-line community, beginning with user administration. Implementing the tool as this type of CMS module increases user-friendliness during actual use. The tool created is called EMCOT (evolution manual coauthoring tool) and is available from a dedicated download site [14,15].
4 Significance and Research Issues Table 1 explains the characteristics of the deliberative model proposed by this paper. The deliberative model is similar to the process used in meetings to draw a conclusion after finishing discussions, allowing the user to create a proposal while referring to other user’s opinions. Therefore, it is easy to undertake discussions centered on text. On the other hand, authoring with the Wiki method (Wiki model) involves the direct editing of text without discussion between users. For this reason, it is thought that text creation based on the deliberative model is more user-friendly than Wiki. Moreover, while the annotation method (annotation model) often used by coauthoring tools in groupware research is suited to text centered discussions, it is not suited to text created over a long period of time with the assistance of multiple users, such as a keyword dictionary, as responsibility for proposal authoring and proposal adoption is allocated to a specific participant. Table 1. Comparison of each type
Model | Text centered discussion | Responsibility for proposal creation | Responsibility for proposal adoption
Wiki model | difficult | Any participant | Any participant
Annotation model | easy | A specific participant | A specific participant
Deliberative model | easy | Any participant | Majority vote, or any participant
Accordingly, it is the Wiki model and the deliberative model that can be utilized for the creation of this paper's objective, a keyword dictionary. Comparing these two, it is predicted that the deliberative model facilitates easier text creation due to its user-friendly, text-centered discussion. Note, however, that it has not yet been confirmed whether this is the case in actual use. The research issue of this paper is to actually apply the Wiki model and the deliberative model and to comparatively evaluate them. Methods of application along with evaluation techniques for these models are explained in the next section with regard to keyword dictionary coauthoring.
5 Corporate Evaluation Method

5.1 Applicable Subject

A given IT company has been running a web site where employees can exchange opinions since 2004. Mainly discussion (opinion exchange) is undertaken at this site, with frequent Q&A exchanges. Recently, as user numbers and accumulated postings have increased, it has become difficult for newly participating users to know what kind of discussion previously took place, which in turn makes it difficult to utilize the information. By introducing a keyword function to the site, users become able to integrate information containing the expressions they are interested in. The keyword function utilizes a modified version of XWORDS, a XOOPS module.

5.2 Comparative Experiment

A function for coauthoring expression explanations was introduced on top of the above keyword function in the following way. The coauthoring function introduced is based on either the Wiki model or the deliberative model. When a user registers a certain expression as a keyword, one of the above two models is randomly assigned as the coauthoring method for that expression.
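A hypothetical sketch of this per-keyword random assignment (names invented for illustration; the point is that the model is fixed once at registration time so that logs can later be compared per model):

```python
import random

# registered expression -> coauthoring model assigned at registration time
model_assignments: dict[str, str] = {}

def register_keyword(expression: str) -> str:
    """Assign one of the two coauthoring models at random, but only once per
    expression, so every keyword page keeps the same model for the whole study."""
    return model_assignments.setdefault(
        expression, random.choice(["wiki", "deliberative"]))
```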
Fig. 4. Keyword Page
A keyword page is as shown in Figure 4. An expression heading and explanation are displayed at the top of the page, while a list of links to internal information containing that expression is displayed at the bottom left. An explanation of the authoring method is displayed at the middle right hand side. An edit button for jumping to the editing screen is also displayed here for expressions assigned the Wiki model as the authoring method. Expression explanations can be directly edited at the editing screen.
An indexed list of BBS comments for the expression is displayed below the explanation of the authoring method. This is common to both the Wiki model and the deliberative model. With the Wiki model, clicking a link in the list jumps to a regular BBS screen where there is no coauthoring function and only message posting is undertaken. On the other hand, with the deliberative model, clicking a link jumps to a BBS screen with the coauthoring functions described in Section 2.3. Both BBS are realized by altering the EMCOT operational settings; therefore, there are almost no differences except for the presence or absence of a coauthoring function. Moreover, the markup language utilized during the authoring of explanations was introduced to both models through XOOPS. In this way, all functional conditions except the comparative subject, the coauthoring method, have been kept as identical as possible.

5.3 Evaluation

This evaluative experiment uses logs automatically recorded by the system along with the users' subjective assessments. Table 2 shows a simplified representation of these indexes and research items. The results of this evaluative experiment, which began in late January 2009, were not yet available at the time this paper was written. The experiment is to be continued for a given period of time and the results reported in a separate paper.

Table 2. Indexes & Research Items
Index | Research Item
Log: No. of comments | Discussion vitality
Log: No. of revisions | Amount of contribution to the text
Log: Text count | Text adequacy
Log: No. of text references | Text usefulness
Subjective evaluation: User-friendliness | Ease of operation
Subjective evaluation: Ease of authoring | Authoring model evaluation
Subjective evaluation: Burden on user | Authoring model evaluation
Subjective evaluation: User satisfaction | Authoring model evaluation
Subjective evaluation: Text adequacy | Text adequacy
6 Summary and Future Research

This paper stated that a keyword dictionary is useful for the discussion web sites recently being employed by corporate bodies, and that the adequacy of text content is currently an issue. The deliberative model, a coauthoring method that expands BBS, was then presented as a solution to this problem.
A method based on the deliberative model is similar to Wiki's method (the Wiki model) in that it is well suited to creating text over a long period of time through the steady contribution of multiple participants. In addition, another characteristic of the deliberative model is that it facilitates easy text-centered discussions. It can be anticipated that creating a proposal only once sufficient information is thought to have been gathered through text-centered discussion simplifies text authoring for participants; it is thought that this cannot be sufficiently achieved with the Wiki model. Along with applying an authoring method based on this deliberative model to the keyword dictionary, this paper explained an evaluative experiment method for its comparison with the Wiki model. This evaluative experiment will be continued for a fixed period of time and the results reported in the future.
Acknowledgement I sincerely thank Mr. Takashi Okada, Mr. Kentaro Shimizu, Mr. Kota Motomura (Nippon Telegraph and Telephone Corporation), Mr. Terunao Soneoka, Mr. Keizo Sugita, Mr. Hiroshi Koyano and Mr. Toshiyuki Iida (NTT Advanced Technology Corporation), for their consideration and support.
References 1. Ministry of internal affairs and communications. Publication of business blog and SNS practical use examples (December 22, 2005), http://www.soumu.go.jp/s-news/ 2005/051222_13.html (retrieved January 25, 2009) 2. IDG Japan. Research on the state of affairs on business blog and SNS in national enterprises (digested version) (October 2007), http://www.idg.co.jp/expo/ research/report/200710.html (retrieved January 25, 2009) 3. HATENA keyword (n.d.), http://k.hatena.ne.jp/ (retrieved December 25, 2007) 4. Imai, M., Hasegawa, H.: (April 2007) Wikipedia: An encyclopedia in the collaborative era, http://www.deepscience.miraikan.jst.go.jp/special/rebazaar/ 2007/04/wikipedia.html (retrieved January 25, 2009) 5. Couture, B., Rymer, J.: Discourse interaction between writer and supervisor: a primary collaboration in workplace writing. In: Lay, M.M., Karis, W.M. (eds.) Collaborative writing in industry: investigation in theory and practice, pp. 87–108. Baywood, Amityville (1991) 6. Leland, M., Fish, R., Kraut, R.: Collaborative document production using Quilt. In: Proceedings of the 1988 ACM conference on Computer-supported cooperative work, Portland, Oregon, pp. 206–215 (1988) 7. Neuwirth, C.M., Kaufer, D.S., Chandhok, R., Morris, J.H.: Issues in the design of computer-support for co-authoring and commenting. In: Proceedings of the third conference on computer-supported cooperative work (CSCW 1990), Baltimore, MD, pp. 183–195 (1990)
8. Neuwirth, C.M., Kaufer, D.S., Chandhok, R., Morris, J.H.: Computer support for distributed collaborative writing: a coordination science perspective. In: Olson, G.M., Malone, T.W., Smith, J.B. (eds.) Coordination theory and collaboration technology. Lawrence Erlbaum Associates, New Jersey (2001) 9. Brush, A.J.B.: Annotating digital documents for asynchronous collaboration. Ph.D. dissertation, University of Washington. (2002), http://research.microsoft.com/~ajbrush/papers/TRBrush.pdf 10. Weng, C., Gennari, J.H.: Asynchronous collaborative writing through annotations. In: Proceedings of the 2004 ACM conference on Computer supported cooperative work, Chicago, Illinois, pp. 578–581 (2004) 11. Zheng, Q., Booth, K., McGrenere, J.: Co-authoring with structured annotations. In: Proceedings of the SIGCHI conference on Human Factors in computing systems, Montreal, Quebec, pp. 131–140 (2006) 12. Takao, S., Iijima, T., Akito, S.: Developing bulletin board systems that enable to improve multiple communities and documents. The IEICE transactions on information and systems (Japanese edition) J89-D (12), 2521–2535 (2006) 13. XOOPS Cube Japan (n.d.), http://xoopscube.jp/ (retrieved January 25, 2009) 14. EMCOT (n.d.), http://wwww.emcot.net/ (retrieved January 25, 2009) 15. Takao, S.: A Study of Support Functions for Active Knowledge Sharing and Discussion on an Electronic Bulletin Board. Ph.D. dissertation, Keio University (2008)
An Empirical Study the Effects of Language Factors on Web Site Use Intention

Hui-Jen Yang¹ and Yun-Long Lay²

¹ Department of Information Management, National Chin-Yi University of Technology, Taiwan
² Department of Electronic Engineering, National Chin-Yi University of Technology, Taiwan
[email protected]
Abstract. Based on the research model, language anxiety, prior non-native language experience, Internet self-efficacy and language self-efficacy are analyzed with respect to the intention to use non-native language commercial web sites. Whether prior non-native language experience affects language anxiety, language self-efficacy and the intention to use non-native language commercial web sites is examined. By the same token, whether Internet self-efficacy and language self-efficacy are affected by language anxiety is also examined. A valid sample of 418 undergraduates was tested in this study. Path analysis results fully supported the model tested. These results suggest that language anxiety, prior non-native language experience, language self-efficacy and Internet self-efficacy have an effect on the intention to use non-native language commercial web sites. Prior non-native language experience significantly affected language anxiety, language self-efficacy and the intention to use non-native language commercial web sites, respectively. Furthermore, language anxiety significantly affected language self-efficacy and Internet self-efficacy, respectively. Implications for educational research and practitioners are provided at the end of the paper.

Keywords: non-native language commercial web site; prior non-native language experience; language anxiety; language self-efficacy; Internet self-efficacy.
1 Introduction

The function of language is to communicate with other people, so language is likely to play a crucial role as a communication medium [14]. English is one of the most popular languages the world over. In Taiwan, the government encourages students to start learning English from the fifth grade of elementary school. However, will a student with more English learning experience intend to use a commercial web site in an English context more readily, have less anxiety using English, or have more self-efficacy in English? Hence, the first purpose of this study is to uncover the impact of language experience on language anxiety, language self-efficacy and the intention to use non-native language commercial web sites.
A number of recent studies have focused on the factors affecting Internet usage, such as Internet self-efficacy, perceived ease of use, social factors, subjective norm and so on [30, 8, 24]. It is therefore worth examining the impact of language anxiety and language self-efficacy on the intention to use non-native language commercial web sites. Previous research has indicated that many web users identify attractive shopping opportunities on the Internet, but there are barriers and other concerns preventing the purchase from being completed or the users from revisiting international web sites [36-37]. Why do web users hesitate to use commercial web sites? Does the language of the interface play a critical role as the foundation of communication? Previous research indicated that people possessing better prior language experience and greater fluency in the non-native language have increased opportunities for interaction with non-native people [12, 29]. In fact, language anxiety has been generally missing from information technology behavioral research. Recent estimates suggest that there are over 480 million Internet users globally; among them, just over half do not use English as their language of communication [18, 20, 21]. Thus, in this research, commercial web site applications in an English context were used as the study cases. The second purpose of this study is to investigate the effect of language anxiety and language self-efficacy on the intention to use commercial web sites.
2 Theoretical Background

2.1 Self-efficacy

Self-efficacy plays an important role in affecting one's behavior and the motivation to execute the courses of action required to produce given attainments or behavior [2-3]. People with low self-efficacy will be less likely to perform the related behavior in the future [5, 15]. Bandura [2, 5] identified four powerful sources influencing personal self-efficacy: performance accomplishment, vicarious experience, verbal persuasion, and emotional arousal. Bandura [2, 5] also suggested three outcomes of self-efficacy that can predict changes in people's behavior: choice behavior, effort expenditure and persistence, and emotional reactions or arousal. Bandura [5] further suggested that the measurement of self-efficacy needs to be tailored to the specific domain of interest to maximize its predictive power. Therefore, in this study, Internet self-efficacy and language self-efficacy are applied as specific types of self-efficacy in the Internet commercial site environment and the language domain, respectively.

2.2 Anxiety

Generally, there are three types of anxiety: state anxiety, trait anxiety, and situational anxiety. State anxiety is apprehension experienced at a particular period of time [41]. Trait anxiety refers to relatively stable individual differences that characterize people's anxiety states and their prominent defense against such states. Situational
anxiety refers to individuals who feel anxious only upon reaching a specific environment, i.e., only when certain factors are present. According to Horwitz et al. [22], foreign language anxiety can be associated with three factors: a fear of negative evaluation, test anxiety, and communication apprehension. Hence, in this study we propose that language anxiety is a state anxiety and define it as "a negative emotional state or negative cognition experienced by a user while the individual is accessing non-native language sites."

2.3 Language Experience
The word "experience" may refer both to mentally unprocessed, immediately perceived events as well as to the purported wisdom gained in subsequent reflection on or interpretation of those events. Yates and Chandler's research showed that prior knowledge or experience provides a great amount of relevant information in specific fields or domains as a foundation for organizing that knowledge [45]. In his research, Eviatar [16] suggested that language experience can be operationalized in two ways: one is by the number of language systems in which the subject is fluent (mono- versus multilingualism), and the other is by a specific characteristic of the languages being tested. In this study, we propose that language experience is experience in reading and writing abilities which accumulates over a period of time.
3 Research Models and Hypotheses

According to previous research, we proposed a number of hypotheses related to the intention to use non-native language commercial web sites, incorporating self-efficacy theory and anxiety.

3.1 Prior Non-native Language Experience

Prior knowledge or experience provides a great amount of relevant information in specific fields or domains as a foundation for organizing that knowledge [45]. Previous research indicates that if students enter a program with a wide range of prior experience and knowledge, it will help them quickly adopt and develop self-confidence in the new skills or capabilities they are learning [45]. Previous research has also shown that feedback from past failures and successes significantly affects subsequent behavior [6]. Specifically, past success would increase one's self-efficacy and reduce one's fear, and thus increase the effort one puts into potential new projects [44]. Hypotheses 1 to 3 were therefore proposed as follows.

H1: Prior non-native language experience has a negative effect on non-native language anxiety.
H2: Prior non-native language experience has a positive effect on non-native language self-efficacy.
H3: Prior non-native language experience has a positive effect on the intention to use non-native language commercial web sites.
3.2 Internet Self-efficacy, Language Self-efficacy and Language Anxiety

Anxiety can inhibit one's processing and thus one's behavior or performance. Language anxiety can pose potential problems because it can interfere with the acquisition, retention and production of the new language [31]. Thus, language anxiety in this study is defined as "a feeling of tension, apprehension and nervousness associated with the situation of using commercial web sites in an English context". As mentioned above, possessing greater fluency in the non-native language can lead to increased opportunities for interaction with non-native people [19, 33]. This means that if individuals are good at a language which is not their native language, it can increase their opportunities to communicate with foreign people. With globalization, English has become one of the most popular languages, and many homepages are currently designed in English. Therefore, if a person has problems comprehending a foreign language site, it will reduce their confidence or intention to shop at or use specific non-native language commercial web sites.

Much previous research supports the view that the greater the anxiety, the lower the self-efficacy [2, 13, 15, 17]. The relationship between anxiety and self-efficacy is a well-established topic in the socio-psychological research field. A number of studies in the MIS literature found that there is a negative relationship between computer anxiety and computer self-efficacy [10, 34]. Generally speaking, depressed people tend to be self-critical, denying their own ideas and abilities. Bandura [2] noted that psychological pressure or anxiety hinders one's capability to judge the problem he or she faces.

H4: Non-native language anxiety has a negative effect on non-native language self-efficacy.
H5: Non-native language anxiety has a negative effect on Internet self-efficacy.

3.3 Anxiety, Self-efficacy, and Intention to Use Commercial Web Sites

Self-efficacy judgments are in turn related to outcome expectations. Outcome expectation depends on how well one thinks one can perform the behavior [2]. Internet self-efficacy and language self-efficacy focus on what individuals believe they can accomplish with online surfing. In turn, a significant amount of research has been conducted to examine the relationship between anxiety and adoption behavior [10, 34, 40, 43]. Previous MIS research also suggests that computer self-efficacy has a positive impact on computer usage [25]. Previous researchers argue that computer self-efficacy is a natural precursor to the Internet, and is invariably a necessary component for use of the Internet [32, 37, 39]. In sum, these studies and theories have laid a firm background for the study of the intention to use commercial web sites. Thus, research hypotheses six to eight are presented as follows.

H6: Non-native language self-efficacy has a positive effect on the intention to use non-native language commercial web sites.
H7: Internet self-efficacy has a positive effect on the intention to use non-native language commercial web sites.
H8: Non-native language anxiety has a negative effect on the intention to use non-native language commercial web sites.
4 Method

4.1 Participants

The target sample for this study was undergraduate students. Complete data sets were obtained for 418 of the original 476 participants, all college or university students in Taiwan, including 218 males (52.2%) and 200 females (47.8%). The age of the participants ranged from 18 to 25 years. The native or host language of the participants is Mandarin; English is their second language. In the public school system, students have learned English from junior high school until the first year of university study, a period of seven years with three to five hours a week of normal study, not including multimedia teaching and computer-aided teaching. The respondents completed a five-section self-report questionnaire in 15-20 minutes. The survey was administered in class by the researcher with a brief introduction regarding the goal of the study.

4.2 Measures

The questionnaire items used to operationalize the constructs of each investigated variable were adopted from relevant previous studies, with necessary validation and modification of wording. The Internet self-efficacy and language self-efficacy scales were adapted from the work of Torkzadeh & Van Dyke [42] and Eastin & LaRose's concept [15] of Internet self-efficacy, replacing the references to computer software and hardware with the non-native language commercial web site environment. The language anxiety scale is an adaptation of the foreign language classroom anxiety scale of Horwitz et al. [23], with modifications made to fit language anxiety on commercial web sites. The language experience scale was adapted from Oxford [38] and Dobson [13] and modified into a prior language experience scale. Intention to use commercial web sites was adopted and modified from the works of Agarwal and Prasad [1] and Davis [11], with modifications to fit the context of non-native language commercial web sites. To ensure the data's content and face validity and reliability, the questionnaire was pre-tested by 26 undergraduate students (10 males and 16 females) in their second year of study in business school. After the pre-test, only some wording in the questionnaire was modified and no items were deleted.

4.3 Reliability and Validity

The Cronbach's alpha values of all variables exceed 0.7, which is recommended as the passing mark of the reliability test for social science research [35]. All items passed the reliability test for language anxiety, yielding an alpha value of 0.92. Prior non-native language experience has an alpha value of 0.87. No items were dropped from language self-efficacy and Internet self-efficacy, which yielded alpha values of 0.82 and 0.89, respectively. The alpha value of the intention to use non-native language commercial web sites is 0.93. This demonstrates good internal consistency of each construct and indicates that the scale is trustworthy.
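For reference, the standard formula for Cronbach's alpha of a k-item scale (not repeated in the paper itself) is

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right), \]

where \(\sigma^{2}_{Y_i}\) is the variance of item \(i\) and \(\sigma^{2}_{X}\) is the variance of the total scale score; values above 0.7 are conventionally taken to indicate acceptable internal consistency.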
Discriminant validity was checked by means of factor analysis [28]. The measurement model was assessed using exploratory factor analysis (EFA) to check discriminant validity. The EFA steps were: first, principal component analysis for factor extraction; then Varimax as the orthogonal rotation, with an eigenvalue criterion of 1 and factor loadings required to be greater than 0.5 [27]. If an item had a factor loading not greater than 0.5, the item was deleted and excluded from further analysis. No item in any construct had a factor loading below 0.5, so no items were deleted.
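As an illustrative sketch only (hypothetical item names and loadings, not the study's data), the item-retention rule amounts to keeping an item whenever its highest absolute loading on any rotated factor reaches 0.5:

```python
import pandas as pd

def retained_items(loadings: pd.DataFrame, threshold: float = 0.5) -> list[str]:
    """Keep an item only if its largest absolute loading on any factor reaches
    the threshold; otherwise the item would be dropped from further analysis."""
    return [item for item, row in loadings.iterrows() if row.abs().max() >= threshold]

# Hypothetical varimax-rotated loadings (rows = items, columns = factors)
loadings = pd.DataFrame({"F1": [0.82, 0.76, 0.31],
                         "F2": [0.12, 0.08, 0.74]},
                        index=["LA1", "LA2", "ISE1"])
print(retained_items(loadings))  # all three items load above 0.5 on some factor
```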
5 Results

Structural equation modeling (SEM), specifically path analysis, was used to test the postulated hypotheses. Chau [9] and Joreskog et al. [26] suggested some general model fit index criteria for identifying a model's goodness-of-fit in SEM. The fit of the theoretical model was evaluated in terms of the chi-square, the root mean square residual (RMR), and a number of goodness-of-fit indices. The chi-square divided by the degrees of freedom (chi²/df) can be seen as a less biased fit estimate than the chi-square itself, because the latter is dependent on sample size. This ratio should be small, and values below three are considered satisfactory (in this study, the ratio is 2) [7]. The root mean square residual should be very small, with values below 0.05 being desirable (here 0.007). The goodness-of-fit index (GFI) should be above 0.90 (here 0.998), and the same applies to the adjusted GFI (AGFI; adjusted for degrees of freedom; here 0.997).

Prior non-native language experience had a significant negative effect on language anxiety (β = -0.331, P < .05), a positive effect on language self-efficacy (β = 0.374, P < .05) and a positive effect on the intention to use non-native language commercial web sites (β = 0.467, P < .001), as hypothesized in H1, H2 and H3, respectively. Language anxiety had a significant negative effect on language self-efficacy (β = -0.243, P < .001), Internet self-efficacy (β = -0.062, P < .001) and the intention to use non-native language commercial web sites (β = -0.073, P < .001), as hypothesized in H4, H5 and H8, respectively. Language self-efficacy and Internet self-efficacy had significant positive effects on the intention to use non-native language commercial web sites (β = 0.171, P < .001; β = 0.190, P < .001), as hypothesized in H6 and H7. All eight hypotheses are significantly supported.
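One way to write the structural equations implied by these reported standardized paths (the abbreviations and disturbance terms ζ are added here for illustration: PE = prior non-native language experience, LA = language anxiety, LSE = language self-efficacy, ISE = Internet self-efficacy, INT = intention to use non-native language commercial web sites) is

\[
\begin{aligned}
\mathrm{LA} &= -0.331\,\mathrm{PE} + \zeta_{1},\\
\mathrm{LSE} &= 0.374\,\mathrm{PE} - 0.243\,\mathrm{LA} + \zeta_{2},\\
\mathrm{ISE} &= -0.062\,\mathrm{LA} + \zeta_{3},\\
\mathrm{INT} &= 0.467\,\mathrm{PE} - 0.073\,\mathrm{LA} + 0.171\,\mathrm{LSE} + 0.190\,\mathrm{ISE} + \zeta_{4}.
\end{aligned}
\]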
6 Discussion and Research Implications

The results of this research support the hypothesized causal model relating prior non-native language experience, language anxiety, Internet self-efficacy, and language self-efficacy to the intention to use non-native language commercial web sites. As predicted and consistent with the theoretical perspective, language anxiety has a negative effect on language self-efficacy, Internet self-efficacy and the intention to use non-native language commercial web sites, respectively. This result is also consistent with previous studies [25, 33, 34]. It indicates that if students have higher anxiety in a non-native language, they lack confidence in the non-native language and are thus afraid to access non-native language
commercial web sites. As to the relationship between language anxiety and Internet self-efficacy, the resulting data suggest a negative relationship in this research. This result indicates that if students have higher anxiety with non-native language usage, they have lower Internet self-efficacy; students with greater non-native language anxiety would not have the confidence to access the Internet. This result is consistent with previous research [33, 34]. The results imply that, once people know what is bothering them, they should try to eliminate or minimize their anxiety in some way.

There is a consensus in the literature on the effects of self-efficacy. The extensive self-efficacy literature suggests that users' self-efficacy with a specific domain application influences subsequent behavior [2, 4, 5]. Language self-efficacy and Internet self-efficacy each have a positive effect on the intention to use non-native language commercial web sites. These findings are consistent with the theory of self-efficacy [2]. Additionally, prior non-native language experience, as predicted, has a positive effect on language self-efficacy and the intention to use non-native language commercial web sites, and a negative effect on language anxiety. From a theoretical perspective, it seemed logical to hypothesize that more experience with the non-native language would result in higher judgments of self-efficacy, because experience affects one's self-efficacy. Bateman & Zeithaml [6] proposed that feedback from past failures and successes significantly affects subsequent behavior. Specifically, past success would increase one's self-efficacy and thus increase the effort one puts into potential new projects [44]. So, in this study, students with more positive prior non-native language experience would have both greater language self-efficacy and a greater intention to use non-native language commercial web sites, while prior non-native language experience, as predicted, has a negative effect on language anxiety.

6.1 Limitations and Future Research

Two weaknesses must be addressed in this study. Firstly, the measurement of prior non-native language experience is based on respondents' self-reports about their non-native language experience, which may cause subjective judgment bias. Further research should try to overcome this bias. Secondly, this research applies anxiety and self-efficacy to the scope of commercial web sites. For the intention or confidence to adopt (use) a new technology or new language in order to be a competent commercial web site user, it is possible that self-efficacy in the scope of the Internet is different from the concept of language self-efficacy; a more reliable and direct measure of language self-efficacy is needed. Further research should rectify this limitation. Thus, future follow-up research should further interpret the complex effects on the intention to use commercial web sites.

Acknowledgment. This work was supported by the National Science Council of Taiwan, ROC under Grant No. NSC-97-2416-H-167-009.
References 1. Agarwal, R., Prasad, J.: Are Individual Differences Germane to the Acceptance of New Information Technology. Deci. Sci. 30(2), 361–391 (1999) 2. Bandura, A.: Self-efficacy: Toward a Unifying Theory of Behavioral Change. Psychologist 37, 122–147 (1977) 3. Bandura, A.: Self-efficacy: The Exercise of Control. Freeman, New York (1997) 4. Bandura, A.: Self-efficacy Mechanisms in Human Agency. American Psychologist 37, 122–147 (1982) 5. Bandura, A.: Social Foundations of Thought and Action: A Social Cognitive Theory. Prentice Hall, Englewood Cliffs (1986) 6. Bateman, T.S., Zeithmal, C.P.: The Psychological Context of Strategic Decisions: A Model and Convergent Experimental Findings. Strategic Management Journal 10, 59–74 (1989) 7. Bollen, K.A., Long, J.S.: Testing Structural Equation Models. Sage, Newbury Park (1993) 8. Chang, M.K., Cheung, W.: Determinants of the Intention to Use Internet/WWW at Work: A Confirmatory Study. Info. & Manag. 39, 1–14 (2001) 9. Chau, P.Y.K.: Reexamining a Model for Evaluating Information Center Success Using a Structural Equation Modeling Approach. Decision Sciences 28(2), 309–334 (1997) 10. Compeau, D.R., Higgins, C.A.: Application of Social Cognitive Theory for Computer Skills. Information Systems Research 6(2), 118–143 (1995) 11. Davis, F.D.: Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly 13, 319–340 (1989) 12. de Haan, M., Elbers, E.: Reshaping Diversity in a Local Classroom: Communication and Identity Issues in Multicultural Schools in the Netherlands. Langu. and Comm. 3, 315–333 (2005) 13. Dobson, J.E.: Self-regulated Listening of French Language Students in a Web Environment. Dissertation, Department of Education. Catholic University of America (2001) 14. Dustmann, C.: The Effects of Education, Parental Background and Ethnic Concentration on Language. The Q. Rev. of Eco. and Fin. 37, 245–262 (1997) (special issues) 15. Eastin, M., LaRose, R.: Internet Self-efficacy and the Psychology of the Digital Divide. Journal of Computer-Mediated Communication 6(1) (2000), http://www.ascusc.org/jcmc/vol6/issue1/eastin.html 16. Eviatar, Z.: Language Experience and Right Hemisphere Tasks: The Effects of Scanning Habits and Multilingualism. Brain and Language 58, 157–173 (1997) 17. Gist, M.E., Mitchell, T.R.: Self-efficacy: A Theoretical Analysis of its Determinants and Malleability. Academy of Management Review 17, 183–211 (1992) 18. Global Research, Global Internet statistics (by language) (June 2001), http://www.glreach.com 19. Gullahorn, J.E., Gullahorn, J.T.: American Students Abroad: Professional versus Personal Development. Annals 368, 43–59 (1966) 20. Hass, R.: The Austrian Country Market: A European Case Study on Marketing Products and Service in a Cyber Mall. J. of Bus. Res. 55, 637–646 (2002) 21. Hills, P., Argyle, M.: Uses of the Internet and their Relationships with Individual Differences in Personality. Comp. in Hum. Beha. 19, 59–70 (2003) 22. Horwitz, E.K.: Preliminary Evidence for the Reliability and Validity of a Foreign Language Anxiety Scale. In: Horwitz, E.K., Young, D.J. (eds.) language anxiety: from theory and research classroom implications, pp. 37–39. Prentice-Hall, Englewood Cliffs (1991) 23. Horwitz, E.K., Young, D.J.: Language Anxiety from Theory and Research Classroom Implications, pp. 37–39. Prentice-Hall, Englewood Cliffs (1986)
24. Hu, P.J.H., Clark, T.H.K., Ma, W.W.: Examining Technology Acceptance by School Teachers: A Longitudinal Study. Info. & Manag. 41, 227–241 (2003) 25. Igbaria, M., Iivari, J.: The Effects of Self-efficacy on Computer Usage. Int. J. of Manag. Sci. 23(6), 587–605 (1995) 26. Joreskog, K., Sorbom, D.: LISRELL 8: Structural Equation Modeling with the SIMPLIS Command Language, Erlbaum, Hillsdale, New York (1993) 27. Kaiser, H.: The Varimax Criterion for Analytic Rotation of Factors. Psychometrika 23, 187–200 (1958) 28. Kerlinger, F.N.: Foundations of Behavior Research, 3rd edn. Holt, Rineheart and Winston, Forth Worth (1986) 29. Landau, S.: Security, Liberty, and Electronic Communications. In: Franklin, M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 355–372. Springer, Heidelberg (2004) 30. Liao, S.Y., Shao, Y.P., Wang, H.Q., Chen, A.: The Adoption of Virtual Banking: An Empirical Study. Int. J. of Info. Manag. 19, 63–74 (1999) 31. Maclntyre, P.D., Gardner, R.C.: Methods and Results in the Study of Anxiety and Language Learning: A Review of the Literature. Language Learning 41, 85–117 (1991) 32. Maitland, C.: Measurement of Computer/Internet Self-efficacy: A Preliminary Analysis of Computer Self-efficacy and Internet Self-efficacy Measurement Instruments. Newsbytes New Network (1996), http://www.newsbytes.com/pubNews/97/105125.html (access date, May 5 2004) 33. Mak, A.S., Tran, C.: Big Five Personality and Cultural Relocation Factors in Vietnamese Australian Students Intercultural Social Self-efficacy. International Journal of Intercultural Relations 25, 181–201 (2001) 34. Marakas, G.M., Yi, M.Y., Johnson, R.D.: The Multilevel and Multifaceted Character of Computer Self-efficacy: Toward Clarification of the Construct and an Integrative Framework for Research. Information System Research 9(2), 126–163 (1998) 35. Nunnally, J.C.: Psychometric Theory, 2nd edn. McGraw-Hill, New York (1978) 36. Nvision, 4 out of 5 Users Never Re-visit the Average Web Site. CyberAtlas (1999), http://cyberatlas.internet.com/big-picture/demgraphics/ article/0,1323,5931_212071,000.html (access date, April 25 2008) 37. O’Cass, A., Fenech, T.: Web Retailing Adoption: Exploring the Nature of Internet Users Web Retailing Behavior. J. of Reta. Cons. Service 10, 81–94 (2003) 38. Oxford, R.L.: Language Learning Strategies: What Every Teacher Should Know. Heinle & Heinle Publishers, Boston (1990) 39. Rampodi-Hnilo, L.A.: The Hierarchy of Self-efficacy and Development of an Internet Self-efficacy Scale. Department of telecommunication, Michigan State University (access date, April 25 2004) (1996), http://www.tc.msu.edu/TC960/self.html 40. Reisinger, Y., Mavondo, T.: Travel Anxiety and Intention to Travel Internationally: Implications of Travel Risk Perception. Journal of Travel Research 43(3), 212–225 (2005) 41. Spielberger, C.D.: Manual for the State-trait Anxiety Inventory (Form Y). Consulting Psychological Press, Palo-Alto (1983) 42. Torkazdeh, G., Van Dyke, T.P.: Development and Validation of an Internet Self-efficacy Scale. Behavior and Information Technology 20(4), 275–280 (2001) 43. Tung, C.H., Chang, S.C.: Exploring Adolescents’ Intentions Regarding the Online Learning Courses in Taiwan, October 1, vol. 10, pp. 729–730 (2007) 44. Wood, R., Bandura, A.: Social Cognitive Theory of Organizational Management. Academy of Management Review, 361–384 (July 1989) 45. Yates, G.C.R., Chandler, M.: The Cognitive Psychology of Knowledge: Basic Research Findings and Educational Implications. Australian of Education 35, 131–153 (1991)
Enhancing Document Clustering through Heuristics and Summary-Based Pre-processing Sri Harsha Allamraju and Robert Chun San Jose State University, Department of Computer Science, San Jose, CA 95192
[email protected],
[email protected]
Abstract. Knowledge workers are burdened with information overload. The information they need might be scattered in many places, buried in a file system, in their email, or on the web. Traditional Clustering algorithms help in assimilating these wide sources of information and generating meaningful relationships amongst them. A typical clustering preprocessing involves tokenization, removal of stop words, stemming, pruning etc. In this paper, we propose the use of summary and heuristics of a document as a pre-processing technique. This technique preserves the formatting of a document and uses this information for producing better clusters. In addition, only a summary of a document is used as the basis for clustering instead of the whole document. Clustering algorithms using the proposed pre-processing technique on formatted documents resulted in improved and more meaningful clusters. Keywords: document clustering, clustering, summarization, heuristics.
1 Introduction In today's information age, a typical computer user's information is stored in many places. This information comes in many forms but can be broadly classified into two: formatted and unformatted documents. Unformatted documents typically include plain-text files, whereas formatted documents include word-processing documents, presentations, and web pages. File managers allow users to store information in tree-structured hierarchies, also known as folders. Thus the user faces the ontological burden of classifying documents and storing them in relevant folders. With hard disk storage space on the order of gigabytes, the user is burdened with information overload. Clustering algorithms group objects into clusters based on a similarity measure. When applied to documents, this can help in grouping similar documents based on content, and many clustering algorithms have been proposed for this purpose. A typical clustering process involves tokenization, removal of stop words, stemming and pruning. A formatted document such as a word document typically contains headings, emphasized words, de-emphasized words, italicized words, and so on. The font, size and color of text in these documents also vary. This implies that some words in the document are more important than others, an importance evident from the increased human readability of a formatted document over that of an unformatted one.
In the traditional clustering pre-processing step, the document is first tokenized. However, once tokenized, the words lose their formatting. This implies that all words contribute equally to clustering irrespective of its formatting in the original document. In this paper, we propose an additional pre-processing step. For each document, a representative document is generated. This is obtained by combining the summary of a document with its heuristics. The heuristics of a document is the set of words which are emphasized in the document through the author’s use of various formatting techniques. For example, words that are bolded, underlined, italicized, or that appear in headings contribute valuable heuristics concerning the document’s content. The summary of a document is obtained by using a document summarization algorithm. The document’s summary, together with its heuristics, is clustered instead of the whole document. The proposed approach has many advantages. Firstly, it takes into account the formatting of the document. This helps in identifying words in the document which are more important and representative of the document’s content. Secondly, only the summary of the document is utilized for clustering rather than the entire document. This helps in reducing the “noise” in a document and gathers sentences that are of utmost importance. Therefore, the summary of the document presents a realistic view of the document’s content. Thirdly, the proposed pre-processing helps in producing more accurate and realistic clusters. The rest of the paper is organized as follows. Section 2 describes existing work done with relevance to the topic of this paper. Section 3 consists of a detailed explanation of the proposed pre-processing technique. Section 4 provides experimental results. Section 5 outlines some concluding remarks, ongoing research and future direction of work.
2 Related Work Clustering algorithms have a variety of applications and are used in various fields such as image segmentation, object recognition and information retrieval. Different Clustering algorithms work best for different types of data. Jain et al. [1] provides an overview of Data Clustering techniques. Budzik et al. [2] proposed a system which extracts keywords from a document that are representative of the document’s content. These keywords are later fed to a web search engine and web pages related to the context in which the user is working is shown. In order to extract search terms from a document, the authors proposed a set of heuristics. A subset of these heuristics forms the basis of the proposed preprocessing technique discussed here. CACTUS [9] attempts to cluster categorical data using summaries. Visser et al. [3] built an automatic summarizer system based on word frequency count, cue phrase, location, title and query method. In the word frequency method, each sentence is assigned a score based on the relevant words in that sentence. In a cue phrase method, each sentence was assigned a cue score based on the presence of relevant and important phrases. In the location method, a score is assigned to the sentence based on its location in a paragraph or proximity to headings. In the title method, sentences containing words present in the document’s title are given a higher score. In the query method, sentences matching the query words are given more importance. The final score
of each sentence is obtained by weighted sum of above-mentioned features. Thus, the summary obtained gives the list of sentences which are of utmost importance and most representative of the document’s content. This summarization technique is also used as a basis of the proposed pre-processing technique described next.
3 Proposed Pre-processing Technique Traditional Clustering algorithms pre-process the input data through tokenization, stemming, pruning and removal of stop words. This works well for unformatted documents such as plain-text files. However by using the same pre-processing technique for formatted documents such as word documents, presentations, web pages etc., certain important information is lost. This is reflected in the quality of clusters thus obtained. This paper stands by the premise that formatted documents contain some more information than plain-text files and so should be treated differently. The proposed pre-processing step consists of two components, namely document heuristics and summarization. The following section describes each of them in detail.
Fig. 1. Positioning of proposed pre-processing technique with respect to existing practice
3.1 Document Heuristics Budzik et al. [2] proposed a set of heuristics for extracting important keywords from a document. They are as follows: 1) remove stop words, 2) value frequently used words, 3) value emphasized words, 4) value words that appear in the beginning of the document rather than at the end, 5) punish words appearing to be intentionally deemphasized, 6) ignore the ordering of words in a list and 7) ignore words that occur in sections of the document that are not indicative of the document’s content. Of the heuristics mentioned above, some of the heuristics are not applied here since they are taken into consideration by the document summarizer. One of the main heuristics used by the proposed pre-processing technique is to value emphasized words and to punish words appearing to be intentionally de-emphasized. Emphasized words refer to the set of words that are formatted as bold characters, italicized, underlined, appear in capital letters and in headings. In addition, words that are colored are also considered emphasized. De-emphasized words refer to words that have a font size smaller than that of the majority of words in the document. Thus, by subjecting a document to heuristics, a set of emphasized words is obtained. However, in the case of plain-text files, there is no formatting present. Therefore, these particular heuristics will not produce any results on plain text documents.
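As a rough illustration of how such emphasis heuristics might be applied in practice, the sketch below pulls emphasized words out of an HTML-like formatted document (web pages being one of the formatted document types considered here). The choice of BeautifulSoup and the particular tag list are assumptions made for this example, not the authors' implementation.

```python
from bs4 import BeautifulSoup

# Tags treated as emphasis markers: bold, italic, underline and headings.
# Colored or enlarged text could be handled similarly by inspecting style
# attributes. The tag list is an assumption made for this example.
EMPHASIS_TAGS = ["b", "strong", "i", "em", "u", "h1", "h2", "h3", "h4", "h5", "h6"]

def emphasized_words(html: str) -> set:
    """Collect the words an author visually emphasized in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    words = set()
    for element in soup.find_all(EMPHASIS_TAGS):
        words.update(w.lower().strip(".,;:!?") for w in element.get_text().split())
    return words

doc = ("<h1>Summary-based clustering</h1>"
       "<p>Plain text with one <b>clustering</b> hint and a <i>heuristic</i>.</p>")
print(emphasized_words(doc))
# -> {'summary-based', 'clustering', 'heuristic'}
```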
3.2 Document Summary Automatic Text Summarization is one of the important aspects of the proposed preprocessing technique. The main idea of using automatic summarization is to use the portion of the document that is most important and that can represent the whole document in terms of its content and context. By using automatic summarization, only those sentences of the document are obtained which are most relevant. Thus, this reduces the noise in the clustering data that might be obtained due to the presence of unwanted sentences and words. Therefore, instead of performing clustering on the whole document, only the summary and heuristics of the document are used. The automatic summarizer built by Visser et al. [3] was created for generating summaries of scientific documents. Since most of the scientific documents such as research papers are well formatted, the following summarizer was chosen for improved accuracy. The summary is generated by the system based on weighted scores obtained by word frequency count, cue phrase, location, title and query. However, since there is no query involved in clustering, a weight of zero is assigned for the query method. In addition, the original summarizer was designed to work more effectively for scientific documents. The cue phrase method looks for certain phrases most frequently found in research papers and other scientific documents. To keep the summarizer more generic in nature, the cue phrase method is also assigned a weight of zero. 3.3 Process The whole process of the proposed pre-processing technique is as shown in Fig 2. The document is first sent to a heuristic analyzer. It returns a set of emphasized words in the document. Then, the document is fed to an automatic summarizer. This returns the summary of the document. The summary along with the words obtained from heuristics is stored in a file. This new file is a representative document for the original formatted document. Thus for each document in the corpus which is to be clustered, a representative document is generated which consists of its summary and heuristics. These representative documents are used for clustering instead of the original documents. The standard pre-processing techniques are still applied to the representative documents and then sent as input to the clustering engine. Once the representative documents are clustered, they are mapped back to their original document.
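The exact scoring functions of the summarizer by Visser et al. [3] are not reproduced in this paper; the sketch below only illustrates the weighted-sum idea described above, with the query and cue-phrase features already given zero weight. The sentence splitting, the individual feature definitions and the default weights are simplifying assumptions.

```python
import re
from collections import Counter

def summarize(text: str, title: str, ratio: float = 0.3,
              w_freq: float = 1.0, w_loc: float = 1.0, w_title: float = 1.0) -> str:
    """Score each sentence by word frequency, position and title overlap, then
    keep the top `ratio` fraction of sentences in their original order.
    Query and cue-phrase features are given zero weight, as described above."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    title_words = set(re.findall(r"[a-z]+", title.lower()))

    scored = []
    for i, sent in enumerate(sentences):
        sent_words = re.findall(r"[a-z]+", sent.lower())
        f = sum(freq[w] for w in sent_words) / (len(sent_words) or 1)
        loc = 1.0 - i / len(sentences)          # earlier sentences score higher
        t = len(set(sent_words) & title_words)  # overlap with the document title
        scored.append((w_freq * f + w_loc * loc + w_title * t, i, sent))

    keep = max(1, int(len(sentences) * ratio))
    top = sorted(sorted(scored, reverse=True)[:keep], key=lambda item: item[1])
    return " ".join(sent for _, _, sent in top)
```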
Fig. 2. Proposed Pre-processing technique in detail
Fig. 3. Comparison between original document and its representative document
4 Experimental Setup 4.1 Clustering Engine The CLUTO [4] clustering toolkit was used for clustering documents. It is a highly scalable toolkit and can be used on high dimensional dataset. A wide variety of clustering algorithms such as partitional, agglomerative and graph-partitioning based have been implemented in this toolkit. An extensive selection of similarity measure functions is available such as Euclidean, Cosine, correlation co-efficient and Jaccard. In addition, a user defined similarity measure can be used. The CLUTO clustering toolkit provides detailed reports for each clustering activity. Different external quality measures such as entropy and purity are computed for each cluster. In addition, CLUTO implements five new clustering criterion functions proposed by Karypis et al. [5]. 4.2 Visualization Engine In order to understand the topology of the clusters that are obtained, gCLUTO [6] was used for visualization. It produces two types of visualizations on cluster data, namely Matrix Visualization and Mountain Visualization. The Mountain Visualization was used here for analyzing the results. The Mountain Visualization technique uses peaks to represent clusters. The degree of separation of one cluster from the other denotes the relative similarity of clusters. Clusters that are very similar to each other are close to each other and in some cases overlap. This visualization is effective in understanding the relationships between clusters. Each peak corresponds to a single cluster. These peaks are of varying height, size and color. The height of the peak represents the internal cluster similarity. The higher the peak the greater is the internal similarity and vice versa. The volume of the peak represents the number of documents in a cluster. The color of the peaks represents the internal standard deviation. Different colors mean different levels of deviation. Red represents a low standard deviation, whereas blue corresponds to high
Fig. 4. Mountain Visualization of sample data showing peaks of different heights and colors
standard deviation. Clusters with high standard deviations are noisy and are definitely unwanted. Colors such as red, orange, yellow and green represent standard deviations from low to medium range, with red representing least standard deviation. Thus an ideal clustering would be expected to have distinct clusters with low standard deviation and high internal cluster similarity. This corresponds to a mountain visualization consisting of high non-overlapping peaks colored in red or orange. 4.3 Clustering Algorithms Different clustering algorithms work best for different datasets. Primarily, the agglomerative clustering algorithm was applied on documents for clustering. Given the required number of clusters, the agglomerative algorithm first assigns each document to its own cluster and then merges other documents repeatedly until the required number of clusters is obtained. The criterion used for merging one document into a cluster depends on the merging schemes. The CLUTO clustering toolkit supports a wide variety of merging schemes, namely single-link, complete-link and group average approaches in addition to seven new merging schemes. 4.4 Dataset The Reuters Transcribed Subset dataset1 was used to evaluate the effectiveness of the proposed pre-processing technique. This dataset is a subset of Reuters-21578 collection2. The Reuters Transcribed dataset consists of 20 files picked from each of the 10 largest classes in the Reuters-21578 collection. These files were generated by an automatic speech recognition (ASR) system. This possibly introduces a certain degree of noise in the data. However, the proposed pre-processing technique is not highly effective when applied to plain-text documents without any formatting. Thus, all the 200 files were manually formatted using headings, bolds, italics and other forms of emphasis. 1 2
1 Available at http://kdd.ics.uci.edu/databases/reuters_transcribed/reuters_transcribed.html
2 Available at http://www.daviddlewis.com/resources/testcollections/reuters21578/
5 Results The Reuters Transcribed dataset is formatted and then subjected to the proposed preprocessing technique. The document is sent to a heuristics analyzer and a document summarizer. The words returned by the heuristic analyzer are merged with the summary obtained from the document summarizer. This merged document becomes the representative document for the original document and is clustered on behalf of the original document. An agglomerative k-means clustering algorithm is used. Since the original dataset is manually classified into 10 categories, the value of k for the k-means clustering algorithm is given as 10. The CLUTO clustering engine applies the agglomerative k-means clustering algorithm and tries to divide the given input dataset into 10 clusters.
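The experiments use the CLUTO toolkit; purely as an illustration of the same clustering step with a more widely available library, the sketch below clusters the representative documents into k = 10 groups with scikit-learn's agglomerative clustering on TF-IDF vectors. The directory of representative documents and the vectorizer settings are hypothetical.

```python
import glob
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

# One "summary + heuristics" file per original document (hypothetical layout).
paths = sorted(glob.glob("representative_docs/*.txt"))
representative_docs = [open(p, encoding="utf-8").read() for p in paths]

# Standard pre-processing (tokenization, stop-word removal, weighting) is still
# applied, but to the representative documents rather than the full texts.
vectors = TfidfVectorizer(stop_words="english").fit_transform(representative_docs)

# k = 10, matching the 10 manually labelled Reuters categories.
labels = AgglomerativeClustering(n_clusters=10).fit_predict(vectors.toarray())

# labels[i] is the cluster of representative document i, which maps directly
# back to original document i.
for path, label in zip(paths, labels):
    print(label, path)
```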
Fig. 5. Mountain Visualization of Reuters Transcribed Subset Dataset using standard clustering techniques
Once the clusters are obtained, the gCLUTO software is used to visualize the resulting cluster formation. gCLUTO takes a matrix file as input; a matrix file is an intermediate file generated by the CLUTO clustering engine, whose columns correspond to the unique words in the given document corpus and whose rows represent the document number. Once the required matrix file is given as input to gCLUTO, a Mountain Visualization of the resulting clusters is generated and displayed. In the first experiment, the test dataset is pre-processed using standard clustering techniques and then sent to the clustering engine for k-means agglomerative clustering. The resulting clusters are shown in Fig. 5. Observing this Mountain Visualization (Fig. 5), it is seen that some of the clusters overlap and are not separated, such as clusters (0,6) and (3,7). Most of the peaks (clusters) are green, indicating a comparatively high standard deviation; in addition, cluster 2 is pale blue, indicating a very high standard deviation. Cluster 4 is the highest of all, indicating greater internal similarity. In the second experiment, the test dataset is pre-processed using the proposed pre-processing technique and then sent to the clustering engine for k-means agglomerative clustering. The resulting clusters are shown in Fig. 6.
The Mountain Visualization of cluster distribution obtained by applying the proposed summary and heuristics-based pre-processing produced better and elegant results. This is evident from the visualizations obtained as shown in Fig. 6. Most of the clusters are in red, orange or yellow colors indicating a low standard deviation within clusters. Most of the clusters are of similar height indicating a uniform distribution of internal similarity across clusters. Cluster 4 is the highest peak, indicating high internal similarity. The clusters are evenly distributed and well separated from one another. In comparison with the standard clustering technique, it is observed that the proposed pre-processing technique helped in creating distinctly separated clusters. Also, the overall internal similarity of elements within a cluster is increased. Additionally, by using the proposed heuristics and summary-based pre-processing, there has been an improvement in standard deviation within clusters. The proposed pre-processing produced clusters with low internal standard deviation. It can supplement, and can be used in conjunction with, existing approaches to clustering. The augmentation of traditional clustering techniques with our proposed heuristics and summary based technique does not add significant processing times – the analysis of the 200 News articles took just 1.2 seconds more than otherwise.
Fig. 6. Mountain Visualization of Reuters Transcribed Subset Dataset using proposed preprocessing techniques
6 Conclusions and Future Work In this paper, we proposed summary- and heuristics-based pre-processing for document clustering. We have shown that this pre-processing technique results in clusters with better internal and external cluster quality measures compared to existing clustering techniques, using a visualization technique as the basis for this conclusion. This suggests that when clustering formatted documents, the summary and heuristics alone are enough and the whole text of the document may not be necessary. Heuristics- and summary-based pre-processing opens a new dimension in document clustering.
The proposed pre-processing technique has been applied only to the Reuters Transcribed Subset dataset. The plain-text dataset was converted into individual word documents, and the news article title was manually set to bold in each file. However, in the case of research papers, which follow certain naming conventions for headings such as "Abstract", "Related Work", and "Conclusions", it would be interesting to observe whether the proposed pre-processing would actually be able to classify them into different classes. In short, the effectiveness of the proposed pre-processing steps should be examined using a wider variety of input files as test data, and it is too early to conclude the generality of the proposed pre-processing technique. In this paper, the given dataset is pre-processed using the proposed technique and then subjected to agglomerative k-means clustering. It would be interesting to see the effect of using other clustering algorithms with the proposed pre-processing technique. In addition, it would be interesting to note the effect of different summary ratios on the quality of clusters; this would lead to finding an ideal summary ratio at which the quality of the produced clusters is optimal for this kind of pre-processing. Finally, the effect of using a weighted approach to summary and heuristics pre-processing should be investigated.
References 1. Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: A Review. ACM Computing Surveys 31(3), 264–323 (1999) 2. Budzik, J., Hammond, K.J., Birnbaum, L.: Information access in context. Knowledge-Based Systems 14, 37–53 (2001) 3. Visser, W.T., Wieling, M.B.: Sentence-based Summarization of Scientific Documents. The design and implementation of an online available automatic summarizer. Report (2009), http://home.hccnet.nl/m.b.wieling/files/wielingvisser05automaticsummarization.pdf (last retrieved February 12, 2009) 4. Zhao, Y., Karypis, G.: Evaluation of hierarchical clustering algorithms for documents datasets. In: International Conference on Information and Knowledge Management, McLean, Virginia, United States, pp. 515–524 (2002) 5. Zhao, Y., Karypis, G.: Criterion functions for document clustering: Experiments and analysis. Technical report, Department of Computer Science, University of Minnesota 6. Rasmussen, M., Karypis, G.: gCLUTO: An interactive clustering, visualization and analysis system. Technical Report 04-021, University of Minnesota (2004) 7. Reuters-21578 Dataset, http://www.daviddlewis.com/resources/testcollections/reuters21578/ 8. Reuters Transcribed Subset Dataset, http://kdd.ics.uci.edu/databases/reuters_transcribed/reuters_transcribed.html 9. Ganti, V., Gehrke, J., Ramakrishnan, R.: CACTUS-Clustering Categorical Data Using Summaries. In: Proceedings of the ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, United States (1999)
Email Reply Prediction: A Machine Learning Approach Taiwo Ayodele, Shikun Zhou, and Rinat Khusainov Department of Electronics and Computer Engineering, University of Portsmouth, Anglsea Building, Anglsea Road, Portsmouth, PO1 3DJ, Hampshire, United kingdom {taiwo.ayodele,shikun.zhou,rinat.khusainov}@port.ac.uk
Abstract. Email has now become the most-used communication tool in the world and has also become the primary business productivity applications for most organizations and individuals. With the ever increasing popularity of emails, email over-load and prioritization becomes a major problem for many email users. Users spend a lot of time reading, replying and organizing their emails. To help users organize and prioritize their email messages, we propose a new framework; email reply prediction with unsupervised learning. The goal is to provide concise, highly structured and prioritized emails, thus saving the user from browsing through each email one by one and help to save time. In this paper, we discuss the features used to differentiate emails, show promising initial results with unsupervised machine learning model, and outline future directions for this work. Keywords: Email reply prediction, machine learning, email messages, interrogative words, need reply, do not need reply, email headers, unsupervised learning.
1 Introduction One of the annoying things is when someone does not get back to you after you have sent them many email messages, or when you are waiting to hear back from a friend or colleague at work about completing a particular project, which can have a severe impact on the overall operation. This can be frustrating. Email reply prediction is a method of anticipating whether received email messages require a reply or do not need urgent attention. Our email prediction system will enable email users to manage their email inboxes and, at the same time, manage their time more efficiently. Whittaker and Sidner [1] analyzed the use of email to perform task management, personal archiving and asynchronous communication and referred to the three as "email overload". They concluded that: • users perform a large variety of work-related tasks with email, and • as a result, users are overwhelmed with the amount of information in their mail boxes. Existing solutions to email reply prediction relied on the intuition that users' previous patterns of communication are indicative of future behaviour [2]. Also,
Dredze et al [3] provided solutions to email reply prediction by assessing date and time in email messages as email containing date and time are time sensitive and may require a reply. Other studies have focused on how people save their email, what purposes it serves for them, and its importance as a tool for coordination in everyday life [5, 6, 7, 8]. This paper proposes to solve the problem of email prioritization and overload by determining if email received needs reply. Our prediction system provides a better and efficient way of prioritizing email messages as well as provides a new method to email reply prediction.
2 Previous Work Email is one of the most-used communication tools in the world, and Sproull and Kiesler [8] provide a summary of much of the early work on its social and organizational aspects. Here we focus on work about email reply prediction strategies, as well as research dedicated to alleviating the problem of "email overload and prioritization." Mackay [7] observed that people used email in highly diverse ways, and Whittaker and Sidner [1] extended this work. They found that in addition to basic communication, email was "overloaded" in the sense of being used for a wide variety of tasks: communication, reminders, contact management, task management, and information storage. Ducheneaut and Bellotti [5] performed a study of email usage in three organizations and found, as had previous authors, that email was being used for a wide variety of functions. In particular, they noted that people used emails as reminders of things they had to do and for task management more generally. Mackay [7] also noted that people fell into one of two categories in handling their email: prioritizers or archivers. Prioritizers managed messages as they came in, keeping tight control of their inbox, whereas archivers archived information for later use, making sure they did not miss important messages. Tyler and Tang, in a recent interview study, identified several factors that may influence the likelihood of response [2].
3 System Framework We used machine learning techniques for finding salient noun phrases, interrogative words, question marks, and dates and times that can determine whether an email message requires a reply. This section describes the three questions involved in this prioritization task: • What representation is appropriate for the information to be prioritized (relevant or non-relevant phrases, interrogative words)? • Which features should be associated with each? • Which model should be used? We implemented a machine learning approach to solve the problem of email reply prediction. Machine learning is learning a theory automatically from data, through a process of inference, model fitting, or learning from examples. It is also an
automated extraction of useful information from a body of data by building a good probabilistic model. 3.1 Importance of Learning Our work involves machine learning because it is the underlying method that enables us to generate strong statistical results. The importance of machine learning as applied in our work is as follows: • Environments change over time, and new knowledge is constantly being discovered. Continuously redesigning systems "by hand" may be difficult, so machines that can adapt to a changing environment reduce the need for constant redesign. • New knowledge about tasks is constantly being discovered by humans; vocabulary changes, and there is a constant stream of new events in the world. Continually redesigning a system to conform to new knowledge is impractical, but machine learning methods might be able to track much of it. Figure 1 shows a schematic diagram of the architecture for extracting words from incoming email messages for efficient reply prediction, as proposed in this work.
Raw emails → Algorithm Processing → Predicting Algorithm → Message control → Need reply (1) / Do not need reply (0)
Fig. 1. Architecture for words extraction from incoming email
Our proposed prediction system accepts email messages as input; as the emails are passed to our machine learning algorithm, features are extracted from each email, and the predictor determines, as numeric values, which mails require replies and which do not, as shown in Figure 1 above.
4 Email Reply Prediction (ERP)
This is a decision making system that could determine if emails received require a reply. For any given email datasets, there are multiple email conversations and to capture these different conversations, we assume that if one email was a reply to the sender’s original message, then such a mail may require attention and this is where
our email reply scoring method originated. We developed a dictionary of users' favourite words (a dictionary of the words that email users note as the ones they favour when communicating by email) and a scoring mechanism for each annotated email: the higher the score a mail acquires, the more apparent it is that the email needs a reply. All emails have the same score of zero at the beginning of the analysis, and negative scores are possible. Each email has also been annotated with two properties: • definitely needs a reply, definitely does not need a reply If one email carries both "definitely need reply" and "definitely not need reply", the properties cancel each other, but this case is rare. The "definitely need reply" status is given if our algorithm detects phrases such as "please reply soon", and the "definitely not need reply" status is given when we find phrases such as "do not reply" or an address such as
[email protected], or if the other extracted features in the email content do not suggest any urgency. When a phrase such as "do not reply" is found, our system still brings the mail to the attention of the email user, but flags it as not needing a reply so that it can be read in the email client at any time. Our scoring system adjusts the score allocated to each email before making a decision on whether it needs a reply; if it finds interrogative words, questions or question mark(s) in an email message, it increases or decreases the score. The other features that we investigated are, for example: • Interrogative words: Can…? Could…? Will…? Who is…? Has…? Have…? May…? Need…? Are…? Is…? etc.; Who…? Where…? What…? or common patterns such as Why…? How…? • Communications from the sender: if there was earlier communication with the sender ("Re:" letters), increase the score (requires analysis of the user's sent emails) • If the sender sent emails earlier that were not answered, decrease the score (requires analysis of the user's sent emails) • Email domains: if the sender's address looks like "… .com", decrease the score, as it is likely bulk mail from a large company • Email fields: when many addresses appear in the "CC" field, decrease the score • Dictionary of words: if the email contains many words that interest the user, increase the score (requires the dictionary of the user's favourite words) • Attachments: if there is a large attachment in the letter, increase the score (for example, a photo or an interesting PDF article from a friend). 4.1 Scoring Method Our approach analyses the features of emails, namely phrases, interrogative words, questions and question marks, attachments, earlier communications from senders and the many other aforementioned features, and our algorithm prediction system (APS) performs unsupervised scoring using weighting measures [4]. Every new email has a numeric score: the higher the score, the more the email needs a reply. We calculate the weighting scores on the features of the email by implementing a method called the inner product. We collect n emails and score each of them using the function below:
S_{q,e} = \sum_{t \in T_{q,e}} (w_{q,t} \cdot w_{e,t})
Here, w_{e,t} is the email-term weight while the query-term weight is denoted by w_{q,t}, and we also denote the following sets: • the set E of emails; • for each term t, the set E_t of emails containing t; • the set T of distinct terms in the database; • the set T_e of distinct terms in email e, and similarly T_q for queries, with T_{q,e} = T_q ∩ T_e. The terms are the features extracted to determine the email prediction, namely phrases, interrogative words, question marks, attachments and so on. When the formula above is applied, an average weighting score is calculated for each email; if it is above the set threshold, the mail is categorized as needing a reply, so that relevant items are retrieved without retrieving a large number of irrelevant items. Our predictor assigns a weight score to any question(s) or question mark(s) found in the email subject as well as in the contents of the mail. For example, a question in the subject has a weight score of 3 points, and a question in the body of the email message a weight score of 2. Note that a question is a sentence that ends with the sign "?" and starts with an interrogation pattern like "where", "when", etc. Also, a score of 1 is assigned to each of the following sample features: earlier communication with the sender ("Re:" letters), emails from specific domains (.ac.uk, .edu), phrases such as "please reply soon", and the presence of an email address in the CC or BCC field. The prediction analysts concluded that the maximum weight score that can be assigned to an email is 10 and chose 7 as the threshold weighting score that a mail must attain before it is grouped as "need reply – 1"; any email that does not reach the threshold is re-examined, and if, after the other factors have been reassessed, it still cannot meet the threshold at the second attempt, it is grouped as "do not need reply – 0". 4.2 Email Prediction Methods (EPM) Email space is a function of the manner in which terms and term weights are assigned to the various emails, with an optimum email space configuration providing effective performance. We make use of the inner product method. An inner product space is a vector space of arbitrary (possibly infinite) dimension with additional structure which, among other things, enables generalization of concepts from two- or three-dimensional Euclidean geometry. Since our annotated emails from the Enron corpus are treated as a bulk dataset, we used term weighting with unsupervised techniques, together with our heuristic approach, to provide a well-organised and prioritized email prediction system.
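A minimal sketch of the rule-based scoring described above (3 points for a question in the subject, 2 for a question in the body, 1 point for each of the remaining cues, a cap of 10 and a decision threshold of 7) is given below. The interrogative and phrase lists are abbreviated samples, and how this rule score is combined with the inner-product term weighting of Section 4.2 is not spelled out here.

```python
import re

INTERROGATIVES = {"who", "what", "when", "where", "why", "how", "can", "could",
                  "will", "is", "are", "do", "does", "has", "have", "may", "need"}
URGENT_PHRASES = ("please reply soon", "need your help",
                  "looking forward to hearing from you")

def is_question(sentence: str) -> bool:
    """A question ends with '?' and starts with an interrogation pattern."""
    words = sentence.strip().lower().split()
    return bool(words) and sentence.strip().endswith("?") and words[0] in INTERROGATIVES

def reply_score(subject: str, body: str, sender: str, cc: list, replied_before: bool) -> int:
    sentences = lambda text: re.split(r"(?<=[.!?])\s+", text)
    score = 0
    if any(is_question(s) for s in sentences(subject)):
        score += 3                                    # question in the subject
    if any(is_question(s) for s in sentences(body)):
        score += 2                                    # question in the body
    if replied_before:                                # earlier "Re:" communication
        score += 1
    if sender.lower().endswith((".ac.uk", ".edu")):   # specific domains
        score += 1
    if any(p in body.lower() for p in URGENT_PHRASES):
        score += 1
    if cc:                                            # addresses in CC or BCC
        score += 1
    return min(score, 10)                             # scores are capped at 10

def needs_reply(score: int, threshold: int = 7) -> bool:
    return score >= threshold                         # 7 is the chosen threshold
```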
Reply Prediction algorithm
1. Define X as the number of matches needed to mark the message as needing a reply
2. Define Count as the number of matches = 0
3. If CC or BCC contains email addresses then
   a. Count = Count + 1
4. Create a rule: if the contents contain some of these words, then
   a. must, should, what about, meeting, priority
      i. Count = Count + 1
   b. Dear, hello, hi
      i. Count = Count + 1
   c. multiple "?"
      i. Count = Count + 1
   d. dates or month names
      i. Count = Count + 1
   e. AM, PM
      i. Count = Count + 1
5. If (Count > X)
   a. then the mail needs a reply
   b. else
   c. the mail does not need a reply
Fig. 2. Algorithm prediction System
4.3 Algorithm Prediction System (APS)
Algorithm prediction system uses a heuristics-based approach with embedded favorite dictionary of word and phrases, with weighting measures. The assumption is that if interrogative words, questions, questions mark(s), phrases such as do reply, when will you, if date and time are found in email messages, such a mail is important and will be assigned some score. The algorithm is shown in figure 2. Algorithm prediction system (APS) for email management is a new unsupervised machine learning techniques that is implemented. APS described above uses a precision and recall to evaluate this new technique in comparison with gold standard (Human participants).
5 Dataset Setup We collected over 5000 email conversations from the Enron email dataset [9], a publicly available corpus with about 150 users and 120,000 emails as the test bed and had 80 human reviewers to review the email prediction system. Notice that having such a gold standard may also be used to verify our assumptions and algorithm. We annotated 5000 emails to determine the original class with numeric values: need reply- 1 and do not need reply- 0. We then used human annotated emails as the gold standard to compare our algorithm result with human review results. The 80 human prediction analysts reviewed those 5000 selected email conversations. All the analysts were undergraduate and graduate in university of Portsmouth. Their discipline covered various areas including Science and Engineering, Arts, Education, Law, Business and IT. Since many emails in the Enron datasets [9] relate to business, IT
and law issues, the variety of the human prediction analysts, especially those with business and legal background are of asset to this user study. Each prediction analysts reviewed 50 distinct email conversations in one hour. For each email features extracted as described above, human prediction analysts (hpa) retain the highest weighting score, and then choose threshold and emails found above this threshold to be categorized as need reply and any emails that does not reach up to the threshold will be categorized as do not need reply. Thus, our expectation that human-annotated email prediction will show great variation was borne out, we discuss these differences further in section 6.
6 Evaluations and Results In order to compare different approaches to email reply prediction, a gold standard is needed. In practice, for comparing predictors, we annotated 5000 emails from the Enron email corpus as either: • need reply • do not need reply We tested our algorithm with the embedded similarity measure approach on the 5000-email dataset. To measure the quality of the email prediction, the gold standard is used as a reference. Our unsupervised machine learning approach achieved 98% accuracy in comparison to the gold standard. A sample output is shown below:
Fig. 3. A sample Reply Predictor System
Figures 3 and 4 show output samples of our email reply prediction mail client. As emails are passed to the prediction system, the aforementioned features are extracted by our prediction model, and the email messages highlighted in yellow indicate those that require a reply. This section describes experiments using the APS system to automatically induce email feature classifiers, using the features described in Section 4. Like many learning programs, APS takes as input emails, the classes to be learned, a set of feature names and possible values, and training data specifying the class and feature values for each training example. In our case, the training examples are the Enron email datasets. APS outputs a classification model for
predicting the class (i.e need reply- 1, do not need reply- 0). We obtained the results presented here using precision and recall. In this paper, we evaluated APS system based on weighting measures, and human judgments. To measure the quality and goodness of the email reply prediction system, gold standards are used as references. Human email predictions make up the gold standards. We evaluate our proposed email prediction system against human email predictions from human participants. Human participant detected replies by matching email features: • Interrogative words in email contents - when, where, can, how as these words indicate a request and a need for an answer. • Phrases - reply soon, need your help, looking forward to hearing from you, interview now as these phrases indicate time and urgency. We build most used phrases from the corpus and our algorithm keeps learning from this and keep expanding its knowledge with the new phrases and becomes more intelligent • previous email conversations - this is important because previous conversation is a clue to discover what has happened in the past and extract the old content and check with the new conversation to determine if such previous communication is vital to our decision making or not. • Attachments - most email users from our survey send attachments when there is need for clarity or as prove of a task or evidence of something and this indicate that an email message that contain attachments may need a reply and other extracted features that we investigated are “extracting interesting words as chosen by email user from dictionary, question mark (s) in emails and reference fields of a message with message Id from original message. Figure 4 below shows more evaluation graphical output.
Fig. 4. A sample of our prediction output
Our solutions to reduce email overload, unstructured email messages and email congestion were analysed, tested and evaluated by a group of participants from the University of Portsmouth and companies around the city, comprising academic and non-academic staff. In the evaluation, the ratio of male to female participants is 50:30
Correct predicted group: 4897
Total predicted group found: 4993
Total emails: 5097
Precision: 98.1%
Recall: 96.1%
(Precision and recall compared to the gold standard, on a per-email basis.)
Fig. 5. Evaluation Result
and the whole group is a mixture of diverse cultures and backgrounds. Fifty males and thirty females participated in the testing and evaluation. The ratio does not change the output of the analysts, and their decisions are made based on a realistic approach. Figure 4 illustrates how our reply prediction client works. The participants were separated into two groups and were given 1500 emails to analyse, annotate and predict which mails require a reply. Group 1 found that, out of 1500 emails, only approximately 668 require a reply, while group 2 found that approximately 665 require a reply. With the human participants, the average estimate of mails that need a reply is 667, which is taken as 100% accurate. Our proposed email reply prediction system estimated that approximately 672 require a reply, which is approximately 95% accurate compared with the gold standard. We also evaluate our algorithm prediction system using precision and recall as the evaluation measures: • for the 1500 emails, compute the recall and precision of the correctly predicted group; • given our prediction system, whose inputs are email messages and whose outputs are need reply (1) and do not need reply (0), precision and recall are computed as precision = correct predicted group / total predicted group found, and recall = correct predicted group / total emails (cf. Fig. 5).
We evaluate our prediction algorithm's performance by comparing the performance of the human participants with the results of our proposed algorithm. Figure 5 shows the detailed results.
7 Conclusion and Future Work In this paper, we presented a better way to prioritize email messages and automatically alert email users to messages that require attention. Our solution reduces email overload, unstructured email boxes, high email volume and email congestion, and copes with limited storage space for email messages, making it an effective way to manage email. We studied the features of over 5000 email messages, trained our prediction system on the aforementioned features to identify the email messages that require replies, and let it keep learning as new incoming emails arrive. Our system appears to work better than existing approaches, and the unsupervised learning approach supports its knowledge base and self-learning ability to continue to learn and become more intelligent, all of which make our solution a better approach.
References 1. Whittaker, S., Sidner, C.: Email overload: exploring personal information management of email. In: CHI 1996, pp. 276–283. ACM Press, New York (1996) 2. Tyler, J., Tang, J.: When can I expect an email response? A study of rhythms in email usage. In: Proc. of ECSCW 2003, pp. 238–258. Oulu Univ. Press (2003) 3. Dredze, M., Blitzer, J., Pereira, F.: Reply Expectation Prediction for Email Management. In: CEAS 2005, 2nd Conference on Email and Anti-Spam, Stanford University, CA (2005) 4. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18, 613–620 (1975) 5. Ducheneaut, N., Bellotti, V.: Email as habitat: An exploration of embedded personal information management. Interactions 8(5), 30–38 (2001) 6. Kraut, R.E., Attewell, P.: Media use in a global corporation: Electronic mail and organizational knowledge. In: Culture of the Internet, pp. 323–342. Lawrence Erlbaum Associates, Mahwah (1997) 7. Mackay, W.: Diversity in the use of electronic mail: A preliminary inquiry. ACM Transactions on Office Information Systems 6(4), 380–397 (1988) 8. Sproull, L., Kiesler, S.: Connections: New ways of working in the networked organization. MIT Press, Cambridge (1991) 9. Klimt, B., Yang, Y.: The Enron corpus: A new dataset for email classification research. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS, vol. 3201, pp. 217–226. Springer, Heidelberg (2004)
An End-to-End Proactive TCP Based on Available Bandwidth Estimation with Congestion Level Index
Sangtae Bae (1), Doohyung Lee (2), Chihoon Lee (2), Jinwook Chung (2), Jahwan Koo (3), and Suman Banerjee (3)
(1) Korea Institute of S&T Evaluation and Planning
[email protected]
(2) School of Information and Communication Engineering, Sungkyunkwan University, Chunchun-dong 300, Jangan-gu, Suwon 440-746, South Korea
[email protected], [email protected], [email protected]
(3) Computer Sciences Department, University of Wisconsin-Madison, WI 53706, USA
{jhkoo,suman}@cs.wisc.edu
Abstract. Transmission control protocol (TCP) is one of the core communication protocols of the Internet protocol suite. For this reason, significant enhancements to TCP have been made in both wired and wireless networks. In this paper, we propose an end-to-end proactive TCP based on available bandwidth estimation with a congestion level index (CLI), called CLI-based TCP. From previous TCP schemes, we have found that the TCP sender does not know how congested the network is, because network congestion is represented by only two states: congestion exists or it does not. Therefore, we define the concept of the CLI, outline the procedure of the CLI algorithm, and describe how to realize CLI-based TCP. In addition, we show that CLI-based TCP can handle network congestion at a finer granularity and improve overall TCP performance. Simulation results show that under 90% traffic load, CLI-based TCP outperforms TCP New Jersey by a 49.8% improvement in goodput. Keywords: Transmission control protocol, end-to-end proactive approach, available bandwidth estimation, congestion level index.
1 Introduction The conventional transmission control protocol (TCP) may suffer from severe performance degradation in wireless networks due to atmospheric conditions, high transmission error rates, temporary disconnections and multipath fading. The main reason for this degradation is that the conventional TCP scheme (such as TCP Reno [1]) cannot distinguish between packet losses caused by transmission errors and those caused by network congestion, and thus reacts to these losses by reducing its congestion window cwnd. Consequently, these inappropriate reductions of the cwnd lead to unnecessary throughput degradation [2]. To cope with this limitation, several schemes have been proposed and are classified in [3] and [4]. In this paper, we focus on the end-to-end proactive approach based on available bandwidth estimation (ABE).
ABE estimates current end-to-end bandwidth available to a TCP connection and guides the sender to properly adjust its transmission rate. For instance, TCP Westwood [5] measures end-to-end available bandwidth by monitoring the interval of returning acknowledgments (ACKs) and uses it to compute the cwnd in case of three duplicate ACKs (DUPACKs) or a retransmission timeout (RTO) timeout. However, TCP Westwood does not discriminate the cause of packet loss. Therefore, it will adjust the sending rates constantly even upon experiencing packet losses by transmission errors, resulting in a lower throughput in high bit-error-rate (BER) wireless networks. In addition, TCP Westwood's ABE is rather complex because it is calculated by a low-pass filter using Tustin approximation. The other possible approach is to modify the conventional TCP to be implemented both bandwidth estimation and loss differentiation algorithm. For example, TCP New Jersey [6] aims to improve TCP performance using both the ABE at the sender and the congestion warning (CW) at the intermediate router. The CW is an explicit congestion notification that helps the sender to effectively distinguish packet losses caused by network congestion from those caused by transmission errors. In other words, TCP New Jersey calculates the size of its cwnd based on the bandwidth estimation by the ABE and the congestion notification by the CW. Consequently, the joint approach with the combination of ABE and CW allows to improve TCP performance even in high BER wireless networks. However, it requires implementation, deployment and management complexities in terms of the addition and modification of the sender-side and the intermediate router-side modules. Our research goal is to implement a new bandwidth estimation scheme, which has simpler bandwidth estimation than TCP Westwood and better performance than TCP New Jersey without the help of any router-supported modules just like CW. Toward these issues, we propose an end-to-end proactive TCP based on ABE with congestion level index (CLI).
2 Proposed Scheme The CLI represents the current congestion status of the bottleneck link as a relative value between 0 and 1, as shown in Fig. 1. The highest congestion level corresponds to the maximum round trip time RTT_max and is represented by 1, whereas the lowest congestion level corresponds to the minimum round trip time RTT_min and is represented by 0. In other words, the current congestion level index CLI_cur is matched to the current round trip time RTT_cur, a specific value between RTT_min and RTT_max. Therefore, it can be defined as follows:

CLI_cur = (RTT_cur - RTT_min) / (RTT_max - RTT_min)    (1)

The algorithm for calculating the CLI proceeds as follows. The window size W is a scaling factor that controls the size of the circular queue storing the measured RTT values. The procedure Calculate-CLI, with parameter RTT_cur, is invoked upon receiving an ACK: RTT_cur is inserted into the circular queue, the maximum and minimum RTTs in the current circular queue are selected, and the current CLI is computed using the linear function (1).
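The pseudocode listing for Calculate-CLI is only described, not shown, above; the sketch below follows that description, assuming W = 3 (the value chosen in Section 3) and treating the degenerate case RTT_max = RTT_min as a CLI of 0.

```python
from collections import deque

class CLIEstimator:
    """Congestion level index from the last W round-trip-time samples."""

    def __init__(self, window_size: int = 3):    # W = 3 is chosen in Sec. 3
        self.rtts = deque(maxlen=window_size)    # circular queue of recent RTTs

    def on_ack(self, rtt_cur: float) -> float:
        """Invoked for every returning ACK; returns CLI_cur in [0, 1]."""
        self.rtts.append(rtt_cur)
        rtt_min, rtt_max = min(self.rtts), max(self.rtts)
        if rtt_max == rtt_min:                   # no spread yet: treat as uncongested
            return 0.0
        return (rtt_cur - rtt_min) / (rtt_max - rtt_min)

# Example: RTT rising from 10 ms to 30 ms pushes the CLI towards 1.
est = CLIEstimator()
for rtt in (0.010, 0.020, 0.030):
    print(round(est.on_ack(rtt), 2))
```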
Fig. 1. Congestion level index function
The proposed scheme adopts slow start, congestion avoidance, and fast recovery from TCP Reno, but implements the rate-based congestion window control algorithm based on ABE. The bandwidth estimation we propose follows the same idea as TCP Westwood to observe the rate of the returning ACKs in order to estimate the available bandwidth for the connection, but its estimator is simpler. On the other hand, the syndromes at the reverse link, such as ACK compression, ACK delay, and ACK losses, can have a significant impact on ABE's accuracy, and further on the overall performance of the TCP scheme. Thus, we also adopt TCP timestamp option proposed in [7] to overcome this problem. In the proposed scheme, instead of using the ACK arrival time, the packet arrival time, which is stamped by the receiver and delivered by the ACKs, is used in the bandwidth estimation. Its bandwidth estimation is thus less affected by the reverse link conditions. Consequently, upon receiving the nth ACK, the optimized bandwidth OBn is estimated as: (2) where CLIcur is the current CLI, Ln is the size of data acknowledged by the nth ACK, and Δ t is the time interval between nth and (n - 1)th packet arrivals at the receiver. Since the timestamp option is widely implemented in most of the TCP protocols, there will be no additional overhead. Given the segment size segsize, the size of the congestion window in units of segments upon the receipt of the nth ACK is calculated as: (3) To verify the effectiveness of our proposed scheme, called CLI-based TCP, we conducted the following simulation using the NS-2 network simulator [8]. The network topology consists of two network nodes, the sender node and the receiver node. The bottleneck link is configured as a 2 Mb/s error free duplex link with propagation delay of 1 ms. The bottleneck queue is a drop-tail queue that can contain a number of packets equal to the bandwidth-delay product of the connection. FTP traffic and CBR background traffic are simulated between the sender and the receiver.
Their packet sizes are all equal to 1000 bytes. The rate of the CBR traffic varies during the simulation as follows. From 20 to 30 s, the CBR source generates traffic at a rate of 1 Mb/s; from 30 to 40 s, it generates traffic at a rate of 0.5 Mb/s, and then it stops until the end of the simulation. The total simulation time is 50 s.
Fig. 2. Behavior of CLI-based TCP: (a) RTT trace, (b) CLI trace
Fig. 2. (continued) Behavior of CLI-based TCP: (c) estimated bandwidth
Under the considered scenario, we first monitored a portion of the time traces of RTTs and CLIs, shown in Fig. 2(a) and 2(b), respectively, in which the operation of the proposed scheme can be seen. In addition, Fig. 2(c) shows that the CLI-based bandwidth estimation follows the changes in the available bandwidth fairly closely. Note that CLIs are based on RTTs, and the CLI values become one of the main input parameters for the optimized bandwidth estimation.
3 Performance Evaluation
The main performance metrics used to evaluate TCP schemes are the average goodput of a single TCP connection and the fairness index over a set of TCP connections. We compare the proposed scheme with TCP Reno, Westwood, and New Jersey in terms of both metrics. More specifically, the average goodput of a single TCP connection is defined as the bandwidth delivered to the receiver from the sender, excluding duplicate packets. The fairness index is normally used to quantify the fairness among a set of n TCP connections. We first investigated the average goodput as a function of the window size W. As mentioned in the previous section, W is the length of the circular queue and can be tuned. As can be seen in Fig. 3(a), the average goodput is almost the same for window sizes between 1 and 5, regardless of the traffic load. However, as the window size increases further, the performance decreases significantly beyond a traffic load of 50%. In addition, the selection of the window size presents a tradeoff: smaller values increase responsiveness to network changes but incur more CLI oscillations. Thus, we select a window size of 3 in the following simulations. We set up the simulation for the average goodput such that a TCP connection running the FTP application and CBR background traffic generated by UDP connections share a 2 Mb/s bottleneck link. The UDP sources offer traffic loads ranging from 10 to 90% between the sender and the receiver. While the performance of TCP Reno, Westwood, and New Jersey decreases significantly, as shown in Fig. 3(b), the CLI-based scheme is robust to high traffic loads because the CLI explicitly indicates the level of current network congestion. Under 90% traffic load, the CLI-based TCP
Fig. 3. Performance comparisons of TCP schemes: (a) window size, (b) goodput, (c) fairness
outperforms TCP New Jersey by 49.8% in goodput without any router-supported modules. Next, we ran the simulation for the fairness index over a set of 10 TCP connections. The value of the fairness index lies between 0 and 1; if the throughput of all TCP connections is the same, the index takes the value 1. Fig. 3(c) shows that all TCP schemes maintain a satisfactory fairness index, but the CLI-based TCP provides more stable fairness than the other schemes under high traffic loads.
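The fairness index is not defined explicitly here; a common index with exactly these properties (values between 0 and 1, equal to 1 when all throughputs are identical) is Jain's fairness index, which the following sketch assumes.

    def jains_fairness_index(throughputs):
        # Jain's index: (sum of x_i)^2 / (n * sum of x_i^2); equal to 1 for equal shares.
        n = len(throughputs)
        total = sum(throughputs)
        return total * total / (n * sum(x * x for x in throughputs))

    # Ten connections with identical throughput yield an index of 1.0.
    print(jains_fairness_index([0.2] * 10))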
4 Conclusion
We have presented a new bandwidth estimation scheme, CLI-based TCP, which offers simpler bandwidth estimation than TCP Westwood and better performance than TCP New Jersey without the help of any router-supported modules such as CW. The CLI measured at the sender side closely reflects the current congestion status of the end-to-end bottleneck link, and the optimized bandwidth estimate based on the CLI is calculated upon each ACK arrival using the TCP timestamp option. Through simulation, we have shown that CLI-based TCP is fairly robust to high traffic loads with the help of the direct congestion indicator CLI.
Acknowledgments This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD), under Contract KRF-2007-357-D00162.
References
1. Allman, M., Paxson, V., Stevens, W.: TCP Congestion Control, RFC 2581 (1999)
2. Lakshman, T.V., Madhow, U.: The Performance of TCP/IP for Networks with High Bandwidth-delay Products and Random Loss. IEEE/ACM Trans. Networking 5(3), 336–350 (1997)
3. Balakrishnan, H., Padmanabhan, V.N., Seshan, S., Katz, R.H.: A Comparison of Mechanisms for Improving TCP Performance over Wireless Links. IEEE/ACM Trans. Networking 5(6), 759–769 (1997)
4. Tian, Y., Xu, K., Ansari, N.: TCP in Wireless Environments: Problems and Solutions. IEEE Radio Communications, S27–S32 (2005)
5. Casetti, C., Gerla, M., Mascolo, S., Sanadidi, M.Y., Wang, R.: Westwood: End-to-end Congestion Control for Wired/Wireless Networks. Wireless Networks Journal 8, 467–479 (2002)
6. Xu, K., Tian, Y., Ansari, N.: Improving TCP Performance in Integrated Wireless Communications Networks. Computer Networks 47(2), 219–237 (2005)
7. Jacobson, V., Braden, R., Borman, D.: TCP Extensions for High Performance, RFC 1323 (1992)
8. Network Simulator NS-2, http://www.isi.edu/nsnam/ns
Smart Privacy Management in Ubiquitous Computing Environments
Christian Bünnig
University of Rostock, Institute of Computer Science, Chair for Information and Communication Services
[email protected]
Abstract. Privacy in ubiquitous computing environments is primarily considered a problem of protecting personal information from unauthorized access and misuse. Additionally, it can be seen as a process of interpersonal communication in which not hiding but selective disclosure of personal information is the central issue, i.e. how users can practise privacy intuitively and dynamically in computerized environments, similar to the analog world. In this work we discuss the management of private information with respect to interpersonal privacy implications in smart environments. Existing work mostly does not match the intuitive and dynamic aspects of privacy in the context of interpersonal communication. As an alternative we suggest an ad hoc approach to privacy management that uses learning techniques for in situ disclosure assistance, and we present user interaction models for this disclosure assistance.
1 Introduction
Privacy in ubiquitous computing environments is still an open research issue. In most cases privacy is considered a problem of protecting personal information, i.e. making it visible only to certain entities or preventing its misuse by malicious entities. While this is an important aspect of privacy on the data level, it neglects that privacy is much more than hiding personal information. Often people want to show certain information to other entities, e.g. to utilize personalized services, to receive information from other persons in return, or to represent themselves to the public in a specific manner. Especially this self-representation in a social context is a dynamic process in which humans intuitively decide which personal face to show depending on the current situation [10]. The crucial issue is how ubiquitous computing environments, with their potentially vast amounts of communicated personal information, influence these intuitive capabilities to practise privacy. When acting in smart environments, the personal information communicated within the environment creates further instances of a user's self-representation, which makes privacy management more complex compared to the analog world. Users
Christian Bünnig is funded by the German Research Foundation (DFG), Graduate School 1424 (Multimodal Smart Appliance Ensembles for Mobile Applications MuSAMA).
need an understanding of the implications of communicating specific personal information. In general there are two types of implications. The first is technical, e.g. a possible recording and post-processing of the communicated information by the environment's infrastructure. The second relates to interpersonal implications, because other persons within the environment directly or indirectly perceive the communicated information. In this work we focus on interpersonal implications and assume a non-malicious infrastructure which handles private data as users expect. Concerning interpersonal implications, in most cases users intend them, as they are the reason for acting within a smart environment, e.g. because they want to collaborate with colleagues or socialize with friends. The point is how users can ensure that actual implications comply with intended ones, or in other words, how users can be enabled to practise interpersonal privacy in smart environments as similarly as possible to the way it is done in a non-technical environment.

Example Scenario
A typical use case for smart environments is the support of collaborative work. Consider the following scenario as an illustrative example. A room is equipped with a collaboration desk – a desk with a touch screen surface used as a shared work space (Fig. 1). Depending on the specific application, this desk may display and arrange various information contributed by the persons working at the desk. A team working on a film project meets at this desk to discuss current results and upcoming tasks in the project.

Fig. 1. A collaboration desk as an example service for interpersonal communication in smart environments

Depending on the current state of the project, the persons present at the meeting, and each person's task in the project, each team member wants to provide specific information to the meeting by disclosing it to the collaboration desk service. Alice contributes some drafts for a poster, Bob has a story board ready, and Clark has some suggestions concerning the crew casting. Dent, the organizer of the team, guides the meeting and controls and arranges the currently displayed information on the desk. Additionally, Dent requests access to the calendars of the meeting participants in order to schedule upcoming events. All team members work as freelancers – they may work at this desk in other projects, in other roles, and with other team members too. Special characteristics of this scenario are that there is no distinct hierarchy among the persons and no fixed relationship between the persons and the room. Hence, looking only at the persons and the room, there is no inherent structure that could be used for predefined rule-based disclosure models. This emphasizes the in situ characteristic of information disclosure in such a meeting.
Overview
The next section presents related work on user-side privacy control in smart environments. As a result of this review we motivate an ad hoc approach to privacy management in ubiquitous computing environments. This ad hoc approach is described in section 3. Ad hoc privacy control requires a significant portion of a user's attention to focus on privacy issues in addition to the actual task to fulfil within the environment. To reduce the necessary user attention, we are developing an assistance for deciding the disclosure of personal information based on learning techniques. The general idea of such a learning approach is described in detail in one of our previous works [1]. A crucial aspect of a disclosure assistance is how users can interact with it. The data managed by the assistance may be very sensitive. Thus, in most cases users want to be able to understand and control automated disclosure decisions. In section 4 we present different approaches for user interaction with a disclosure assistance. The critical point here is the transparency of the assistance, i.e. how users can verify and align automated ad hoc disclosure decisions with their actual privacy preferences. Finally, in section 5, we summarize our work and provide an outlook on future work.
2 Related Work
Typical approaches to controlling personal information disclosure are rule- and role-based systems. Users can set up rules to decide the conditions for disclosing information, and roles to specify the information to disclose. Alternatively, the terms policy (for a rule set) and identity or face (for a role) are used. An example of a policy-based approach is given by Langheinrich [5]. He presents pawS, a privacy awareness system that extends P3P/APPEL¹ for ubiquitous systems. Services within a smart environment announce their data handling policy expressed in P3P via service-related privacy proxies. In return, users express privacy preferences using APPEL and personal privacy proxies. These proxies then negotiate the flow of personal information between a user and services within the environment. It is advantageous that this approach makes use of already existing concepts. On the other hand, while there are user-friendly tools for specifying privacy preferences and interacting with service-side policies in the realm of web activities [3], there is a lack of such tools for the domain of ubiquitous systems. Further, pawS deals with privacy issues concerning the underlying infrastructure, whereas our work focuses on interpersonal privacy implications. In the domain of location privacy, Myles et al. [9] suggest a rule-based system to decide the disclosure of location information (also partly using P3P/APPEL). Location information is shared by a location server which contacts user-specific validators if someone requests a user's location. Amongst other methods, these validators decide on location disclosure based on a set of rules defined by their users. The authors suggest that location providers offer appropriate rule set templates and that, additionally, "wizards" help users to comfortably set up rules that
¹ Specifications of P3P and APPEL can be found at http://www.w3.org/TR/.
differ from the templates. This approach is limited to managing location. It could be extended to other types of personal information, but that would add complexity to the rule sets users have to specify in their validators. Indeed, Prabaker et al. [11] have shown that users already have difficulty setting up disclosure rules for location as the only type of information to manage. Besides rules, there exist concepts which utilize virtual roles, identities [2,4,8] or faces [7] for managing the disclosure of personal information in smart environments. The basic idea is to abstract a specific set of personal information into a role, e.g. "anonymous", "private", "job" or "public". Roles are supposed to provide an easy-to-grasp way of managing personal information. However, role concepts always face a conflict between being simple but too general and being subtle but too complex. In fact, Lederer highlights this problem of generality in a subsequent work [6]. Rule- and role-based approaches to controlling the communication of personal information try to release users from repeatedly deciding information disclosure ad hoc. On the other hand, such preconfigured privacy fails in many scenarios, as it contradicts the way users normally practise privacy. Rules require users to specify their privacy preferences in advance in an abstract manner. As long as users have a clear idea of the situations to come and how to handle their information in these situations, rules and roles are a suitable way of managing personal information. In contrast, when situations get more complex and the diversity of potentially communicated personal information grows or is not known in advance, policies are hard to create and maintain [11,6]. Roles, as a concept for managing the possible sets of information to disclose, fail if a user's privacy preferences do not match a clear scheme like "private", "job" or "public" but require a more fine-grained selection of information. In that case the increased number of roles would make it hard for users to distinguish and maintain their roles. The main message of this review is that existing work on user-side control of personal information disclosure in smart environments focuses on a priori configured privacy preferences, which conflicts with the dynamic and intuitive aspects of privacy. There are cases in which the decision about which information to communicate within a smart environment can only be made ad hoc, in the moment of disclosure: when the number of possible situations is very high, when there are situations which cannot be predicted, or when there are many, subtly differing variances in the personal information to disclose.
3 Ad Hoc Privacy Management
An ad hoc approach to controlling the flow of personal information in smart environments has some significant advantages. It releases users from attending to privacy before the actual use of a system. Instead, users decide on the disclosure of their information at the moment it is utilized by an environment. Hence, users are able to perceive the immediate implications of information disclosure and can practise privacy in conformance with intuitive and up-to-date preferences. An obvious disadvantage is that a significant portion of a user's attention, while interacting with services, is needed for privacy issues, which reduces the attention a user can devote to the actual task to fulfil with the service.
Fig. 2. Users decide the disclosure of personal information with their mobile devices during service interaction. A service lists its required personal information (a). For each item of information, users are able to configure a disclosure for the requested information (b). Detailed meta information for the service in general and for each requested item informs users about the privacy implications of using this service (c).
In other words, ad hoc privacy control is more similar to natural privacy handling, but it may overload a user's attention. To balance this drawback, an assistance is needed that relieves users of frequently making disclosure decisions while still supporting the general idea of ad hoc privacy management. For this purpose we suggest a learning scheme that observes and learns a user's decisions about which information to disclose to which service in which situation (or context). Currently we are conducting a user study which observes the context and disclosure behavior of users in a virtual smart environment. Virtual means that users can discover and interact with services to negotiate the communication of personal information, but the services' proposed functionality is not present, as it is not part of our observations. This enables us to be as flexible as possible in setting up service environments for our observation scenarios. Figure 2 illustrates how users interact with services in the context of ad hoc privacy management. First observation results have shown that it is possible to correlate user disclosure decisions with context information.² Examples of context information utilized in our setting are nearby persons, devices and services, time (day of week, hour of day, . . . ), location, and movement patterns of persons within the environment. The intended result of a learning scheme is to generalize this correlation to a disclosure decision model (DDM) that "mirrors" a user's privacy preferences and that can be used as an agent managing private data on behalf of
² Complete results and their analysis are part of a separate work, submitted for publication at the Symposium On Usable Privacy and Security, SOUPS 2009.
the user (or at least making suggestions for disclosure decisions) [1]. The challenges in learning a user's disclosure behavior are input selection (determining disclosure-relevant context information and its levels of abstraction), learner implementation (choosing an appropriate method with regard to accuracy), and output representation (a user's view on and interaction with a learned output model). The challenge focused on here is the output representation, i.e. how users see and interact with a DDM. A DDM maps a service, its requested information, and the context during the request to a disclosure decision. We use the following notation for this mapping. A service S requests several pieces of information labeled I1, .., In. For each information item Ii there is a set of possible disclosure configurations D(Ii) = {Ii1, .., Iim}. So D(Ii) describes all configurations specified by a user for an information item Ii. The subset DS(Ii) ⊆ D(Ii) describes all disclosure configurations of Ii previously used for the service S. The context C during the request is composed of several context information items ci, so that C = (c1, .., cm). Given that, we express a user's disclosure behavior as follows:

S × I1 × .. × In × C → DS(I1) × .. × DS(In)   (1)
This maps a service S, its requested information Ii, and a context C to disclosure configurations DS(Ii). For now, disclosure configurations are simply considered as instances; we do not regard their inner structure. Future work may also deal with integrating the inner parameters of a disclosure configuration (as seen in Fig. 2.b) into the modeling of disclosure decisions.
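As an illustration only, a DDM realizing the mapping (1) could be exposed behind a small interface like the following Python sketch; the class and method names are hypothetical, and the nearest-situation lookup is just one conceivable learner, not the one used in the study.

    from dataclasses import dataclass, field

    @dataclass
    class DisclosureDecisionModel:
        """Maps (service, requested item, context) to a disclosure configuration."""
        observations: list = field(default_factory=list)   # (service, item, context, config)

        def observe(self, service, item, context, config):
            # Record one manual disclosure decision made by the user.
            self.observations.append((service, item, dict(context), config))

        def decide(self, service, item, context):
            # Naive generalization: reuse the configuration of the most similar
            # previously observed situation for the same service and item.
            candidates = [(c, cfg) for s, i, c, cfg in self.observations
                          if s == service and i == item]
            if not candidates:
                return None                       # no suggestion possible yet
            overlap = lambda c: len(set(c.items()) & set(context.items()))
            return max(candidates, key=lambda pair: overlap(pair[0]))[1]

    # Hypothetical usage in the collaboration desk scenario:
    ddm = DisclosureDecisionModel()
    ddm.observe("collaboration-desk", "calendar", {"present": "team"}, "busy-times-only")
    print(ddm.decide("collaboration-desk", "calendar", {"present": "team", "room": "lab"}))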
4 User Interaction with DDMs
In an ideal case, the mapping (1) describes privacy preferences in such a way that users are able to match and align them with their real preferences (Fig. 3.a). However, this requires a rather small set of disclosure rules. Further, these rules must be expressed with an appropriate abstraction of the context information. Low-level context like raw sensor values does not match intuitive privacy preferences, which utilize user-selected abstract situation characteristics. These requirements on the disclosure model representation limit the learning schemes that can be used to create DDMs and may rule out learning schemes which are less transparent but provide better performance. Further, the required context abstraction (done by the developer) is a first error source for DDMs, as there is no general abstraction valid for all users. This motivates also considering learning schemes that create non-transparent DDMs. Since no learning scheme will create a perfect DDM, the question arises how users can interact with black box DDMs when they cannot control a DDM's inner decision process. The simplest approach to adjusting a black box DDM is user vetoes, which mark wrong disclosure decisions (Fig. 3.b). This is simply continued learning. In this case users are only able to react; they cannot integrate distinct preferences into a DDM which have not yet been learned by the DDM's learning scheme. Such a DDM can only be used as a suggestion mechanism. To automate disclosure
Fig. 3. User interaction with DDMs depends on the transparency of a DDM's internal decision process: (a) transparent DDM, (b) black box, (c) black box + user rules
Fig. 4. A disclosure process integrating a black box DDM, user vetoes and manual user rules. User rules have a higher priority than DDMs to enable users to express clear disclosure decisions for sensitive personal information. A DDM assists the creation of user rules by providing relevant context information for rule parametrization.
decisions, users must be sure that certain sensitive information is handled properly by the DDM. Since a black box DDM cannot be adjusted in that manner, it could be backed up by a simple set of disclosure rules manually compiled by users (Fig. 3.c). In fact, veto-based interaction can be combined with a black box DDM backed up by user rules. Whenever users veto a disclosure decision suggested by a DDM, they can be supported in specifying a manual disclosure rule, e.g. by being provided with the current context information that can be used to parametrize the rule. Figure 4 illustrates the combination of user vetoes and user rules. Here, the first step in deciding a disclosure is to query the user rules; they have priority
over the disclosure decisions of the DDM. If the user rules do not apply to the current situation, the DDM is consulted (step 2). At this point users can express a veto if they do not feel comfortable with the automated disclosure decision (step 3). As a result, the DDM hands out a template for a custom rule to be set up by the user (step 4). This template is a draft for a rule using the current context information and the requested information as parameters. Setting up custom disclosure rules in this manner preserves the intended ad hoc characteristic of privacy management; it releases users from abstracting privacy preferences into rules a priori.
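A sketch of the disclosure process of Fig. 4 in Python follows; the function names, the rule representation, and the veto callback are illustrative assumptions about how the four steps could be wired together, not the authors' implementation.

    def rule_template(request, context):
        # Draft rule (step 4): the current context and the requested item become the
        # parameters; the user fills in the disclosure configuration later.
        return {"service": request.service, "item": request.item,
                "when": dict(context), "configuration": None}

    def decide_disclosure(request, context, user_rules, ddm, ask_user_veto):
        # Step 1: manually compiled user rules have priority over the DDM.
        for rule in user_rules:
            if rule.matches(request, context):
                return rule.configuration
        # Step 2: no rule applies, so the (possibly black box) DDM is consulted.
        suggestion = ddm.decide(request.service, request.item, context)
        # Step 3: the user may veto the automated decision; the veto can also be
        # fed back into the DDM as continued learning.
        if ask_user_veto(request, suggestion):
            # Step 4: hand out a rule template parametrized with the current context.
            return rule_template(request, context)
        return suggestion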
5 Conclusion and Outlook
In this work we discussed the management of private information with respect to interpersonal privacy implications in smart environments. We identified limitations in existing work, which mostly does not match the intuitive and dynamic aspects of privacy. As an alternative we presented an ad hoc approach to privacy management. This approach uses a disclosure assistance based on learning techniques. To enable users to validate and align their actual privacy preferences with the assistance, we presented two concepts for the interaction between a user and a disclosure assistance. Currently we are accumulating user disclosure data, which is used for implementing a learning scheme as described above. In our future work we will use this learning scheme for implementing and evaluating the presented approaches for interacting with the disclosure assistance.
References
1. Bünnig, C.: Learning context based disclosure of private information. In: The Internet of Things & Services - 1st Intl. Research Workshop, Valbonne, France (September 2008)
2. Clauß, S., Pfitzmann, A., Hansen, M.: Privacy-enhancing identity management. The IPTS Report 67, 8–16 (September 2002)
3. Cranor, L.F., Guduru, P., Arjula, M.: User interfaces for privacy agents. ACM Trans. Comput.-Hum. Interact. 13(2), 135–178 (2006)
4. Jendricke, U., Kreutzer, M., Zugenmaier, A.: Pervasive privacy with identity management. Technical Report 178, Universität Freiburg (October 2002)
5. Langheinrich, M.: A privacy awareness system for ubiquitous computing environments. In: Borriello, G., Holmquist, L.E. (eds.) UbiComp 2002. LNCS, vol. 2498, p. 237. Springer, Heidelberg (2002)
6. Lederer, S., Hong, J.I., Dey, A.K., Landay, J.A.: Personal privacy through understanding and action: Five pitfalls for designers. Personal Ubiquitous Computing 8(6), 440–454 (2004)
7. Lederer, S., Mankoff, J., Dey, A.K., Beckmann, C.P.: Managing personal information disclosure in ubiquitous computing environments. Technical Report UCB/CSD-03-1257, University of California, Berkeley (2003)
8. Maibaum, N., Sedov, I., Cap, C.H.: A citizen digital assistant for e-government. In: Traunmüller, R., Lenk, K. (eds.) EGOV 2002. LNCS, vol. 2456, pp. 284–287. Springer, Heidelberg (2002)
9. Myles, G., Friday, A., Davies, N.: Preserving privacy in environments with location-based applications. IEEE Pervasive Computing 2(1), 56–64 (2003)
10. Palen, L., Dourish, P.: Unpacking "privacy" for a networked world. In: CHI 2003: Proc. of the SIGCHI Conference on Human Factors in Computing Systems, pp. 129–136. ACM, New York (2003)
11. Prabaker, M., Rao, J., Fette, I., Kelley, P., Cranor, L., Hong, J., Sadeh, N.: Understanding and capturing people's privacy policies in a people finder application. In: UBICOMP 2007: Workshop on UBICOMP Privacy (September 2007)
A Fuzzy Multiple Criteria Decision Making Model for Selecting the Distribution Center Location in China: A Taiwanese Manufacturer's Perspective
Chien-Chang Chou¹ and Pei-Chann Chang²
¹ Department of Shipping Technology, National Kaohsiung Marine University, Taiwan
[email protected]
² Department of Information Management, Yuan Ze University, Taiwan
[email protected]
Abstract. The purpose of this paper is to propose a fuzzy multiple criteria decision making model for evaluating alternative distribution center locations in China and selecting the best one for a distribution center investment, from a Taiwanese manufacturer's perspective. Although many papers focus on the subject of location selection, few discuss the selection of a distribution center location in China. Thus this paper summarizes the criteria for evaluating candidate distribution center locations and then develops a fuzzy multiple criteria decision making model. Finally, the proposed model is tested on a Taiwanese manufacturer's case. The results show that the fuzzy multiple criteria decision making model can be used to explain the procedure of distribution center location selection decision making.
1 Introduction
Due to the rapid economic growth in China in recent years, many Taiwanese companies have gone to China to invest in the manufacturing, logistics, civil engineering, and banking industries, among others. It is important and difficult for a Taiwanese manufacturing company to select the best location for building a distribution center in China when beginning investment planning. Thus, the purpose of this paper is to propose a fuzzy multiple criteria decision making model for a Taiwanese manufacturing company to evaluate and select the best location for investing in and building a distribution center in China. Location selection is one of the most important decision issues for the decision makers of business companies or industrial organizations. Many precision-based methods for location selection have been developed in the past. Dahlberg and May [13] utilized the simplex method to determine the optimal location of energy facilities. Tompkins and White [33] introduced a method that used preference theory to assign weights to subjective factors by making all possible pairwise comparisons between factors. Spohrer and Kmak [30] proposed a weight factor analysis method to integrate quantitative data and qualitative ratings to choose a plant location from numerous alternatives. Stevenson [31] proposed a cost-volume analysis method to
select the best plant location. Multiple criteria decision-making methods have been provided to deal with the problem of ranking and selecting locations under multiple criteria [21, 29]. All the methods stated above are based on the concept of accurate measurement and crisp evaluation. However, the decision making of location selection in the real world is complex. The selection of the best location for business companies and industrial organizations from two or more alternative locations on the basis of two or more factors is a multiple criteria decision-making problem. In many situations, the values for the qualitative criteria are imprecisely defined for the decision maker. It is not easy to precisely quantify the rating of each alternative location, and the precision-based methods stated above are not adequate for dealing with the location selection problem [8, 24]. Human judgments, including preferences, are often vague, and a decision maker cannot estimate his preference with an exact numerical value. A more realistic way may be to use linguistic terms to describe the desired value and importance weight of criteria, e.g. "very low", "low", "fair", "high", "very high", etc. [2, 36]. Due to this type of fuzziness in the location selection process, fuzzy set theory is an appropriate method for dealing with uncertainty, and the subjective evaluation data can be more adequately expressed with fuzzy linguistic variables [6, 7, 8, 15, 16, 20, 24, 36]. Although many papers focus on the subject of location selection, few discuss the selection of a distribution center location in China from a Taiwanese manufacturer's perspective. Thus this paper proposes a fuzzy multiple criteria decision making model for selecting the distribution center location in China from a Taiwanese manufacturer's perspective. The rest of this paper is organized as follows. Section 2 is the literature review. The fuzzy methodology is introduced in Section 3, followed by a case study in Section 4. Finally, conclusions are given in Section 5.
2 Literature Review
Brenes et al. [4] found that the key determinants considered important when making decisions for new manufacturing in free zones included labor cost, political stability, social stability, the parent company's degree of investment diversification, and other relevant factors, such as public utility efficiency, on-site customs houses, and labor recruiting and training services. Globerman and Shapiro [18] identified the impact of government policies on investment in Canada. They presented a set of criteria for investment decisions, including the gross domestic product (GDP), the rate of growth of GDP, cost factors, the exchange rate, and trade variables. Sun et al. [32] investigated how government policy influenced location decisions in China. Wu and Strange [34] examined the location of foreign insurance companies in China based on the investment literature. Five important criteria for investment decisions were identified in their study, namely, the market size and the prospects for growth; agglomeration effects produced by the concentration of producer services and other investment; government policy measures and restrictions; infrastructure quality, ranging from the state of the transportation and telecommunications systems to the existence of a specialized labor force; and cost considerations, both labor and land.
Bevan et al. [3] analyzed the impact of different dimensions of the newly created institutional framework in East European transitional economies on investment. Several specific dimensions were found to influence investment: private ownership of business, banking sector reform, the foreign exchange rate, trade liberalization, and legal development. Based on this literature review of criteria considered important when evaluating distribution center investments, this paper summarizes four major criteria and twenty-three sub-criteria in its questionnaire, as follows. The influential criteria include:
• the growth of the economy (C1)
− the present volume of cargoes (C11)
− the potential volume of cargoes in the future (C12)
− the trade variables (C13)
• cost (C2)
− the exchange rate (C21)
− the labor cost (C22)
− the transportation cost (C23)
− the operation cost (C24)
− the land cost (C25)
• government policies (C3)
− the efficiency of government departments (C31)
− the co-operative relationship between the enterprise and government (C32)
− the tax break (C33)
− other preferential treatment (C34)
− the law on investment & investment restrictions (C35)
− the political stability (C36)
− the social stability (C37)
• other (C4)
− the availability of land (C41)
− the infrastructure quality (C42)
− the labor quality (C43)
− the trade liberalization (C44)
− the efficiency of Customs (C45)
− the future development (C46)
− the banking sector (C47)
− the private ownership of enterprise (C48)
3 The Fuzzy Methodology
Fuzzy set theory was introduced by Zadeh [35]. Fuzzy numbers are a fuzzy subset of the real numbers, and they represent an extension of the idea of a confidence interval. Fuzzy set theory was developed on the premise that the key elements in human thinking are not numbers but linguistic terms or labels of fuzzy sets [2, 35, 37]. Fuzzy set theory has been applied to solve many decision making problems.
Basic fuzzy arithmetic operations on fuzzy numbers have been proposed in the previous literature [1, 5, 14, 17, 19, 22, 23, 25, 26, 27, 28]. Although many fuzzy arithmetic methods have been proposed, few of these papers presented a representation of the multiplication operation on two or more fuzzy numbers. Chen and Hsieh [9] proposed the Graded Mean Integration Representation method for operations on and ranking of fuzzy numbers. Chou [10] proposed the canonical representation of the multiplication operation on two triangular fuzzy numbers by the Graded Mean Integration Representation Method. Chou [11] further proposed the canonical representation of the multiplication operation on three trapezoidal fuzzy numbers based on the Inverse Function Arithmetic Representation Method, and this canonical representation was then applied to solve fuzzy multiple criteria decision making problems [12]. Based on the Inverse Function Arithmetic Representation Method, this paper constructs a fuzzy multiple criteria decision making model for selecting the distribution center location. The Inverse Function Arithmetic Representation Method is introduced briefly as follows. In this section, we first briefly introduce the graded mean integration representation method for representing one fuzzy number. Based on the graded mean integration representation method, we then present the Inverse Function Arithmetic Representation Method for representing the multiplication operation on multiple fuzzy numbers. Chen and Hsieh [9] proposed the graded mean integration representation method of fuzzy numbers based on the integral value of the graded mean h-level of a generalized fuzzy number. We describe the meaning as follows. Suppose A = (c, a, b, d) is a fuzzy number. The graded mean integration representation of A is

P(A) = (1/6)(c + 2a + 2b + d)   (1)
The triangular fuzzy number Y = (c, a, b) is a special case of the generalized trapezoidal fuzzy number. The graded mean integration representation of the triangular fuzzy number Y becomes

P(Y) = (1/6)(c + 4a + b)   (2)
Based on the above graded mean integration representation method, Chou [11, 12] further proposed the Inverse Function Arithmetic Representation Method for the multiplication operation on multiple fuzzy numbers as follows:

P(A1 ⊗ A2 ⊗ A3) = (1/6)(c1 + 4a1 + b1) × (1/6)(c2 + 4a2 + b2) × (1/6)(c3 + 4a3 + b3)   (3)
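For illustration, the representations (1)-(3) can be written down directly in code; the following Python sketch is only a reading aid for the formulas above, with function names of our own choosing.

    def graded_mean_trapezoidal(c, a, b, d):
        # Equation (1): representation of a trapezoidal fuzzy number (c, a, b, d).
        return (c + 2 * a + 2 * b + d) / 6.0

    def graded_mean_triangular(c, a, b):
        # Equation (2): representation of a triangular fuzzy number (c, a, b).
        return (c + 4 * a + b) / 6.0

    def product_representation(*triangular_numbers):
        # Equation (3): the representation of a product of fuzzy numbers is the
        # product of their individual representations.
        result = 1.0
        for c, a, b in triangular_numbers:
            result *= graded_mean_triangular(c, a, b)
        return result

    # Example: the label (4.5, 5, 5) has representation (4.5 + 20 + 5) / 6 = 4.9167,
    # so a product of three such labels is about 118.85, the first numerator that
    # appears in the case study below.
    print(product_representation((4.5, 5, 5), (4.5, 5, 5), (4.5, 5, 5)))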
4 A Case Study
Based on the literature review of criteria considered important when evaluating the preference for a distribution center investment, this paper summarizes four major criteria (Ci) and twenty-three sub-criteria (Cij) in its questionnaire, listed in Table 1.
Table 1. The weights and the preference for criteria
Criteria and sub-criteria • The growth of economic (C1) − The present volume of cargoes (C11) − The potential volume of cargoes in the future (C12) − The trade variables (C13) • The cost (C2) − The exchange rate (C21) − The labor cost (C22) − The transportation cost (C23) − The operation cost (C24) − The land cost (C25) • The government policies (C3) − The efficiency of government department (C31) − The co-operative relationship between the enterprise and government (C32) − The tax break (C33) − Other preferential treatment (C34) − The law on investment & investment restrictions (C35). − The political stability (C36) − The social stability (C37) • Other (C4) − The availability of land (C41) − The infrastructure quality (C42) − The labor quality (C43) − The trade liberalization (C44) − The efficiency of Customs (C45) − The future development (C46) − The banking sector (C47) − The private ownership of enterprise (C48)
Fuzzy weights (4.5, 5, 5) (4.5, 5, 5) (3, 4, 5)
Fuzzy preference (4.5, 5, (4.5, 5,
5) 5)
(2, (3, (2, (4.5, (3, (3, (4.5, (2, (3,
3, 4, 3, 5, 4, 4, 5, 3, 4,
4) 5) 4) 5) 5) 5) 5) 4) 5)
(2,
3,
4)
(2, (3, (2, (3, (3,
3, 4, 3, 4, 4,
4) 5) 4) 5) 5)
(1,
2,
3)
(3,
4,
5)
(2,
3,
4)
(2, (3, (3,
3, 4, 4,
4) 5) 5)
(2, (2, (1,
3, 3, 2,
4) 4) 3)
(3, (3, (1, (3, (2, (4.5, (2, (4.5, (3, (2, (3,
4, 4, 2, 4, 3, 5, 3, 5, 4, 3, 4,
5) 5) 3) 5) 4) 5) 4) 5) 5) 4) 5)
(2, (1,
3, 2,
4) 3)
(0, (0, (0, (2, (0, (4.5, (2, (1,
1, 1, 1, 3, 1, 5, 3, 2,
2) 2) 2) 4) 2) 5) 4) 3)
Based on the linguistic variables in Tables 2 and 3, we interviewed one Taiwanese manufacturing company in China and obtained the weights (wi) for the criteria, the weights (wij) for the sub-criteria, and the preferences (pijk) for candidate location k under sub-criterion Cij. The weights for the criteria and sub-criteria and the preferences for the candidate locations are listed in Table 1. After preliminary screening, five candidate locations in China remain for further evaluation.

Table 2. Linguistic variables for preference
Very Poor  (0.0, 1.0, 2.0)
Poor       (1.0, 2.0, 3.0)
Fair       (2.0, 3.0, 4.0)
Good       (3.0, 4.0, 5.0)
Very Good  (4.5, 5.0, 5.0)
Table 3. Linguistic variables for importance weight of criteria
Very Low   (0.0, 1.0, 2.0)
Low        (1.0, 2.0, 3.0)
Fair       (2.0, 3.0, 4.0)
High       (3.0, 4.0, 5.0)
Very High  (4.5, 5.0, 5.0)
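As a reading aid only, the two linguistic scales of Tables 2 and 3 can be held as simple lookup tables (continuing the Python sketch above; the names are our own):

    # Triangular fuzzy numbers for the linguistic scales of Tables 2 and 3.
    PREFERENCE_SCALE = {
        "Very Poor": (0.0, 1.0, 2.0), "Poor": (1.0, 2.0, 3.0), "Fair": (2.0, 3.0, 4.0),
        "Good": (3.0, 4.0, 5.0), "Very Good": (4.5, 5.0, 5.0),
    }
    IMPORTANCE_SCALE = {
        "Very Low": (0.0, 1.0, 2.0), "Low": (1.0, 2.0, 3.0), "Fair": (2.0, 3.0, 4.0),
        "High": (3.0, 4.0, 5.0), "Very High": (4.5, 5.0, 5.0),
    }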
Solution Process
Let AWi be the percentage importance weight of criterion Ci, let AWij be the percentage importance weight of sub-criterion Cij, let pijk be the preference for candidate location k under sub-criterion Cij, and let TPk be the total preference for candidate location k. Then

AWi = wi / Σ(i=1..I) wi,   AWij = wij / Σ(j=1..J) wij,   and   TPk = Σ(i=1..I) Σ(j=1..J) AWi ⊗ AWij ⊗ pijk

For example, TP1 is the total preference for the candidate location Shanghai (k1):

TP1 = Σ(i=1..I) Σ(j=1..J) AWi ⊗ AWij ⊗ pij1
    = [w1 / (w1 + w2 + w3 + w4)] ⊗ [w11 / (w11 + w12 + w13)] ⊗ p111
    + [w1 / (w1 + w2 + w3 + w4)] ⊗ [w12 / (w11 + w12 + w13)] ⊗ p121
    + [w1 / (w1 + w2 + w3 + w4)] ⊗ [w13 / (w11 + w12 + w13)] ⊗ p131
    + [w2 / (w1 + w2 + w3 + w4)] ⊗ [w21 / (w21 + w22 + w23 + w24 + w25)] ⊗ p211
    + …
    + [w4 / (w1 + w2 + w3 + w4)] ⊗ [w48 / (w41 + w42 + w43 + w44 + w45 + w46 + w47 + w48)] ⊗ p481
where wi, wij, and pijk are all fuzzy numbers. Thus we can obtain the total preference for the candidate location Shanghai as follows:

TP1 = [(4.5,5,5) / ((4.5,5,5) + (3,4,5) + (2,3,4) + (1,2,3))] ⊗ [(4.5,5,5) / ((4.5,5,5) + (3,4,5) + (2,3,4))] ⊗ (4.5,5,5)
    + [(4.5,5,5) / ((4.5,5,5) + (3,4,5) + (2,3,4) + (1,2,3))] ⊗ [(3,4,5) / ((4.5,5,5) + (3,4,5) + (2,3,4))] ⊗ (4.5,5,5)
    + [(4.5,5,5) / ((4.5,5,5) + (3,4,5) + (2,3,4) + (1,2,3))] ⊗ [(2,3,4) / ((4.5,5,5) + (3,4,5) + (2,3,4))] ⊗ (2,3,4)
    + …
    + [(1,2,3) / ((4.5,5,5) + (3,4,5) + (2,3,4) + (1,2,3))] ⊗ [(3,4,5) / ((3,4,5) + (2,3,4) + (4.5,5,5) + (2,3,4) + (4.5,5,5) + (3,4,5) + (2,3,4) + (3,4,5))] ⊗ (1,2,3)

By formula (3), we can easily obtain

TP1 = 118.85/165.84 + 96.69/165.84 + 44.25/165.84 + 36.00/289.93 + 78.67/289.93 + 48.00/289.93 + 64.00/289.93 + 78.67/289.93 + 24.00/375.75 + 36.00/375.75 + 27.00/375.75 + 36.00/375.75 + 24.00/375.75 + 36.00/375.75 + 24.00/375.75 + 8.00/429.10 + 6.00/429.10 + 9.83/429.10 + 18.00/429.10 + 9.83/429.10 + 39.33/429.10 + 18.00/429.10 + 16.00/429.10 = 3.46
By the same solution process, we can obtain the total preferences for the remaining candidate locations: TP2 = 3.19, TP3 = 3.21, TP4 = 2.98, and TP5 = 2.87. The candidate location Shanghai (k1) is therefore selected by the Taiwanese manufacturer as the best location for investing in and building a distribution center in China.
5 Conclusion
This paper proposes a fuzzy multiple criteria decision making model for selecting the distribution center location in China from a Taiwanese manufacturer's perspective. The fuzzy multiple criteria decision making model is tested on a real-world Taiwanese case, and the results show that the model seems promising. The model can be used to explain the decision-making procedure for a manufacturing company's distribution center location selection under a fuzzy multiple criteria decision-making environment. On the other hand, we can see that when selecting a distribution center, the Taiwanese manufacturer cares most about the present volume of cargoes, the potential volume of cargoes in the future, labor cost, transportation cost, operation cost, land cost, the efficiency of government departments, the co-operative relationship between the enterprise and government, the law on investment and investment restrictions, political stability, social stability, availability of land, labor quality, efficiency of Customs, future development, and private ownership of enterprise. Finally, Shanghai is selected as the best location for investing in and building the distribution center, because Shanghai has excellent advantages.
References 1. Adamo, J.M.: Fuzzy Decision Trees. Fuzzy Sets and Systems 4, 207–219 (1980) 2. Bellman, R.E., Zadeh, L.A.: Decision-making in a Fuzzy Environment. Management Science 17(4), 141–164 (1970) 3. Bevan, A., Estrin, S., Meyer, K.: Foreign Investment Location and Institutional Development in Transition Economies. International Business Review 13, 43–64 (2004) 4. Brenes, E.R., Ruddy, V., Castro, R.: Free Zones in E1 Salvador. Journal of Business Research 38(1), 57–65 (1997) 5. Campos, L., Verdegay, J.L.: Linear Programming Problems and Ranking of Fuzzy Numbers. Fuzzy Sets and Systems 32, 1–11 (1989) 6. Chang, P.C., Liu, C.H.: A TSK Type Fuzzy Rule Based System for Stock Price Prediction. Expert Systems with Applications 34(1), 135–144 (2008) 7. Chang, P.C., Wang, Y.W.: Fuzzy Delphi and Back-propagation Model for Sales Forecasting in PCB Industry. Expert Systems with Applications 30(4), 715–726 (2006) 8. Chen, C.T.: A Fuzzy Approach to Select the Location of the Distribution Center. Fuzzy Sets and Systems 118(1), 65–73 (2001) 9. Chen, S.H., Hsieh, C.H.: Graded Mean Integration Representation of Generalized Fuzzy Number. In: Proceeding of 1998 Sixth Conference on Fuzzy Theory and Its Application (CD-ROM), Filename: 031.wdl, Chinese Fuzzy Systems Association, Taiwan, Republic of China, pp. 1–6 (1998) 10. Chou, C.C.: The Canonical Representation of Multiplication Operation on Triangular Fuzzy Numbers. Computers & Mathematics with Applications 45, 1601–1610 (2003) 11. Chou, C.C.: The Representation of Multiplication Operation on Fuzzy Numbers and Application to Solving Fuzzy Multiple Criteria Decision Making Problems. In: Yang, Q., Webb, G. (eds.) PRICAI 2006. LNCS (LNAI), vol. 4099, pp. 161–169. Springer, Heidelberg (2006) 12. Chou, C.C.: A Fuzzy MCDM Method for Solving Marine Transshipment Container Port Selection Problems. Applied Mathematics and Computation 186, 435–444 (2007) 13. Dahlberg, M.D., May, J.H.: Linear Programming for Sitting of Energy Facilities. Journal of Energy Engineering 106, 5–14 (1980) 14. Delgado, M., Vila, M.A., Voxman, W.: On a Canonical Representation of Fuzzy Numbers. Fuzzy Sets and Systems 93, 125–135 (1998) 15. Deng, Y.: Plant Location Selection Based on Fuzzy TOPSIS. International Journal of Advanced Manufacturing Technology 28, 839–844 (2006) 16. Deng, Y., Cheng, S.: Evaluating the Main Battle Tank Using Fuzzy Number Arithmetic Operations. Defence Science Journal 56, 251–257 (2006) 17. Dubois, D., Prade, H.: Operations on Fuzzy Numbers. Journal of Systems Sciences 9, 613– 626 (1978) 18. Globerman, S., Shapiro, D.M.: The Impact of Government Policies on Foreign Direct Investment: the Canadian Experiences. Journal of International Business Studies 30(3), 513– 532 (1999) 19. Heilpern, S.: Representation and Application of Fuzzy numbers. Fuzzy Sets and Systems 91, 259–268 (1997) 20. Hsu, H.M., Chen, C.T.: Fuzzy Credibility Relation Method for Multiple Criteria Decisionmaking Problems. Information Sciences 96, 79–91 (1997) 21. Hwang, C.L., Yoon, K.: Multiple Attributes Decision Making Methods and Applications. Springer, Heidelberg (1981)
22. Kaufmann, A., Gupta, M.M.: Introduction to Fuzzy Arithmetic Theory and Applications. Van Nostrand, Reinhold (1991) 23. Li, R.J.: Fuzzy Method in Group Decision Making. Computers and Mathematics with Applications 38, 91–101 (1999) 24. Liang, G.S., Wang, M.J.: A Fuzzy Multiple Criteria Decision-making Method for Facilities Site Selection. International Journal of Production Research 29(11), 2313–2330 (1991) 25. Liou, T.S., Wang, M.J.: Ranking Fuzzy Numbers with Integral Values. Fuzzy Sets and Systems 50, 247–255 (1992) 26. Ma, M., Friedman, M., Kandel, A.: A New Fuzzy Arithmetic. Fuzzy Sets and Systems 108, 83–90 (1999) 27. Mizumoto, M., Tanaka, K.: The Four Operations of Arithmetic on Fuzzy Numbers. Systems Computers Controls 7(5), 73–81 (1976) 28. Nahmias, S.: Fuzzy Variables. Fuzzy Sets and Systems 1(2), 97–111 (1978) 29. Rietveld, P., Ouwersloot, H.: Ordinal Data in Multi-criteria Decision Making, a Stochastic Dominance Approach to Sitting Nuclear Power Plants. European Journal of Operational Research 56, 249–262 (1992) 30. Spohrer, G.A., Kmak, T.R.: Qualitative Analysis Used in Evaluating Alternative Plant Location Scenarios. Industrial Engineering 16, 52–56 (1984) 31. Stevenson, W.J.: Production/ Operation Management. Richard D. Irwin Inc., Illionois (1993) 32. Sun, Q., Tong, W., Yu, Q.: Determinants of Foreign Direct Investment across China. Journal of International Money and Finance 21, 79–113 (2002) 33. Tompkins, J.A., White, J.A.: Facilities Planning. John Wiley & Sons Company, New York (1984) 34. Wu, X., Strange, R.: The Location of Foreign Insurance Companies in China. International Business Review 9, 381–398 (2000) 35. Zadeh, L.: Fuzzy Sets. Information and Control 8, 338–353 (1965) 36. Zadeh, L.A.: The Concept of a Linguistic Variable and Its Application to Approximate Reasoning. Information Sciences 8, 199–249 (1975) 37. Zimmermann, H.J.: Fuzzy Set Theory and Its Applications, 2nd edn. Kluwer Academic Publishers, Boston (1991)
A Hierarchical Data Dissemination Protocol Using Probability-Based Clustering for Wireless Sensor Networks
Moonseong Kim¹, Matt W. Mutka¹, and Hyunseung Choo²
¹ Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
[email protected], [email protected]
² School of Information and Communication Engineering, Sungkyunkwan University, Suwon, 440-746, Korea
[email protected]
Abstract. A major challenge in designing a dissemination protocol for Wireless Sensor Networks (WSNs) is energy efficiency. Recently, researchers have studied this issue, and SPMS, a representative protocol, outperforms the well-known protocol SPIN. One of the characteristics of SPMS is that it uses the shortest path to minimize energy consumption. However, since it repeatedly uses the same shortest path, maximizing the network lifetime is impossible, although energy consumption is reduced. In this paper, we propose a Hierarchical data dissemination protocol using Probability-based clustering, called HiProc. It guarantees energy-efficient data transmission and maximizes the network lifetime. HiProc solves the network lifetime problem with a novel probability function related to the residual energy and the distance to a neighbor. The simulation results show that HiProc guarantees energy-efficient transmission and, moreover, increases the network lifetime by approximately 78% compared with SPMS. Keywords: Wireless Sensor Networks (WSNs), Data Dissemination Protocol, Energy Efficiency, Network Lifetime, SPIN, SPMS.
1 Introduction
In Wireless Sensor Network (WSN) environments, a sensor node not only transmits its own data but also relays data for other nodes. When some sensor nodes become energy-exhausted, the lifetime of the whole network may be reduced. Hence, an efficient low-power design is very important, not only for the network protocol but also for the operating system, middleware, and security [1]. Data dissemination is a fundamental feature of a WSN, since thousands of sensor nodes may collect, exchange, and transmit data [2]. Flooding and Sensor Protocols for Information via Negotiation (SPIN) [3], two well-known proactive schemes, have been employed for
data dissemination. First, in Flooding, each node retransmits the received data to all its neighbors. This is a very simple and primitive protocol. It rapidly disseminates the data; however, it quickly wastes energy when data implosion occurs [4][5]. Second, SPIN solves this issue by exchanging information and negotiating at each node. Even though SPIN exchanges high-level data descriptors called metadata to prevent data duplication, another energy problem remains: it transmits all data at the same power level and thus does not consider the distance to a neighbor. Therefore, SPIN is incapable of energy-efficient transmission. Recently, Khanna et al. proposed a protocol called Shortest Path Minded SPIN (SPMS) [6] to minimize energy consumption by using the shortest path and multiple hops to reach the destination. In order to use the shortest path, each node executes the Bellman-Ford algorithm within its zone, which is defined as the area a node can reach by transmitting at its maximum power level [7]. Thus, SPMS has to maintain a routing table. It wastes minimal energy since it uses the shortest path. However, since specific nodes on the shortest path are used repeatedly, the network lifetime may decrease. Moreover, once a node failure occurs, parts of the network might not be able to use the shortest path anymore. In this paper, we propose a Hierarchical data dissemination protocol using Probability-based clustering, called HiProc, which guarantees energy-efficient data transmission and strongly increases the network lifetime when the sensed data is disseminated throughout the entire network. For transmitting data, HiProc avoids the network lifetime problem of SPMS by selecting a path according to two attributes. The first attribute is the residual energy of each node: HiProc selects a node that has a high energy level in order to prevent a certain node from being selected repeatedly. The second attribute is the transmission distance between nodes, which guarantees more efficient energy consumption. The remainder of this paper is organized as follows. Section 2 explains previous dissemination protocols. Section 3 presents details of the proposed protocol. Section 4 evaluates our proposal and, finally, Section 5 concludes this paper.
2 Previous Work
Traditionally, with proactive dissemination protocols, the source node distributes the sensed data through the entire network. Flooding and SPIN, two well-known proactive schemes, have previously been employed for data dissemination. Flooding does not need a special scheme to disseminate the sensed data: the data are continuously sent to neighbor nodes until they reach the maximum hop count or the destination. Flooding is easy to implement but has some problems. Since in Flooding a node transmits data to its neighbors irrespective of whether or not a neighbor already has the data, SPIN was proposed to solve this data duplication problem. In SPIN, nodes negotiate with their neighbors before transmitting the data, which guarantees that only needed data will be transmitted. SPIN uses metadata, which describes the information, to negotiate successfully. The strength of SPIN is simplicity. Nodes make an uncomplicated decision whenever data are received, so little energy is wasted on computation. Furthermore, since an advertising message is very light compared to the sensed data, SPIN is able to distribute 50%
more data per unit energy than Flooding [3]. However, since SPIN transmits all data at the same power level, it does not consider the distance to a neighbor. Since energy consumption generally increases exponentially with distance, SPIN is incapable of energy-efficient transmission. SPMS employs the metadata concept used in SPIN and uses a multi-hop model for data transmission to avoid the exponential increase in energy consumption with distance [8]. However, for multi-hop routing, the next hop must be known before transmitting data. It is infeasible to maintain a network-wide routing table at each node, since sensor networks comprise thousands of nodes. Thus, SPMS maintains a routing table for a zone, which is defined by the maximum power level of a node, to reduce the cost of building the routing table. Each node can build a routing table with the shortest paths using the Distributed Bellman-Ford (DBF) algorithm. After building the routing table, SPMS begins data transmission. SPMS sends data through the shortest path except in the case of a node failure. When the predecessor node on the computed shortest path fails, the current node sends a request message to the node from which it initially received an advertising message, instead of using the shortest path. However, this may not guarantee energy-efficient transmission when the distance between the node that sent the advertising message and the current node is too long. Further, if the advertising node fails, there is no way to receive the data. To solve this problem, each node in SPMS maintains a Primary Originator Node (PRONE) and a Secondary Originator Node (SCONE). PRONE is an energy-efficient source from which data can be obtained, and SCONE is an alternative node that can be used if PRONE fails. As mentioned, SPMS has a mechanism to overcome the weakness caused by only using the shortest path. However, the network lifetime problem still persists.
3 The Proposed Dissemination Protocol: HiProc
3.1 Motivation and Design of HiProc
Owing to the weaknesses of the proactive protocols discussed above, the purposes of the protocol proposed in this paper are to guarantee energy-efficient data transmission and to increase the network lifetime. For energy efficiency, the transmission distance should be considered. For a long network lifetime, the energy consumption should be distributed evenly by considering the residual energy of each node. SPMS maximizes energy efficiency by transmitting data along the shortest path. However, this may result in a specific node being used repeatedly, since the same shortest path is selected every time; this is inefficient in terms of the network lifetime. Further, SPMS does not guarantee maximal energy efficiency in the case of a node failure, since parts of the network can no longer use the shortest path. In this paper, a clustering scheme is proposed that uses a probability function whose parameters are the residual energy and the distance to a neighbor. Here, clustering means that a subset of nodes, called the Cluster Heads (CHs), disseminate the sensed data. Since each node elects itself to be a CH according to the probability function, an appropriately chosen probability function strongly affects the performance of the clustering scheme. We propose a Hierarchical data dissemination protocol using Probability-based clustering, called HiProc, which
guarantees energy-efficient data transmission and substantially increases the network lifetime when the sensed data is disseminated throughout the entire network. First, we explain the basic concept of the clustering scheme in HiProc. As shown in Fig. 1, the source node (src) senses the data and broadcasts an advertising message (ADVold) to its neighbor nodes. A node that has received ADVold may elect itself to be a CH. The src transmits the data only to the CHs, and each CH broadcasts ADVnew to its neighbor nodes. When the remaining nodes that received ADVold also receive ADVnew, they request the data from their CH.
Fig. 1. The basic concept of clustering scheme in HiProc
Fig. 2. The energy metric from Equation (1) according to each logarithm base
Second, we propose the probability function to be used in the HiProc protocol. The proposed probability function simultaneously takes into account two properties: the level of residual energy and the distance to a neighbor. The energy metric formula is as follows.
$$f_e = -\log_b\!\left(m - \frac{e_{res}}{e_{ini}}\right) + n \quad (1)$$
$$m = \frac{b}{b-1}, \qquad n = 1 - \log_b(b-1) \quad (2)$$
Equation (1) is the energy metric function; its value lies between 0 and 1, corresponding to a low and a high energy level, respectively. Equation (2) follows from requiring Equation (1) to pass through the points (0,0) and (1,1). Here, eres and eini denote the residual energy and the initial energy, respectively. Fig. 2 shows the graph of Equation (1) for each logarithm base.
$$f_d = \frac{distance(node_1, node_2)}{transmission\ range} \quad (3)$$
Equation (3) is a distance metric function. If the distance between two nodes is close to the transmission range, then the value is near 1. In Fig. 3, two cases of the elected
(a) The elected CHs are located near the transmission range; (b) The elected CHs are located near the nodes that sent the data
Fig. 3. Two cases of choosing CH nodes
Fig. 4. The proposed probability function Pr(N)
CH nodes are depicted. Only one relay is needed to send the data in Fig. 3(a), whereas three relays are needed in Fig. 3(b); this means that unnecessary energy is consumed along the way during data transmission.
$$\Pr(N) = \left(\omega_e f_e + \omega_d f_d\right)^{2^{-(N-1)}} \quad (4)$$
where ωe, ωd ≥ 0 and ωe + ωd = 1. Equation (4) is the probability function used in HiProc. It takes fe and fd as its main parameters, and the influence of each attribute can be adjusted through the weight factors ωe and ωd. Because the exponent shrinks as the iteration count N grows, Pr(N) increases with N and reaches 1 within a few iterations, as shown in Fig. 4.
3.2 Operation Overview for HiProc Protocol
In HiProc, a node can be in one of four states. Before describing the proposed protocol, we first define the following terms; the detailed pseudocode is given below.
• State of nodes
• Unclustered Node (UN): A node that has received no message up to now and has no relation with the data.
• Cluster Head (CH): A node that has been elected as a cluster head.
• Cluster Candidate (CC): A node that has received an advertising message but has not yet determined its CH.
• Cluster Member (CM): A node that was a CC, has determined its CH, and requests the data from that CH.
HiProc(ωe, ωd)
01. Data is generated;
02. The sensing node disseminates the advertising message, ADVold, to its neighbor nodes;
03. N = 1;
04. A node that has received ADVold determines whether or not it is CH using Pr(N); that is, if random(0,1) ≤ Pr(N), then the node is CH;
05. If (a node is elected as CH)
06.   Disseminate the advertising message, ADVnew, to its neighbor nodes;
07.   A node that has received the ADVnew message becomes CC;
08. Else
09.   The node becomes UN;
10. While (1)
11.   ++N;
12.   Each UN redetermines whether or not it is CH using Pr(N) after some waiting time;
13.   If (a node is elected as CH)
14.     Disseminate the advertising message, ADVnew, to its neighbor nodes;
15.     A node that has received the ADVnew message becomes CC;
16.   Else
17.     The node becomes UN;
18.   If (all nodes are CH or CC)
19.     Break;
20. Each CC determines its CH by comparing the signal strengths of the received ADVnew messages, and becomes CM;
21. Each CH requests the data from the node that sent ADVold;
22. Each CM requests the data from its CH;
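The CH-election test in lines 04 and 12 of the pseudocode can be written out directly from Equations (1)–(4). The following Java sketch uses base b = 10, the value adopted in Section 4; the class and method names are illustrative and not taken from the paper.

```java
import java.util.Random;

/** Illustrative sketch of the probabilistic CH election in HiProc. */
class ChElection {
    static final double B  = 10.0;                              // logarithm base b
    static final double M  = B / (B - 1.0);                     // m of Eq. (2)
    static final double N0 = 1.0 - Math.log(B - 1.0) / Math.log(B); // n of Eq. (2)

    /** Energy metric fe of Eq. (1); eRes/eIni lies in [0,1]. */
    static double fE(double eRes, double eIni) {
        return -Math.log(M - eRes / eIni) / Math.log(B) + N0;
    }

    /** Distance metric fd of Eq. (3). */
    static double fD(double distance, double txRange) {
        return distance / txRange;
    }

    /** Pr(N) of Eq. (4); the weights must satisfy wE + wD = 1. */
    static double pr(int n, double wE, double wD, double fe, double fd) {
        return Math.pow(wE * fe + wD * fd, Math.pow(2.0, -(n - 1)));
    }

    /** One election trial, as in lines 04 and 12 of the pseudocode. */
    static boolean electCH(int n, double wE, double wD,
                           double fe, double fd, Random rng) {
        return rng.nextDouble() <= pr(n, wE, wD, fe, fd);
    }
}
```

For example, with ωe = ωd = 0.5, fe = 0.6 and fd = 0.8, Pr(1) = 0.7, Pr(2) ≈ 0.84 and Pr(3) ≈ 0.91, which reproduces the rapid approach to 1 sketched in Fig. 4.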
Fig. 5 shows how CHs are elected and how the sensed data is disseminated. The source node (src) first senses the event and broadcasts ADVold to the nodes within its transmission range. We assume that the node A is elected as CH in Fig. 5(a). The node A also disseminates ADV1new to its neighbor nodes as shown in dotted line in Fig. 5(b). The nodes that have received both ADVold and ADV1new become CC. If the nodes B and C are elected as CH at the next iteration in Fig. 5(c), they also broadcast ADV2new to its neighbor nodes and the nodes that have received both ADVold and ADV2new become CC as well. If all nodes that have received ADVold are CH or CC,
Fig. 5. Operation scenario for HiProc (panels (a)–(d): node A is elected as a CH with Pr(1); nodes B and C are elected with Pr(2))
then each CC determines its CH by comparing signal strengths of the received ADVnew and becomes CM. Finally, CH and CM request the data from the src and its CH, respectively. Therefore, the sensed data could be shared with all nodes.
4 Performance Evaluation
We implemented the proposed HiProc in Java to evaluate its performance. The main parameters are listed in Table 1. Sensor nodes with a 35 m transmission range are randomly distributed over an area of 250 m × 250 m. Each node has an initial energy of 0.5 J. The control messages – advertising and request – are 16 bytes each, while the sensed data is 500 bytes. We use the following energy model [8].
$$E_t = \alpha_{11} + \alpha_2 d^2 \quad (5)$$
$$E_r = \alpha_{12} \quad (6)$$
where Et denotes the energy consumed to transmit a bit over a distance d, and Er the energy consumed to receive a bit. Hence, the energy consumption for a relay is Er + Et. The energy model uses α11, α12 = 80 nJ/bit and α2 = 100 pJ/bit/m².
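A direct transcription of this energy model with the constants of Table 1 is given below; it is only a sketch for checking the numbers, not the simulator used in this section.

```java
/** Illustrative first-order radio energy model of Eqs. (5) and (6). */
class EnergyModel {
    static final double ALPHA11 = 80e-9;    // J/bit, transmitter electronics
    static final double ALPHA12 = 80e-9;    // J/bit, receiver electronics
    static final double ALPHA2  = 100e-12;  // J/bit/m^2, transmit amplifier

    /** Energy to transmit one bit over distance d (metres), Eq. (5). */
    static double txPerBit(double d) { return ALPHA11 + ALPHA2 * d * d; }

    /** Energy to receive one bit, Eq. (6). */
    static double rxPerBit() { return ALPHA12; }

    /** Energy to relay a packet of 'bytes' bytes over distance d. */
    static double relay(int bytes, double d) {
        return 8.0 * bytes * (rxPerBit() + txPerBit(d));
    }
}
```

For instance, relaying the 500-byte data packet over the full 35 m transmission range costs 8·500·(80 nJ + 80 nJ + 100 pJ·35²) ≈ 1.13 mJ, the same order of magnitude as the per-event values plotted in Fig. 6.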
Table 1. Simulation variables
Network size: 250 m × 250 m
Initial energy: 0.5 J
α11, α12: 80 nJ/bit
α2: 100 pJ/bit/m²
Transmission range: 35 m
Packet size: ADV, REQ – 16 bytes; DATA – 500 bytes

Fig. 6. Average energy consumption for an event per node, plotted against density (nodes/m²): (a) HiProc(1.0, 0.0), HiProc(0.5, 0.5), HiProc(0.0, 1.0), SPMS, and SPIN; (b) the three HiProc configurations only
Here, α11 is the energy per bit consumed by the transmitter electronics, α2 is the energy dissipated in the transmit op-amp, and α12 is the energy per bit consumed by the receiver electronics. We set the logarithm base to 10 in Equation (1). Fig. 6 shows each scheme's average energy consumption as a function of density. SPIN has the highest energy consumption. SPMS performs better than SPIN, although its energy consumption grows with density compared with that of HiProc. The energy-efficiency improvement of HiProc is up to 33% over SPIN and 13% over SPMS. Unlike SPIN and SPMS, the energy consumption of HiProc does not increase greatly with density. Fig. 6(b) illustrates the energy consumption of HiProc for each ωd; the energy efficiency improves as ωd increases toward 1. Fig. 7 shows the network lifetime as a function of density. Because HiProc distributes the data over the entire network while considering the residual energy, its network lifetime is significantly prolonged compared with the other protocols. In particular, when ωe is 1, HiProc has the longest network lifetime, with an improvement of 71–78% over SPMS. Finally, we continuously generate 400 events. As shown in Fig. 8, SPIN depletes energy very unevenly, whereas HiProc distributes the energy consumption evenly.
Fig. 7. Network lifetime (number of events) as a function of density (nodes/m²) for HiProc(1.0, 0.0), HiProc(0.5, 0.5), HiProc(0.0, 1.0), SPMS, and SPIN
Fig. 8. Distribution of energy consumption: (a) SPIN, (b) HiProc(1,0), (c) energy scale (low–high)
5 Conclusion
This paper reviewed proactive dissemination protocols for WSNs, such as Flooding, SPIN, and SPMS, and proposed a Hierarchical data dissemination protocol using Probability-based clustering, called HiProc. HiProc addresses both the energy consumption and the network lifetime problems through a novel probability function based on the residual energy and the distance to a neighbor. We compared HiProc with SPIN and SPMS through simulation experiments and showed that its performance is superior to these earlier protocols: HiProc provides energy-efficient transmission and a longer network lifetime. Furthermore, HiProc is a more flexible dissemination protocol because it uses two weighted attributes.
Acknowledgments. The authors thank Jihoon Cho for supporting the simulation. This research was supported in part by MKE (Korea) under ITRC IITA-2008-(C10900801-0046) and NSF (USA) Grants No. OCI-0753362, CNS-0721441, and CNS-0551464.
References 1. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless sensor networks: a survey. Computer Networks 38, 393–422 (2002) 2. Al-Karaki, J.N., Kamal, A.E.: Routing techniques in wireless sensor networks: a survey. Wireless Communications 11(6), 6–28 (2004) 3. Heinzelman, W.R., Kulik, J., Balakrishnan, H.: Adaptive Protocols for Information Dissemination in Wireless Sensor Networks. In: Proceeding of MOBICOM. ACM/IEEE (1999) 4. Ni, S.-Y., Tseng, Y.-C., Chen, Y.-S., Sheu, J.-P.: The Broadcast Storm Problem in a Mobile Ad Hoc Network. In: Proceeding of MOBICOM, pp. 151–162. ACM/IEEE (1999) 5. Le, T.D., Choo, H.: Efficient Flooding Scheme Based on 2-Hop Backward Information in Ad Hoc Networks. In: Proceeding of ICC, pp. 2443–2447. IEEE, Los Alamitos (2008) 6. Khanna, G., Bagchi, S., Wu, Y.-S.: Fault Tolerant Energy Aware Data Dissemination Protocol in Sensor Networks. In: Proceeding of DSN, pp. 739–748. IEEE, Los Alamitos (2004) 7. Haas, Z.J., Pearlman, M.R.: The performance of query control schemes for the zone routing protocol. Transactions on Networking 9(4), 427–438 (2001) 8. Bhardwaj, M., Garnett, T., Chandrakasan, A.P.: Upper bounds on the lifetime of sensor networks. In: Proceeding of ICC, vol. 3, pp. 785–790. IEEE, Los Alamitos (2001)
An OWL-Based Knowledge Model for Combined-Process-and-Location Aware Service Gunhee Kim, Manchul Han, Jukyung Park, Hyunchul Park, Sehyung Park, Laehyun Kim, and Sungdo Ha Intelligence and Interaction Research Center, Korea Institute of Science and Technology, Seoul, Korea {kani,manchul.han,parkjk,hyunchul.park,sehyung, laehyunk,s.ha}@kist.re.kr
Abstract. This paper presents a knowledge model for spatiotemporal context awareness. The knowledge model is designed to understand user goals and to guide users who perform complicated tasks involving several routes, such as tasks at a hospital. An information system should be able to consider both the location and process context to provide relevant guidance regarding tasks, their location, and their process, as location and process information are collaboratively associated. A service that is both process- and location-aware is considered, and a knowledge model that represents the semantics of the proposed process-and-location-aware service is given. Keywords: spatiotemporal context, process, location, knowledge model, OWL (Web Ontology Language).
1 Introduction Context-aware computing has been drawing much attention with the development of u-computing environments. A context-aware system can help users concentrate on their tasks with reduced complexity and can provide users relevant guidance when performing their tasks. Context awareness or situation awareness can be defined as the answer to “what is going on?” [1]. For instance, in a case in which a user is watching a football game without any background knowledge, it would be difficult to infer what is going on. In short, situation awareness is the inference of a situation using background knowledge. Recently, a number of applications have been developed to provide context-aware services, and various context models have also been designed to support these applications. Baldauf et al. [2] present architectures of context-aware systems as well as approaches for context models. Among them, the ontology-based approach has been used widely due to its formal expressiveness and reasoning techniques. Krummenacher and Strang [3] defined contexts for a context-aware service and suggested criteria for context and ontology modeling.
In addition, research has centered on a context-aware system with a knowledge model. Ongenae et al. [4] suggested an ontology-based system for context-aware hospital nurse call optimization. The context-aware call system has a context model that includes the status of the nurse, the characteristics of the nurse (gender, language, etc), and the location of the patient and the nurse. With the model, patients can walk around freely, select their preferred nurse (gender, language, etc), and nurses can identify whether calls are urgent or normal. An ontology-based personalized pathfinding system using deviation detection for individuals with cognitive impairment [5] was suggested by Chang et al. They suggested a system and developed a knowledge model for guiding cognitively impaired people to their work places, by showing pictures at crucial places and times. The system considered the leaving time and the expected time of arrival to an area when alerting the system of a problem, using the person’s ID for personalization. Niu and Kay present Pervasive Personalization of Location Information: Personalized Context Ontology (PECO) [6]. The purpose of this system is to provide personalized information to a user using an ontology model with seven parts: location sensors, building plans, technical building data, staff directories, internal email aliases, direct user input. However, these knowledge models of the systems are designed only to focus on location-awareness or limited situations. To provide users guidance with accomplishing complicated tasks, it is necessary to understand the current process and location context together, as the process and the location context are collaboratively associated. This paper presents the design of the knowledge model for a spatiotemporal context-aware service and presents a knowledge model which enables systems to recognize and adapt to changing situations. To validate the proposed knowledge model, scenarios in general hospitals are presented and instances of the knowledge model are developed for the scenarios.
2 System and Considerations A process and location-aware information service system [7] that helps users to handle tasks with complicated procedures was designed. The system provides information via a mobile device such as a PDA or a smart phone regarding what to do, how to do it, and where to go while performing complicated tasks in unfamiliar public places such as hospitals. To give a user relevant guidance, the system first should recognize the time and location from the process data of a local information system and the location data of sensor networks, and must then infer the tasks and location with which the users should proceed. To understand users’ situation and provide relevant information, the system should be able to consider both the process and location context because process and location are collaboratively associated. Moreover, the system should be able to adapt to changes in time and space situations, as processes and locations can change frequently as a task is being completed. Thus, a knowledge model of a context-aware information guide system should be able to reflect the following two features: interrelation between the process and the location, and adaptation to changes of the spatiotemporal context.
2.1 Interrelation between Process and Location The relationships between the process and location should be expressed collaboratively in the knowledge model because process and location information can be helpful to interpret each context. The location information of a user can be grounds for inferring a user’s task or processes. Additionally, a system should be able to infer the location of the user from process information and provide relevant information services; the meaning of a particular location can be interpreted differently according to a user’s goals and processes. Figure 1 shows the difference between a location-based and the proposed-processand- location-based knowledge structure. In the location-based service, the system provides information only related to the user’s current location. It is difficult to guide the user in terms of his or her next location and process. Moreover, if there are several possible tasks or services at one place, this system cannot easily decide which of these is necessary within the current user’s situation. In comparison, the proposed system can provide information regarding the next service and location as well as the most relevant information among possible candidates related the current location based on recognition of user’s spatiotemporal situation.
Fig. 1. Comparison between the location-based and the combined- process- and- location-based knowledge structure
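One way to read the contrast in Fig. 1 is as a difference in how guidance is looked up. The fragment below is a deliberately simplified illustration, not the authors' implementation, and all names in it are invented: a location-based service indexes information by place alone, whereas the combined service indexes it by place and current process step.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative contrast between the two knowledge structures of Fig. 1. */
class GuidanceLookup {
    // Location-based: one answer per place, regardless of the user's task.
    Map<String, String> byLocation = new HashMap<>();

    // Combined: the answer depends on the place AND the current service.
    record Situation(String location, String currentService) {}
    Map<Situation, String> byLocationAndProcess = new HashMap<>();

    /** Prefer the process-and-location answer; fall back to location only. */
    String guide(String location, String currentService) {
        String combined = byLocationAndProcess.get(new Situation(location, currentService));
        return combined != null ? combined : byLocation.get(location);
    }
}
```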
2.2 Adapting to a Change of the Spatiotemporal Context The system should have a flexible knowledge structure to adapt to changes of the spatiotemporal context, as process and location procedures can vary according to ongoing results or to changes of the user’s intentions. At a hospital, for instance, the patient’s procedure can be modified by the result of a doctor’s examination. In addition, if a patient would like to rest or use the toilet before going to the doctor’s office, the route information should be modified. Figure 2 shows an example of a process adaptation. The initial service sequence (a)->(b)->(c) becomes (a)->(b)->(d)>(e)->(c) due to the result of (b). According to the modified process, the guide information about what to do, how to do it, where to go, and how to go there also changes.
Fig. 2. Adaptation to a context change
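The adaptation in Fig. 2 amounts to editing the remaining service sequence when an intermediate result arrives. A minimal sketch of such an update, with invented names, might look as follows.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative adaptation of a remaining service sequence (cf. Fig. 2). */
class ProcessAdapter {
    /** Insert new services before a given pending service, e.g. turning
     *  (a)->(b)->(c) into (a)->(b)->(d)->(e)->(c) after (b) completes. */
    static List<String> insertBefore(List<String> remaining,
                                     String pivot, List<String> added) {
        List<String> updated = new ArrayList<>(remaining);
        int i = updated.indexOf(pivot);
        updated.addAll(i < 0 ? updated.size() : i, added);
        return updated;
    }
}
```

Calling insertBefore(List.of("Payment"), "Payment", List.of("X-ray inspection")) reproduces the change described in the hospital scenario of Section 4, where an X-ray inspection is added before Payment once the examination has finished.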
3 Knowledge Model A knowledge model for a context-aware information guide service that requires consideration of both process and location situations was designed. The knowledge model expresses the relationships between the process and location and has a flexible structure for changes of the spatiotemporal context. The knowledge model includes three main parts: the process, the location, and the user. The process part represents the Process, Service, and Task classes. They are linked to the user and the location classes where they are performed. In the location part, classes are classified into two layers: a physical layer and a logical layer. The classes included in the physical layer represent real-world entities such as rooms. On the other hand, the classes of the logical layer are defined by their use, such as waiting
Fig. 3. Classes and relations
Table 1. Definition of classes of the knowledge model
Primary Class / Sub Class / Definition
User Context / Goal / A final aim of a user
Process Context / Process / A flow of Services for accomplishing the user's goal
Process Context / Service / A unit operation such as an examination or inspection
Process Context / Task / An activity that the user has to do for the Service
Location Context (Logical) / Area / A place where the Services are executed
Location Context (Logical) / Zone / A place where the Task is executed
Location Context (Physical) / Floor / A level inside a building
Location Context (Physical) / Section / An abstract physical space including Room, Aisle, and Hall
Location Context (Physical) / Room / A space within a building enclosed by a ceiling and walls
Location Context (Physical) / Aisle / A passage between rows of rooms
Location Context (Physical) / Hall / An open space linked to rooms and aisles
places, dental departments, etc. Lastly, the User, Goal, and User type classes are defined in the user part. The overall structure of the knowledge model is illustrated in Figure 3, and the definitions of each class are described in Table 1. Figure 3 also represents the relationships between the classes. In particular, the logical location classes play the important role of inferring the relationships between the process classes and the physical location classes. The Area class is defined to match the Service class; the Area denotes the places where the Services occur. In addition, the Zone class is defined on the same level of the Task class, which represents the detailed activities of the user as they pertain to services. At a hospital, for example, a dental examination can be an instance of the Service class, and the dental department can be defined as an instance of the Area class, which is related to the Service of the dental examination. The dental department instance also contains the Floor and Section information so that the guide system can provide location information to patients who are set to receive a dental examination. 3.1 Process Context Model A Process Context model was designed to provide users with guidance regarding what to do and how to do it according to the users’ goal. Figure 4 presents the process model. This model has three classes: Process, Service, and Task. The Process consists of a flow of the Services they are needed for the goal. These Services in the Process can be swapped with other Services depending on ongoing results. Each Service can also have a sequence of the Tasks that users should carry out for the Service. With the Process Context, the system can recognize whether or not users are doing well with the process and can provide users with relevant guidance when they have problems. If a user is waiting for his/her turn in a waiting area without the task ‘take a number’,
Fig. 4. Process Context model
the guide system can inform the user that he/she should take a number for his/her service. It also tells the user how to do this. 3.2 Location Context Model The Location Context model was designed to recognize a user’s current position in addition to his or her next destination with the process context. From the location context, the system can provide users with route guidance. Figure 5 describes the Location Context model classified into the Logical Location and Physical Location. The Logical location has Area and Zone classes, and Zones are linked to other Zones so that the path of the user can be expressed as a sequence of Zones. Additionally, the Physical Location includes the Floor and Section classes. The Section class has three subclasses: Room, Aisle, and Hall classes.
Fig. 5. Location Context model
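Although the model itself is expressed in OWL (Section 4.2), the class structure of Figs. 3–5 and Table 1 can be paraphrased with a few plain Java types to make the relations concrete. Only the class names mirror the paper; the representation below is an illustration, not the actual ontology.

```java
import java.util.List;

/** Illustrative paraphrase of the main classes of the knowledge model. */
class KnowledgeModel {
    // Process context: a Process is a flow of Services, each with its Tasks.
    record Task(String name, Zone executedIn) {}
    record Service(String name, Area executedIn, List<Task> tasks) {}
    record Process(String goal, List<Service> services) {}

    // Logical location context, linked to the physical layer.
    record Zone(String name, Section section) {}
    record Area(String name, List<Zone> zones) {}

    // Physical location context: Section kinds are Room, Aisle, or Hall.
    record Section(String name, String kind, Floor floor) {}
    record Floor(String name) {}
}
```

For the hospital example, the dental examination Service would point to the dental department Area, and the Zone for Doctor Kim's room (B211) would carry the dental-examination Task, mirroring the relationships later shown in Fig. 7.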
4 An Example of a Knowledge Model 4.1 Hospital Scenarios In order to design a more practical model, a general hospital was chosen as the domain for this research. The services of hospitals and users’ tasks were analyzed before the knowledge model was designed.
It was assumed that a patient was visiting the hospital to resolve a toothache. During the first step, the process of the patient is set up as a sequence of basic services that include Reception, Examination and Payment. The remaining processes are modified after the completion of each step. After the Examination (Step 2), the doctor requests an X-ray, and the remaining process, Payment, is changed by adding the X-ray inspection. During the last step, the remaining process is only Payment. The scenario shows how the proposed knowledge model reflects the process changes after Step 2. Figure 6 shows the context information at each step.
Fig. 6. Hospital scenarios
4.2 Implementation The knowledge model was built with the OWL language, and a scenario-based prototype system was implemented to validate the developed knowledge model. The knowledge model was built using Protégé. Figure 7 shows an example of how the ontology is written in OWL/XML. The Sequence represents the flow of the Tasks, and the flow of the Service is represented as shown below.
Figure 7 also represents the relationships between the Logical and the Physical Locations as well as the relationships between the Services and Tasks. Here, the current location is Dental Surgery Doctor Kim’s room with room number B211. It is a Zone of the Dental surgery Area and provides dental services. The Task of the zone is the dental examination. With these relationships, the reasoning engine RacerPro can infer the
Fig. 7. An OWL expression for the Zone and its related Tasks
location of the Zone, what Services are executed in the Zone, and other information. The system can then provide the user spatiotemporal-context-aware guidance.
5 Conclusion In this paper, an OWL-based knowledge model for context-aware system is proposed that considers both the location and the process. The knowledge structure of this model expresses the interrelationships between the process and the location classes as well as adaptations to changes in the spatiotemporal context. With the knowledge model, the system can recognize the combined process- and location-based context of users and can provide users with relevant guidance. The knowledge model was not designed only for a specific domain; hence, the domain of the knowledge model can be expanded easily to other public places such as schools. The knowledge model and the system will be applied to actual public places in future studies in an effort to validate the overall usefulness of the proposed system. Acknowledgments. This work was supported by the IT R&D program of MKE/IITA. [2008-F-045-02, Development of Digital Guardian technology for the disabled and the aged person]
References 1. Kokar, M.M., Matheus, C.J., Baclawski, K.: Ontology-based situation awareness. Information fusion (2007) 2. Baldauf, M., Dustdar, S., Rosenberg, F.: A Survey on Context-Aware Systems. International Jounal of Ad Hoc and Ubiquitous Computing (2007) 3. Krummenacher, R., Strang, T.: Ontology-Based Context-Modeling. In: Third Workshop on Context Awareness for Proactive Systems (CAPS 2007) (2007)
4. Ongenae, F., Strobbe, M., Hollez, J., Jans, G.D., Turck, F.D., Dhaene, T., Demeester, P.: Ontology Based and Context-Aware Hospital Nurse Call Optimization. Complex, Intelligent and Software Intensive Systems (2008) 5. Chang, Y., Wang, T., Chuang, Y., Tsai, S.: Ontology-based Personalized Wayfinding System Using Deviation Detecting for Individuals with Cognitive Impairments. In: International Conference on Convergence Information Technology, pp. 1844–1848 (2008) 6. Niu, W.T., Kay, J.: Pervasive Personalisation of Location Information; Personalised Context Ontology. In: Adaptive Hypermedia Conference Website, pp. 143–152 (2008) 7. Han, M., Kim, G., Park, S., Kim, L., Ha, S.: Process and Location-aware Information Service System for the Disabled and the Elderly. In: 13th International Conference on Human-Computer. Springer, Heidelberg (2009)
Human-Biometric Sensor Interaction: Impact of Training on Biometric System and User Performance Eric P. Kukula1 and Robert W. Proctor2 1 Department of Industrial Technology, Purdue University, Biometric, Standards, Performance, & Assurance Laboratory 401 North Grant Street, West Lafayette, IN 47907-2021 USA 2 Department of Psychological Sciences, Purdue University 703 Third Street, West Lafayette, IN 47907-2081 USA {kukula,rproctor}@purdue.edu
Abstract. Increasingly sophisticated biometric methods are being used for a variety of applications in which accurate authentication of people is necessary. Because all biometric methods require humans to interact with a device of some type, effective implementation requires consideration of human factors issues. One such issue is the training needed to use a particular device appropriately. In this paper, we review human factors issues in general that are associated with biometric devices and focus more specifically on the role of training. Keywords: biometrics, human-biometric sensor interaction (HBSI), human performance, instruction, training.
1 Introduction Biometrics is defined as the automated recognition of behavioral and physiological characteristics of individuals [1]. Biometric methods are used to authenticate individuals in a multitude of scenarios, including, but not limited to access control, identity management, and time and attendance. Some biometric technologies include: fingerprint, facial, and iris recognition; hand geometry; vein pattern; and signature dynamics. These biometric technologies may be stand-alone systems, utilized as part of a multifactor authentication system (e.g., combined with physical possessions, such as an identification card, or knowledge, such as a personal identification number), or merged to form a multimodal biometric system (e.g., fingerprint and face) with applications varying widely in scope. Applications for biometrics range from military use, such as base access control and third country nationals identification, to protecting consumer information stored on electronic devices such as laptops, personal digital assistants, and cellular phones. These deployments have varied in scale and application, and the outcomes of such implementations have varied as well. However, as the biometrics community analyzes lessons learned, one continually discussed area is that of usability and design of biometric devices [2-7]. Questions that arise include: Can current biometric devices provide rapid and accurate responses when needed? Do existing state-of-the-art
devices function properly in all applications and environments? Can military personnel capture the biometric sample from a person of interest and not feel endangered? Are consumers satisfied with biometric devices, and do they understand how the devices work? The goal of research in the area of human-biometric sensor interaction (HBSI) is to address usability issues raised by questions of the above type and investigate them in order to develop the next generation of universally usable biometric systems. Historically, the biometrics community has performed limited work in the area of human-computer interaction (HCI), ergonomics, and usability. To reach the goal of universally usable biometric systems, however, further understanding of the user is needed. This understanding includes the exact nature of the physical and cognitive interactions that humans have with the sensor and biometric system, how users can best learn to use biometric technologies successfully, and to what extent users can transfer knowledge regarding use of one biometric system to use of another. Biometrics are ideally supposed to possess five desirable properties [8]: (1) universality – the biometric property is available on all people; (2) invariance – the features extracted are non-changing; (3) high intra-class variability – the features extracted from one user are distinct from those of all other users; (4) acceptability – the characteristic is suitable for use by everyone; (5) extractability – a sensor can extract the features presented in a repeatable manner. Although commonly described in the literature as the ideal characteristics of a biometric measure, each property must overcome challenges, as the majority of biometrics are challenged to satisfy all of these five categories. Though early research was concerned mainly with the design, development, and testing of biometric systems and algorithms, recent research has shown that human physical, behavioral, and social factors affect performance of the overall biometric system. This leads to an approach called biometric system ergonomic design (see Fig. 1). The basic idea is that in order for a biometric system to function properly, ergonomic design principles must be incorporated into its design. Factors to consider include the following. The physical environment may alter the biometric characteristics of a person so as to render that person unidentifiable in particular environmental contexts. Determining those physical limits to use a biometric system is important. Users may be uncertain about where, or how, to position themselves to the biometric sensor to get a valid reading, and they may have concerns about using biometric devices for certain applications and in certain contexts. User acceptance of any biometric system is a factor that must be considered in its implementation. As with any technology with which people are not familiar, instruction and training are necessary when a biometric device is first used. Training is included in biometric system ergonomic design. Training can provide users with knowledge about the properties of the system and with the required steps for their interaction with it. To the extent that biometric devices within a modality have similar features and functions, this training may be generalizable across the set of biometric devices for an entire modality, requiring no additional training when a new device is encountered. 
To the contrary, if biometric devices within a modality differ in key details regarding their operation, negative transfer may occur in which procedures learned to interact with one device may interfere with use of another device. General principles of training,
Fig. 1. Issues that affect biometric system performance and the relationship with the HBSI [9]
described later, can provide a basis for studying training within the specific context of use of biometric devices. If the biometric device is monitored by an assistant, the assistant may instruct the user about the steps in using the device. In most cases, though, an assistant will not be available, in which case training materials must be simple and easy to comprehend. They can consist of a poster that illustrates the steps to be taken by the user, possibly complemented by recorded audio instructions. Feedback is important in learning and performing any perceptual-motor act [10], and it therefore is important that feedback be provided regarding factors such as position of the hand and whether the process was completed successfully. When unsuccessful, the feedback should help enable the user to know what s/he should do to get a successful reading.
2 Biometric System Performance Traditional approaches to evaluate the performance of a biometric system have been system-level, meaning that evaluators and designers are more interested in system reported error rates, some of which include: Failure to Enroll (FTE) rate, Failure to Acquire (FTA) rate, False Accept Rate (FAR), and False Reject Rate (FRR). Traditional performance evaluations have worked well to evaluate emerging technologies, new biometric modalities, and algorithm revisions. Moreover, since biometrics entered the commercial marketplace, most research has been dedicated to development in three areas: improving performance, increasing throughput, and decreasing the size of the sensor or hardware device. Limited research has focused on ergonomic design and usability issues relating to how users interact and use biometric devices. Furthermore, limited research has examined human performance, specifically how training methods or techniques impact the HBSI and biometric performance. 2.1 Development of the Human-Biometric Sensor Interaction Any interaction of a human with a biometric sensor requires a series of steps. For example, human interaction with a biometric device that captures the fingerprint of a single finger will typically include the following [11]: (1) visually sight a prompt on a display terminal that prompts the user to place a finger on the fingerprint reader; (2) visually sight the fingerprint reader; (3) move a hand towards the fingerprint reader until it is in close proximity; (4) rotate the hand until the palm side is down; (5) extend a finger until it fits over the fingerprint reader; (6) visually sight the display terminal for confirmation that the fingerprint read was successful. Because several steps are involved in this interaction, task completion time may be long, errors of
various types may occur, and, if the problems are very great, user resistance to using the device will increase (see [12] for more extended discussion of these factors). Errors can involve performing the steps out of order, failing to position the finger properly on the fingerprint sensor, failing to keep finger still for long enough to allow the fingerprint to be read, and so on. Seminal research and publication in the area of usability and accessibility, which was concerned with biometric system ergonomic design, were pioneered by the User Research Group at National Cash Register (NCR) [4, 13]. The United Kingdom Home Office Identity and Passport Service has also published reports based on their biometric trials and implementations which discuss biometric usability and ergonomic design [14]. Maple and Norrington [6] have reported on one particular trial of the United Kingdom’s Passport Service Trial Program and its usability, and found issues with each of the three evaluated biometric systems: fingerprint, face, and iris recognition systems. More recently, work in this area has focused on: • Creating an evaluation method for biometrics that examines biometric performance and usability [15, 16], • Determining “optimal” device heights for hand geometry [17] and fingerprint recognition [18], • Usability studies involving a ten print fingerprint capture device [19] and comparison between swipe and large area fingerprint sensors [20], and • Impact of demography on biometric sample quality and performance [21-23]. Research in the area of biometric system ergonomic design has been called the Human-Biometric Sensor Interaction, or HBSI and is shown in Fig. 2. The model shows how the different areas of research and principles of the different fields of ergonomics [26], usability [27], and biometrics [28] converge in the overlapping HBSI area.
Fig. 2. The Human-Biometric Sensor Interaction Model [9, 15, 16, 21, 24]
2.2 The Human-Biometric Sensor Interaction The human and sensor components of the HBSI model are similar to Tayyari and Smith’s human-machine interaction model [26]. Much like the traditional model, the
human and biometric sensor components look to achieve the optimal relation between humans and a biometric sensor in a particular environment. The overlap of these two sections is best summarized by ergonomics, with the goal of adapting the sensor so the presentation of a user’s biometric traits to the sensor is more natural to the user. The human and biometric system components of the HBSI model are arranged in the model to accommodate the way biometric sensors, software, and implementations occur and are presented to users. A biometric sensor must not only be designed so that a user can interact with it in a repeatable fashion, but the sensor(s), software, and the way the entire “system” is packaged must be usable. According to ISO 9241-11 [27] usability is comprised of three factors: effectiveness, efficiency, and satisfaction. Each of the three metrics is distinct and important to understand for products to balance between the three. First, biometric systems must be effective, meaning users are able to complete the desired tasks without too much effort. Second, biometric systems must be efficient, meaning users must be able to accomplish the tasks easily and in a timely manner. Third, users must like, or be satisfied, with the biometric system, or they will discontinue use and find alternative methods to accomplish the task. As mentioned in the previous two sections, users need to be able to interact with a sensor in a consistent manner over time, and users must find the entire biometric system usable. To enable this to occur, the third relationship of the HBSI conceptual model emerges – the sensor-biometric system, whose key metric is sample quality. Sample quality is the link between these two components because the image or sample acquired by the biometric sensor must contain the characteristics or features needed by the biometric system to enroll or match a user in the biometric system. It is well documented in the literature that sample quality affects the biometric matching algorithm. Yao, Pankanti, and Haas [29] stated that “in a deployed system, the poor acquisition of samples perhaps constitutes the single most important reason for high false reject/accept rates.” So, not only does the human-sensor relationship need to be functional and the human-biometric system need to be usable, the sensor-biometric system needs to be functional. An efficient sensor-biometric system only occurs if the sensor can capture and pass usable features to the biometric matching algorithm.
Fig. 3. The HBSI Evaluation Method [15, 16, 24]
To evaluate the model, the overlap of the different components in Fig. 2 has been expanded to reveal the HBSI Evaluation Method (Fig. 3), which addresses how each area of overlap can be measured. Since the conceptual model is derived from different fields, each component (usability, ergonomics, and biometrics) produces a unique output. The authors acknowledge that the metrics used in the HBSI Evaluation Method may produce a trade-off between performance and usability.
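The system-level error rates listed at the beginning of Section 2 (FAR and FRR) are threshold-dependent quantities. As a reminder of how they are conventionally computed from match scores — this is standard practice rather than anything specific to the HBSI method — a minimal sketch is:

```java
/** Illustrative computation of FAR and FRR at a decision threshold,
 *  assuming that higher scores indicate a better match. */
class ErrorRates {
    /** Fraction of impostor attempts wrongly accepted. */
    static double falseAcceptRate(double[] impostorScores, double threshold) {
        long accepted = java.util.Arrays.stream(impostorScores)
                                        .filter(s -> s >= threshold).count();
        return (double) accepted / impostorScores.length;
    }

    /** Fraction of genuine attempts wrongly rejected. */
    static double falseRejectRate(double[] genuineScores, double threshold) {
        long rejected = java.util.Arrays.stream(genuineScores)
                                        .filter(s -> s < threshold).count();
        return (double) rejected / genuineScores.length;
    }
}
```

Raising the threshold lowers FAR at the cost of FRR, which is one reason usability metrics and system-level error rates have to be read together.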
3 Training and Instruction An important component for use of biometric devices is training. Users need to be trained how to use the various devices appropriately in order to optimize the biometric system performance. Yet, relatively little work has been done on training to use biometric devices. In addition to concern about training a person to use a particular device within a specific environment, it is also important that the training transfer to different environments in which the same device is used and, more importantly, to different devices of the same general type. 3.1 Principles of Training and Transfer Though research on training and transfer with biometric devices is sparse, research in cognitive psychology has established many broadly applicable principles of training and transfer for both knowledge and skills [30]. Many of these theoretical principles should be applicable to the domain of biometrics. Their application will help determine the best way to train people to use biometric devices and predict how well this training can transfer to new/alternative devices. We describe below various training principles [31] that could be applicable to developing effective training practices for use of biometric devices. One generally accepted principle is the power law of practice. According to this principle, the time to complete a task decreases as a function of the number of times it is performed. This means that having people perform the same task multiple times may be an effective form of training. A related principle is that of deliberate practice: Practice is most beneficial when it is highly motivated and focused. Deliberate practice has been shown to be necessary for skill acquisition in a variety of domains. The principle of depth of processing emphasizes that training which requires deep and elaborate processing, creating distinctive encodings, enhances durability of knowledge and skills. According to the principle of contextual reinstatement, performance of a task at a later time (the test) is improved by training that matches the test conditions as closely as possible. Closely related to this principle is that of procedural vs. declarative training. Procedural training is more durable than declarative training, whereas declarative training leads to better generalization. With regard to instance- vs. rulebased training, instance-based strategies lead to more efficient performance in simple tasks, whereas rule-based strategies lead to more efficient performance in more complex tasks. Also, rules tend to be more durably represented in memory than are instances. People will tend to remember rules better than specific instances.
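The power law of practice mentioned above is commonly written in the following form; the notation here is ours, since the chapter names the principle without giving its equation:

$$T_N = T_1 \, N^{-\alpha},$$

where $T_N$ is the time to complete the task on its $N$-th performance, $T_1$ is the time on the first performance, and $\alpha > 0$ is a learning-rate parameter. On log–log axes this appears as a straight line, which is why repeated practice yields large early gains that gradually taper off.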
Knowledge seeding is sometimes beneficial for training. When tasks require having a certain type of quantitative knowledge, providing a small number of examples is often sufficient knowledge to encompass an entire domain. The principle of spacing of practice is that knowledge is retained for longer periods of time when training sessions are spaced in time. The power law of forgetting is that performance decreases as a power function of the time since training [32]. One of the more intriguing findings in recent years is that testing impedes forgetting: Testing after initial presentation of material slows forgetting [33]. In the area of information security, this principle has been found to hold for memory of several passwords used to access different e-commerce sites [34]. The final principle is generalization depends on similarity. The gain in performance on one task as a consequence of training on a different task is an exponentially decaying function of the similarity between the two tasks [35]. Thus, transfer can be expected to occur to the extent that the transfer task is similar to the task used at training. 3.2 Training and Instruction Research in Biometrics Not much research has been conducted investigating the impact of training on biometric and usability performance measures. The most extensive study was one conducted by researchers at NIST to evaluate the time to acquire a 10-print slap fingerprint image using three different instruction methods [19]. Participants were told to make a right slap, left slap, and simultaneous thumb prints. One of three instructional techniques was used for each participant: a poster illustrating the steps; verbal instructions spoken by the test administrator; a soundless video demonstrating the procedure. Participants who received the poster instructions made more errors than those who received the verbal or video instructions. Moreover, only 56% of the participants in the poster group completed the fingerprinting process, a smaller percentage than in the other instruction groups, and those who did complete the process took longer to do so than did the participants instructed by the other methods. A limitation of the study is that participants instructed by poster were allowed to selfdetermine the duration for which they examined the poster, whereas participants in verbal and video conditions were required to engage in the training on average approximately 50 s. In fact, the median time spent examining the poster was approximately half the duration of the other instructional conditions. Thus, the poorer performance with the poster instructions may be a consequence of not studying the poster for a sufficiently long duration. We currently are conducting an experiment to evaluate this possibility by allowing one group of participants to self-determine how long to examine the poster, as in [19], but having another group examine the poster for a fixed period of 50 s, a duration comparable to that for the other training methods in [19]. Preliminary results agree with the findings of [19] in suggesting that the poster alone is not a particularly effective instruction method, even when participants are required to view the poster for a longer time than they do when allowed to self-terminate. Kukula, Gresock, Elliott, and Dunning [36] examined the influence of type and amount of user training on interaction with a hand geometry biometric device. 
Because hand geometry depends on orientation of the user’s hand, most hand geometry devices have pins with respect to which the user’s hand must be positioned
for correct alignment. Training is perhaps even more important for hand geometry devices than for fingerprint reasons because it needs to convey how users should interact with the alignment pins. Four groups of participants received initial instructions and demonstrations of using a hand geometry device. They then received different amounts of practice with the device one day per week in some weeks of a six-week training period. In the seventh week, all participants were required to achieve a criterion of three consecutive successful readings with the device at a set performance level (threshold set to 30). The control group performed only the criterion task in the last session. The group that received the most practice performed the task to the same criterion of three successful readings in four of the six weeks prior to the final test. This group showed continued improvement in performance over the period and performed the best of any group in the final session. This result is consistent with the power law of practice, for which performance improves with continued practice. Of interest were third and fourth groups, each of which were required to perform only three successful readings prior to the final session, but differed in when those were done. One group performed all three readings in the first session (week 1), whereas the other group had the readings spread out over three different sessions, each separated by a week. The former group did not perform any better in the test session than in week 1 and was the worst of the three practice groups in the week 7 test. In contrast, the latter group showed a benefit of the prior practice. This outcome is consistent with the spacing of practice principle, though differences in retention interval preclude a clear attribution of the difference to that factor. The point of these studies is that various aspects of training, including instructional materials and practice schedules, influence users’ performance with biometric devices. This work is only a start of the research that needs to be done to establish the most effective ways to teach people to use various biometric devices.
4 Summary and Conclusions In this paper, we connect cognitive ergonomics to biometrics because prior work in the area of biometric system ergonomic design has tended to consider mainly physical ergonomics and usability. We focused on principles of training and transfer that need to be applied to and evaluated in biometrics to ensure that users know how to interact with biometric devices. Cognitive ergonomics has tended to be left out of biometric considerations, as in the case of the HBSI Evaluation Method (Fig. 3). Thus, future work needs to be done to further examine and revise the HBSI Evaluation Method to include assessment of training and transfer, as well as other cognitive factors that have been shown to affect performance.
References 1. International Organization for Standardization: ISO/IEC JTC1/SC37 Standing Document 2 - Harmonized Biometric Vocabulary. ISO/IEC, Geneva SC37N1779 (2007) 2. Origin, A.: UK Passport Service Biometrics Enrolment Trial (May 25, 2005) 3. Batch, K., Millett, L., Pato, J.: Summary of a Workshop on the Technology, Policy, and Cultural Dimensions of Biometric Systems. National Research Council 0-309-6578t7-3, March 15-16, 2005 (2006)
4. Coventry, L., De Angeli, A., Johnson, G.: Usability and biometric verification at the ATM interface. In: Conference on Human Factors in Computing Systems, Ft. Lauderdale, FL (2003) 5. Jain, A., Pankanti, S., Prabhakar, S., Hong, L., Ross, A.: Biometrics: A Grand Challenge. In: 17th International Conference on Pattern Recognition (ICPR 2004), Guildford, UK (2004) 6. Maple, C., Norrington, P.: The Usability and Practicality of Biometric Authentication in the Workplace. In: First International Conference on Availability, Reliability and Security (ARES 2006), Vienna, Austria (2006) 7. Rood, E., Jain, A.: Biometric Research Agenda: Report of the NSF Workshop. National Science Foundation, Morgantown EIA-0240689, April 29-May 2 (2003) 8. Clarke, R.: Human Identification in Information Systems: Management Challenges and Public Policy Issues. Info. Technol. & People 7, 6–37 (1994) 9. Kukula, E.P., Elliott, S.J.: Biometric System Ergonomic Design. In: Li, S. (ed.) Encyclopedia of Biometrics, 1st edn., p. 1000. Springer, Heidelberg (2009) 10. Swinnen, S.P.: Information Feedback for Motor Skill Learning: A Review. In: Zelaznik, H.N. (ed.) Advances in Motor Learning and Control, pp. 37–65. Human Kinetics, Champaign (1996) 11. Schultz, E.E., Proctor, R.W., Lien, M.-C., Salvendy, G.: Usability and Security: An Appraisal of Usability Issues in Information Security Methods. Computers & Security 20, 620–634 (2001) 12. Proctor, R.W., Lien, M.-C., Salvendy, G., Schultz, E.E.: A Task Analysis of Usability in Third-Party Authentication. Information Security Bulletin, 49–56 (April 2000) 13. Coventry, L., De Angeli, A., Johnson, G.: Biometric Verification at a Self Service Interface. In: McCabe, P. (ed.) Contemporary Ergonomics, pp. 247–252. Taylor & Francis, London (2003) 14. Home Office Identity & Passport Service. Publications, vol. 2007 (2007) 15. Kukula, E.P.: Design and Evaluation of the Human-Biometric Sensor Interaction Method. Ph.D. Dissertation, p. 510. Purdue University, West Lafayette (2008) 16. Kukula, E.P., Elliott, S., Duffy, V.: The Effects of Human Interaction on Biometric System Performance. In: 12th International Conference on Human-Computer Interaction and 1st International Conference on Digital-Human Modeling, Beijing, China (2007) 17. Kukula, E.P., Elliott, S., Tamer, S., Senarith, P.: Biometrics and Manufacturing: A Recommendation of Working Height to Optimize Performance of a Hand Geometry Machine. Biometrics Standards, Performance, & Assurance Laboratory, West Lafayette BSPA/09-0001 (2007) 18. Theofanos, M., Orandi, S., Micheals, R., Stanton, B., Zhang, N.: Effects of Scanner Height on Fingerprint Capture. National Institute of Standards and Technology, Gaithersburg NISTIR 7382 (2006) 19. Theofanos, M., Stanton, B., Orandi, S., Micheals, R., Zhang, N.: Usability Testing of TenPrint Fingerprint Capture. NIST, Gaithersburg NISTIR 7403 (2007) 20. Kukula, E.P., Elliott, S., Wolleschensky, L., Parsons, M., Whitaker, M.: Analysis of Performance and Usability of Small-Area and Swipe Fingerprint Sensors Using FTA and FTE. Purdue University (2007) 21. Elliott, S.J., Kukula, E.P., Modi, S.: Issues Involving the Human Biometric Sensor Interface. In: Yanushkevich, S., Wang, P., Gavrilova, M., Srihari, S. (eds.) Image Pattern Recognition: Synthesis and Analysis in Biometrics, pp. 339–363. World Scientific, Singapore (2007)
22. Frick, M., Modi, S.S., Elliott, S., Kukula, E.P.: Impact of Gender on Fingerprint Recognition Systems. In: Proceedings of the 5th International Conference on Information Technology and Applications, Cairns, Australia (2008) 23. Modi, S., Elliott, S.: Impact of Image Quality on Performance: Comparison of Young and Elderly Fingerprints. In: 6th International Conference on Recent Advances in Soft Computing (RASC 2006), Canterbury, UK (2006) 24. Kukula, E.: Understanding the Impact of the Human-Biometric Sensor Interaction and System Design on Biometric Image Quality. In: NIST Biometric Quality Workshop II, Gaithersburg, MD (2007) 25. Young, M., Elliott, S.J.: Image Quality and Performance Based on Henry Classification and Finger Location. In: IEEE Workshop on Automatic Identification Advanced Technologies, Alghero, Italy (2007) 26. Tayyari, F., Smith, J.: Occupational Ergonomics: Principles and Applications. Kluwer Academic Publishers, Norwell (2003) 27. International Standards Organization: ISO 9241: Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs) - Part 11: Guidance on Usability. ISO, Geneva (1998) 28. International Standards Organization: Information Technology - Biometric Performance Testing and Reporting - Part 1: Principles and Framework," ISO/IEC, Geneva ISO/IEC 19795-1(E) (2006) 29. Yao, M., Pankanti, S., Haas, N.: Fingerprint Quality Assessment. In: Ratha, N., Bolle, R. (eds.) Automatic Fingerprint Recognition Systems, pp. 55–66. Springer, New York (2004) 30. Proctor, R.W., Dutta, A.: Skill Acquisition and Human Performance. Sage, Thousand Oaks (1995) 31. Healy, A.F.: Skill learning, enhancement of. In: Pashler, H. (ed.) Encyclopedia of the Mind. Sage, Thousand Oaks (in press) 32. Wickelgren, W.A.: Trace Resistance and the Decay of Long-Term Memory. J. of Math. Psychol. 9, 418–455 (1972) 33. Carpenter, S.K., Pashler, H., Wixted, J.T., Vul, E.: The Effects of Tests on Learning and Forgetting. Memory & Cognition 36, 438–448 (2008) 34. Vu, K.-P.L., Proctor, R.W., Bhargav-Spanzel, A., Tai, B.-L., Cook, J., Schultz, E.E.: Improving Password Security and Memorability to Protect Personal and Organizational Information. Int. J. of Human-Computer Studies 65, 744–757 (2007) 35. Shepard, R.N.: Toward a Universal Law of Generalization for Psychological Science. Science 237, 1317–1323 (1987) 36. Kukula, E.P., Gresock, B.P., Elliott, S.J., Dunning, N.W.: Defining Habituation using Hand Geometry. In: IEEE Workshop on Automatic Identification Advanced Technologies, Alghero, Italy (2007)
Representing Logical Inference Steps with Digital Circuits Erika Matsak Tallinn University, Department of Computer Science, 25 Narva Road, 10120 Tallinn, Estonia and Tallinn University of Technology, Department of Computer Engineering, Raja 15, 12618 Tallinn, Estonia [email protected]
Abstract. The use of inference steps in natural language reasoning is observed. An algorithm is presented for representing logically correct inference steps with digital circuits. New foundations for creating decision making systems are studied. Keywords: Logical inference steps, logic gates, digital circuits representing logical inference steps.
1 Introduction
Many decisions that are required for efficient results with modern systems need to be made without human intervention. For example, driving a Mars rover remotely from Earth is not practical because the sensor information from Mars takes tens of minutes to reach Earth and the steering commands take equally long to reach the rover. Another example is taking defensive action against a cyber attack: a human cannot grasp the situation and make an informed decision within a fraction of a second. Therefore, computers must make these necessary decisions. At the same time, we want the computer-made decisions to be at least as reliable as those an intelligent person would make (if he/she were able to make them at all). This brings us to the point that the decision-making computer must possess logical instruments: logic formulas (for formulating propositions) and logic inference rules (for constructing an argument). A problem in this case is that a number of different logic systems are in use. For example, classical logic is suitable for operating with legal arguments, while intuitionistic logic [16, 17] can be used for structural program synthesis. One of the ways to distinguish the various logic systems is to use inference rules and the corresponding inference steps. In information technology the inference rules (using inference steps to move between formulae) are implemented in software [1, 2]. However, this does not have to be the only viable option. It is not excluded, in principle, that some of the inference steps could be implemented more efficiently at the hardware level. For this we would first need to develop instruments that allow the separation of logical constructs, such as formulas and inference steps, from natural language [6, 10-14]. We could then proceed to implement these constructs with digital circuits using logic gates [5].
2 Inference Steps and Implications in Logic Gate Circuits
Let us agree that within this paper we rely on classic bivalent logic and that we will stay within the confines of the first-order predicate calculus. In this case we can use the fact that each correct inference step corresponds to a correct implication, which has the conjunction of the premises of the inference step as its antecedent and the formula of the conclusion of the inference step as its result [6]:

  M, P, …, Q
  ----------   - inference step,                                    (1)
      R

  (M & P & … & Q) ⊃ R   - corresponding implication.

From this point on the implication representing an inference step will be matched with a two-part digital circuit, where the first part (above the dotted line in the drawings) represents the conjunction of the premises of the inference step M & P & … & Q, and the second part represents the formula R.
Note. In bivalent logic the implication can be replaced by the disjunction of the negation of the antecedent and the result (for example, the implication X⊃Y can be replaced with the disjunction ¬X∨Y). We did not use such replacements above! Therefore, for example, the Modus Ponens inference step is not represented with the formula ¬(A&(¬A∨B))∨B, but with a two-part circuit, where the first part represents the formula A&(¬A∨B) and the second part represents the formula B.
The solution described above allows inference steps to be represented with traditional digital circuits composed of three types of logic gate elements: negation, conjunction and disjunction. As explained previously, each circuit is divided into two parts, where the first part represents the list of premises and the second part represents the conclusion formula. The use of the inference step therefore corresponds to moving from the first part of the circuit to the second part. A separate problem here is creating such circuits, as well as suitable visualization software, which was not as simple as it first appeared.
3 An Algorithm for Using Logic Gates to Design a Digital Circuit That Represents Inference Steps
While designing a digital circuit we assume that, as we move from left to right in the formula, all signals must have reached the corresponding gates. In order to guarantee this property, we change the formula (and its sub-formulas) as necessary:
− If the formula contains a conjunction that is not in parentheses and other operations appear immediately before or after it, then the conjunction must be surrounded by parentheses.
− For example, we replace the formula A ∨ B & C ⊃ D with the formula A ∨ (B & C) ⊃ D.
− If the formula contains sub-formulas or their negations, then we nest the components from left to right in successive parentheses.
− For example, we replace the formula ¬A & B & ¬C & D with the formula (((¬A & B) & ¬C) & D). We use an analogous process in a formula consisting only of disjunctions.
− If conjunctions (disjunctions) contain sub-formulas of various lengths (including negations or "quantifications" of formulas), then we arrange them from left to right in order of decreasing length (number of symbols).
− For example, we replace the formula ( Δ ∨ Γ ) & ( ¬Α & Γ ) ∨ ( ( Α & Β ) & Χ ) with the formula ( ( Α & Β ) & Χ ) ∨ ( ¬Α & Γ ) & ( Δ ∨ Γ ).
− We replace implications with equivalent formulas consisting of negations, conjunctions and disjunctions. For example, we replace the formula X⊃Y with the formula ¬X∨Y.
− If, following the rearrangements, there is a negation at the right end of the formula, then we surround it with parentheses. For example, we replace Δ&¬B with Δ&(¬B). If there are two conjunctions or two disjunctions without parentheses at the right end of the formula, then we surround them with parentheses. For example, we replace Δ & A & B with Δ & ( A & B ). Similarly, we replace Δ ∨ A ∨ B with Δ ∨ ( A ∨ B ).
− The final change is perhaps the most unusual: we write the negation symbol after the formula in question, not before. For example, we replace ¬C with C¬.
The described changes enable the use of the algorithm in Figure 1.
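As an illustration of these rewrites, the short sketch below applies two of them — elimination of the implication and left-to-right nesting of a conjunction chain — to the examples given above. It is only a hedged illustration in Python: the string-based formula representation and the helper names are assumptions made here, not part of the paper's algorithm or software.

```python
# Illustrative sketch (assumed representation: formulas as plain strings).
# It reproduces two of the preprocessing rewrites described above.

def eliminate_implication(antecedent: str, consequent: str) -> str:
    """Replace X ⊃ Y by the equivalent disjunction ¬X ∨ Y."""
    return f"(¬{antecedent} ∨ {consequent})"

def left_nest(operands, op: str) -> str:
    """Nest a flat chain such as ¬A & B & ¬C & D into successive
    parentheses from left to right: (((¬A & B) & ¬C) & D)."""
    expr = operands[0]
    for term in operands[1:]:
        expr = f"({expr} {op} {term})"
    return expr

if __name__ == "__main__":
    print(left_nest(["¬A", "B", "¬C", "D"], "&"))  # (((¬A & B) & ¬C) & D)
    print(eliminate_implication("X", "Y"))         # (¬X ∨ Y)
```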
Fig. 1. The algorithm for designing an inference step
Using the algorithm in Figure 1, we get the following circuit for the Modus Ponens inference step:
Fig. 2. “Digital” Modus Ponens
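The two-part reading of the circuit in Fig. 2 can be mimicked in a few lines of code: the first part evaluates the premise conjunction A & (¬A ∨ B) from elementary gates, and the conclusion part B is only reached when the premise part outputs 1. This is a hedged sketch for illustration only; the gate functions and the truth-table loop are not part of the paper's circuit design.

```python
# Gate-level illustration of the "digital" Modus Ponens circuit of Fig. 2.

def NOT(x): return 1 - x
def AND(x, y): return x & y
def OR(x, y): return x | y

def modus_ponens(a: int, b: int):
    premise = AND(a, OR(NOT(a), b))   # first circuit part: A & (¬A ∨ B)
    return premise, b                 # second circuit part: the conclusion B

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            p, c = modus_ponens(a, b)
            status = f"step yields B={c}" if p else "step not applicable"
            print(f"A={a}, B={b}: premise part={p}, {status}")
```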
By introducing the universal quantifier to the rule [3]

  ( Α(β) & Γ ) ⊃ Δ
  --------------------                                              (2)
  ( ∀ξ Α(ξ) & Γ ) ⊃ Δ

we get the following digital circuit:
Fig. 3. Digital circuit of the universal quantifier rule (∀+→)
4 Circuits in Practice
The logic module of a decision system could consist of the following components:
− a binary matrix of notation-denotation (symbol-meaning) relations, where rows represent notations (symbols, signs) and columns represent denotations (meanings). The intersections contain markers (for example 1 or 0) that indicate whether the corresponding notation applies to the denotation. It is important to remember that, according to Lorents [7-9], some notations may have multiple denotations, and some denotations may have multiple notations;
− the set of correct formulas;
− the set of inference rules.
Before implementing the inference steps by using digital circuits, the data in the role of predicates must be inserted. In order to achieve this, the relation between formulas and digital logic gates must be established. Since classically there are two possible truth values, 1 and 0 (or true and false), and each logic gate also has two values, 1 and 0 (or high voltage (+5V) and low voltage (0V)), it is natural to connect the gates in such a way that correct atomic formulas are represented by the signal "1". Non-atomic formulas should be treated in the following way:
− Identify the part of the circuit that corresponds to the non-atomic formula in question;
− Identify the input points (corresponding to the atomic formulas) for that specific circuit part;
− Identify an input signal combination for the above input gates that produces "1" as an output for that circuit part.
This way we can provide the necessary input signals to the (upper) part of the digital circuits, which corresponds to the predicates of the inference step. The construction of a decision may take the simple form of "fitting" puzzle pieces, where the "suitable" premise set of an inference rule allows the rule to be matched to a combination of existing formulas that normally does not involve more than a few formulas.
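The third item above — identifying an input signal combination that drives a non-atomic circuit part to "1" — amounts to a small search over the atomic inputs of that part. The following sketch shows one (assumed) way to do this by exhaustive enumeration; the paper itself does not prescribe a particular search procedure, and the function names are illustrative.

```python
from itertools import product

# Illustrative brute-force search for input combinations of the atomic
# formulas that make a given (non-atomic) premise circuit part output 1.

def satisfying_inputs(circuit_part, input_names):
    """circuit_part maps a dict {atomic formula name: 0/1} to 0/1."""
    hits = []
    for values in product((0, 1), repeat=len(input_names)):
        assignment = dict(zip(input_names, values))
        if circuit_part(assignment) == 1:
            hits.append(assignment)
    return hits

if __name__ == "__main__":
    # Premise part of Modus Ponens: A & (¬A ∨ B)
    part = lambda v: v["A"] & ((1 - v["A"]) | v["B"])
    print(satisfying_inputs(part, ["A", "B"]))   # [{'A': 1, 'B': 1}]
```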
5 Advantages of the Proposed Circuits
When creating decision-making systems (based on, for example, binary decision diagrams (BDD), negation normal form (NNF), propositional directed acyclic graphs (PDAG), etc.), data structures related to Boolean functions are often used. The logical operations used to form decisions are simple: AND, OR, NOT. In recent years, several problems have surfaced in solutions relying on neural networks or graphs. This does not mean that these methods should be cast aside (for example, neural networks have advantages in modeling non-linear characteristics of sample data [4]). However, systems based on implementing inference steps with digital circuits also have advantages. One source of these advantages is the ability to include "regular" operations (AND, OR, NOT) as well as other operations (implication, etc.) and quantifiers. A second and more important advantage is the possibility to notably
"shorten" the decision-making process (it is well known from logic studies that manipulating the rules may sometimes allow an exponential (!) decrease in the number of inference steps).
6 Conclusion
The described digital circuits consisting of logic gates are not the only way to represent inference steps. In principle, using special transformations, one could implement them in neural networks [15] or other circuits. The important point here is how to implement logical inference steps in hardware based on the logical constructs extracted from natural language. It is possible that a similar implementation is present in the human brain, which allows us to use logical constructs, including the ability to formulate propositions and to arrive at correct conclusions.
Acknowledgements. The author expresses profound gratitude to Professor Peeter Lorents for his assistance in writing this paper and to Rain Ottis for editing the English version.
References 1. Chang, C., Lee, R.: Symbolic logic and mechanical theorem proving. Academic Press, New York (1973) 2. Fitting, M.: First-Order Logic and Automated Theorem Proving, 2nd edn. Springer, Heidelberg (1996) 3. Gentzen, G.: Die Widerspruchsfreiheit der reinen Zahlentheorie. Mathematische Annalen 112, 493–565 (1936) 4. Kim, D., Lee, J.: Rule Reduction over Numerical Attributes in Decision Trees Using Multilayer Perceptron. In: Cheung, D., Williams, G.J., Li, Q. (eds.) PAKDD 2001. LNCS, vol. 2035, p. 538. Springer, Heidelberg (2001) 5. Kunz, W., Stoffel, D.: Reasoning in Boolean Networks: Logic Synthesis and Verification Using Testing Techniques. Kluwer Academic Publishers, Dordrecht (1997) 6. Lorents, P.: Language and logic. EBS Print, Tallinn (2000) 7. Lorents, P.: Formalization of data and knowledge based on the fundamental notation-denotation relation. In: Proceedings of the International Conference on Artificial Intelligence, ICAI 2001, vol. III, pp. 1297–1301 (2001) 8. Lorents, P.: Knowledge and understanding. In: Proceedings of the International Conference on Artificial Intelligence, ICAI 2004, vol. I, pp. 333–337 (2004) 9. Lorents, P.: Taxonomy of intellect. In: Proceedings of the International Conference on Artificial Intelligence, ICAI 2008, vol. II, pp. 537–544 (2008) 10. Matsak, E.: Dialogue system for extracting Logic constructions in natural language texts. In: Proceedings of the International Conference on Artificial Intelligence, ICAI 2005, vol. II, pp. 791–797 (2005) 11. Matsak, E.: Using Natural Language Dialog System DST for Discovery of Logical Constructions of Children’s Speech. In: The 2006 International Conference on Artificial Intelligence, ICAI 2006, Las Vegas, Nevada, USA (2006)
12. Matsak, E.: System DST for Transforming Natural Language Texts, Representing Estimates and Higher Order Predicates and Functionals. In: The 3rd International Conference on Cybernetics and Information Technologies, Systems and Applications: CITSA 2006, Orlando, Florida, USA (2006) 13. Matsak, E.: The prototype of system for discovering of inference rules. In: Proceedings of the International Conference on Artificial Intelligence. International Conference on Artificial Intelligence, ICAI 2007, vol. II, pp. 489–492 (2007) 14. Matsak, E.: Improved version of the Natural Language Dialog System DST and its application for discovery of logical constructions in children’s speech. In: International Conference on Artificial Intelligence, ICAI 2008, vol. I, pp. 332–338 (2008) 15. Minsky, M.: Finite and Infinite machines. Prentice-Hall, Inc., Englewood Cliffs (1967) 16. Mints, G., Tyugu, E.: Justification of the structural synthesis of programs. Science of computer programming 2(3), 215–240 (1982) 17. Mints, G., Tyugu, E.: The programming system PRIZ. Journal of Symbolic Computation (4) (1987)
An Interactive-Content Technique Based Approach to Generating Personalized Advertisement for Privacy Protection Wook-Hee Min and Yun-Gyung Cheong Graphics&OS Group, Samsung Advanced Institute of Technology Giheung-Gu, Gyeonggi-Do, South Korea {wookhee.min,yuna.cheong}@samsung.com
Abstract. Personalized content has been getting more attention from industry and academia due to its effective communicative role in product advertisements. However, conventional approaches pose potential threats to the customer's privacy when a data server containing customer profiles is employed or the customer profile is required to be sent over the public network. To address this, this paper describes a framework that employs a script-based interactive content technique for privacy protection. We illustrate our approach with a sample scenario. Keywords: Privacy, Interactive content, Personalized advertising.
1 Introduction
While traditional media (e.g., TV, radio, newspapers, billboards) advertise products and services to people non-selectively, sponsors are becoming more interested in personalized advertisements that present products and services to their prospective customers. Personalized advertisements have also drawn more attention from researchers in multimedia, e-commerce, and AI because content and presentations that are tailored to an individual's preferences tend to attract her attention. In a conventional approach to the provision of personalized content, the service framework either contains a server that keeps track of the customer's preferences and interactions (e.g., web page visits, search keywords, shopping transactions) or requires that the customer device (e.g., mobile handsets, web browsers, IPTV) send her personal information over the public network in order to select advertisements pertinent to her [1, 3, 9, 12]. These approaches, however, pose potential threats to the customer's privacy. For instance, the customer's personal information could be disclosed in the process of collecting and managing his profile in a system that employs a server storing user profiles; if the system is compromised, the privacy of thousands of customers is threatened. Additionally, in such a system, while the user profile is sent to a server over a network, the network must be guaranteed to be secure. To address these privacy-related issues, conventional approaches to generating personalized commercials for privacy protection have focused on data management,
data exchange protocols and security. For example, a server and a client each define a privacy policy, and the user is asked whether he consents to the policy provided by the server when the two policies do not match [14]. In anonymous communication approaches, a random number is assigned to each user in order to remove his identifying information, thus preventing his privacy from being disclosed [7, 18]. As an online practice of this approach, Phorm developed an online advertising system that provides relevant ads for internet users based on keywords acquired from a combination of search terms, URLs, contextual page analysis, etc., without infringing their privacy [13]. This system anonymises users by assigning a unique random number to each user and by incinerating the user's web surfing traces as it extracts keywords (e.g., cameras, autos, etc.) of products or services that would interest the user. These keywords are kept until appropriate advertising channels for a user are found and their advertisements are delivered to the user. Additionally, Phorm allows users to switch this advertising service on or off at any time, while most advertising companies deliver advertisements regardless of users' preferences. Unlike other conventional approaches, our work is closely related to interactive content structures and script-based graphics that realize 3D animation in real time. Interactive content unfolds differently as it interacts with the user. There are two primary approaches to representing interactive contents (or stories): branching graphs and planning techniques. In branching graph approaches, interactive contents comprise a series of nodes and conditional branches; a node describes a unit content, and a conditional branch refers to a transition from one node to another that is carried out when user inputs and/or story states meet pre-defined logical statements [4, 5, 6, 15, 16]. In the planning formalism, a content plot is constructed as the planner algorithm generates links between unit contents, considering the user's interaction and the preconditions and postconditions of the unit contents. In this paper, we present a framework that employs a script-based interactive content technique [2, 6, 8, 10, 17] which can provide personalized advertisements for users without compromising their privacy. The next section details our work, followed by a simple example which illustrates personalized commercial generation. In the final section, we conclude this paper by discussing the impact of our research and plans for future work.
2 A Framework for Generating Personalized Contents
This section describes our framework, which generates personalized advertisements without exchanging the customer's personal information or her web surfing activities. When a request for a personalized commercial for a product is issued, the system sends the entire interactive content, containing complete commercials that cover all types of prospective customers, to the user-side device so that the device can locally determine the appropriate content for the user. One major challenge of this approach is the large data volume transmitted over limited network resources, especially wireless channels. To address this, our approach employs a script-based 3D graphics engine that converts text into 3D animation [2, 6, 8, 10, 17], as detailed below.
Fig. 1. A Framework for privacy-safe personalized advertisement
Our framework consists of a server and a user device (i.e., client), as in Figure 1. When the user asks for content (e.g., an advertisement for a product), the user device sends the request (e.g., a text such as "vacuum cleaner") to the server. The server then looks up the content database for an interactive content corresponding to the advertisement for the product, and it sends the interactive content back to the user device over the network. Upon receiving the interactive content from the server, the user device, which consists of a rendering module, a branching manager, a planner and a user profile, realizes the content as a 3D animation. While producing the 3D animation, our approach generates personalized content in a secure manner. Since the branching manager and the planner determine the next node (i.e., unit content) to be presented to the user by consulting the user profile, and the rendering module then realizes each node as an animation, the content can be displayed differently, and in a more appealing way, for each user; here, the user model can include a variety of information, from relatively fixed personal information such as age, gender and address to changing and complex information such as the user's current location or lifestyle. Beyond the matter of personalized advertising, the user profile is stored in the user device and is never disclosed externally, thereby guaranteeing that the user's privacy is protected. These node selection and realization phases iterate until the last node of the interactive content structure is finally rendered. To minimize the data volume communicated over the network, we utilize a text-based script to represent the interactive content. More details on our script-based interactive-content authoring tool and player, which employ the TVML (TV program Making Language) technique as their framework, can be found in Cheong et al. [2]. The client device also has a user profile manager that keeps the user profile updated and refined with information induced from analysis of the user's life pattern and viewing habits, environmental data surrounding the user, etc. A user's habit of viewing advertisements, for example, could be categorized into one of three propensities: indifferent, greedy or selective; bypassing most advertisements means that the user is probably indifferent to advertisements, and thus the user profile manager can update the value of the corresponding field (e.g., the user's propensity to advertisements) in that user profile.
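A compressed sketch of this client-side loop is given below. The node identifiers, the dictionary-based script representation and the chooser functions are hypothetical simplifications introduced here for illustration; they are not the actual data structures of our framework. The point of the sketch is only that every profile-dependent decision is taken on the device, so the profile never crosses the network.

```python
# Hypothetical sketch of the client-side loop: the interactive-content script
# arrives once from the server; node selection and rendering then happen
# locally against the user profile stored on the device.

def play(script, profile, render):
    """script: {node_id: (content, chooser)}; chooser(profile) -> next node id or None."""
    node_id = "N1"                      # initial node of the received script
    while node_id is not None:
        content, chooser = script[node_id]
        render(content)                 # realized as a 3D animation by the rendering module
        node_id = chooser(profile)      # branching manager / planner consulting the local profile

if __name__ == "__main__":
    toy_script = {
        "N1": ("voice introduction", lambda p: "N2" if p.get("pet") else "N3"),
        "N2": ("robot cleans up pet hair", lambda p: None),
        "N3": ("robot removes a juice stain", lambda p: None),
    }
    play(toy_script, {"pet": "cat"}, print)   # prints the intro, then the pet-hair scene
```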
The process of presenting advertisements with our approach guarantees that advertisements are customized for each device owner and, at the same time, prevents intrusions on user privacy by keeping the user model on the user-side device. The system also continues gathering the user's behaviors and updating the user model based on their analysis; thus we expect each user to have a stronger interest in an advertisement than the user would have in advertisements made only with the fixed information.
3 A Simple Example
In this section, we deal with an example of our framework and describe how a personalized advertisement can be generated without harming one's privacy. Figure 2 shows a sample advertisement built with the interactive content structure proposed in Cheong et al. [2, 11], and Table 1 describes two user profiles that could be applied to the sample advertisement in Figure 2.
Fig. 2. Sample interactive content for a vacuuming robot advertisement
The nodes in the graph of Figure 2 constitute a sequence of actions, and the branches structure the multiple storylines by integrating nodes with events, where an event represents a transition between a pair of nodes. While an event in a conventional interactive content system corresponds to a user input such as a mouse movement or text typing, our approach utilizes a user model (e.g., the target user's profile) as well as the conventional events in order to determine the unit content to be presented next.
Table 1. Sample user profiles for the interactive content described in figure 2

Field name        | User profile 1                                                   | User profile 2
Name              | Wooky, Min                                                       | Rita, Lee
Gender            | M                                                                | F
Address           | San Monica, CA, USA                                              | Chicago, IL, USA
Marriage          | N/A                                                              | Y
Children          | None                                                             | 3 year-old baby
Pet               | Cat                                                              | None
Job               | N/A                                                              | Computer Scientist
Current Location  | Town and Country Resort & Convention Center, San Diego, CA, USA | N/A
Figure 2 describes an interactive content for the advertisement of vacuuming robot products. The initial node N1 begins with a voice introduction of a vacuum cleaning robot. On the completion of N1, the system checks whether the device owner has a pet by referencing the user profile stored on the device; for a user who corresponds to user profile 1, an animation in which the robot cleans up the pet's hairs scattered on the floor can be shown (as textually described in N2). A user whose information constitutes user profile 2, on the other hand, would experience N3, which illustrates a child dropping her cranberry juice on a carpet, followed by the vacuum machine removing the stain. The rest of this section discusses the advertising scenario when user profile 1 is employed. When the presentation of N2 is completed, the system examines the user's residential district to determine the neighborhood and cultural environment with which the user would feel familiar (e.g., the background setting in the advertisement). For instance, a resident of an urban community may prefer an episode set in a busy town. This data can be reused for selecting the final node, which contains directions to the store nearest to the user's place. In case the user actively tries to check out other aspects of the product that are not explained in the main plot of an advertisement, we define global content units (N31 and N32 in Figure 2) that can be accessed in response to the user's input. Since these nodes do not have any incoming or outgoing branches, they are presented to the user only when he directly inquires through an input device, such as text typing or a mouse click. For instance, if he touches a button labeled "specification" in the middle of watching the advertisement, the product specification will be shown to him before the next content unit is presented. More details on managing the main plot and global units are explained in Cheong et al. [2, 11].
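The branch conditions of this scenario can be written down compactly, as in the hedged sketch below: the profile fields follow Table 1, while the condition functions and the urban/suburban distinction for the background setting are illustrative assumptions rather than the system's actual rules.

```python
# Illustrative encoding of the branching in the vacuuming-robot advertisement,
# evaluated against the two sample profiles of Table 1. The condition logic
# (pet ownership, residential district) is assumed for illustration.

profiles = {
    "profile 1": {"name": "Wooky, Min", "pet": "Cat", "children": None,
                  "address": "San Monica, CA, USA"},
    "profile 2": {"name": "Rita, Lee", "pet": None, "children": "3 year-old baby",
                  "address": "Chicago, IL, USA"},
}

def node_after_intro(profile):
    """Branch after N1: pet owners see N2 (pet hair), others see N3 (juice stain)."""
    return "N2" if profile.get("pet") else "N3"

def background_setting(profile):
    """Assumed check of the residential district for the episode's background."""
    return "busy town" if "Chicago" in profile["address"] else "suburban neighborhood"

for label, profile in profiles.items():
    print(label, "-> N1 ->", node_after_intro(profile),
          "| setting:", background_setting(profile))
# profile 1 -> N1 -> N2 | setting: suburban neighborhood
# profile 2 -> N1 -> N3 | setting: busy town
```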
4 Conclusion In conclusion, we present a personalized advertisement generation approach that protects the target user’s privacy by the use of interactive content techniques. Unlike other conventional approaches, our approach creates personalized advertising contents without exchanging the user’s private information on the communication channel.
Instead, we send an interactive content structure that contains all of the advertisement scenarios to the user device; the device is then responsible for selecting and rendering the contents that are appropriate to the user model on the device. Since there is an issue regarding data volume when transmitting the whole content structure as raw 3D images, we also give a brief introduction to our framework, which employs a text-based approach to representing 3D animation, thereby minimizing the data transmission cost. Up to this point, we have implemented a text-based 3D animation toolkit, which enables users to describe a story in a script format and realizes it as a 3D animation. Our future work is to augment the functionality of the toolkit so that it supports authoring and rendering interactive contents, facilitating the creation and presentation of personalized advertisements, and to conduct a formal evaluation of our framework.
References 1. Bozios, T., Lekakos, G., Skoularidou, V., Chorianopoulos, K.: Advanced Techniques for Personalized Advertising in a Digital TV Environment: The iMEDIA System. In: Proceedings of the 2001 eBusiness and eWork Conference, Venice, Italy, pp. 1025–1031 (2001) 2. Cheong, Y.-G., Kim, Y.-J., Min, W.-H., Shim, E.-S., Kim, J.-Y.: PRISM: A Framework for Authoring Interactive Narratives. In: Interactive Storytelling 2008, pp. 26–29, Erfurt, Germany (2008) 3. Chorianopoulos, K., Lekakos, G., Spinellis, D.: Intelligent user interfaces in the living room: usability design for personalized television applications. In: Proceedings of the 8th international conference on Intelligent user interfaces, Miami, Florida, USA, pp. 230–232 (2003) 4. INSCAPE (2008), http://www.inscapers.com 5. Iurgel, I.: From Another Point of View: ArtEFact. In: Göbel, S., et al. (eds.) TIDSE 2004. LNCS, vol. 3105, pp. 26–35. Springer, Heidelberg (2004) 6. Kelso, M.T., Weyhrauch, P., Bates, J.: Dramatic Presence. The Journal of Teleoperators and Virtual Environments 2(1), 1–15 (1993) 7. Lucent Personalized Web Assistant, http://www.lpwa.com/ 8. Magerko, B.: Story Representation and Interactive Drama. In: Proceedings of the 1st Annual Conference on Artificial Intelligence and Interactive Digital Entertainment. AAAI Press, Marina del Rey (2005) 9. Langheinrich1, M., Nakamura, A., Abe, N., Kamba, T., Koseki, Y.: Unintrusive Customization Techniques for Web Advertising. In: WWW 1999: Proceeding of the eighth international conference on World Wide Web, New York, USA, pp. 1259–1272 (1999) 10. Mateas, M., Stern, A.: Structuring Content in the Façade Interactive Drama Architecture. In: Proceedings of the 1st Annual Conference on Artificial Intelligence and Interactive Digital Entertainment, pp. 93–98. AAAI Press, Marina del Rey (2005) 11. Min, W.H., Shim, E.S., Kim, Y.J., Cheong, Y.G.: Planning-integrated story graph for interactive narratives. In: Proceeding of the 2nd ACM international workshop on Story representation, mechanism and context, Vancouver, British Columbia, Canada, pp. 27–32 (2008) 12. Otsuka, T., Onozawa, A.: Personal Information Market: Toward a Secure and Efficient Trade of Privacy. In: Kim, W., Ling, T.-W., Lee, Y.-J., Park, S.-S. (eds.) Human. Society. Internet 2001, vol. 2105, p. 151. Springer, Heidelberg (2001)
13. Phorm, Inc., http://www.phorm.com/ 14. Platform for Privacy Preferences (P3P), http://www.w3.org/P3P/ 15. Spierling, U., Weiß, S.A., Müller, W.: Towards Accessible Authoring Tools for Interactive Storytelling. In: Göbel, S., Malkewitz, R., Iurgel, I. (eds.) TIDSE 2006. LNCS, vol. 4326, pp. 169–180. Springer, Heidelberg (2006) 16. Swartout, W., Hill, R., Gratch, J., Johnson, W.L., Kyriakakis, C., LaBore, C., Lindheim, R., Marsella, S., Miraglia, D., Moore, B., Morie, J., Rickel, J., Thiébaux, M., Tuch, L., Whitney, R., Douglas, J.: Toward the holodeck: integrating graphics, sound, character and story. In: 5th international conference on Autonomous agents, Montreal, Quebec, Canada, pp. 409–416 (2001) 17. Young, R.M., Riedl, M.O., Branly, M., Jhala, A., Martin, R.J., Saretto, C.J.: An Architecture for Integrating Plan-based Behavior Generation with Interactive Game Environments. Journal of Game Development 1(1), 51–70 (2004) 18. Zero-Knowledge System, Inc., http://www.freedom.net/
Loopo: Integrated Text Miner for FACT-Graph-Based Trend Analysis
Ryosuke Saga¹, Hiroshi Tsuji², and Kuniaki Tabata¹
¹ Kanagawa Institute of Technology, School of Information Technology, 1030 Shimo-ogino, Atsugi, Kanagawa, 243-0292, Japan
² Osaka Prefecture University, Graduate School of Engineering, 1-1 Gakuen-cho, Nakaku, Sakai, 559-8531, Japan
{saga,tabata}@ic.kanagawa-it.ac.jp, [email protected]
Abstract. This paper proposes an integrated tool for analyzing a trend visualization graph called "FACT-Graph". A FACT-Graph is generated from time-stamped text data and is useful for trend analysis. However, it faces three key problems: first, it is difficult to configure the parameters (such as analysis span, excluded keywords and thresholds) needed to generate a FACT-Graph; second, a FACT-Graph does not provide the information and interface required for trend analysis, because the process of generating the FACT-Graph eliminates that information; and third, a user's awareness cannot be reflected in a FACT-Graph. In order to solve these problems, the authors have developed a tool called "Loopo". Loopo integrates a term database, analysis components, and a graph-drawing function, and provides users (i.e., analyzers) with information for trend analysis. Loopo also provides an interactive GUI for configuring parameters with ease and for reflecting a user's awareness in a FACT-Graph instantly. Keywords: Keyword Visualization, Trend Analysis, Co-occurrence graph, Analysis Tool.
1 Introduction
The application and utilization of information has become a vital service as the volume of information stored in companies and organizations increases. Such information includes records such as POS data and access logs as well as text data such as questionnaires and reports. This information is stored in a "data warehouse", and data mining and text mining are applied to it in order to discover useful knowledge [1]. Research on text mining covers a wide range of areas, for example, keyword extraction, summarization, and visualization [2,3]. For text mining of time-series text data in particular, we developed a visualization technique called FACT-Graph to analyze trends [4]. FACT-Graph extracts keywords from time-series text data and visualizes trends as a co-occurrence graph based on the change of keyword attributes. Using a FACT-Graph, we can grasp macro trends and topics that consist of plural keywords. The usefulness of analysis by FACT-Graph has been confirmed in several references [4,5]. To comprehend a FACT-Graph, the user (i.e., analyzer) looks for clues regarding trends among keywords that look important to them. However, a FACT-Graph itself does not
provide the related information, such as parameters, or a user interface for analyzing trends. It is also a problem to reflect an analyzer's awareness in a FACT-Graph. This is a barrier to the use of FACT-Graph. In light of the above-described issues, the authors have developed integrated software that provides an environment for the analysis of a FACT-Graph. This software allows analyzers (called "users" for convenience) to carry out trial-and-error trend analysis with ease.
2 FACT-Graph
2.1 Concepts and Architecture
A FACT-Graph visualizes the change of keyword trends between two time periods as a co-occurrence graph (Figure 1). It treats time-series text data and shows the change between categories, provided a certain time period is regarded as a category. One of the components of a FACT-Graph is class transition analysis, which separates keywords into four classes based on Term Frequency (TF) and Document Frequency (DF), as shown in Table 1, and shows the transition of a keyword between two time periods (Table 2). For example, if a term belongs to Class A in a certain time period and moves into Class D in the next time period, then the trend regarding that term is referred to as "fadeout". FACT-Graph identifies these trends by node color.
Fig. 1. FACT-Graph on Front-Page Articles in Japanese newspaper (Jul. 1998 - Aug. 1998, Keywords: 30, TF: 45, DF: 30, Simpson: 0.5) [3]
Table 1. Keyword Class based on TF and DF

          DF High                         DF Low
TF High   Class A (Major Word)            Class C (Domain Word)
TF Low    Class B (Complementary Word)    Class D (Minor Word)
Table 2. Transition of Keyword Class

Before    After: A     B            C             D
A         Hot          Cooling      Bipolar       Fade
B         Common       Universal    Locally       Fade
C         Broaden      Active       Widely        Fade
D         New          New          Locally New   Negligible
Fig. 2. Process of generating FACT-Graph
Additionally, a FACT-Graph visualizes keywords and the relationships between keywords by using co-occurrence information. As a result, a useful keyword can be identified from its relationships with other keywords, even though that keyword does not seem important at a glance, and the user can extract such keywords by using FACT-Graph. Moreover, from the result of the class transition analysis, the user can comprehend trends in keywords and in topics (consisting of several keywords) with FACT-Graph. The steps for generating a FACT-Graph are as follows (see Figure 2):
1. Separate the time-series text data according to the analysis periods.
2. Extract keywords in each period by morphological analysis and the TF-IDF algorithm [7].
3. Carry out class transition analysis and extract co-occurrence relations.
4. Visualize the keywords and relations.
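A hedged sketch of step 3, the class transition analysis, is given below: each keyword is binned into Classes A–D by TF and DF thresholds in each period, and the pair of classes across the two periods is what Table 2 maps to a trend label. The threshold values simply mirror the example parameter values in the caption of Fig. 1, and the helper names are illustrative; in the real system they are configurable analysis parameters.

```python
# Illustrative class transition analysis: classify a term by TF/DF in each
# period and read the trend off the (before, after) class pair of Table 2.

TF_THRESHOLD, DF_THRESHOLD = 45, 30

def keyword_class(tf: int, df: int) -> str:
    if tf >= TF_THRESHOLD:
        return "A" if df >= DF_THRESHOLD else "C"   # A: major word, C: domain word
    return "B" if df >= DF_THRESHOLD else "D"       # B: complementary word, D: minor word

def class_transition(tf1, df1, tf2, df2):
    return keyword_class(tf1, df1), keyword_class(tf2, df2)

if __name__ == "__main__":
    # A former major word that almost disappears: A -> D, i.e. a fade-out trend.
    print(class_transition(80, 50, 3, 2))   # ('A', 'D')
```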
Fig. 3. Layered architecture of FACT-Graph
A FACT-Graph is composed of three factors: time, keywords, and a co-occurrence network. The user of a FACT-Graph configures several parameters related to these factors, such as the analysis period, keyword filtering, and the co-occurrence threshold. The software structure of FACT-Graph is shown schematically in Figure 3. The FACT-Graph consists of several components, namely, keyword extraction, class transition analysis, co-occurrence calculation, a graph renderer, and a term database. Each component is highly independent (i.e., only loosely coupled to the others). For example, references [4] and [5] use Graphviz for rendering and MeCab (http://mecab.sourceforge.net/) for keyword extraction [8]. The components do not share information about keywords and co-occurrence relations.
2.2 Problem of Analysis by FACT-Graph
Analysis by FACT-Graph necessitates generating the FACT-Graph and setting its parameters. Accordingly, the following problems regarding the analysis must be solved.
− Parameter setting
The user has to configure certain parameters in order to analyze trends by FACT-Graph. The user is forced into using a trial-and-error method, so the cost of analysis increases. However, the components of FACT-Graph are so highly independent that it is troublesome for the user to configure each parameter of each component. This also raises the problem that it is difficult for the user to manage parameters consistently.
− Information reference and interface for analysis
Keywords as well as relations change according to the analysis period. By analyzing trends from the keywords and relations shown in a FACT-Graph, the user checks important keywords and comprehends the trends implied in the information source, such as the original text.
Referring to the information source is useful for the analysis of trends. However, it takes a lot of effort to refer to the information behind the keywords and relations in a FACT-Graph. The keywords and relations in a FACT-Graph are shown as a static image. The effectiveness of a FACT-Graph is therefore limited by the loss of links to the information source, so the user cannot access the information directly from the FACT-Graph. Another problem is that multiple FACT-Graphs do not share graph information. For example, information about the positions of keywords is shown as nodes in a FACT-Graph. However, a FACT-Graph itself does not contain information about the positions of nodes. As a result, even if the user analyzes the same keywords in two different FACT-Graphs, their positions change. This results in the problem of understanding the trend in noteworthy keywords over periods. To solve these problems simply, FACT-Graph has to keep and share information between two periods. However, a FACT-Graph consists of the loosely coupled software components shown in Figure 3. This is good for flexibility but not so good for collaboration between components. As a result, the components do not share information about keywords and co-occurrence, and a FACT-Graph cannot be linked with that information (that is, nodes to keywords and links to co-occurrence relations). It is therefore difficult to refer to these clues seamlessly in order to analyze a FACT-Graph.
− Awareness in analysis
The user analyzes a FACT-Graph, thinks about it, and gets ideas. The user may come up with new awareness in the course of the analysis. Also, the user may want to analyze keywords which appear to be fade-out words in the FACT-Graph. It is important to implement a function that reflects such awareness, based on subjectivity, in knowledge discovery and knowledge acquisition [9]. In order to reflect the user's awareness, the user needs to configure parameters. However, it is troublesome for the user to achieve this. For example, the keywords shown in a FACT-Graph are extracted by the TF-IDF algorithm. The TF-IDF algorithm is based on TF and DF, which are also used as the thresholds deciding whether a FACT-Graph uses a term as a keyword. So it is difficult to display only the one remarkable keyword which the user wants to survey, because of the difficulty of parameter setting.
The three problems above may become a barrier to analyzing a FACT-Graph. In light of the above-described problems, we have developed software based on the following requirements:
− Information must be shared among components and managed consistently.
− A graphical user interface (GUI) for ease of reference to information (such as keywords and co-occurrence) must be supported.
− A user must be able to operate the FACT-Graph itself.
− The software must reflect the user's awareness in the FACT-Graph immediately.
In this paper, the operation of FACT-Graph is restricted to moving, fixing, and holding nodes (that is, keywords). Moreover, a keyword is regarded as a word that reflects awareness, and the software can add a certain keyword and its relations to the FACT-Graph immediately.
3 Loopo
3.1 Overview of Loopo
In order to satisfy the requirements outlined in Section 2, the authors have developed software called "Loopo". Loopo is software for improving analysis by FACT-Graph. Loopo generates a FACT-Graph based on parameters (such as the keyword thresholds and the analysis period), which Loopo can configure easily. Figure 4 is a screenshot of Loopo drawing a FACT-Graph from text data. Loopo consists of four windows: "FACT-Graph View," which shows and operates the FACT-Graph itself; "Keyword Manager," which manages keywords; "Time Manager," which manages information and parameters concerning analysis periods; and "GraphInfo," which shows and manages parameters concerning the network of the FACT-Graph. Several parameters, such as the keyword settings and analysis periods, can be configured from these windows, and the parameters are shared by the windows across multiple analysis periods. Analysis by Loopo starts with the import of time-series text data. After importing text data separated according to analysis period, Loopo carries out morphological analysis, keyword filtering, and several initial setting steps as part of the process of generating a FACT-Graph. Loopo can also export an image of a FACT-Graph drawn in FACT-Graph View.
FACT-Graph View. FACT-Graph View shows the analysis results for the text data imported into Loopo as a FACT-Graph. The user can easily move, clear and fix keywords for trend analysis via the window. For example, the "fixing keyword" function is used to fix the locations of noteworthy keywords between multiple analysis periods. The user can therefore browse through remarkable keywords and their related keywords over the periods with ease. FACT-Graph View also allows the user to refer to the original text data from remarkable keywords and helps them to comprehend macro/micro trends.
Time Manager. It is important to configure the time periods for time-series analysis. Usually, the number of articles is shown as a clue for setting the time periods. By displaying the trend in article volume as a chart, Time Manager helps the user configure the parameters concerning the analysis periods. The window indicates how the time periods are divided up for analyzing a FACT-Graph (which is output according to the time periods). Time Manager also has a function for shifting the time periods forward or backward. With this function, the user can view a series of FACT-Graphs via FACT-Graph View along with the change of time period.
Keyword Manager. Keyword Manager is the window for listing and managing the keywords currently shown in a FACT-Graph. The user can add and delete keywords, and refer to the original text data from a keyword via Keyword Manager or FACT-Graph View. As a result, the window can reflect the user's awareness in the FACT-Graph. The user can also configure the parameters, such as thresholds, concerning keyword extraction.
GraphInfo. One of the measures for identifying whether a FACT-Graph is meaningful is network information such as network size and density. GraphInfo shows this network information about the FACT-Graph. GraphInfo shows the network size, density, and the types of links as an overview of a FACT-Graph. It also shows several centralities, such as betweenness centrality and closeness centrality, when the user selects a node of interest via FACT-Graph View [10]. Moreover, GraphInfo allows the user to change the co-occurrence type and the co-occurrence thresholds.
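The co-occurrence threshold handled by GraphInfo can be illustrated with the Simpson coefficient that appears in the caption of Fig. 1; for keyword co-occurrence this is commonly the overlap of the document sets of two keywords divided by the size of the smaller set. The sketch below is an assumed, simplified computation — the document-set representation and the function names are not Loopo's actual code.

```python
# Illustrative co-occurrence filtering with the Simpson (overlap) coefficient:
# |D(a) ∩ D(b)| / min(|D(a)|, |D(b)|), where D(x) is the set of documents
# containing keyword x; an edge is kept only at or above the threshold.

def simpson(docs_a: set, docs_b: set) -> float:
    if not docs_a or not docs_b:
        return 0.0
    return len(docs_a & docs_b) / min(len(docs_a), len(docs_b))

def co_occurrence_edges(keyword_docs: dict, threshold: float = 0.5):
    keys = sorted(keyword_docs)
    return [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]
            if simpson(keyword_docs[a], keyword_docs[b]) >= threshold]

if __name__ == "__main__":
    docs = {"election": {1, 2, 3, 4}, "economy": {3, 4, 5}, "sports": {6}}
    print(co_occurrence_edges(docs))   # [('economy', 'election')]
```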
Fig. 4. Screenshot of Loopo
Fig. 5. Software structure of Loopo
3.2 Architecture
The software architecture of Loopo is shown schematically in Figure 5. Loopo has four databases, concerning network information, parameters, original text data, and keywords, and three components, concerning keyword extraction, class transition analysis, and co-occurrence analysis. It also provides four windows, namely, FACT-Graph View, Time Manager, Keyword Manager, and GraphInfo, as user interfaces. Each window (in the front end) is operated by the user and is related to all of the databases. The network-information database is related to GraphInfo. The text-data and keyword databases are related to Keyword Manager and FACT-Graph View to allow the original text data to be referred to from remarkable keywords and to enable operations such as the addition and deletion of keywords. The parameter database is related to all windows because each window provides functions for configuring parameters. For keyword extraction (in the back end), Loopo adopts the TF-IDF algorithm of Harman [11]. The TF-IDF algorithm calculates the weight of terms based on TF and DF, and the top n weights are regarded as the keywords of a time period. The value of n is one of the parameters for generating a FACT-Graph and is configured in Loopo. For all databases, SQLite (http://www.sqlite.org) is adopted in consideration of ease of installation. For rendering a FACT-Graph, Loopo uses the popular "spring model" drawing algorithm of Kamada and Kawai [12].
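As a rough illustration of this back-end step, the sketch below weights the terms of one analysis period with a common TF-IDF variant and keeps the n highest-weighted terms as the period's keywords. The weighting formula is an assumption here and may differ in detail from Harman's formulation [11] actually used by Loopo.

```python
import math
from collections import Counter

# Illustrative top-n keyword extraction for one analysis period; the TF-IDF
# variant used here is a common one and only stands in for Harman's ranking.

def top_n_keywords(documents, n=30):
    """documents: list of token lists belonging to one analysis period."""
    tf = Counter(token for doc in documents for token in doc)
    df = Counter(token for doc in documents for token in set(doc))
    total = len(documents)
    weight = {t: tf[t] * math.log(1.0 + total / df[t]) for t in tf}
    return sorted(weight, key=weight.get, reverse=True)[:n]

if __name__ == "__main__":
    period = [["budget", "election", "party"], ["election", "economy"], ["weather"]]
    print(top_n_keywords(period, n=2))   # 'election' ranks first
```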
4 Discussion
Loopo was developed as a mining support tool for analyzing trends. There are many visualization methods, such as multi-dimensional scaling and self-organizing maps, that serve purposes similar to FACT-Graph. However, there are not many tools for mining support. Polaris is one such analysis tool [13]. It was developed for easy analysis of "chance discovery" by KeyGraph [14]. The concept of Loopo is similar to that of Polaris. However, Loopo is used for trend analysis, so the goal of Loopo is different from that of Polaris. Loopo was also developed on the assumption that an inexperienced user is analyzing a FACT-Graph. If the user has data for a FACT-Graph, Loopo outputs a FACT-Graph right away and thus provides an opportunity to analyze it. It is also assumed that even a user unfamiliar with the details of FACT-Graph can carry out a simple analysis of trends with FACT-Graph. One of the other problems concerning FACT-Graph itself is that there is no systematic methodology for analyzing a FACT-Graph. Moreover, the results of trend analysis, which are derived from individual subjectivity, are often shared with other people. We consider that these problems can be solved by the functions of information-sharing tools (such as a whiteboard and memos) and a wizard (which guides the analysis process). These implementations are future work.
5 Conclusion
This paper has described a tool called Loopo for analyzing trends from a FACT-Graph. For the analysis of FACT-Graph, we have discussed problems concerning parameter setting, information reference and interface, and the reflection of awareness. To resolve these problems, GUI-based software which manages the essential parameters via four windows (FACT-Graph View, Time Manager, Keyword Manager, and GraphInfo) has been developed.
References 1. Inmon, W.H.: Building the Data Warehouse. Wiley Publishing, Inc., Chichester (2005) 2. Nanba, H., Okuda, N., Okumura, M.: Extraction and Visualization of Trend Information from Newspaper Articles and Blogs. In: Proceedings of the 6th NTCIR Workshop, pp. 243–248 (2007) 3. Yamanishi, K., Li, H.: Mining Open Answers in Questionnaire Data. IEEE Intelligent Systems 17(5), 58–64 (2002) 4. Saga, R., Terachi, M., Sheng, Z., Tsuji, H.: FACT-Graph: Trend Visualization by Frequency and Co-occurrence. In: Dengel, A.R., Berns, K., Breuel, T.M., Bomarius, F., Roth-Berghofer, T.R. (eds.) KI 2008. LNCS (LNAI), vol. 5243, pp. 308–315. Springer, Heidelberg (2008) 5. Terachi, M., Saga, R., Sheng, Z., Tsuji, H.: Visualized Technique for Trend Analysis of News Articles. In: Nguyen, N.T., Borzemski, L., Grzech, A., Ali, M. (eds.) IEA/AIE 2008. LNCS (LNAI), vol. 5027, pp. 659–668. Springer, Heidelberg (2008) 6. Terachi, M., Saga, R., Tsuji, H.: Trends Recognition in Journal Papers by Text Mining. In: Proceedings of IEEE International Conference on Systems, Man & Cybernetics (IEEE/SMC 2006), pp. 4784–4789 (2006) 7. Salton, G.: Automatic Text Processing. Addison-Wesley Publishing Company, Reading (1989) 8. Ellson, J., Gansner, E.R., Koutsofios, E., North, S.C., Woodhull, G.: Graphviz - open source graph drawing tools. Graph Drawing, 483–484 (2001) 9. Tabata, K., Mitsumori, S.: An assertion-based information-probe system: Document-skeleton and glossary-skeleton approach. Information, Knowledge, and Systems Management 3(4), 123–152 (2003) 10. Freeman, L.C.: Centrality in social networks: Conceptual clarification. Social Networks 1(3), 215–239 (1979) 11. Harman, D.: Ranking algorithms. In: Information Retrieval, ch.14. Prentice Hall, Englewood Cliffs (1992) 12. Kamada, T., Kawai, S.: An algorithm for drawing general undirected graphs. Information Processing Letters 31(1), 7–15 (1989) 13. Okazaki, N., Ohsawa, Y.: Polaris: An Integrated Data Miner for Chance Discovery. In: Proceedings of Workshop of Chance Discovery and Its Management (in conjunction with the International Human Computer Interaction Conference (HCI 2003)), Crete, Greece (2003) 14. Ohsawa, Y., Benson, N.E., Yachida, M.: KeyGraph: Automatic Indexing by Segmenting and Unifying Co-occurrence Graphs. IEICE D-I J82-D-I(2), 391–400 (1999)
Using Graphical Models for an Intelligent Mixed-Initiative Dialog Management System Stefan Schwärzler, Günther Ruske, Frank Wallhoff, and Gerhard Rigoll Institute for Human-Machine Communication Technische Universität München 80290 Munich, Germany {sts,rus,waf,ri}@mmk.ei.tum.de
Abstract. The main goal of dialog management is to provide all the information needed to perform, e.g., a SQL query, a navigation task, etc. Two principal approaches to dialog management systems exist: system-directed ones and mixed-initiative ones. In this paper, we combine both approaches in a novel way and address the problem of natural, intuitive dialog management. The objective of our approach is to provide a natural dialog flow. The whole dialog is therefore represented in a finite state machine: the information gathered during the dialog is represented in the states of the finite state machine, and the transitions within the state machine denote the dialog steps into which the dialog is separated. The information is obtained from each naturally spoken sentence by hierarchical decoding into tags, e.g., the name tag and the address tag. These information tags are gathered during the dialog, either on the user's initiative or through explicit questioning by the dialog manager. The models use information from the semantic information tags, the dialog history, and the training corpus. From all these integrated parts we obtain the best path to the end of the dialog by Viterbi decoding through the transition network after each information step. From the Air Travel Information System (ATIS) database, we extract all 21,650 naturally spoken questions and the SQL queries as answers for the training phase. The experiments have been carried out on 200 automatically generated dialog sentences. The system obtains the semantic information in all test sentences and leads the dialogs successfully to the end. In 66.5% of the sample dialogs we achieve the minimum number of required dialog steps. Hence, 33.5% of the dialogs are longer than necessary. Keywords: dialog management, learning, knowledge management, intelligent systems.
1 Introduction
For a spoken dialog system one can distinguish between five task areas. Depending on the system, these areas are more or less developed, and the boundaries between them are often fluid [6]. The speech recognizer recognizes spoken phonemes and returns a sequence of words according to a lexicon. The sentence analysis (parsing) assigns a meaning to
this sequence and translates the ordered relevant information into the system language [8]. The dialog management determines the dialog strategy and thus how the system responds to the user's input. During communication with external sources, information is written to or read from databases. The answer generation is virtually the opposite of the sentence analysis and translates words from the system language into a sequence of words that the user understands. The audio front end is formed by a text-to-speech system: a sequence of written words is converted into spoken text by speech synthesis. All components are depicted in Fig. 1.
Fig. 1. Components of a speech dialog system: the semantic slots and the dialog history are used to estimate a dialog policy, and the system addresses the user through a text-to-speech component
The dialog management differs from all other components in Fig. 1, which is why standard learning techniques are not directly suitable. A dialog management system must decide on the next dialog step and on which system answer should be generated in that step. Two principal approaches for dialog management systems exist: rule-based ones and adaptive probabilistic ones. TrindiKit [4] is a toolkit for the rapid development of update rules, information states, and dialog transitions. Besides such rule-based systems, there are adaptive probabilistic approaches to dialog management, published in [5, 7]. In contrast to the Partially Observable Markov Decision Process (POMDP) [11], we do not calculate the policy in every dialog step, and thereby reduce the runtime complexity. In this paper, we use the following components for decision-making: inputs from the user a_t^SR run through a parsing system and deliver semantic slots a_t^SU to the belief estimator [8]. As the next belief state b(t+1) depends on the current belief state b(t), the dialog history D(s_m) is augmented by the current belief state b(t). Simultaneously, the user is presented with an action a_t^m by the text-to-speech system. This procedure is repeated until the user reaches the dialog goal or the dialog aborts. The paper is organized as follows: after the introduction of dialog techniques and their strategies, the novel mixed-initiative dialog approach is presented in the next section. A corpus description with a concrete use case follows in Section 4. After the presentation of the experiments and their results, the paper closes with a summary and an outlook on future experiments.
2 Dialog Strategies
A dialog strategy aims to obtain all the semantic information needed to create a SQL¹ statement for an estimated user goal. To that end, different methods are described below.
¹ SQL: Structured Query Language, a database computer language designed for the retrieval and management of data in relational database management systems.
Fig. 2. The user inputs are controlled and verified in a system-directed dialog system by finite-state machines. Frame-based dialog systems catch the initiatives from the user [6]. The user is allowed to confront the system with any semantic slot in different time steps.
2.1 System-Directed Systems
Semantic slots are specified by the system in a fixed sequence, realized by a finite-state machine (see Figure 2). The user has no possibility to shape the dialog at runtime. With VoiceXML, the W3C has created a standard for system-directed dialogs [10].
2.2 Frame-Based Systems
Frame-based systems use semantic templates. The system asks the user to fill open semantic slots. The filling strategy can be realized in rule-based or probabilistic systems. The user is not constrained to answer system-directed questions directly; optionally, the user can fill several semantic slots in one utterance.
2.3 Mixed-Initiative Systems
In this paper, we combine the advantages of both approaches: an agent-based system allows complex interactions between the user and the system, in which both user and system can initiate the dialog. For passive or inexperienced users, or when the dialog flow runs into problems, the system can steer the dialog toward the user goal. Besides controlling complex dialog actions, a mixed-initiative system can also change the dialog topic.
3 Novel Approaches
The primary goal of this paper is the estimation and achievement of a user goal in order to create a SQL statement. To that end, we build up a semantic frame (frame-based or mixed-initiative), which can be filled through pointed questions to the user. The dialog strategy is derived from a Graphical Model. The parameters of the model are learned from naturally spoken sentences in the ATIS corpus.
3.1 Graphical Model
A Graphical Model combines the theory of probabilities with graph theory. Edges model the statistical dependencies between the variables (nodes). Fig. 3 shows the graphical model of a mixed-initiative dialog system. It is an extension of a discrete HMM and has been realized with the Graphical Model Toolkit (GMTK) [2].
Fig. 3. Dialog strategy design by a stochastic model: the parameters of the states S_t^m of the dialog model are created from the previous model states S_{t-1}^m and the user utterances a_t^SU. The future flag F_t controls the matrix of the observation O_t.
The model contains the observable nodes s_0 and s_T, which are called the prolog and the epilog of the graphical model. The chunks s_1 through s_{T-1} are hidden nodes. The observations o_t of the semantic slots a_t^SU are linked to the states s_t by alternating transitions. The statistical joint probability of the model in Fig. 3 can be factorized as in Eq. (1).
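As a hedged sketch of how such a joint probability typically factorizes (generic notation read off Fig. 3, not the authors' exact Eq. (1)), a chain-structured DBN with prolog, chunks, and epilog would give

\[
P(s_{0:T},\, o_{1:T-1},\, f_{1:T-1}) \;=\; P(s_0)\left[\prod_{t=1}^{T-1} P(s_t \mid s_{t-1})\, P(o_t \mid s_t, f_t)\, P(f_t)\right] P(s_T \mid s_{T-1}),
\]

where f_t denotes the future flag at time t.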
3.2 Information Slots
In our novel approach for dialog management, information slots are introduced. They are capable of holding the information tags: if a certain information tag is available, its corresponding information slot is filled. Each dialog step is either initiated by the system or by the human dialog partner. The end of the user's input also denotes the end of one dialog step. The information available for the current dialog task is evaluated after each dialog step by checking which additional information slots have been filled during the preceding dialog step.
Fig. 4. User goals, defined in binary configuration, are learned from a training corpus
Fig. 5. At every time step t the semantic frame is filled with a slot-value pair. Here, according to the trained Graphical Model, the user is first asked for the destination before being asked for the time.
Each combination of available information tags, and hence each filled information slot, is represented by one state in a finite state machine: if N information tags are required, 2^N states are defined (see Fig. 4). A transition from one state to another is made between two dialog steps, depending on the newly gathered information and on the missing information tags indicated by empty information slots. Missing information can now be asked for by the dialog system through a machine initiative (see Fig. 5). The information tags are thereby weighted depending on their relevance for the current dialog: the more important a certain information tag is for the dialog task, the earlier this information is asked for by the system if it has not been provided during the preceding dialog steps. The transitions within the state machine can either be learned from a database or predefined by the dialog designer. By using a dataset containing natural, task-specific example dialogs as training corpus, the transitions can be adapted or learned in order to provide a natural dialog flow, depending on which information tags (and hence which state of the finite state machine) are already available or still missing. Information tag combinations that cannot be found in the training corpus are covered by the above-mentioned weighting and absolute discounting. Thereby, a probabilistic modeling of the dialog is achieved using various models of dynamic Bayesian networks (DBNs).
3.3 Trellis Representation
All possible transitions from a belief state b(t-1) to b(t) are learned iteratively following [1] and represented in a trellis diagram (see Fig. 6).
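To make the 2^N state construction concrete, the following Python sketch enumerates the information-slot states as bitmasks and the transitions that fill at least one additional slot; the slot names and the transition rule are illustrative assumptions, not the authors' implementation.

from itertools import combinations

SLOTS = ["origin", "destination", "date", "time"]   # N = 4 information tags
N = len(SLOTS)

def state_of(filled_indices):
    # Encode a set of filled slot indices as one of the 2^N states (a bitmask).
    return sum(1 << i for i in filled_indices)

def successors(state):
    # States reachable in one dialog step, i.e. states that fill at least one new slot.
    missing = [i for i in range(N) if not state & (1 << i)]
    for k in range(1, len(missing) + 1):
        for newly_filled in combinations(missing, k):
            yield state | state_of(newly_filled)

# Example: after origin and destination are known, three states remain reachable.
start = state_of([SLOTS.index("origin"), SLOTS.index("destination")])
print(sorted(successors(start)))   # [7, 11, 15]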
Fig. 6. Trellis representation of the semantic slots and their machine-learned state transitions
The Viterbi algorithm computes the best path through the trellis diagram [9]. To that end, the belief state b(t) describes the probability of the path that ends in state s(t) at time t. The Viterbi algorithm is defined by an initialization step (Eq. 2), a recursion step (Eq. 3), and a termination step (Eq. 4); the best path is then found by backtracking (Eq. 5).
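In the standard notation for trellis decoding (path scores δ, back-pointers ψ, and transition probabilities a_ij; these symbols are generic and used here only as a sketch rather than the paper's own notation), the four steps read:

\begin{align*}
\text{Initialization:}\quad & \delta_1(i) = \pi_i, \qquad \psi_1(i) = 0\\
\text{Recursion:}\quad & \delta_t(j) = \max_i\bigl[\delta_{t-1}(i)\,a_{ij}\bigr], \qquad \psi_t(j) = \arg\max_i\bigl[\delta_{t-1}(i)\,a_{ij}\bigr]\\
\text{Termination:}\quad & P^{*} = \max_i \delta_T(i), \qquad s_T^{*} = \arg\max_i \delta_T(i)\\
\text{Backtracking:}\quad & s_t^{*} = \psi_{t+1}\bigl(s_{t+1}^{*}\bigr), \quad t = T-1, \ldots, 1
\end{align*}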
Fig. 7. The dialog progression through the user goals is described by the Viterbi algorithm
4 Corpus Description
This speech database is the first in a series of recordings of natural speech in the Air Travel Information System (ATIS) domain. Queries collected for these corpora are spoken, without scripts or other constraints, to ATIS [3].

Table 1. Example dialog between human and system derived from the ATIS corpora

Speaker   Dialogue and Actions
Human     Hello.
System    What can I do for you?
Human     Show me all the nonstop flights from Atlanta to Philadelphia.
System    Lists/List of flights from cities whose city name is Atlanta and to cities whose city name is Philadelphia and whose stops is 0.
Human     Yes, I would like some information on the flights on April 22nd, evening.
System    Lists/List flights from cities whose city name is Dallas and to cities whose city name is Denver and whose departure time is between 1645 and 1715 and flying on flight days whose day name is Sunday.
System    Can't find any result.
A human wizard simulating the speech recognizer of the future gives the impression of a speech-recognizing computer system. ATIS contains 21650 user utterances and 2195 sessions. For each session, a problem was posed to a subject, such as "find the cheapest way to fly from Atlanta to Dallas by next Thursday". The subject's queries, issued as SQL to the computer system, and the computer system's responses were saved as data. The average dialog length is about eight question-and-answer steps. A dialog example is shown in Table 1.
5 Experiments and Results
We analyse the dialog length with 200 automatically generated and completely disjoint test sentences. The experiments show the reliability of our system: all dialogs reach the dialog goal together with their user goals (stopover). In 66.5% of the tested dialogs, we achieve exactly the same user goals as described in the ATIS corpus. Hence, 33.5% of the dialogs have over-length: the system collects more user goals than necessary to create a SQL statement for the database, so the user is confronted with more questions than our approach actually requires. On average, 6 dialog steps are necessary for achieving the dialog goal, whereas our system requires about 8 dialog steps before a SQL statement is sent to the database.
6 Conclusion and Future Work
In this work, a novel mixed-initiative dialog management system based on Graphical Models has been presented. Information slots are introduced and modeled within the Graphical Model as a DBN. Their parameters have been learned from naturally spoken sentences of the ATIS task. The Viterbi algorithm iteratively computes the best path through the trellis. Our approach achieves all dialog goals in 200 test sentences, but 33.5% of the tested dialogs have over-length. Nevertheless, these first results show that the system works reliably. In the future, we plan to analyse how the model could recognize errors in the information slots and how to correct them.
References
1. Baum, L.E., Petrie, T.: Statistical Inference for Probabilistic Functions of Finite State Markov Chains. The Annals of Mathematical Statistics 37, 1554–1563 (1966)
2. Bilmes, J., Zweig, G.: The Graphical Model Toolkit: An Open Source Software System for Speech and Time-Series Processing. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2002)
3. Hemphill, C.T., Godfrey, J.J., Doddington, G.R.: The ATIS Spoken Language Systems Pilot Corpus (1990), http://acl.ldc.upenn.edu/H/H90/H90-1021.pdf
4. Larsson, S., Bernman, A., Hallenborg, J., Hjelm, D.: Trindikit Manual (2004)
5. Levin, E., Pieraccini, R., Eckert, W.: Using Markov Decision Processes For Learning Dialogue Strategies. In: Proc. Int. Conf. Acoustics, Speech and Signal Processing, Seattle, USA (1998)
6. McTear, M.F.: Spoken Dialogue Technology. Springer, London (2004)
7. Rieser, V., Lemon, O.: Using Machine Learning to Explore Human Multimodal Clarification Strategies. In: IEEE/ACL Workshop, Palm Beach, Aruba (2006)
8. Schwärzler, S., Geiger, J., Schenk, J., Al-Hames, M., Hörnler, B., Ruske, G., Rigoll, G.: Combining Statistical and Syntactical Systems for Spoken Language Understanding With Graphical Models. In: Proc. of the 9th International Speech Communication Association (Interspeech 2008), Brisbane, Australia (2008)
9. Viterbi, A.: Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm. IEEE Transactions on Information Theory 13, 260–267 (1967)
10. W3C Recommendation, Voice Extensible Markup Language (VXML), Version 2.0 (2004), http://www.w3.org/TR/2004/REC-voicexml20-20040316
11. Young, S.: Using POMDPs for dialog management. In: IEEE/ACL Workshop, Palm Beach, Aruba (2006)
Input Text Repairing for Multi-lingual Chat System Kenichi Yoshida and Fumio Hattori Graduate School of Science and Engineering, Ritsumeikan University, Japan [email protected], [email protected]
Abstract. Even though various communication tools have resulted in a remarkable increase of global communications, language barriers remain high and complicate communication across languages. Although a multi-lingual chat system allows users to chat with each other in different languages using machine translation, the quality of translation is not high when the input sentence reflects spoken language. In this paper, we propose a method that repairs input sentences in spoken language by retrieving similar sentences using keywords.
Keywords: Language Grid, multi-lingual chat, cross-cultural communication, machine translation.
1 Introduction
The advance of Internet technology and various communication tools has resulted in a remarkable increase of global communications. However, language barriers remain high and complicate inter-cultural communication. The Language Grid Project [1], an infrastructure that makes it possible to combine various language resources on the Internet, started in 2006 to solve this problem by improving the understanding of Internet contents written in different languages and by people from different countries. The multi-lingual chat system¹ [2], one of the applications developed by the Language Grid Project, allows users to chat with each other in different languages using machine translation. Most sentences in multi-lingual chats are spoken-style. However, almost all of the machine translation resources used in multi-lingual chat systems translate written-style sentences. Therefore, the translation quality in multi-lingual chat systems is not always ensured. Although a multi-lingual chat system provides such functions as back translation and auto completion, they are insufficient for practical use. To improve the quality, repairing input spoken sentences into written sentences that suit the machine translation is expected to be effective. In this paper, a method is proposed that repairs input sentences in spoken language by retrieving similar sentences using keywords.
¹ The multi-lingual chat system was developed by the College of Informatics, Kyoto University.
In Section 2, the details of the multi-lingual chat system are introduced. An input repairing system is proposed in Section 3. In Section 4, experiments and discussion are described. Section 5 concludes our paper.
2 Multi-lingual Chat System
2.1 Overview of Multi-lingual Chat System
A multi-lingual chat system adds a translation function to traditional text chat systems. Users can send messages in their native languages and receive messages from partners. In other words, users can exchange messages with partners whose native languages are different. A multi-lingual chat system utilizes the machine translation resources available on the Internet. However, current machine translation systems are designed to handle written documents, that is, well-formed sentences that are easily and correctly translated. The spoken-style sentences that often appear in chats are rarely translated correctly, so the translation quality in multi-lingual chat systems is not high. For example, “Shukudai susunderu?” in Japanese, which means “How is your homework going?”, might be translated as “Is homework developed?” The characteristics of the Japanese spoken sentences used in chatting differ from those of written sentences. First, subjects are often omitted; in addition, people often answer with very simple predicates. Second, the end of a sentence can be spelled in several different ways. Third, several words have the same meaning but different expressions. For example, Japanese has at least three expressions that mean “dinner”: “yorugohan,” “yuhan,” and “banmeshi.” These factors prevent machine translation from translating correctly. To improve the quality of the translation of multi-lingual chats, repairing input sentences to adapt them to machine translation is considered to be effective.
2.2 Back Translation
To repair input sentences, the multi-lingual chat system provides a back translation function. Fundamentally, to confirm whether an input sentence is translated correctly, the target language must be understood; however, this is rarely the case for users of multi-lingual chat systems. Back translation re-translates the translated sentence in the target language back into the source language. Users can compare the original input and the back-translated sentence and confirm whether the meaning of both sentences is the same. If the meanings are identical, users can expect that the input sentence was translated correctly. Conversely, if the meanings are different, the translation might be incorrect. In the latter case, users need to repair the input sentence and translate again, repeating this process until the meanings become equivalent. Using the above example, when “Is homework developed?” is back-translated into “Naishoku ha hatten saserareruka?”, the user realizes that the input sentence was not translated correctly into his/her native language.
A preliminary experiment showed that 65% of sentences need to be repaired twice or more using back translation. This means that the function is very useful for improving translation quality; however, it only informs users that the input sentence is not good. Users still have to repair the input sentences by themselves, and learning how to repair input sentences is not easy. The fact that 65% of sentences need to be repaired twice or more impairs the immediacy of the chat system.
2.3 Auto Complete
Another function, auto complete, which is also provided in the multi-lingual chat system of the Language Grid Playground, retrieves example sentences that match the input text. Since the example sentences have corresponding translated sentences, the selected example sentence is translated instantly and correctly. However, it cannot handle the variety of spellings at the end of sentences or the variety of synonyms. In addition, the number of examples is limited, so the auto complete function is only useful in a few cases.
3 Input Text Repairing by Retrieving Generalized Sentences
3.1 Method Overview
Given an input sentence, this function retrieves generalized sentences in written style using keywords. A generalized sentence is one in which several words are replaced with generalized words. For example, “ongakushitsu wa doko desuka?” (“Where is the music room?”) can be generalized as “«basho» wa doko desuka?” (“Where is «the place»?”). In this example, «basho» («the place») is the generalized word. The keywords used for retrieval are extracted from the input sentence and generalized. From the input sentence “Ongakushitsu, doko?” (“Music room, where?”), “ongakushitsu” (“music room”), “doko” (“where”), and “?” are extracted. Then “ongakushitsu” (“music room”) is generalized as «basho» («the place»), so that the keywords used for retrieval are «basho» («the place»), “doko” (“where”), and “?”. The grammatical structure of the input sentence is disregarded in this method. Retrieving the generalized sentence database might return the following sentence: “«basho» wa doko desuka?” (“Where is «the place»?”). Finally, the generalized words are specialized as they appear in the input sentence, resulting in the following repaired input sentence: “ongakushitsu wa doko desuka?” (“Where is the music room?”). The generalized sentence database includes sentences whose use can be anticipated for each domain in which the system is used. For example, if the system is used in an elementary school, sentences about schoolwork or classrooms must be registered. It also includes sentences commonly used in daily life, such as asking about places or exchanging information.
3.2 Process Flow
Fig. 1 shows the process flow of the input repairing method. An example of input repairing for a dialogue between a teacher and a foreign pupil is shown in Fig. 2.
Fig. 1. Process flow
Fig. 2. Example of input repairing
1. Extracting keywords: First, by applying morphological analysis, nouns, adjectives, independent verbs, and question or exclamation marks are extracted. For example, from the sentence “Ongakushitsu, doko?” (“Music room, where?”), “ongaku” (noun-generality), “shitsu” (noun-suffix-generality), “doko” (noun-pronoun-generality), and “?” (mark-generality) are extracted. Suffixes are combined with the preceding noun; that is, “noun-suffix-generality” is always combined with the preceding “noun-generality” to form one word. So “ongaku” (noun-generality) and “shitsu” (noun-suffix-generality) are combined into “ongakushitsu” (“music room”).
2. Keyword generalization: In this step, the extracted words are replaced by generalized words using a domain ontology, which is a formal representation of a set of concepts within a domain and the relationships among those concepts. Section 3.3 explains the domain ontology in detail. In this paper, only noun-generality words are generalized. In the preceding example, “ongakushitsu” (“music room”) is generalized as “«basho»” (“«the place»”). After the generalization, the three keywords, “«basho»” (“«the place»”), “doko” (“where”), and “?”, are used for retrieving the generalized sentence database.
3. Retrieving the generalized sentence database: First, generalized sentences that match all three keywords are retrieved. If no sentence is matched, partial matching is done using one or two keywords.
4. Word specialization: Generalized words in the retrieved generalized sentence are replaced back with the specialized words, reversing step 2. In the preceding example, “«basho»” (“«the place»”) is replaced again with “ongakushitsu” (“music room”), and the sentence is shown to the user.
3.3 Keywords Generalization Using Domain Ontology
A wide variety of sentences appears as input to the chat system. Since preparing all example sentences is impractical, we focus on the fact that many sentences have the same structure and only the noun in the sentence differs. These sentences can be generalized to one sentence that includes generalized words instead of the original nouns. Therefore, keywords extracted from input sentences have to be generalized to retrieve the generalized example sentence. This generalization is done using the domain ontology. Fig. 3 shows a sample domain ontology for places in a school. Keywords are generalized by ascending the concept hierarchy in the domain ontology.
Fig. 3. Sample of domain ontology
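A minimal Python sketch of this ontology-based generalization is given below; the concept hierarchy, the generalized-word notation, and the keyword list are illustrative assumptions rather than the ontology actually used in the system.

PARENT = {                     # child concept -> parent concept (toy ontology)
    "music room": "place",
    "gym": "place",
    "classroom": "place",
    "homework": "schoolwork",
    "test": "schoolwork",
}

def generalize(keyword, levels=1):
    # Replace a keyword by its ancestor concept, ascending `levels` steps in the hierarchy.
    for _ in range(levels):
        if keyword not in PARENT:
            break
        keyword = PARENT[keyword]
    return keyword

keywords = ["music room", "where", "?"]
generalized = ["<" + generalize(k) + ">" if k in PARENT else k for k in keywords]
print(generalized)   # ['<place>', 'where', '?']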
3.4 Retrieval of Generalized Sentences A generalized sentence database is composed of pairs of generalized sentences and corresponding keywords. Fig. 4 shows the structure of the database. Generalized sentences are developed by generalizing examples in the domain using the same method for generalizing input sentences as described above.
Fig. 4. Concept chart of generalized database
Retrieval is done by comparing the keywords extracted from the input sentence with the keywords in the generalized sentence database. If all keywords match, the corresponding generalized sentence is returned. When no sentence matches, two options are provided: A and B. Option A retrieves sentences of more than four words that match three or more of the keywords. In the example of Section 3.1, “«basho» wa doko ni aruka shitte imasuka?” (“Do you know where «the place» is?”) is retrieved using the keywords “«basho»” (“«the place»”), “doko” (“where”), and “?”. Option B retrieves generalized sentences from the database even when one keyword is missing, as long as the remaining keywords, including the generalized keywords, partially match. For example, the used keywords might be the generalized keyword “«basho»” (“«the place»”) and one other keyword, such as “doko” (“where”); the result might be “«basho» ha doko desuka?” (“Where is «the place»?”).
3.5 System Architecture
Fig. 5 shows the system architecture of the input repairing system.
1. The input sentence is passed through the morphological analysis module to extract keywords from it.
2. Keywords are generalized using the domain ontology.
3. The generalized sentence database is retrieved using the generalized keywords.
4. The generalized words are replaced with the original ones.
5. The repaired sentence is presented to the user.
A sample snapshot of the multi-lingual chat system with an input repairing facility is shown in Fig. 6. When a sentence is input into the input field and the translation button is pushed, the repaired candidate sentences appear below.
Fig. 5. System configuration
Fig. 6. Sample snapshot of input repairing
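As an illustration of the retrieval step described in Section 3.4, the following sketch first looks for generalized sentences whose keyword sets contain all query keywords and then falls back to partial matching; the database entries and the fallback threshold are assumptions for illustration only.

DB = [  # (keyword set, generalized sentence)
    ({"<place>", "where", "?"}, "<place> wa doko desuka?"),
    ({"<place>", "where", "know", "?"}, "<place> wa doko ni aruka shitte imasuka?"),
    ({"<schoolwork>", "finished", "?"}, "<schoolwork> wa owarimashita ka?"),
]

def retrieve(query_keywords, min_partial=2):
    query = set(query_keywords)
    # Exact pass: every query keyword appears in the stored keyword set.
    exact = [sentence for keywords, sentence in DB if query <= keywords]
    if exact:
        return exact
    # Fallback: at least `min_partial` keywords overlap (cf. options A and B above).
    return [sentence for keywords, sentence in DB if len(query & keywords) >= min_partial]

print(retrieve({"<place>", "where", "?"}))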
4 Experiment
4.1 Experimental Method
An experiment was conducted assuming a chatting situation between a Chinese pupil and a Japanese teacher at an elementary school. The translation quality, compared with and without input repairing, was measured by the number of back translations needed until an acceptable translation result was obtained. Translation from Japanese into Chinese was assumed.
4.2 Experimental Result and Discussion
Using multi-lingual chat without input repairing, 10 of 23 sentences were translated correctly, a success rate of 43%. On the other hand, using the system with the
proposed input repairing, 19 of 23 sentences were translated correctly, a success rate of 83%. Table 1 shows several examples of input repairing. In the first, the back translation result implies that the input sentence was mistranslated because an article was inadequately supplemented by the machine translation; in this case, input repairing worked well. The second example shows that input repairing caused a mistranslation; since the repaired sentence was complex, perhaps the machine translation could not handle it well. The last one was caused by incorrect morphological analysis.
Table 1. Examples of input repairing
5 Conclusion
A method was proposed to repair input sentences for multi-lingual chat systems. The method retrieves similar generalized sentences using keywords extracted from input sentences. The experiment shows that the successful translation rate was improved from 43% to 83%. The idea of retrieving generalized sentences using keywords might also be applicable for young children or people with language difficulties such as aphasia. Input repairing must still be improved: for example, complex sentences must be decomposed into simple sentences, and robustness against morphological analysis errors is also desired.
Acknowledgement This research was supported by the Strategic Information and Communications R&D Promotion Programme of the Ministry of Internal Affairs and Communications, Japan.
References
1. Ishida, T.: Language Grid: An Infrastructure for Intercultural Collaboration. In: IEEE/IPSJ Symposium on Applications and the Internet (SAINT 2006), keynote address, pp. 96–100 (2006), http://langrid.org/
2. http://langrid.org/playground/chat/ChattingMain.html
Interactive Object Segmentation System from a Video Sequence Guntae Bae, Sooyeong Kwak, and Hyeran Byun Department of Computer Science, Yonsei University 134 Sinchon-Dong, Seodaemun-Gu, Seoul, 120-749, Korea {gtbae,ksy2177,hrbyun}@yonsei.ac.kr
Abstract. In this paper, we present an interactive object segmentation system for video, such as TV products and films, for converting 2D contents to 3D. It focuses on reducing the processing time of object segmentation and increasing usability. The proposed system consists of three steps: trimap generation based on a polygon, object segmentation using the Graph Cut algorithm, and refinement through a user interface (UI) based on rectangles and local features. It makes it easy to obtain object segmentations rapidly and is also helpful for creating 3D contents.
Keywords: Object Segmentation, interactive system, trimap generation, trimap estimation, Graph Cut.
1 Introduction and Related Works
In recent years, 3D devices such as 3D monitors, TVs, projectors and screens have been developed. On the other hand, 3D contents for these devices are insufficient, particularly for movies. There are two methods for making 3D contents: 1) using special capture devices, such as a stereo camera, or 2) combining previously created 2D contents with depth information. The first method is not common because a stereo camera is expensive and hard to handle. On the other hand, it is easy to obtain 2D contents. Thus there are many works on creating 3D contents from 2D contents, which must be separated into object regions before being combined with depth information. Therefore object segmentation is an essential technique.
1.1 Previous Approaches
Object segmentation in images or videos is a popular research topic, and many algorithms and tools have been proposed. Representative examples are the Magnetic Lasso [1] and Pen Tools in Adobe Photoshop. They are the most popular tools due to their immediacy and flexibility,
and thus they have been widely used for years. They give good segmentation results by allowing fine manipulation, but they are too complex for beginners and require many phases. Popular algorithms for object segmentation are Graph Cut [2, 3] and matting [4, 5, 6]. In Graph Cut, an image is segmented into several regions by using a graph concept and a color model of the object. Users provide a little information, such as a trimap that marks foreground and background regions. The task is then to determine, for each node in the graph, whether it belongs to the foreground or the background; the result takes discrete values, 0 or 1. Lazy Snapping [7], proposed by Li et al., is a representative tool using the Graph Cut algorithm. Literally, Lazy Snapping is a segmentation tool for lazy users: the user can segment the object from an image with a few simple markings. It provides a border brush and a pixel-editing user interface for refining the segmentation result, where Graph Cut is performed on a small border region. Another popular tool is GrabCut [8], which uses iterative Graph Cut and was proposed by Rother et al. In this tool, the user can segment with a few interactions by drawing a rectangle with the mouse. It also provides a user interface for refining inaccurate segmentation results caused by incomplete labeling. Both algorithms segment images well; however, they repeatedly require user input about foreground and background, and thus are not well suited for video data. For accurate object segmentation, matting is a popular approach. It is the problem of determining, for each pixel in an image, whether it is foreground, background, or a mixture. Most matting algorithms focus on improving the segmentation quality in a single image through accurate inference of the mixture parameter “alpha”, based on sampling nearby foreground and background pixels. The segmentation results of these algorithms are more accurate than those of the other segmentation approaches for image compositing. However, matting algorithms are generally too slow to use for video segmentation because they operate on the full number of pixels in the image. Soft Scissors [9], proposed by Wang et al., is an interactive tool for extracting alpha mattes of foreground objects in real time. For segmenting, the user marks the boundary regions using a brush-like user interface whose border width is adjusted automatically. To overcome the time complexity of the matting algorithm, the alpha matte is extracted locally, which helps to reduce time. Matting is good for image compositing, but it does not suit video because it is still slow when dealing with several images.
1.2 Proposed System
In this paper, we propose an object segmentation system for video sequences, such as TV products, films and UCC, for converting 2D contents to 3D. It focuses on reducing the processing time and increasing usability. It allows non-expert users to achieve high-quality object segmentation results from a video. The proposed system consists of three steps: (1) trimap generation, (2) object segmentation using Graph Cut, and (3) refinement through a user interface using the local Graph Cut algorithm. Figure 1 shows the workflow of the proposed system.
Fig. 1. Workflow of the proposed interactive object segmentation system from a video for 3D conversion
2 Interactive Object Segmentation System
2.1 Trimap Generation
For object segmentation using the Graph Cut algorithm, prior knowledge about the object is needed; the rectangle in GrabCut and the markings in Lazy Snapping correspond to it. This prior knowledge roughly indicates the position and size of the object of interest. The quality of the segmentation result depends on the user input: if the user input is given in detail, we can achieve a high-quality segmentation result, but this requires high concentration and considerable time, and thus it is not suitable for video object segmentation. We propose a method based on a polygon and control points for trimap generation. It consists of three steps: polygon creation, border width adjustment, and object assignment; a code sketch follows Fig. 2 below. Each step is as follows:
1. First, the user creates a polygon with a shape similar to the boundary of the object of interest using left mouse clicks. Each click at a specific position generates a control point, and the control points are formed into a polygon.
2. Second, the user adjusts the border width of the polygon to include the object boundary by using the mouse scroll wheel. The region of the polygon's border is assigned as the unknown region (U) in the trimap.
3. Finally, the user determines which region is assigned as the foreground region (F) in the trimap using a right mouse click. The other region separated by the polygon is automatically assigned as the background region (B).
4. Once the image is completely assigned as F, B, and U, trimap generation is complete.
Figure 2 shows the steps of trimap generation. Additionally, the proposed system provides a function for adjusting the transparency of the trimap, which helps users to generate and adjust a polygon intuitively. Consequently, it can reduce the time for object segmentation.
Fig. 2. The steps of trimap generation and the result of transparency adjustment: (a) creation of the polygon, (b) adjustment of the border width, (c) region assignment, (d) result of the transparency adjustment
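A minimal sketch of this polygon-based trimap generation, using OpenCV, is shown below; the label values and the assumption that the polygon interior is foreground are illustrative choices, not the system's actual implementation.

import numpy as np
import cv2

def make_trimap(height, width, polygon, border_width):
    # Labels: 0 = background (B), 128 = unknown (U), 255 = foreground (F).
    pts = np.asarray(polygon, dtype=np.int32).reshape(-1, 1, 2)
    trimap = np.zeros((height, width), dtype=np.uint8)      # everything starts as B
    cv2.fillPoly(trimap, [pts], 255)                          # polygon interior -> F
    cv2.polylines(trimap, [pts], isClosed=True,
                  color=128, thickness=border_width)          # border band -> U
    return trimap

# Example: a rectangular polygon on a 240x320 frame with a 15-pixel unknown band.
trimap = make_trimap(240, 320, [(80, 60), (240, 60), (240, 180), (80, 180)], 15)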
2.2 Object Segmentation Using Graph Cut
Object segmentation can be cast as a binary labeling problem, and the Graph Cut algorithm is an approach that utilizes graph concepts to solve it. In Graph Cut, an image is represented by a graph G = (ν, ε), where ν is the set of all nodes and ε is the edge set of adjacent nodes. The method decides how to assign each node as foreground (1) or background (0) in order to separate the graph into two regions. Generally, to solve the problem, one first defines an energy function and then minimizes it. In the proposed system, we define a Gibbs energy function [10] as in equation (1), and we use the max-flow algorithm [3], proposed by Boykov, to minimize this energy.
E(X) = ∑_{i∈ν} E1(x_i) + λ ∑_{(i,j)∈ε} E2(x_i, x_j)    (1)

where x_i ∈ {0, 1} is the label assigned to node i and X is the set of all x_i. E1 is the likelihood energy and E2 is the prior energy; x_i and x_j are the labels of the adjacent nodes i and j. The likelihood energy E1 is computed from the color similarity between each node and the foreground/background regions pre-assigned by the trimap. If node i, whose color is similar to the region assigned as foreground in the trimap, is labeled as foreground (x_i = 1), E1 will be low (near 0); conversely, if it is labeled as background (x_i = 0), E1 will be close to 1. Table 1 shows E1 of node i in each case. In Table 1, K is a constant, and d_i^F and d_i^B are the minimum distances from the node's color to the foreground and background cluster regions [7]. The prior energy E2 models the boundary region of the segmentation result. In the proposed system, it is computed from the L2 norm of the RGB color difference between two nodes i and j. If two similar adjacent nodes are labeled as foreground and background respectively, E2 will be large; otherwise it will be low, analogously to E1.
Table 1. Likelihood energy E1 of node i in each case

Assigned result    x_i = 1                        x_i = 0
F (Foreground)     0                              K
B (Background)     K                              0
U (Unknown)        d_i^F / (d_i^F + d_i^B)        d_i^B / (d_i^F + d_i^B)
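For illustration, the following sketch implements the two energy terms of Eq. (1) as given in Table 1 and in the text above; the constant K and the color-distance computation over cluster centers are assumptions (the original obtains them via the clustering of [7]).

import numpy as np

K = 1000.0   # large constant for hard constraints (assumed value)

def e1(node_color, label, region, fg_clusters, bg_clusters):
    # Likelihood energy of one node, following Table 1 (label: 1 = foreground, 0 = background).
    if region == "F":
        return 0.0 if label == 1 else K
    if region == "B":
        return K if label == 1 else 0.0
    d_f = min(np.linalg.norm(node_color - c) for c in fg_clusters)
    d_b = min(np.linalg.norm(node_color - c) for c in bg_clusters)
    return d_f / (d_f + d_b) if label == 1 else d_b / (d_f + d_b)

def e2(color_i, color_j, label_i, label_j):
    # Prior (boundary) energy: large for similar colors with different labels, zero otherwise.
    if label_i == label_j:
        return 0.0
    return 1.0 / (1.0 + np.linalg.norm(color_i - color_j))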
2.3 Refinement and Trimap Estimation
The segmentation result is not always good, so a refinement step for correcting mislabeled results is needed. The proposed system provides a refinement tool based on the Graph Cut algorithm using local features. If global features are used for Graph Cut, they are occasionally the cause of inaccurate segmentation due to color similarity between the foreground and background regions. Therefore, the proposed system uses local features to reduce the influence of this color similarity in the refinement step. The target region is indicated by a mouse drag: if a specific region is mislabeled, the user simply indicates a rectangular region. The rectangle should include both foreground and background regions, because this information is utilized for the local Graph Cut in the automatic trimap estimation. The boundary between these two regions is assigned as the unknown region (U) by a dilation operation, while the foreground (F) and background (B) regions are taken from the result of the previous step. Once F, B, and U are completely assigned, the trimap estimation is done, and the proposed system performs the local Graph Cut using the estimated trimap. In a video sequence, this trimap estimation technique can also generate the trimap for the next frame, because the difference between consecutive frames is small; this further helps to reduce the working time. Figure 3 shows the workflow of the trimap estimation.
Fig. 3. Work flow of the automatic trimap estimation
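A sketch of this automatic trimap estimation for the next frame is given below: the previous frame's binary mask is eroded and dilated so that the band around the old boundary becomes the unknown region; the kernel size is an assumed parameter.

import numpy as np
import cv2

def estimate_trimap(prev_mask, band=7):
    # prev_mask: uint8 binary mask of the previous frame (255 = foreground, 0 = background).
    kernel = np.ones((band, band), np.uint8)
    sure_fg = cv2.erode(prev_mask, kernel, iterations=1)     # shrunken core stays foreground
    expanded = cv2.dilate(prev_mask, kernel, iterations=1)   # grown region bounds the object
    trimap = np.zeros_like(prev_mask)                        # outside the dilation: background
    trimap[expanded > 0] = 128                               # band around the boundary: unknown
    trimap[sure_fg > 0] = 255                                # eroded core: certain foreground
    return trimap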
3 Experimental Results
We have performed several experiments that show the validity of the proposed trimap generation tool and the proposed interactive system. The proposed system is implemented in VC++ and runs on a Pentium D 3.0 GHz PC with 2 GB RAM.
3.1 Test Dataset
To evaluate the proposed system, we use various test datasets captured by a webcam, a Logitech QuickCam IM (1.3 megapixels). The size of the test datasets is 320 by 240 pixels. The datasets follow a scenario that is a common situation in indoor visual teleconferencing; 3D conversion can increase the reality of such video.
Fig. 4. Representative frames of the test datasets: (a) dataset 1, (b) dataset 2, (c) dataset 3, (d) dataset 4
3.2 Performance Evaluation
For the performance evaluation of the proposed system, we create ground truth data manually, using a conventional image segmentation tool such as Adobe Photoshop. The proposed interactive system is measured in two parts: the first is the evaluation of the accuracy and efficiency of the trimap estimation method, and the second is a usability evaluation of the trimap generation tool. First, for the accuracy evaluation of the trimap estimation, we compared the ground truth data with the segmentation results obtained using the estimated trimaps. For a more accurate measurement of the proposed algorithm, we measured various dilation degrees of the boundary region in the trimap estimation. The measurements showed good performance on slow-motion video sequences, but not in the fast-motion case, because the object boundary escapes the unknown region in the estimated trimap due to the large difference between consecutive frames. To evaluate the performance of the initial trimap generation method, we measured the time needed to generate the initial trimap using our tool and compared the results with those obtained using a general graphics tool. As a result, the average time took less than one minute, which is faster than the general graphics tool. The advantage of the proposed method is that our tool is very easy and intuitive for beginners to use quickly.
3.3 Segmentation Results
Fig. 5. Results of the video object segmentation; the second row shows well-segmented objects from dataset 1. On the other hand, the fourth row shows a mis-segmentation result caused by strong edges.
4 Conclusion and Future Works
In conclusion, we have proposed an interactive system for video object segmentation from 2D video. The proposed system makes it easy to obtain an object segmentation result from video such as movies, UCC, and TV products, and it is also helpful for generating 3D content. We have proposed a user interface for trimap generation and a trimap estimation method for reducing the processing time. Finally, we have shown the validity of the proposed interactive system by experiments. However, some issues remain for improving the performance: for more automatic segmentation of video objects, we should solve the problem of fast-moving objects and make the segmentation results more accurate. These are our future works.
Acknowledgments This research was supported by MIC, Korea under ITRC IITA-2008-(C1090-08010046).
References
1. Mortensen, E.N., Barrett, W.A.: Intelligent scissors for image composition. In: Proceedings of ACM SIGGRAPH 1995, pp. 191–198 (1995)
2. Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary & region segmentation of objects in n-d images. In: Proceedings of ICCV 2001, vol. 1, pp. 105–112 (2001)
3. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(9), 1124–1137 (2004)
4. Chuang, Y.-Y., Curless, B., Salesin, D., Szeliski, R.: A Bayesian Approach to Digital Matting. In: Proceedings of IEEE Computer Vision and Pattern Recognition, pp. 264–271 (2001)
5. Grady, L.: Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2004)
6. Wang, J., Cohen, M.-F.: Optimized color sampling for robust matting. In: Proceedings of IEEE Computer Vision and Pattern Recognition, pp. 264–271 (2007)
7. Li, Y., Sun, J., Tang, C.-K., Shum, H.-Y.: Lazy Snapping. ACM Transactions on Graphics (SIGGRAPH) (2004)
8. Rother, C., Kolmogorov, V., Blake, A.: GrabCut – Interactive Foreground Extraction using Iterated Graph Cuts. ACM Transactions on Graphics (SIGGRAPH) (2004)
9. Wang, J., Agrawala, M., Cohen, M.-F.: Soft Scissors: An Interactive Tool for Realtime High Quality Matting. ACM Transactions on Graphics (SIGGRAPH) (2007)
10. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741 (1984)
COBRA – A Visualization Solution to Monitor and Analyze Consumer Generated Medias Amit Behal1, Julia Grace1, Linda Kato1, Ying Chen1, Shixia Liu2, Weijia Cai2, and Weihong Qian2 1
IBM Almaden Research Center, 650 Harry Road, San Jose, CA {abehal,jhgrace,kato,yingchen}@us.ibm.com 2 IBM China Research Lab {liusx,caiweij,qianwh}@cn.ibm.com
Abstract. Consumer Generated Medias (CGMs) -- such as blogs, news forums, message boards, and web pages -- are emerging as locations where consumers trade, discuss and influence each other’s purchasing patterns. Leveraging such CGMs to provide valuable insight into consumer opinions and trends is becoming increasingly attractive to corporations. This paper describes COBRA (COrporate Brand and Reputation Analysis), a visual analytics solution that surfaces the text mining and statistical analysis capabilities described in our earlier COBRA papers. Our interaction technique of search, visualization, and monitor enables detailed analysis of many CGMs without overwhelming the user. A suite of visualization solutions expose a variety of embedded COBRA visual analytics capabilities. Real world client engagements and user studies demonstrate the effectiveness of our approach. Keywords: visual analytics, text mining, semi structured search.
1 Introduction
Our COrporate Brand and Reputation Analysis (COBRA) solution allows clients to monitor and analyze a variety of Consumer Generated Medias (CGMs) [1]. We provide a unified view of a variety of datasources – such as blogs, message boards, news feeds, and clients' internal datasets. The users create initial queries to populate the database with their sources. Users then create and refine text patterns to annotate useful entities, such as their brands and risk hotwords. This setup and refinement process is described in our earlier COBRA paper [1]. The system then extracts relevant snippets from each type of document, and alerts the user of information that matches their criteria. The system also enables the user to search the historical collection of snippets, and conduct deeper analysis on the results. This paper describes COBRA's interaction layer that is presented to the end user analysts. To provide intuitive interaction with data and analysis, COBRA user interactions are organized around three high level tasks to find relevant information: search to
filter down to a subset of documents, visualize to graphically understand the results of analysis on those documents, and monitor to track potential risks to different business entities over time. These tasks are closely integrated through the analytics, but flexible enough to be performed in any order by the end user to allow for a customized workflow. For example, search results are visualized using different techniques; these interactive visualizations can surface interesting concepts within the data to monitor. To enable this, COBRA has a search panel that provides a visual representation of the search query to inform users of any refinements they have made, and allow them to narrow (or expand) the dataset being analyzed. A tabbed interface allows users to investigate attributes of their dataset through a variety of visualization and monitoring capabilities. Users can feed results of one visualization into a different analysis by selecting a document subset and switching tabs. This clean and concise layout provides an intuitive application framework that allows for smooth transition between the three tasks without compromising the ease or capabilities of any one task. To see how COBRA is used, we walk through a possible usage scenario. A typical user starts by viewing the newly generated alerts to monitor potential risks identified that day. The user notices that one corporate brand entity is mentioned with much higher frequency than others, and wants to understand the risk factors causing this. To find the correlations between the brands and risk hotwords, the user switches to the relationships tab. There, the user can easily find pairs of highly correlated brands and risk hotwords. Selecting one such pair and switching to the sentiment tab allows the user to further filter down to the documents with negative sentiment. These selected documents can then be automatically clustered to provide better insights, which can be used to create new risk causes to improve the daily monitoring.
2 Search The top search panel serves many functions in our system. Because COBRA provides a unified view of multiple datasources, the panel quickly informs users of the unified and individual schemas. The panel also selects a subset of the data to be analyzed, specified by a semi-structured search query. This query can be explicitly specified by the user to isolate data of interest, or formed using exploratory visualizations below. In addition, the search panel provides a visual representation of the query so the user does not get lost, similar to the description by Kieliszewski, et al. [2]. The initial search panel surfaces the unified schema of all datasources at the top of the interface. It shows the commonly used dimensions (model, brand, hotword, and source) as multiselect dropdowns, along with a date range and text search inputs. To understand the meaning behind a dimension, the user can open a particular dimension’s dropdown to view its values and associated snippet counts. The advanced search can be expanded to show less commonly used dimensions and source specific dimensions associated with just a subset of the data. The date range is prepopulated with the minimum and maximum dates in the data, and the total snippet count is displayed. This progressive display of schema information allows users to understand the initial data without getting overwhelmed by metadata.
Fig. 1. Search and advanced search interface with an open multiselect dropdown
Users manipulate the search query by adding or removing filters, which allows drilling down or moving up to define the subset for analysis. Users can easily add multiple types of filters at once by interacting with individual components and hitting the search button. Selecting options from the multiselect dropdown adds structured field filters, entering keywords adds a text search filter, and a start and end date allow filtering snippets by date. Filters can also be added by interacting with visualizations that help users explore the data and navigate to an interesting subset. For example, a trend chart that lets users drill in across time offers an option to apply that time filter to the search panel, and a correlation matrix surfaces the same option, so the search panel can focus on interesting cells with just one click. This convenience allows users to feed the result dataset of one visualization into the input of another, allowing for interesting workflows that leverage a chain of COBRA analytics. To ensure users know their position while navigating the search space, the "Current Search Attributes" area surfaces all active filters and the total number of matching snippets. The area also allows users to easily remove filters if they are done analyzing a subset or have accidentally added a filter. To help users save and retrieve context across sessions, we are working to add the ability to save or load search queries.
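A small sketch of the kind of semi-structured query object that such a search panel maintains is shown below; the field names and filter semantics are illustrative assumptions, not COBRA's actual schema or API.

from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class SearchQuery:
    fields: dict = field(default_factory=dict)   # e.g. {"brand": {"X"}, "source": {"blogs"}}
    text: Optional[str] = None                   # free-text filter
    start: Optional[date] = None                 # date range filter
    end: Optional[date] = None

    def add_field_filter(self, dimension, values):
        self.fields.setdefault(dimension, set()).update(values)

    def remove_field_filter(self, dimension):
        self.fields.pop(dimension, None)

    def matches(self, snippet):
        # Check one snippet (a dict of attributes) against all active filters.
        for dimension, allowed in self.fields.items():
            if snippet.get(dimension) not in allowed:
                return False
        if self.text and self.text.lower() not in snippet.get("text", "").lower():
            return False
        when = snippet.get("date")
        if self.start and when and when < self.start:
            return False
        if self.end and when and when > self.end:
            return False
        return True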
3 Visualize
The essence of COBRA from the end user perspective is a set of graphical features, supported by sophisticated analytics, that surface unique analytical results of the data through six interactive visualizations. Each visualization surfaces details of the data incrementally, to allow review and interpretation of selected results without overwhelming users with excessive information. For example, the high level charts of the Dashboard can be expanded to show more detailed trend charts and supporting documents. Similarly, relevant summary text in the Alerts tab can be expanded to the full set of supporting documents on particular issues of potential consequence. By providing a suite of visualizations
that progressively convey and explain statistical and textual analysis results, we empower users to gain a deep understanding of the vast amounts of data very quickly. Following are more detailed descriptions of the six interactive visualizations.
Dashboard. The dashboard is the default tab users see when they log into COBRA. It uses interactive pie charts and trend graphs to provide a quick visual overview of the entire dataset. The dashboard also allows drilling into specific pie chart slices or graph time ranges to help users identify and select interesting results from the analysis. The dashboard is laid out with the three interactive pie charts on top that show the most important dimensions: source, model and hotword. These three high level dimensions were determined through user studies and client requirements to guide insight. Below each pie chart is an interactive multi-line trend graph that shows the change in distribution over time for the dimensions in the pie chart. Included is an additional, hidden interactive pie chart and trend graph to display any one dimension's distribution.
Fig. 2. The dashboard screen shows the major pie charts and trend charts at the top. The third row in the dashboard is shown after selection in the top pie charts.
The interaction model of the dashboard is top-down – interacting with any component updates the dataset shown in components below it. Selecting pies in any top pie chart updates the corresponding trend graph to show just the selected snippets, and reveals the third row to break down the selected snippets by a different dimension. Hiding the third row until the user makes a selection above makes it clear that it shows just the selection from above, and also keeps the initial dashboard uncluttered. Optionally, users can select a time span from the trend graph to select snippets by time, which updates the third row.
Each component shows the users’ selection, so they remember their context: selected pies are outlined in black, and the trend graph zooms into the selected time range. Each component also allows users to view the selected snippets’ text, or to apply the selected subset to the top search panel so it can be analyzed in a different tab. The end result is a very useful visualization that fulfills two roles. It provides an important high level view to give the user insight into the underlying dataset, and can be focused to find and isolate anomalies or spikes for deeper analysis. Taxonomies. The taxonomies visualization is useful for exploring the values of one dimension at a time, and allows drilling into one or more values to see the underlying documents. It is useful for exploring any dimension, whether it be a structured field originally present in the data (like source or author), or a dimension annotated by COBRA (such as brand or hotword). In addition, users can run automated text clustering algorithms to categorize their subset by text. These dynamic “text cluster” dimensions can be further modified through editing wordlists and regenerating the clustering, which can help create more meaningful clusters or isolate a subset based on text [3,4].
Fig. 3. The taxonomies view
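COBRA’s taxonomy generation and wordlist editing follow [3,4] and are not reproduced here. Purely as an illustration of the idea, the sketch below derives a dynamic “text cluster” dimension with off-the-shelf TF-IDF vectors and k-means; the editable ignore-wordlist stands in for the wordlist-editing step that lets users regenerate more meaningful clusters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def text_clusters(documents, k=5, wordlist_to_ignore=()):
    """Assign each document to one of k text clusters.

    wordlist_to_ignore plays the role of the editable wordlist: adding
    uninformative terms and re-running tends to yield cleaner clusters.
    """
    ignore = {w.lower() for w in wordlist_to_ignore}
    cleaned = [" ".join(w for w in doc.lower().split() if w not in ignore)
               for doc in documents]
    vectors = TfidfVectorizer(stop_words="english").fit_transform(cleaned)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)

docs = ["battery drains fast", "great battery life",
        "screen cracked on day one", "love the big screen"]
print(text_clusters(docs, k=2, wordlist_to_ignore=["great", "love"]))
```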
Relationships. This tab surfaces the relationship between values of any two dimensions. COBRA provides three different visualizations of the same relationship data: a relationship matrix, a relationship table and a relationship network graph. The relationship matrix shows the raw relationship data as a sortable matrix. Each row represents a value of the first dimension, and each column represents a value of the second dimension. Each cell counts the number of documents containing both the
value of the row and column. We add a normalized affinity value to the cell, which represents the significance of the count. This affinity value is used to color-code each cell, with a red color calling attention to cells with high affinity. Clicking on a row or column header sorts the cells by affinity to a particular dimension value. Ghoniem et al. showed that matrix representations outperform node-link diagrams for large or dense graphs in several low-level reading tasks, except path finding [5]. This matrix view is very useful for surfacing high-level structures, like communities, by finding good permutations of the rows and columns; such structures are difficult to spot in the relationship table or network graph views.
Fig. 4. The color coded relationships matrix visualization
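The normalized affinity measure is not defined in this section; a common choice for this kind of significance score is lift, the observed co-occurrence count divided by the count expected if the two values were independent, and the sketch below uses it purely for illustration. The same cell scores also give the top edges used later for the bipartite network graph.

```python
from collections import Counter

def relationship_matrix(docs, dim_a, dim_b):
    """Count documents per (value_a, value_b) cell and score each cell with a
    lift-style affinity (an assumed measure, not necessarily the one COBRA uses)."""
    n = len(docs)
    count_a, count_b, cells = Counter(), Counter(), Counter()
    for d in docs:
        count_a[d[dim_a]] += 1
        count_b[d[dim_b]] += 1
        cells[(d[dim_a], d[dim_b])] += 1
    affinity = {}
    for (a, b), c in cells.items():
        expected = count_a[a] * count_b[b] / n
        affinity[(a, b)] = c / expected  # > 1: the pair co-occurs more often than chance
    return cells, affinity

def top_edges(affinity, k=20):
    """The highest-affinity cells become the edges of the bipartite network graph."""
    return sorted(affinity.items(), key=lambda kv: kv[1], reverse=True)[:k]

docs = [{"brand": "Acme", "hotword": "battery"},
        {"brand": "Acme", "hotword": "battery"},
        {"brand": "Globex", "hotword": "price"}]
cells, affinity = relationship_matrix(docs, "brand", "hotword")
print(top_edges(affinity, k=2))
```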
The relationship table allows a more navigational, drill-down view of the data so users can explore the relationship data one row at a time. Users are presented with a list of individual values of the first dimension, which can be selected to see the related values from the second dimension, along with counts and affinities. The relationship network graph shows the entire matrix at once by creating a bipartite network graph using the cells as edges between nodes representing the rows
Fig. 5. The relationships table allows users to drill in one row at a time
and columns. Graph visualization is an intuitive way to discover and visualize node-link structures in complex relations, and it outperforms the matrix and relationship table views in path-finding tasks. We show the top edges corresponding to the highest-affinity cells, and allow the user to control the number of edges displayed to reduce visual clutter, so that the most important data stands out.
Fig. 6. The relationships network graph visualization
Key Influencer. This tab allows users to pinpoint the key influencers of the ideas in a particular subset. It identifies websites most frequently referenced from the documents in the subset, which might influence the content of the CGMs. It also shows the top websites the pages are from, to identify where the ideas are being posted. The look and feel of this tab is similar to the taxonomies tab. Sentiments. This tab surfaces positive and negative sentiments in snippets, which are derived using sentiment scores of the words in the snippet [6]. Instead of just displaying the number of snippets with positive or negative sentiment, we allow users to correlate sentiment with a dimension of their choice. We will use an intuitively useful dimension, “brand”, in our example, as it shows clients which brands have the worst or best sentiment. However, the country or TextCluster can also provide useful results. We provide a sortable color-coded co-occurrence table at the top. This can be used to easily pick out the most positive or negative brands. The user can also filter the rest of the analysis to just selected brands by selecting one or more rows. We also present the user with the positive and negative trend stack plots side by side to convey sentiment’s time evolution and strength. The graphs are also highly interactive; clicking within the graphs narrows the dataset by the selected dimension, and dragging across selects a time range. As with most COBRA visualizations, users are provided with the text of snippets if they select a particular brand or time range. We highlight the positive and negative words in the text, so users can determine the meaning behind the positive or negative sentiment rating.
Fig. 7. The sentiment visualization shows positive/negative trend graphs, and document text
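The sentiment scores themselves are computed as described in [6]; that method is not reproduced here. As a rough illustration only, the sketch below scores a snippet from a tiny, invented word-level lexicon (a positive sum means a positive snippet) and cross-tabulates the result against a chosen dimension such as brand, which is the data behind the co-occurrence table.

```python
from collections import Counter

# Illustrative lexicon only; the real word-level scores come from the method in [6].
LEXICON = {"love": 1.0, "great": 0.8, "hate": -1.0, "broken": -0.7, "slow": -0.4}

def snippet_sentiment(text):
    """Sum word-level scores; the sign of the sum labels the snippet."""
    score = sum(LEXICON.get(word, 0.0) for word in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def sentiment_by_dimension(snippets, dimension="brand"):
    """Counts per (dimension value, sentiment label) -- feeds the co-occurrence table."""
    table = Counter()
    for s in snippets:
        table[(s[dimension], snippet_sentiment(s["text"]))] += 1
    return table

snippets = [{"brand": "Acme", "text": "I love the new Acme phone"},
            {"brand": "Acme", "text": "screen broken after a week"}]
print(sentiment_by_dimension(snippets))
```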
Search Results. COBRA provides users with a traditional faceted search results view of their selected dataset. A facet tree on the left allows the user to quickly drill down to further narrow their dataset, and a stack plot graphs the number of documents versus time. The snippets and their documents are also shown below, in a list that is easy to sort and page through.
Fig. 8. The search results view showing a tree view, trend graph, and document text
4 Monitor
Monitoring new data on a daily basis is important to clients that wish to monitor their brand’s social reputation in real time. To facilitate that, the “Alerts” tab allows users to easily isolate documents within a given date range, such as ones from the past business day or week. A tree view at the left shows the distribution of snippets across models and brands, which can be used to filter the data shown in the rest of the screen. The alerts summary at the top shows the count and the most typical snippet for each intersection of alert causes and brands. Drilling into an alert surfaces the underlying documents, with the alert causes highlighted, so users can decide if an alert poses any business risk.
Fig. 9. The alerts view with tree view, summary table, and underlying documents
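How the alert summary is computed is not detailed in the paper; the sketch below is one plausible reading, assuming each snippet carries a date, a brand and a set of alert causes, and using the longest snippet in a cell as a stand-in for the “most typical” one (the actual typicality criterion is not specified).

```python
from collections import defaultdict
from datetime import date

def alert_summary(snippets, start, end):
    """Group snippets from [start, end] by (alert cause, brand); keep a count and
    one representative snippet per cell for the summary table."""
    cells = defaultdict(list)
    for s in snippets:
        if start <= s["date"] <= end:
            for cause in s["alert_causes"]:
                cells[(cause, s["brand"])].append(s["text"])
    return {key: {"count": len(texts),
                  # stand-in for the "most typical" snippet: the longest one
                  "typical": max(texts, key=len)}
            for key, texts in cells.items()}

snippets = [
    {"date": date(2009, 3, 2), "brand": "Acme", "alert_causes": ["recall"],
     "text": "Acme announced a recall of X100 batteries sold since January"},
    {"date": date(2009, 3, 2), "brand": "Acme", "alert_causes": ["recall"],
     "text": "X100 recall"},
]
print(alert_summary(snippets, date(2009, 3, 1), date(2009, 3, 3)))
```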
5 Conclusion
This paper described a web-based visual analytics solution that surfaces COBRA’s reputation monitoring and analysis capabilities. The solution provides a suite of interactive visualizations that run textual and statistical analysis, and use progressive information disclosure to allow human validation of the results. Our tabbed interface provides a contained experience that quickly conveys the application’s capabilities to first-time users. Combined with our search panel, we provide a relatively flat learning curve for users to monitor reputation alerts, search and analyze the dataset, and perform analysis chains. This combined approach has been tested in user studies and client engagements, and provides users with a flexible yet easy-to-use CGM monitoring and analysis solution. Moving forward, we plan on surfacing more analytics visualizations. In particular, we will be surfacing deeper word-level statistics. Also, we plan to allow more customization of the analytics components and layout, to suit different business and user requirements.
References 1. Spangler, S., Chen, Y., Proctor, L., Lelescu, A., Behal, A., He, B., Griffin, T.D., Liu, A., Wade, B., Davis, T.: COBRA - Mining Web for Corporate Brand and Reputation Analysis. In: IEEE/WIC/ACM, International Conference on Web Intelligence, pp. 11–17 (2007) 2. Kieliszewski, C., Cui, J., Behal, A., Lelescu, A., Hubbard, T.: A Visualization Solution for the Analysis and Identification of Workforce Expertise. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4557, pp. 317–326. Springer, Heidelberg (2007) 3. Behal, A., Chen, Y., Kieliszewski, Y., Lelescu, A., He, B., Cui, J., Kreulen, J., Rhodes, J., Spangler, S.: Business Insights Workbench – An Interactive Insights Discovery Solution. In: Human Interface and the Management of Information. Interacting in Information Environments, pp. 834–843. Springer, Heidelberg (2007) 4. Spangler, S., Kreulen, J.: Interactive methods for taxonomy editing and validation. In: ACM CIKM (2002) 5. Ghoniem, M., Fekete, J.D., Castagliola, P.: On the readability of graphs using node-link and matrix-based representations: a controlled experiment and statistical analysis. Information Visualization 4(2), 114–135 (2005) 6. Cai, K., Spangler, S., Chen, Y., Zhang, L.: Leveraging Sentiment Analysis for Topic Detection. In: 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp. 265–271 (2008)
Visual String of Reformulation Arne Berger, Jens Kürsten, and Maximilian Eibl Chair Media Computer Science, Technical University Chemnitz, Strasse der Nationen 62, 09107 Chemnitz, Germany {arne.berger,jens.kuersten,eibl}@informatik.tu-chemnitz.de
Abstract. An interface for query reformulation based on multimedia search widgets is proposed. It allows the co-existence of widgets for unambiguous intellectual metadata and vague, automatically annotated metadata.
Keywords: customization, interface, multimodal, query reformulation.
1 Visual String of Query Reformulation
The interdisciplinary project SACHSMEDIA conducts research on automatic audio, image and video annotation, high-level semantic metadata, and a user-centered interface approach for combining the above in one information retrieval system. SACHSMEDIA is a joint project of Technical University Chemnitz (Germany), selected solution providers for television production workflow, as well as local TV stations in Saxony (Germany). The project is financed by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung). Our work reflects the information retrieval needs of those TV stations for their growing multi-media repositories, as well as an incorporation of the constantly growing opportunities arising from automatically annotated metadata. This paper summarizes the efforts undertaken to form a fully customizable, user-adaptable text-based retrieval interface based on widgets, reflecting the TV stations' needs for a clean and fast multi-media search interface. It also incorporates graphical search widgets for retrieval based on fuzzy, automatically annotated metadata, for evaluating the latter under real working conditions and for supporting a more open searching/browsing approach. Our aim is to bridge three gaps in modern information retrieval graphical user interfaces:
1) Heterogeneous user population. Text-based retrieval suits many users fine, as daily work usually consists of known item searches, for which textual interfaces are sufficient. However, as the user population is heterogeneous, customized search interfaces improve user satisfaction significantly.
2) Modal gap. Plain text-based retrieval is not sufficient for all emerging tasks. It leads to user frustration and confusion, as it requires the user to repeatedly switch their mental
model while refining and reformulating a query. Current multi-media information retrieval is the iterative task of searching, browsing, then editing the query for refinement, then browsing again. We propose an interface model for searching and refining in order to reduce this modal gap.
3) Content-based retrieval approaches. We propose a solution for merging information retrieval concepts for multiple media. From a user perspective, there is an outdated paradigm that automatic audio, image and video annotation, content clustering and ontology browsing reside in separate interfaces. We offer a conceptual model to overcome this by implementing distinct retrieval approaches in one retrieval interface.
2 What End Users Do (And Don't Do)
The design view. User studies and user evaluation are essential in HCI, we learn. But then, a mere “thumbs up” from a small percentage of a proposed user population is usually sufficient for a product to get a go for implementation. This kind of interface evaluation is the reason why so many interfaces fail to give the feeling that they “fit” like a custom-tailored suit. Even worse, software nowadays often succeeds in giving a feeling of comfort by crippling the software's possibilities. While Maeda's design approach [8] is groundbreaking for tailoring usable software that fits large user groups, it still lacks the flexibility of a custom-tailored interface deriving from an in-depth user understanding and the resulting customization [4], [6].
What end users do (and don't do). The end-users we work with are professional researchers, editors and account executives at German TV stations. Every TV station annotates and archives its own footage for later retrieval. Those archives are constantly growing, containing multimedia data that is poorly intellectually annotated and completely lacking high-level semantic metadata. Our research purpose is to support end users with interfaces for easier annotation and retrieval, focusing on encouraging them to add more intellectual metadata when adding footage to the TV stations' archives. Additionally, the archive software must be suited to all (about 70) involved TV stations for later connection and interchangeability. The metadata scheme that we applied is “Regelwerk Mediendokumentation” (the ARD system of rules for media documentation) [20], comprising 70 metadata fields for the annotation of intellectual metadata. Extensive site visits and user studies at seven selected TV stations yielded many insights. To start out, we found that many users are involved in creating an individual program, but they differ greatly in their knowledge about it. And no user involved has the time or the skills to collect all metadata from all involved users and add those to the archive. We studied in depth how the TV stations handle their annotation and search tasks. The following example is prototypical for the distribution of knowledge between individual users. One user was asked to find an individual program. The user shared an office with three co-workers and began to voice her associations with this program. Someone in
the office remembered that the production in question was commissioned. This prompted the account executive to browse the account book for the client in order to approximate the date of the program. There had been only two commissions by the client in the last two years. This information allowed the editor to search all the tapes from the time in question, tracking down the tape and using the label to find the program's position on the tape. As the users are all aware that their archives are poorly annotated and wish to change it, they are willing to contribute their knowledge about the productions they are involved in for annotation. Based on the findings of the site visits an annotation workflow like this is plausible:
Fig. 1. Annotation Workflow
For our first annotation & retrieval prototype, we examined the three main user groups at the TV stations and asked them to sort all those 70 metadata fields that are searchable and may be annotated, according to a) their relevance for the single user's information needs and b) into metadata sets (groups) of thematic coherence. Not surprisingly, the output showed significant patterns in information need, the metadata selected and the user's engagement in the broadcasting process. Structured qualitative interviews showed that users are willing to annotate more metadata when the annotation interface reveals only those metadata for annotation that are relevant for the user's current involvement in the production cycle. Additionally, users were likely to add more production related metadata than already included. Account executives, for example, have been keen to add additional metadata to Regelwerk Mediendokumentation. As they tend to remember names of people in the footage more, they demanded complete customer contact information as metadata for the single programs. Concluding this, we decided to sort metadata fields into the proposed sets and offer an option to add sets with additional metadata according to the TV stations' needs. This allows every user to get their own metadata annotation sets according to their production involvement. The annotation tool was structured accordingly. Going back to the associated search example at our site visits, properly annotating intellectual metadata was only part of the problem. A second finding of the site visit is that users are highly interested in customizable retrieval interfaces according to their different information needs. The information need usually lies somewhere between
the quasi-standard one query input field approach on the one hand, and an expert search interface with around 70 input fields and adequate combination options for a search in all available metadata fields that are annotated, on the other hand. Using two iterations of paper prototypes, we tested whether an interface for retrieval that was structured similarly to the annotation interface would also enhance user satisfaction. Users instantly made a connection between the annotation and retrieval tool and wanted to use the customizable retrieval interface accordingly. A quantitative comparative evaluation of the performance of our interface versus an ordinary expert search will be conducted at a later stage, but interviews look promising.
Fig. 2. Custom Text Widgets
3 Content-Based Retrieval
Based on the finding that widgets suitable for only a small, distinct task are helpful for a step-by-step query specification and reformulation, we expanded this idea to multi-media query formulation. This makes sense in two ways. To begin with, users told us they are constantly in need of better search results, mostly because intellectually annotated metadata are not sufficient. Secondly, users are looking for less restricted ways to formulate queries in their daily work. They are aware of automatically annotated metadata, the opportunities they offer and the drawbacks they currently have. Users are willing to sacrifice their current fixed text-based retrieval systems for a more open way to formulate queries. They are also willing to sacrifice some precision at first – if they are allowed to refine their queries using other widgets at hand when it becomes necessary later on. The above findings of high user acceptance enable us on a technical side to enhance the TV stations' archives with rich high-level semantic metadata without compromising the users' faith in the retrieval engine. In the SACHSMEDIA project, we are attempting to solve the recognition of persons via facial and voice recognition, to be included in the retrieval system. However, it is also plausible to include previously solved parts of our research findings in distinct widgets. Those steps include OCR in the lower third of the screen or automated detection of the presence of persons. It is plausible to include those examples as widgets in a retrieval system, but they alone would not automatically justify a stand-alone application.
Fig. 3. Graphical Widgets
In the information retrieval community, many search engines based on high-level metadata are used that are helpful in many ways for distinct retrieval tasks, but are – in and of themselves – not always sufficient for sophisticated search tasks in a professional work environment. Formulated in widgets and combined in a query formulation and reformulation flow, they would be much more useful. There are sketch-based image search engines like “retrievr”, or engines that focus on drill-downs in content clusters such as “cuil” or “quintura”. Tools that feature image search based on the color distribution of the images include “xcavator”. What is available for searching videos and music is text-entry only and includes “seeqpod” and “veoh” [21], [22], [23], [24], [25], [26]. Unfortunately, there is a gap between what search applications are possible and what search applications have already been combined in one interface, as most retrieval systems focus on just one issue. That is why we widened our user-driven approach to engage retrieval experts working on content-based information retrieval. We conducted qualitative interviews with those experts, and each of them focuses on only one of various annotation and retrieval concepts, which on their own are insufficient to form a distinct search experience for a multi-media repository. As the possibilities in analyzing image, speech and temporal features grow, our proposal can also be used as a model for creating distinct widgets. These widgets contain only those interface elements needed for one distinct query part, corresponding to one specific retrieval concept. Combining them into one search interface would significantly enhance the possibilities of information retrieval.
4 Reformulation
Search is an iterative process. This has been thoroughly discussed in, among others, the classical model [13] or the 5-phase framework [15]. The query is iteratively reformulated after an inspection of the results until relevant information has been found. There is a difference between direct and explorative search. Methodic search includes alternating search and browsing. “I'm feeling lucky” basically trusts the retrieval engine's first suggestion. Our approach focuses on the techniques most frequently performed in the work environment of TV stations: query reformulation and methodic search. On the rare occasion that a user actually performs a known item search, a text-based query or query by example is sufficient. But most of the queries are so vague that users just hope to find something similar to their mental reference, needing to refine their query substantially. Typically, users start out with a short query and
Fig. 4. Retrieval Tasks
incrementally modify their query after inspecting the results, slowly forming a string of reformulation in their heads. The back and forth of re-editing the text-based query leads to an iterative modal break and obstructs the users' string of reformulation, as previous versions of the text-based query are no longer present after editing. Moreover, there is no tool to visually memorize all the steps taken in the reformulation. Our proposed interface is based on a graphical metaphor for reformulating multi-media information need. It allows end-users to begin with an initial query, then to add more queries in the form of interface widgets in order to narrow down the search results, all while maintaining a graphical representation of the performed queries. This visual string can be edited.
Fig. 5. Widgets & Flow Of Reformulation
We worked out a basic set of various text-based widgets based on users' needs. We also established design guidelines for producing interface widgets that adhere to our conceptual model. Textual widgets can be individually modeled based on a design styleguide and an XML schema. Additionally, we created graphical widgets built on top of the current research status at SACHSMEDIA for automatic multi-media annotation, and included those as well. This allows end-users to use the most appropriate search widgets for a step-by-step reformulation of their queries according to their needs.
Based on the rich possibilities of contemporary interface design for creating interactive widgets and a user-centered customization approach, we would like to advance the common filter/flow metaphors for open multimedia repositories, content-based metadata and unpredictable, vague queries. Shneiderman [16] proposed the metaphor of water flowing through filters for a visual representation of Boolean query formulation. In his evaluation, users not familiar with Boolean algorithms showed significantly better search results and user satisfaction compared to using SQL syntax. However, Shneiderman's concept was only applied for searching in closed databases with descriptive metadata. Repositories have grown ever since, and most end-users still do not understand Boolean query formulation properly. Jones proposes Venn diagrams [7] in a similar approach to visualize reformulation. With concepts like the Islands Interface [3], Sentinel [9] or InfoCrystal [18], the basic concept is too difficult for user acceptance. That is why expert search systems nowadays focus on more basic metaphors, restricting users to more basic ways of query reformulation. The mismatch between these two concepts still persists. Krause puts it like this: users don't want to think about how to interact with a system; on the other hand, they resist adapting to a predefined flow because it feels like a narrowing of their possibilities [2]. This is the reason we would like to let users form their own flow, as precise or vague as appropriate, leaving it in the users' hands to formulate simple or complex queries all within one conceptual model, expanding the visual string accordingly.
1. Widgets: According to Krause's model [10], every widget is conceptualized as a tool with three capabilities or usage sides. On the user's side, the widget acts as the most suitable tool according to the user's capabilities and knowledge and the query at hand. Although the possibilities for interaction are almost endless here, text-based widgets may dominate professional work environments for a while. But basic content-based retrieval widgets are already successfully implemented for distinct tasks and are helpful as query terms for reformulation. More complex widgets will be useful for more complex tasks and specific users. Developers and designers may create or customize widgets according to emerging user needs and technical possibilities. As the visual string has already been evaluated as usable, distinct widgets may be compared against each other for user acceptance and usability. On its system side, the widget transforms the text- or graphics-based query into XML that the retrieval engine can process. It also features a meta side for valuing its own capabilities (precision/recall) and dependencies (on previously submitted query terms).
2. Flow: Current retrieval interfaces lack any user-accepted visualization of the query reformulation process. To visually formulate a multi-term query, the widgets may be ordered next to each other. Our proposal lets users position widgets vertically below each other to combine query terms for reformulation. Our findings show that when users drill down results by adding a widget in order to add another query term for reformulation, they expect widgets to be AND-ed when connected vertically, as single queries from within a widget are also AND-ed. Accordingly, a horizontally added widget will be interpreted as an OR. NOT is an option included in each widget to exclude its term from the query. Removing a search widget eliminates this query term.
For a single-term known-item search, users may just use one widget.
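The prototype's actual XML schema is not given in this paper; as an illustration of the mapping just described, the sketch below ANDs widgets stacked in the same column, ORs parallel columns, honours a per-widget NOT flag, and serializes the result to XML with Python's xml.etree.ElementTree. The element and field names are invented.

```python
import xml.etree.ElementTree as ET

def build_query(columns):
    """Translate a widget layout into an XML query tree.

    columns is a list of widget columns: widgets stacked vertically within a
    column are AND-ed, parallel columns are OR-ed, and a widget's 'negate'
    flag wraps its term in a NOT element (an assumed layout model).
    """
    root = ET.Element("or")
    for column in columns:
        conjunction = ET.SubElement(root, "and")
        for widget in column:
            parent = ET.SubElement(conjunction, "not") if widget.get("negate") else conjunction
            term = ET.SubElement(parent, "term",
                                 field=widget["field"], kind=widget.get("kind", "text"))
            term.text = str(widget["value"])
    return ET.tostring(root, encoding="unicode")

layout = [
    [{"field": "title", "value": "harbour festival"},            # AND-ed: same column
     {"field": "year", "value": 2007, "kind": "date"}],
    [{"field": "person_present", "value": True,                  # OR-ed: parallel column
      "kind": "detector", "negate": True}],
]
print(build_query(layout))
```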
5 Discussion
From Objective to Subjective: We have discussed the everyday dilemma of a heterogeneous user population, as well as the co-existence of unambiguous intellectual metadata and vague, automatically annotated metadata, and ways of handling these in a working environment. From a design-oriented perspective, this is more of an advantage than a drawback, as it demands less dualistic forms of interface design and evaluation.
From Quantitative to Qualitative: The mantra of the many right ways is a conceptual thread fundamental to this paper and demands an even deeper investigation. Building on top of what is currently being practiced as “qualitative evaluation” [6], we are categorizing scenarios that profit from users' subjectively ideal interfaces. On the other hand, we are also looking into ways to derive subjective instantiations to be integrated in more quantitative scenarios that usually focus on the “objectively good” interface.
Interpretation: It is the designer's job to support users in the interpretation of the system [17]. As the interface is the entry point to a search engine that provides exact and fuzzy results, it has to be interpretable and usable as such for the users. The designer's goal is not to propose a generic search interface that will fit most users and imaginable retrieval scenarios, but to create one that adapts well to different users and various precision/recall ratios, and to the users' corresponding expectations and interpretations. The interface therefore has to reflect the range from low-level content analysis up to the highest level of sophisticated metadata for the users' interpretation. In this context, a more exact or more vague retrieval outcome has to be communicated. Interface design, incorporating graphic design to a greater extent, will help in managing the increasing complexity of usage scenarios as well as the problem of visualizing various retrieval outcomes. Taking into account that different users and heterogeneous user groups will interpret different retrieval outcomes differently, the system has to communicate actively to the user its current state of helpfulness and therefore its skills. Users' feelings and demands have to be considered more carefully and in all phases of the project. That is: qualitative insight counts more than quantitative insight, as the system instantiates itself differently in certain stages and to different users. There is a range of possibilities offered by the system, and a range of interpretations. Both have to be considered for evaluation. The user interface also has to take into account that users' goals might differ significantly from task to task. It has to stay open for interpretations of what is searchable and how precise this could possibly be.
6 Future Work
With the future incorporation of context-aware interpretation of the users' knowledge and tasks, as well as of the repository and its metadata, the blurring of whether the objective or the subjective serves the information need becomes more apparent. Suitable ways
of analyzing tasks on micro, meso and macro layers [14] have to be found and applied for evaluation and for the implementation of a recommender system. Two more problems will be discussed in our future work. The next step is an evaluation in comparison to an ordinary expert search, assessing the users' thoughts on the quality of the retrieval outcome. Secondly, we are working on several ways of visualizing the retrieval outcome more efficiently and in a more inviting manner for browsing – in favor of closing another modal gap. Multitouch interfaces look promising for that, as they may easily incorporate the current work on query reformulation and magic lenses as filters. Along with a more efficient and more usable result visualization, this will help users switch more seamlessly between the modalities typical for known item searches and explorative searches.
References 1. Bates, M.J.: The Design of Browsing and Berrypicking Techniques for the online search interface. Graduate School of Library and Information Science, University of California at Los Angeles (1989) 2. Bauer-Wabnegg, W., Krause, J.: Visualisierung und Design - Grundlagen von Softwareergonomie und Mediendesign. Universität Koblenz-Landau (2003) 3. Brooks, M., Campbell, J.: Interactive Graphical Queries for Bibliographic Search. Journal of the American Society for Information Science (JASIS) 50(9), 814–825 (1999) 4. Buxton, W., Greenberg, S.: Usability Evaluation Considered Harmful (Some of the Time). In: Proceedings of the ACM CHI 2008 Conference on Human Factors in Computing Systems. ACM Press, New York (2008) 5. Hearst, M., et al.: Finding the flow in web search. Communcations of the ACM 2002 45(9) (2002) 6. Ireland, C.: Qualitative Methods: From Boring to Brilliant. In: Laurel, B. (ed.) Design Research. The MIT Press, Cambridge (2003) 7. Jones, S., McInnes, S.: A graphical userinterface for Boolean query specification. International Journal on Digital Libraries, 207–223 (1999) 8. Maeda, J.: The Laws of simplicity. The MIT Press, Cambridge (2006) 9. Knepper, M.M., Killiam, R., Fox, K.L.: Information Retrieval and Visualization using SENTINEL. In: Proceedings of the Trec-7 Conference, pp. 393–397 (1997) 10. Krause, J.: Das WOB - Modell - Zur Gestaltung objektorientierter, grafischer Benutzungsoberflächen. In: IZ Sozialwissenschaften. Universität Koblenz-Landau, Bonn (1996) 11. Pangaro, P.: Design as I see it, http://pangaro.com/design-is/index.html 12. Pangaro, P.: Participative systems, http://pangaro.com/PS/index.html 13. Robertson, S.E.: Theories and Models in Information Retrieval. Journal of Documentation 33, 126–148 (1977) 14. Russell, D.M.: What are they thinking? Searching for the mind of the searcher. Invited presentation, JCDL (2007) 15. Shneiderman, B., Plaisant, C.: Designing the user interface. Pearson, Boston (2005) 16. Shneiderman, B., Young, D.: A Graphical Filter/Flow Representation of Boolean Queries. Journal of the American Society for Information Science 44, 327–339 (1993) 17. Sengers, P., Gaver, W.: Staying Open to interpretation: Engaging multiple meanings in design and evaluation. In: DIS 2006 (2006)
18. Spoerri, A.: InfoCrystal: a visual tool for information retrieval & management. In: Proceedings of the second international conference on Information and knowledge management, Washington, D.C, pp. 11–20 (1993) 19. Stempfhuber, M.: Towards Expressive and User friendly interfaces for digital libraries containing heterogeneous data. In: Eibl, M., Wolff, C., Womser Hacker, C. (eds.) Designing Information Systems. Festschrift für Jürgen Krause, pp. 198–208. UVK Verlagsgesellschaft mbh, Konstanz (2005) 20. Regelwerk Mediendokumentation, http://rmd.dra.de/arc/php/main.php (last visited, February 1, 2009) 21. http://labs.systemone.at/retrievr (last visited, February 1, 2009) 22. http://xcavator.net (last visited, February 1, 2009) 23. http://www.cuil.com (last visited, February 1, 2009) 24. http://www.quintura.com (last visited, February 1, 2009) 25. http://www.seeqpod.com (last visited, February 1, 2009) 26. http://www.veoh.com (last visited, February 1, 2009)
Industrial E-Commerce and Visualization of Products: 3D Rotation versus 2D Metamorphosis Francisco V. Cipolla Ficarra1,2, Miguel Cipolla Ficarra2, and Daniel A. Giulianelli2 1 2
ALAIPO: Asociación Latina de Interacción Persona-Ordenador AINCI: Asociación Internacional de la Comunicación Interactiva HCI Lab. – F&F Multimedia Communic@tions Corp. Via Pascoli, S. 15 – CP 7, 24121 Bg, Italy [email protected], [email protected]
Abstract. In the current work we make a study of communicability in industrial websites designed for the on-line sale of products, bearing in mind two kinds of product presentation: 3D rotational animations and 2D transformations. A table will be presented for the assessment of the communicability of the objects within e-commerce, in particular those which use mobile phones and the Internet. Also presented will be guidelines for qualitative design that may be employed without the need for a usability lab. The universe of study consists of 20 websites, randomly chosen from a total of 200. The contents of these websites are related to electronics, mechanics and renewable energy.
Keywords: Visualization, Rotation, Metamorphosis, Computer Animation, Communicability, User-Centered Design, Quality, E-commerce.
1 Introduction
One of the main problems in the evolution of hypermedia systems is the amount of visual information that can be presented in the interface, bearing in mind the diverse dimensions of the screens of personal computers and other devices that hold a CPU, such as a PDA. Obviously an image, and especially a high-definition 3D image with the possibility of mobility in 360º, can tell more than a thousand words. Obviously there are exceptions in the case of designs tending to damage the communicability, as is the case of the ambiguity of icons or the vagueness of the content [1], [2], [3]. In these situations some claim that the veracity of the image disappears and it is necessary to resort to the text [4]. However, in the new millennium the digital image is taking a bigger space inside the workplace and the home, through screens of bigger dimensions and greater resolution for videoconferences or the fruition of films [5]. Many of these images are emitted thanks to a computer; others, in contrast, come from mere electronic devices which can nevertheless admit interaction, as is the case of interactive blackboards or the Nintendo Wii, for instance. The new organic interfaces are quickly generating a new way of interacting with 3D images [6]. The user can choose and freely manipulate in the space and virtual scenarios. Perhaps it is an ideal solution to solve the problem of the balance sensation that certain virtual
reality environments generate. As almost always happens with these breakthroughs made with computers, the market aim prevails over the educational purpose. Therefore, the commercial sector, marketing, publicity, etc. are the first to use them. In some cases, the members of this sector resort to art, tourism, ecology, among other contexts, to draw the user's attention. Not for nothing we are moving towards an era of hypermedia immersion inside the home, starting with sport videogames (tennis, golf, football, etc.) which work with the physical and real movement of the users in such devices as the Nintendo Wii. That is to say, it is about creating a virtual environment around the user so that he interacts with the new technologies inside the home [7]. In the current work we present the importance of the off-line and on-line art-related multimedia systems to draw the attention of the user towards the use of the computer to visit real museums. That is to say, the transition from a user to a visitor. Then an investigation is made of virtual characters to draw attention in the consumption of products and the current evolution in the presentation of products mimicking 3D.
2 Art and 3D Navigation
There is a tendency of the user to try to get inside the digital images themselves, especially in works of art of renowned international artists. As a rule this is something that is done in the home with the purpose of entertainment. These are users who avoid resorting to the technology of virtual reality due to the high costs involved, which even today imply a need for glasses and gloves for the fruition of virtual reality in many European homes. However, this interaction in some hypermedia artistic products of optimum quality was possible in off-line supports. An excellent example was the Van Gogh CD-ROM of the late nineties [8]. Tridimensional navigation through the pictures of the Dutch artist makes it possible to go deep into each one of the objects that make up the image, giving unique perspectives, unimaginable a priori by the users [9]. With the momentum of the Internet there is a growing demand in the design of interfaces to obtain the maximum possible visual information (dynamic and static). Some bidimensional resources such as transparencies and blurring of the pictures are beginning to be insufficient with the Web 2.0 and Web 3.0 [10]. The users need to see, in a global way, the results of their on-line searches of videos, animations, photographs, maps and other digital images. Obviously, this operation is feasible if their size is considerably reduced on the monitor. Another alternative are the mural pictures or dynamic blackboards where the different interfaces are displayed in a row. Some applications for this purpose are Searchme and Spacetime (a 3D navigator, which needs to be installed on the computer). Also the Windows Vista operating system offers this mode of visualization in its most advanced editions. Digital tridimensionality takes the ambition of the use of perspective a step further, which has been to plunge us inside the image, making the ‘vanishing point’ coincide with our point of view [11]. A more recent example is the voyage to the inside of the main works of the El Prado Museum through Google Earth (figure 1). Although the exhibit rooms can't be roamed about freely, sight simulates entry to the museum and presence in front of a picture. This can be seen down to the slightest detail, widening the resolution of the paintings until details are found which are imperceptible at natural size. After
Fig. 1. El Prado Museum through Google Earth
exploring the work, the user can analyze the works without the need of going to successive frames. That is to say, there is a kind of sequentiality in interactive immersion. These are the new possibilities of counting on a fluent interface which lacks the transitions among the spaces that are visited, and which boosts and motivates the fruition of the artistic works. This interactive artistic communication process in which the user must merge with the content has been widely experienced in virtual reality environments [12]. However, the experiences with multimedia off-line commercial products have demonstrated a great acceptance by young and adult users, thanks to the richness of the colours or the novelty of navigating inside the pictures without needing to have available the typical peripherals of virtual reality: glasses, gloves, etc. The strength of the traditional audiovisual means is greater among the adult users than among the young, who have interacted from an early age with videogame consoles, multimedia phones and the Internet. Many users of this new generation have learned to use systems in an autonomous way or accompanied by virtual assistants [13]. These animated assistants usually have a very important role in the motivation and fostering of interaction. However, they have not been massively included in the sale of products and on-line services.
3 Audiovisual, Computer Animation and Morphing
If we make a diachronic study of the current animations we see that the cinematographic principles still stay in force in the computers. Digital animation may be relatively new in hypermedia systems, but the designers, programmers and other
professionals of the audiovisual sector usually have experience and/or cinematographic knowledge. That is to say, these specialists have expertise that has been passed down from those who have been animating since moving pictures were first invented. Their knowledge and experiences have been translated to animations using computers. The diffusion of animations by computers is also due to the progress of computer graphics and to the reduction of the cost of hardware in the 90s. Although the personal computer makes 2D and 3D creations easier to generate and easy to modify, it is essential to have the theoretical knowledge of the classic animation of the 40s. In regard to commercial software for animation, and considering only polygonal modeling, the different programs are similar. That is, there is a tendency to create a common denominator inside the animation software, regardless of the hardware which has been used. The design of off-line multimedia systems, and of the interfaces of operating systems, has also made a contribution in this sense, opening up the possibility of changing shape within a single space on the screen, where metamorphosis has an essential role. The origin of metamorphosis lies in polygonal modeling [14]. The foundations of polygonal modeling are simple because they are based on three main elements: the vertex, the edge and the polygon (a minimal illustrative sketch of this structure is given below). There is a set of operations to manipulate these main elements, such as: extrusion, leveling, cutting, joining, lengthening, etc. A 3D animation may arise from a 2D drawing, through the use of the scanner, or from its realization in 2D with the later insertion of the ‘z’ coordinate, or from scanning 3D objects directly, for instance. In both cases it is a priority to maintain the concept of simplicity in the animation of the character and its environment because of the cost factor. The 3D commercial programs have a high number of options to build up a virtual character (in the current work we use the terms animated character and virtual character as synonyms). The modeling of the geometry can be polygonal, NURBS, etc., and it can be generated as a single entity (which is more complex at the moment of carrying out special changes or movements) or divided into the parts that make it up, as in the classical drawing mannequin. This segmentation is ideal for virtual characters of the robot kind, and they are usually positively accepted by the users in the teaching process. The realistic characters, that is, emulators of human characters and animals, are usually not segmented. However, it is advisable to segment them in their main parts, so that the construction, eventual modifications and animations are faster, since the computer does not have to calculate complex changes in real time. The qualitative disadvantage from the visual point of view is that sometimes these articulations are perceived by the users. A way to solve the problem is through items of clothing, for instance, gloves, trousers, belts, ties, necklaces, etc. In the case of these items of clothing, using textures from real fabrics is a positive course of action in order to gain realism. In our universe of study we have not detected virtual characters for the presentation of objects with metamorphoses, for instance. The animation of the objects with metamorphoses may be of two kinds: automatic, as in the case of the Italian Design off-line multimedia system [15] (figures 3 and 4), or manual. In the latter case the user has the possibility of rotating the product on the x, y and z axes.
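As a minimal illustration of the three basic elements named above (not code from the paper or from any particular modeling package), a polygonal model can be reduced to vertices, faces over those vertices, and edges derived from the faces; a 2D drawing becomes 3D simply by supplying the ‘z’ coordinate.

```python
class PolygonMesh:
    """Minimal polygonal model: 3D vertices plus faces given as vertex indices.

    Illustrative only; commercial packages layer many operations (extrusion,
    levelling, cutting, joining, lengthening, ...) on top of this structure.
    """
    def __init__(self):
        self.vertices = []   # list of (x, y, z) tuples
        self.faces = []      # list of tuples of vertex indices

    def add_vertex(self, x, y, z=0.0):
        """A 2D point becomes 3D by supplying the 'z' coordinate."""
        self.vertices.append((x, y, z))
        return len(self.vertices) - 1

    def add_face(self, *indices):
        self.faces.append(tuple(indices))

    def edges(self):
        """Derive the edge set from the faces."""
        edge_set = set()
        for face in self.faces:
            for i in range(len(face)):
                a, b = face[i], face[(i + 1) % len(face)]
                edge_set.add((min(a, b), max(a, b)))
        return edge_set

# A unit square in the z = 0 plane: four vertices, one face, four derived edges.
quad = PolygonMesh()
corner_ids = [quad.add_vertex(x, y) for x, y in [(0, 0), (1, 0), (1, 1), (0, 1)]]
quad.add_face(*corner_ids)
print(sorted(quad.edges()))
```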
Next is an on-line example where the user can see each one of the sides of the mobile phone. In some publicity banners it is feasible to find automatic changes of objects (photographs) and the possibility of manually changing the colour of the components, as is the case of the photo cameras in the following example:
Fig. 2. The user can see each one of the sides of the mobile phone (3D rotation)
Fig. 3. Motorcycle morphing (start frame)
Fig. 4. Motorcycle morphing (end frame)
3.1 Morphing and Users
There are numerous studies currently that show the high number of variables to be considered at the moment of presenting information on the classical bi-dimensional screens of devices such as computers, e-books, PDAs, etc. [16]. The problem that arises in the international sale and merchandising of industrial products of a similar type is that the products’ unique details and characteristics are not immediately visible to the user (for example, internal component details, material, structure of the products, etc.). Therefore, it is necessary to draw the attention of the user to the product, and eventually to its main details. One solution is to resort to holography or virtual reality. However, due to other factors, especially the cost of the hardware and
the software to do so, both technologies do not have the same circulation degree as a multimedia phone or iPhone, for instance. Consequently, many manufacturers resort to the use of rotation over the ‘y’ axis of the product in order to see each one of its external aspects. This is particularly the case when we talk about finished products where design is an important factor in their purchase, as far as potential distributors all around the world are concerned. Obviously, this is a low-cost solution if you have the product in a tri-dimensional format, that is to say, if the design has been made in advance. Otherwise, making these products may entail high costs because of the time that this involves and the staff that are required. Another alternative is to resort to the transformation or bi-dimensional morphing technique. As a rule, it is a bi-dimensional animation that works on the ‘y’ and ‘x’ axes. Obviously, the possibility of emulating the ‘z’ coordinate in the bi-dimensional metamorphosis exists, through the use of visual perspective and shadows. One can resort to digital photography or graphics databases, where obviously the costs are lower than using 3D [17]. Bi-dimensional metamorphosis usually has a great initial impact on the user, that is to say, its power of drawing attention to the products is superior to that of the simple rotation of a product on one of the three axes. Nevertheless, depending on the culture to which the users belong, the use of this type of animation transformation can encourage users to read publicity banners, for example. Another of the variables to be considered when choosing websites with morphing or rotations is the temporal factor. The rotation movements inside the objects that make up our universe of study have demonstrated that the user feels an identification with the present, whereas metamorphosis breeds a communicative empathy with the past. Rotation and morphing in the context of computer graphics can be included in the set of operations of object transformation [14]. Rotation in computer graphics consists of making an object spin around some of the following axes: x and y in the case of bidimensional objects, or x, y and z for 3D objects. With simple trigonometric calculations it is feasible to determine the coordinate of a spot after the transformation, as a function of the original coordinates and the rotation angles that have been chosen. Now, in the case of images or a selected region, some inconveniences may arise. The origin of these lies in the fact that some pixels may find themselves, after the rotation, in a position which does not correspond to the normal position of a generic pixel. In 90º operations the pixels are in the right position; in the rest of the cases it is necessary to carry out an interpolation operation, for instance. The rotations, translations and object scalings belong to the set of flat transformations. Some authors include morphing among the deformation operations [19]. Through these operations, programs can be written which make it possible to turn one image into another, with a given continuity. This is achieved with an interpolation between two different successive images. The origin of this technique in the audiovisual media is to be found in the cinema, with the use of interrelated close-ups, that is, a series of images that overlap until reaching the final image [9]. In the tridimensional cinematographic framework there are numerous films about it, especially when one talks about special effects, for instance, A.I.
Artificial Intelligence; I, Robot; Transformers, etc. Now, in the case of morphing, it is not always feasible to describe with an equation the transformation of the plane you want to apply. Therefore, an approximation is used; that is to say, we imagine the plane made up of a grid of irregular dots, and the final image will be made up of a dotted grid, not necessarily regular, but
from which it is possible to obtain the necessary information about the transformation law for each one of the dots on the grid, from the initial to the final position. Therefore, by using an interactive program, the designer can modify the starting grid and later on the program will carry out the transformations at each point of the image. Next is an excellent morphing example in an off-line multimedia system about Escher's works [20].
Fig. 5. Escher Interactive – 2D morphing in central area
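For reference, the two presentation techniques compared in this paper can be written compactly with standard textbook formulations (cf. [14], [18]); the equations below are illustrative and are not taken from the authors' text. The first rotates a point (x, y, z) about the ‘y’ axis by an angle θ; the second describes a morph in which corresponding grid points P0 and P1 of the start and end images are interpolated while the two images, each warped toward the intermediate grid, are cross-dissolved as t runs from 0 to 1.

```latex
% Rotation of a point about the y axis by angle \theta
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} =
\begin{pmatrix}
 \cos\theta & 0 & \sin\theta \\
 0          & 1 & 0          \\
-\sin\theta & 0 & \cos\theta
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}

% Morphing: interpolate the grid points, then cross-dissolve the warped images
P(t) = (1 - t)\, P_0 + t\, P_1, \qquad
I_t  = (1 - t)\, \tilde{I}_0(t) + t\, \tilde{I}_1(t), \qquad 0 \le t \le 1
```

Here $\tilde{I}_0(t)$ and $\tilde{I}_1(t)$ denote the start and end images after being warped onto the intermediate grid $P(t)$; at $t = 0$ the result is the start image and at $t = 1$ the end image.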
4 Metrics for Morphing
Morphing is included in the animation set, and animations are components of the dynamic media of the systems. Consequently, morphing must have a set of quality attributes, just as animations do. These quality attributes must later be decomposed into primitives for their evaluation and analysis. This is the strategy followed in our methodology, which is made up of a series of techniques for heuristic assessment stemming from communication and usability: observation, interviews and tests with users, in the current work. These metrics are inscribed inside the context of software engineering. Concerning this, it is important to situate them following two authors, Fenton [21] and Pressman [22]. Fenton [21] claims that measurement has the purpose of describing the attributes of entities, but that there are three kinds of entities inside software, each of them "aimed" at a given aspect of the software:
• The processes (predominance of the temporal component). There is a small number of internal attributes which are applied directly to the processes: the time that a given process lasts, the required effort and the total number of incidences of a given kind. The first works on the subject go back to 1972, and were aimed at improving the quality of the software and increasing the productivity of the programmers [23].
• The products (as results of the process). They include objects ranging across the lifecycle of the software, such as specifications and level details, or the strategy to be followed to carry out an examination.
• The resources (components that intervene in the software process). The cost factor is the one that prevails here. The method that is presented cuts down such expenses considerably, by profiling the figure of the specialist in multimedia heuristic assessment.
The internal attributes of the product, the process or the resources are those that can be measured in terms of the product, process or resources themselves, whereas the external attributes can only be measured on the basis of how the product, the process or the resources relate to the environment. In some way, metamorphism applied to E-commerce is related to these three aspects. We have a temporal component to assess the transformation of the images with regard to the goal pursued in communication, for instance, to show details in a lesson about the evolution of a geographical area. In software engineering, Pressman [22] enumerates the following reasons for the justification of an evaluation: to indicate the quality of the product; to evaluate the productivity of the staff who participate in the production; to measure the benefits (productivity/quality) deriving from the use of new methods and tools of software engineering; and to support professional training or the use of new tools. Therefore, according to Pressman, in relation to the goal to be reached it is possible to differentiate the following software metrics [22]:
• Technical metrics: these refer to the software characteristics and not so much to the process that was needed to develop the software.
• Productivity metrics: these are aimed at the production process. The statistical data are important in order to establish comparisons between the present and the past.
• Quality metrics: these respond to how best to satisfy the needs expressed by the customer/user, and it is usability engineering and semiotics that occupy themselves the most with these aspects, whether from the point of view of the interaction of the formal and factual sciences [24].
• Size-oriented metrics: through these, direct measures of the result can be observed, as well as the quality of software engineering.
• Function-aimed metrics: such metrics focus on the functionality or usefulness of the program.
• Person-aimed metrics: they give information about the effectiveness of the tools and of the methods used from the human point of view. They are the first target of study in usability engineering [25], for instance. Implicitly, therein lies communicability.
5 Conclusions The current work has made apparent the scarce use of metamorphism in industrial E-commerce on-line. Although it is an instrument of great utility for explaining the
details of components and is accepted in 98% of the studies made with users, currently designers do not use it in multimedia systems. Besides, instead of boosting the 2D or 3D products that are intended to be marketed, it has been seen that in 92% of the websites the use of metamorphosis lies in the bidimensional publicity of the banners. The use of metrics related to human-computer interaction, semiotics, the design models of multimedia systems and software engineering has made it apparent that those analyzed websites that use it have an excellent quality and are linked to the emulation of tridimensional navigation on the computer screen. The off-line multimedia systems of the nineties used metamorphosis mainly in design and art contents. Some of the components of graphic, static and dynamic design, such as guides or virtual assistants, are positive for the potential users of industrial products; however, currently they are not resorted to in industrial E-commerce.
Acknowledgments. Thanks to Emma Nicol (University of Strathclyde) and Maria Ficarra for their help.
References 1. Nielsen, J., Tahir, M.: Homepage Usability –50 Websites deconstructed. New Riders Publishing, Indianapolis (2002) 2. Cipolla-Ficarra, F.: A User Evaluation of Hypermedia Iconography. In: Proc. Compugraphics, GRASP, Paris, pp. 182–191 (1996) 3. Cipolla-Ficarra, F.: An Evaluation of Meaning and Content Quality in Hypermedia. In: CD-ROM Proc. HCI International 2005, Las Vegas (2005) 4. Debray, R.: Vie et mort de l’image, Gallimard, Paris (1995) 5. Venkatesh, A.: Computers and Other Interactive Technologies for the Home. Communications of the ACM 39, 47–54 (1996) 6. Ishii, H.: The Tangible User Interface and Its Evolution. Communications of the ACM 51, 32–36 (2008) 7. Lundgren, S.: Designing Games: Why and How. Interactions 15, 6–12 (2008) 8. Missione Van Gogh CD-ROM. L’Espresso, Roma (2000) 9. Manovich, L., Kratky, A.: Soft Cinema –Navigation the Database. MIT Press, Cambridge (2005) 10. Silva-Salmerón, J., Rahman, M., El Saddik, A.: Web 3.0: A Vision for Bridging the Gap between Real and Virtual. In: Proc. 1st workshop communicability design and evaluation in cultural and ecological multimedia systems, pp. 9–14. ACM Press, New York (2008) 11. Zeki, S.: A Vision of the Brain. Blackwell, Oxford (1993) 12. Greenberg, D.: A Framework for Realistic Image Synthesis. Communications of the ACM 42, 45–53 (1999) 13. Van Welbergen, H., et al.: Presenting in Virtual Worlds: An Architecture for a 3D Anthropomorphic Presenter. IEEE Intelligent Systems 21, 47–53 (2006) 14. Newman, W., Sproull, R.: Principles of Interactive Computer Graphics. McGraw Hill, New York (1979) 15. Italian Design CD-ROM. Editel, Milano (1994) 16. Carroll, J.: Human-Computer Interaction in the New Millennium. ACM Press, New York (2001)
17. Hays, J., Efros, A.: Scene Completion Using Millions of Photographs. Communications of the ACM 51, 87–94 (2008) 18. Wolberg, G.: Image Morphing: A Survey. The Visual Computer 14, 360–372 (1998) 19. Goes, J., et al.: Warping & Morphing of Graphical Objects. Morgan Kaufmann Publishers, San Francisco (1999) 20. Escher Interactive CD-ROM. Byron Preiss Multimedia, New York (1996) 21. Fenton, N.: Software Metrics: A Rigorous Approach. Chapman & Hall, Cambridge (1997) 22. Pressman, R.: Software Engineering –A Practitioner’s Approach. McGraw-Hill, New York (2005) 23. Fagan, M.: Advances in Software Inspections. IEEE Software 12, 744–751 (1986) 24. Cipolla-Ficarra, F.: Synchronism and Diachronism into Evolution of the Interfaces for Quality Communication in Multimedia Systems. In: HCI International 2005, Las Vegas (2005) 25. Nielsen, J.: Usability Metrics –Tracking Interface Improvements. IEEE Software 13, 12– 13 (1996) 26. Jiang, Z., Wang, W., Benbasat, I.: Multimedia-Based Interactive Advising Technology for Online Consumer Decision Support. Communications of ACM 48, 92–98 (2005)
Appendix
Annex #1: Table – Design Categories: Presentation, Navigation and Content
• Camera effects: zoom in, zoom out, horizontal movement, vertical movement, angular movement, travelling shot – crane shot, tracking shot, dolly shot, etc. (P, N)
• Deformation control and velocity: automatic and/or manual (P, N)
• Dynamic media with morphing: film, animation, pictures collection, etc. (P)
• External form (image or object): 2D and/or 3D (P)
• External transformation: texture, color, illumination, brightness, contrast, etc. (C, P)
• Format image: vectorial or bitmap
• Frames to form a morph from: pictures, objects, etc. (C)
• Morph components: only photos, only pictures 2D and/or 3D, only illustrations – drawings, maps, graphs, etc., combination of photos and pictures, combination of illustrations and photos, etc. (C)
• Movement form: scale, translation, rotation, etc. (N, P)
• Observer and vision: rotoscopy, angular vision, global vision, etc. (P, N)
• Rotation: manual and/or automatic; degrees – static or dynamic, all coordinates – y, x, z, etc. (N, P)
• Transition effect: fade, matched cut, dissolve, wipe, etc. (P)
• Transformation: image 2D/3D or object 2D/3D (P, N)
• ‘Y’ coordinate: add in 2D image or subtract in 3D (P, N, C)
Evaluating the Effectiveness and the Efficiency of a Vector Image Search Tool Patrizia Di Marco, Tania Di Mascio, Daniele Frigioni, and Massimo Gastaldi Department of Electrical and Information Engineering, University of L’Aquila, Poggio di Roio, I-67040 L’Aquila, Italy {tania.dimascio,daniele.frigioni,massimo.gastaldi}@univaq.it
Abstract. In this paper we develop VISTO (Vector Images Search TOol) along two directions: (1) we present a new interface for VISTO, which is more sophisticated than the original one, since it has been developed with the users and their retrieval requests in mind; (2) we provide a much deeper evaluation of the effectiveness and the efficiency of VISTO in the specific domain of Blissymbolic images.
1 Introduction
Research in the field of Content-Based Image Retrieval (CBIR) has in the past concentrated mainly on raster images. It was perhaps the wide variety of formats available for vector images, along with their strong dependence on application programs, that discouraged research in CBIR systems for vector images. It may also be noticed that raster images rule the roost on the World Wide Web, notwithstanding the convenience of vector images on the web for their reduced size, as well as for the possibility of client-side scaling, which avoids sending new images. However, in recent years, the growing popularity of vector-based web design programs, such as Macromedia Flash, along with the SVG (Scalable Vector Graphics) format proposed by the W3C, is changing this trend, promising to bring vector graphics to ordinary web pages soon. Notwithstanding this increasing interest, the great majority of the CBIR systems proposed in the literature still deal with raster images (for a complete survey we refer to [16]). To the best of our knowledge, the only proposal that tries to solve CBIR when images are represented in a vectorial data model is VISTO (Vector Images Search TOol). VISTO has been introduced and developed in [9-11] by considering an initial application domain represented by a 2D animation production environment supporting the management of cartoon episodes. VISTO was initially developed to meet the cartoonists' requirements, and also with the aim of having a system that can be tuned to satisfy the requirements of other application domains. In fact, other application domains utilizing vector images (e.g., clip-art and CAD systems) share similar requirements. The main characteristics of VISTO from the engine and the interface point of view have been described in [9, 11], while a preliminary experimental evaluation was
carried out in [10], whose purpose was to evaluate the effectiveness and the efficiency of VISTO by studying the so-called Precision versus Recall curves (see, e.g., [7, 15]). The effectiveness of VISTO was demonstrated by the fact that the behavior of these curves was always descending. The contribution of this paper is twofold. First, we present a new interface developed for VISTO. The first prototype of the VISTO interface [9] was developed only for tuning purposes and hence appeared quite raw. The new interface is more sophisticated and has been developed with the end-users and their retrieval requests in mind. The main characteristics of the new interface are described in Section 2. Second, we provide a deeper evaluation of the effectiveness and the efficiency of VISTO using the Blissymbolic images application domain. In the evaluation process it is important to follow a consolidated evaluation methodology; to the best of our knowledge, however, no shared evaluation methodology for CBIR systems is known in the literature. To this aim, we first studied evaluation methodologies in the area of Multimedia Retrieval. As a result, we derived a set of reasonable choices that can be applied to the evaluation of CBIR systems (described in Section 3). After that we applied the derived rules to the evaluation of VISTO in the Blissymbolic images application domain. The outcome of the experiments is described in Section 4 and can be summarized as follows: (1) the effectiveness and the efficiency of VISTO have been confirmed also in the new application domain; (2) for each category of images, we determined the most appropriate engine among those of VISTO; (3) we determined the image category on which VISTO has the best performance.
2 The VISTO System
Similarly to CBIR systems for raster images (see, e.g., [16]), VISTO was initially bound to an application domain, and it uses as feature representation moments describing visual features of images. Differently from the CBIR systems proposed in the literature, VISTO uses the shape, and not the color, as the main visual feature, since in the context of vectorial images the shape is more important and representative than the color, and it allows independence from affine transformations. Moreover, VISTO gives the possibility of interactively setting the parameters of the retrieval process; it allows performing queries by sketch and queries by example. These design choices lead to a system supporting application domain users in searching tasks and researchers in domain-oriented tuning tasks. In what follows we concentrate on the aspects related to the engines and to the interface.
2.1 The VISTO Collection of Engines
The engines currently available in VISTO follow the classical architecture of CBIR systems (see, e.g., [10]). Given a query image, database images are ranked based on their similarity with the input image, so that more relevant images are returned first in the query result vector. The processing hence requires a Feature Description Processor to extract visual features and to create a vector containing a proper descriptor of each
image, and a Comparison Processor to create a ranking vector representing the query result using distances between descriptors. The similarity between any two images is computed as the similarity between the two corresponding descriptors. Concerning the Feature Description Processor, the image is considered as an inertial system, which is obtained by discretizing the vectorial image and by associating material points with the basic elements obtained by the discretization process. The origin of the inertial system is then moved to the center of mass, to which transformations can be applied. Once an image has been transformed into an inertial system, the natural way to represent the image shape is to exploit the first four central moments: average, variance, skew and kurtosis. These moments are indices of distribution providing useful information about the image. In our context, the average represents the dimension of the image: a low average means an image poor in strokes, a high average means an image rich in strokes. The variance represents how the area around the image's center of mass is composed: a low variance connotes an area poor in strokes, a high variance connotes an area rich in strokes. The skew suggests the symmetry of the image: a high skew value means low symmetry, a low skew value suggests high symmetry. Finally, the kurtosis represents how the image is composed: a high kurtosis means an image poor in empty areas, a low kurtosis means an image rich in empty areas. In the literature, different sets of invariant central moments have been proposed, differing in the way the moments are computed (see, e.g., [17]). The moment sets supported by VISTO are those of Hu [5], Zernike [1], and Bamieh [17]. Concerning the Comparison Processor, our approach is to use metrics well consolidated in the literature, that is, the Cross Correlation [1], Discrimination Cost [1], and Euclidean [17] distances. An illustrative sketch of this descriptor-and-ranking pipeline is given at the end of this section.
2.2 The New VISTO Interface
Differently from the first prototype [9], the new VISTO interface is organized in tabs, each dedicated to a specific task. In detail, the Basic and the Advanced Search tabs are designed for retrieving tasks, while the Testing and the Clustering tabs are designed for tuning tasks. Moreover, three new types of results visualization have been included in the interface, which well support users in browsing results. The new interface supports both query-by-example and query-by-sketch. Result images may be selected as target images in a new search, in an incremental querying process. The new interface has been designed to help both application domain users and researchers in retrieving images and in tuning the engine in an interactive way, based on system feedback. In order to support users in these tasks, the new interface provides a Basic Mode for application domain users, and an Advanced Mode for researchers. The Basic Search Tab supports the Basic Mode; the Advanced Search Tab, the Testing Tab and the Clustering Tab support the Advanced Mode.
• The Basic Search Tab is designed to handle users' input actions, and it is composed of two windows: the query-selection window, which is always displayed, and the query-result window, which is invoked only when the results of the query are ready to be visualized.
− The query-selection window is shown in the left part of Figure 1 and is composed of two panels: the query-input panel (left part of the window), which accepts user input actions, and the query-view panel (right part of the window), which displays the selected query image. The query-input panel requires the user to provide an image, either by sketching it or by selecting a file containing it. The selected image is automatically visualized in the query-view panel, as shown in the right part of the Basic Search Tab.
− The query-result window is composed of two panels: the result-view panel, which displays the retrieved images ranked by similarity, and the re-query-input panel, which allows users to perform an incremental querying process. To better use the display space, the re-query-input panel is visualized by simply clicking on the “Show query panel” button, and the result-view panel uses tabs to support different types of visualization. When displayed, the re-query-input panel has the same form as the query-input panel of the query-selection window; the Tabular visualization, the Detailed visualization and the 3D visualization are the different supported types of results visualization. Users can also simply point and click on an individual image result to select it as the target image in a new search.
• The Advanced Search Tab is composed of two windows, the query-selection window and the query-result window.
− The query-selection window is composed of four panels (see the right part of Figure 1). In the upper part of this window there are the query-input panel, which accepts user input actions, the query-view panel, which displays the selected query image, and the search-engine panel, which contains the engines supported by VISTO; users can select an engine by browsing the different tabs of this panel. The fourth panel, containing the setting parameters of the selected engine, the available indexing and the folder indexing, is automatically zoomed on when users click on the “Hide other parameters” button. The search button and the progression bar appear in the lower part of this window.
− The query-result window is composed of three panels: the first is the re-query-input panel, which allows users to perform queries incrementally; the second is the result-view panel, which displays query results in the tabular, detailed and 3D visualization types; the third is the analysis panel, visualized when clicking on the “Hide statistics” button, which provides an initial indication of the search effectiveness.
• The Testing Tab is dedicated to researchers, to favor an in-depth analysis of the tool's effectiveness. It is composed of two panels: the test-input panel, which supports users in selecting a new image file to be added to the test set, and the test-view panel, which shows the paths of the selected image files added to the test set. The “Add to query list” button, in the lower part, allows the adding operation. It is worth noting that only after a testing session made using this tab is concluded is the “Hide statistics” button in the query-result window of the Advanced Search Tab able to work.
• The Clustering Tab supports the clustering process, which creates an optimized indexing. The objects of this tab are spatially organized according to the same philosophy used in the design process of the other tabs. This tab supports researchers in the clustering creation process used in the test sessions.
Fig. 1. The Basic Tab (left) and Advanced Search Tab (right) query-selection
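As promised above, the following is a minimal, illustrative sketch of the descriptor-and-ranking pipeline of Section 2.1: a vector image, already discretized into material points, is reduced to its first four central moments (average, variance, skew and kurtosis), and database images are ranked by the Euclidean distance between descriptors. The point-set representation, the radial formulation of the moments and the toy data are our own simplifying assumptions; the actual VISTO engines also support the Hu, Zernike and Bamieh moment sets and the Cross Correlation and Discrimination Cost distances.

```python
import numpy as np

def shape_descriptor(points):
    """Compute a simple 4-moment shape descriptor from discretized image points.

    `points` is an (n, 2) array of material points obtained by discretizing a
    vector image; the origin is moved to the center of mass first (illustrative
    assumption about the representation, not the actual VISTO code).
    """
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)           # move origin to the center of mass
    r = np.linalg.norm(centered, axis=1)        # radial distribution of the strokes
    mean = r.mean()
    var = r.var()
    std = r.std() if r.std() > 0 else 1.0
    skew = np.mean(((r - mean) / std) ** 3)     # symmetry indicator
    kurt = np.mean(((r - mean) / std) ** 4)     # empty-area indicator
    return np.array([mean, var, skew, kurt])

def rank_by_similarity(query_points, database):
    """Return database image names ranked by Euclidean distance between descriptors."""
    q = shape_descriptor(query_points)
    dists = {name: np.linalg.norm(q - shape_descriptor(pts))
             for name, pts in database.items()}
    return sorted(dists, key=dists.get)          # most similar images first

# Hypothetical usage: 'db' maps image names to their discretized point sets.
rng = np.random.default_rng(0)
db = {f"img{i}.svg": rng.normal(size=(200, 2)) * (i + 1) for i in range(5)}
print(rank_by_similarity(rng.normal(size=(200, 2)), db))
```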
3 Evaluation Methodologies
In this section we summarize our study of the literature concerning the evaluation of Multimedia Retrieval systems.
3.1 State of the Art
Interesting results on evaluation methodologies have been proposed in the text, video and image retrieval areas. Among them, we considered the systems of Table 1 and studied their main features.
Table 1. Considered Systems
Each column of Table 1 represents a feature we considered relevant for defining a proper methodology. The Test set column describes the image set used in the experiments: its size (♯im.), the number of categories (♯cat.), and the statistical consistency (st.). The Query set column describes the set of benchmark queries used to evaluate the system; the Query set features we considered are its cardinality (♯que.) and statistical consistency (st.). The Ground-truth column represents relevance judgments. The evaluation Parameters column allows evaluators to observe the
retrieval process and to discover where systems are weak. The Results Analysis column concerns the analysis of the data obtained during the experiments, in order to draw correct conclusions about the evaluation. The Test sets studied in the literature are often small in size; a large number of images is necessary to assure good results in recognizing differences or analogies among images. To guarantee the goodness of the Test set, it should be divided into categories so as to easily verify its statistical consistency. Not all the evaluation methodologies of Table 1 consider categories, and the statistical analysis is lacking. We can conclude that the evaluation methodologies we studied: (1) use different test sets, (2) often have an inadequate test set cardinality, (3) often do not use division into categories, and (4) do not consider a statistical analysis of the test sets. The Query set elements have to represent, as much as possible, all kinds of requests users submit to the system. Also in this case, it is important to check the statistical consistency. The methodologies we studied use a small number of queries, and also in the case of the Query set, the statistical consistency analysis is completely lacking. According to these considerations we can conclude that in the evaluation methodologies we studied: (1) the query set size is often inadequate, and (2) the statistical consistency analysis of the query set is not considered. The Ground-truth is not easy to define since it involves several aspects: an environmental aspect (it depends on users' present needs), a dynamic aspect (it changes frequently), a subjective aspect (it depends on users' judgment), and a cognitive aspect (it depends on users' behavior and perception). To define the Ground-truth, it can be helpful to involve real users, in order to make the relevance definition more realistic. Some of the considered systems involve real users (10 users in STAR, 9 users in MiAlbum); in some evaluation studies the ground truth is not defined. We conclude that (1) the help of real users is not always used, and (2) the ground truth is not always defined for queries. The evaluation Parameters are very important since they determine the system efficiency and effectiveness. Several evaluation parameters exist in the literature [15]; the most frequently used are Precision (P) and Recall (R), or parameters derived from them. Different parameters are used, for example, in QBIC [12], denoted as AVRR and IAVRR and depending on the order of the relevant images and on the ideal order of the relevant images, and in Artisan [3], denoted as LPR and depending on the position of the last relevant image found. To appreciate the results of the experiments, devices (e.g., tables containing parameter values, histograms, and Cartesian graphs) are used as support in the Results analysis. During this kind of study, some unpredictable and uncontrollable errors occur; for them it is impossible to perform a deterministic analysis and, in order to estimate whether a relation between a variable and the observed effect exists, a statistical test is necessary, such as the well-known ANOVA test (AT). The statistical inaccuracy always has to be included in the Results analysis; Table 1 reports that graphical devices are used in the VisualSeek and MiAlbum evaluations, but statistical tests are never performed.
3.2 Choices
According to the considerations made in the previous section, we can derive the following reasonable choices:
1. A good Test set has to be large and made of heterogeneous images. It is important to define categories in order to test the statistical consistency of the chosen Test set. To obtain a realistic classification it is advisable to involve real users to analyze the images and to decide the categories and their elements.
2. To define a Query set we can choose the set size and the set structure. We can define the Query set by choosing a random sequence of images from the Test set, or we can choose the Query set elements more carefully, considering one element (or more) from each category of the Test set.
3. To define the Ground-truth it is necessary to define relevance as follows: for a given query, all images belonging to the same category as the query are relevant. The Ground-truth is thus automatically defined: for each query, the Ground-truth is the category of the query.
4. The evaluation Parameters to be chosen are Precision and Recall; in fact they are intuitive, easy to use and easy to elaborate graphically.
5. To obtain a good Results analysis, it can be helpful to organize the evaluation process by defining work sessions and operative work sheets. After executing the experiments, a statistical analysis with ANOVA tests and a logical analysis to find out relations in the data are necessary.
We applied the above choices to the evaluation of VISTO as follows: (1) Test set: we consider a set of 400 Blissymbolic images in SVG format. Thanks to real users (♯real users: 5), the Test set is divided into 12 categories (see Table 2). We use the Indexing VISTO functionality and then we prove the statistical consistency of the test set with the ANOVA test. (2) Query set: we consider one element of each Test set category (♯im.: 12); we also prove the statistical consistency of the Query set with the ANOVA test. (3) Ground truth: using the VISTO functionalities for the ground truth definition, we save relevance judgments using the Testing Tab (see Section 2.2). (4) Parameters: VISTO offers the charts depicted in the query-result window of the Advanced Search Tab (see Section 2.2) to analyze Precision vs Recall curves. (5) Results analysis: tables containing the evaluation parameter values, histograms, and Cartesian graphs are used for the analysis, and ANOVA tests are performed to statistically validate consistency.
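As an illustration of the statistical-consistency checks mentioned in points 1, 2 and 5, the snippet below runs a one-way ANOVA with SciPy over invented per-category measurements; it is only a sketch of the kind of test we refer to, not the exact procedure used in our experiments.

```python
from scipy import stats

# Hypothetical per-category measurements (e.g., one descriptor component
# computed for each image) used to check that the categorization is consistent.
categories = {
    "letters": [0.41, 0.39, 0.44, 0.40, 0.42],
    "numbers": [0.55, 0.57, 0.52, 0.56, 0.58],
    "arrows":  [0.30, 0.28, 0.33, 0.31, 0.29],
}

# One-way ANOVA: do the category means differ more than chance would explain?
f_stat, p_value = stats.f_oneway(*categories.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# A small p-value indicates that between-category differences are statistically
# significant, i.e. the division into categories is not arbitrary.
if p_value < 0.05:
    print("Categories differ significantly: the classification looks consistent.")
```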
4 Evaluation Experiments
The goals of our evaluation experiments are the following:
• Goal A: to evaluate the effectiveness of our system in terms of the retrieval performance of all the engines of VISTO over all the queries listed in Table 2.
• Goal B: to study the retrieval efficiency of each engine.
• Goal C: to determine the best retrieved category.
Table 2. Blissymbolic images classification
To these aims, we follow the methodology described in the previous section:
1. Test set: we have used a set of 400 Blissymbolic images in SVG format;
2. Query set: we have randomly selected the 12 queries listed in Table 2;
3. Ground-truth: following the real users' judgments, an image j of the Test set is relevant for a query Q on an image i, denoted as Q(i), if and only if j and i belong to the same category. We denote as GTQ(i) the ground-truth set of query Q(i). For instance, for the query france.svg in Table 2, the ground-truth set GTQ(france) is the set containing all images in the letters category;
4. Parameters: we used the well-known Precision and Recall measures (see, e.g., [7]); Precision is the fraction of the retrieved images which are relevant, and Recall is the fraction of the relevant images which have been retrieved.
The tests proceed in steps as follows:
• Step 1: 12 ∗ 9 ∗ 96 = 10368 executions are issued to the system; in fact, 12 is the number of queries, 9 is the number of VISTO engines, and 96 is the number of cut-off and generality couples (from now on denoted as (k, g)) chosen to evaluate the effectiveness of the system. Given a query Q(i) on a collection CQ(i), we define g = |GTQ(i)|/|CQ(i)| and k = |AQ(i)|, where |CQ(i)| is the cardinality of the collection and AQ(i) is the set containing all images of CQ(i) ranked by similarity with respect to i. According to the cardinality of the Test set categories, we chose k ∈ {3, 6, 9, |GTQ(i)|} (k = 6 means that 6 images are retrieved) and g ∈ {0.3, 0.5, 0.6, 0.9} (g = 0.5 means that the collection contains 50% of relevant images).
• Step 2: in order to perform the executions described in the previous step, for each query Q(i), the collection set CQ(i) must be composed according to the g value we considered; CQ(i) will then contain all images relevant for Q(i) (all images ∈ GTQ(i)) plus a number of images, randomly selected from the Test set, such that |CQ(i)| = |GTQ(i)|/g.
• Step 3: for each query Q(i), the Precision and Recall values, denoted as PRQ(i) and RCQ(i) respectively, are computed as follows:
PRQ(i) = |GTQ(i) ∩ AQ(i)| / k    (1)
RCQ(i) = |GTQ(i) ∩ AQ(i)| / |GTQ(i)|    (2)
Formulas (1) and (2) highlight that the Precision and Recall values depend on GTQ(i) and AQ(i). In our experiments, for a query Q(i), AQ(i) varies with the chosen engine, while |GTQ(i)|, representing the cardinality of the category to which i belongs, is fixed.
• Step 4: using formulas (1) and (2), for each of the 12 queries in Table 2, for each of the 9 engines of VISTO, and for each couple (k, g), different graphics are calculated for inspection. In particular:
− Set I: contains the 10368 PR vs RC curves defined in Step 1;
− Set II: contains 12 histograms, one for each category. Each histogram contains the average values of PRQ(i) and RCQ(i) for each couple (k, g).
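The following sketch gives a worked illustration of Steps 1–3: the collection CQ(i) is composed for a given generality g, a placeholder engine ranks it, and formulas (1) and (2) are applied for each cut-off k. The toy test set and the trivial ranking function are illustrative assumptions and do not correspond to any of the nine VISTO engines.

```python
import random

def build_collection(ground_truth, test_set, g):
    """Compose C_Q(i): all relevant images plus random ones, so that |C| = |GT| / g."""
    non_relevant = [img for img in test_set if img not in ground_truth]
    extra = int(round(len(ground_truth) / g)) - len(ground_truth)
    return list(ground_truth) + random.sample(non_relevant, extra)

def precision_recall(ground_truth, ranked, k):
    """Formulas (1) and (2): precision and recall at cut-off k."""
    retrieved = set(ranked[:k])
    hits = len(set(ground_truth) & retrieved)
    return hits / k, hits / len(ground_truth)

# Toy data: 12 relevant images (one category) inside a 400-image test set.
test_set = [f"img{i:03d}.svg" for i in range(400)]
gt = set(test_set[:12])

def toy_engine(collection):
    # Placeholder ranking: a real engine would rank by descriptor distance.
    return sorted(collection)

for g in (0.3, 0.5, 0.6, 0.9):
    collection = build_collection(gt, test_set, g)
    ranked = toy_engine(collection)
    for k in (3, 6, 9, len(gt)):
        p, r = precision_recall(gt, ranked, k)
        print(f"g={g:.1f} k={k:2d}  PR={p:.2f}  RC={r:.2f}")
```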
Fig. 2. Histogram for the hearts category
In relation to Goal A, studying the curves of Set I, we observed that all curves are descending. This behavior demonstrates that, as well described in [4], given a query Q(i), all engines retrieve well the images in the same category as image i, independently of (k, g). In relation to Goal B, studying the histograms of Set II, we identified the best and the worst engines for each category (the best engine is the one with the highest bars; see for example Figure 2). In relation to Goal C, we can conclude that the best retrieved category in terms of Recall is the Letters category, and the best retrieved category in terms of Precision is the Numbers category. In conclusion, the experiments described above demonstrate that all VISTO engines work well, independently of g (when g increases, the number of non-relevant images in the collection decreases). We
also discovered that BE is the best engine for 4 categories (Hearts, Question points, Squares and Segments and points), ZC for 3 categories (Houses and Buildings, Circles and Mixtures), BD for 2 categories (Curves, and Letters ), HC for one category (Arrows), BC for one category (Not classified ), and HD for one category (Numbers ). Finally, we discovered that the best retrieved category is Letters.
References
1. Chim, Y.C., Kassim, A.A., Ibrahim, Y.: Character recognition using statistical moments. Image and Vision Computing 17, 299–307 (1997)
2. de Vries, A.P.: The role of evaluation in the development of content-based retrieval techniques
3. Eakins, J.P., Boardman, J.M., Graham, M.E.: Similarity retrieval of trademark images. IEEE Multimedia 5(2), 53–63 (1998)
4. Heesch, D., Ruger, S.: Combining features for content-based sketch retrieval: a comparative evaluation of retrieval performance. IEEE Transactions on Image Processing
5. Hu, M.K.: Visual pattern recognition by moment invariants. IRE Transactions on Information Theory 8, 179–187 (1962)
6. Huang, T., Mehrotra, S., Ramchandran, K.: Multimedia analysis and retrieval system (MARS) project. In: Data Processing Clinic (1996)
7. Koskela, M., Laasonen, J., Laakso, S., Oja, E.: Evaluating the performance of content-based image retrieval systems. In: Laurini, R. (ed.) VISUAL 2000. LNCS, vol. 1929, pp. 430–441. Springer, Heidelberg (2000)
8. Liu, W., Su, Z., Li, S., Zhang, H.: A performance evaluation protocol for content-based image retrieval algorithms/systems (2001)
9. Di Mascio, T., Francesconi, M., Frigioni, D., Tarantino, L.: Tuning a CBIR system for vector images: The interface support. In: Proceedings of the Working Conference on Advanced Visual Interfaces (AVI 2004), pp. 425–428. ACM, New York (2004)
10. Di Mascio, T., Frigioni, D., Tarantino, L.: Evaluation of VISTO: the new vector image search tool. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4552, pp. 836–845. Springer, Heidelberg (2007)
11. Di Mascio, T., Frigioni, D., Tarantino, L.: A visual environment for tuning content-based vector image retrieval. In: Proceedings of HCI 2005, Adjunctive Proceedings. Lawrence Erlbaum Associates (2005)
12. Niblack, W., Barber, R.: The QBIC project: Querying images by content using color, texture and shape. In: Proceedings of the Conference on Storage and Retrieval for Image and Video Databases, pp. 173–187 (1993)
13. Ogle, V.E., Stonebraker, M.: Chabot: Retrieval from a relational database of images. IEEE Computer 28(9), 40–48 (1995)
14. Smith, J.R., Chang, S.-F.: VisualSeek: A fully automated content-based image query system. ACM Multimedia, 87–98 (1996)
15. Smith, J.R.: Image retrieval evaluation. In: IEEE Workshop on Content-Based Access of Image and Video Libraries (June 1998)
16. Veltkamp, R.C., Tanase, M.: A survey of content-based image retrieval systems. In: Content-Based Image and Video Retrieval, pp. 47–101. Kluwer Academic Publishers, Dordrecht (2002)
17. Yang, L., Albregtsen, F.: Fast computation of invariant geometric moments: a new method giving correct results. In: Proceedings of the IEEE International Conference on Pattern Recognition, pp. 201–204 (1994)
Building and Browsing Tropos Models: The AVI Design Tania Di Mascio1, Anna Perini2, Luca Sabatucci2, and Angelo Susi2 1
University of L'Aquila, Monteluco di Roio, L'Aquila, I-64100, Italy [email protected] 2 Fondazione Bruno Kessler –IRST, Trento - Povo, I-38050, Italy {perini,sabatucci,susi}@fbk.eu
Abstract. This paper proposes the use of the HCI paradigm and techniques to support software system designers in building and browsing visual models during the development of complex distributed systems. In particular, we adopt Usability Evaluation Methods (UEMs) to analyse the first version of the interface of TAOM4E, the tool supporting the Tropos Agent-Oriented methodology. Using the results of this usability study, we collect different requirements to design an Advanced Visual Interface (AVI) for TAOM4E, taking into account the requirements for supporting software designers during the Tropos model design process and during model browsing.
1 Introduction
Visual modeling is a core activity in the so-called model-driven approaches to software development. Tools have played a key role in the diffusion of these practices in industrial settings (an example is the use of Object-Oriented modeling tools such as IBM Rational Rose). In recent years, Agent-Oriented (AO) modeling has become a reference paradigm for the design of complex distributed software systems. Several methodologies have been proposed so far, but only a few are supported by appropriate modeling tools. Current tools have been built giving considerable attention to practical issues, such as model interoperability, while their user interfaces seem to have been designed neglecting relevant principles of HCI. In our opinion, exploiting HCI principles and techniques will allow user interfaces to be built that better fit the needs of the different users of these methodologies, resulting in a considerable contribution towards the diffusion of AO methodologies. In this work, we focus on TAOM4E (http://sra.itc.it/tools/taom4e), the tool supporting the Tropos AO methodology, briefly described in Section 2. In order to build a user interface supporting users in the Tropos model design process and in browsing, the current GUI of TAOM4E has been evaluated. To do so, we used usability evaluation methods (UEMs), which belong to the HCI research field; the existing UEMs are described in Section 3. The results of the TAOM4E GUI evaluation, carried out using the Cognitive Walkthrough method [6], confirm that mapping the actions prescribed by the Tropos methodology process to the TAOM4E AVI can improve the users' interaction. The design guidelines of the TAOM4E AVI, derived taking into account the requirements for supporting software designers, are described in Section 4. Conclusions and future work are described in Section 5.
2 Tropos and TAOM4E
The Tropos AO methodology [8] adopts a model-driven software development process that guides users in building an initial model of the domain stakeholders, with their own goals and social dependencies for goal achievement. This model is then incrementally refined and extended into intermediate design models. The methodology provides a visual modeling language that offers primitive concepts such as actor, goal, plan, and resource, together with relationships such as strategic dependencies between actors, and goal AND/OR decompositions. Actor diagrams and goal diagrams give a view of the actors' strategic dependencies and of the way an actor's goals have been decomposed into subgoals and implemented via plans, respectively. The modeling process has been defined in terms of the non-deterministic concurrent algorithm (Generate Tropos Model) recalled in Figure 1 [8]. This process guides a user in building Tropos models. During Phase 1, the initialization step, a set of domain actors and their associated goals, plans and resources are incrementally added to the model. During Phase 2, i.e., the modeling step, the goal modeling procedure is executed for each goal in the model in order to analyse it. The process goes on until all relevant entities have been elicited from the domain. In the goal modeling procedure described in Figure 2, three possible decisions can be taken by users with respect to a given goal. The delegate choice can be performed in two ways, that is, the goal can be delegated either to an existing or to a new actor (adding a new dependency relationship); in the latter case, a new actor is added to the model before delegating the goal to it. With the expand choice, the goal is AND/OR decomposed into subgoals (adding a new decomposition relationship); the new subgoals are added to a list of goals to be processed. Finally, the goal can be solved (the solve step) by associating with it a plan p (and possibly a set of resources ri) via a means-end relationship that is added to the model. In order to support the specific analysis techniques adopted in Tropos, we developed a modelling environment, called TAOM4E (Tool for Agent-Oriented Modelling for Eclipse), which is based on an implementation of the metamodel described in [9].
Fig. 1. The modelling procedure of the Tropos methodology
Fig. 2. The goal modelling procedure of the Tropos methodology
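The sketch below paraphrases, in Python, the goal modelling loop of Figures 1 and 2: goals are taken from a work list and, for each of them, the analyst either delegates the goal (possibly to a newly created actor), expands it through an AND/OR decomposition, or solves it by a means-end plan. The data structures and the choose() oracle standing for the analyst's decision are illustrative assumptions and are not part of the TAOM4E implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    actors: set = field(default_factory=set)
    dependencies: list = field(default_factory=list)    # (depender, goal, dependee)
    decompositions: list = field(default_factory=list)  # (goal, kind, subgoals)
    means_ends: list = field(default_factory=list)      # (goal, plan, resources)

def generate_tropos_model(initial_goals, choose, model=None):
    """Goal modelling loop in the spirit of Figures 1 and 2.

    `initial_goals` is a list of (actor, goal) pairs produced by the
    initialization step; `choose(goal)` stands for the analyst's decision and
    returns ('delegate', actor), ('expand', kind, subgoals) or
    ('solve', plan, resources).
    """
    model = model or Model()
    worklist = list(initial_goals)                 # goals still to be analysed
    while worklist:
        actor, goal = worklist.pop()
        decision = choose(goal)
        if decision[0] == "delegate":              # to an existing or new actor
            dependee = decision[1]
            model.actors.add(dependee)
            model.dependencies.append((actor, goal, dependee))
            worklist.append((dependee, goal))      # the dependee must handle the goal
        elif decision[0] == "expand":              # AND/OR decomposition
            _, kind, subgoals = decision
            model.decompositions.append((goal, kind, subgoals))
            worklist.extend((actor, g) for g in subgoals)
        else:                                      # solve via a means-end plan
            _, plan, resources = decision
            model.means_ends.append((goal, plan, resources))
    return model
```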
Among the main requirements we considered in developing the TAOM4E tool are the following:
• Visual Modelling. As pointed out before, the visual aspect of the language is one of the fundamental characteristics of the Tropos methodology, so the modelling environment should support the user during the specification of an AO visual model (e.g., according to the Tropos visual notation). Moreover, the environment should allow us to represent new entities that will be included in the Tropos metamodel, as well as language variants, and to restrict its use to a subset of entities of the modelling language.
• Specification of model entity properties. The modelling environment should allow us to easily annotate the visual model with model properties like invariants and creation or fulfillment conditions, which are typically used in Formal Tropos specifications.
• Extensibility. The modelling environment should be extensible and allow for different configurations by easily integrating other tools at will.
An effective solution to the requirement of a flexible architecture and to the component integration issue is offered by the Eclipse Platform. New tools are integrated into the platform through plug-ins that provide the environment with new functionalities. A plug-in is the smallest unit of function in Eclipse, and the Eclipse Platform itself is organized as a set of subsystems, implemented in one or more plug-ins, built on top of a small runtime engine. The TAOM4E architecture is depicted in Figure 3. It follows the Model View Controller
Fig. 3. The architecture of TAOM4E
Fig. 4. Snapshot of the current TAOM4E GUI
pattern and has been devised as an extension of two existing plug-ins. First, the EMF plug-in (http://www.eclipse.org/emf/) offers a modelling framework and code generation facilities for building tools and other applications based on a structured data model. Given an XMI model specification, EMF provides functions and runtime support to produce a set of Java classes for the model. Most importantly, EMF provides the foundation for interoperability with other EMF-based tools and applications. The resulting plug-in, called TAOM4E model implements the Tropos metamodel. It represents the Model component of the MVC architecture. Second, the Graphical Editing Framework (GEF) plug-in (http://www.eclipse.org/gef/) allows developers to create a rich graphical editor around an existing metamodel. The functionality of the GEF plug-in helps to cover the essential requirement of the tool, that is supporting a visual development of Tropos models by providing some standard functions like drag & drop, undo-redo, copy & paste and others. The resulting plug-in, called TAOM4E platform represents both the Controller and the Viewer components of the tool. A snapshot of the tool's GUI is depicted in Figure 4. It consists of three main windows: • the project/model browser on the left; • the diagram editing window in the center; • the Tropos entities/relationships palette at the left of the diagram editing window. In particular, the project model window contains the set of model entities and artifacts of the Tropos project; the diagram editor allows the user to specify the Tropos artifacts, in particular actor and goal diagrams, using the Tropos concepts (actors, goals, tasks, resources and the relationships between those concepts) contained in the palette area. A simple example of such a Tropos diagram is shown in Figure 4. In this
diagram the actors are represented via circles, goals via elliptical shapes and tasks via hexagons. Moreover, the figure shows the internal goal diagram of Actor 1, contained inside the dashed-line ellipse (named balloon in Tropos). Focusing on the relationships specified in the diagram, two dependencies are shown between the two actors (Actor 1 and Actor 2): a goal dependency whose label is HardGoal 1, and a task dependency whose label is Plan 1. Inside the balloon, the AND decompositions of the root goal HardGoal 2 and of the subgoal HardGoal 3 are described, together with a means-end relationship between Plan 1 and HardGoal 5. Currently, the model building process executed by users is not visually supported by the TAOM4E GUI. In fact, the user can add entities to the model without particular constraints or guidelines, and thus also has the possibility of specifying Tropos models in a way that is not consistent with the process envisaged above. The designer can start building a new Tropos model by directly creating the project in the project window and specifying the kind of diagram to start with (actor or goal diagram); at this stage the designer can start specifying model entities, with the possibility of specifying goals or tasks without having specified the actors those goals/tasks are associated with; in this way the user's building task could fail to follow the sequence of actions defined by the process, breaking the rationale underlying the Tropos methodology. We aim at designing an AVI that improves the user support in building and browsing tasks. In the next section we delineate some HCI guidelines to be followed in the new GUI supporting the design process.
3 Usability Evaluation Methods
The introduction of usability evaluation methods (UEMs) to assess and improve usability in interactive systems has led to a variety of alternative approaches and a general lack of understanding of the capabilities and limitations of each approach. In [4], the authors present a practical discussion of factors, comparison criteria, and UEM performance measures that are interesting and useful when comparing different UEMs. A common distinction among UEMs is based on the skills of the evaluators (in general, an evaluator is a person using a UEM to evaluate the usability of an interaction design); in particular, two main criteria exist: Expert-based criteria and User-based criteria. In the former, experts are requested to evaluate a prototype, comparing it with existing rules and guidelines; in the latter, evaluators assess usability through real users, having them use a prototype. While the Expert-based Criteria UEMs include, among others, the Heuristic Evaluation method [5], the Cognitive Walkthrough method [3], and the Expert-based method [6], the User-based Criteria UEMs [2] include, among others, the Observational evaluation method [10], the Survey evaluation method [10], and the Controlled experiment method [3]. Among observational evaluation methods, we focus on Verbal Protocols and the Think Aloud Protocol [3]. In what follows, we briefly describe these methods:
• Heuristic Evaluation: this method foresees a small set of evaluators who evaluate the interface and judge its compliance with well-known usability principles (the heuristics).
• Cognitive Walkthrough: this is typically performed by the interface designer and a group of his or her peers. Small-scale walkthroughs of parts of an interface can
also be done by individual designers as they consider alternative designs. In a group situation, one of the evaluators usually takes on the duties of scribe, recording the results of the evaluation as it proceeds, and another group member acts as facilitator, to keep the evaluation moving.
• Expert evaluation: this method exploits the knowledge of an HCI expert to obtain a prediction about the usability of the system. Its main disadvantage is that it depends upon the quality of the expert(s).
• The Observational evaluation method involves real users who are observed while performing tasks with the system (depending on the stage of the project, the “system” ranges from paper mock-ups to the real product). This method offers a broad evaluation of usability. Depending on the specific situation, we may either apply the observational evaluation by direct observation or record the interaction between the users and the system (using a usability lab). Recording (by video camera) is more valuable, since it allows information to be stored, for example the critical points during the interaction (e.g., when the user has to consult the manual, or when and where s/he is blocked), the time a user spends performing a task, the mistakes a user makes, and so on. Obviously, recording with a camera is very expensive (especially for the time required to analyze the recorded data).
− The Think Aloud Protocol provides the evaluator with information about the cognitions and emotions of a user while the user performs a task or solves a problem. The user is instructed to articulate what s/he thinks and feels while working with a prototype. The utterances are recorded either using paper and pencil or using audio and/or video recording. By using the Think Aloud Protocol, the evaluator obtains information about the whole user interface. This protocol is oriented towards the investigation of the users' problems and decisions while working with the system.
− Verbal protocols aim at eliciting the users' (subjective) opinions. Examples are interviews and questionnaires. The difference between oral interview techniques and questionnaire techniques lies mainly in the effort for setup, in evaluating the data, and in the standardization of the procedure.
• Survey evaluation method: in this case, structured questionnaires and/or interviews are used to get feedback from the users. This method offers a broad evaluation of usability since, from the users' viewpoint, it is possible to identify the critical aspects in the user-system interaction.
• Controlled experiment method: this method is particularly valid for testing how a change in the design project could affect the overall usability. It may be applied in any phase during the development of a system; it provides more advantages when it is possible to test the alternative designs separately, independently of the whole system. This method mainly aims at checking some specific cause-effect relations, and this is achieved by controlling as many variables as possible.
Since the TAOM4E tool design does not follow a particular HCI design methodology (such as, for example, the User Centered Design methodology [7]), to evaluate the TAOM4E GUI we use a method belonging to the Expert-based criteria: the Cognitive Walkthrough [3].
4 TAOM4E AVI Design
As mentioned, the TAOM4E GUI has been analyzed from the usability point of view by submitting it to an Expert-based evaluation [6] using the Cognitive Walkthrough [3]. The approach followed in the evaluation considers the main tasks of TAOM4E: building and browsing Tropos models. Consequently, the remarks about visual issues obtained are described below for the building task and the browsing task, with respect to the possibility of improving user support in these tasks. In particular, by mapping the phases and actions proposed by the process algorithm to structural elements of the TAOM4E AVI, we derived the different requirements. These requirements have to be applied to:
• the palette structure;
• the palette behavior;
• the dynamic layout of the diagrams.
Building task. This task is the most important task of TAOM4E users. It consists in choosing Tropos model elements and managing them in order to create a consistent model, following the modelling process described in Figures 1 and 2. In order to improve the user support during the model building task, the AVI should accomplish the following requirements:
• The palette structure has been divided into three areas. At the top we grouped the general actions on the model diagram: simple object selection, object group selection, and Tropos balloon opening. In the middle, the set of actor internal elements (goal, task, resource) and relationships is presented. At the bottom, the set of social relationships is positioned, namely goal-, task- and resource-dependency.
• The palette behavior reflects the steps of the Tropos process algorithm. The system enables and disables object choices following the building process (a minimal sketch of this enabling logic is given at the end of this section). When creating a new model, all the palette icons are disabled except the actor icon. Once an actor is dragged & dropped from the palette into the diagram (so that an actor is added to the model), the icons related to internal elements, in the middle area of the palette, are enabled, together with the icons corresponding to the general actions. Internal and social relationships are not created in this step, so their icons remain disabled. Once a second actor has been specified, the social relationship icons are enabled to allow users to start modeling the actors' social relationships (see the delegate action in the algorithm).
• For each actor added to the model, users can choose to open the balloon, dragging it from the general actions palette to a specific actor in the diagram, in order to start the actor's internal goal modeling (the expand action in the algorithm).
Browsing task. The TAOM4E AVI should support users in diagram browsing by enriching predefined views through visualization techniques. In order to improve the user support in this task, we aim at introducing the layers paradigm to visualize selected objects, using a tool visualization palette containing a list of combo boxes associated with modeling objects; in this way, when users select one or more combo boxes, the system shows/hides the associated objects in the diagram. In order to better fit user needs in diagram visualization, the tool should propose different shape layouts (e.g. star, circle,
rectangles) upon user request. Moreover, we rely on a Focus+Context technique [1] to support diagram browsing: when the user selects a particular actor, the diagram's layout changes: the actor is positioned in the center of the diagram (focus) while its neighborhood and the context appear in the background. It is worth noting that this operation is not an optimization, since we are only temporarily modifying the model view.
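A minimal sketch of the palette-enabling behavior described in the building task above might look as follows; the state variables and icon names are illustrative assumptions and do not correspond to actual TAOM4E/GEF code.

```python
def enabled_palette_icons(n_actors, n_internal_elements=0):
    """Palette icons enabled for the current model state (illustrative only).

    Mirrors the building process: only the actor icon is available in an empty
    model; general actions and internal elements (goal, task, resource) appear
    once an actor exists; social dependencies appear once at least two actors
    exist.  Internal-relationship icons are assumed to require at least one
    internal element (an assumption, not stated explicitly in the text).
    """
    icons = {"actor"}
    if n_actors >= 1:
        icons |= {"select", "group-select", "open-balloon",
                  "goal", "task", "resource"}
    if n_internal_elements >= 1:
        icons |= {"internal-relationship"}
    if n_actors >= 2:
        icons |= {"goal-dependency", "task-dependency", "resource-dependency"}
    return icons

# The palette grows as the designer follows the Tropos building process:
print(enabled_palette_icons(0))        # only {'actor'}
print(enabled_palette_icons(1))        # + general actions and internal elements
print(enabled_palette_icons(2, 3))     # + social dependency icons
```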
5 Conclusions
We used the HCI paradigm and techniques to evaluate the current TAOM4E GUI and to derive the new TAOM4E AVI design. This work has been motivated by requests and observations from users who downloaded TAOM4E and exploited it as a supporting tool in a novel software design approach. The evaluation of the current interface was carried out using the Cognitive Walkthrough method; the TAOM4E AVI maps the phases and actions proposed by the Tropos process algorithm to the palette structure, the palette behavior, and the dynamic layout of diagrams, integrating a Focus+Context technique to support diagram browsing. Further investigation will be devoted to: (1) evaluating the TAOM4E AVI through user-based methodologies; (2) supporting the definition of different user profiles based on different skill levels in specifying Tropos models.
References 1. Card, S.K., Mackinlay, J.D., Shneiderman, B.: Information Visualization: Using Vision to Think. M. Kaufmann, San Francisco (1999) 2. UEMs Consortium. Heuristic evaluation (2006) 3. UEMs Consortium. Performing a cognitive walkthrough (2006), http://www.cc. gatech.edu/computing/classes/cs3302/documents/cog.walk.html 4. Hartson, H., Andre, T., Williges, R.: Criteria for evaluating usability evaluation methods. International Journal of Human Computer Interaction 13(4), 373–410 (2001) 5. Hartson, H.R., Andre, T.S., Williges, R.C.: Criteria for evaluating usability evaluation methods. International Journal of Human Computer Interaction 13(4), 373–410 (2001) 6. Hix, D., Hartson, H.R.: Formative evaluation: Ensuring usability in user interfaces. Trends in Software 1, 1–30 (2000) 7. Norman, D., Draper, S.: User Centered System Design. LEA Hillsdale, N.J (1986) 8. Penserini, L., Perini, A., Susi, A., Mylopoulos, J.: High Variability Design for Software Agents: Extending Tropos. ACM TAAS 2(4) (2007) 9. Susi, A., Perini, A., Giorgini, P., Mylopoulos, J.: The Tropos Metamodel and its Use. Informatica 29(4), 401–408 (2005) 10. Yagita, Y., Aikawa, Y., Inaba, A.: A proposal of the quantitative evaluation method for social acceptability of products and services. Technical report, Ochan-omizu University, Otsuka Bunkyo-ku, Tokyo (2001)
A Multiple-Aspects Visualization Tool for Exploring Social Networks Jie Gao, Kazuo Misue, and Jiro Tanaka Department of Computer Science, Graduate School of System and Information Engineering, University of Tsukuba {gao,misue,jiro}@iplab.cs.tsukuba.ac.jp
Abstract. Social network analysis (SNA) has been used to study the relationships between actors in social networks, revealing their features and patterns. In most cases, nodes and edges in graph theory are used to represent actors and relationships, and graph representations are used to visually analyze social networks. However, many visualization tools using network diagrams tend to depict most of the information about social networks through the properties of nodes, which results in a visual burden when identifying actors or relationships according to certain properties. There is a lack of tools that support investigators in gaining insights into multiple-aspect networks. We consider actors, relationships, and communities to be three important elements, and we developed a tool called MixVis that integrates a tagcloud, network diagrams, and a list to show these elements. Our tool allows users to explore social networks starting from elements of interest, and to acquire details through links between the three different viewpoints. Keywords: Social network analysis, visualization, human interface.
1 Introduction
Social network analysis (SNA) is used to investigate the interactions between one person and another, one company and another, or one country and another. By using SNA, investigators attempt to find important actors (persons, companies, or countries), to understand the information flow, and to grasp the trends in the whole network. SNA can facilitate an investigator's decision-making processes. SNA has been applied to many areas such as social science, politics, and psychology. Actors, links, and communities are the main factors in social networks. A community can be defined in two different ways. In some research, a community is made up of an exclusive, disjoint group of actors, and is often extracted by using a clustering algorithm such as Newman's method [6]. However, in the real world, the word community can also mean an actor's affiliations. The difference between the two definitions is that a person can only join one exclusive group, but can have multiple affiliations. The latter has recently been attracting a great deal of attention with the increasing popularity of Social Networking Services (SNS), where end users can interact with friends by participating in a variety of virtual communities.
Nodes and edges in graph theory are used to represent actors and links. A community is a set of actors and can be regarded as a cluster in a graph. A graph can be drawn as a network diagram. Communities are overlaid on the network diagram, and can be drawn as closed areas [7], as an adjacency matrix [5], or in other ways. A refined layout of the network can provide an intuitive and comprehensive image of the network. We regard traditional work on exploring social networks as a network-centered approach: most operations and information are attached to the network. These methods can give a general impression of the entire network and even of an actor's most important properties. However, these methods pose two problems. The first is that images become cluttered, since communities overlap when actors are allowed to join more than one community. The second problem is the great visual burden imposed on the investigator when having to explore networks in detail, because of the excessive amount of information on the networks. To solve these problems, we present an approach that uses a combination of three views corresponding to three aspects: nodes, links, and communities. Based on this approach, we developed a tool that integrates a network diagram, a tagcloud [8], and a list to support SNA. The network diagram shows the links between actors and the whole network, the tagcloud displays actors with their attributes, and the list presents the communities that the actors belong to. We also provide a set of interaction techniques to explore social networks.
2 Task Description SNA aims at identifying important actors, crucial links, communities, and network characteristics [2]. We classified the tasks for SNA into three categories. 2.1 Task to Identify Actors, Links, and Communities SNA should identify actors, links, and communities with specific properties. There are several properties of actors that an investigator is interested in. Of these, centrality is widely used to identify actors that occupy important positions. Representative centrality criteria include degree centrality and betweenness centrality. − Degree centrality evaluates the importance of an actor by the number of actors linked to it. An actor linked to most of the other actors is regarded as the most important. An actor with high degree centrality can be interpreted as an outgoing or popular person in a friendship network. − Betweenness centrality evaluates the importance of an actor by the number of times that the shortest paths between pairs of other actors pass through it. The actor that mediates the most pairs is considered the most important. An actor with high betweenness centrality can be interpreted as an intermediary between subgroups in a friendship network. SNA should also identify crucial links, such as those that bridge many pairs of actors, and communities of actors who have tight internal connections.
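For readers who want to reproduce the two measures, a minimal sketch using the NetworkX library is given below; this is not the authors' implementation, and the small friendship graph is purely hypothetical.

import networkx as nx

# A small hypothetical friendship network.
g = nx.Graph()
g.add_edges_from([("Ann", "Bob"), ("Ann", "Carl"), ("Ann", "Dana"),
                  ("Bob", "Carl"), ("Dana", "Eve"), ("Eve", "Frank")])

# Degree centrality: how many of the other actors each actor is linked to.
degree = nx.degree_centrality(g)

# Betweenness centrality: how often an actor lies on the shortest paths
# between pairs of other actors.
betweenness = nx.betweenness_centrality(g)

for actor in g.nodes():
    print(actor, round(degree[actor], 2), round(betweenness[actor], 2))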
2.2 Task to Comprehend Details of Certain Actors, Links, and Communities Details are also necessary for understanding the identified actors, links, and communities. SNA should provide investigators with the actors that are directly linked to the actor of interest and the communities that the actor belongs to. SNA should also present links between actors in an understandable way. Basically, people want to know the two actors involved in a link, as well as the actors linked to a certain actor. In addition, investigators want to know how the actors in a community are connected, what the attributes of its members are, and who bridges multiple communities. SNA should comprehensively and clearly answer these substantive questions. 2.3 Task to Understand the Whole Network SNA should provide an overview of the whole network. The datasets for SNA studies cover not only the attributes of individual actors, but also systematic features of all the actors. These features reflect various social phenomena or affairs, and SNA reveals them by providing an overview of the whole network.
3 Traditional Tools for SNA There are a number of visualization systems used to support SNA. We divided these systems into two groups: statistical analysis-oriented and visual presentation-oriented methods. Statistical analysis-oriented methods calculate statistical values that represent various features of social networks. For example, UCINET [1] enables centrality to be measured and communities to be identified, and offers other analysis functions. Because UCINET has many complex functions, users often need help from experts or have to study manuals. Pajek [3] is a tool embedded into UCINET. It enables users to understand social networks more easily by drawing the network as a diagram. However, it only depicts the statistical values through the colors or sizes of the nodes in a network diagram. When identifying actors and communities, users are easily distracted by the complicated denotations in the network diagram. Recent research has tended to take advantage of human vision, providing visual presentations and various interaction techniques to enhance exploration by investigators. NodeTrix [5] integrates a network diagram and a matrix so that the global structure of the network can be visualized while local communities are analyzed in detail. However, it lacks the capacity for fine exploration of actors, because users have to take time to distinguish an actor's centrality by assessing the brightness of nodes in a large network diagram. Perer et al. suggest a combined view of a network diagram and a list [7]. However, it only provides dependent views of an actor list, while communities are drawn as closed areas and overlaid on the network, and their method only divides the network into disjoint subgroups. If these approaches were applied to displaying communities that all intersect without discrimination, a great number of overlaps would occur and result in visual clutter. As a result, it would be difficult to identify communities and their content.
4 Our Approach: Views Corresponding to Three Aspects We considered actors, links, and communities as three aspects from which investigators can start their explorations, since each provides very distinct information. Each of these aspects has its own features and conveys different information. Our idea was to divide the presentation of social networks into three appropriate views (see Fig. 1) corresponding to the three aspects, and to synchronize these views so that information concerning the other aspects is revealed in real time. By doing so, the information in a network is dispersed over different views, and investigators can rapidly and effectively identify all aspects while focusing on only one. In addition, other representations (e.g., lists) can fill in gaps where a network diagram cannot arrange information systematically, such as sorting. The following describes the three views and the interaction techniques linking them.
Fig. 1. Snapshot of MixVis: (a) control panel, (b) network diagram, (c) community list, (d) tagcloud. The actor clicked in the tagcloud is highlighted by a blue box in the network, and the actor's communities are highlighted in different colors.
4.1 Tagcloud to Rapidly Identify Actors The most important properties of an actor are its centralities, their absolute values, and how important the actor is in the whole network. We used a tagcloud to represent actors so that those with specified properties can be identified and comprehended. A tagcloud is a visual presentation of a set of words; attributes of the text, such as size and color, are used to represent features (such as frequency) of the associated terms [8]. This supports "impression formation" and provides a pre-attentive image of the "key person". The actors' names in our tagcloud were lined up on several lines, and their layout changed automatically according to the size of the tagcloud panel and the number of actors. The size and background of each actor's tag were determined by its centralities: the size was proportional to its degree centrality, and the brightness of its background was inversely proportional to its betweenness centrality.
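A minimal sketch of this size/brightness mapping is given below, assuming the two centralities have already been normalized to [0, 1]; the point sizes and grey levels are illustrative, as the exact values used in MixVis are not stated in the paper.

def tag_style(degree_c, betweenness_c, min_pt=10, max_pt=32):
    """Map normalized centralities (0..1) to a tag's font size and background."""
    # Font size grows linearly with degree centrality.
    size_pt = min_pt + degree_c * (max_pt - min_pt)
    # Background brightness is inversely proportional to betweenness centrality:
    # strong intermediaries get darker backgrounds.
    grey = int(255 * (1.0 - betweenness_c))        # 255 = white, 0 = black
    return size_pt, (grey, grey, grey)

print(tag_style(0.8, 0.1))   # large tag on a light background
print(tag_style(0.2, 0.9))   # small tag on a dark background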
The tagcloud also provided accurate values for degree and betweenness centrality. When the cursor moved over an actor, a label popped up displaying the absolute values of its degree and betweenness centrality. The actors in the tagcloud could be sorted by centrality or in alphabetical order. Fig. 1(d) shows a tagcloud with about 150 actors. 4.2 Network Diagram for Links and Overview of Networks People are concerned about who is connected to whom, especially about the actors connected to central actors. Other important information lies in the overview of the network, which presents the entire set of links. Networks are drawn as network diagrams. We developed a force-directed method to lay out the network diagram, which shows the links between actors and also makes it possible to overlay communities on the diagram [4]. The drawing method for compound graphs was leveraged to draw the network. We defined a compound graph as a combination of an inclusion graph and an adjacency graph; it can represent not only links between actors but also sets of actors (communities). With our method, links are drawn as straight-line segments and communities as inclusion areas, and communities are allowed to intersect. A drawing of a network is shown in Fig. 1(b). It is also possible to depict an actor's properties by using a node's attributes, which makes it convenient to check a link's features. For example, it is easy to see whether a link connects central actors by checking the colors of the connected actors. We designed three styles for drawing nodes (actors). 1) Labels indicate actors' names, and a label's background color represents the actor's degree centrality; a red background indicates that the degree centrality is high (see Fig. 2). 2) Labels indicate actors' names, and a label's background color represents the actor's betweenness centrality; a red background indicates that the betweenness centrality is high (see Fig. 1). 3) An actor's image represents the actor. The layout of the network can be refined to observe both the overview of the whole network and the details of one link. We provided a control panel (Fig. 1(a)) in which sliders are used to adjust several parameters of the network layout. 4.3 List to Flexibly Select Communities of Interest A community indicates a set of actors sharing some common properties. Flexible interactions with communities are required to search for and choose communities of interest. Through these interactions, an investigator can comprehend the structure (pattern) of communities and find intersections. We used a list to select communities (see Fig. 1(c)). The list contains all the communities that actors belong to, arranged in alphabetical order. An investigator can select one or more communities of interest. The brightness of a community's background indicates its size, i.e., the number of members belonging to it; a dark background indicates many members. When highlighting communities, e.g., when multiple communities are selected, we use another set of colors to distinguish them. We used a hue, saturation, and value (HSV) color model to assign colors to the communities to avoid confusion.
More importantly, we produced different bright colors, rather than gray, by controlling the hue, saturation, and value; a small sketch of this color assignment is given at the end of Section 4. 4.4 Linking Three Views The network diagram, tagcloud, and list are not isolated views, but an integrated, synchronized system. We introduced the concept of "linking and brushing" in designing our system. Linking and brushing is a technique that combines different representations to reveal different aspects of data: interaction with one representation causes corresponding changes in the others. We leveraged this technique to produce insights into each aspect that is concurrently visible, so that the investigator can rapidly acquire details about the aspect of concern. We mainly harmonized the three views by highlighting related information in the other two views when an object is selected in one view. Moreover, we use the same color to highlight a community and its members, so that the investigator can visually distinguish members and their affiliations.
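The HSV-based community coloring mentioned above can be sketched as follows, using Python's standard colorsys module; the saturation and value settings are assumptions chosen to keep the colors bright, since the exact parameters used in MixVis are not given.

import colorsys

def community_colors(n, saturation=0.65, value=0.95):
    """Assign n visually distinct, bright colors by spreading hues evenly."""
    colors = []
    for i in range(n):
        hue = i / float(n)                          # evenly spaced hues
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors.append((int(r * 255), int(g * 255), int(b * 255)))
    return colors

# e.g., highlight three selected communities and their member tags with the
# same three colors in every view ("linking and brushing").
print(community_colors(3))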
5 MixVis: A Tool for Exploring Social Networks Our system consists of three main views: a network diagram, a list, and a tagcloud. The middle panel (Fig. 1(b)) is the network diagram; to its left is the control panel (Fig. 1(a)), which contains widgets for adjusting the diagram. The community list (Fig. 1(c)) is to the right of the network diagram, and the tagcloud (Fig. 1(d)) is at the bottom of the user interface. The size of each view can be adjusted by users according to their individual needs, and they can also hide or maximize a view. In some cases, an appropriate size can enhance identification; for example, an enlarged network diagram can clearly show a large number of nodes and their relationships. Our tool provides techniques for interactively exploring multiple aspects of social networks. These insights include an overview of the whole network, insights into communities or parts of the network, and details on particular actors or links. They are consistent with people's interests, i.e., the properties of a single actor, links, communities, the connectivity between or intersections of communities, and the balance or density of the whole network. In the following, we describe the interaction techniques of MixVis. 5.1 Interactions from the Tagcloud We assumed that an investigator would focus on actors and want to find those who are central. The tagcloud shows actors in an impressive way: 1) actors with high degree centrality stand out because they are larger, and 2) actors with high betweenness centrality are easy to find because their backgrounds are darker. Consequently, an investigator can spot them at a quick glance. Moreover, central actors can be identified accurately by sorting the tags by degree centrality, betweenness centrality, or alphabetical order. Popup labels showing the centrality values can facilitate this process.
After this, the investigator can learn the position of an actor in the network and its communities by clicking its tag. The same actor is highlighted in the network, and its communities are highlighted as well. Fig. 1 illustrates how the other two views change after a tag is clicked: one tag is identified and selected, and because the three views are linked, the actor is also highlighted by a box in the network diagram and the communities that the actor belongs to are highlighted in different colors on the list. 5.2 Interactions from the Network Diagram When browsing the network diagram, an investigator may notice an actor and want to find out whom it is related to, which communities it belongs to, and what its centrality is. The role of the actor is investigated by clicking it in the network. Actors directly connected to the clicked one are highlighted by a red box, while the other actors are blurred. Communities that the clicked actor belongs to are highlighted on the list by their background colors, and the actor is also highlighted in the tagcloud, where its other properties can be seen. The investigator can adjust the layout of the network diagram by using the widgets in the control panel. Fig. 2 is a snapshot of MixVis after one actor in the network has been clicked. By glancing at the three views, we can learn its links, its properties such as its degree centrality, and its communities.
Fig. 2. Actor clicked in network diagram is turned blue in tagcloud. Its communities are highlighted in different colors.
5.3 Interactions from the List If an investigator focuses attention on communities, the list can facilitate his or her work by allowing one or more communities to be selected. The selected communities are then drawn as inclusion areas in the network. Since actors shared by multiple
communities are placed at the intersections of the closed areas, the investigator can find them easily. To see the centralities of the members of each community, the tags belonging to these communities are highlighted in the same color as the respective community. As seen in Fig. 3, two communities on the community list have been selected; their actors are contained in inclusion areas that indicate their membership. The correspondence among the color of the community, the inclusion area, and the actor in the tagcloud is very clear, so the investigator can easily comprehend the details of the communities.
Fig. 3. Two communities are selected. The colors of the members in the tagcloud and of the areas in the network diagram are consistent with the communities on the list.
6 Evaluation 6.1 User Study To test and verify the efficiency of our approach, we administered a user study in which MixVis was compared with two other tools: ToolA and ToolB. ToolA had only the network diagram of MixVis, with the other information related to communities and actors overlaid on the network diagram; it was similar to traditional tools that present all information in a network diagram. ToolB included a network diagram and a tagcloud. In both ToolA and ToolB, the list could be displayed or hidden as a whole. ToolB was similar to work that also uses combinations of different representations. We designed seven tasks that we considered to be important tasks in SNA. A brief description of the tasks is given in Table 1; the tasks cover the categories given in Section 2.
Table 1. Tasks to be solved by subjects
Task No.  Content
1         To identify actor with highest degree centrality
2         To identify actor with highest betweenness centrality
3         To explore details on an assigned actor
4         To explore details on a specified link
5         To explore details on two specified communities
6         To find groups of actors by browsing the whole network
7         To recognize a trend related to the community and actor by using three views
We asked ten students to participate in the user study; they knew either little or a great deal about visualization. Prior to the study, we gave them a short introduction to SNA and to the functions of the three tools. They were then asked to solve the seven tasks using ToolA, ToolB, and MixVis (to eliminate the influence of names, we assigned the new name "ToolC" to MixVis when conducting the study). After each task, the subjects awarded the three tools scores according to how usable they were, with five points as the maximum score. The average scores of the three tools for each task are shown in the bar graph in Fig. 4, which indicates that MixVis was awarded the highest scores in handling the seven tasks.
Fig. 4. Average scores for ToolA, ToolB, and MixVis
6.2 Discussion According to Fig. 4, we assessed that MixVis offered better usability than the other two tools for most of the tasks. The differences in the average scores strongly support our view that the representation of multiple aspects can facilitate an investigator's work. Significant differences, especially in completing Tasks 5 and 7, appeared and were verified with a t-test. We inferred that these differences arose because the two tasks were related to communities, and only MixVis provided a list and a flexible selection function. ToolA, ToolB, and MixVis were evaluated similarly in Tasks 1 and 2. The t-test confirmed that there were no significant differences between ToolA and MixVis in accomplishing Task 2, and no significant differences between ToolB and
MixVis in accomplishing Task 1. The reason for the lack of difference is that Tasks 1 and 2 asked subjects to identify the actors with the highest degree centrality and the highest betweenness centrality, which could be done without using a tagcloud or a list, as in the traditional method.
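The significance tests mentioned in this discussion could be reproduced along the following lines (a sketch using SciPy; the score vectors are hypothetical, since the paper reports only the per-task averages shown in Fig. 4).

from scipy import stats

# Hypothetical 5-point usability scores of the ten subjects for Task 5.
tool_a = [2, 3, 2, 3, 2, 3, 2, 2, 3, 2]
mixvis = [5, 4, 5, 5, 4, 5, 4, 5, 5, 4]

# Every subject rated every tool, so a paired t-test is appropriate here.
result = stats.ttest_rel(tool_a, mixvis)
print(result.statistic, result.pvalue)   # p < 0.05 suggests a significant difference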
7 Conclusion We proposed an approach that uses a combination of three views corresponding to nodes, links, and communities. Based on this approach, we developed a tool called MixVis to support SNA. We adopted "linking and brushing" to enable real-time interaction between the three different views, and interaction between users and the proposed tool. Our tool allows an actor's affiliations to be drawn, which is not supported by most traditional work. We administered user studies to evaluate how usable our tool was; two other tools, equipped with only one or two of the views, were compared with it. The experimental results revealed that MixVis was much easier to use.
References 1. Borgatti, S., Everett, M., Freeman, L.: UCINET V user’s guide. Analytic Technologies (1999) 2. Brandes, U., Wagner, D.: Visone - Analysis and Visualization of Social Networks. In: Jünger, M., Mutzel, P. (eds.) Graph Drawing Software, pp. 321–340. Springer, Heidelberg (2004) 3. de Nooy, W., Mrvar, A., Batagelj, V.: Exploratory Social Network Analysis with Pajek. In: Structural Analysis in the Social Sciences. Cambridge University Press, New York (2005) 4. Gao, J., Misue, K., Tanaka, J.: Drawings of compound graph using free-form curves. In: Proceedings of the 70th National Convention of IPSJ, vol. 1, pp. 407–408 (2008) 5. Henry, N., Fekete, J.-D., McGuffin, M.J.: NodeTrix: A Hybrid Visualization of Social Networks. IEEE Transaction on Visualization and Computer Graphics 13(6), 1302–1309 (2007) 6. Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics) 69(6) (2004) 7. Perer, A., Shneiderman, B.: Balancing Systematic and Flexible Exploration of Social Networks. IEEE Transaction on Visualization and Computer Graphics 12(5), 693–700 (2006) 8. Rivadeneira, A.W., Gruen, D.M., Muller, M.J., Millen, D.R.: Getting our head in the clouds: Toward evaluation studies of tagclouds. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2007, pp. 995–998. ACM, New York (2007)
Multi-hierarchy Information Visualization Research Based on Three-Dimensional Display of Products System Zhou Hui and Hou WenJun Automation School of Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract. Currently, the information on the Web is countless and is spread across tens of thousands of Web sites all over the world, which are intertwined with each other through hyperlinks between documents. Despite the already enormous scale of Web information, it will continue to expand. How to access information on the Web easily has become a problem that urgently needs to be solved, yet the current ways of accessing information are far from satisfactory. Information visualization will play an increasingly important role in helping people understand the structure of the information space, find the information they need quickly, and avoid getting lost in the information ocean. This paper applies multi-hierarchy information visualization to a specific e-commerce Web site and establishes a three-dimensional product display system. Based on an analysis of the users of the business Web site, a representative user model was established. In accordance with the user model, the system functions were analyzed and integrated, and a hierarchical task analysis was performed. Based on the users' demands, the paper determines the content to be shown and the way of showing it. Finally, the paper designs the system in terms of information structure, interaction, and information visualization. Keywords: user experience, information architecture, visualization, mapping, fuzzy comprehensive evaluation method.
1 Introduction With the advance of the information-based society and the growing extent of the network, information visualization has increasingly become a research focus in the contemporary information management field. Cyberspace is becoming the main and foundational part of information systems. Information visualization creates a way to comprehend the spatial characteristics of network information from a new perspective. The core idea of visualization is that it introduces real spatial form to the Internet, breaking through the original space concept that contained only characters, graphics, data, and video [1]. It forms a new expression space by combining real space with the virtual, stacked space of the network. This expression reveals the multi-dimensional characteristics of information in cyberspace and makes network information visible. Information architecture emphasizes the organization and presentation of information, and thereby brings order to the objective knowledge space. In fact, people acquire, use, or
share information; this external behavior and internal cognitive processing of information largely decides whether information technology can play its role. Therefore, in the management of information we are no longer confined to technical aspects, but pay attention to people, to information, and to the interaction between people and information. Information architecture expands this relevance through the user experience and thereby achieves its goal [2]. It must therefore consider the user experience and the relationships among information resources, the information space, and users, in order to provide a reasonable, scientific resource space. We transmit non-spatial information to the user through an interactive three-dimensional system, which is in fact an effective information feedback loop. Non-spatial information is transformed into graphics through the information visualization system. Through the three-dimensional display system, we compress the signs of the information visualization system, which results in an optimized, more effective way of expressing information. In many cases, information is extracted through interaction, perception, and cognition. Therefore, the main function of this system is research and exploration, which is consistent with human cognitive behavior. Each person has his or her own style of exploration and cognition, and a three-dimensional display system provides a great deal of flexibility for this [3].
2 Information Architecture of the Display System In this article, information architecture refers to the organization and classification of the system contents and the visual implementation of the system interface. 2.1 Goal and Content of Information Architecture First we should understand and clarify the system's mission and objectives, and then meet the users' needs. We determine the system's content and functionality and illustrate how the organizational system, the navigation system, and the labeling system are determined in order to help users find the required information [4]. The goals of information architecture in information organization are embodied in two aspects: first, from the viewpoint of the results, it is necessary to make the information clear and understandable; second, from the viewpoint of the user, the information should be useful and available, and the user should have a good user experience. According to the uniqueness of information, and from the viewpoint of information use and user understanding, the content of information architecture can be attributed to the following basic process: conceptual design, organization of the information content, generation of the information structure, design of the information interfaces, information navigation, information display, and information dissemination [5]. 2.2 Building Interactive Information Based on the User Experience We consider the information architecture model from the perspectives of users and services. In a user-oriented context, because users often treat the system as a tool and use it to complete a particular information access and communication mission, information architecture is concerned with the main steps in completing their tasks as
Fig. 1. The information architecture model for building an interactive information platform based on the user experience
well as with how the user completes those tasks. In a service-oriented context, information architecture is concerned with the information provided by the system as well as its practical value to the users [6]. Figure 1 shows the information architecture model oriented toward the user experience. The model fully takes into account users' different preferences, working environments, and physical abilities. It considers the user's sensory system (visual, auditory, tactile) and then presents the information using appropriate techniques. For this system, at the strategic level we target general users, so the information architecture and design aim to help ordinary users by providing them with as much information as possible; by comparing recommendations, users can find products of interest more conveniently. At the scope level, the system gives users a positive experience when browsing products through user-friendly design [7]. The platform collects the information that a user is interested in and the user's browsing history. The system recommends products the user may be interested in at any time, and also recommends products that the user's friends are interested in. At the structural level, the system emphasizes professional design: columns are set out clearly and the layout of the sections is reasonably distributed. At the framework level, the interface is friendly and the recommendations are clear; popular products are recommended to users through in-depth analysis and integration of the information. At the surface level, the Web page is simple, and the functional set highlights the principle of user-centered design [8]. From the above analysis we can see that the information architecture model based on the user experience is designed by combining the user-oriented and service-oriented perspectives. We put emphasis on the visualization and understanding of information, and on combining content expression with user needs, which results in an improved user experience.
3 System Visual Simulation This paper mainly studies information classification and graphical representation. By analyzing people's cognitive features and selective attention, and by using an eye-movement apparatus, we studied the cognition of graphics and established a mapping between information types and graphical representations. We look for a visual form that matches the user's mental model, so that the visual model and the user model correspond. 3.1 Visualization Variables Study Traditionally, visualization has seven variables: position, shape, orientation, color, texture, gray level, and size. However, in order to express uncertainty and time-dimensional information, the visualization variables can be extended to ten kinds, for example by splitting color into hue, lightness, and saturation. Different combinations of these variables can also constitute new visual variables, and different visualization variables express different properties of spatial information (as shown in Table 1). The selection of visualization variables directly affects the quality of the information. Table 1. The perceptual nature of visualization variables
Variable      Connectivity   Selective   Sequence   Quantity
position      +              -           -          -
shape         +              -           -          -
orientation   +              0           -          -
color         +              ++          -          -
texture       0              +           +          -
gray level    -              +           ++         -
size          -              +           +          +
In the information visualization process, information is expressed and transmitted through a series of symbols. In order to better reveal its nature and laws, and to facilitate human understanding and use of the visualized information, the expression and transmission of information need to use intuitive symbols and visual forms. These symbols are not only easy for humans to identify, memorize, and analyze, but can also be identified, stored, and output by the computer. 3.2 System Simulation When the displayed data are large and multi-variable, users need to filter certain data in order to accomplish several goals, that is, to remove the data they are not interested in. Users only want to see data meeting specific conditions, in order to understand the relationships between the different properties of the data. A number of the data's properties can be
selected interactively. For example, when the mouse rolls over an item, a detailed description of that object is normally shown, and a click or double-click takes the user to a related page. The home page uses an open three-dimensional space, allowing users to see the products in all categories. When users enter the system, it recommends a category the user may be interested in based on the user's browsing history, such as the electronic products category. In the inner region of the open space, various types of popular products are recommended, based on the number of purchases or views. When the mouse scrolls over a popular product, the details of the product parameters appear on the right side. On the home page, users can also see their friends' positions.
Fig. 2. The home page
When a user clicks the camera button to enter the secondary interface, the information presented to the user includes: camera-related products, information surrounding cameras, the user's personal collection records, the collection records of friends, and camera recommendations, which in turn include recommendations of product types and of related products. In the three-dimensional model, we use a complementary form to express the relationship between the two types of products; color and size represent the range of, and attention given to, the recommended products.
Fig. 3. The second page
When a user clicks on a specific product, the third interface is entered. It is simple and clear, enabling users to focus on the product itself. When the mouse moves over the graphics, a description of the specific parameters appears.
Fig. 4. The third page
4 System Evaluation Fuzzy evaluation is a kind of safety evaluation method that uses the principle of "fuzzy comprehensive evaluation" from fuzzy mathematics; it belongs to the qualitative evaluation methods. The fuzzy evaluation method is based on the "fuzzy set" concept in fuzzy mathematics and is built on the knowledge and experience that people have accumulated in practice. Therefore, on the one hand, it is restricted and limited by people's awareness of the objective laws; on the other hand, it minimizes the impact of human one-sidedness. This approach allows the evaluation to be more accurate and reasonable. The system is evaluated using the fuzzy evaluation method. (1) Determine the factor set and the evaluation set The evaluation of the display system covers a total of five factors: information comprehensiveness, structure clarity, color coordination, quick access, and reasonable recommendation. We set a factor set R = {R1, R2, ..., R5}, where R1, R2, ..., R5 respectively represent information comprehensiveness, structure clarity, color coordination, quick access, and reasonable recommendation. We determine the evaluation set V = (0.2, 0.4, 0.6, 0.8, 1.0), which represents (v1 (bad), v2 (poor), v3 (general), v4 (better), v5 (good)).
(2) Determine the evaluation matrix We obtain the evaluation matrix from the statistics of the collected questionnaire results. Sij is the ratio of the number of users who assigned level j to factor i to the total number of users taking part in the evaluation.
S = (the 5 × 5 evaluation matrix obtained from the questionnaire statistics) (3) Calculate the weights of the factors Based on the users' questionnaires, we obtain the factor weight coefficient vector A = (0.25, 0.3, 0.1, 0.15, 0.2). (4) Calculate the fuzzy comprehensive evaluation vector of the factors Here we use the M(●, +) weighted-average model: B = A ● S = (0, 0.1675, 0.285, 0.395, 0.1525). (5) Determine the overall evaluation result: W = B ● V = 0.7065. The value 0.7065 is the overall evaluation of the display system, from which we can see that the users are basically satisfied with the system. The evaluation results can therefore be applied to further improve the system.
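A numerical sketch of steps (3)–(5) is given below, using NumPy; the matrix S shown here is only a placeholder, since the questionnaire matrix itself is not reproduced in the text, but the final step confirms the reported overall score of 0.7065 from the published vector B.

import numpy as np

# Evaluation set (bad ... good) and factor weight vector from the questionnaires.
V = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
A = np.array([0.25, 0.3, 0.1, 0.15, 0.2])

# Placeholder evaluation matrix S: row i = factor i, column j = share of users
# assigning level j (the real matrix comes from the questionnaire counts).
S = np.array([[0.0, 0.1, 0.3, 0.4, 0.2],
              [0.0, 0.2, 0.3, 0.4, 0.1],
              [0.0, 0.2, 0.2, 0.4, 0.2],
              [0.0, 0.2, 0.3, 0.4, 0.1],
              [0.0, 0.1, 0.3, 0.4, 0.2]])

B = A @ S        # weighted-average M(., +) model: fuzzy evaluation vector
W = B @ V        # overall evaluation score
print(B, W)

# With the vector B reported in the paper, W reproduces the stated 0.7065:
B_paper = np.array([0.0, 0.1675, 0.285, 0.395, 0.1525])
print(B_paper @ V)   # -> 0.7065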
5 Conclusion and Future Work Based on the user analysis of the product display system, we set up a representative user model; against this user model, the system functions were analyzed and integrated, and a hierarchical task analysis was carried out. In accordance with the users' needs, we customized the display contents and the graphical display mode. Finally, we designed the system in terms of structure, interaction, and information visualization. There are still many deficiencies in this work, such as the lack of a navigation system, insufficient consideration of the needs of special populations, and relatively simple evaluation factors. These problems will be the focus of future research.
References 1. Carpendale, S., Light, J., Pattison, E.: Achieving higher magnification in context. In: Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology, Santa Fe (2004) 2. Chen, C.: Searching for intellectual turning points: Progressive Knowledge Domain Visualization. In: Proc. Natl.Acad. Sci., 101th edn., USA (2004) 3. Freund, Y., Mason, L.: The alternating decision tree learning algorithm. In: Proceeding of the Sixteenth International Conference on Machine Learning, Bled, Slovenia (2006)
4. Klinkenberg, R., Renz, I.: Adaptive information filtering:learning in the presence of concept drifts. In: Learning for Text Categorization, Menlo Park, CA, 1998, pp. 33–40. AAAI Press, Menlo Park (2003) 5. Morinaga, S., Yamanishi, K.: Tracking dynamics of topic trends using a finite mixture model. In: KDD 2004, Seattle, Washington, pp. 811–816. ACM, New York (2007) 6. Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks, arXiv: cond- mat/ 0309488 v1 (2003) 7. Tabah, A.N.: Literature dynamics: studies on growth, diffusion, and epidemics. Annual Review of Information Science and Technology 34, 249–286 (1999) 8. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge (2004)
Efficient Annotation Visualization Using Distinctive Features Seok Kyoo Kim, Sung Hyun Moon, Jun Park, and Sang Yong Han School of Computer Science and Engineering, Seoul National University, 599, Gwanak-ro, Gwanak-gu, Seoul, 151-744, Korea {anemone,shmoon}@pplab.snu.ac.kr, [email protected], [email protected]
Abstract. Annotation is often used to supplement real objects in Augmented Reality. Previous research on annotation has focused on optimal label placement, where annotations are placed close to the objects while avoiding overlap. However, even optimally placed annotations are hard to recognize when the number of annotations is large. Human eyes can easily perceive an object by seeing its distinctive features. In this paper, we propose visualization methods for easily perceivable annotations using distinctive features. The proposed methods are based on studies of the effects of color, depth, style, and transparency of the annotations. Keywords: Augmented Reality, Mixed Reality, Annotation, Usability, Human Factors, Visualization, View Management.
1 Introduction Augmented Reality (AR) associates the virtual world with the real world by integrating information into its real-world counterparts. People can see objects in the physical world and virtual objects at the same time through an HMD or a monitor screen. Annotation is one of the important AR topics [1], as it enables users to figure out or understand objects in the physical world by reading the attached information tags simultaneously. Problems arise in avoiding annotation overlap and providing appropriate label placement when many annotated objects occupy a limited space in AR. To solve these problems, various label placement methods and algorithms have been studied. However, only a few studies have addressed annotation visualization methods. If only a small number of objects are labeled, an ideal spatial layout may not be the main issue for avoiding label overlap. Otherwise, we have to think about how to reduce the time AR users need to find and read the annotations. One approach is to give visual distinction, which helps draw more attention to an intended object. Moreover, well-organized annotations with visual distinction provide additional functionality that supports better memorization, thinking, and clarification [5]. In this paper, several annotation visualization methods are suggested, and evaluations were carried out together with existing general annotation visualization
methods to determine which is more efficient. Usability tests were conducted by applying each proposed method to a couple of cases. Our suggested visualization methods display annotations distinctively based on the degree of importance of, or the correlation among, the annotated objects. As an application of AR annotation, we tested map visualization; the world map contained a large number of annotations in a limited space. The results showed that our suggested visualization methods performed better than existing general label placement methods in terms of attention and search time. It can be predicted from the test results that usability in AR will be greatly improved by adopting our annotation methods. We expect that more research on efficient annotation visualization methods will be conducted and will contribute to the progress of AR view management [3] technology.
2 Background Research Unlike a static graphic environment, which usually does not change its background color in real time, in an AR or Virtual Reality environment various conditions of the background and objects have to be considered, and text color and style greatly affect the readability of annotations. There is research on label placement and view management related directly and indirectly to AR annotations. Azuma et al. mentioned that appropriate label placement for objects in the real world is one of the important AR issues [2]. The spatial layout of annotations on an AR screen is a problem of view management, which is an important AR research area. In their research, Azuma et al. tested various label placement algorithms, including 2D label placement and adaptation to 3D objects using view management techniques. There are previous studies on how to place labels on a static map. The static map labeling problem is NP-hard, and a fast approximation might have O(n log n) complexity [7]. Azuma et al. suggested a cluster-based method and conducted user tests comparing their method with three existing methods: greedy depth-first placement, discrete gradient descent, and the simulated annealing algorithm [12]. The test results revealed that their method showed the best performance compared with the existing ones. Their research included user tests measuring the reaction time, i.e., how fast users could read AR data tags. Leykin and Tuceryan described a pattern recognition approach to determine the readability of text labels on textured backgrounds by computing contrast features as the difference between the mean intensities of labels and their surrounding vicinity [8], and Gabbard et al. presented an experiment that examined text legibility with various colors and styles of text on various backgrounds in outdoor Augmented Reality [9]. They showed that readability is greatly affected by the texture of the background, the text style, and so on, but their research did not suggest how to emphasize certain labels over others or how to effectively differentiate more important labels. Placing city name tags on a static map is one of the commonly used applications in label placement research. Joachim et al. studied general information and operations using an augmented static map [4], which can be considered an example of AR visualization on paper maps.
3 Efficient Annotation Visualization Annotation is a common and consistent method [16] for adding extra information to contents and knowledge; including additional explanations in annotations improves the quality of the contents. Furthermore, Bottoni et al. stated that "annotation supports different cognitive functions such as remembering – by highlighting the most significant parts of the annotated document, thinking – by adding one's own ideas, critical remarks, and questions, and clarifying – by reshaping the information contained in the document into one's own verbal representation" [5]. As mentioned above, there has been much research on annotation placement and the related algorithms, but it has focused on simple annotation placement without considering visualization styles or methods. We suggest efficient annotation methods that support additional cognitive functions when labeling objects according to their degree of importance or their correlation. First, if there is neither correlation nor a difference in importance among the objects, the existing simple label placement methods are satisfactory. Second, when priority exists (or objects need to be recognized based on their degree of importance), users are able to read or find labels more quickly if the labels are distinguishable according to their priority or degree of importance. Third, when the objects are correlated, users can access the labels easily if associated objects are presented in a group. A mechanical repair manual is a good example of grouping: users may understand it more efficiently if related parts or pieces of equipment are displayed in groups. Lastly, there are cases where degree of importance and correlation are combined; even then, combined grouping and distinction methods are expected to improve users' access to annotations. In order to adopt distinction and grouping methods, we need to classify them from a graphical point of view. It is known that humans perceive things better when one object is distinctive from the next or surrounding ones; this is called the "pop-out effect" [11]. If this pop-out effect is applied to labels of higher priority, users can pick them out more easily. For the distinction method, transparency, depth, color, size, and font style were selected to give graphical difference, and for the grouping method, color and links were selected to present correlation. 3.1 Transparency Fig. 1 shows important objects displayed vividly while the rest are dim. Objects become easier to identify because labels with lower transparency are more easily perceivable, which enables users to find objects of high importance quickly. Conversely, if the objects are of low importance, it took users more time to find them, and some users even failed within the given time. With transparency, it is difficult to choose a level acceptable to most users, so we used only two levels of transparency (low and high) according to object importance (a small rendering sketch is given after Fig. 1).
Fig. 1. Annotations with different levels of transparency
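A small rendering sketch of this two-level transparency scheme, using matplotlib, is given below; the labels, coordinates, and alpha values are illustrative only and are not taken from the authors' system.

import matplotlib.pyplot as plt

# (label, x, y, important?) -- hypothetical annotated objects.
annotations = [("Pump A",   0.2, 0.7, True),
               ("Valve 3",  0.5, 0.4, False),
               ("Sensor 7", 0.8, 0.6, True),
               ("Bracket",  0.4, 0.2, False)]

fig, ax = plt.subplots()
for label, x, y, important in annotations:
    alpha = 1.0 if important else 0.3     # important labels opaque, others dimmed
    ax.text(x, y, label, alpha=alpha, fontsize=12,
            bbox=dict(facecolor="white", edgecolor="gray", alpha=alpha))
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
plt.show()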
3.2 Size Generally, using different letter sizes is the most common method for map annotation. Continents are annotated with large letters, smaller letters are used for country names, and the letters become smaller still for capital cities and other cities, as shown in Fig. 2.
Fig. 2. Annotations with different sizes
3.3 Depth Labels with 3D depth can convey degree of importance: labels placed in front attract more of the users' attention. Another advantage is that labels can be displayed in several layers based on importance. This is similar to using different letter sizes, but no overlapping occurs because the labels are displayed in 3D. 3.4 Font Style It is helpful to use a noticeable font style for important labels. Bold and italic font styles usually attract more attention, but they are limited in expressing several grades of importance.
3.5 Color Fig. 3 shows differentiation by applying colors to labels. Colors indicate brightness or clarity rather than conveying degree of importance, which means color might not be an appropriate method for displaying degrees of importance. On the other hand, color differentiation can be used as a grouping method: when labels with similar characteristics or activities are displayed in the same color, users notice them more easily.
Fig. 3. Annotations with different colors
3.6 Link Links can be used effectively to connect related labels when objects are dependent on each other. When several groups of associated labels are located together in a limited space, applying different colors for grouping makes it difficult for users to perceive them efficiently, but connecting related labels with lines can be effective. Fig. 4 shows an example of links.
Fig. 4. Annotations connected with links
3.7 Hybrid Method We may combine two or more of the methods mentioned above, such as the size-color combination, the transparency-color combination, etc. Fig. 5 shows an example of the size-color combination.
300
S.K. Kim et al.
Fig. 5. Annotations with different sizes and colors
Several visualization methods for distinction and grouping have been suggested above. There may be more methods available in addition to those we have described.
4 Experiment 4.1 Usability Testing Usability has been regarded as an important perceived factor of system quality since interactive systems were introduced [10]. Usability can be described objectively by measuring accuracy and performance time while users carry out tasks in a given system. The concept of usability evaluation includes tests of the interactivity between the system and the users while they perform tasks, and the objective and subjective data from the evaluation can usefully explain usability issues [6]. Among the available usability test methods, laboratory usability testing is mainly selected for testing new or improved interfaces. Current studies reveal that usability testing is a widely adopted evaluation method [14], together with heuristic evaluation [13]. In this paper, we consulted Rubin's guide to usability testing [15]. Rubin defined usability testing as a technique for collecting empirical data by observing representative users executing representative tasks. The usability test procedure is divided into six stages: 1) planning, 2) selecting participants, 3) preparing test materials, 4) testing, 5) obtaining test results, and 6) data analysis and suggestions. We performed usability testing with our suggested annotation methods and observed how these methods add better cognitive functions. We evaluated each annotation method with preliminary experiments and, based on the results, the two best methods were chosen for the augmented map test. Ten people with an average age of 25 participated in each test. 4.2 Preliminary Experiment In this test, ten independent participants were presented on a computer screen with a total of thirty different words, six of which were differentiated using the suggested efficient
methods. All participants viewed all the words and were asked to name, one by one, the seven words that attracted their attention most. The number of differentiated words among those seven was counted; we hypothesized that those seven words were perceived as more important than the others. The words chosen for the test were the most frequently occurring surnames from the United States Census Bureau, which are easy and well-known enough that their meanings introduced no bias, so that only the efficient annotation method indicated degree of importance or priority. In the size test, the six selected words were enlarged by 15%, and in the color test, cyan was chosen because it provides high legibility according to [9].
Fig. 6. Preliminary tests with color and font method

Table 1. Number of matched words
Method        Min   Max   Median   Mean
Size           3     6     4.5      4.4
Font Style     0     3     1        0.9
Color          4     6     6        5.5
Size-Color     5     6     6        5.7
Table 1 summarizes the results. They show that words differentiated using the color annotation method attracted more of the participants' attention than the others, whereas different font styles did not work as well as the color method. This test indicates that priority can be expressed implicitly for certain annotations through an efficient annotation method. 4.3 Experiment: Augmented Map We conducted user tests using the color and size-color annotation methods, which gave the best results in the preliminary experiment, and measured the reaction time to see in which environment users were able to find labels quickly. A static map was chosen as the test application because it contains many labels spread over the screen. Relatively well-known cities and less-recognized cities were selected as search targets, and the reaction times were recorded.
Fig. 7. Augmented Map annotated with size and color

Table 2. Reaction time in seconds
Method        Min   Max   Median   Mean
Simple         1     53    20.5     22.7
Color          1     50    10.5     14.3
Size-Color     1     45     6.0      9.4
Table 2 shows the test results. Compared with the simple annotation method, the size-color combination annotation enabled users to find cities much more quickly.
5 Conclusions and Future Work This paper presented various efficient annotation visualization methods. Each visualization method realizes distinction and grouping by expressing the degree of importance of, and the correlation among, objects. The evaluation results revealed that, compared with existing general placement methods, the suggested annotation methods shortened the search time, and the given AR tags were found more easily and quickly. Although a small number of users took part in the tests, the results are meaningful because they show that usability in AR can be greatly improved by adopting efficient annotation visualization methods. In conclusion, thanks to the tremendous development of computer processing technology, real-time annotation display is no longer surprising, and even overlapping or time-delay issues have been overcome. Rather, in this rapid processing environment, users are more interested in how quickly they can perceive the information they are looking for. More research and development on AR environments should be directed toward satisfying these needs and interests. We plan to continue further studies on AR annotation visualization to fulfill this demand, and also to investigate different evaluation methods, such as the heuristic method, in AR environments. We believe that further suggestions for improving current usability issues will come from this future work.
References 1. Azuma, R.T.: A Survey of Augmented Reality: Presence: Teleoperators and Virtual Environments 6(4), 355–385 (1997) 2. Azuma, R., Furmanski, C.: Evaluating label placement for augmented reality view management. In: Proceedings of the 2nd IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 66–75 (2003) 3. Bell, B., Feiner, S., Hollerer, T.: View Management for Virtual and Augmented Reality. In: Proc. Symp. on User Interface Software and Technology, pp. 11–14 (2001) 4. Bobrich, J., Otto, S.: Augmented maps: Geospatial Theory. Processing and Applications 34(4) (2002) 5. Bottoni, P., Civica, R., Levialdi, S., Orso, L., Panizzi, E., Trinchese, R.: MADCOW: a multimedia digital annotation system. In: Proceedings of the Working Conference on Advanced Visual interfaces, AVI 2004, Gallipoli, Italy, May 25 - 28, 2004, pp. 55–62. ACM Press, New York (2004) 6. Butler, K.A.: Usability Engineering Turns 10. Interactions 3(1), 59–75 (1996) 7. Christensen, J., Marks, J., Shieber, S.: Labeling point features on maps and diagrams. Technical Report TR-25-92. Center for Research in Computing Technology, Harvard University (1992) 8. Leykin, A., Tuceryan, M.: Automatic determination of text readability over textured backgrounds for augmented reality systems. In: Proceedings of the Third IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 224–230 (2004) 9. Gabbard, J.L., Edward Wan II, J.: Usability Engineering for Augmented Reality: Employing User-Based Studies to Inform Design. IEEE Transactions on Visualization and Computer Graphic 14(3) (2008) 10. Dzida, W., Herda, S., Itzefeldt, D.: User-perceived quality of interactive systems. IEEE Trans. Software Eng. 4(4), 270–276 (1978) 11. Goldstein, E.B.: Cognitive Psychology: Connecting Mind, Research, and Everyday Experience. Wadsworth, Belmont (2006) 12. Kirkpatrick, S., Gelatt Jr., C.D., Vecchi, M.P.: Optimization by Simulated Annealing. Science 220, 671–680 (1983) 13. Nielson, J.: Heuristics Evaluation. In: Nielson, J., Mack, R.L. (eds.) Usability Inspection Methods. John Wiley and Sons, New York (1994) 14. Rosenbaum, S., Rohn, J.A., Humburg, J.: A Toolkit for Strategic Usability: Results from Workshops, Panels, and Surveys. In: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 337–344 (2000) 15. Rubin, J.: Handbook of Usability Testing. John Wiley & Sons, New York (1994) 16. Verhaart, M.: An annotation framework for a virtual learning portfolio. In: Proceedings of the Sixth International Conference on Advanced Learning Technologies (ICALT 2006), pp. 156–160 (2006)
Content Based Image Retrieval Using Adaptive Inverse Pyramid Representation
Mariofanna Milanova1, Roumen Kountchev2, Stuart Rubin3, Vladimir Todorov4, and Roumiana Kountcheva4
1 Computer Science Department, UALR, USA [email protected]
2 Department of Radio Communications, Technical University of Sofia, Bulgaria [email protected]
3 SSC San Diego, California, USA [email protected]
4 T&K Engineering, Bulgaria [email protected], [email protected]
Abstract. This paper presents a new approach for content-based image retrieval using a cognitive representation with pyramidal decomposition. This approach corresponds to the hypothesis that human object recognition is based on consecutive approximations with increased resolution for the selected regions of interest. The method is based on object model creation with an Inverse Difference Pyramid controlled by a neural network. The method's basic advantages are its high flexibility and its ability to create general models for various views and scalings with relatively low computational complexity. The method is suitable for a great number of applications – medicine, digital libraries, electronic galleries, geographic information systems, document archiving, digital communication systems, etc. Keywords: content-based image retrieval, multi-layer representation, IDP decomposition.
1 Introduction
Research in the area of Content Based Image Retrieval (CBIR) has come a long way since the term was first introduced by T. Kato in 1992. CBIR has been a focus of intensive research, with more than 300 scientific publications per year [1]. Most of the widely known methods for image and video retrieval are based on the use of quantitative (low-level) features and qualitative (high-level) features. Feature design problems include finding how many meaningful visual features exist and on which spatiotemporal regions of media objects the selected features should be applied. The classical answer to these problems is Multi-Resolution Analysis (MRA). The basic MRA hypothesis is that using interactively computed 2D wavelet coefficient matrices as features is sufficient for content retrieval. Generally, the visual retrieval process aims at finding media objects that are similar to given examples. "Similarity" is a weakly defined term, and therefore difficult to implement in computer systems. Two requirements (the similarity matching and the
user feedback) have to be satisfied by visual information retrieval (VIR) systems. The similarity matching has to be performed on media objects represented by feature vectors, and the user feedback has to be integrated in the retrieval process. The retrieval is therefore necessarily an interactive communication process between user and computer. One major advance of VIR in recent years was achieved by using relevance feedback. Unfortunately, even the most sophisticated algorithms are still not able to satisfy the users' need for similarity-based retrieval sufficiently. The question "How is domain knowledge represented?" is still open. Most VIR systems are derived from text retrieval concepts, but it is not necessary to use the same or similar mining techniques in VIR systems. The human visual system has the ability to correctly interpret most images even at low resolution. Search and visual information processing, as seen by psychologists, are described by the following three basic hypotheses:
• The first is that image resolution decreases exponentially from the fovea to the periphery of the retina. Unlike digital cameras with their uniform sampling acquisition systems, humans do not see the world uniformly, because the retinal receptors are not equally distributed on its surface, but are concentrated in the fovea [2]. This hypothesis can be represented computationally with different resolutions. The visual attention points may be considered the most highlighted areas of the Visual Attention model, i.e., these points are the most salient regions in the image. Moving further from these points of attention, the resolution of the other areas dramatically decreases. There are existing models in which perception of the visual environment is based on the fact that the observer first fixates the areas of higher attention level and only then looks at the other areas. Different authors work with various filters and kernel sizes [3].
• Another interesting question is the role of visual contextual information in attention model creation and VIR. Most computational attention models ignore the contextual information provided by the correlation between objects and the scene. Schyns and Oliva [4] showed that a coarse representation of the scene initiates semantic recognition before the identification of objects is performed. Many studies support the idea that scene semantics can be available early in the chain of information processing and suggest that scene recognition may not require object recognition as a first step [5], because humans can recognize a scene even from a low-spatial-frequency image.
• Covert attention allows us to select visual information at a cued location without eye movements. It has been proved that covert attention not only improves discriminability, but also accelerates the rate of information processing [6]. Attention affects both spatial and temporal aspects of visual processing. By enhancing the signal, attention improves discriminability and enables us to extract relevant information in a noisy environment by accelerating information processing.
In this paper, a new approach for content-based image retrieval based on these main hypotheses is presented. The proposed solution is based on image representation with an adaptive inverse difference pyramid (IDP) decomposition controlled by a neural network.
Such an image representation corresponds to the human way of object perception and is suitable for the creation of flexible object models, which can be used for query procedures in image databases in accordance with predefined decision rules. A significant element of the new representation is the use of a feedback, which provides
iterative change of the cognitive models' parameters in accordance with the data mining results obtained.
2 Basic Principles of the IDP Decomposition
The algorithm for recursive IDP coding of halftone digital images comprises the following steps:
Step 1. The matrix [X] of the original image is divided into sub-images of size 2^n×2^n, and each is processed with a two-dimensional (2D) orthogonal transform (OT) using only a limited number of spectrum coefficients (usually the low-frequency ones). The values of these transform coefficients constitute the first pyramid layer.
Step 2. Using the values of the transform coefficients, every sub-image is restored by the inverse orthogonal transform and then subtracted pixel by pixel from the original one. The difference sub-image with elements e_p(i,k) in the IDP layer p is defined as:

e_p(i,k) = x(i,k) − x̃_0(i,k) for p = 0;   e_p(i,k) = e_{p−1}(i,k) − ẽ_{p−1}(i,k) for p = 1, 2, …, P,   (1)

where x(i,k) is the pixel (i,k) in a sub-image of size 2^n×2^n of the input image [X] (Fig. 1a), and x̃_0(i,k) and ẽ_{p−1}(i,k) are, correspondingly, the pixels of the recovered input and difference sub-images in the IDP layer p.
Step 3. The difference sub-image is divided into 4 sub-images of size 2^{n−1}×2^{n−1}. Each sub-image is processed with the 2D OT again, and the values of the retained transform coefficients build the second pyramid layer. The image is then restored and the second difference image is calculated. The process continues in a similar way with the next pyramid layers. The block diagram for a pyramid of 3 layers is shown in Fig. 1. Applications usually do not require all the pyramid layers to be calculated, because the needed image quality is usually obtained in the lower layers. Such a pyramid is called "truncated". The approximation models of the input or difference image in layer p are represented by the relations:

ỹ_p(u,v) = T[x(i,k) / e_{p−1}(i,k)]  and  x̃(i,k) / ẽ_{p−1}(i,k) = IT[ỹ_p(u,v)],   (2)

where T[•] is the operator of the truncated direct two-dimensional orthogonal transform applied on the input block of size 2^n×2^n, or on the difference sub-image of size 2^{n−p}×2^{n−p} from pyramid layers p = 1, 2, …, P (Fig. 2b), and IT[•] is the operator of the inverse OT of the spectrum coefficients ỹ_p(u,v) from layer p of the truncated transform 2^{n−p}×2^{n−p}, obtained as a result of the transformation of each quarter of the difference sub-image e_{p−1}(i,k). Specific to the IDP is that the OT coefficients used in every pyramid layer can be different. The coefficients from all pyramid layers are sorted in accordance with their frequency and scanned sequentially.
[Fig. 1. The IDP layers p = 0, 1 for an image of H×V pixels: (a) the original image of size H×V, divided into K sub-images of size 2^n×2^n (layer p = 0); (b) each sub-image of the difference image [Y0] for layer p = 0 is divided into 4 sub-images of size 2^{n−1}×2^{n−1} in the pyramid layer p = 1.]
The obtained one-dimensional array of coefficients for
the s-th frequency band of the two-dimensional OT of the input or of the difference image for the IDP layer p is represented by the relation:

ỹ_p(s) = ỹ_p[u = φ(s), v = ψ(s)],   (3)

where u = φ(s) and v = ψ(s) are functions which define the transformation of the two-dimensional array of coefficients in the s-th frequency band for layer p. The block diagram of the IDP decomposition is shown in Fig. 2. The image decoding is performed in reverse order. The processing of color images depends on the color component representation – an individual pyramid is built for each component [7]. The object representation based on the IDP decomposition offers a solution for problems concerning image rotation and translation: for RST-invariant transforms (Fourier-Mellin) [8] the values of the decomposition coefficients are invariant as well. The object representation is done using a single original image, or more than one image (different views, lighting, color, scaling, etc.). In the second case, the initial object representation (in the lower decomposition layers) is fuzzy, and the more exact representation is defined in the higher decomposition layers. The coefficients of every sub-image in the consecutive pyramid layers build the vectors of the object's features, which are then used for the model evaluation. For this, the selected coefficients are processed with the inverse transform, and the quality of the restored image (i.e., the model error) is estimated.
[Fig. 2. Block diagram of the 3-layer IDP decomposition.]
If this error is too big, the neural network,
which controls the feature selection for the next decomposition layer, performs the corresponding model tuning. The final description of the object model is obtained at the last decomposition layer. The abbreviations used in the block diagram are: [X] – the matrix of the processed image (sub-image); DT/IT – direct/inverse orthogonal transform; [E_p] – the difference (error) matrix in layer p; ỹ_p(u,v) – the retained set of coefficients in layer p.
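To make the layered coding concrete, the following is a minimal sketch (not the authors' implementation) of a two-layer IDP-style coder. It assumes a Walsh-Hadamard transform, an 8×8 initial block size, and an arbitrary choice of retained low-order coefficients; in the real system the coefficient selection is governed by the neural network described above.

```python
import numpy as np

def hadamard(n):
    """Hadamard matrix of order n (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def truncated_wht(block, keep):
    """Forward 2D Walsh-Hadamard transform, keeping only a keep x keep corner."""
    n = block.shape[0]
    H = hadamard(n)
    spec = H @ block @ H.T / n
    mask = np.zeros_like(spec)
    mask[:keep, :keep] = 1.0
    return spec * mask

def inverse_wht(spec):
    n = spec.shape[0]
    H = hadamard(n)
    return H.T @ spec @ H / n

def idp_two_layers(image, n=8, keep0=2, keep1=2):
    """Return layer-0 spectra, layer-1 spectra and the final residual, cf. Eq. (1)."""
    spectra0, spectra1 = [], []
    residual = np.zeros(image.shape, dtype=float)
    for i in range(0, image.shape[0], n):
        for j in range(0, image.shape[1], n):
            x = image[i:i + n, j:j + n].astype(float)
            y0 = truncated_wht(x, keep0)              # layer p = 0 coefficients
            e0 = x - inverse_wht(y0)                  # difference sub-image e_0
            spectra0.append(y0)
            for di in (0, n // 2):                    # four quarters -> layer p = 1
                for dj in (0, n // 2):
                    q = e0[di:di + n // 2, dj:dj + n // 2]
                    y1 = truncated_wht(q, keep1)
                    spectra1.append(y1)
                    residual[i + di:i + di + n // 2,
                             j + dj:j + dj + n // 2] = q - inverse_wht(y1)
    return spectra0, spectra1, residual

img = np.random.randint(0, 256, (32, 32))
s0, s1, res = idp_two_layers(img)
print(len(s0), len(s1), float(np.abs(res).mean()))
```

With only a few coefficients kept per sub-image and per layer, each object model stays very small, which is consistent with the low bit-rates reported later in Section 4.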
3 Multi-layer Image Retrieval
The image retrieval is performed by comparing the object model of 2 or more layers with the content of the images in the database. The multi-layer search is based on the evaluation of the multi-layer distance, which is defined as a sequence of differences between the approximations obtained from the layered IDP decomposition of any couple of compared objects: objects with maximally similar content have the smallest multi-layer distance. The initial presumption is that the queried object image is smaller than the database image. In general, the number of search layers corresponds to the number of decomposition layers used for the object model creation. The queried object model is used for the creation of the corresponding pyramid decomposition for a sub-image (window) of the same size in every image from the database. The initial position of the search window is in one of the corners of the database image (for example, the lower left corner). For this position, the distance is evaluated between the object model vector for layer 0 and the corresponding vector obtained
for the search window content. After translation by one step in the selected direction (horizontal or vertical), the distance between the compared vectors is evaluated again, and so on. When the scanning of a database image is finished for decomposition layer 0, the search continues in a similar way with the next database image for the same decomposition layer, until all images are processed. When the analysis for layer 0 is completed, the database images containing an object close enough to the queried object model for this layer are separated into a special group. In case there are no images which answer the requirements, this group is empty. The described operations are performed in a similar way for decomposition layer 1, for the separated images only. In the consecutive layers, the number of images which answer the requirement to be close enough to the queried one becomes smaller. For the defined empty groups, an additional search should be performed, for which the next model (different view angle, lighting, etc., if there is such) is introduced through the feedback, and the described operations are performed again. The search for the closest object in the image database {X_t} for t = 1, 2, …, N is represented by the relations below. For the IDP layer p = 0, the distance between the object model request [X] and the object representation [X_t] from the database is:

D_0{[X̃_0], [X̃_0^t]} = Σ_{k_0=1}^{K} Σ_{s=1}^{S_0} [ỹ_{0,k_0}(s) − ỹ_{0,k_0}^t(s)],   (4)

where S_0 is the number of the retained spectrum coefficients in layer p = 0. For IDP layers p = 1, 2, …, P, the distance between the object model for the corresponding layer and the sub-image from an image in the database is:

D_p{[Ẽ_{p−1}], [Ẽ_{p−1}^t]} = Σ_{k_p=1}^{4^p K} Σ_{s=1}^{S_p} [ỹ_{p,k_p}(s) − ỹ_{p,k_p}^t(s)],   (5)

where S_p is the number of the retained spectrum coefficients in layer p. The distance between the object models [X] and [X_t] is calculated for p = 0, 1, …, P:

D{[X], [X_t]} = D_0{[X̃_0], [X̃_0^t]} + Σ_{p=1}^{P} D_p{[Ẽ_{p−1}], [Ẽ_{p−1}^t]}.   (6)

The multi-layer search in the image database comprises the following operations:
- All distances for layer p = 0 of the IDP decompositions between the object request and the images in the database are calculated, and the smallest one is found. The database image which has a part containing an object with the smallest distance is named t_r and is represented by the relation:

t = t_r if D_0{[X̃_0], [X̃_0^{t_r}]} = min < d_min(0) for p = 0,   (7)

where d_min(0) is a threshold for the IDP layer p = 0. In case there is only one such image, the search is successfully finished.
[Fig. 3. Block diagram of the method for cognitive multi-layer image retrieval.]
If there is more than one image with the smallest distance for p = 0, they are separated into a group, and the search continues in a similar way for the next IDP layers, for the so-defined group only:

t = t_r if D_p{[Ẽ_{p−1}], [Ẽ_{p−1}^{t_r}]} = min < d_min(p) for p = 1, 2, …, P,   (8)

where d_min(p) are the thresholds (their values define the required accuracy of the performed search process for the corresponding IDP layers), and t_r is the database image whose distance is the smallest for the decomposition layer p. Maximum similarity is obtained when the function representing the multi-layer distance (6) has a minimum. The block diagram of the image retrieval method based on the IDP decomposition is shown in Fig. 3. In the block diagram, two possible outputs are shown: for a detected
closest image from the database (Out 1), and for a missing similar image (Out 2). A significant element is the use of a feedback, through the block named NN tuning, which provides iterative change of the cognitive models' parameters in accordance with the data mining results obtained. This block modifies the object model in two main cases: 1. the needed similarity is not achieved; 2. in accordance with predefined decision and association rules.
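The coarse-to-fine filtering of Eqs. (4)–(8) can be summarized in a few lines of code. The sketch below is only an illustration under simplifying assumptions: each database image is reduced to one best-matching window per layer, the per-layer feature vectors are plain arrays, and the thresholds d_min(p) are invented for the example (the paper does not give concrete values).

```python
import numpy as np

def layer_distance(query_vec, window_vec):
    # sum of coefficient differences within one layer, in the spirit of Eqs. (4)-(5)
    return np.abs(query_vec - window_vec).sum()

def multilayer_search(query_layers, database, d_min):
    """query_layers: per-layer feature vectors of the queried object.
    database: name -> list of per-layer feature vectors (best window per image).
    d_min: per-layer thresholds. Returns the image names that survive all layers."""
    candidates = list(database)
    for p, threshold in enumerate(d_min):
        scored = [(name, layer_distance(query_layers[p], database[name][p]))
                  for name in candidates]
        candidates = [name for name, d in scored if d < threshold]
        if len(candidates) <= 1:          # a unique match (or no match) ends the search early
            break
    return candidates

rng = np.random.default_rng(0)
query = [rng.normal(size=4), rng.normal(size=12)]
db = {f"img{t}": [query[0] + rng.normal(scale=s, size=4),
                  query[1] + rng.normal(scale=s, size=12)]
      for t, s in enumerate([0.1, 0.5, 2.0])}
print(multilayer_search(query, db, d_min=[2.0, 6.0]))
```

The early exit after layer 0 is what allows a large part of the database to be excluded before the finer layers are ever compared.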
4 Experimental Results
The experiments were performed with more than 50 test objects. In the experiments, the efficiency of the models and the ability to recognize them were evaluated. For the object model creation, the software implementation of the 7-layer IDP decomposition was used.
[Fig. 4. Object representation from multi-view images of the same plane (Plane 1 – Plane 4): originals and the models for Layer 1 and Layers 1, 2.]
[Fig. 5. Test image "Chris": (a) original, (b) Layer 1, (c) Layer 2.]
[Fig. 6. Test image "Tank": (a) original, (b) Layer 1, (c) Layer 2.]
Table 1.
| Image | Layer 1: size [B] | bit-rate [bpp] | CR | PSNR [dB] | Layer 2: size [B] | bit-rate [bpp] | CR | PSNR [dB] |
| Plane1 | 67 | 0.039 | 203 | 22.10 | 216 | 0.127 | 63.0 | 24.21 |
| Plane1, Layer 1+2 | – | – | – | – | 229 | 0.134 | 59.4 | 24.21 |
| Plane2 | 84 | 0.049 | 162 | 20.20 | 261 | 0.150 | 52.0 | 22.37 |
| Plane2, Layer 1+2 | – | – | – | – | 270 | 0.159 | 50.4 | 22.37 |
| Plane3 | 89 | 0.053 | 150 | 21.29 | 264 | 0.155 | 51.52 | 23.25 |
| Plane3, Layer 1+2 | – | – | – | – | 262 | 0.153 | 51.91 | 23.25 |
| Plane4 | 81 | 0.047 | 168 | 20.94 | 264 | 0.155 | 51.52 | 22.85 |
| Plane4, Layer 1+2 | – | – | – | – | 256 | 0.150 | 53.13 | 22.85 |
| Chris | 122 | 0.086 | 92 | 19.80 | 442 | 0.310 | 25.48 | 22.24 |
| Chris, Layer 1+2 | – | – | – | – | 393 | 0.280 | 28.66 | 22.24 |
| Tank | 700 | 0.054 | 147 | 22.11 | 2000 | 0.149 | 53.43 | 24.60 |
| Tank, Layer 1+2 | – | – | – | – | 1880 | 0.140 | 56.78 | 24.60 |
[Fig. 7. Graphic representation of the object models for "Plane 1" and "Chris" – Layer 1: (a) "Plane 1" model; (b) "Chris" model.]
The basic models were created with a truncated pyramid of
2 layers (the initial layer with sub-blocks of size 8×8 pixels). The retained coefficients were 4 for the lower layer and 3 for the next one. The 2D transform was the Walsh-Hadamard transform. All "Plane" test images are grayscale, of size 170×80 pixels. The image "Chris" is of size 88×128 pixels, and the image "Tank" – 272×400. Some of the obtained results are shown in Table 1. The column "Layer 1" gives the size of the data for the lower layer of the object model, and "Layer 2" – the size of the compressed data for the next layer. For each layer, the bit-rate and the compression ratio (CR) are given. The rows "Layer 1+2" give the size of the compressed 2-layer object model data. The experiments prove the efficiency of the presented method for object model creation (the bit-rate of the object models for layers 1 and 2 is very low). The PSNR, i.e., the similarity between the object and the model for these two layers, is low, but it is enough to recognize a plane, a face or a tank. For more difficult tasks (which model the plane is, etc.), a more complicated representation is needed, using Layer 3 or even higher layers. Different views are necessary as well. Fig. 4 shows the original test images Plane 1 – Plane 4 (multi-view) with their models for Layers 1 and 2. Figs. 5 and 6 show the test images "Chris" and "Tank" with their corresponding models for decomposition layers 1 and 2. The graphics in Fig. 7 represent the histograms of the values of the coefficients which build the object models (4 coefficients, Layer 1) for a plane (Plane 1) and a
human face (Chris). The graphic representations for all planes are quite similar, so the graphics for the remaining three test objects are not given here. The difference from the object representation of "Chris" is quite clear, both in range and in allocation.
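For readers checking the numbers in Table 1, the sketch below shows how the three reported quantities relate for an 8-bit grayscale image. The restored image here is synthetic, so only the size-derived values (CR and bit-rate) reproduce the table exactly; the PSNR merely illustrates the formula.

```python
import numpy as np

def table1_metrics(original, restored, compressed_bytes):
    """CR, bit-rate [bpp] and PSNR [dB] as used in Table 1 (8-bit grayscale)."""
    h, w = original.shape
    original_bytes = h * w                            # one byte per pixel
    cr = original_bytes / compressed_bytes            # compression ratio
    bpp = 8.0 * compressed_bytes / (h * w)            # bits per pixel
    mse = np.mean((original.astype(float) - restored.astype(float)) ** 2)
    psnr = 10.0 * np.log10(255.0 ** 2 / mse)
    return cr, bpp, psnr

# "Plane1", Layer 1: a 170x80 image described by a 67-byte model
original = np.random.randint(0, 256, (80, 170)).astype(float)
restored = np.clip(original + np.random.normal(0.0, 20.0, original.shape), 0, 255)
cr, bpp, psnr = table1_metrics(original, restored, compressed_bytes=67)
print(round(cr), round(bpp, 3), round(psnr, 1))       # approx. 203, 0.039, ~22 dB
```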
5 Conclusion
The new method for object search was simulated with MATLAB using two or more pyramid decomposition levels. The 2D transform was the Fourier-Mellin transform. The matching obtained for images in several image classes (forest, city, desert, etc.) was more than 80%. The new method for content-based image retrieval ensures faster search in large databases, because the layered processing permits a significant part of the images to be excluded from further search at the end of the Layer 1 analysis. The use of the IDP decomposition permits the creation of efficient multi-view models. Another important advantage is the multi-scale representation, based on the relations between transform coefficients in adjacent layers, which offers a significant reduction of the transform coefficients needed for the object model creation [9]. The introduction of a flexible feedback in the process of object model creation and search makes this approach close to the human way of thinking. The new method permits the development of flexible models for some basic kinds of images, for example texts/graphics, cartoon images, medical images, and natural grayscale or color images, which can later be defined more accurately, in accordance with the object peculiarities. This approach, which is based on preliminary knowledge, will facilitate the object representation and search.
Acknowledgment. This paper is supported by the National Fund for Scientific Research of the Bulgarian Ministry of Education and Science (Contract No VU-I 305), NSF grant 0619069 Development of IAEL, and by funding from the U.S. Defense Threat Reduction Agency (DTRA-BA08MSB008).
References 1. Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys 40(2), article 5, 60 (2008) 2. Hubel, D.: Eye, Brain and Vision Scientific American Library, vol. 22. W. Freeman, New York (1989) 3. Mancas, M., Gosselin, B., Macq, B.: Perceptual Image Representation. EURaSIP Journal on Image and Visual Processing, article ID 98181 (2007) 4. Schyns, P., Oliva, A.: From blobs to boundary edges: evidence for time and spatial scale dependent scene recognition. Psychol. Sci. 5, 195–200 (1994) 5. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Computer Vision 42, 145–175 (2001) 6. Carrasco, M., McElree, B.: Covert attention accelerates the rate of visual information processing. PNAS 98, 5363–5367 (2001)
7. Kountchev, R., Milanova, M., Ford, C., Kountcheva, R.: Multi-layer Image Transmission with Inverse Pyramidal Decomposition. In: Halgamuge, S., Wang, L. (eds.) Computational Intelligence for Modelling and Predictions, ch.13, vol. 2, pp. 179–196. Springer, Heidelberg (2005) 8. Derrode, S., Ghorbel, F.: Robust and efficient Fourier-Mellin transform approximations for gray-level image reconstruction and complete invariant description. Computer vision and image understanding 83(1), 57–78 (2001) 9. Kountchev, R., Kountcheva, R.: Image Representation with Reduced Spectrum Pyramid. In: Tsihrintzis, G., Virvou, M., Howlett, R., Jain, L. (eds.) New Directions in Intelligent Interactive Multimedia, pp. 275–284. Springer, Heidelberg (2008)
Event Extraction and Visualization for Obtaining Personal Experiences from Blogs
Yoko Nishihara1, Keita Sato2, and Wataru Sunayama2
1 The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo [email protected]
2 Hiroshima City University, 3-4-1 Ozuka-Higashi, Asa-Minami-ku, Hiroshima {keita,sunayama}@sys.im.hiroshima-cu.ac.jp
Abstract. Internet users write blogs about their personal experiences, daily news, and so on. Though we can obtain blogs about personal experiences using Web search engines, the search engines also output blogs about other topics unrelated to personal experiences. Therefore, we have to spend too much time reading all the blogs to find those about personal experiences. This paper proposes a support system for obtaining blogs about personal experiences efficiently. The system extracts three keywords denoting a place, an object, and an action from a blog. The three keywords describe an event that led a person to write a blog about a personal experience. The system expresses the event with three pictures related to the extracted keywords. The pictures help users to judge whether personal experiences are written in the blog or not. We experimented with the system and verified that it supports users in obtaining personal experiences efficiently. Keywords: personal experience, pictures expressing an event, place keyword, object keyword, action keyword.
1 Introduction
Since blogs are written by many people, we can obtain personal experiences and/or reviews of commercial items from them. For example, if a person needs information about a local area in Hokkaido, he/she can obtain such information by entering the queries Hokkaido AND sight seeing into a search engine and reading the Web pages of the search results. However, besides blogs about personal experiences in Hokkaido, the search results also contain other information: plans for sight seeing in Hokkaido, sight seeing in areas near Hokkaido, and so on. Therefore, he/she has to spend much time reading all of the blogs, which causes low efficiency of information acquisition. This paper proposes a support system for obtaining information about personal experiences from blogs. We consider that people obtain personal experiences through events that they have experienced. Therefore, the proposed system uses the events for extracting personal experiences. The proposed system extracts keywords denoting an event, and visualizes the event using pictures expressing the extracted keywords. Users can judge whether personal experiences are written in blogs or not by watching the pictures.
Visualizing events using pictures frees users from reading all of the blogs, and the amount of time needed to obtain personal experiences is reduced [6, 10]. The possibility that events and personal experiences are written together in a blog is usually high (in the experiment for the proposed system, the rate was 93%). Therefore, we can support users in obtaining personal experiences by extracting events. We define an event as three keywords: an action keyword, an object keyword, and a place keyword. Other keywords, such as time keywords, subject keywords, and reason keywords, are also generally used to visualize events. In the case of time keywords, though some extraction methods have been proposed [9], we do not use them because a large amount of training data for machine learning would be needed to output the time keywords. In the case of subject keywords, the subject of a personal experience in a blog is always the writer of the blog; therefore, we do not extract subject keywords from blogs. We do not extract reason keywords either, because of the difficulty of their extraction.
2 Related Work
2.1 Information Extraction from the Web
There are many studies on information extraction from the Web [2]. For extracting information that is interesting to the public, there has been a method to extract human names, keywords, and sentences that attract the public from blogs [5]. There have been some methods for extracting noticed persons and noticed events [12, 13, 16]. Another method extracts keywords that will attract attention in the future, using time-series analysis of keyword frequencies [4, 8]. Though the proposed system also extracts information from the Web, it extracts information about personal experiences instead of information attracting the public.
2.2 Review Extraction
A personal experience can be considered one kind of review. Some methods have been proposed to extract reviews of commercial items and movies from the Web. Some methods learn features of the reviews by machine learning [3, 11]. The methods proposed in [14, 15, 17] extract and show the reviews of commercial items using learned features. The proposed system extracts events that lead persons to write blogs about personal experiences; it does not extract the personal experiences themselves. Events are different from commercial items and movies. Since each person perceives the same event in his or her own way, the number of keywords used for writing personal experiences is considered larger than the number of keywords used for writing events [18]. Events and personal experiences appear together with high frequency in blogs. Therefore, the proposed system extracts events and supports users in obtaining personal experiences.
2.3 Support for Obtaining Personal Experience
The method proposed in [18] extracts personal experiences from blogs semi-automatically and shows thumbnails of the blogs to users. The proposed system instead shows pictures corresponding to events. Therefore, the proposed system differs from the method of [18] in showing pictures.
3 Proposed System
The system takes queries as input. The queries represent a theme on which blogs are written. For example, if a user wants to obtain personal experiences about sight seeing in Hawaii, he/she should input Hawaii AND sight seeing into a search engine. The system downloads blogs including the queries and narrows the downloaded blogs to those including sentences about events. Since sentences representing events are usually written in the past tense, the system keeps blogs including past-tense sentences when narrowing the blogs. Next, the system separates the blog texts at every sentence including a place keyword. The system then extracts keywords (object keywords and action keywords) representing events from the separated texts. After the system lays out pictures representing the extracted keywords, it outputs the set of the pictures.
3.1 Blog Text Separation
We explain how to separate blog texts at every sentence including a place keyword. A blog may contain descriptions about several different places. Therefore, the system separates the blog text using sentences including place keywords and regards each separated text as a block. For extracting place keywords, the system first extracts noun keywords following prepositions representing places, such as at, on, and in. The system extracts such nouns as candidates for place keywords and decides on one noun keyword as the place keyword.
3.2 Extraction of Keywords Corresponding to an Event
The system extracts three keywords from a block: a place keyword, an object keyword, and an action keyword. In the following sections, we explain how to extract object keywords and action keywords.
3.2.1 Extraction of Object Keyword
We explain how to extract object keywords. The object keywords are noun keywords. If some noun keywords are in a block, the degree of relationship between each noun keyword and the extracted place keyword is evaluated with the following equation,
relation(p, o) = [hit(p ∧ o) / hit(p)] × [hit(p ∧ o) / hit(o)],   (1)
where p denotes a place keyword, o denotes an object keyword, and hit(·) is the number of Web pages returned for a query. Eq. (1) combines the ratio of the number of Web pages including both keywords to the number of Web pages including the place keyword with the corresponding ratio for the object keyword. If the value of Eq. (1) is high, the two keywords are considered related to each other. The system chooses the noun keyword with the highest value of Eq. (1) as the object keyword.
3.2.2 Extraction of Action Keyword
We explain how to extract action keywords. Action keywords are verb keywords that appear at the end of sentences including the extracted object keywords, because an action keyword and an object keyword appear in the same sentence.
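A small illustration of the object-keyword choice based on Eq. (1) follows. The hit() function stands in for the Web search engine hit counts that the method relies on but does not name; the keywords and counts are invented for the example.

```python
# Toy illustration of Eq. (1); counts are hypothetical, not real search engine data.
def relation(place, obj, hit):
    both = hit((place, obj))
    if hit((place,)) == 0 or hit((obj,)) == 0:
        return 0.0
    return (both / hit((place,))) * (both / hit((obj,)))

counts = {("beach",): 1000, ("surfboard",): 200, ("book",): 5000,
          ("beach", "surfboard"): 150, ("beach", "book"): 100}
hit = lambda terms: counts.get(terms, 0)

candidates = ["surfboard", "book"]
best = max(candidates, key=lambda o: relation("beach", o, hit))
print(best)   # the noun most related to the place keyword becomes the object keyword
```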
3.3 Laying Out Pictures for Visualizing an Event
The system lays out pictures representing the extracted keywords. The pictures are aligned in a row: a place picture, an object picture, and an action picture from left to right. If a blog has several blocks, only the pictures of the first block are visualized. We prepared a database of pictures. The database was created using pictures from an image search engine [7]. We input keywords extracted from blogs as queries into the search engine and chose one picture from the top 20 pictures of the search results for each keyword. We spent from two to 30 seconds choosing each picture. Currently, the database has about 1,000 pictures for place keywords, about 700 pictures for object keywords, and about 200 pictures for action keywords. If the database does not have pictures corresponding to the extracted keywords, the system visualizes blanks instead of the pictures.
3.4 Output: Three Pictures Visualizing an Event
The system outputs a set of pictures visualizing events (shown in Fig. 1). If titles and summaries of blogs have been obtained when downloading the blogs, the system also outputs those together with the set of pictures.
Fig. 1. Output of proposed system. A set of blogs with pictures.
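The sketch below ties the Section 3 steps together (past-tense filtering and block separation at place-keyword sentences). It is a toy approximation under stated assumptions: English sentences, a hard-coded preposition list, and a regular-expression stand-in for the past-tense check; the actual system, which processes real blog entries, would need proper linguistic analysis.

```python
import re

PLACE_PREPOSITIONS = ("at", "on", "in")

def is_past_tense(sentence):
    # crude stand-in for the past-tense filter that keeps "event" sentences
    return bool(re.search(r"\b\w+ed\b|\bwent\b|\bsaw\b|\bate\b", sentence))

def split_into_blocks(sentences, place_nouns):
    """Start a new block at every sentence containing '<preposition> <place noun>'."""
    blocks, current = [], []
    for s in sentences:
        mentions_place = any(f"{prep} {noun}" in s
                             for prep in PLACE_PREPOSITIONS for noun in place_nouns)
        if mentions_place and current:
            blocks.append(current)
            current = []
        current.append(s)
    if current:
        blocks.append(current)
    return blocks

blog = ["We went to a shop in Hakodate.", "The ramen was delicious.",
        "Then we walked on the beach at Otaru.", "I want to go there again."]
event_sentences = [s for s in blog if is_past_tense(s)]
print(split_into_blocks(event_sentences, place_nouns=["Hakodate", "Otaru"]))
```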
4 Experiment for the Proposed System
In the experiment, we asked participants to extract texts written about personal experiences from blogs using the proposed system. We collected blogs from a blog site, Yahoo! Blog [1]. We used the top 100 blogs of the search results obtained by entering queries for this experiment. The queries for blog collection are shown in Table 1. Five of the queries are about sight seeing, and one is about a school festival. We chose those queries because it is considered that, in obtaining personal experiences, most people search for something about events that they will also experience in the near future.

Table 1. Queries for the experiment with the proposed system
| Okinawa, Tokyo, Hiroshima, Nigata, Hokkaido | AND sight seeing |
| School festival | AND shop for eating |
We prepared another system (baseline system) that shows the titles and summaries of blogs, as the proposed system does. The same blog texts as in the proposed system can be read in the baseline system. The 100 blogs were separated into four subsets of 25 blogs each and shown using a Web browser. The size of the Web browser window was 1,200 x 1,920 pixels. We instructed the participants as follows:
1. (Proposed) Watch the set of pictures for each blog. / (Baseline) Read the summaries for each blog.
2. Judge whether events related to the queries are written or not.
3. If you find blogs in which events are written, copy the blog texts about personal experience and paste them into a text editor.
The participants were 36 undergraduate/graduate students majoring in information science. We assigned 18 participants to one query and one system. The time of one session was five minutes, because people usually spend about five minutes searching for something. We compared the average numbers of personal experiences extracted using the two systems.
4.1 Experimental Results
Table 2 shows the average numbers of blogs read by the participants. For all of the queries, the averages for the proposed system were higher than those for the baseline system (P<.05). This is because the time to understand the contents by watching pictures is shorter than the time to understand the contents by reading summaries. The result indicates that the proposed system supports users in reading more blog texts. Table 3 shows the average numbers of blogs from which personal experiences were extracted. The extracted texts were proper personal experiences. The averages for the proposed system were higher than those for the baseline system (P<.05).
Table 2. Averages of blogs read by participants
|                 | Proposed | Baseline |
| Okinawa         | 5.8 | 4.9 |
| Tokyo           | 4.9 | 4.7 |
| Hiroshima       | 5.1 | 4.3 |
| Nigata          | 4.7 | 4.6 |
| Hokkaido        | 5.9 | 4.7 |
| School festival | 5.4 | 4.7 |

Table 3. Averages of blogs from which personal experiences were extracted by participants
|                 | Proposed | Baseline |
| Okinawa         | 2.2 | 1.4 |
| Tokyo           | 3.1 | 2.6 |
| Hiroshima       | 2.9 | 2.6 |
| Nigata          | 1.8 | 1.3 |
| Hokkaido        | 3.2 | 2.5 |
| School festival | 3.4 | 2.1 |

Table 4. Rate of blogs from which personal experiences were extracted to blogs read by participants
|                 | Proposed | Baseline |
| Okinawa         | 0.38 | 0.29 |
| Tokyo           | 0.63 | 0.55 |
| Hiroshima       | 0.57 | 0.60 |
| Nigata          | 0.38 | 0.28 |
| Hokkaido        | 0.54 | 0.53 |
| School festival | 0.63 | 0.45 |

This is because events were visualized by the pictures output by the proposed system, and
most of the blogs chosen by the participants had texts about personal experiences. This also appears in Table 4: except for Hiroshima, the rates of extracted blogs to read blogs for the proposed system were higher than those for the baseline system (P<.05). In the case of Hiroshima, since many pictures corresponding to place keywords were not visualized in the proposed system, it was difficult to judge whether events were written in each blog or not. However, for the other queries, the rates for the proposed system were higher than those for the baseline system. Therefore, we confirmed that the proposed system is more efficient for obtaining personal experiences.
4.2 Efficiency of Pictures for Choosing Blogs
Among the summaries of the blogs used in the experiment, some have descriptions about events (for example, "I went to a park") and others do not. On the other hand, among the texts of the blogs used in the experiment, some have descriptions about personal experiences and others do not. Therefore, we divided the blogs into four patterns using these two features. The results are shown in Table 5. Though 61% of all the blogs (the sum of pattern (1) and pattern (2)) can be judged using the baseline system, 39% of all the blogs (the sum of pattern (3) and pattern (4)) cannot be judged using the baseline system. In the case of pattern (3), users of the baseline system do not read blogs including personal experiences. In the case of pattern (4), users of the baseline system read blogs not including personal experiences. However, in the case of pattern (3), the proposed system outputs pictures like those in Fig. 2; therefore, the users can judge that personal experiences are written in the blog even though no event is written in the summary. In the case of pattern (4), the proposed system outputs pictures like those in Fig. 3; therefore, the users can judge that personal experiences are not written in the blog.
Table 5. Number of blogs divided by two features. One feature is whether events are written in summaries or not. Another feature is whether personal experiences are written in blogs or not.
| Pattern (Event : Personal experience) | (1) O:O | (2) X:X | (3) X:O | (4) O:X |
| Okinawa         | 31 | 30 | 20 | 19 |
| Tokyo           | 15 | 60 | 20 | 5 |
| Hiroshima       | 29 | 28 | 21 | 22 |
| Nigata          | 11 | 61 | 8 | 20 |
| Hokkaido        | 20 | 26 | 28 | 26 |
| School festival | 35 | 20 | 33 | 12 |
| Average         | 23.5 | 37.5 | 21.6 | 17.3 |
| Rate to all of the blogs | 61% (patterns (1)+(2)) | | 39% (patterns (3)+(4)) | |
Fig. 2. Example of a blog. Events were not written in the summary, but personal experiences were written in the blog. The queries were school festival AND shop for eating.
Fig. 3. Example of blog downloaded using queries Hokkaido AND sight seeing. An event is written in the summary, but personal experiences are not written in the blog.
Table 6 shows the number of blogs divided by two features: whether the pictures visualize events or not, and whether personal experiences are included in the blogs or not. The blogs divided into pattern (3) and pattern (4) in Table 5 correspond to pattern (c) and pattern (d) in Table 6. The sum of pattern (c) and pattern (d) was 22.2%, which is lower than the sum of pattern (3) and pattern (4) (39.0%). It is considered that the participants can obtain more texts about personal experiences by watching the pictures.
Table 6. Number of blogs divided by two features. The first feature is whether the pictures visualize events or not. The second feature is whether personal experiences are written in blogs or not.
| Pattern (Event pictures : Personal experience) | (a) O:O | (b) X:X | (c) X:O | (d) O:X |
| Okinawa         | 40 | 42 | 5 | 13 |
| Tokyo           | 28 | 52 | 7 | 13 |
| Hiroshima       | 52 | 35 | 7 | 16 |
| Nigata          | 9 | 70 | 7 | 14 |
| Hokkaido        | 39 | 41 | 5 | 15 |
| School festival | 41 | 28 | 4 | 27 |
| Average         | 34.8 | 43.0 | 5.8 | 16.3 |
| Rate to all of blogs | 77.8% (patterns (a)+(b)) | | 22.2% (patterns (c)+(d)) | |
Table 7. Number of blogs including personal experiences
| School festival | Okinawa | Hiroshima | Hokkaido | Tokyo | Nigata | Average |
| 68 | 51 | 50 | 48 | 35 | 19 | 45.1 |
Even when the number of blogs including personal experiences was low, the participants using the proposed system extracted texts about personal experiences efficiently. Table 7 shows the number of blogs including texts about personal experiences. In the case of Nigata, the number of blogs including personal experiences was 19. Therefore, the rate of extracted blogs to read blogs was low with the baseline system (the rate was 0.28, as shown in Table 4). However, with the proposed system, the rate was 0.38, larger than that of the baseline system (P<.05). This is because the pictures of the proposed system support users in judging whether events are written in blogs or not. The results indicate that the proposed system can help extract texts about personal experiences even if the number of personal experiences is low.
4.3 Overview Efficiency of Event-Visualizing Pictures
Users of the proposed system watched the outputs in a different way from users of the baseline system. In a questionnaire given to the 36 participants of the proposed system, 22 participants answered that they watched the whole of the outputs, four answered that they watched the outputs sequentially, and 10 answered that they did both, watching the whole of the outputs and watching the outputs sequentially. This is because users of the proposed system can understand the contents of blogs at a glance. On the other hand, all of the users of the baseline system answered that they watched the outputs sequentially, because they cannot understand the contents of blogs at a glance. The result indicates that users of the proposed system tend to watch the whole of the outputs. The proposed system did not output all of the pictures corresponding to the extracted keywords: 17 pictures for place keywords, 12 pictures for object keywords, and 9 pictures for action keywords were not output by the proposed system. The number of blogs without pictures was 81.
Table 8. Number of participants using each combination of pictures
| Place | Object | Action | Number of participants |
| O | X | X | 19 |
| O | X | O | 6 |
| O | O | X | 4 |
| O | O | O | 4 |
| X | X | O | 2 |
| X | O | O | 1 |
| X | O | X | 0 |

If there are blanks in the output, the system should
output the extracted keywords only. However, this would lower the efficiency of obtaining personal experiences, for two reasons. The first reason is that users cannot watch the whole of the outputs and cannot quickly judge whether texts about personal experiences are written or not if the extracted keywords and pictures are shown together in a Web browser. The second reason is that the time to understand the contents of the extracted keywords is longer than the time to understand the contents of pictures [6, 10]. Therefore, showing the extracted keywords instead of pictures is considered to cost much time. When using pictures for the system output, if the contents are not described simply in the pictures, it takes much time to understand them. Though some of the participants answered that it took much time to understand the contents of the pictures, they pointed to only a part of the pictures, not all of them. The result indicates that showing the pictures supports users in obtaining texts about personal experiences.
4.4 Efficiency of Pictures Visualizing Place Keywords
We asked the participants of the proposed system which combination of pictures they used for extracting texts about personal experiences. Table 8 shows the result. The number of participants who used pictures corresponding to place keywords was the highest. It is considered that, when users judge whether events are written in blogs or not, pictures of place keywords (e.g., mountain, sea, and river) are more useful than pictures of action keywords (e.g., walk, run, and watch) and pictures of object keywords (e.g., book, bicycle, and dish). Therefore, most of the participants used pictures of place keywords. The result indicates that pictures of place keywords are the most useful in obtaining personal experiences. In Table 8, though the number of participants using pictures of place keywords was the highest (19 participants), 17 participants used other combinations of pictures. It is considered that pictures of object keywords and pictures of action keywords are also useful for obtaining personal experiences. Therefore, it is necessary to visualize all three pictures to obtain personal experiences.
5 Conclusion
This paper has proposed a support system for obtaining personal experiences from blogs using event-visualizing pictures. The system visualizes an event with three pictures:
place, object, and action. Users judge whether texts about personal experiences are written in a blog or not by watching the output pictures. Experimental results showed that the proposed system can support users in obtaining personal experiences efficiently.
References 1. Blog search engine, http://blogs.yahoo.co.jp/ 2. Chang, C.H., Kayed, M., Girgis, M.R., Shaalan, K.: A Survey of Web Information Extraction Systems. IEEE Transactions on Knowledge and Data Engineering 18(10), 1411–1428 (2006) 3. Dave, K., Lawrence, S., Pennock, D.M.: Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. In: Proc. of the 12th International World Wide Web Conference, pp. 519–528 (2003) 4. Fujiki, T., Nanno, T., Suzuki, Y., Okumura, M.: Identification of Bursts in a Document Stream. In: Workshop on Knowledge Discovery in Data Streams (2004) 5. Glance, N.S., Hurst, M., Tomokiyo, T.: BlogPulse: Automated Trend Discovery for Weblogs. In: Proc. of the WWW 2004 Workshop on the Weblogging Ecosystem (2004) 6. Hulbert, S., Beers, J., Fowler, P.: Motorists’ Understanding of Traffic Control Devices. AAA Foundation for Traffic Safety (1979) 7. Image search engine, http://search.yahoo.co.jp/images 8. Kleinberg, J.: Bursty and Hierarchical Structure in Streams. In: Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1–25 (2002) 9. Noro, T., Inui, T., Takamura, H., Okumura, M.: Time Period Identification of Events in Text. In: Proc. of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006), pp. 1153–1160 (2006) 10. Pietrucha, M., Knoblauch, R.: Motorists’ Comprehension of Regulatory, Warning and Symbol Signs, vol.2, Technical Report Contract DTFH61-83-C-00136, FHWA, U.S. Department of Transportation (1985) 11. Turney, P.D.: Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In: Proc. of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 417–424 (2002) 12. Web page, http://search.biglobe.ne.jp/ranking/ 13. Web page, http://blog360.jp/ 14. Web page, http://blogsphere.biz/ 15. Web page, http://opinion.labs.goo.ne.jp/cgi-bin/index.cgi 16. Web page, http://kizasi.jp/ 17. Web page, http://shopping.nifty.com/ 18. Web page, http://shooti.jp
Minato: Integrated Visualization Environment for Embedded Systems Learning
Yosuke Nishino1 and Eiichi Hayakawa2
1 Tokyo Metropolitan Fuchu Technical High School, 2-19 Wakamatsu Fuchu, Tokyo, Japan [email protected]
2 Takushoku University, 815-1 Tate Hachioji, Tokyo, Japan {Yosuken,hayakawa}@cs.takushoku-u.ac.jp
Abstract. This paper describes the modeling, development and evaluation of an embedded systems study environment that uses a robot, and reports on an experiment at a technical high school. The paper discusses the following points: (1) visualizing the behavior of the embedded system in synchronization with the robot's behavior, (2) integrating the environment from concept-based learning to implementation-based learning, and (3) validating the efficiency of the system through lectures and evaluation at a technical high school. This report is a summary of the environment, the learning courseware for embedded systems, and the results of a study at a technical high school. Keywords: Education, Robot, Programming.
1 Introduction
Education on embedded systems is indispensable for students who learn computer science or information technology. Robots are often used in learning about embedded systems. Embedded systems education that uses robots attracts learners' interest easily and is effective as a teaching material that can maintain their motivation. However, it is difficult for existing educational environments using robots to support embedded systems study. Since the embedded system itself is invisible, the system's internal processing within a robot is difficult to understand. In order to solve this problem of the existing environments, we developed a robot programming education support environment based on visualization-based debugging and on simulation that uses movies. This environment, called "MINATO", integrates visualized educational support for the system software, which learners can utilize from the application layer to the hardware layer based on their course needs. For this study, we evaluated this environment with respect to the understanding of embedded system behavior, compared with previous methods, at a technical high school.
2 Embedded System Education at Technical High School
The computer science department of the technical high school provides education concerning a wide variety of information technologies. In particular, needs in fields
such as control programming and embedded systems have risen, as seen in the development of built-in equipment and the upsurge of ROBOCON. However, there are currently few control programming teaching materials which support the teaching of embedded systems. The existing teaching materials do not help students much to continuously understand implemented control programming and the theoretical system at the concept level. Therefore, it seems that the existing materials have not produced remarkable results in the field of education. This may be because there is no ideal educational environment that supports learners from introductory education through to practical education. Another reason is that it is difficult for learners to imagine the internal processing and structure of an embedded system visually and intuitively. In the following, we describe the requirements obtained by analyzing the learning environment for embedded systems at the educational stage of the technical high school.
Visual Support. In an educational environment using a conventional textbook, it is impossible to display state transitions dynamically, and it is therefore difficult for learners to follow them. In other words, students have to guess the internal processing with little information. In order to understand the processing intuitively, they need a lot of implementation knowledge. Moreover, students do not form an image of the operation and do not obtain a real feeling of understanding, because they cannot follow the state transitions from an explanation that uses static figures in a textbook. Therefore, an environment which can express internal processing with movies could be useful.
Integration. Currently, existing environments are intended for only a certain level, either introductory or practical. Thus, learners have to spend much time studying how to use the environment and becoming accustomed to it, although the primary objective of the class is to have the learners study control programming. As a result, the students have to become accustomed to the support environment in limited class time. Also, it is a time-consuming task for teachers to construct the environment by themselves. Therefore, an integrated environment in which control programming can be studied consistently, from bare hardware to a real-time operating system, is required.
Greediness for Learning. In technical high school, students often practice with equipment and tools actually used in professional fields. The students learn not only a conceptual understanding of control but also the usage of practical equipment. For teaching control programming, bare hardware such as circuit boards and wiring is used in most educational settings. However, this existing educational equipment is not appropriate for motivating the students, since it is sometimes too complex and its appearance is not attractive to them. Moreover, this equipment is not dynamically active. Against this background, an education system could improve students' willingness to learn by using a teaching material such as a robot.
3 Research Objective
The research objective is the development of an embedded systems education support environment that synchronizes the behavior of a robot with the visualization of the system
software. The research also explores whether learners who study computer science can maintain their motivation to study by using simple robot materials and visualizations. The requirements of the learning environment for embedded systems education are as follows:
1. Learners can use the environment visually and intuitively,
2. Learners can study from introduction to implementation in one integrated environment, and
3. Learners can maintain their motivation to study through the active motion of the robot.
4 Features of MINATO
The educational concepts of MINATO are as follows:
Utilizing Synchronized Visualization. MINATO presents monitoring tools that synchronously visualize the system's internal processing, the physical behavior of the robot captured as a movie, and the I/O data from/to devices. The learner can intuitively understand the relationship between the program and the devices through these tools. Furthermore, interrupts from devices and accesses from/to devices are displayed as simple diagrams. All the data, including the robot movie, are logged in the system and can be replayed and traced repeatedly until the students find the bugs. CPU and 3D robot simulators are also utilized to verify the program which the learner writes. Using logged data as input, it is easy to check the behavior of a program without the robot, even at home.
Supporting Various Robots. MINATO presents a framework supporting various robots, which meets the needs of various types of schools, such as junior/senior high schools, universities, and industry. In junior and senior high school, simple and indestructible robots are essential. University and technical introduction courses need rich resources: many sensors, a powerful CPU, large memory, or broad communication bandwidth. We support various robots – LEGO Mindstorms NXT [1] and the MINATO robot [3] that we developed – and also various communication devices: Wi-Fi, Bluetooth and ZigBee. The system is independent of the OS, as it uses Java.
Integrated Visualization and Programming Environment. MINATO presents an integrated environment capable of visualizing, editing, monitoring and robot control in one window. From our user experience at high schools and universities, simple, light and easy operation is strongly required. An existing environment like Eclipse is too complex and too heavy for preliminary learners. The learners can use all the system functions without switching windows or desktops, thanks to a simple, single-window interface.
1USB/1CD Based Easy Installation. MINATO can be booted with only one USB drive or one CD, on Ubuntu Linux. Java, capture device drivers and GUI libraries are already installed, and learners or lecturers are not required to add or configure other software. In the educational field, financial and management costs are also one of the important
issues. In addition, lecturers need to be able to reboot the entire system easily and quickly for every lecture and classroom, to prevent interference from other users' operations.
Presenting Material Templates. MINATO provides many learning materials and templates from which lecturers can build their own material. Real-time system and real-time OS courseware is prepared for high school, university, and technical introductory courses.
5 Implementation Figure 1 shows a screenshot of the monitoring tool. The movie (upper left), input/output devices (right side), source code (lower left), and device history (bottom, horizontal) are displayed within one window. A vertical red line marks the current focus in time, and all the displayed data transition synchronously with it. From this panel the learner can control the robot: transfer the user/system program, reboot the system on the robot, acquire the log data, and start and stop the robot. When the 3D simulator is used, the movie panel is replaced by a 3D display panel. The learner can use the other monitoring tools without switching windows.
Fig. 1. Screenshot of the MINATO monitoring system
Existing robot environments cannot reproduce an operation after the fact. MINATO links the robot's internal process to its behavior and records that behavior with a web camera, so the movement of the robot can be reproduced later.
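The paper does not show the replay mechanism itself, but the idea of aligning logged device data with the recorded video by timestamp, so that the timeline cursor (the "vertical red line") shows a consistent snapshot, can be sketched roughly as follows. The class and method names here are illustrative assumptions, not MINATO's actual API.

```java
import java.util.*;

// Hypothetical sketch: aligning logged device events with recorded video frames
// so that a timeline cursor shows a consistent snapshot of program and robot state.
class LogEntry {
    final long timestampMs;   // time since logging started
    final String device;      // e.g. "lightSensor", "motorA"
    final int value;          // sampled value or command
    LogEntry(long t, String d, int v) { timestampMs = t; device = d; value = v; }
}

class ReplaySession {
    private final List<LogEntry> log;   // assumed sorted by timestamp
    private final double videoFps;

    ReplaySession(List<LogEntry> log, double videoFps) {
        this.log = log;
        this.videoFps = videoFps;
    }

    // Latest logged value of each device at or before the cursor time.
    Map<String, Integer> deviceStateAt(long cursorMs) {
        Map<String, Integer> state = new HashMap<>();
        for (LogEntry e : log) {
            if (e.timestampMs > cursorMs) break;
            state.put(e.device, e.value);
        }
        return state;
    }

    // Video frame index that corresponds to the same cursor time.
    int videoFrameAt(long cursorMs) {
        return (int) (cursorMs / 1000.0 * videoFps);
    }

    public static void main(String[] args) {
        List<LogEntry> log = Arrays.asList(
            new LogEntry(0, "lightSensor", 42),
            new LogEntry(120, "motorA", 75),
            new LogEntry(480, "lightSensor", 55));
        ReplaySession session = new ReplaySession(log, 15.0);
        System.out.println(session.deviceStateAt(300)); // {lightSensor=42, motorA=75}
        System.out.println(session.videoFrameAt(300));  // frame 4
    }
}
```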
6 Practice and Consideration at Technical High School
6.1 Practice at Technical High School We used MINATO in technical high school practice classes in order to verify its effectiveness. A previous study had already shown the effectiveness of the environment for introductory OS education and the low running and initial cost of the 1CD/1USB environment. The experiment reported here verifies the effectiveness of the robot teaching material and of each MINATO tool in control programming study using the robot. The students and the target lecture of the experimental class were as follows.
School: Tokyo Metropolitan Fuchu Technical High School
Subject: Computer Practical Training (Grade 12)
Number of participants: 24 students
Number of classes: eight, from January to September (50 minutes each)
Students' background knowledge: they had studied programming and control since entering the school and had learned control programming with an arm-type robot in previous practice. This was their first time studying control programming with autonomous robots.
Aim of the class: in comparison with the previous control practice, the class aims to help students understand the relationship between control and programming using an autonomous robot, to develop programs under limited resources, and to understand embedded systems including the real-time operating system. In particular, it aims to cover both control and programming and to tie them together so that the students understand the concept of the embedded system, which is dealt with in the next school term. The students had also learned equipment control in previous practice, such as FA control using an arm robot and sequence control using elevators; because the teaching materials for equipment control do not show a strong relationship between control and programming, it is difficult for students to link the program with the behavior.
Experiment and control groups: we conducted an evaluation experiment to verify the effectiveness of MINATO. The students were divided into two groups: an experiment group using MINATO and a control group using ROBOLAB. The two groups were academically even. The lecture contents covered how to use the NXT, an outline of control programming, how to use various sensors, and programming that responds to input sensor values. Because LEGO blocks allow a high degree of freedom, we used the same block robot structure for both groups in order to avoid excessive intervention in the hardware. Because students tend to deal with problems by rebuilding the blocks when they run into bugs, we restricted them from changing the structure of the
blocks in this experiment, in order to keep their attention on studying control programming and understanding the embedded system.
6.2 Class Observation This section describes the students' performance in practice classes using the two environments, ROBOLAB and MINATO, at a technical high school.
ROBOLAB. In the practice that used ROBOLAB, even students with very little programming knowledge understood the programming interface in a short time, thanks to the tile programming environment that let them work intuitively. However, the robot behavior they imagined and the behavior that actually appeared were sometimes quite different, and they changed the program and its parameters many times in order to make the robot move correctly. Even when quite different behavior appeared, some students did not understand why, because they did not understand the underlying logic in the first place; they did not grasp the essence of control programming or could not picture the internal process. The practice was also inefficient: because the robot's behavior is not reproducible, external factors such as the state of the battery, the lighting, and the start position forced the students to fix the program whenever an error occurred.
MINATO. In contrast, the students did not get robots moving as easily as in the ROBOLAB class, because they programmed in Java. However, since this environment can visualize the internal process and the sensor values and can reproduce behavior, the students were able to obtain the expected results after several rounds of debugging, with less trial and error. Scenes from the practice classes are shown in Figures 2 and 3.
Fig. 2. Practice scenery
Fig. 3. Practice scenery
6.3 Evaluation Experiment The results were analyzed as a controlled experiment based on problem solving. Concretely, the two groups were compared on two points: the time needed to complete a line-trace problem and the number of times the program was debugged. Table 1 shows the result of the evaluation experiment.

Table 1. Results of the evaluation experiment
Group                              Completion time (minutes)   Number of debugging runs
MINATO (average of 12 students)    44                           12.1
ROBOLAB (average of 12 students)   45.7                         21.3
The group that used MINATO completed the task in less time and with fewer trials. While the difference in completion time is not significant, the numbers of trials differ significantly. This result suggests that the features of MINATO described above, such as the reproducibility of operation through the movie and the visualization of sensor values, had a strong influence on the number of trials. After the evaluation experiment, all members of the experimental group using MINATO answered a questionnaire survey in which they were asked to write their opinions freely. One participant mentioned that he could understand the relationship among the program, the internal process, and the robot's behavior. Another answered that he could imagine how embedded devices in general are developed and operated. There was also positive feedback about the visualization of
sensor values. On the other hand, as negative feedback, the questionnaire shows that programming in Java seemed difficult for the students compared with a tile programming environment (i.e., ROBOLAB).
6.4 Consideration The following points were revealed by the evaluation experiment, the class observation, and the questionnaire.
Effectiveness of MINATO's integrated support. We confirmed that teaching material that integrates the movie, the sensor values, and the program promotes understanding of the relation between control and programming, which is difficult for students to grasp with current teaching materials. In particular, the reproducibility of operation and the visualization of the internal process enhance students' intuitive understanding of this relation. The experiment also shows that students are attracted and motivated by the integrated, visualized environment and by the robot; even students who had failed and lost their motivation at the introductory level could become interested in learning again.
Need to review the programming environment. Because the programming environment used Java this time, the students had to acquire Java programming skills. If the environment instead used an easier programming tool such as tile programming, control programming education might be even more effective.
7 Conclusion In this paper we designed and implemented an embedded systems learning environment based on a robot and described the efficacy of MINATO through evaluation experiments in an educational setting. MINATO makes the behavior of a built-in device visible from several angles, synchronizing system data and physical phenomena with the robot, the movie, and the internal process. The evaluation experiment conducted at a technical high school confirmed that the environment is effective for teaching embedded systems with robot material in an introductory course, and that the single-window integrated user interface was also effective for the students. Because MINATO can be run from one CD or one USB stick on Linux, system management and financial costs can be reduced. As future work, the programming environment needs to be restructured; providing the environment to more educational settings will also allow us to receive more feedback. Acknowledgement. This work was supported by Grant-in-Aid for Scientific Research No. 19500822 from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
References 1. LEGO Mindstorms, http://mindstorms.lego.com/japan/ 2. Solorzano, J.: lejOS (2003), http://lejos.sourceforge.net/
3. Nishino, Y., Yoshida, M., Osumi, K., Tanaka, Y., Sugita, K., Hayakawa, E.: Robot Based System Programming Learning Support Environment. In: HCI 2005 International Conference on Human Computer Interaction (2005) 4. Nishino, Y., Hayakawa, E.: Development of an OS Visualization System for Learning Systems Programming. In: HCI 2003 International Conference on Human Computer Interaction (2003) 5. Yoshida, M., Yamamoto, S., Nishino, Y., Hayakawa, E.: Realization of a support environment for learning robot programming. Technical Report of IEICE, The Institute of Electronics, Information and Communication Engineers (2004)
Batik KR Semantic Network: Visualizations of Creative Process and Design Knowledge for the Malaysian Batik Designers’ Community Ariza Nordin1, Nor Laila Md. Noor1, and Ahmad Zainuddin2 1
Systems Science Department, Faculty Information Technology and Quantitative Sciences Universiti Teknologi MARA, Shah Alam, Malaysia [email protected], [email protected] 2 Ministry of Higher Education, Malaysia [email protected]
Abstract. Designing batik, a decorative textile, is guided by insights acquired from the confluence of design and heritage knowledge together with cultural and aesthetic constraints, resulting in the preservation of the designer's and the region's identity in the design artifact. Insights and inspiration are gained from stories; from non-textual references such as images and photographs in repositories of knowledge such as books and the Web; and from objects of nature, environmental phenomena, fashion trends, and human events. In addition, evaluating existing products may lead to inspired innovation or to the repetition of successful design solutions. With the surplus of inspirational data available today, batik designers require knowledge visualization to gain insights for the design task. Reporting a qualitative approach, this paper describes our findings as the Batik Knowledge Repository (KR) semantic network, which enables knowledge visualization of creative process (task) and design (domain) knowledge for the batik textile designers' community. Keywords: aesthetic, design knowledge, semantic network, storytelling, creative process.
1 Introduction The task of designing batik textile has evolved from a task inherited within generations of batik-maker families to a team effort within an organization producing batik textile for more diversified consumer communities. It is still guided, however, by a set of directives and intuitions reflecting a confluence of design and heritage knowledge confined by the culture and aesthetic constraints of a specific community, resulting in the preservation of the designer's and the region's identity. Batik textile is crafted by dyeing fabric using a resist technique to produce batik patterns consisting of one motif or a combination of motifs. The batik-making community exists across Southeast Asia, with Indonesia as the pioneer of this wax-resist method of decorating fabric, responsible for bringing batik to a respected height with an associated identity as a manifestation of the region's culture and heritage knowledge. Batik makers in
Malaysia have existed since the early 1900s, located predominantly in the east coast region of the Malaysian Peninsula. In our research we observed local batik designers relying heavily on an integration of knowledge gained by experience and knowledge acquired explicitly from mentors' stories and from related non-textual references such as photographs and images in books and other repositories of knowledge, including individual batik textiles or collections of them. They are inspired by objects of nature, environmental phenomena, fashion trends, and human events. Today, the rapid growth of image and information repositories on the Web has made a surplus of knowledge sources and inspirational data (images) available. With easy access to digital images of existing products exhibited online, the effort of evaluating existing products may lead to inspired innovations or to the repetition of successful solutions. Hence the workflow of the batik textile design process requires visualization of the related knowledge from both internal and external repositories. Visualizing the right inspirational data and preferred information will help designers create designs that conform to specific cultural and aesthetic preferences more efficiently, and guided visualization that delivers the right information in the context of cultural and aesthetic constraints will result in product innovation and heritage preservation. We aimed to construct a semantic network of creative process and design knowledge, acquired from local batik designers and locally produced batik images as exemplars, for a Batik Knowledge Repository (KR).
2 Creative Process and Design Knowledge In this paper we use creativity and creative process interchangeably. The creative process, or creativity, has been claimed to be knowledge intensive [5, 8, 9, 10], involving a few stages that vary among individuals or from situation to situation [6, 8, 12]. Plsek [12] noted that reviewers have identified more than eight models of creativity developed since 1908. The Wallas model of the creative process [8, 12, 15], the earliest, proposed that the process involves four steps: preparation, incubation, illumination, and verification. This early model has become the basis for most creative-thinking training programs today [9, 10]. The psychological discipline has defined creativity as the production of an idea, action, or object that is new or valued, dependent on domain-relevant knowledge, invoking change, involving mental processes, and demanding an appropriateness that is tacit [6, 8, 12, 15]. Boden [1] differentiated two kinds of creativity: for P-creativity, "an idea is P-creative if it is valuable, and the person in whose mind it arises could not have it before"; "the relation holds whether or not the idea has been had before"; and H-creativity, where "the idea is not only P-creative but also must never have been had by anyone else in all human history". Creativity is, however, dependent on cultural context [6]. Sternberg and Kaufman [6] reviewed published research on creativity and concluded that creativity research is difficult and not mainstream. Design knowledge is known to be synonymous with tacit knowledge, as designers have difficulty articulating what they are thinking because it lies beyond the boundaries of verbal discourse [3]. From the psychological perspective it is learned in relation to situations [3, 7] and involves mental models.
Because of the challenges that both the creative process and design knowledge present, research addressing knowledge transfer of creative process and design knowledge is very limited. With insufficient findings to draw on, efforts to put this research to practical use for the benefit of designer communities have turned to technology as an alternative medium for knowledge sharing and transfer. Ogawa, Nagai and Ikeda [11] investigated the explanation style of artistic ideas for a specific costume design team in Japan in their quest to build a computer system to support textile design.
3 Visualization of Creative Process and Design Knowledge Knowledge visualization encourages the creation of new knowledge and knowledge transfer [2, 4]. Visualization allows knowledge to be presented according to different contexts and handled by different categories of users [2]. A knowledge visualization framework comprises three perspectives addressing the type of knowledge to be visualized (object), the visualization goal (purpose of visualization), and the visualization format (method of representation) [2]. Figure 1 shows an extended knowledge visualization model [4], re-emphasizing the value of visualization [14], which justifies the development of the Batik KR Semantic Network to promote knowledge transfer and creation among batik designers.
Fig. 1. Knowledge Visualization Model [4] based on van Wijk's Model [14]
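The symbols surviving from Fig. 1 (D, S, V, I, P, K, E, dK/dt, dS/dt) come from van Wijk's economic model of visualization [14]. As a rough orientation for readers without access to the figure, the core relationships of that model can be written as follows; this restatement is ours, based on [14], and simplifies the extension in [4].

```latex
% van Wijk's visualization model [14], restated (approximate):
% a visualization V turns data D into an image I under a specification S,
% perception P turns the image into knowledge K,
% and exploration E lets accumulated knowledge update the specification.
\begin{align*}
  I(t)          &= V(D, S, t)   \\
  \frac{dK}{dt} &= P(I, K, t)   \\
  \frac{dS}{dt} &= E(K, t)
\end{align*}
```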
4 Research Method To achieve the research aim, we set out to acquire knowledge of the related vocabulary and of the cultural and heritage perceptions involved in developing ideas for batik designs. For generations this knowledge has been passed on through storytelling and mentoring. At the initial stage we attended knowledge-sharing sessions with two objectives: first, to build a social network with members of the batik-making community so as to allow informal discussions and interviews; and second, to understand the process, the communication style, and the issues currently found in batik-making practice. Subsequently, seven (7) cases of local batik designers were selected for observation and interviews. The processes of identifying, locating, profiling, and selecting them were conducted with assistance from the National Museum representative as custodian of batik textile knowledge, the custodian of research and development of batik artefacts in
Malaysia, Batik International Research and Design Access (BIRDA), and the research team. We grouped the case studies into apprentice cases (3), designer cases (3), and advocator cases (3). During observations and interviews, photographs of products and videos of processes were captured for qualitative analysis using Atlas.ti. The data collected for each case were analysed simultaneously, adapting the principles of grounded theory, and we analysed the explicit and implicit elements contributing to concepts and semantic relations [7]. Group sessions were conducted for knowledge validation. Knowledge of batik designing consists of concepts and relations related to the designer, the design artifact, and design aesthetics in performing the design process. Upon validation, a hybrid semantic network of definitional and implicational networks [7, 13] emerged as a result of linking concepts and entities (instances of concepts) with meaningful relations, as discussed in the next section.
5 Batik Knowledge Repository (KR) Semantic Network The Batik KR is intended for the visualization of knowledge: the semantic network navigates knowledge workers as they search for and retrieve information at the right time, in the right place, and within the context of their task and problem solving.
Fig. 2. Batik KR Architecture
Based on the case studies, we assert that the creative process of Malaysian batik designers is initiated by applying explicit and tacit knowledge to a specific design problem, resulting in the emergence of aesthetic elements that are constrained by the cultural context. The apprentices told stories of their design experience, embarking on their creative process by relying heavily on explicit knowledge acquired through learning and tacit knowledge acquired through mentoring; the designers elaborated stories of applying specific tacit knowledge gained through experience to a specific design problem; and the advocator group relied heavily on case studies and historical and heritage knowledge to guide the creative process. All the groups agreed that the conversion of tacit design knowledge into explicit formats such as sketches, models, and drawings is crucial in the creative process. These explicit expressions are a partial reflection of the maker's tacit design knowledge and provide a medium for realizing aesthetic elements.
The consistency between the participants' case stories and the concepts drawn from documentation indicates that the semantic network should visualize the concepts of finesse, uniqueness (exclusivity), and fitness (cultural constraint), which relate actively to the manifestation of the creative process, cultural traits, and aesthetic constraints. These emerging concepts are traits of a visual story; perceptions are therefore to be derived from a semantic interpretation of the visual story that the maker creates through the composition of motifs and colours. An associative relationship is also identified between the visual story and the designer's traits: sincerity and fidelity in plotting a visual story.
Design Artifact. The design artifact has several categories, namely the traditional design artifact and the contemporary design artifact. Contemporary design artifacts are divided into two known categories: the traditionally inspired and the abstract design. The design artifact has aesthetic elements, principles of arrangement (aesthetics), and aesthetic values.
Fig. 3. A definitional semantic network for design artifact
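The paper does not show how the definitional network of Fig. 3 is stored; one plausible, minimal encoding is as concept–relation–concept triples, sketched below. The class names and the specific relation labels are our illustrative assumptions, although the concepts themselves (the design artifact categories and aesthetic properties) come from the text.

```java
import java.util.*;

// Hypothetical sketch of a definitional semantic network stored as triples.
class Triple {
    final String subject, relation, object;
    Triple(String s, String r, String o) { subject = s; relation = r; object = o; }
    public String toString() { return subject + " --" + relation + "--> " + object; }
}

class DefinitionalNetwork {
    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String r, String o) { triples.add(new Triple(s, r, o)); }

    // All facts whose subject is the given concept (one hop of navigation).
    List<Triple> about(String concept) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples)
            if (t.subject.equals(concept)) result.add(t);
        return result;
    }

    public static void main(String[] args) {
        DefinitionalNetwork net = new DefinitionalNetwork();
        // Concepts taken from the text; relation names are assumptions.
        net.add("Design Artifact", "has-category", "Traditional");
        net.add("Design Artifact", "has-category", "Contemporary");
        net.add("Contemporary", "has-subcategory", "Traditionally Inspired");
        net.add("Contemporary", "has-subcategory", "Abstract");
        net.add("Design Artifact", "has", "Aesthetic Elements");
        net.add("Design Artifact", "has", "Principles of Arrangement");
        net.add("Design Artifact", "has", "Aesthetic Values");
        net.about("Design Artifact").forEach(System.out::println);
    }
}
```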
The design is the form and content: a sincere manifestation of the designer's conception, embedded with tacit and explicit knowledge and expressed with fidelity (passion and patience). The heritage is the visual story, the non-discursive manifestation of the designer's conception. The degree of clarity of the visual story can be classified into three clusters. The first cluster defines the visual story as a narrative (clear traceability of theme and arrangement). The second defines it as a story (possible traceability of theme or arrangement). The third is the chaotic-pattern cluster, with no traceability of theme or arrangement.
Designer. The designer is the creator of the design and the holder of the artistic idea [11]. As a member of the community of batik makers, the designer is directly involved in producing a design, playing the role of a mentor or an apprentice. The designer has a personality; good personality traits associated with designers are sincerity and fidelity, which imply positive aesthetics in their idea development.
Fig. 4. Instance of Semantic Network for Batik Traditionally Inspired Design
Fig. 5. Implicational Semantic Network of Creative Process
Design Aesthetics. Finesse (df) is a function of conception clarity (c), the degree of sincerity (s), and the degree of fidelity of the designer (fd):

df = k(c, s, fd), where c = f(t, l), s = g(o, p), fd = h(a, q, u)   (1)

where t: Theme, l: Storyline, o: Originality of Theme, p: Composition of Storyline, a: Theme Accuracy, q: Composition Adequacy, u: Technique Unity.
6 Conclusion To conclude, our aim of constructing a semantic network of creative process and design knowledge, based on knowledge acquired from local batik designers and locally produced batik images, for a Batik Knowledge Repository (KR) has been achieved. The initiative is intended to enable future work on knowledge visualization applications such as an e-Gallery or a Designer Workbench for batik designing. The Batik Knowledge Repository promotes a new approach to visualization that emphasizes relevant design knowledge. We began by developing the definitional networks for the batik design artifact to organize the relevant design knowledge, then built the implicational networks to visualize the creative process, and finally elaborated additional constraints pertaining to the semantic networks produced. In future work we intend to develop the relevant databases for the batik design knowledge vocabulary and to implement the knowledge repository as the structure underlying visualization applications such as the e-Gallery or Designer Workbench. We believe that, by using the Batik Knowledge Repository structure, visualizations will be able to enhance designers' existing tacit knowledge through an understanding of the explicit design knowledge and the knowledge implied by the creative process. The Batik Knowledge Repository and the visualization applications are significant efforts to encourage a shift in the designers' paradigm from cottage industry to the production of crafted artifacts by team effort in the era of the knowledge economy.
References 1. Boden, M.: The Creative Mind: Myths and Mechanisms. In: Weidenfeld, Nicolson (eds.) Cardinal, London, UK (1992) 2. Burkhard, R.A.: Towards a Framework and a Model for Knowledge Visualization. In: Tergan, S.-O., Keller, T. (eds.) Knowledge and Information Visualization. LNCS, vol. 3426, pp. 238–255. Springer, Heidelberg (2005) 3. Daley, J.: Design Creativity and the Understanding of Objects. In: Cross, N. (ed.) Developments in Design Methodology. Wiley, Chichester (1984) 4. Dong, H.J., Chang, R., Ribarsky, W.: An Alternative Definition and Model for Knowledge Visualization. In: IEEE Visualization 2008 Workshop on Knowledge Assisted Visualization (2008) 5. Fuchs-Kittowski, F., Fuchs-Kittowski, K.: Knowledge-intensive work processes for creative learning organizations Forschungszentrum Karlsruhe GmbH, Institut für Technologiefolgenabschätzung und Systemanalyse -ITAS-; VDI/VDE: Innovations for an e-Society. Challenges for Technology Assessment, Berlin (2001)
6. Kaufman, J.C., Sternberg, R.J.: The International Handbook of Creativity. Cambridge University Press, Cambridge (2006) 7. Khoo, C., Na, J.C.: Semantic Relations in Information Science. Annual Review of Information Science and Technology 40, 157–228 (2006) 8. Lubart, T.I.: Models of the Creative Process: Past, Present and Future. Creativity Research Journal 13(3-4), 295–308 (2001) 9. Marjanovic, O., Seethamraju, R.: Understanding Knowledge-Intensive, Practice-Oriented Business Processes. In: Proceedings of the 41st Hawaii International Conference on System Sciences, January 7–10, pp. 373–373 (2008) 10. Morita, J., Nagai, Y., Taura, T., Takeuchi, T.: A Study on Creativity in Comparison with Linguistic Interpretation. In: McNamara, D.S., Trafton, J.G. (eds.) Proceedings of the 29th Annual Cognitive Science Society, pp. 64–70. Cognitive Science Society, Austin, TX (2007), https://dspace.jaist.ac.jp/dspace/bitstream/10119/4034/1/32-2.pdf 11. Ogawa, T., Nagai, Y., Ikeda, M.: An Ontological Engineering Approach to Externalize Designers' Communication Style in Support of Artistic Idea Sharing. In: Proceedings of the International Association of Societies of Design Research (IASDR), on CD-ROM (2007) 12. Plsek, P.E.: Working Paper: Models for the Creative Process (1996) (accessed October 11, 2008), http://www.directedcreativity.com/pages/WPModels.html 13. Sowa, J.F.: Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole, Pacific Grove (2000) 14. van Wijk, J.J.: The value of visualization. In: Proc. IEEE Visualization 2005, pp. 79–86 (2005) 15. Wallas, G.: The Art of Thought. Harcourt Brace, New York (1926)
A Tool for Analyzing Categorical Data Visually with Granular Representation Kousuke Shiraishi, Kazuo Misue, and Jiro Tanaka Department of Computer Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan {shira,misue,jiro}@iplab.cs.tsukuba.ac.jp
Abstract. Categorical data appears in various places, and dealing with it has been a major concern in analysis fields. However, representing not only global trends but also local trends of data simultaneously by conventional techniques is difficult. We propose a visualization method called “granular representation” for analyzing categorical data visually. Our approach visually represents data as a set of objects and allows intuitive analysis instead of the traditional way with tables of numbers. We developed a tool by integrating granular representation and bar charts. The effectiveness of the tool is demonstrated using real data about media consumption. Keywords: categorical data, visualization, multi-dimensional analysis.
1 Introduction The visualization of categorical data occurs in various fields, such as mass media and marketing research. Visualizing categorical data has two purposes [1]. The first purpose is to present global trends or characteristics of data to the audiences of books, television, newspapers, and so on. In this case, the process of visualizing data is simple and straightforward because results are shown with conventional graphics, such as a bar chart or pie chart. The second purpose is to analyze data in detail for use in research. For this, analyzing both global and local trends of the data in further detail is necessary; in other words, discovering relations among two or more attributes is needed for multi-dimensional analysis. Generally, this visualization is performed with spreadsheet software such as Microsoft Excel or statistical tools such as the Statistical Package for the Social Sciences. However, analyzing intuitively from tables of numbers and letters is difficult, especially for casual users. We present a granular representation (GR) technique for analyzing categorical data visually. Our approach represents the individual entities in data separately as small circles. Users are able to drill down into the data and analyze both global and local trends interactively.
2 Analyzing Categorical Data This section describes the general process of analyzing categorical data. An overview of the process is shown in Fig. 1. The process is divided into two parts: "tabulation"
Fig. 1. Analyzing categorical data is divided into two parts: tabulation and visualization
and "visualization". "Tabulation" forms a table consisting of the distribution of categories. These tables are called cross tables or contingency tables. Initially, most categorical data is provided as a list in which each record corresponds to an individual entity. This raw data is transformed into a contingency table by counting the frequencies of the categories in the attributes. The counting of frequencies is performed automatically; for instance, it can be done in Excel by simply selecting the attributes of interest. The second part of the process is "visualization". As the name indicates, in this part the table made in the preceding "tabulation" part is represented by visualization techniques. Several types of visual representation are available; graphics such as bar charts or pie charts are the most common. Analysis is usually a recursive process that repeats "tabulation" and "visualization". In this process, as the number of attributes and categories increases, maintaining the correspondence among elements in charts and tables becomes difficult, especially for casual users. Sometimes, categorical data may also include an attribute that is difficult to categorize, for example comments included in questionnaires and census data. Although the overall tendency of the comments can be grasped by reading all of them in order, the important thing is to analyze the relations between comments and segments, such as "what segments of people make what comments". However, conventional graphics, such as a bar chart or pie chart, are only able to represent global data trends; analyzing both the global and the local side simultaneously is difficult.
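As a concrete illustration of the tabulation step, counting category pairs from raw records into a contingency table might look like the sketch below. The class names and sample data are our own, not from the paper; they simply mirror the sex/opinion example used later.

```java
import java.util.*;

// Hypothetical sketch of the "tabulation" step: raw records, each an individual
// entity with categorical attributes, are counted into a contingency table.
class Record {
    final Map<String, String> attributes = new HashMap<>();
    Record with(String attr, String category) { attributes.put(attr, category); return this; }
}

public class Tabulation {
    // Cross-tabulate two attributes: (category of attrA, category of attrB) -> frequency.
    static Map<String, Map<String, Integer>> crossTab(List<Record> records,
                                                      String attrA, String attrB) {
        Map<String, Map<String, Integer>> table = new TreeMap<>();
        for (Record r : records) {
            String a = r.attributes.get(attrA);
            String b = r.attributes.get(attrB);
            table.computeIfAbsent(a, k -> new TreeMap<>()).merge(b, 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        List<Record> records = Arrays.asList(
            new Record().with("sex", "female").with("opinion", "agree"),
            new Record().with("sex", "male").with("opinion", "disagree"),
            new Record().with("sex", "female").with("opinion", "undecided"),
            new Record().with("sex", "male").with("opinion", "agree"));
        System.out.println(crossTab(records, "sex", "opinion"));
        // {female={agree=1, undecided=1}, male={agree=1, disagree=1}}
    }
}
```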
3 Related Work Many techniques have been developed to visualize categorical data efficiently. There are two approaches: frequency-based and quantification. Frequency-based techniques represent the frequency values in a contingency table by the size or length of figures. Mosaic Display [2] represents the frequency values with rectangles whose areas are proportional to them and lays out the rectangles like tiles recursively. Cattrees [3] is also a frequency-based visualization and uses Treemap [4]. These techniques are effective for visualizing a couple of attributes; with more than three attributes, however, the visualizations become complicated.
Quantification techniques transform categorical data into graphical representations of quantitative data. Rosario et al. [7] proposed a quantification technique for categorical data and presented an example of visualization using parallel coordinates. Johansson et al. [8] suggested a quantification process for mixed data that contains categorical and continuous variables. Some techniques use parallel coordinates: Hammock Plot [5] maps frequency values to line widths using parallel coordinates, and Parallel Sets [6] combines frequency-based techniques with parallel coordinates and offers interactive analysis. Dust&Magnet [9] is a multivariate visualization technique for quantitative data. It represents each data point as a dot and uses a magnet metaphor for intuitive analysis; the magnitudes of the magnets correspond to the attribute values of each data point, and attracting data points with magnets spatially separates the data points according to their attribute values. Similar to Dust&Magnet, our proposed GR technique represents individual records as visual elements. However, Dust&Magnet represents the differences in attribute values by the distances between data points, whereas GR represents the quantitative differences among collectives. Thus, several interactions and the interfaces of the tools differ.
4 Granular Representation Granular representation is the visualization technique we propose. Categorical data, due to its nature, can be thought of as sets of objects. For instance, questionnaires or census data consist of people's opinions. The basic idea of our approach is to represent that nature visually and effectively, instead of as tables, and to enable users to tabulate and visualize intuitively. Consider an example of a contingency table that consists of two attributes, "sex" and "opinion." The "sex" attribute has two categories, "male" and "female," and the opinion attribute has three categories, "agree," "disagree," and "undecided." An example of visualizing this contingency table with GR is shown in Fig. 2. Small white or gray circles represent individual entities; that is, each circle can be thought of as a person in this example. These circles are called elements. Text labels represent the categories of an attribute, e.g., "opinion." In this way, GR displays the frequency value of each cell as small circles, like grains. Users are able to see information about each individual entity (element) by hovering the cursor over the element. See the rectangle on the right side of Fig. 2: it contains information about the element at the far left, away from the others, and shows that this element is female and her opinion is "agree." We tend to perceive spatially close objects as groups; this has been theorized as the law of proximity by psychologists. If we look at the GR on the right side of Fig. 2, we immediately see that there are three collectives of elements, and the labels reveal their categories. In this figure, the collective at the top consists of people with an "undecided" opinion, the one on the left of those who "agree", and the one on the right of those who "disagree". Color is also good for representing categories of elements. Elements can easily be seen as groups if they have the same properties, such as color, shape, or texture; this perception is called the law of similarity. The elements in Fig. 2 are colored by their "sex" attributes: white represents females and black represents males. With color, the "sex" attribute is identified without a label.
Fig. 2. Contingency table (left) represented by granular representation (right)
5 Interactions We explained the basis of GR in the previous section. Although the representations shown here are static images, in the tool the formation of the element layout is animated. In this section, we introduce various interaction techniques used to control elements. Users can drill down into the data interactively using these techniques. 5.1 Categorization by Label Users divide, i.e., "categorize", elements into collectives to drill down into the data. In Fig. 2, elements are categorized into three collectives. Categorization of elements is performed using a label. Elements from the contingency table in Fig. 2 are represented by GR on the left side of Fig. 3. The elements are all collected together, so the female and male elements are mixed. To separate the male elements from this collective, the user drags the "male" label, and the elements having the "male" category follow because they are attracted towards the label (right side of Fig. 3). After the dragging stops, the elements are categorized into two collectives; the rightmost one is male and the other is female. Dust&Magnet supports a similar interaction, using this feature as the attraction of magnets; in contrast, GR uses it for the categorization of elements.
Fig. 3. When the “male” label is dragged, elements having the “male” category are attracted towards the label
5.2 Controlling Attraction of Label Dragging a category label attracts all the elements that have that category. However, categorizing elements by the relation of two or more dimensions with this method can be inconvenient. For instance, consider three collectives divided by the “opinion”
Fig. 4. (a) Attraction of a label without a line. (b) Attraction of a label using a line.
attribute, such as in Fig. 2. In this case, if the user drags the "female" label, elements that have the "female" category will be attracted from each of the three collectives, and the attracted elements will then collect together (Fig. 4(a)). To control the attraction of a label, the user draws a line to separate off the collectives he or she does not want to be affected by the attraction. In Fig. 4(b), the line separating the two collectives on the right side from the label ensures that the attraction does not affect those collectives; the label attracts elements only from the collectives on the left side of the line. In this way, a line allows users to drill down by categorizing elements according to the relations among more than two attributes. 5.3 Clusters Attributes usually contain many categories. Categorizing by labels is then laborious because each category label must be dragged separately. Clustering automatically divides elements that have the same categories of the selected attributes, so the labels do not have to be controlled manually. Clustering by two attributes is shown in Fig. 5. The elements are divided into six clusters, and a glance at the figure reveals the differences in the number of elements at once. Clustering supports users in discovering data trends in advance of categorizing by label.
Fig. 5. Elements automatically clustered into six collectives by their categories
5.4 Merging Because our approach represents each individual entity, visualizations become more complicated as the number of data entities grows much larger than in other approaches. To
deal with data containing a large number of individuals, GR has a "merging" interaction that represents multiple entities as one element whose size is proportional to the number of entities. An example of a merging interaction is shown in Fig. 6. The 60 elements on the left side merge into 30 elements (center) and finally into 15 elements; that is, in each merging interaction, two elements merge into one. By reducing the total number of elements through merging, GR can deal with data containing many records. Because a merged element contains several entities, its individual information cannot be seen by hovering the cursor over it; however, users can divide merged elements back into their former states. Merging and dividing can be switched between freely, so users can merge when analyzing the global side and divide to see individual information.
Fig. 6. Two elements merge into one element of proportional size
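Fig. 6 shows a merged element growing with the number of entities it stands for, but the paper does not specify the scaling rule. One natural choice, sketched below purely as an assumption, is to keep the element's area proportional to its count so that visual size comparisons remain honest.

```java
// Hypothetical sketch: size a merged element so that its *area* is proportional
// to the number of entities it represents (radius grows with the square root).
class MergedElement {
    final int count;            // number of underlying entities
    final double baseRadius;    // radius of a single, unmerged element

    MergedElement(int count, double baseRadius) {
        this.count = count;
        this.baseRadius = baseRadius;
    }

    double radius() {
        return baseRadius * Math.sqrt(count);   // area proportional to count
    }
}
```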
6 Construction and Implementation of Tool In this section, we describe the interface of the tool developed with our approach. In Section 2, we explained that the process of analyzing categorical data is roughly divided into two processes, "tabulation" and "visualization". Categorization by label or clustering corresponds to the "tabulation" process, and comparing the number of elements among collectives corresponds to the "visualization" process. Although GR supports both tabulation and visualization, a bar chart is more suitable for comparing values, so we designed the interface of the tool by integrating GR and a bar chart. A screenshot of the tool, which is divided into two views, is shown in Fig. 7. On the left is a settings and chart view, where users can control settings and where data are represented by bar charts. On the right is the main view, where data are represented by GR; the user mainly works in this view. To integrate these two views, linking and brushing techniques are applied in the tool. Linking connects the common elements between two different visualizations: between the left and right views, the bar charts and the colors of elements are linked. Brushing summarizes a subset of the elements: users can make bar charts of part of the whole by selecting the desired elements. 6.1 Drawing and Layout of Elements The layout of the elements is governed by a simple physical model, and their motion is animated. There are two forces: a repulsive force among elements to
avoid occlusion and an attractive force to attract elements by labels. After the elements are dragged by their labels, they spread out in a circle due to the effects of their repulsive forces. Seeing elements as a large clustered circle helps users to compare numbers. For example, Fig. 7 shows two clusters, the lower one larger than the upper one. The attractive force of a label changes according to the distance between the label and elements. Thus, elements are attracted by a strong force when the distance is long.
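The force model just described, pairwise repulsion between elements plus attraction toward a dragged label that grows with distance, can be sketched roughly as follows. The constants, the inverse-square repulsion, and the simple Euler update are our assumptions; the paper does not give these details.

```java
import java.util.List;

// Hypothetical sketch of the layout forces described in Sect. 6.1:
// elements repel each other to avoid occlusion, and a dragged label
// attracts its matching elements more strongly the farther away they are.
class Element {
    double x, y;        // position
    double fx, fy;      // accumulated force for this step
    String category;    // category of the attribute being dragged
    Element(double x, double y, String category) { this.x = x; this.y = y; this.category = category; }
}

class Layout {
    static final double REPULSION = 500.0;   // assumed constant
    static final double ATTRACTION = 0.05;   // assumed constant (force grows with distance)
    static final double STEP = 0.1;          // integration step

    static void step(List<Element> elements, double labelX, double labelY, String labelCategory) {
        for (Element e : elements) { e.fx = 0; e.fy = 0; }

        // Pairwise repulsion (inverse-square falloff, an assumption).
        for (int i = 0; i < elements.size(); i++) {
            for (int j = i + 1; j < elements.size(); j++) {
                Element a = elements.get(i), b = elements.get(j);
                double dx = a.x - b.x, dy = a.y - b.y;
                double d2 = dx * dx + dy * dy + 0.01;   // avoid division by zero
                double d = Math.sqrt(d2);
                double f = REPULSION / d2;
                a.fx += f * dx / d; a.fy += f * dy / d;
                b.fx -= f * dx / d; b.fy -= f * dy / d;
            }
        }

        // Attraction toward the dragged label, proportional to distance,
        // applied only to elements that carry the label's category.
        for (Element e : elements) {
            if (!e.category.equals(labelCategory)) continue;
            e.fx += ATTRACTION * (labelX - e.x);
            e.fy += ATTRACTION * (labelY - e.y);
        }

        // Simple Euler update of positions.
        for (Element e : elements) { e.x += STEP * e.fx; e.y += STEP * e.fy; }
    }
}
```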
7 Walkthrough Here we show an example of analysis using the tool. The data set used in this section is a media consumption survey for April to May, 2006 [10]. We chose eight attributes from the data set: two basic attributes, “sex” and “age”; four questionnaire attributes about the consumption of media, such as television, radio and the Internet; and two attributes with unique variables being comments about liking or disliking newspapers. In this example, we analyze “the difference between people who read newspapers and those who do not.” As the data set is read, all the individual records are represented as gray circles in the main view. By default, there are no bar charts on the right. Next, the user divides the elements into two collectives: one reads newspapers and the other does not. To divide the elements, the user chooses the “yes” and “no” labels of the “Do you happen to read any daily newspaper or newspapers regularly?” attribute from the left view. The “yes” and “no” labels then appear in the main view.
Fig. 7. Screenshot of tool
Fig. 8. Elements are divided by ages. Upper bar charts represent values of 60s, 70s and 80s collective, and lower ones represent values of 20s and 30s collective.
As the user drags these labels, the elements are attracted towards their relevant labels and divide into two collectives. To analyze the difference of these two collectives, the user colors each collective differently and then selects each collective to make bar charts (Fig. 7). The chart view in Fig. 7 shows there could be a difference in opinion by “age”. Older people read newspapers more often than younger people, except people under 20 and those in their 90s. To analyze this trend in detail, the user further divides the elements by age. Here, the user obtains four collectives divided by age. Figure 8 shows that the elements are divided into four collectives and all the merged elements are released. The upper chart view shows the frequencies of the 20s and 30s collective and the lower one shows the frequencies of the 60s, 70s, and 80s collective. From the bar charts on the left, many differences between the two collectives can be seen. For instance, take the attribute “Do you happen to watch TV news programs regularly?” The “yes” rates of the 60s, 70s, and 80s collective are higher than those of the 20s and 30s collective. In other words, senior people get news from the TV more frequently than the young. In addition, it is possible to analyze the local trends between these two collectives by seeing the comments for each collective.
8 Discussion Having analyzed several data sets with the tool, we believe that GR has three advantages.
The first advantage is the simultaneous representability of both the quantitative and qualitative sides of data. The quantitative side means the global trends of the data, i.e., analyzing the frequency distribution by comparing the numbers of elements among the collectives. The qualitative side means the local trends. The user can analyze this by hovering the cursor over elements and seeing individual information. Traditional visualization techniques, such as a bar chart or pie chart, are only able to represent the global side of data. For instance, in the previous walkthrough, the elements were divided into five collectives by the age attribute. Seeing the individual information for each collective uncovers trends between segments and unique variables such as comments. The second advantage is the ability to represent absolute values. Frequency-based techniques represent the rates of two or more variables by comparing shape size or length of the representative figure. Although these techniques are suitable for comparing quantities, they cannot represent the absolute values of numbers. In contrast, GR represents individual entities as individual, independent elements; therefore, absolute values can be compared. The third advantage is intuitiveness. Most traditional analysis tools need special knowledge to use. In particular, casual users require a lot of time to understand the tools. GR visually represents the structure of data, and the interactive control of animated elements enables users to analyze the data intuitively.
9 Conclusion and Future Work We proposed granular representation (GR), a visualization method for analyzing categorical data, and several interactions that fit our approach. We designed an interface for a tool by integrating charts and GR to improve the effectiveness of both representations. An analysis example using real data was presented. Although we discussed the advantages of GR in the previous section, the validity of these advantages has not yet been evaluated. For our next work, we plan to evaluate the effectiveness and validity of the advantages.
References 1. Friendly, M.: Visualizing Categorical Data. Sas Inst. (2000) 2. Friendly, M.: Mosaic displays for multi-way contingency tables. American Statistical Association 89(425), 190–200 (1994) 3. Kolatchm, E., Weinstein, B.C.: Dynamic visualization of categorical data using treemaps (2001), http://www.cs.umd.edu/class/spring2001/cmsc838b/Project/ Kolatch_Weinstein/index.html 4. Johnson, B., Shneiderman, B.: Treemaps: A Space-Filling Approach to the Visualization of Hierarchical Information Structures. In: Proceedings of IEEE Information Visualization 1991, pp. 275–282 (1991) 5. Schonlau, M.: Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots. In: Proceedings of the Section on Statistical Graphics, American Statistical Association (2003)
6. Card, S.K., Mackinlay, J.D., Shneiderman, B.: Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann Pub., San Francisco (1999) 7. Yi, J.S., Ponder, R.M., Stasko, J., Jacko, J.: Dust & Magnet: multivariate information visualization using a magnet metaphor. Information Visualization 4, 239–256 (2005) 8. Johansson, S., Jern, M., Johansson, J.: Interactive Quantification of Categorical Variables in Mixed Data Sets. In: Proceedings of IEEE International Conference on Information Visualization (IV 2008), pp. 3–10 (2008) 9. Rosario, G.E., Rundensteiner, E.A., Brown, D.C., Ward, M.O., Huang, S.: Mapping nominal values to numbers for effective visualization. In: Proceedings of the IEEE Symposium on Information Visualization 2003 (INFOVIS 2003), pp. 80–95 (2003) 10. Biennial Media Consumption 2006, Pew Research Center Data Archive (2006), http://people-press.org/dataarchive/
Understanding Key Attributes in Mobile Service: Kano Model Approach Seung Ik Baek1, Seung Kuk Paik2, and Weon Sang Yoo3 1
School of Business, Hanyang University, Seoul, Korea [email protected] 2 Systems & Operations Management Dept., California State University, Northridge, USA [email protected] 3 School of Business, Hanyang University, Seoul, Korea [email protected]
Abstract. This study investigated how customers perceive currently available 3G mobile services. More specifically, by using the Kano model, it tried to categorize them into five quality attributes: Attractive, One-Dimensional, Must-Be, Indifferent, and Reverse. The results showed that picture messaging, instant messaging, navigational aid, and mobile internet are considered as "one-dimensional quality attributes". That is, the higher the level of fulfillment of these mobile services, the higher the customer's satisfaction, and vice versa. Keywords: Multimedia Mobile Service, 3G Technology, Kano Model.
1 Introduction The mobile Internet can be defined as a combination of Internet and mobile technology that allows users to access the Internet via mobile devices such as cellular phones and PDAs [8]. Since the mobile Internet can be accessed anytime and anywhere, users can use it in various contexts, whereas users of the traditional (wired) Internet mostly use it in limited contexts such as the office or home. Compared to the traditional (wired) Internet, the mobile Internet therefore has greater potential to create opportunities for delivering advanced services to users. Specifically, 3G technology enables mobile users to access various multimedia services across mobile networks at high speed [9]. Mobile users could already use Internet-like services through the Wireless Application Protocol (WAP) on 2G mobile networks, but 2G technology provided very limited support for data services due to its low data transfer rate. Since the market for voice communications has reached maturity, telecommunication companies must develop new businesses and markets [6] [10]. Although the growing penetration of 3G-based mobile devices has rapidly increased the number of mobile Internet users, the main uses of the mobile Internet still stem from 2G-based data services such as Short Messaging Service (SMS), ring tone/avatar download services, and 2D stand-alone games [1]. Telecommunication companies have introduced many innovative multimedia services, but only early adopters use these pioneering services, and the adoption rate of mobile services is much slower than that of traditional Internet services. The 3G-based mobile
services stimulate the curiosity of early adopters, but they cannot fully satisfy the high expectations and varied needs of the majority [8]. Although users have 3G-enabled mobile devices, they do not see an urgent need for 3G mobile services. In order to commercialize mobile services successfully, telecommunication companies need to discover customers' needs and to develop attractive services that satisfy them.
2 Research Objectives Traditional studies on service quality proposed a one-dimensional quality model. This model assumes a linear relationship between the fulfillment of customers' needs and their experience of satisfaction or dissatisfaction: if perceived service quality exceeds customers' expectations, the result is customer satisfaction; if not, the result is customer dissatisfaction [13]. One criticism of the one-dimensional model is that it simplifies the real world too much [5]. For example, the communication quality of mobile phones may have affected customers' satisfaction levels in the past, but it may no longer do so; if the communication quality has problems, however, it still causes dissatisfaction. Communication quality is thus a service attribute that causes dissatisfaction, not satisfaction. In other words, the service attributes that cause customer satisfaction differ from those that cause customer dissatisfaction.

Table 1. Service Attributes of Kano Model
Attractive Attribute: Its fulfillment brings more than proportional satisfaction, but it does not bring dissatisfaction if it is not met.
One-Dimensional Attribute: The higher the level of fulfillment, the higher the customer satisfaction, and vice versa.
Must-Be Attribute: If its fulfillment is insufficient, customers will be dissatisfied. Even if its fulfillment is sufficient, it will not cause customer satisfaction.
Indifferent Attribute: Its fulfillment brings neither satisfaction nor dissatisfaction.
Reverse Attribute: Its fulfillment causes dissatisfaction.
Source: Kano et al., 1984
To address this limitation, Kano suggested a two-dimensional model [7]. He emphasizes that satisfaction and dissatisfaction are two independent concepts and should be considered independently. Service attributes related only to satisfaction levels are defined as "attractive service attributes", those related only to dissatisfaction levels as "must-be service attributes", and those that can cause both satisfaction and dissatisfaction as "one-dimensional attributes". In addition, service attributes that customers do not care about, whether present or not, are defined as "indifferent service attributes". [Table 1] explains the service attributes of the Kano model. Companies introduce various mobile services, but customers appreciate the value of only a few of them and may even ignore most of them. It is therefore critical to identify the services that result in satisfaction, dissatisfaction, or no interest: companies need to allocate their resources to improving the quality of services that affect customer satisfaction and to eliminating services that cause dissatisfaction. In order to develop strategies for mobile services, telecommunication companies need to know which services cause satisfaction, dissatisfaction, or no interest.
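The methodology section below classifies each service from a pair of answers to a functional and a dysfunctional question, each answered Satisfied, Neutral, or Dissatisfied. As an illustration of how such an answer pair maps to a Kano category, here is a small sketch; the mapping follows the usual Kano evaluation logic simplified to the three-point scale used in this study, and it is our assumption, not the authors' evaluation table.

```java
// Hypothetical sketch: classifying one respondent's answers about one service
// into a Kano category. "Questionable" covers contradictory answer pairs and
// is the sixth category mentioned in the methodology.
enum Answer { SATISFIED, NEUTRAL, DISSATISFIED }
enum KanoCategory { ATTRACTIVE, ONE_DIMENSIONAL, MUST_BE, INDIFFERENT, REVERSE, QUESTIONABLE }

class KanoClassifier {
    static KanoCategory classify(Answer functional, Answer dysfunctional) {
        if (functional == Answer.SATISFIED && dysfunctional == Answer.SATISFIED)
            return KanoCategory.QUESTIONABLE;     // contradictory answers
        if (functional == Answer.SATISFIED && dysfunctional == Answer.DISSATISFIED)
            return KanoCategory.ONE_DIMENSIONAL;  // presence pleases, absence hurts
        if (functional == Answer.SATISFIED)       // dysfunctional == NEUTRAL
            return KanoCategory.ATTRACTIVE;       // presence pleases, absence is tolerated
        if (dysfunctional == Answer.SATISFIED)
            return KanoCategory.REVERSE;          // presence actually annoys the customer
        if (functional == Answer.NEUTRAL && dysfunctional == Answer.DISSATISFIED)
            return KanoCategory.MUST_BE;          // presence is expected, absence hurts
        if (functional == Answer.NEUTRAL && dysfunctional == Answer.NEUTRAL)
            return KanoCategory.INDIFFERENT;      // customer does not care either way
        return KanoCategory.QUESTIONABLE;         // remaining contradictory pairs
    }

    public static void main(String[] args) {
        // Satisfied if mobile internet is available, dissatisfied if it is not
        // -> one-dimensional attribute.
        System.out.println(classify(Answer.SATISFIED, Answer.DISSATISFIED));
    }
}
```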
Based on the discussion above, this study has the following research objectives.
Research Objective 1) The first objective is to discover how customers perceive 3G mobile services, based on the Kano model.
Research Objective 2) The second objective is to understand how 3G mobile services are perceived differently by experienced users and non-experienced users.
3 Research Methodology

The main purposes of this study are to explore how customers perceive 3G mobile services and what factors affect these perceptions. In order to answer these questions, we used the Kano questionnaire. In general, there are three steps to develop and administer the Kano questionnaire: identify service attributes, develop the Kano questionnaire, and determine service attributes [13].

Step 1: Identify Service Attributes
Many studies have examined the adoption patterns of mobile services. Carlsson et al. (2005) grouped 13 mobile services into four categories (communication, entertainment, reservations and purchases, and information) and investigated their usage levels [4]. Nysveen et al. (2005) identified four major mobile services (text messaging services, contact services, payment services, and gaming services) and examined user intentions toward these services [11]. The 2005/2006 National Technology Readiness Survey1 examined the usage of 17 mobile services that can be enhanced by 3G technology. [Table 2] summarizes these classifications of mobile services. The first step in developing the Kano questionnaire is to identify the service attributes that will be grouped into the five Kano attributes. Based on the mobile services that the above studies examined and that mobile operators currently provide, this study identifies 12 mobile services (refer to Table 2).

Step 2: Develop the Kano Questionnaire
The Kano questionnaire consists of pairs of customer requirement questions: how do you feel if a feature is present (functional question), and how do you feel if the same feature is not present (dysfunctional question)? Depending on customers' responses to the two types of questions, we determine whether a specific mobile service is an attractive attribute, a one-dimensional attribute, a must-be attribute, an indifferent attribute, or a reverse attribute. [Figure 1] shows a part of the Kano questionnaire. A customer answers by choosing one of three alternatives (Satisfied, Neutral, Dissatisfied) for both the functional and the dysfunctional questions.

Step 3: Determine Service Attributes
Based on the responses to the functional and dysfunctional questions, each mobile service is classified into one of the six Kano categories.
“2005/2006 National Technology Readiness Survey,” http://www.rhsmith.umd.edu/ces/pdfs_docs/NTRS-2005-06.pdf
Table 2. Mobile Services

Nysveen et al. (2005): Text Messaging; Contact; Payment; Gaming.
Carlsson et al. (2005): Communication (SMS, MMS, Mobile Email); Entertainment (Ring Tones, Icons and Logos, Listening to Music, Games); Reservations and Purchases (M-Banking, Payment, Reservation of Movie Tickets etc., Making Reservations, Purchasing Flight/Train Tickets, Shopping); Information (Personalized Information Messages, Internet Browsing, Checking Time Tables, Location Based Services).
National Technology Readiness Survey: Text messaging; Mobile Web or Internet; Send and receive e-mail; Picture messaging; Bluetooth technology, which provides a wireless connection between devices over a short distance; Play MP3 or other music files uploaded from your computer or another device you own; Broadband Internet access; Use GPS (Global Positioning technology) to get directions or navigate to a certain address or place of interest; Download audio content over the air; Video messaging; Listen to live radio programming over the air; Streaming music content over the air; Watch video content uploaded from your computer or another device you own; Watch live video programming over the air; View streaming video content over the air; Participate in live video conferences; Download video content over the air onto your mobile phone or device.
This Study: Picture Messaging; Mobile Shopping; Instant Messaging; Video Conferencing; Radio on Demand; Mobile TV; Video on Demand; Navigational Aid; Mobile Games; Mobile Banking; Mobile Internet; My Space (Social Networking).
Videoconferencing Service allows you to see and hear the person you are talking to in real time, and they will see and hear you.
Have you used Videoconferencing Service (Yes/No)? [ ]
If your cell phone HAS Videoconferencing service, how do you feel? [ ] 1. I am SATISFIED. 2. I am NEUTRAL. 3. I am DISSATISFIED.
If your cell phone DOES NOT HAVE Videoconferencing service, how do you feel? [ ] 1. I am SATISFIED. 2. I am NEUTRAL. 3. I am DISSATISFIED.
Fig. 1. Kano Questionnaire
If the customer answers, for example, "I am satisfied" to "if you can order cinema tickets online" – the functional form of the question – and "I am neutral" to "if you cannot order cinema tickets online" – the dysfunctional form of the question – the combination of the answers in the evaluation table produces category I, indicating that "online ticket purchase" is an indifferent requirement from the customer's viewpoint (refer to Figure 2): he/she does not care whether it is present or not.

Step 4: Administer the Questionnaire
In order to improve the quality of responses, we conducted group interviews (7~10 respondents) instead of simply distributing and collecting the questionnaires.
Fig. 2. Interpretation of Kano Questionnaire
Since the Kano questionnaire was new to the respondents, we explained the purposes of this study and gave detailed directions on filling out the questionnaire. Based on the Kano questionnaire, we asked the two types of questions for each 3G mobile service. For this study, we selected university students as our respondents. The 83 responses selected for data analysis came from 44 women and 39 men, and most respondents were between 20 and 25 years old.
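As a concrete illustration of Step 3, the short Python sketch below classifies one respondent's functional/dysfunctional answer pair into a Kano category. The 3x3 mapping in the code is only a plausible collapse of the commonly used Kano evaluation table and is an assumption on our part; the mapping actually applied in this study is the one defined in Figure 2, which may differ in some cells.

# Hypothetical sketch: classify one (functional, dysfunctional) answer pair.
# The EVALUATION table below is an assumed 3x3 version of the usual Kano
# evaluation table; the study's own mapping is the one shown in Fig. 2.
EVALUATION = {
    ("Satisfied",    "Satisfied"):    "S",  # skeptical/questionable
    ("Satisfied",    "Neutral"):      "A",  # attractive (Fig. 2 may differ here)
    ("Satisfied",    "Dissatisfied"): "O",  # one-dimensional
    ("Neutral",      "Satisfied"):    "R",  # reverse
    ("Neutral",      "Neutral"):      "I",  # indifferent
    ("Neutral",      "Dissatisfied"): "M",  # must-be
    ("Dissatisfied", "Satisfied"):    "R",  # reverse
    ("Dissatisfied", "Neutral"):      "R",  # reverse
    ("Dissatisfied", "Dissatisfied"): "S",  # skeptical/questionable
}

def classify(functional: str, dysfunctional: str) -> str:
    """Map one answer pair to a Kano category code."""
    return EVALUATION[(functional, dysfunctional)]

# Example: satisfied when the service is present, dissatisfied when absent
print(classify("Satisfied", "Dissatisfied"))  # -> "O" (one-dimensional)

One respondent's category vote per service can then be tallied across the whole sample to obtain the percentages reported in the next section.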
4 Preliminary Data Analysis

This study examined the current usage status of the 12 mobile services that will be enhanced by 3G technology in the future. The results show that the most common mobile services used by US university students are picture messaging (87%) and mobile games (81%). Since the respondents were university students, they tended to use their cellular phones for personal rather than work or business purposes. In the same data sample, the least used mobile services were mobile banking (10%) and video conferencing (10%). In addition, due to the limited availability of phones with cutting-edge features, very few respondents used radio/video on demand services (11%). This result indicates that the diffusion and adoption of advanced mobile services, such as videoconferencing, streaming video/audio, and broadcasting, have not progressed as expected. Although more advanced 3G mobile services are available, most respondents used a limited set of mobile services, such as mobile messaging services and mobile games, which they can use without 3G technology.

Research Objective 1) The first objective is to discover how customers perceive 3G mobile services, based on the Kano model.

[Table 3] summarizes the attribute classification for each of the 12 mobile services. The simplest method to group the mobile services is evaluation and interpretation based on the frequency of responses. However, because in some cases the largest and the second largest percentages for a given service do not differ much, we need to test whether the difference between their proportions is significant.
Table 3. The Percentage of Each Mobile Service Attribute of Kano Quality

Service | S (%) | A (%) | O (%) | R (%) | I (%) | M (%) | P-value | Category
Picture Messaging | 1.2 | 24.1 | 46.99 | 4.82 | 15.66 | 7.23 | 0.0000* | O
Mobile Shopping | 1.2 | 21.69 | 3.61 | 10.84 | 57.83 | 4.82 | 0.0000* | I
Instant Messaging | 2.41 | 19.28 | 48.19 | 4.82 | 25.3 | 0 | 0.0000* | O
Video Conferencing | 2.41 | 36.14 | 8.43 | 10.84 | 39.76 | 2.41 | 0.2386 | C
Radio on Demand | 3.61 | 31.33 | 12.05 | 3.61 | 46.99 | 2.41 | 0.0016* | I
Mobile TV | 2.41 | 30.12 | 9.64 | 13.25 | 39.76 | 4.82 | 0.0700 | C
Video on Demand | 0 | 30.49 | 10.98 | 13.41 | 45.12 | 0 | 0.0034* | I
Navigation | 2.41 | 26.51 | 45.78 | 2.41 | 20.48 | 2.41 | 0.0012* | O
Mobile Game | 1.2 | 33.73 | 38.55 | 7.23 | 14.46 | 4.82 | 0.0722 | C
Mobile Banking | 0 | 26.51 | 14.46 | 14.46 | 42.17 | 2.41 | 0.0024* | I
Mobile Internet | 2.41 | 27.71 | 50.6 | 2.41 | 13.25 | 3.61 | 0.0000* | O
My Space | 0 | 26.51 | 15.66 | 13.25 | 40.96 | 3.61 | 0.0004* | I

S: Skeptical Evaluation, A: Attractive Attribute, O: One-Dimensional Attribute, M: Must-Be Attribute, I: Indifferent Attribute, R: Reverse Attribute, C: Combination. *: p<0.01
For example, regarding mobile games, the largest percentage of respondents perceives them as a one-dimensional attribute (38.55%) and the second largest percentage perceives them as an attractive attribute (33.73%), so it is hard to conclude that mobile games are a one-dimensional service attribute. In this study, we therefore selected the two dominant categories for each mobile service and tested for a significant difference between the two proportions using a z-test. When we could not find a significant difference, we categorized the mobile service as a "combination", which means that no definite classification can be made.
The results show that picture messaging, instant messaging, navigational aid, and mobile Internet are considered "one-dimensional attributes". That is, the higher the level of fulfillment of these mobile services, the higher the customer's satisfaction, and vice versa. Typically, customers explicitly ask for these one-dimensional attributes. However, the analysis shows that more advanced and newer mobile services, such as mobile shopping, radio on demand, video on demand, mobile banking, and My Space, were perceived as indifferent attributes. These mobile services affect neither customer satisfaction nor dissatisfaction. This result suggests that telecommunication companies need to invest their efforts and resources in enhancing one-dimensional attributes rather than indifferent attributes.
In order to calculate the average impact of a mobile service on customers' satisfaction and dissatisfaction, Berger et al. (1993) suggest an index, the customer satisfaction (CS) coefficient [2]. It tells whether satisfaction can be increased by meeting a requirement, or whether fulfilling the requirement only prevents customers from being dissatisfied. The CS coefficient indicates how strongly a mobile service may cause satisfaction or, if unfulfilled, customer dissatisfaction. The CS coefficient can be calculated by the following equations.

Better Index = (A+O)/(A+O+M+I)
Worse Index = -(O+M)/(A+O+M+I)
A positive better index indicates that customer satisfaction will increase when the mobile service is provided, and a negative worse index means that customer satisfaction will decrease when the service is not provided. The closer the better index is to 1, the greater the influence on customer satisfaction; the closer the worse index is to -1, the greater the influence on customer dissatisfaction. [Table 4] summarizes the better and worse indexes for the 12 mobile services.
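The two indexes can be computed directly from the number of responses in each Kano category. The Python sketch below illustrates the calculation; the counts used in the example are back-derived from the picture messaging percentages in Table 3 (83 respondents), so treat them as an approximate reconstruction rather than the raw data.

# Minimal sketch of the CS coefficients (Berger et al., 1993).
def cs_coefficients(counts: dict) -> tuple:
    """counts maps the Kano categories A, O, M, I to response counts."""
    a, o, m, i = (counts.get(k, 0) for k in ("A", "O", "M", "I"))
    total = a + o + m + i
    better = (a + o) / total    # contribution to satisfaction
    worse = -(o + m) / total    # contribution to dissatisfaction
    return better, worse

# Counts reconstructed from the picture messaging row of Table 3:
better, worse = cs_coefficients({"A": 20, "O": 39, "M": 6, "I": 13})
print(round(better, 2), round(worse, 2))  # -> 0.76 -0.58, matching Table 4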
Table 4. Better Index vs. Worse Index

Service | Better Index | Worse Index
Picture Messaging | 0.7564 | -0.5769
Mobile Shopping | 0.2877 | -0.0958
Instant Messaging | 0.7273 | -0.5195
Videoconferencing | 0.5138 | -0.1250
Radio on Demand | 0.4676 | -0.1559
Mobile TV | 0.4714 | -0.1714
Video on Demand | 0.4789 | -0.1268
Navigation | 0.7595 | -0.5063
Mobile Game | 0.7894 | -0.4737
Mobile Banking | 0.4789 | -0.1972
Mobile Internet | 0.8228 | -0.5696
My Space | 0.4862 | -0.2222
For instance, a good picture messaging service, with a positive CS coefficient of 0.76, brings more than proportional satisfaction, while a bad picture messaging service, with a negative CS coefficient of -0.58, increases dissatisfaction. The picture messaging service is thus the most important factor in customer satisfaction and dissatisfaction. For mobile shopping, even a poor service does not cause much dissatisfaction (worse index = -0.10). Picture messaging, navigation, and mobile Internet have relatively high better indexes and worse indexes, meaning that, among the 12 mobile services, these three are considered the most important services.

Research Objective 2) The second objective is to understand how 3G mobile services are perceived differently by experienced users and non-experienced users.

Second, this study explores the perception differences between customers who have used the services and customers who have not. Unlike the analysis for the first research objective, this analysis simply counted the number of occurrences without statistical testing, due to the small sample sizes. One of the important issues in a Kano analysis is the evaluation of Kano categories with nearly equal numbers of occurrences. The most-frequent-observation approach works well when one category dominates the sample [12]. However, as the difference between the frequencies of two classifications gets narrower, proper classification becomes unclear [12]. [Table 5] shows the first and second dominant categories of the 12 mobile services. As shown in [Table 5], some mobile services have nearly equal numbers of occurrences in the first and second dominant categories. For those services, no clear classification can be determined. In order to handle this issue, this study employed two methods. Walden (1993) suggests the following rule to decrease the noise level [13].
If (One-dimensional + Attractive + Must-be) > (Indifferent + Reverse + Questionable),
then take Maximum(One-dimensional, Attractive, Must-be);
else take Maximum(Indifferent, Reverse, Questionable).
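Read as code, the rule is a comparison of the two frequency sums followed by an arg-max within the winning group. The Python sketch below is one possible reading of that rule; the counts in the example are illustrative only and are not taken from the study.

# Sketch of Walden's noise-reduction rule for picking a Kano category.
# Keys: O (one-dimensional), A (attractive), M (must-be),
#       I (indifferent), R (reverse), S (questionable/skeptical).
def walden_category(freq: dict) -> str:
    functional = ("O", "A", "M")
    non_functional = ("I", "R", "S")
    f_sum = sum(freq.get(k, 0) for k in functional)
    n_sum = sum(freq.get(k, 0) for k in non_functional)
    group = functional if f_sum > n_sum else non_functional
    return max(group, key=lambda k: freq.get(k, 0))

# Illustrative counts where O and I nearly tie: the satisfaction-related
# group O+A+M outweighs I+R+S, so the rule keeps the largest category
# within that group.
print(walden_category({"O": 17, "A": 16, "M": 2, "I": 16, "R": 5, "S": 1}))  # -> "O"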
Table 5. Experienced Customers vs. Non-Experienced Customers

Service | Group | Sample Size | 1st Dominant Category (Largest Percentage) | 2nd Dominant Category (Second Largest Percentage) | Category (Walden Method) | Better Index | Worse Index
Picture Messaging | Experienced | 72 | O: 38 (53%) | A: 16 (22%) | O | 0.79 | 0.65 2)
Picture Messaging | Non-Experienced | 11 | I: 5 (45%) | A: 4 (36%) | I | 0.50 | 0.10
Mobile Shopping | Experienced | 12 | I: 6 (50%) | A: 4 (33%) | I | 0.50 | 0.17
Mobile Shopping | Non-Experienced | 71 | I: 42 (59%) | A: 14 (20%) | I | 0.25 | 0.08
Instant Messaging | Experienced | 51 | O: 36 (71%) | I: 7 (14%) | O | 0.85 | 0.75
Instant Messaging | Non-Experienced | 32 | I: 14 (44%) | A: 11 (34%) | I | 0.52 | 0.14
Video Conferencing | Experienced | 8 | A: 7 (88%) | I: 1 (13%) | A | 0.88 | 0.00
Video Conferencing | Non-Experienced | 75 | I: 32 (43%) | A: 23 (31%) | I | 0.47 | 0.14
Radio on Demand | Experienced | 9 | A: 4 (44%) | O: 3 (33%) | A | 0.78 | 0.33
Radio on Demand | Non-Experienced | 74 | I: 37 (50%) | A: 22 (30%) | I | 0.43 | 0.13
Mobile TV | Experienced | 12 | A: 7 (58%) | O: 3 (25%) | A | 0.83 | 0.25
Mobile TV | Non-Experienced | 71 | I: 31 (44%) | A: 18 (25%) | I | 0.40 | 0.16
Video on Demand | Experienced | 10 | O: 5 (50%) | A: 3 (30%) | O | 0.80 | 0.50
Video on Demand | Non-Experienced | 73 | I: 36 (49%) | A: 22 (30%) | I | 0.42 | 0.06
Navigation | Experienced | 33 | O: 21 (64%) | A: 9 (27%) | O | 0.97 1) | 0.68
Navigation | Non-Experienced | 50 | O: 17 (34%) | I: 16 (32%) | O | 0.63 2) | 0.40 1)
Mobile Game | Experienced | 67 | O: 32 (48%) | A: 22 (33%) | O | 0.86 | 0.57
Mobile Game | Non-Experienced | 16 | I: 7 (44%) | A: 6 (38%) | I | 0.46 | 0.00
Mobile Banking | Experienced | 8 | A: 4 (50%) | I: 2 (25%) | A | 0.71 | 0.14
Mobile Banking | Non-Experienced | 75 | I: 33 (44%) | A: 18 (24%) | I | 0.45 | 0.20
Mobile Internet | Experienced | 53 | O: 36 (68%) | A: 11 (21%) | O | 0.92 2) | 0.73 1)
Mobile Internet | Non-Experienced | 30 | A: 12 (40%) | I: 8 (27%) | A | 0.64 1) | 0.29 2)
My Space (Social Networking) | Experienced | 30 | A: 11 (37%) | I: 9 (30%) | A | 0.68 | 0.29
My Space (Social Networking) | Non-Experienced | 53 | I: 25 (47%) | A: 11 (21%) | I | 0.36 | 0.18

Category cells show Category: Number of Respondents (Proportion). S: Skeptical Evaluation, A: Attractive Attribute, O: One-Dimensional Attribute, M: Must-Be Attribute, I: Indifferent Attribute, R: Reverse Attribute.
The sixth column of [Table 5] summarizes the results of Walden's method. Although the two user groups perceive some mobile services in the same way, they perceive most of the mobile services differently.
The second method is to look at market segment differences. In order to explore the perception differences between customers who have used and who have not used
the mobile services, this study calculates better and worse indexes for the mobile services. The better and worse indexes in [Table 5] suggest that, by improving the navigation and mobile Internet services, we can improve the satisfaction levels of both "Experienced" and "Non-Experienced" customers. On the other hand, by improving the mobile Internet service, we can reduce the dissatisfaction levels of both groups. Additionally, the picture messaging service is closely related to the dissatisfaction levels of "Experienced" customers, whereas the navigation service is closely related to the dissatisfaction levels of "Non-Experienced" customers.
5 Conclusion

Successful companies understand that their business strategies need to be developed to satisfy the specific needs of their customers. Any strategy based on a try-everything mentality often leads to inefficiency and poor customer service. In order to develop business strategies tailored to specific customer needs, companies need to get close to customers and understand how customers perceive their products and services. This study investigated how customers view 3G mobile services. The results show that customers perceived picture messaging, navigation, and mobile Internet as one-dimensional service attributes: elements that result in satisfaction when fulfilled and in dissatisfaction when not fulfilled. Since these services represent the critical elements of customer satisfaction among the various 3G mobile services, telecommunication companies need to improve their quality. For services with indifferent attributes, which result in neither satisfaction nor dissatisfaction, companies need to develop marketing strategies that communicate the value of these services to their customers. Although this study provides good insights into the current usage status of 3G mobile services, it has important limitations that provide good opportunities for further research. The major findings were based on responses from university students in the USA. Thus, there is a need to expand the scope to other regions of the USA, other countries, and other respondent groups to verify the findings of this study.
References 1. Aarnio, A., Enkenberg, A., Heikkila, J., Hirvola, S.: Adoption and use of mobile services: Empirical evidence from Finnish survey. In: Proceedings of the 35th Annual Hawaii International Conference on System Sciences, pp. 1454–1463 (2002) 2. Berger, C., Blauth, R., Boger, D., Bolster, C., Burchill, G., DuMouchel, W., Pouliot, F., Richter, R., Rubinoff, A., Shen, D., Timko, M., Walden, D.: Kano’s Methods for Understanding Customer-Defined Quality. The Center for Quality Management Journal 2(4), 1–37 (1993) 3. Bhattacharyya, S.K., Rahman, Z.: Capturing the customer’s voice, the centerpiece of strategy making. European Business Review 16(2), 128–138 (2004) 4. Carlsson, C., Hyvonen, K., Repo, P., Walden, P.: Asynchronous Adoption Patterns of Mobile Services. In: Proceedings of the 38th Hawaii International Conference on System Science (2005)
5. Chen, T., Lee, Y.: Kano Two-dimensional Quality Model and Important-Performance Analysis in the Student’s Dormitory Service Quality Evaluation in Taiwan. Journal of American Academy of Business 9(2), 324–330 (2006) 6. Gruber, H., Verboven, F.: The Diffusion of Mobile Telecommunications Services in the European Union. European Economic Review 45(3), 577–588 (2001) 7. Kano, N., Seraku, N., Takahashi, F., Tsuji, S.: Attractive Quality and Must-Be Quality. The Journal of Japanese Society for Quality Control 14(2), 39–48 (1984) 8. Kim, J., Lee, I., Lee, Y., Choi, B., Hong, S., Tam, K.Y., Naruse, K., Maeda, Y.: Exploring e-business implications of the mobile internet: a cross-national survey in Hong Kong, Japan and Korea. International Journal of Mobile Communication 2(1) (2004) 9. Lehr, W., McKnight, L.W.: Wireless Internet Access. Telecommunication Policy 27(5), 351–370 (2003) 10. Mao, E., Srite, M., Thatcher, J.B., Yaprak, O.: A Research Model for Mobile Phone Service Behaviors: Empirical Validation in the U.S. and Turkey. Journal of Global Information Technology Management 8(4), 7–28 (2005) 11. Nysveen, H., Pedersen, P.E., Thorbjornsen, H.: Intentions to Use Mobile Services: Antecedents and Cross-Service Comparison. Journal of the Academy of Marketing Science 33(3), 330–346 (2005) 12. Sireli, Y., Kauffmann, P., Ozan, E.: Integration of Kano’s Model Into QFD for Multiple Product Design. IEEE Transactions on Engineering Management 54(2), 380–390 (2007) 13. Walden, D.: Kano’s Methods for Understanding Customer-Defined Quality. Center for Quality Management Journal 2(4), 3–35 (1993)
Discovering User Interface Requirements of Search Results for Mobile Clients by Contextual Inquiry David L. Chan, Robert W.P. Luk, Hong Va Leong, and Edward K.S. Ho Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong {csdlchan,csrluk,cshleong,csksho}@comp.polyu.edu.hk
Abstract. This paper reports our in situ study using contextual inquiry (CI). It solicits user requirements for hierarchically organized search results for mobile access. In our experiment, the search activities of our subjects are recorded on video, and the interviewer solicits the interface requirements during and after the experiment. An affinity diagram is built as a summary of our findings, and the major issues are discussed in this paper. The search behavior of our subjects is summarized into a flow chart. In this study, we report mobile interface features that are desired by our users in addition to those found in an earlier survey. Keywords: Hierarchical access, mobile search, interface design.
1 Introduction

Recently, there has been much interest [1-10] in organizing search results as hierarchies. Such work can be used for semantic-based retrieval [6,7] and for mobile search (e.g., [8,9]) in order to circumvent the limitations of mobile clients (such as limited display, input constraints, bandwidth costs, etc.). These studies focused on the technical advancement of the access structures, for example, by building better hierarchies [1-6], by determining better hierarchical structures [10], or by advanced browsing using Accordion Summarization [11]. By contrast, there is less research on obtaining requirements for user interfaces (e.g., [12]) that access search results hierarchically. Earlier, [13,14] provided some observations to fill this gap by surveying user preferences for the user interface designs of existing mobile content providers that used list and hierarchical access structures. However, surveying existing user interfaces as a means of soliciting user requirements is too restrictive, because many desired user interface features may be needed but not yet implemented because of: (1) financial costs, (2) subjects' unwillingness to make suggestions, and (3) the absence of practical knowledge in the field. [15-19] have advocated that user requirements should be solicited from end-users who are directly using the interfaces in the work environment. Such in situ studies have been carried out for mobile systems in general (e.g., [20-23]), but they have not investigated the specific problem of soliciting user interface requirements of hierarchically organized search results for mobile clients. This paper addresses this problem by using an in situ study methodology, called Contextual
Inquiry (CI), which is a component of Contextual Design [15] in user requirement solicitation. This paper discusses and analyzes practical difficulties as well as the functional requirements of searching using mobile clients based on CI.
2 Our Work

According to [23], it is not easy to accomplish a naturalistic study. An alternative is to make direct observations by shadowing a small number of users. As a result, our data collection and analysis are mainly qualitative, and our experiments assign realistic mobile search tasks to users. We do not limit our observations to a pre-defined issue space [23] (see Table 1 for examples), and we collect a wide range of information about mobile search systems. Our analysis is based on both our observations and the user-provided information.

Table 1. Example search tasks
In Yahoo, find some item that you want to buy and tell us the result.
"Existing technology used to detect diseases could also allow parents to select their babies' most important physical traits from eye color and hair color to brain power and even the shape of the babies' nose." Please find out the full story of this news in ABC News.
Starting from BBC, find out about an Israeli delegation which is due to arrive in Washington and is trying to secure an emergency aid package worth $12bn.
Starting from Internet Wire, find out about "AFM Hospitality Corporation, one of North America's leading hospitality companies, (which) announced today that it has discontinued negotiation regarding the acquisition of Marshall Management, Inc. of Salisbury, Maryland."

2.1 Set Up
Fifteen experienced PDA users participate in our study – ten males and five females. The background of these subjects is similar to that of an earlier survey [13,14], and some of them participated in that survey, which allows us to gauge their in-depth knowledge. All users have experience in human-computer interaction. They are undergraduates aged 20–23, and they have modest to expert levels of experience with mobile phones and the Internet. On average, they have at least four years of online search experience, as well as experience accessing the Internet with mobile clients. They reflect the most active population using mobile clients [25], and they are expected to use mobile clients in the future.
Figure 1 shows our set-up for conducting contextual inquiry on mobile search. A PDA (Pocket PC - Toshiba e740) is connected to a laptop computer that is connected to the Internet. An audio recorder and a video capturing tool record the experiment. Data traffic and user interactions are recorded by the proxy server. Internet Explorer and the Klondike WAP browser are used to access Web-based and WAP-based information, respectively.
Fig. 1. Set up of Contextual Inquiry for studying mobile search
Fig. 2. Internet Explorer (Google)
Fig. 3. Klondike WAP Browser (Google)
As the two browsers' interfaces and functionalities are similar, the influence of the interface itself on the experiment is reduced, so observed differences can be attributed to the difference between Web and WAP content. The search tasks are designed for observing and analyzing practical user requirements, opinions, and behaviors when the users use mobile clients to search. Users are asked to perform various Web/WAP search tasks by using keyword- or hierarchy-based search tools/engines. These tasks vary from general searches for any
related information items to specific searches with detailed relevance criteria (please refer to Table 1 for examples).

2.2 Data Collection
Each of the four interviews with users is divided into two subsections lasting one hour each. The interviews are open ended. While an interview is in progress, an interviewer observes and discusses with users their requirements and difficulties, as well as their opinions on mobile search systems. Such information, together with the users' actions and the interviewer's observations, is recorded during the interview. After each interview, which is video/audio taped as supplementary information, a short discussion is conducted with the users in order to discover and wrap up problems and ambiguities. Additional information, such as user background, is collected with post-experimental questionnaires for a better understanding of the users' behaviors.
3 Findings

Data are gathered from open-ended interviews, discussions and observations. As a result, users can raise their requirements, opinions and difficulties freely while they are searching via mobile clients. However, the collected data are normally unstructured and do not have a fixed focus on the users' requirements, opinions and difficulties, so it is quite difficult to draw conclusions from such data directly. Therefore, we adopt an affinity diagram to consolidate such data for analysis, as in many Contextual Inquiries/Designs. An affinity diagram is a hierarchical structure that represents information from general to specific, where each parent node represents a general idea concluded from its associated child nodes. Initially, our collected data are represented by the leaf nodes of the affinity diagram. Similar nodes belonging to the same layer are grouped and consolidated in order to find a general idea for each group. As a result, the affinity diagram represents grouped and concluded ideas from the collected data. Using the affinity diagram, we explore the requirements, opinions and difficulties of users by referring to those consolidated conclusions. Apart from adopting affinity diagrams, the search behavior is simplified into a flow chart.

3.1 Affinity Diagram Summary
The consolidated requirements are re-constructed into an affinity diagram, which is itemized in [24]. Due to space limits, we show an extract of the itemized affinity diagram in Figure 4. In addition, we summarize the findings from this affinity diagram and cross-reference these findings to the itemized affinity diagram, labeled with [AD section number], in [24].
Feasibility of hierarchical access: We assume that hierarchical access is not only a human-preferred access structure, but also a suitable one for supporting mobile search in practical situations. If our assumption is valid, hierarchical access will fulfill those practical requirements mentioned in the affinity diagrams. Thus, we try to support our assumption with the following observations.
Fig. 4. Extracts of our affinity diagram (itemized) in [24] after minor modifications
First, owing to the limited input capacity of mobile clients, selection-based input is more suitable than keyboard-based input [AD 1.1.1.3.6]. This favors hierarchical access, since hierarchical access input is selection based. This finding is further confirmed by [AD 1.1.1.3.7]: a local catalog (hierarchy) is preferred over local keyword search for Web access. Second, [AD 2.1.1.3.1] indicates that much information is not grouped or organized, which results in an ambiguous layout, a slower access speed, and a waste of display area. The use of a hierarchy is one way of grouping and organizing information. Users agree that a hierarchy can provide a tidy layout [AD 2.1.1.6, 2.1.1.6.5.2] and prefer to use hierarchies for grouping information [AD 2.1.1.6.6.3], because hierarchies can help them to identify relevant information quickly by providing a tidy, organized layout and an effective summary [AD 2.1.2.3.1.1.5]. Third, it is more convenient to use hierarchical access if a user knows which catalog or category in the hierarchy the search item belongs to [AD 2.1.2.3.1.1.7.3]. In some cases, users need to use a hierarchy when they cannot specify queries or do not have a clear search objective [AD 2.1.2.3.1.1.7.1-2]. Fourth, hierarchies could make good summaries by concentrating important information, which, in turn, allows users to anticipate the relevance of a group of documents by browsing only the small and organized hierarchical summary. Potentially, users can access information faster, as mentioned in [AD 1.1.1.2.2], and can decide quickly whether or not to wait for detailed content to load, as found in [AD 1.1.1.2.1.2-3].
Exploration of User Requirements: First, users need a simple, clear and tidy layout in a common or static form, where interface components are indicated clearly and fitted
to a single screen [AD 1.1.2.1]. This is because a complicated, information-rich layout causes ambiguity on the limited display area. Simple text-based links, well-contrasted text (black) and a clear background (white) are thought to be suitable for information display on mobile clients [AD 2.1.1.4, 1.1.2.2]. Second, users report that summarization, concentrated information, headings, indexing and a local hierarchy (catalog) could help them to understand, search, and identify useful content quickly and conveniently on mobile clients. This is because they experienced difficulties browsing a lot of information in a small display area using a mobile client, unlike searching with desktop computers [AD 2.1.1.7.1]. They suggested that only necessary content, such as abstracts and headings, is needed in the search result, while additional information, such as the total number of search results and the total number of pages, may be useful [AD 2.1.1.8.2, 2.1.1.8.3]. Third, users indicate that they would be unable to search for a single purpose for more than 15 minutes if they used mobile clients to search for information in real life [AD 1.1.1.2.1.1]. Also, they could not accept slow loading and access speeds when searching via mobile clients [AD 1.1.1.2.2]. They suggest that pictures and information that cannot fit onto a single page should be removed, so that the loading speed can be increased and less scrolling is needed (because of slower access speeds) [AD 1.1.1.2.2]. Users also find it difficult to locate, capture and access information by scrolling [AD 2.1.1.3.3]. They suggest that splitting pages, physical control keys and hierarchical summarization could relieve the reliance on, or even replace, scrolling. Thus, communication cost, access speed and loading speed, as well as a tidy layout, are ranked at higher priorities than other functions such as pictures and scroll bars [AD 1.1.1.3.9]. Fourth, users suggest sorting search results by relevance ranking, alphabetical ordering, or user interest [AD 2.1.2.3.2]. They prefer fewer than 20 items on one page [AD 2.1.2.3.3.1-2]. Fifth, function keys should be unique and popular, with their descriptions clearly shown. To draw users' attention effectively and for users' convenience, function keys should be grouped together and put at the top and bottom of a long page (where scrolling is required), or at the bottom of a single-screen page [AD 2.1.1.1]. In addition, highlighting important keywords in the search results, abstracts and headings may provide useful hints for identifying useful information, because users indicate that they usually pay attention to them. Highlighting visited links can also help users to quickly skip those links [AD 1.1.1.1]. Finally, users want a feasible mechanism to group information, and the use of a hierarchy is one method that can help users to identify information quickly by providing a tidy display and a summary of information. Users use hierarchies when they are unable to specify queries or do not have a clear objective for searching, or when they know exactly which parts of the hierarchy the search items belong to. Users accept changes to the hierarchical structure based on dynamic data (i.e. a dynamic hierarchy), while (path) information provided by the hierarchy needs to make sense to users and should not be changed often. Also, they prefer a detailed hierarchy and accept multiple paths that match the same concept.
This concept should make sense to users, or reflect the underlying content of the information units. For display, the hierarchy and headings should not be put together. The names of catalog nodes should be well separated on a single screen [AD 2.1.1.6]. Also, users need
status-tracing functions, like forward, backward and direct jump functions, to go to different states or levels of the hierarchy. This is useful when users change their mind [AD 3.1.2.2].

3.2 User Search Behavior Summary
User behavior is recorded on video, and the behavior is summarized using a flow chart. The final analysis integrates all the users' search behavior by combining the flow charts. Because the aim of this paper is to explore issues in designing hierarchical access structures, only user behavior regarding hierarchical access is analyzed. The behavior flow chart (Figure 5) contains all the possible paths collected, because it represents all the users' behavior regarding hierarchical access. Paths via pull-down menus are ignored, because pull-down menus do not belong to the hierarchical access structure. This simplifies the analysis and focuses it on hierarchical access to search results only. Also, paths via related headings are ignored because they are not related to search results. After discarding the irrelevant user behavior, the final analysis of the user search behavior is shown in Figure 6. The user behavior of hierarchical access on mobile clients is simple and straightforward (Figure 6). There are only three major components involved, namely, the catalog or hierarchy, the picture/abstract/headings and the content. We may further ignore the picture component because pictures are not preferred [AD 1.1.1.3.8-9]. Instead, we observe that the linkage between the above three components is important. Hence, the interface should provide some mechanism to enable users to jump between these three search behavior components in the hierarchy.
Fig. 5. Search behavior flow chart
Fig. 6. Simplified search behavior flow chart
Table 2. Features found by survey and contextual inquiry

User Preference Survey [13,14] | Contextual Inquiry (In This Article)
No similar features | Bold, underline or highlight the keywords and visited links
No similar features | Always fit information on a single screen (i.e. avoid scrolling)
No similar features | Use selection based input
No similar features | Local catalog, menu and abstract to provide a summary of information
No similar features | Show description of function keys
No similar features | Group function keys
No similar features | Use text-based links
No similar features | Black and white background and fonts (for good contrast)
No similar features | Break/split long pages to avoid scrolling
No similar features | Use physical function buttons
No similar features | Use pull-down menu
No similar features | Make summary of information by heading, abstract or local catalog/hierarchy
No similar features | Sort items by relevance ranking, alphabetical ordering or user interests
No similar features | Show total number of search results, or available documents or pages
Provide facility for jumping back to higher level | Use backward and forward functions
Provide facility for jumping in between different catalogs | Add function for directly jumping in between different status, e.g. page number list <1,2,3,4,5>
Classify articles into different catalogs | Categorize information
List titles together with abstract in each of the catalog | Browsing and search result only need the abstract and heading
Jump to the next/previous document | No similar features
List of catalog names with the related titles | No similar features
4 Comparison

Table 2 summarizes the important user interface features indicated by the earlier survey [13,14] and by CI. Some of the features in the survey are similar to features found by the CI. Other features are indicated as important only in the user preference survey or only in the CI. Of particular interest is the importance of
categorization of information (a similar important feature in the survey is "Classify articles into different catalogs"), because it supports the use of hierarchical access, which is similar to a catalog. To illustrate the advantage of CI, we compared the user requirements found by CI with those discovered by the earlier survey of mobile search interfaces. Out of a total of 24 interface features found, four similar feature pairs were discovered by both CI and the survey, 14 features by CI alone and only two features by the survey alone.
5 Conclusion

An affinity diagram summarizes our findings from the experiment, and the major issues are discussed in this paper. The search behavior of our subjects is summarized into a flow chart. We found 14 additional mobile interface features that enrich the knowledge gained from an earlier survey on mobile interface design for search results.
Acknowledgements. This work is supported by the HKPU project # G-U289.
References 1. Sanderson, M., Croft, W.B.: Deriving structure from texts. In: ACM SIGIR 1999, pp. 206– 213. ACM Press, New York (1999) 2. Lawrie, D., Croft, W.B.: Generating hierarchical summaries for web searches. In: ACM SIGIR 2002, pp. 457–458. ACM Press, New York (2002) 3. Lawrie, D., Croft, W.B., Rosenberg, A.: Finding topic words for hierarchical summarization. In: ACM SIGIR 2001, pp. 349–357. ACM Press, New York (2001) 4. Lawrie, D., Croft, W.B.: Discovering and comparing topic hierarchies. In: RIAO 2000, pp. 314–330 (2000) 5. Robertson, G.G., Cameron, K., Czerwinski, M., Robbins, D.C.: Polyarchy visualization: visualizing multiple intersecting hierarchies. In: CHI 2002, pp. 423–430 (2002) 6. Shadbolt, N.R., Gibbins, N., Glaser, H., Harris, S., Schraefel, M.C.: CS AKTive Space or how we stopped worrying and learned to love the Semantic Web. IEEE Intelligent Systems 19(3), 41–47 (2004) 7. Schraefel, M.C., Shadbolt, N.R., Gibbins, N., Glaser, H., Harris, S.: CS AKTive Space: Representing computer science in the semantic web. In: WWW 2004, pp. 384–392 (2004) 8. Chan, D.L., Luk, R.W.P., Mak, W.K., Leong, H.V., Ho, E.K.S., Lu, Q.: Multiple related document summary and navigation using concept hierarchies for mobile clients. In: ACM SAC, pp. 627–633 (2002) 9. Karlson, A.K., Robertson, G., Robbins, D.C., Czerwinski, M., Smith, G.: FaThumb: a facet-based interface for mobile search. In: CHI 2006, pp. 711–720 (2006) 10. Yang, C.C., Wang, F.L.: Fractal summarization for mobile devices to access large documents on the web. In: WWW 2003, pp. 215–224 (2003) 11. Buyukkokten, O., Garcia-Molina, H., Paepcke, A.: Accordion summarization for endgame browsing on PDAs and cellular phones. In: CHI 2001, pp. 213–220 (2001) 12. Capra, R., Marchionini, G., Oh, J.S., Stutzman, F., Zhang, Y.: Effects of structure and interaction style on distinct search tasks. In: JCDL 2007, pp. 442–451 (2007)
13. Chan, D.L., Luk, R.W.P., Leong, H.V., Ho, E.K.S., Lu, Q.: A preliminary study of user interface design issues for mobile web access. In: WWW 11 Mobile Search Workshop, pp. 4–14 (2002) 14. Chan, D.L., Luk, R.W.P., Ho, E.K.S., Lu, Q.: A preliminary study on multiple documents access via mobile devices. In: Chung, C.-W., Kim, C.-k., Kim, W., Ling, T.-W., Song, K.H. (eds.) HSI 2003. LNCS, vol. 2713, pp. 116–127. Springer, Heidelberg (2003) 15. Beyer, H., Holtzblatt, K.: Contextual design: defining customer-centered systems. Morgan Kaufmann, San Francisco (1998) 16. Williams, G., McClintock, M.: Usability at Microsoft. In: CHI 1997, pp. 144–145 (1997) 17. Cleary, T.: Communicating customer information at Cabletron systems Inc. ACM Interactions 6(1), 44–49 (1999) 18. Coble, J.M., Karat, J., Kahn, M.G.: Maintaining a focus on user requirements throughout the development of clinical workstation software. In: HFCS, pp. 170–177 (1997) 19. Curtis, P., Heiserman, T., Jobusch, D., Notess, M., Webb, J.: Customer-focused design data in a large multi-site organization. In: HFCS, pp. 608–615 (1999) 20. Druin, A.: Cooperative inquiry: developing new technologies for children with children. In: HFCS, pp. 592–599 (1999) 21. Fouskas, K., Pateli, A., Spinellis, D., Virola, H.: Applying contextual inquiry for capturing end-users behavior requirements for mobile exhibition services. In: 1st Intern. Conf. Mobile Business, CDROM (2002) 22. Nokia 9000 Communicator. PC Week, pp. 39–40 (1996) 23. Palen, L., Salzman, M.: Beyond the handset: designing for wireless communications usability. ACM Transactions on Computer-Human Interaction 9(2), 125–151 (2002) 24. Chan, L.: Supporting searching for mobile clients. M.Phil. Thesis, Department of Computing, The Hong Kong Polytechnic University (2005) 25. Video Research Ltd.: Survey on Mobile Phone Usage Situation (2000)
Evaluation of Pointing Efficiency on Small Screen Touch User Interfaces Ryosuke Fujioka1, Takayuki Akiba2, and Hidehiko Okada2 1 Kobe Sogo Sokki Co. Ltd., 4-3-8 Kitanagase-dori, Chuo-ku, Kobe, Hyogo, 650-0012, Japan [email protected] 2 Kyoto Sangyo University, Kamigamo Motoyama, Kita-ku, Kyoto 603-8555, Japan [email protected]
Abstract. Researchers have been investigating screen designs for small screen touch user interfaces (UIs), but further research is still required for smaller-screen devices, including current smart phones. This paper reports on our evaluation of pointing efficiency on devices with touch-by-stylus small screen UIs. User performances were measured in experiments with three devices: a mobile phone, a PDA and a tablet PC. The size of pointing targets was designed so that the target index of difficulty (ID) by Fitts' law ranged over a consistent interval among the three devices. Users' pointing speed and accuracy were compared in terms of throughput and error rate, respectively. It is found that the throughput and the error rate for the mobile phone were significantly smaller and larger, respectively, than those for the PDA and the tablet PC. It is also found that the error rate was not significantly larger when users performed tasks with the mobile phone held in their hands than when the mobile phone was placed on a desktop, although it was for the PDA. Keywords: usability, touch user interface, small screen, throughput, error rate, Fitts' law.
1 Introduction

Usability of mobile phones has become more important as more, and more kinds of, users use mobile phones. Smart phones with touch-by-stylus (or finger) UIs have become available, providing a new style of interaction. In designing UIs for such mobile phones, designers should be aware that the screen sizes are smaller than those of other devices such as PDAs and tablet PCs. Researchers have been investigating screen designs for small screen touch UIs [1-5], but further research is still required for smaller-screen devices, some of which are less than 3 inches. The degree of difficulty for a user to point at a target was formulated by Fitts and is well known as the index of difficulty (ID) [6, 7]. ID is formulated as a function of the size and the distance of targets. Widgets (buttons, icons, menu items, etc.) can be designed for devices with various screen sizes so that ID values are theoretically consistent among
the devices: larger/smaller sizes and distances for larger/smaller screens. If the ease of pointing at targets under each of the widget design variations is consistent among the design variations, users' pointing performances should also be consistent among them. The aim of our research is to investigate whether such scalability of pointing difficulty with screen size holds: if yes/no, widgets that are smaller-sized in accordance with the screen size are acceptable/not recommended for the smaller devices. The authors compare users' pointing performances in user experiments with three touch UI devices.
2 Evaluation Experiment

Methods and conditions of the experiment were designed as follows.

2.1 Test Tasks
Participants were asked to point at targets on the screen by using a stylus. A test task consisted of pointing at two rectangular targets (targets 1 and 2) in a predefined order. An "attempt" was the two successive pointings at targets 1 and 2, and a test task consisted of a predefined number of attempts. For each combination of experiment conditions, each participant was asked to perform a predefined set of the tasks. The pointing operations were logged for later analyses of pointing speed and accuracy.

2.2 Conditions
In this experiment, the following three conditions were employed. The "device" condition is necessary in this research, and the other two conditions were employed to investigate whether they could affect user performance differences among devices: these conditions might help us discover some characteristic differences among devices, and the differences may contribute to developing widget design guidelines.
Devices. This condition is for comparing user performances among devices. A mobile phone, a PDA and a tablet PC were used in the experiment. They were all commercial products with touch screens. The screen sizes of the devices are shown in Table 1. In the following, the three devices are denoted as devices S, M and L. All the devices were set in portrait orientation. The stylus attached to each device was used when a task with that device was performed.
Device Positions. The devices S&M can be used by holding them in one hand and pointing with a stylus with the other hand. Pointing performances might be affected by the device position, so this condition was employed in this experiment.

Table 1. Device screen sizes

Device | Size (inch)
(S) Mobile Phone | 2.9
(M) PDA | 3.6
(L) Tablet PC | 10.2
Pointing performances were analyzed for the two device positions. Under both position conditions, each participant was seated while performing test tasks. The device L was excluded from the comparison under this condition because it was not easy for users to hold the device L in one hand as they could the devices S&M.
Errors. Pointing speed and accuracy are usually a tradeoff [8]. Users will point at widgets carefully where mispointings cause critical problems (e.g., unintended and not-undoable deletion of important data). In such situations, pointing speed becomes much slower as the target widget becomes more difficult to point at. The amount of decrease in speed may not be the same among the devices. Performances were therefore analyzed for two error conditions: errors acceptable or not. In a test task where errors were acceptable, a participant could continue the task even if s/he made an error, and the task was complete when the count of no-error attempts reached a predefined number. In the condition where errors were not acceptable, a test task was cancelled by a mispointing, and the task was retried until the count of no-error attempts reached a predefined number. The error condition was told to each participant before performing each task: s/he had to try a task more carefully in the errors-not-acceptable condition.

2.3 Design of Pointing Targets
The size of the target 1 was a constant with which the target could be pointed at easily enough. The size of the target 2 was random within a predefined range. The size range was determined as follows. A small pointing target instance in the device S commercial screen design was the scroll bar of a web browser, whose size was around 2.0mm. Based on this widget instance, the lower limit of the target 2 size was set to 2.0mm. Besides, the device S does not have a 10-key pad and provides an on-screen keyboard for character input. The size of an alpha-numeric key on this software keyboard was 3.0mm. This size was employed as the upper limit of the target 2 size. Thus, on the device S screen the target 2 size was random within [2.0, 3.0]mm. The target 2 size ranges on the devices M&L were determined so that the range of the index of difficulty (ID) in Fitts' law was consistent among the three devices: [2.0, 3.0]∗(3.6/2.9)mm for the device M, and [2.0, 3.0]∗(10.2/2.9)mm for the device L. The ID ranges in this experiment are shown in Table 2: the mean and SD values of ID were around 3.40 bit and 0.72 bit, consistently among the three devices. The width and the height of the target 2 were determined independently. The positions of the targets 1 and 2 were both random under the constraint that the two targets had no overlap. The size of the target 1 was constantly 6.0mm (= 3.0mm ∗ 2, where 3.0mm was the upper limit size of the target 2 on the device S). Participants seldom made errors in pointing at the target 1 (a 6.0mm∗6.0mm rectangle). Fig. 1 shows a screenshot of targets 1 and 2 on the device S.

Table 2. Index of difficulty of the pointing targets

Device | Mean (bit) | SD (bit) | Min (bit) | Max (bit)
S | 3.37 | 0.72 | 1.19 | 4.83
M | 3.41 | 0.71 | 1.07 | 4.95
L | 3.36 | 0.72 | 0.80 | 4.86
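To make the scaling rule concrete, the Python sketch below derives the target-2 size range for each device from the device-S range [2.0, 3.0] mm and the screen-size ratios described above. Drawing the width and height uniformly within that range is our own illustrative assumption about how the random sizes might have been generated, not a detail stated in the paper.

import random

SCREEN_INCH = {"S": 2.9, "M": 3.6, "L": 10.2}
BASE_RANGE_MM = (2.0, 3.0)  # target-2 size range on device S

def size_range_mm(device: str) -> tuple:
    """Scale the device-S range by the ratio of screen sizes."""
    scale = SCREEN_INCH[device] / SCREEN_INCH["S"]
    lo, hi = BASE_RANGE_MM
    return lo * scale, hi * scale

def random_target2_size(device: str) -> tuple:
    """Width and height drawn independently within the device's range (assumed)."""
    lo, hi = size_range_mm(device)
    return random.uniform(lo, hi), random.uniform(lo, hi)

print(size_range_mm("M"))  # about (2.48, 3.72) mm
print(size_range_mm("L"))  # about (7.03, 10.55) mm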
Fig. 1. Screenshot for target pointing tasks
The targets 1 and 2 are the black and white rectangles, respectively (the target colors were consistent for all the devices). The two targets were shown at the same time, and each participant was asked to find both targets before s/he pointed at the target 1. This was because visual search time should not be included in the pointing time interval. After an attempt at pointing at targets 1 and 2, new targets were shown for the next attempt.

2.4 Methods of Experiments
Table 3 shows the 10 combinations of the conditions. Each participant performed test tasks seven times under each of the 10 condition combinations. The order of the 10 condition combinations was random. Under the condition where "position"="handheld", each participant was told not to rest her/his hands on the desktop or her/his legs. The first 2 of the 7 trials of the test tasks were for training, so the log data of the 2 training trials were excluded from later analyses. The first 5 attempts in each trial of a task were also excluded because user performances in these attempts (the beginning stage of a task) might not be stable enough. The number of successful pointing attempts in a task was set to 10.

Table 3. Condition combinations in the experiment

No. | Device | Position | Error
1 | (S) Mobile Phone | Desktop | Acceptable
2 | (S) Mobile Phone | Desktop | Not acceptable
3 | (S) Mobile Phone | Handheld | Acceptable
4 | (S) Mobile Phone | Handheld | Not acceptable
5 | (M) PDA | Desktop | Acceptable
6 | (M) PDA | Desktop | Not acceptable
7 | (M) PDA | Handheld | Acceptable
8 | (M) PDA | Handheld | Not acceptable
9 | (L) Tablet PC | Desktop | Acceptable
10 | (L) Tablet PC | Desktop | Not acceptable
2.5 Participants
Eight subjects, the minimum recommended number for a user segment in the CIF recommendations [9], participated in the experiment. They were university graduate or undergraduate students. They were all novices in using devices with touch-by-stylus UIs, but they had no trouble performing test tasks after the two training trials for each condition combination.
2.6 Logging Pointing Operations
The following data were recorded in log files for each pointing (each tap by the stylus):
• Target: 1 or 2
• Target position: (x, y) values
• Target width and height: pixels
• Tapped position: (x, y) values
• Tap time: msec
• Error: Yes or No
The tapped position and the tap time were logged when the stylus landed on the screen, and the pointing was judged as an error or not based on the tapped position. No attempt was observed in which the stylus landed on the target 1, moved onto the target 2 and was then lifted off.
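A log record of the kind listed above can be represented by a small structure, and the error judgment can be derived from the tapped position and the target rectangle. The Python sketch below is illustrative only; the field names and the exact hit-test rule are our assumptions, not the authors' logging format.

from dataclasses import dataclass

@dataclass
class TapRecord:
    target: int        # 1 or 2
    target_x: int      # target position (pixels)
    target_y: int
    target_w: int      # target width (pixels)
    target_h: int      # target height (pixels)
    tap_x: int         # tapped position (pixels)
    tap_y: int
    tap_time_ms: int   # tap time (msec)

def is_error(r: TapRecord) -> bool:
    """A tap is judged an error if it lands outside the target rectangle."""
    inside_x = r.target_x <= r.tap_x < r.target_x + r.target_w
    inside_y = r.target_y <= r.tap_y < r.target_y + r.target_h
    return not (inside_x and inside_y)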
3 Data Analysis and Findings

3.1 Methods for Comparing Pointing Speed/Accuracy
Comparison of Pointing Speed. Pointing speed was measured by the throughput [10]. It is known as Fitts' law that target pointing time can be modeled as a function of target ID [6,7]:

t = a + b ∗ ID    (1)
ID = log2(A/W + 1)    (2)
In Eqs. (1)-(2), t is the pointing time, A is the distance between targets 1 and 2, W is the target 2 size, and a, b are constants that depend on the experiment conditions. In this research, t is the interval from the target 1 tap time to the target 2 tap time, A is the Euclidean distance between the tapped points for targets 1 and 2, and W is the smaller of the target 2 width or height. This "smaller-of" method for W was reported to be better for obtaining a linear regression model (Eq. (1)) from users' pointing log data [11]. Throughput is defined as ID/t in Eqs. (1)-(2) [10]. Throughput is larger as a target with a larger ID is pointed at faster. (ID, t) could be observed for each attempt, so a value of throughput could also be obtained for each attempt. Mean and SD values of the throughput were calculated for comparing pointing speeds among different conditions. In addition, it was tested by t-test whether there was a significant difference between the population mean values of throughput for two conditions. It should be noted that error attempts were included in the data under the condition "error"="acceptable". Error attempts might be faster (of larger throughput) than successful attempts. Thus, two ways are possible for analyzing throughput under this condition: with all the data including both successful and error attempts, or with the data of successful attempts only. It was found by calculating the throughput mean and SD values with both methods that the differences in the mean/SD values were very small: <1% in the mean values and <5% in the SD values. In the remainder of this paper, only the results of the former method are reported.
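Applied to a single logged attempt, the definitions above reduce to a few lines of arithmetic. The Python sketch below computes ID with the "smaller-of" choice of W and throughput as ID/t; the variable names and the example values are ours, chosen only for illustration.

import math

def index_of_difficulty(tap1, tap2, target2_w, target2_h) -> float:
    """Eq. (2): tap1/tap2 are tapped (x, y) points; W is the smaller of width/height."""
    A = math.dist(tap1, tap2)        # Euclidean distance between the two taps
    W = min(target2_w, target2_h)    # "smaller-of" method for W
    return math.log2(A / W + 1)

def throughput(tap1, tap2, target2_w, target2_h, t_sec) -> float:
    """Throughput = ID / t (bit/sec); t is the interval between the two taps."""
    return index_of_difficulty(tap1, tap2, target2_w, target2_h) / t_sec

# Made-up attempt: A = 100 px, W = 10 px, t = 0.5 s  ->  about 6.9 bit/sec
print(throughput((0, 0), (60, 80), 10, 12, 0.5))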
Method for Comparing Pointing Accuracy. To measure pointing accuracy, the error rate was defined as

Error rate = (#error attempts in a task trial) / (#total attempts in the trial)    (3)

Mean and SD values of the error rate were calculated for comparing pointing accuracies among different conditions. In addition, a t-test was used to examine whether there was a significant difference between the population mean error rates of two conditions. The error rate could be calculated only for the condition "error"="acceptable", because the data under the condition "error"="not acceptable" did not include any error attempts (if an error occurred in a trial under the condition "error"="not acceptable", the trial was cancelled and retried¹). 3.2 Mean & SD Values of Throughput and Error Rate Table 4 shows mean & SD values of throughput and Table 5 shows those of error rate.

Table 4. Mean and SD values of throughput (bit/sec)
              Desktop                        Handheld
              Acceptable   Not acceptable    Acceptable   Not acceptable
S    mean     6.45         5.24              5.48         4.55
     SD       1.38         1.16              1.00         0.91
M    mean     6.68         5.94              6.16         5.42
     SD       1.25         1.06              1.19         0.92
L    mean     6.83         6.06              -            -
     SD       1.16         0.95              -            -
Table 5. Mean and SD values of error rate (%)

              Acceptable
              Desktop      Handheld
S    mean     13.9         12.2
     SD       9.87         9.27
M    mean     3.56         8.82
     SD       6.46         9.40
L    mean     3.64         -
     SD       7.26         -
3.3 Comparisons among Devices User performances are compared among devices in this section. Table 4 revealed that the smaller the screen size of a device, the smaller the mean value of throughput. Table 6 shows the results of t-tests for testing whether there was a significant difference in population mean values of throughput among devices.
¹ The number of "retries" could be an accuracy measure for the "not acceptable" condition, but this measure was not used in our analysis, because the number of retries could not be measured on a per-task basis but only on a per-participant basis, so the number of data points was small.
Table 6. T-test for throughput (comparison among devices)
                               S vs. M      S vs. L      M vs. L
Desktop     Acceptable         t=-2.64**    t=-4.51**    t=-1.82
            Not acceptable     t=-8.90**    t=-11.0**    t=-1.74
Handheld    Acceptable         t=-9.29**    -            -
            Not acceptable     t=-13.4**    -            -
In Table 6, for example, the values in the column "S vs. M" are the t values for comparing devices S and M (with all conditions other than the device being the same). "**"-marked t values are those with p<0.01, and unmarked t values are those with p>0.05. Table 6 revealed that the throughput with device S was significantly smaller than those with devices M and L, but no significant difference was observed between the throughputs with devices M and L. This result indicates that users are likely to point at targets more slowly on devices with smaller screens such as device S. Table 5 revealed that the mean error rate with device S was larger than those with devices M and L, especially under the condition "position"="desktop". Table 7 shows the results of t-tests for testing whether there was a significant difference between population mean error rates among devices. Table 7 revealed the following.
• Under the condition "position"="desktop", the error rate with device S was significantly larger than those with devices M and L, but no significant difference was observed between the error rates with devices M and L.
• Under the condition "position"="handheld", the difference in error rates between devices S and M was not significant.
• Although device M had a smaller screen than device L, the estimated population mean error rate was slightly smaller for device M than for device L. It should be noted that the difference between devices M and L was much larger in mean throughput (Table 6) than in mean error rate (Table 7). It is likely that participants performed tasks much faster (and thus less accurately) on device L, so that the error rate with device L was relatively larger.

Table 7. T-test for error rate (comparison among devices)
                         S vs. M     S vs. L     M vs. L
Acceptable   Desktop     t=5.46**    t=5.22**    t=-0.05
             Handheld    t=1.60      -           -
The comparisons described above show that pointing speed and accuracy were significantly lower with device S than with devices M and L (under the condition "error"="acceptable" and "position"="handheld", the accuracy was not significantly lower but the speed was). Therefore, UI designers should pay more attention when designing widgets for smaller touch screens such as device S: widgets should be designed so that the ID values are smaller than those for larger screens such as devices M and L. To make the ID value smaller, the widget size (W in Eq. (2)) must be larger and/or the widget distance (A in Eq. (2)) smaller. Further research is necessary to
investigate which of the size or distance is more effective in assuring desirable pointing performance on smaller touch screen devices. 3.4 Comparisons among Device Positions User performances are compared among device positions in this section. Table 4 revealed that the mean value of throughput was smaller under the condition “handheld” than that under the condition “desktop”. This holds for both of the devices S&M. Table 8 shows the result of t-test for testing whether there was a significant difference in population mean values of throughput among device positions. Table 8 revealed that, for both of the devices S&M, the throughput for “handheld” was significantly smaller than that for “desktop”. This result indicates that users are likely to point targets slower if they hold a device in their hands than if they put the device on a desktop, which will hold for both PDAs and smaller smart phones. Table 8. T-test for throughput (comparison between device position conditions)
                          Desktop vs. Handheld
S     Acceptable          t=12.2**
      Not acceptable      t=9.33**
M     Acceptable          t=6.20**
      Not acceptable      t=7.40**
Table 5 revealed that the mean error rate with device M was larger for "handheld" than for "desktop", whereas that with device S was larger for "desktop" than for "handheld" (it should be noted that the difference was relatively large for device M but small for device S). Table 9 shows the results of t-tests for testing whether there was a significant difference in population mean error rates between device positions. Table 9 revealed that the error rate with device M was significantly larger for "handheld" than for "desktop", but no significant difference was observed in the population mean error rates with device S between "handheld" and "desktop".

Table 9. T-test for error rate (comparison between device position conditions)
                        Desktop vs. Handheld
Acceptable   S          t=0.77
             M          t=-2.88**
The comparisons described above show that, with device M, both the speed and the accuracy were lower for "handheld" than for "desktop", whereas with device S only the speed was lower (the accuracy was not). This is probably partly because device S was smaller than device M, making it easier for users to hold in one hand. Although the participants were asked to perform the tasks as fast and as accurately as possible, they tended to point at targets more slowly (to maintain accuracy to a certain extent) with device S under the "handheld" condition.
This result indicates that designers of smaller-screen touch UIs such as device S should keep in mind that users may not be able to point at widgets more accurately even if they put the device on a desktop (a more stable situation than handheld use). This finding will be useful for future mobile UI designs in which smart sensors enable device-position-adaptive UIs. Further research is necessary to investigate user performance while users perform tasks standing. 3.5 Comparisons among Error Conditions User performances are compared among error conditions in this section. As noted in 3.1, the error rate could be calculated for "acceptable" only, so error rates cannot be compared among error conditions: only throughputs are compared. Table 4 revealed that the mean value of throughput was smaller for "not acceptable" than for "acceptable", and this held for all devices {S, M, L}. Table 10 shows the results of t-tests for testing whether there was a significant difference in population mean throughputs between error conditions. Table 10 revealed that, for all devices {S, M, L}, the throughput for "not acceptable" was significantly smaller than that for "acceptable". This result indicates that there is no clear difference in the effect of the error conditions between device S and the others (M, L): screen design for devices such as device S will require the same considerations in this respect as that for larger-screen devices such as M and L.

Table 10. T-test for throughput (comparison between error conditions)
                   Acceptable vs. Not acceptable
S     Desktop      t=14.0**
      Handheld     t=14.2**
M     Desktop      t=9.11**
      Handheld     t=10.2**
L     Desktop      t=10.4**
4 Conclusions Users' pointing speed and accuracy on a mobile phone with a small-screen touch UI were compared with those on a PDA and a tablet PC with larger screens. Our findings for designing widgets on such UIs are summarized as follows.
• On the mobile phone, the speeds were significantly lower under all conditions, and the accuracies were also significantly lower under most conditions, than on the PDA and the tablet PC. Thus, designers should keep in mind that relatively difficult-to-point widgets will be acceptable on devices such as the PDA and the tablet PC but should be avoided for usability on devices such as the mobile phone (more specifically, widgets with ID 4.0-5.0²). (² See Table 2: in this experiment the ID values ranged over about [1.0, 5.0], so the difficult-to-point targets were those with IDs in [4.0, 5.0].) It should also be kept in mind that, on small-screen devices such as the mobile phone, decreases in the widget distance (A in Eq. (2))
theoretically make the ID values smaller but may not contribute much to improving users' pointing performance; instead, increases in the widget size (W in Eq. (2)) will be necessary.
• On the PDA, the accuracy was significantly higher for the "desktop" condition than for the "handheld" condition. On the mobile phone, however, it was not. This means that, for a small-screen device such as the mobile phone, designers should not expect that users can point at targets more accurately even when they put the device on a desktop. Smart sensors will enable the design of device-position-aware UIs that can adapt widget designs to the current device position, but widget sizes should not be made smaller for the "put-on-a-desktop" situation.
The authors hope these findings will contribute to usable screen designs for small-screen devices. Further research will still be necessary to, e.g., obtain more specific guidelines on widget sizes and distances according to screen size, and to investigate ways of appropriately controlling ID (i.e., how theoretical changes in ID values obtained by changing W and/or A affect users' actual pointing performance). Experiments with more devices and more participants, and under other conditions such as performing tasks while standing, are also part of our future work.
References 1. McClintock, M., Hoiem, D.: Minimal Target Size in a Pen-based System. In: Abridged Proc. of 5th Int. Conf. on Human-Computer Interaction (HCI International 1993), p. 243 (1993) 2. Douglas, S.A., Mithal, A.K.: The Ergonomics of Computer Pointing Devices. Springer, Heidelberg (1997) 3. Sarah, A., Douglas, S.A., Kirkpatrick, A.E., MacKenzie, I.S.: Testing Pointing Device Performance and User Assessment with the ISO 9241, Part 9 Standard. In: Proc. of ACM conf. on Human Factors in Computing Systems (CHI 1999), pp. 215–222 (1999) 4. Soukoreff, R.W., MacKenzie, I.S.: Towards a Standard for Pointing Device Evaluation, Perspectives on 27 Years of Fitts’ Law Research in HCI. Int. J. of Human-Computer Studies 61(6), 751–789 (2004) 5. Oehl, M., Sutter, C., Ziefle, M.: Considerations on Efficient Touch Interfaces - How Display Size Influences the Performance in an Applied Pointing Task. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4557, pp. 136–143. Springer, Heidelberg (2007) 6. Fitts, P.M.: The Information Capacity of the Human Motor System in Controlling the Amplitude of Movement. Journal of Experimental Psychology 47(6), 381–391 (1954) 7. MacKenzie, I.S.: Fitts’s Law as a Research and Design Tool in Human-Computer Interaction. Human-Computer Interaction 7, 91–139 (1992) 8. Plamondon, R., Alimi, A.M.: Speed/Accuracy Trade-offs in Target-Directed Movements. Behavioral and Brain Sciences 20(2), 279–349 (1997) 9. ANSI/NCITS 354-2001, Common Industry Format for Usability Test Reports (2001) 10. ISO 9241, Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs) - Part 9: Requirements for Non-Keyboard Input Devices (2000) 11. MacKenzie, I.S., Buxton, W.: Extending Fitts’ Law to Two-dimensional Tasks. In: Proc. of ACM Conf. on Human Factors in Computing Systems (CHI 1992), pp. 219–226 (1992)
An Integrated Approach towards the Homogeneous Provision of Geographically Dispersed Info-Mobility Services to Mobile Users Dimitrios Giakoumis1, Dimitrios Tzovaras1, Dionisis Kehagias1, Evangelos Bekiaris2, and George Hassapis3 1
Centre for Research and Technology Hellas, Informatics and Telematics Institute, 6th Km Charilaou-Thermi Road Rd., 57001, Thermi, Thessaloniki, Greece [email protected], [email protected] 2 Centre for Research and Technology Hellas, Hellenic Institute of Transport, Greece 3 Aristotle University of Thessaloniki, Department of Electrical and Computer Engineering [email protected]
Abstract. In this paper we introduce a mechanism enabling applications to present information retrieved from different services, and thus delivered with different structures, in a homogeneous and seamless fashion. This mechanism was an outcome of the research that took place within the European Integrated Project ASK-IT. The project developed an ambient intelligence framework which supports mobility impaired people on the move in accessing context-sensitive information dependent on the user's geographic location and the use case under consideration. The information derives from geographically dispersed web services and is rendered on mobile devices. The ASK-IT framework enables the presentation of information that covers a wide variety of domains which belong to the info-mobility scope (Points of Interest, Route Guidance etc.). Our approach deals with the integration of information-providing services, in order to facilitate the homogeneity of the final presentation of the content through the end-user application. Keywords: Info-Mobility Services, Service Integration, Information provision, Web Services, Ontologies.
1 Introduction As the world-wide stack of available services covering domains of the info-mobility scope constantly grows, challenges regarding the homogenization of the information retrieved have to be faced by modern HCI systems. A large number of Web services today offer online access to desired content through suitable software interfaces, thus solving interoperability problems between heterogeneous and distributed Internet-based applications. However, the structure of the content delivered differs between services that provide a similar type of information. Info-mobility services are usually consumed by systems that offer users on the move (and not only them) information regarding Points of Interest, how to reach them, etc.
These systems usually have the ability to use combinations of different services, each of them offering different types of content, in order to cover the user needs as effectively and efficiently as possible. For example, in order to show a user a map with a specific type of Points of Interest (e.g. restaurants) around him/her, such a system must first invoke a service that provides information regarding Points of Interest and then invoke a mapping service, in order to get a map of the area around the user with the Points of Interest drawn on it. Furthermore, in order for such a system to ensure the best Quality of Service (QoS) possible, it should have the ability to choose an appropriate service among different ones that offer a similar type of content and most probably belong to different providers. The availability of more than one service for each type of required content ensures a better Quality of Service for the end users, for several reasons. For instance, if different service providers are located in different countries, the seamless provision of location-aware information [1] to the end users becomes possible. This "geographically dispersed" approach towards effective service provision allows for the selection of the most appropriate provider every time retrieval of information is needed; regarding the info-mobility scope, the most appropriate provider (the one that will offer the best content) is usually the one that comes from the country of interest. Furthermore, mechanisms which eliminate phenomena like "denial of service" are also applicable in the case where multiple service providers are used. However, in the "multiple-providers" case, a major problem is introduced: each provider delivers information within specific structures, which usually differ from the others'. As a result, the end-user application has to be able to "translate" the content retrieved from each provider into a form suitable for presentation on the client's device. In order to address this "translation" requirement, we developed a mechanism that allows applications to present information retrieved from different and heterogeneous services in a homogeneous and seamless fashion. This mechanism is based on the use of ontologies that define a common vocabulary for the available services, as well as a set of software wrappers whose goal is to translate the service-specific data formats into the common one defined by the ontology. Describing the semantics of Web services through the use of ontologies is currently a very active research area. The W3C has launched the initiative to develop the Semantic Web [2] and a semantic markup language for publishing and sharing ontologies; the Web Ontology Language (OWL) is being developed for this purpose. There are also a number of efforts specifically aimed at describing the semantics of Web services. Two important initiatives have emerged in this respect: OWL-S and WSMO. OWL-S defines an upper ontology, that is, a generic "Service" class. The Web Service Modeling Ontology (WSMO) [3] is an initiative which seeks to create an ontology for describing various aspects related to Web services, aiming at solving the integration problem. In addition, in order to achieve automated discovery, composition, and execution of Web services, the Web Service Modeling Language (WSML), a family of languages that formalizes WSMO, has been proposed.
WSML is a formal language to write down annotations of Web services according to the conceptual model (WSMO). Logical inference-mechanisms can be applied to the descriptions in this language.
The work presented in this paper is an outcome of the research carried out within the European Integrated Project ASK-IT [4]. The project developed an ambient intelligence framework which supports mobility impaired people on the move in accessing context-sensitive information dependent on the user's geographic location and the use case under consideration. The information derives from geographically dispersed Web Services, integrated in the system by a "service alignment" process, and is rendered on mobile devices. The user requirements elicitation process determined the services that should be supported in ASK-IT in order to fulfil the information needs of the users, who in our case are considered to be mobility impaired. These services are implemented as W3C Web Services [5], [6] offered by various providers throughout Europe. Several types of services are integrated in ASK-IT, and for each of them there is typically more than one provider-specific implementation. Different service providers may be located in different countries, thus making it possible to seamlessly provide location-aware information to the end users.
2 Architecture of the Integrated Service Provision Framework The overall functionality of a system that provides information within the info-mobility scope through human-computer interaction can be broken down into two separate parts, information retrieval and information presentation: the system should be able to invoke web services and then present the retrieved content to the users. As explained above, the system's ability to choose among different services of the same domain in a context-related fashion ensures better Quality of Service. However, because each service has its own representation of the information provided, the system must be able to "translate" the information derived from each service into a form suitable for presentation. This capability is ensured by our proposed three-layer architecture depicted in Figure 1. The main concept of our approach is the provision of a mechanism that acts as middleware between the information retrieval and presentation components, facilitating homogeneity in the way information from different services of the same domain is presented.
Fig. 1. The ASK-IT ontology-based Middleware
Within our proposed framework, the end-user application has no knowledge of the actual representation of the information delivered upon the invocation of a real service. The only representations it is able to understand are the ones defined within the system's ontology. In our case, this refers to a collection of concepts and conceptual service models able to cover the needs of information provision within the specific scope of interest. 2.1 The ASK-IT Ontology The implementation of our approach was based on the "ASK-IT ontology" [7], which was developed within the ASK-IT project. Its development was originally motivated by the need to support access to services and information for elderly and disabled users.

Fig. 2. ASK-IT Ontology snapshot (top-level concepts such as Service, Use Case and User Group, linked by hasService and hasUserGroup relations; example services include Trip Management and Trip Planning, use cases such as Plan a Trip, Be Guided on a Trip, Get Traffic Information and Find Current Location, and user groups such as Wheelchair Users, Hearing Impaired and Visual Impaired)
A snapshot of the ontology is depicted in Figure 2. It is divided into the following sub-ontologies: • Service Ontology. This includes descriptions about the supported services, the supporting user groups, the supported use cases categories, as well as additional information related to the special needs of MI users. • Domain-specific ontologies. These ontologies deal with the following application domains: a) Transportation, b) Tourism and Leisure, c) Personal Support Services Domain, d) e-Learning and e-Working, e) Social Relations and f) Community Building Domain. The ASK-IT ontology has been developed by using the Protégé tool [8] and it is available in OWL-DL [9]. The ontology included more than 1400 concepts and 1100
properties. Within the ASK-IT ontology, various service models covering the needs of the domains of the info-mobility scope are defined, together with appropriate data types that represent the inputs and outputs of each service model. Each model's inputs and outputs are designed so that the service model is able to carry, and enable the presentation of, information that covers the needs of the scope, with special attention given to the needs of mobility impaired users. 2.2 System Architecture The middleware provided by our proposed framework acts as a bridge between the concepts defined within the actual services integrated for each domain and the ones defined within the ontology, thus providing the required "translation" mechanism for the information retrieved. This mechanism is led by a "Service Alignment" process, which maps the concepts of every new service to the conceptual models defined within the ASK-IT ontology. Adopting an architectural model similar to client-server, the ASK-IT ambient intelligence framework is divided into two main subsystems, the Server Side and the Client Side. The Server Side is responsible for the integration of the services that provide data and content, whereas the Client Side includes all modules that are responsible for handling interaction with users and manipulating data and content received upon request from the Server Side. The communication between these two subsystems is realized through the exchange of messages between software agents. Several types of agents reside on both the Server and the Client Sides of the framework. The service alignment process constitutes the cornerstone of our approach towards homogeneous service provision. This procedure maps each integrated service to the appropriate ontological concepts, which are in turn used within the rest of the application for the proper presentation of the content delivered to the end users. Specifically, each "real" service is mapped to a specific service model defined within the ontology, and the data types of the actual inputs and outputs of the "real" service are mapped to the appropriate ontological concepts that constitute the inputs and outputs of the respective model. Supplementary information regarding the real Web Service is also stored during the process (geographic location of the provider, etc.), enabling the system to choose, each time retrieval of content is needed, the best service to invoke. Fig. 3 depicts the interoperations occurring between the basic modules of the Server Side, as well as any external actors or other layer components. The Data Management Module (DMM) provides an automatic mechanism for aggregating information originating from multiple service content providers. End-user requests, entered through a user interface hosted on the Client Side subsystem (desktop application, mobile phone, PDA), invoke a specific service through a personal user agent, which in turn requests the service from a broker agent after the request has been translated into a machine-readable format. The actual role of the DMM is to listen to the request, decompose it, perform an ontology-based search for the appropriate services and finally return the requested information back to the user. From a technical point of view, the input of the DMM is an invocation request for a specific ASK-IT ontological "service model".
Based on the information stored during the service alignment process, the "service-model-based" request is translated within the DMM into invocation requests that are appropriate for the actual services.
Fig. 3. The ASK-IT Server-Side Overall Architecture
This in fact means that the DMM takes the content of the "ontology-based" input, translates it properly and then puts it inside a structure that is appropriate for the invocation of the actual services. The inverse procedure takes place after an actual service responds: the returned content is taken from the response's structure and is put inside the structure defined by the output of the ontology's specific service model. As a consequence, the user agent application does not need to have any knowledge of the actual services invoked or of the data structures needed for the actual service invocation. It simply communicates with the ASK-IT middleware by means of the service models and the ontological concepts defined within the ASK-IT ontology. These concepts and models are on the one hand adequate for the invocation of all the services of the various domains of our scope of interest, and on the other hand appropriate for the presentation of the content derived from a service invocation.
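To make the translation step concrete, the following minimal Python sketch illustrates the kind of per-provider wrapper that the service alignment process yields. The class and field names (PointOfInterest, PoiSearchResult, the provider's JSON layout) are hypothetical stand-ins, not the actual ASK-IT data types.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical ontological output types for a Points-of-Interest search model.
@dataclass
class PointOfInterest:
    name: str
    latitude: float
    longitude: float
    accessible: bool            # accessibility flag for mobility impaired users

@dataclass
class PoiSearchResult:          # output structure defined by the service model
    pois: List[PointOfInterest]

class ExamplePoiProviderWrapper:
    """Wrapper produced during service alignment for one concrete provider.

    It translates the provider-specific response structure into the
    ontology-defined output, so the user agent never sees the raw format.
    """

    def translate(self, raw_response: dict) -> PoiSearchResult:
        pois = [
            PointOfInterest(
                name=item["title"],
                latitude=item["pos"]["lat"],
                longitude=item["pos"]["lon"],
                accessible=item.get("wheelchair_ok", False),
            )
            for item in raw_response["results"]
        ]
        return PoiSearchResult(pois=pois)

# Example provider payload (hypothetical structure).
raw = {"results": [{"title": "Museum", "pos": {"lat": 40.63, "lon": 22.95},
                    "wheelchair_ok": True}]}
unified = ExamplePoiProviderWrapper().translate(raw)
print(unified)
```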
3 The ASK-IT User Agent Application Our developed user agent is a mobile application for PDA devices that consists of a set of individual modules, each one responsible for the effective and efficient presentation of content regarding one specific domain of interest. In Fig. 4, the modules regarding Indoor / Outdoor localization (a, b) and route guidance (d, e) are
Fig. 4. The ASK-IT User Agent Application
depicted, together with the modules responsible for the functionalities of searching for Points of Interest (c), Social Events (f), E-Learning (g) and E-Working (h). The modules that constitute the end-user application were designed on the basis of the user needs in each domain of interest, and their integration focuses on the provision of seamless user interaction. Each module was designed so as to make the best possible use of the capabilities offered by the service models defined within the ASK-IT ontology. Special attention was given to ensuring that all the information which can be delivered (as input or output) through the service models is able to 1) be provided by the end users and 2) be presented to them in the most effective and efficient way. At this point it is obvious that the end-user application of our approach relies heavily on the quality of the ontology used within the proposed system. The amount of information that can be delivered through our end-user application depends, of course, on the amount of content that can be retrieved from the invocation of the actual integrated services, but could also be constrained by a possible lack of "completeness" in the ontology being used within the system.
4 Experimental Results The ASK-IT project was supported by 8 pilot sites, dispersed among cities throughout Europe: Athens and Thessaloniki (GR), Madrid (ES), Bucharest (RO), The Hague (NL), Nuremberg (DE), Genoa (IT), Helsinki (FI) and Newcastle (GB). In order to test and evaluate the overall ASK-IT framework, each site provided Web Services, which were integrated in the ASK-IT system by means of our service alignment process. The aim of our experimental sessions was to evaluate whether our approach is capable of providing sufficient information seamlessly when users move from country to country and use different service providers for content retrieval. Within this context, we had to evaluate:
1. The overall usability of the system
2. The sufficiency of the information delivered
3. The preservation of the system's usability when moving from country to country
4. The preservation of the sufficiency of the information delivered when moving from country to country

Table 1. Number of services integrated and tested per application area

Service                          Number of services aligned, integrated and cross-site tested
Search for Points of Interest    10
Multimodal Route Planning        7
Social Events                    5
E-Learning                       2
E-Working                        2
Domotics management              3
Health and Emergency             2
Mapping                          3
For this purpose, we drew upon the ASK-IT pilot sites and their user - testing sessions, organized by the ASK-IT consortium and all of the pilots in the supporting cities in order to evaluate the ASK-IT project’s outcomes. Within these sessions, mobility impaired users coming from all the supported sites tested and evaluated the system. Since each pilot site had provided services for a number of the domains of interest, the users could use services for each domain, from several different providers, in several different geographic locations. The tests were divided in two different types of sessions. First, the overall usability of the system, together with the sufficiency of the information delivered was evaluated in “stand-alone” testing sessions. These were sessions organized by each pilot site separately, within which users coming from the specific site evaluated the system for the case where information derived from the providers of the same site. For example, during the Greek site stand-alone tests, Greek users evaluated the ASK-IT Framework in Greece, by using the information provided by the Greek service providers. The second type of sessions organized was the “cross-site” tests, within which users evaluated the ASK-IT Framework by using services from several different providers,
Fig. 5. Evaluation Results regarding the sufficiency of information delivered
in several different geographic locations. These sessions provided feedback regarding the preservation of the system's usability and of the sufficiency of the information delivered when users move from country to country. Questionnaire-based short interviews and "think aloud" protocols were used in both the stand-alone and cross-site tests, in order to record the users' opinions regarding the outcome of our approach. During the stand-alone tests, 70 users evaluated the ASK-IT framework within sessions organized at all of the sites. The users stated in general that the overall application was usable (93%) and that the information delivered was sufficient (87%) (Fig. 5.a). Some of the users (21 individuals) who had already tested the application within the stand-alone tests evaluated our approach's outcome during cross-site testing sessions. In the cross-site evaluation, the majority of the users again stated that the application was usable, at a percentage (95%) very close to the one obtained in the stand-alone tests. This was an expected result, since, thanks to our approach, the end-user application and the way it presents information were the same regardless of the providers used for information retrieval. Thus, the users could find the information they were looking for in exactly the same place as during the stand-alone tests. Regarding the content delivered in this case, the users again stated in general that it was sufficient, at the high percentage of 81% (Fig. 5.b). However, this percentage was somewhat lower than the respective percentage of the stand-alone tests. The latter was also a more or less expected finding, given that during the cross-site testing sessions the users were located in a city that was an unknown environment to them (simulating a user's travel to a country other than his/her own), compared to the cities of the stand-alone tests, which in most cases were the ones the users lived in. As a consequence, during cross-site testing the users had increased needs regarding the amount and the level of detail of the information delivered. According to the users' opinions, these increased needs were masked to an extent by 1) the homogeneity of the content delivered during the stand-alone and cross-site tests and 2) the fact that the information was presented through exactly the same user interface of the end-user agent, which they had already used during the stand-alone tests. As a result, the difference between the percentages of users satisfied with the sufficiency of the delivered information in the two types of testing sessions was kept at a significantly low level (6%).
The results of the pilot studies indicated that users usually need more detailed information from an info-mobility services system when they are moving within an unknown environment than when they are inside an already familiar one. However, the fact that the information provided by our approach is given to the users in a homogeneous fashion, regardless of their location and of the service provider used, improves their understanding of the content derived from a location-aware information provisioning system. Furthermore, in a few cases service providers indicated that they could offer through their services more information than our ontology could handle. The degree of detail of the information delivered by our approach relies heavily on the level of detail of the ontological concepts used within the middleware. Thus, it is essential for a proper implementation of such frameworks to use ontological concepts with a good level of detail, covering the scope of interest as completely as possible. In conclusion, the analysis of the testing sessions' outcome indicated that our approach's ontological middleware for the homogenization of content derived from info-mobility services, prior to its delivery to end users, enhances the capabilities of a system regarding the provision of location-aware information. This approach allows large numbers of similar services to be integrated, so that the system is able, each time information is needed, to select the best content source to be used, thus ensuring the best Quality of Service (QoS) possible.
References 1. Hazas, M., Scott, J., Krumm, J.: Location-aware computing comes of age. IEEE Computer 37(2) (2004) 2. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American 284(5), 34–43 (2001) 3. Roman, D., Keller, U., Lausen, H., De Bruijn, J., Lara, R., Stollberg, M., Polleres, A., Feier, C., Bussler, C., Fensel, D.: Web Service Modeling Ontology. Applied Ontology 1(1), 77–106 (2005) 4. ASK-IT Project Official Web Site, http://www.ask-it.org/ 5. W3C Web Services, http://www.w3.org/2002/ws/ 6. WS-I Basic Profile Version 1.1, http://www.ws-i.org/Profiles/BasicProfile-1.1.html 7. ASK-IT Ontology, http://askit.iti.gr/ontology/ 8. Protégé Official Web Site, http://protege.stanford.edu/ 9. Smith, M.K., Welty, C., McGuinness, D.L.: OWL Web Ontology Language Guide. W3C, February 10 (2004), http://www.w3.org/TR/owl-guide/
Legible Character Size on Mobile Terminal Screens: Estimation Using Pinch-in/Out on the iPod Touch Panel Satoshi Hasegawa1, Masako Omori2, Tomoyuki Watanabe3, Shohei Matsunuma4, and Masaru Miyao5 1
Department of Information Media, Nagoya Bunri University, Inazawa Aichi, Japan [email protected], [email protected] 2 Faculty of Home Economics, Kobe Women’s University, Suma-ku Kobe, Japan [email protected] 3 Faculty of Psychological and Physical Science, Aichi Gakuin University, Aichi, Japan [email protected] 4 Nagoya Institute of Technology, Showa-ku Nagoya, Japan [email protected] 5 Graduate School of Information Science, Nagoya University, Chikusa-ku Nagoya, Japan [email protected]
Abstract. Using a multi-touch display on an iPod touch mobile multimedia player, an evaluation experiment was conducted to determine the most legible size for characters displayed on screens of mobile terminals. Subjects enlarged the characters by pinch-out (pinch opening) and/or reduced them by pinch-in (pinch closing) on a multi-touch display with more than one finger, and adjusted the sizes of alphanumeric or Japanese characters to be the most legible. The characters were displayed positively (black characters on a white background) or negatively (white characters on a black background) using graphic text on the iPod touch. The adjusted sizes of characters and viewing distances were measured and visual angles were calculated. The influence of the positive or negative image display mode and the age of subjects on these legibility parameters is described. Keywords: Readability, Visibility, Small Display, Character, Aging Effects.
1 Introduction Mobile information terminals are very popular in the world today. The use of text email and Web browsing on mobile phones is spreading. Mobile phones with embedded media players, such as the iPhone (Apple Inc.), and portable media players such as the iPod touch are also widespread. Opportunities to read characters on small screens of mobile terminals are increasing. Legibility and readability of characters on mobile terminal screens should be investigated and the appropriate sizes of characters should be estimated. Readability of characters on mobile phones has been studied using several patterns of character size displayed on the screen [1-6]. Most of these studies were conducted with voice reading experiments [1-5]. Recently we conducted
an experiment in which target characters were searched for in meaningless text [6]. However, Tamura [7] pointed out that this method might not reflect users' real reading performance. In this study we evaluated the legibility of characters on the liquid crystal displays of mobile terminals, using the multi-touch screen of the iPod touch, with which users can enlarge the characters displayed on the screen by pinch-out (pinch opening) and reduce them by pinch-in (pinch closing) with their fingers to adjust them to the most legible sizes (Fig. 1).
Fig. 1. Multi-touch screen on the iPod touch: (a) top menu, (b) enlargement by pinch-out, (c) reduction by pinch-in
2 Method The iPod touch with multi-touch screen (480×320 pixels, 3.5 inch, 163 ppi) was used to display graphic text in PNG format. Four types of graphic texts were prepared as shown in Fig. 2. Alphanumeric characters were randomly ordered without meaning (Fig. 2 a, b) following the method of ISO 9241-3 Amendment 1 [8], with slight modification for the small screens of mobile terminals. ISO [8] shows a method to evaluate the quality of flat panel displays, but we used the meaningless alphanumeric character strings only for the purpose of investigating the sizes of characters displayed on mobile terminals that are legible. The Japanese characters (Fig. 2 c, d) formed sentences with meaning. Both were prepared in positive (Fig. 2 a, c) and negative (Fig. 2 b, d) images. Subjects enlarged the graphic text by pinch-out (pinch opening) (Fig. 1 b) and/or reduced them by pinch-in (pinch closing) (Fig. 1. c) with their fingers to adjust the character sizes to be the most legible. Two experiments were conducted in this study. All character images shown in Fig. 2 (a-d), including negative displays, were used in experiment 1 for young subjects of 19-23 (21.2±1.0) years old (n=30). Only positive images (Fig. 2 a, c) were used in experiment 2 with variously aged subjects of 19-76 (49.0±18.3) years (n=64). All subjects in both experiments were Japanese. Subjects who normally wore
Fig. 2. Graphic text: (a) alphanumeric, positive; (b) alphanumeric, negative; (c) Japanese, positive; (d) Japanese, negative. (a)-(d) were used in experiment 1; (a) and (c) were used in experiment 2. Red or cyan squares were used to measure character height.
glasses when reading the newspaper participated in the experiment wearing glasses as usual. Subjects sitting on a chair adjusted the character size to be easy to read in a comfortable posture. Character heights and viewing distances were measured and recorded for each adjusted character image on the screen.
3 Results Visual angles were calculated from the measured character heights and viewing distances. The results of experiment 1 (positive vs. negative) are shown in Fig. 3 (alphanumeric) and Fig. 4 (Japanese). Experiment 2 results are shown in Fig. 5 (age dependency), Fig. 6 (age groups, alphanumeric) and Fig. 7 (age groups, Japanese). The sixty-four subjects in experiment 2 were grouped into 19 young (age: 19-39, 26.6±5.9), 22 middle-aged (40-59, 46.9±6.1) and 23 elderly (60-76, 69.5±4.3) subjects in Fig. 6 and Fig. 7. Significant differences of the averaged values are indicated in these figures with p-values.
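As a minimal sketch of the underlying geometry (not the authors' analysis code), the visual angle in minutes of arc subtended by a character of height h at viewing distance d can be computed as follows; the example values are illustrative only.

```python
import math

def visual_angle_arcmin(char_height_mm: float, viewing_distance_mm: float) -> float:
    """Visual angle (minutes of arc) subtended by a character of the given height."""
    angle_rad = 2.0 * math.atan(char_height_mm / (2.0 * viewing_distance_mm))
    return math.degrees(angle_rad) * 60.0

# Example: assuming a 300 mm viewing distance, a 3.1 mm character
# subtends roughly 35 minutes of arc.
print(round(visual_angle_arcmin(3.1, 300.0), 1))
```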
Fig. 3. Positive vs. negative (alphanumeric): (a) viewing distance, (b) character height (p = 0.0447), (c) visual angle

Fig. 4. Positive vs. negative (Japanese): (a) viewing distance, (b) character height, (c) visual angle
Fig. 5. Age distribution of legibility: (a) viewing distance, (b) character height, (c) visual angle
Fig. 6. Age group effects (alphanumeric): (a) viewing distance, (b) character height, (c) visual angle

Fig. 7. Age group effects (Japanese): (a) viewing distance, (b) character height (p = 0.0147, p = 0.0483), (c) visual angle
Character heights and visual angles were larger with positive images than with negative ones (Fig. 3, Fig. 4); the difference was especially significant in the height of alphanumeric characters (Fig. 3 b). Viewing distance and legible character height tended to become larger as the age of the subjects increased (Fig. 5, Fig. 6 and Fig. 7). Significant differences were seen in the height of the Japanese characters (Fig. 7 b).
4 Discussion As a VDT work criterion [8], ISO 9241-3 [9] recommends that the minimum alphabetical or numerical character height should be 16 minutes of arc, with character heights of 20 to 22 minutes of arc preferred. In this study, the young subjects in experiment 1 indicated the most legible sizes to be 35.3±19.8 minutes (3.1±1.3 mm) for alphanumeric character height and 29.5±12.1 minutes (2.9±1.0 mm) for Japanese character height in the positive display. The subjective estimates of the most legible character size obtained in this experiment may therefore be larger than the recommended size or the size needed for legibility. Negative images showed lower legibility than positive images in experiment 1. Positive displays are expected to be legible for young persons with normal eyesight. Since the experimental conditions are not equal, the results of experiments 1 and 2 cannot be compared easily. The decrease in legibility with age was shown in the results of experiment 2. The elderly may have various visual conditions and need greater viewing distances than younger persons. Viewing distance increased with the age of the subjects due to the influence of presbyopia [2,4,10]. Larger character sizes are expected to be needed by elderly persons. The aged may have difficulty in adjusting the viewing distance to an optimal value, at least within the range of adjustment allowed by the length of their arm with a mobile terminal in their hand. The influence of contrast (positive or negative) and age on legibility can be shown with this method through the character sizes adjusted by the subjects. It is notable that the legibility of characters on mobile terminal screens has been quantified with the use of subjectively adjusted legible character sizes, and that the influences of display contrast and user age have been clearly indicated.
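Conversely, the same geometry gives the character height needed to reach a given visual angle at a given viewing distance; the following short sketch uses the ISO-preferred 20 minutes of arc and two assumed viewing distances that are illustrative, not values measured in this study.

```python
import math

def required_height_mm(target_arcmin: float, viewing_distance_mm: float) -> float:
    """Character height needed to subtend the target visual angle at that distance."""
    angle_rad = math.radians(target_arcmin / 60.0)
    return 2.0 * viewing_distance_mm * math.tan(angle_rad / 2.0)

# 20 minutes of arc at assumed viewing distances of 300 mm and 450 mm.
for d in (300.0, 450.0):
    print(d, round(required_height_mm(20.0, d), 2))
```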
5 Conclusion When characters are small, younger people ensure readability by shortening the viewing distance. However, elderly people find it far more difficult to see small characters. Legibility also deteriorates in negative display modes. Adjusted sizes might not be sufficiently legible for some elderly users. In the current ubiquitous society, mobile terminals should be usable and harmless for workers, students and people of all age groups with various visual conditions. Universal design that considers the abilities of older people is desirable. The results of this study may contribute to developing guidelines for mobile character terminals comparable to the VDT work standards [8,9] that have been established for conventional desktop-type information terminals.
References 1. Hasegawa, S., Fujikake, K., Omori, M., Miyao, M.: Readability of characters on mobile phone liquid crystal displays. International Journal of Occupational Safety and Ergonomics (JOSE) 14(3) (2008) 2. Omori, M., Watanabe, T., Takai, J., Takada, H., Miyao, M.: Readability and characteristics of the mobile phones for elderly people. Behaviour & Information Technology 21, 313– 316 (2002) 3. Darroch, I., Goodman, J., Brewster, S., Gray, P.: The effect of age and font size on reading text on handheld computers. In: Costabile, M.F., Paternó, F. (eds.) INTERACT 2005. LNCS, vol. 3585, pp. 253–266. Springer, Heidelberg (2005) 4. Hasegawa, S., Matsunuma, S., Omori, M., Miyao, M.: Aging effects on the readability of graphic text on mobile phones. Gerontechnology 4(4), 200–208 (2006) 5. Hasegawa, S., Sato, K., Matsunuma, S., Miyao, M., Okamoto, K.: Multilingual disaster information system: Information delivery using graphic text for mobile phones. AI & Society 19(3), 265–278 (2005) 6. Hasegawa, S., Miyao, M., Matsunuma, S., Fujikake, K., Omori, M.: Effects of aging and display contrast on the legibility of characters on mobile phone screens. International Journal of Interactive Mobile Technologies (iJIM) 2(4), 7–12 (2008) 7. Tamura, H., Omori, M., Hasegawa, S., Fujikake, K., Choui, M.: NIRS quick component analysis of brain activities engaged in visual text search. In: Symposium on Mobile Interaction 2008, Tokyo, Japan, pp. 187–190 (2008) (in Japanese) 8. ISO 9241-3. Amendment 1. International Organization for Standardization (2000) 9. ISO 9241-3. Ergonomic requirements for office work with visual display terminals (VDTs) Part 3: Visual display requirements. International Organization for Standardization (1992) 10. Miyao, M., Hacisalihzade, S.S., Allen, J.S., Stark, L.W.: Effect of VDT resolution on visual fatigue and readability: an eye movement approach. Ergonomics 32, 603–614 (1989)
Location-Based Mixed-Map Application Development for Mobile Devices Hyo-Haeng Lee, Kil-Ram Ha, and Kwang-Seok Hong School of Information and Communication Engineering, Sungkyunkwan University, 300, Chunchun-dong, Jangan-gu, Suwon, Kyungki-do, 440-746, Korea [email protected], [email protected], [email protected]
Abstract. This study proposes a Mixed-Map based on a mobile device. The proposed mobile Mixed-Map is designed to download 2-dimensional and 3-dimensional data provided by Google maps and Yahoo maps, and then controls the map data independently, so that a variety of applications can be implemented. Transferring geographical data from a web geographical information system can improve efficiency by reducing response time. The system adopts a tile cache method to provide continuous service when a wireless internet connection is unavailable. As an example of the proposed system, we implemented a real-time location tracking application between mobile devices that obtains the current location in real time using GPS. The Mixed-Map can be easily applied to different technologies in the ubiquitous environment, as the application does not rely solely on map APIs. The study develops and suggests basic technologies necessary for a ubiquitous geographic information system. Keywords: Mixed-Map, Mobile, GPS, Map.
1 Introduction Ubiquitous computing is a new paradigm in computer systems. It is defined as the effort to enable intelligent services between physical devices and to simultaneously connect diverse objects. Ubiquitous computing is a concept in which every object, including roads, tunnels, buildings, and other structures, is given computing functions. This creates intelligent objects and targets, enabling the sharing of information without limitations of time and space [1]. It is a computing environment that transcends existing home-network or mobile computing. Ubiquitous computing entails the connection of all computers, where connections are invisible to users and information processing has been thoroughly integrated into everyday objects and the environment. The ubiquitous network allows anyone at any time to share information without limitations on speed and space [1]. That is, it overcomes various limitations of existing networks and services, allowing users to access IT services freely. In particular, by utilizing various sensors in conjunction with a ubiquitous network, it can create a community regardless of time and space. It facilitates context awareness and location awareness of people and objects. U-LBS (Ubiquitous Location Based Services) has come to the fore in providing services based on this model. Location recognition system technology,
one of the most important technologies for providing a ubiquitous location service, is being actively researched in many countries. In this paper, we apply a mixed-map to the ubiquitous environment and propose mobile-based location tracking utilizing WiBro and GPS. The remainder of this paper is structured as follows. In section 2, we introduce related work including the Google maps and Yahoo maps services, GPS and WiBro. In section 3, we describe the proposed mixed mobile map device-based real-time location tracking system using a GPS module, the TCP/IP protocol and WiBro. In section 4, we evaluate the performance obtained from experimental observations. Section 5 summarizes the paper and outlines challenges and future directions.
2 Related Work Research on location-based service has been actively pursued for several years. A number of research projects have experimented with attaching digital information to locations. The mobile generation does not restrict users to a fixed desktop location but allows mobile computing, anywhere anytime. Web map service based applications provide wide functionality to mobile users. Google Maps Mobile [2] and Microsoft Mobile map [3] are prominent examples of mobile geospatial services. PDAs, mobile phones and other portable devices are increasingly provisioned to have location awareness via GPS devices. Satellite image service offered by Google Map [4] is better than either Microsoft live search maps [5] or Yahoo Map [6] service in terms of sharpness; it fares poorly on map readability compared to Yahoo. The street map service is inconsistent depending on the country. For instance, in case of Yahoo, the signs and colors indicating highways, national roads, country roads, subways and others are similar to actual maps, making it easier for users to use the map services. Mobile devices have several limitations compared to desktop computers. Consequently web map applications developed for the desktop cannot be easily ported to mobile environments. Limitations on memory size, processor, battery and connectivity further reduce performance, hindering the development of complex applications. Location positioning for handsets requesting services can use the GPS or the locations of base stations of a wireless network. Location positioning techniques can be divided into network, handset, and hybrid methods, the latter combining the first two [7] [8]. This paper uses Google and Yahoo maps, instead of using API. By saving map information, it increases processing speed and system stability. Using GPS and WiBro, we propose a real-time location tracking service anytime, anywhere between mobile devices.
3 Location-Based Mixed-Map System for Mobile Devices 3.1 System Architecture Fig. 1 shows the system architecture of the mixed mobile map device-based real-time tracking system proposed in this paper.
Fig. 1. Mixed mobile map device architecture based real-time tracking system
The user receives the desired information through a mobile device equipped with a GPS receiver in order to access the location-based service. When a client transfers longitude and latitude data received from the GPS receiver to the GPS application server via WiBro using TCP/IP, other mobile client users may retrieve map data from Google maps and Yahoo maps by accessing the GPS application server. 3.2 GPS GPS is a satellite-based navigation system consisting of a network of 24 orbiting satellites in six different orbital paths [9]. The satellites are constantly moving, making two complete orbits around the Earth in just under 24 hours. In an NMEA sentence, the identifier is followed by a sequence of data fields, delimited by commas. The terminal character is an asterisk, followed by a checksum value. A typical stream of NMEA sentences used for location is:

$GPRMC,021708.000,A,3717.6037,N,12658.6239,E,2.07,23.75,280907,,,A
$GPGGA,021709.000,3717.6077,N,12658.6237,E,1,04,2.3,77.1,M,19.6,M,,0000
$GPGSA,A,3,16,14,07,01,,,,,,,,,5.3,2.3,4.7
$GPRMC,021709.000,A,3717.6077,N,12658.6237,E,2.14,17.03,280907,,,A
$GPGGA,021710.000,3717.6102,N,12658.6233,E,1,04,2.3,61.1,M,19.6,M,,0000
$GPGSA,A,3,16,14,07,01,,,,,,,,,5.3,2.3,4.7
$GPRMC,021710.000,A,3717.6102,N,12658.6233,E,2.31,9.15,280907,,,A

The $GPGGA sentence carries other information including latitude, longitude and altitude [10]. Most mapping applications require latitude and longitude information to be represented as signed decimal degrees, with negative latitudes for south and negative longitudes for west. This paper converts latitude and longitude information from the "degrees, minutes, and decimal minutes" format to the "decimal degrees" format.
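A minimal sketch of the coordinate conversion described above (not the authors' implementation): it extracts latitude and longitude from a $GPGGA sentence and converts the NMEA "ddmm.mmmm" format to signed decimal degrees.

```python
def dm_to_decimal(value: str, hemisphere: str) -> float:
    """Convert NMEA 'ddmm.mmmm' (or 'dddmm.mmmm') to signed decimal degrees."""
    dot = value.index(".")
    degrees = float(value[:dot - 2])      # whole degrees before the minutes
    minutes = float(value[dot - 2:])      # minutes and decimal minutes
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal

def parse_gpgga(sentence: str):
    """Return (latitude, longitude) in decimal degrees from a $GPGGA sentence."""
    fields = sentence.split(",")
    lat = dm_to_decimal(fields[2], fields[3])
    lon = dm_to_decimal(fields[4], fields[5])
    return lat, lon

# One of the example sentences quoted above.
print(parse_gpgga("$GPGGA,021709.000,3717.6077,N,12658.6237,E,1,04,2.3,77.1,M,19.6,M,,0000"))
```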
Fig. 2. A Mercator projection map of the earth
Fig. 2 is a Mercator projection map of the earth. Equations (1) and (2) determine the x and y coordinates of a point on a Mercator map from its latitude φ and longitude λ (with λ0 being the longitude at the center of the map) [11].

TileX = λ − λ0    (1)

TileY = ln(tan(π/4 + φ/2))
      = (1/2) ln((1 + sin φ) / (1 − sin φ))
      = sinh⁻¹(tan φ) = tanh⁻¹(sin φ) = ln(tan φ + sec φ)    (2)
The scale is proportional to the secant of the latitude φ, getting arbitrarily large near the poles, where φ = ±90°. As seen from the formulae, y at the poles is plus or minus infinity. For Google Maps and Yahoo Maps, the maximum latitude φ occurs at ±85.05113 degrees, where the Mercator y value equals π [11]. To calculate the PixelX and PixelY coordinates within the tile, we first determine the latitude and longitude of the upper-left and lower-right corners of the tile. Since each tile is 256 x 256 pixels, and we now have the decimal TileX and TileY coordinates, this is a simple calculation. For the upper left corner: (3)
(4)
The mathematics to calculate the latitude and longitude of the upper-left and the lower-right corner is a little more complicated, since we have to consider that the tile is a flat object while the latitude and longitude are given in the WGS84 coordinate system. The formula to calculate the latitude is
(5)
The formula to calculate the longitude is (6)
We can identify the latitude and longitude coordinates within a tile with the equations stated above. The latitude and longitude coordinates received from GPS were applied to the tile system in this paper. Google Maps and Yahoo Maps hold the world in a number of 256x256 pixel tiles. A tile is a raster image, usually a JPEG or PNG file. Tiles form a zoom pyramid: the whole globe is covered by one tile, at the next zoom level there are 4 tiles, then 16 tiles, and so on. Zoom level ranges from 18 to 1. Each Yahoo tile has corresponding latitude, longitude and zoom values. Yahoo uses an x, y coordinate system combined with a zoom value to specify the tiles to retrieve from the server.
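As an illustration of the tile mathematics above, the following sketch uses the standard spherical-Mercator tile formulas (Eqs. (1) and (2) rescaled to the tile grid) to map a WGS84 latitude/longitude to a Google-style tile index and the pixel offset inside that tile. The zoom convention (0 = whole world in one tile) and the example coordinates are assumptions for illustration; provider-specific URL schemes differ, as discussed below.

```python
import math

MAX_LAT = 85.05113   # latitude where the Mercator y value reaches pi, as noted above
TILE_SIZE = 256      # tiles are 256 x 256 pixels

def latlon_to_tile(lat, lon, zoom):
    """Return ((tile_x, tile_y), (pixel_x, pixel_y)) in a Google-style scheme
    where zoom 0 covers the whole world with one tile."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    n = 2 ** zoom                                   # tiles per axis at this zoom level
    phi = math.radians(lat)
    x = (lon + 180.0) / 360.0 * n                   # TileX rescaled to [0, n)
    y = (1.0 - math.log(math.tan(phi) + 1.0 / math.cos(phi)) / math.pi) / 2.0 * n
    tile_x, tile_y = int(x), int(y)
    pixel_x = int((x - tile_x) * TILE_SIZE)         # offset of the point inside its tile
    pixel_y = int((y - tile_y) * TILE_SIZE)
    return (tile_x, tile_y), (pixel_x, pixel_y)

# Example: a point near Seoul at zoom 16 (illustrative values only).
print(latlon_to_tile(37.2935, 126.9771, 16))
```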
Fig. 3. Yahoo street map
Fig. 3 shows that the URL of a tile takes the form http://kr.tile.maps.yahoo.com/tl?locale=kr&v=4.1&t=m&x=27948&y=3688&z=3, using x and y for the tile coordinates and z for the zoom factor. At factor 17, the earth is divided into 2x2 parts, where 0<=x<=1 and 0>=y>=-1. At each zoom step, each tile is divided into four parts. So at a zoom factor z, the number of horizontal and vertical tiles is 2^(17-z). The number of map tiles for each zoom level is given by this formula [12], [13].
Fig. 4. Google satellite map
Processing of Google satellite maps resembles that of Yahoo street maps. Fig. 4 is an image of the entire earth retrieved via the web address http://khm1.google.com/kh?v=33&hl=ko&x=55897&y=25390&z=16. At zoom 0 the entire world is captured in one tile. At zoom 18 the world spreads over 68,719,476,736 tiles. With every increment of the zoom the width/height of the bitmap doubles, so the image area is multiplied by four each time. At factor 0, the entire earth is one tile where x=0 and y=0. At factor 1, the earth is divided into 2x2 parts, where 0<=x<=1 and 0<=y<=1. 3.4 Tile Image Storage and Display With Google and Yahoo map data, there may be frequent disconnections during communication due to prolonged database system manipulation by the users.
Fig. 5. Tile image storage and display system
Fig. 5 shows the focus on the target: if we save the eight tiles surrounding the target, we can display the map on the screen promptly and use the map without interruption of the map data (a caching sketch is given at the end of this section). Even in a case where the Internet connection has been lost and we cannot access the Google or Yahoo map service, we can use the saved map data. 3.5 GPS Server This study requires a GPS application server for wireless communication among mobile devices. If a mobile device user acquires GPS location information, the acquired location information is transferred to the GPS application server after the coordinates are transformed. Other mobile device users then access the multi-user real-time location information from the GPS application server. This study designed and implemented mobile-device-based real-time location tracking for a mobile web map that can offer a real-time location tracking service among multiple users in mobile environments using TCP/IP and a GPS application server. 3.6 WiBro WiBro (Wireless Broadband Internet) refers to the technology that offers a wireless Internet connection at high transmission speed anywhere, anytime, guarantees mobility at public transportation speeds (120 km/h) or higher in downtown areas, and enables broadband multimedia services [7]. WiBro terminals and systems were developed at the end of 2004 by Korean engineers. WiBro was approved as a reference standard for mobile BWA (broadband wireless access) at the general meeting of ITU-R SG 8 in September 2006 [7].
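A minimal sketch of the tile saving idea in Section 3.4: the tile under the current position and its eight neighbours are kept in a local cache, so the map can still be drawn when the connection to the map servers is lost. The fetch_tile callable is a placeholder for the client's actual HTTP request and is not a real API.

```python
class TileCache:
    """Keep the current tile and its eight neighbours so the map survives disconnections."""

    def __init__(self, fetch_tile):
        self.fetch_tile = fetch_tile      # callable (x, y, zoom) -> image bytes (assumed)
        self.tiles = {}                   # (x, y, zoom) -> image bytes

    def prefetch_around(self, tile_x, tile_y, zoom):
        # Fetch the 3 x 3 block of tiles centred on the current position.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                key = (tile_x + dx, tile_y + dy, zoom)
                if key not in self.tiles:
                    try:
                        self.tiles[key] = self.fetch_tile(*key)
                    except OSError:
                        pass              # offline: keep whatever is already cached

    def get(self, tile_x, tile_y, zoom):
        # Returns None when the tile was never fetched and the network is down.
        return self.tiles.get((tile_x, tile_y, zoom))

cache = TileCache(lambda x, y, z: b'')    # dummy fetcher for illustration only
cache.prefetch_around(55897, 25390, 16)
```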
4 Experiments and Results The mobile device application was implemented in C# using the Microsoft .NET Compact Framework 2.0 and the Windows Mobile 5.0 Pocket PC SDK. Two mobile devices running the Windows Mobile Pocket PC 2005 operating system were used during the trial. Samsung SPH-M8200 mobile devices were used with the GPS module. Field tests were performed in the WiBro environment by walking at 5 km/h and driving at 100 km/h while receiving GPS data.
Fig. 6. User interface for GPS information
Fig. 6 shows the latitude, longitude, speed, and altitude of the GPS data on a Samsung SPH-M8200 mobile device.
Fig. 7. User interface for login
Fig. 7 shows the login screen of a mobile device for real-time location tracking; a host address and a port number are entered to log in to the server.
Fig. 8. A Yahoo street map and a Google satellite map
Fig. 8 shows a Yahoo street map and a Google satellite map downloaded from the Google and Yahoo map servers. The latitude and longitude information obtained from the GPS device allows us to display the current location on a mobile device in real time and to transmit the information to the server through TCP/IP.
Fig. 9. Location log
Fig. 9 shows the GPS location log recorded on campus, representing a five-minute walk at an average speed of 5 km/h. The X-axis is longitude and the Y-axis is latitude.
Fig. 10. Number of satellites and HDOP
As shown in Fig. 10, the error increased in the actual test as the number of satellites decreased and the HDOP value increased between buildings. Map data from the Google Maps server and the Yahoo Maps server were rapidly updated on the mobile devices. As confirmed in the experiment, wireless Internet communication for real-time location tracking among mobile devices is appropriate for location detection.
5 Conclusion The mixed map can be easily adapted to various technologies, as the application does not rely on the providers' APIs in the ubiquitous environment. This sharing of map data reduces development cost and time while providing rich functionality and data. The new design shows that the mixed map is highly feasible. The study develops and suggests basic technologies necessary for mobile HCI (human-computer interaction). The study will be extended in the future by adding functions that improve user convenience.
Acknowledgments This research was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (No. 2008-000-10642-0).
References 1. Wikipedia, http://en.wikipedia.org/wiki/Ubiquitous_computing 2. Google maps mobile, http://www.google.com/gmm/ 3. Microsoft mobile map, http://livesearchmobile.com/windows_mobile.htm 4. Google maps, http://maps.google.com 5. Microsoft live search maps, http://maps.live.com 6. Yahoo maps, http://kr.gugi.yahoo.com/map/ 7. Adusei, I.K., Kyamakya, K., Jobmann, K.: Mobile Positioning Technologies in Cellular Networks: An Evaluation of Their Performance Metrics. In: MILCOM 2002 Proc., vol. 2, pp. 1239–1244 (2002) 8. Syrjärinne, J.: Studies of Modern Technologies for Personal Positioning. Doctor of Technology Thesis Work, Tampere University of Technology (2001) 9. GPS, http://en.wikipedia.org/wiki/GPS 10. NMEA data, http://www.gpsinformation.org/dale/nmea.htm 11. Mercator Projection, http://en.wikipedia.org/wiki/Mercator_projection 12. Simple Analysis of Google Map and Satellites, http://dunck.Us/collab/ Simple_20Anysis_20of_20Google_20Map_20and_20Satellite_20Tiles 13. Add Your Own Custom Map, http://mapki.com/wiki/Add_Your_Own_Custom_Map
A Comparison of Artifact Reduction Methods for Real-Time Analysis of fNIRS Data Takayuki Nozawa and Toshiyuki Kondo Institute of Symbiotic Science and Technology, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan {tknozawa,t_kondo}@cc.tuat.ac.jp
Abstract. Due to its convenience, low physical restraint, and tolerance of electrical noise, functional near-infrared spectroscopy (fNIRS) is expected to be a useful tool for monitoring users' brain activity in HCI. However, fNIRS measurement suffers from various kinds of artifacts, and no standardized method for artifact reduction has been established so far. In this study, we compared high-pass/band-pass filtering, global and local average references, an independent component analysis (ICA) based method, and their combinations. Their effectiveness for artifact reduction was evaluated by a cognitive task recognition experiment. The results showed that all the methods have artifact reduction capability, but their effectiveness depends on subjects and tasks. This suggests that it can be more practical to try various artifact reduction methods and choose the best one for each task and subject, instead of pursuing a single standardized method.
1 Introduction Human-computer interactions, including mobile interactions, are beginning to incorporate a wide range of sensors for the detection of user intention and context, from GPS and accelerometers to sensors of various biomedical signals. Among them, sensing users' brain activity is expected to provide rich information for HCI, and is being studied extensively in the field of brain-computer interfaces (BCI) or brain-machine interfaces (BMI). There, brain activity data are analyzed and classified in real time, and the result is used for identification and monitoring of the user's cognitive states or for control of devices, etc. While electroencephalography (EEG) is most widely utilized in non-invasive BCI, the usefulness of functional near-infrared spectroscopy (fNIRS), as a convenient, low-restraint, and electric-noise-tolerant method, is worth studying more intensively for HCI and mobile interactions. (For example, HITACHI Inc. has developed a "wearable optical topography" system that facilitates the study of brain activity in daily life [1].) Like functional magnetic resonance imaging (fMRI), fNIRS evaluates neural activity indirectly from blood-flow changes in the cortical microvasculature. This brings some difficulty to real-time analysis and utilization of fNIRS data for BCI. For one thing, the temporal resolution is said to be not very high (a time scale of several seconds), due to the relatively large time constant of neurovascular coupling. However, it
has recently been reported that quicker components (on a time scale of ≤ 100 milliseconds) may also reflect changes in cognitive tasks [2]. Another difficulty arising from the measurement mechanism of fNIRS is the existence of blood-flow artifacts that do not originate in cognitive activity of the brain. Periodic components corresponding to cardiac, respiratory and other blood-flow regulation dynamics are frequently observed [3, 4]. Changes in posture, such as tilting the head, often induce relatively large artifacts in the fNIRS data (given that the measurement optodes are well settled, this is likely to be due to skin blood concentration or dispersion by gravity). Hence a naive analysis of fNIRS data from an experiment without any posture regulation can result in a spurious correlation, which is in fact caused by the body movement accompanying task execution. However, neither posture regulation nor systematic artifact reduction has been mentioned in some of the literature on fNIRS studies. Continued pressure and heating by the optodes occasionally induce a drift component. In some HCI applications, slow components derived from changes in arousal level or mental fatigue should be treated as artifacts, even though they are associated with brain activity. These types of artifacts are commonly expected in HCI applications, and besides, even in the same measurement setting, the existence and magnitude of these artifacts differ considerably among subjects. In the current study, we focus on this problem and study the effectiveness of several artifact reduction methods that are available for real-time analysis.
2 Method In this section, we briefly review the basics of the fNIRS measurement mechanism, and then describe the artifact reduction methods that we tested. 2.1 fNIRS
Transmission of near-infrared light in living tissue is sensitive to hemoglobin concentration and oxygenation state, with different absorption characteristics for different wavelengths. Using this physical property, fNIRS estimates changes in hemoglobin concentration associated with changes in regional cerebral blood flow (rCBF), which are coupled to those in neuronal activity [5]. We use an fNIRS imaging system (FOIRE-3000, Shimadzu Co., Japan), which adopts near-infrared lasers of three wavelengths, 780, 805, and 830 nm. For each channel j (corresponding to a neighboring pair of source and detector optodes, as shown in Fig. 1), the optical densities A_j,λ at the wavelengths λ are measured. Then, from the changes in optical density ΔA_j,λ, the relative concentration changes of oxygenated hemoglobin Δoxy_j and of deoxygenated hemoglobin Δdeoxy_j are estimated by

Δoxy_j   = −1.4877 ΔA_j,780 + 0.5970 ΔA_j,805 + 1.4877 ΔA_j,830,
Δdeoxy_j =  1.8545 ΔA_j,780 − 0.2394 ΔA_j,805 − 1.0947 ΔA_j,830,    (1)
which is derived from the modified Beer-Lambert law and the extinction coefficients of the tissue reported by Matcher et al. [6, 7].
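Numerically, Eq. (1) is a single matrix product per channel. The following sketch applies the coefficients above to hypothetical optical-density changes; only the coefficient matrix comes from the text.

```python
import numpy as np

# Conversion matrix from Eq. (1) (rows: oxy-Hb, deoxy-Hb; columns: 780, 805, 830 nm)
M = np.array([[-1.4877,  0.5970,  1.4877],
              [ 1.8545, -0.2394, -1.0947]])

def hemoglobin_changes(dA_780, dA_805, dA_830):
    """Return (delta_oxy, delta_deoxy) for one channel from optical density changes."""
    dA = np.array([dA_780, dA_805, dA_830])
    return M @ dA

print(hemoglobin_changes(0.010, 0.012, 0.015))   # hypothetical values for illustration
```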
2.2 Artifact Reduction Methods
As artifact reduction methods, we selected high-pass/band-pass filtering, global and local average references, independent component analysis (ICA), and their combinations. High-pass/Band-pass Filtering. High-pass and band-pass filters have been frequently used in offline analysis of fNIRS data to eliminate the drift component and flatten the baseline (in the case of the band-pass filter, also to eliminate quick components and smooth the data) [8-10]. In those cases, the filtering has been applied in the frequency domain, attaining a virtually ideal brick-wall filter. For real-time analysis, however, a causal filter must be used, though it entails the risk of phase shift and distortion. One should note that the cutoff frequency must be chosen according to the setting of the HCI tasks. In our experimental setting, as explained in Section 3, fNIRS recording lasted about 10 minutes and high-frequency components were averaged out for the performance evaluation of artifact reduction. Therefore, aiming at reducing only slow artifact components (especially the drift that can be induced by changes of arousal level, fatigue, continued warming, etc.), we used a 4th-order Butterworth high-pass filter [11] with a cutoff frequency of 1.67 × 10⁻³ Hz. Global Reference. If an artifact component is assumed to be added uniformly over a wide range of channels, it can be reduced by subtracting the global average across all the channels from the raw data of each channel at every time point. This global reference method can also be effective in extracting localized brain responses, and has often been used for EEG data (e.g. [12]). As the concentration changes Δoxy_j and Δdeoxy_j are relative quantities, direct comparison or averaging of those values among channels can cause false results. One way to avoid this problem is to convert the raw data from each channel into z-scores in advance [13]: First, the mean and standard deviation (SD) of (Δoxy_j, Δdeoxy_j) in the "preparation phase" of a measurement are calculated for each channel j, as (μ_j^Δoxy, μ_j^Δdeoxy) and (σ_j^Δoxy, σ_j^Δdeoxy). Using these values, every data point at time t in the "analysis (testing) phase", as well as the data in the preparation phase, is converted by

Δoxy_j(t) := (Δoxy_j(t) − μ_j^Δoxy) / σ_j^Δoxy ,   Δdeoxy_j(t) := (Δdeoxy_j(t) − μ_j^Δdeoxy) / σ_j^Δdeoxy .    (2)
Then the global reference is applied as

Δoxy_j(t) := Δoxy_j(t) − Σ_{l=1}^{n} Δoxy_l(t) ,   Δdeoxy_j(t) := Δdeoxy_j(t) − Σ_{l=1}^{n} Δdeoxy_l(t) ,    (3)
where n is the number of channels. To justify the use of the mean and SD in the preparation phase for the z-score conversion in the analysis phase, it is supposed that the preparation phase is sufficiently long and the tasks in the two phases are qualitatively the same.
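The per-channel pipeline described so far (causal 4th-order Butterworth high-pass filter, z-scoring with preparation-phase statistics, global reference) can be sketched as follows. The data layout (time × channels), the sampling rate, and the use of the cross-channel mean as the global reference (following the prose description of "subtracting the global average") are assumptions of this sketch.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 10.0            # sampling rate in Hz (assumed, device dependent)
CUTOFF = 1.67e-3     # high-pass cutoff in Hz, as in the text

def highpass(data, fs=FS, cutoff=CUTOFF, order=4):
    """Causal Butterworth high-pass filter applied channel by channel (time x channels)."""
    b, a = butter(order, cutoff, btype='highpass', fs=fs)
    return lfilter(b, a, data, axis=0)

def zscore_with_prep(prep, analysis):
    """Convert both phases to z-scores using the preparation-phase mean and SD per channel."""
    mu, sigma = prep.mean(axis=0), prep.std(axis=0)
    return (prep - mu) / sigma, (analysis - mu) / sigma

def global_reference(data):
    """Subtract the average over all channels at each time point (global reference)."""
    return data - data.mean(axis=1, keepdims=True)

# Usage with synthetic data: 600 s of 22-channel oxy-Hb signals.
rng = np.random.default_rng(0)
raw = rng.normal(size=(int(600 * FS), 22))      # time x channels
filtered = highpass(raw)
half = len(filtered) // 2
prep_z, test_z = zscore_with_prep(filtered[:half], filtered[half:])
cleaned = global_reference(test_z)
```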
Local Reference. The local reference method is similar to the global reference method, but instead of the average over all channels, it uses the average of the neighboring channels as the reference:

Δoxy_j(t) := Δoxy_j(t) − Σ_{l∈N_j} Δoxy_l(t) ,   Δdeoxy_j(t) := Δdeoxy_j(t) − Σ_{l∈N_j} Δdeoxy_l(t) ,    (4)
where N_j denotes the neighboring channels of j. This method is expected to emphasize localized activity, like the Gabor convolution kernel. Although there are no objective criteria to define the range of the "local" neighbors, we used the nearest neighboring channels (that means N_1 = {5, 6} and N_12 = {7, 8, 16, 17}, etc. in the configuration of Fig. 1). Independent Component Analysis. Independent component analysis (ICA) assumes that the observed data x(t) = [x_1(t), …, x_n(t)] (t = 1, 2, …, p) is a linear combination of
unknown and statistically independent sources s(t) = [s_1(t), …, s_m(t)] (m ≤ n), that is,

x(t) = s(t) A ,    (5)
where the m × n matrix A is called the mixing matrix. The problem for ICA algorithms is to find a demixing matrix W such that the source signals s(t) are recovered from the observed data x(t) by

s(t) = x(t) W ,    (6)
with maximal statistical independence among the source components. If it is reasonable to suppose that artifact components are statistically independent of the components originating from cortical activity, the artifacts are eliminated by (i) demixing the observed data into the independent sources, (ii) eliminating the source components with features characteristic of the expected artifacts, and (iii) re-mixing the remaining source components (this last step is optional for some BCI applications). This procedure is expressed as

x(t) := x(t) W A' ,    (7)
where A' is the modified mixing matrix obtained from A by substituting zeros for the rows corresponding to the artifact components. ICA has also been widely used for artifact reduction in various brain recording methods, where physiological/technical artifact components were identified by checking the components' inconsistency with the experimental design [14], their correlation with external references like the electrocardiogram (ECG), or their frequency/spatial distribution [15]. For fNIRS [7], Kohno et al. speculated that the skin blood flow artifact tends to be distributed uniformly over a wide spatial range, as it is controlled by the autonomic nervous system. Based on this hypothesis, they defined a statistical value named the coefficient of spatial uniformity (CSU) for each independent component i, as
csu_i = μ(A_i*) / σ(A_i*) ,    (8)
where μ(A_i*) and σ(A_i*) are the mean and SD of row i of the mixing matrix A, respectively. Components with high CSU show spatially uniform changes and thus are considered to be artifacts. For real-time analysis, artifact components must be identified automatically based on some predefined criteria. In this study, we adopted the CSU and the cardiac pulsation frequency ratio (CPFR), which we calculated as the integral of the spectral density in the frequency region [0.75, 1.5] Hz divided by the total spectral power. ICA was applied separately to (Δoxy_j) and (Δdeoxy_j), and the components with the highest CSU and the highest CPFR are eliminated, respectively, by the above procedure (7). We supposed that the source contains as many dimensions as the observation (m = n), and adopted joint approximate diagonalization of eigen-matrices (JADE) as the ICA algorithm, which is based on the fourth-order cumulant [16]. Combined Methods. The high-pass filter is applied independently to each channel. On the other hand, the latter three methods employ inter-channel comparison, based on the shared hypothesis that the artifacts in fNIRS signals, mainly from skin blood flow, are not restricted to a specific channel but distributed throughout the channels. Therefore, it can be expected that these two types of methods complement each other. Based on this idea, we tried three combined methods, which are obtained by first passing the data through the high-pass filter and then applying one of the other three methods.
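The ICA step can be sketched as below. The paper uses the JADE algorithm; scikit-learn's FastICA is substituted here purely for illustration. The sketch computes the CSU of Eq. (8) and a cardiac-band power ratio (CPFR) for each extracted component and re-mixes the data with the most artifact-like components zeroed out, as in Eq. (7). The sampling rate and the removal of a single component per criterion are assumptions.

```python
import numpy as np
from scipy.signal import welch
from sklearn.decomposition import FastICA

FS = 10.0   # sampling rate in Hz (assumed)

def csu(mixing_col):
    """Coefficient of spatial uniformity: mean over SD of one component's spatial pattern."""
    return np.mean(mixing_col) / np.std(mixing_col)

def cpfr(source, fs=FS, band=(0.75, 1.5)):
    """Cardiac pulsation frequency ratio: band power in 0.75-1.5 Hz over total power."""
    freqs, psd = welch(source, fs=fs, nperseg=min(256, len(source)))
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return psd[in_band].sum() / psd.sum()

def remove_artifact_components(data):
    """data: (time x channels) array of, e.g., oxy-Hb changes. Returns the re-mixed data
    with the highest-|CSU| and highest-CPFR components removed."""
    ica = FastICA(n_components=data.shape[1], random_state=0)
    sources = ica.fit_transform(data)                        # (time x components)
    mixing = ica.mixing_                                     # (channels x components)
    csu_vals = np.array([csu(mixing[:, i]) for i in range(sources.shape[1])])
    cpfr_vals = np.array([cpfr(sources[:, i]) for i in range(sources.shape[1])])
    # Sign of an ICA component is arbitrary, so compare |CSU|.
    drop = {int(np.argmax(np.abs(csu_vals))), int(np.argmax(cpfr_vals))}
    kept = sources.copy()
    kept[:, list(drop)] = 0.0                                # zero out artifact sources
    return kept @ mixing.T + ica.mean_                       # re-mix, as in Eq. (7)

# Synthetic usage: 22-channel data with a shared sinusoidal "cardiac" artifact.
rng = np.random.default_rng(0)
t = np.arange(0, 300, 1.0 / FS)
data = rng.normal(size=(len(t), 22)) + np.sin(2 * np.pi * 1.1 * t)[:, None]
cleaned = remove_artifact_components(data)
```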
3 Experiment We conducted a preliminary experiment to compare the effectiveness of the above methods. In this experiment, each method’s effectiveness for artifact reduction was evaluated by improvement in task recognition performance by a classification algorithm.
Fig. 1. Configuration of source and detector optodes used in the experiment. For each neighboring pair of source and detector optodes, a recording channel is assigned.
Fig. 2. Construction of feature vectors from fNIRS data, and determination of their classes. For each feature vector, corresponding class is given by the most dominant cognitive task in the lagged time window.
3.1 Settings
Four subjects conducted three types of cognitive tasks, which were switched repeatedly in an order defined by the experimenter. The tasks assigned to each subject are shown in Table 1. Every task continued for 15-25 s (the duration was varied to avoid inducing a specific frequency component) and was repeated 10 times. Subjects were asked to suppress task-dependent postural changes.

Table 1. Assignment of cognitive tasks to the subjects

Subject A: listening to quiet instrumental music, silent text reading, number puzzle (Sudoku)
Subject B: listening to quiet instrumental music, silent text reading, 3D block puzzle
Subject C: listening to quiet instrumental music, silent text reading, 3D block puzzle
Subject D: rest with eyes open, mental arithmetic, typing text using a keypad
We placed 8 source and 7 detector optodes covering the prefrontal regions (including the Fp1, Fp2 and Fz positions of the international 10-20 system of EEG electrode placement [17]), yielding 22 channels. The configuration of the optodes is shown in Fig. 1. For the training and testing of the task classifier, 44-dimensional feature vectors were constructed by averaging the Δoxy_j and Δdeoxy_j signals within shifting time windows. Different time window widths ranging from 1 to 10 s (interval 0.5 s) were tried. Considering the relatively slow nature of the fNIRS response, we also tried various lengths of time lag, ranging from 0 to 10 s (interval 0.5 s), in linking each feature vector to one of the cognitive tasks. The feature vector at the current time was linked to the task that was most dominant in the window shifted into the past by the time lag, as shown in Fig. 2. Data in the former half of the experimental period were used for training the classifier, for calculating the mean and SD used in the z-score conversion for the global/local references, and for estimating the demixing/mixing matrices for the ICA based method. Data in the latter half were used for the classification performance test.
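A sketch of the feature construction just described, together with the RBF-kernel SVM classifier used later (Section 3.2, one-against-one multiclass): each feature vector concatenates the per-channel window means of Δoxy and Δdeoxy, and its class is the task dominant in the window shifted back by the lag. Array shapes, the integer task labels, the synthetic data and the train/test split are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

FS = 10.0   # sampling rate in Hz (assumed)

def make_features(oxy, deoxy, labels, window_s, lag_s, fs=FS):
    """oxy, deoxy: (time x channels) arrays; labels: integer task id per sample.
    Each feature vector is the window mean of every channel (oxy then deoxy);
    its class is the most dominant task in the window shifted back by the lag."""
    w, lag = int(window_s * fs), int(lag_s * fs)
    X, y = [], []
    for t in range(w + lag, len(oxy) + 1):
        X.append(np.concatenate([oxy[t - w:t].mean(axis=0),
                                 deoxy[t - w:t].mean(axis=0)]))
        lagged = labels[t - lag - w:t - lag]          # window shifted into the past
        y.append(np.bincount(lagged).argmax())        # dominant task in that window
    return np.array(X), np.array(y)

# Synthetic stand-in data: 22 channels, three tasks switching every 20 s.
rng = np.random.default_rng(0)
n = int(600 * FS)
labels = (np.arange(n) // int(20 * FS)) % 3
oxy = rng.normal(size=(n, 22)) + labels[:, None] * 0.1
deoxy = rng.normal(size=(n, 22)) - labels[:, None] * 0.1

X, y = make_features(oxy, deoxy, labels, window_s=5.0, lag_s=2.0)
half = len(X) // 2
clf = SVC(kernel='rbf', gamma='scale', decision_function_shape='ovo')
clf.fit(X[:half], y[:half])
print(clf.score(X[half:], y[half:]))   # correct-classification rate ("precision" in the text)
```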
Table 2. Task classification precision with no artifact reduction (none), reduction methods based on high-pass filter (HPF), global reference (GR), local reference (LR), independent component analysis (ICA), and their combinations. For each subject and artifact reduction method, the best precision with the time window width (w.w.) and lag values yielding it, average precision (Avg.) ± SD over all the time window widths from 1 to 10 s and lags from 0 to 10 s, and precision at 1 s window width and 0 s lag (At (1, 0)), are shown.
Subject   Method     Best   (w.w., lag)             Avg. ± SD        At (1, 0)
A         None       0.525  (5.0, 5.0)              0.434 ± 0.041    0.428
A         HPF        0.651  (4.5, 7.5)              0.519 ± 0.057    0.424
A         GR         0.595  (7.0, 0.5)              0.446 ± 0.047    0.444
A         LR         0.579  (3.0, 8.0)              0.468 ± 0.052    0.444
A         ICA        0.524  (7.0, 0.5)              0.421 ± 0.037    0.385
A         HPF+GR     0.794  (8.5, 4.0)              0.575 ± 0.060    0.572
A         HPF+LR     0.689  (6.5, 6.0)              0.554 ± 0.050    0.556
A         HPF+ICA    0.630  (5.5, 2.0)              0.504 ± 0.059    0.576
B         None       0.613  (9.5, 2.5)              0.447 ± 0.086    0.469
B         HPF        0.800  (8.5, 1.5)              0.583 ± 0.077    0.601
B         GR         0.646  (3.5, 2.5)              0.391 ± 0.092    0.360
B         LR         0.718  (7.5, 8.0)              0.475 ± 0.097    0.465
B         ICA        0.581  (9.5, 0.0)              0.411 ± 0.077    0.349
B         HPF+GR     0.857  (8.5, 2.5)              0.638 ± 0.081    0.640
B         HPF+LR     0.844  (6.5, 6.0), (6.5, 6.5)  0.662 ± 0.073    0.636
B         HPF+ICA    0.800  (8.5, 0.5), (8.5, 1.5)  0.599 ± 0.079    0.628
C         None       0.730  (8.0, 7.5)              0.587 ± 0.049    0.615
C         HPF        0.719  (9.5, 5.0)              0.558 ± 0.056    0.576
C         GR         0.781  (9.5, 0.0)              0.605 ± 0.056    0.599
C         LR         0.757  (8.0, 9.0)              0.574 ± 0.065    0.469
C         ICA        0.714  (8.5, 6.5)              0.594 ± 0.049    0.637
C         HPF+GR     0.714  (8.5, 6.5)              0.563 ± 0.059    0.561
C         HPF+LR     0.750  (9.5, 4.0), (9.5, 4.5)  0.622 ± 0.046    0.599
C         HPF+ICA    0.676  (9.0, 0.0)              0.561 ± 0.046    0.588
D         None       0.606  (9.0, 2.5)              0.436 ± 0.048    0.508
D         HPF        0.533  (10.0, 2.5)             0.369 ± 0.056    0.373
D         GR         0.697  (9.0, 5.0)              0.523 ± 0.056    0.538
D         LR         0.650  (7.5, 0.0), (7.5, 1.0)  0.502 ± 0.058    0.515
D         ICA        0.600  (7.5, 0.0)              0.437 ± 0.046    0.562
D         HPF+GR     0.571  (7.0, 6.0)              0.415 ± 0.047    0.381
D         HPF+LR     0.609  (6.5, 1.0)              0.440 ± 0.065    0.423
D         HPF+ICA    0.514  (8.5, 4.0)              0.391 ± 0.044    0.385
For the classification, we adopted a nonlinear support vector machine (SVM) algorithm with a Gaussian radial basis kernel and the one-against-one method for multiclass classification [18]. The kernel width parameter was determined heuristically from the training data [19]. 3.2 Results
Table 2 shows the task classification precision with and without the artifact reduction methods. For each subject and artifact reduction method, the best precision value with the time window width and lag values yielding it, the average precision ± SD over all the time window width and lag parameter values, and the precision at 1 s window width and 0 s lag are given. Fig. 3 shows the differences in classification precision produced by the artifact reduction methods for one subject (subject B), with the detailed dependence on the time window width and lag. For subjects A and B, most of the methods were effective in improving the task classification precision. Especially the high-pass filter and its combinations with the other methods brought significant improvements. Although the ICA based method was not very effective by itself, it achieved a fair improvement combined with the high-pass filter. On the other hand, for subjects C and D, the high-pass filter and the combined HPF
Fig. 3. Detailed dependence of the task classification precision for a subject (subject B) on the time window width and lag, with no artifact reduction (none), reduction methods based on high-pass filter (HPF), global reference (GR), local reference (LR), independent component analysis (ICA), and their combinations.
methods were not very effective and in some cases even diminished the precision. The global reference was comparatively effective for these two subjects. These results suggest the possibility that for each task (application) and subject (user) we should try out all the methods in some "general artifact reduction package" and choose the best one, rather than pursuing a single standardized method. Such a package can also be utilized for an adaptive boosting algorithm [20], to prepare a collection of "weak" classifiers. Our study provides a starting point for assembling such a package.
4 Conclusions As artifact reduction methods for real-time analysis of fNIRS data, we compared high-pass filtering, global and local average references, an independent component analysis based method, and their combinations. Their effectiveness was evaluated by a cognitive task recognition experiment. The results showed that all the methods have artifact reduction capability, but their effectiveness depends on subjects and tasks. This suggests that it can be more practical to try various artifact reduction methods and choose the best one for each task and subject, instead of pursuing a single standardized method. We studied the effectiveness of the methods in an experiment where the tasks were switched in a block-design fashion and thus each cognitive state is expected to continue for a relatively long time. For future work, we are planning to compare the artifact reduction methods for quicker responses in the event-related BCI scheme, with real-time biofeedback of the analysis results to the subjects. Acknowledgments. This study was in part supported by the "Symbiotic Information Technology Research Project" of Tokyo University of Agriculture and Technology, and also by the Grant-in-Aid for "Scientific Research on Priority Areas (Area No. 454)" from the Japanese Ministry of Education, Culture, Sports, Science and Technology.
References 1. Koizumi, H., et al.: Present and Future of Brain Imaging in Mind, Brain and Education. In: IMBES Conference (2007) 2. Tamura, H., et al.: On Physiological Role of Quick Components in Near Infrared Spectroscopy. In: MOBILE 2008 (2008) 3. Obrig, H., et al.: Spontaneous Low Frequency Oscillations of Cerebral Hemodynamics and Metabolism in Human Adults. NeuroImage 12, 623–639 (2000) 4. Katura, T., et al.: Quantitative evaluation of interrelations between spontaneous lowfrequency oscillations in cerebral hemodynamics and systemic cardiovascular dynamics. NeuroImage 31, 1592–1600 (2006) 5. Hoshi, Y., Tamura, M.: Detection of dynamic changes in cerebral oxygenation coupled to neuronal function during mental work in man. Neurosci. Lett. 150, 5–8 (1993) 6. Matcher, S.J., et al.: Performance Comparison of Several Published Tissue Near-infrared spectroscopy algorithms. Anal. Biochem. 227, 54–68 (1995)
7. Kohno, S., et al.: Removal of the Skin Blood Flow Artifact in Functional Near-infrared Spectroscopic Imaging Data through Independent Component Analysis. Journal of Biomedical Optics 12, 062111–1–9 (2007) 8. Schroeter, M.L., et al.: Towards a Standard Analysis for Functional Near-infrared Imaging. NeuroImage 21, 283–290 (2004) 9. Huppert, T., et al.: A Temporal Comparison of BOLD, ASL, and NIRS Hemodynamic responses to motor stimuli in adult humans. NeuroImage 29, 368–382 (2006) 10. Plichta, M., et al.: Event-Related Functional Near-infrared Spectroscopy (fNIRS): Are the Measurements Reliable? NeuroImage 31, 116–124 (2006) 11. Proakis, J., Manolakis, D.: Digital Signal Processing. Prentice Hall, Englewood Cliffs (2006) 12. Bénar, C., et al.: Single-Trial Analysis of Oddball Event-Related Potentials in Simultaneous EEG-fMRI. Human Brain Mapping 28, 602–613 (2007) 13. Matsuda, G., Hiraki, K.: Sustained Decrease in Oxygenated Hemoglobin during Video Games in the Dorsal Prefrontal Cortex: A NIRS Study of Children. NeuroImage 29, 706– 711 (2006) 14. Ikeda, S., Toyama, K.: Independent Component Analysis for Noisy Data — MEG Data Analysis. Neural Networks 13, 1063–1074 (2000) 15. Mantini, D., et al.: Complete Artifact Removal for EEG Recorded during Continuous fMRI Using Independent Component Analysis. NeuroImage 34, 598–607 (2007) 16. Cardoso, J.-F., Souloumiac, A.: Blind Beamforming for Non Gaussian Signals. IEEE Proceedings-F 140, 362–370 (1993) 17. Homan, R.W., Herman, J., Purdy, P.: Cerebral location of international 10-20 system electrode placement. Electroencephalogr. Clin. Neurophysiol. 66, 376–382 (1987) 18. Hsu, C.-W., Lin, C.-J.: A Comparison of Methods for Multiclass Support Vector Machines. IEEE Trans. Neural Networks 13, 415–425 (2002) 19. Karatzoglou, A., Meyer, D., Hornik, K.: Support Vector Machines in R. Journal of Statistical Software 16(9) (2006) 20. Freund, Y., Schapier, R.E.: A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting. Journal of Computer and System Sciences 55, 119–139 (1997)
Investigation on Relation between Index of Difficulty in Fitts’ Law and Device Screen Sizes Hidehiko Okada1, Takayuki Akiba1, and Ryosuke Fujioka2 1 Kyoto Sangyo University Kamigamo Motoyama, Kita-ku, Kyoto 603-8555, Japan [email protected] 2 Kobe Sogo Sokki Co. Ltd. 4-3-8 Kitanagase-dori, Chuo-ku, Kobe, Hyogo, 650-0012, Japan [email protected]
Abstract. It is well known as Fitts' law that the time for a user to point at a target on a GUI screen can be modeled as a linear function of the "index of difficulty (ID)". The authors investigate whether the ID formulation is appropriate independently of device screen sizes. The results of our experiment revealed that the ID formulation may not consistently capture actual difficulty: users' pointing performances were not consistent among pointing target variations whose indices of difficulty are consistent. The term A/W may not be appropriate because the term causes the observed inconsistency. Keywords: usability, Fitts' law, touch user interface, small screen, smart phone, throughput, error rate.
1 Introduction It is well-known as Fitts’ law that the time for a user to point a target can be modeled as a linear function of “index of difficulty (ID)”, where ID is formulated as a function of the target size and distance [1, 2]. t = a + b ∗ ID.
(1)
ID = log2(A/W+1).
(2)
In Eqs. (1-2), t is the pointing time, A is the amplitude (distance) to the target, W is the target size and a, b are constants that depend on the experiment conditions. ID is larger as A is larger and/or W is smaller. Values of a and b in Eq. (1) are determined by sampling (A, W, t) data and applying linear regression analysis to the data. Eq. (2) shows that ID values are the same for (A, W) and (nA, nW) where n > 0. This research is motivated by recent smart phones that employ touch UIs. Compared with other touch screen devices such as tablet PCs, mobile phones have smaller screens, so widgets on mobile phone screens are likely to be smaller. Widgets can be designed for devices with various screen sizes so that the theoretical ID values in Eq. (2) are consistent among the devices: larger/smaller sizes & distances for
larger/smaller screens. If ID in Eq. (2) is an appropriate index of actual pointing difficulty independently of screen sizes, users' pointing performances on the same device are consistent among widget designs (A, W) and (nA, nW): note that a, b in Eq. (1) are constant (independent of ID), so a, b must be the same for two data sets sampled with the two widget designs (A, W) and (nA, nW). The aim of this research is to investigate whether the above is true: the appropriateness of the ID formulation in Eq. (2) is evaluated from the viewpoint of dependency on screen sizes, by experiments with participants. Limitations of Fitts' law have been researched and extensions have been proposed. For example, MacKenzie et al. [3] proposed an extension for 2D pointing tasks. Our research aims at investigating possible limitations with respect to screen sizes. Besides, related research was previously reported by Oehl et al. [4]. They investigated how display size influenced pointing performances on a touch UI and reported that on large displays a fast and comparably accurate execution style was chosen, in contrast to a very inaccurate and time-consuming style on small displays. In their research the size of the small screen was 6.5", and only a large-screen touch UI device was utilized for the user experiments: screen sizes were controlled by means of a software program as virtual screens on the device display. In our research, the size of the small screen is less than 3", and a commercial smaller-screen mobile device is utilized.
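The scale invariance of Eq. (2) that motivates the experiment can be checked directly: multiplying both A and W by the same factor n leaves ID unchanged. A two-line check with illustrative numbers:

```python
import math

def index_of_difficulty(A, W):
    """Fitts' law index of difficulty, Eq. (2): ID = log2(A/W + 1)."""
    return math.log2(A / W + 1)

# A widget design (A, W) and the same design scaled by n give identical ID values.
print(index_of_difficulty(12.0, 4.0))          # 2.0 bits (device S scale)
print(index_of_difficulty(3 * 12.0, 3 * 4.0))  # still 2.0 bits (scaled up 3x)
```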
2 Experiments 2.1 Test Tasks Participants were asked to point at targets on a screen. A test task consisted of pointing at two rectangular targets (targets 1 and 2) in a predefined order. An "attempt" was the two successive pointings of targets 1 and 2, and a test task consisted of a predefined number of attempts. For each combination of experiment conditions, each participant was asked to perform a predefined set of the tasks. The pointing operations were logged for later analyses of pointing speed and accuracy. 2.2 Conditions Devices. Three commercial devices were used in our experiment: two tablet PCs and a PDA, which have a {10.2", 6.0", 2.8"} touch screen respectively. The PDA was selected because several recent smart phones have such small touch screens (i.e., the PDA was used as a substitute for recent smart phones). The screen sizes of the devices were relatively large/middle/small. In this paper, these devices are denoted as devices L/M/S respectively. Participants performed the test tasks by using a stylus attached to each of the three devices. (Differences in stylus designs may affect pointing performances, as reported by Ren et al. [5]; it is assumed in our research that the stylus attached to each device is designed to be optimal for that device, so that it contributes to better performance on that device than other styluses would.) Target Sizes & Distances. For each of the three devices, two sets of targets were designed so that the ID values in Eq. (2) were consistent between the two sets. Targets in
one of the two sets were designed with smaller sizes and distances, and those in the other were designed with larger ones. Specific designs of the two target sets are described later. In this paper, these two target sets are denoted as targets L/S respectively. Errors. Pointing speed and accuracy are usually a tradeoff [6]. Participants performed tasks under each of two error conditions: errors acceptable or not. In a test task where errors were acceptable, a participant could continue the task even if s/he made an error (mispointing), and the task was complete when the count of no-error attempts reached a predefined number. In a condition where errors were not acceptable, a test task was cancelled by mispointing, and the task was retried until the count of no-error attempts reached a predefined number. The error condition was told to each participant before performing each task: s/he had to try a task more carefully in the errors-not-acceptable condition. 2.3 Pointing Target Designs Table 1 shows the design of target sizes and distances. Values for the devices {M, L} were determined as [values for the device S] ∗ [the ratio of screen sizes, i.e., 6.0/2.8 for the device M and 10.2/2.8 for the device L]. ID values were designed to range in [2.00, 3.50] consistently among the devices {S, M, L} and the targets {S, L}. The size of target 1 was fixed to 6.0 mm, empirically found to be easy enough to point at first, for all conditions. Positions of targets 1 and 2 were randomly determined for each attempt under the following two constraints. • The entire area of both targets was inside the device screen. • The distance between the center points of the two targets was the predefined value. Fig. 1 shows a screenshot of targets 1 and 2 for the device M and the targets L. Targets 1 and 2 are the black and white rectangles respectively (the target colors were consistent for all the devices). The two targets were shown at the same time, and each participant was asked to find both targets before s/he pointed at target 1. This was because visual search time should not be included in the pointing time interval. After an attempt of pointing at targets 1 and 2, new targets were shown for the next attempt. 2.4 Methods of Experiments Condition combinations were 12 in total: {the devices S, M, L} ∗ {the targets S, L} ∗ {errors "acceptable", "not acceptable"}. Each participant was asked to perform four trials of a task under each of the 12 condition combinations. The number of attempts in a task trial was 11 (with the IDs 2.00-3.50 shown in Table 1) for the errors "not acceptable" condition: none of the 11 attempts was allowed to be an error. For the errors "acceptable" condition, a task trial included 11 successful attempts for the 11 IDs respectively in Table 1 and 0 or more error attempts. Each participant first performed a training task trial under each of the 12 condition combinations (thus, 12 training trials), and then performed tasks in a random order of the 12 condition combinations. The order of the 11 IDs in a trial was also randomized for each trial.
Table 1. Target sizes and distances (ID: bits, W & A: mm)

       Device S               Device M                Device L
       Targets S   Targets L  Targets S   Targets L   Targets S    Targets L
ID     W     A     W     A    W     A     W      A    W      A     W      A
2.00   4.00  12.00 12.00 36.00  8.53 25.60 25.60  76.80 14.61 43.82 43.82 131.45
2.15   3.80  13.07 11.40 39.20  8.11 27.87 24.32  83.62 13.88 47.71 41.63 143.12
2.30   3.60  14.13 10.80 42.39  7.68 30.14 23.04  90.42 13.15 51.59 39.44 154.77
2.45   3.40  15.18 10.20 45.53  7.25 32.38 21.76  97.14 12.41 55.42 37.24 166.26
2.60   3.20  16.20  9.60 48.60  6.83 34.56 20.48 103.69 11.68 59.16 35.05 177.47
2.75   3.00  17.18  9.00 51.54  6.40 36.65 19.20 109.96 10.95 62.74 32.86 188.21
2.90   2.80  18.10  8.40 54.30  5.97 38.61 17.92 115.84 10.22 66.09 30.67 198.27
3.05   2.60  18.93  7.80 56.80  5.55 40.39 16.64 121.17  9.49 69.13 28.48 207.40
3.20   2.40  19.66  7.20 58.97  5.12 41.93 15.36 125.79  8.76 71.77 26.29 215.31
3.35   2.20  20.23  6.60 60.70  4.69 43.16 14.08 129.49  8.03 73.88 24.10 221.63
3.50   2.00  20.63  6.00 61.88  4.27 44.01 12.80 132.02  7.30 75.32 21.91 225.96
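The distances in Table 1 follow from inverting Eq. (2): for a chosen ID and target width W, A = W(2^ID − 1). The sketch below regenerates the device S columns; the width schedule (4.0 mm down to 2.0 mm in 0.2 mm steps, tripled for targets L) is simply read off the table.

```python
def distance_for_id(ID, W):
    """Invert ID = log2(A/W + 1) to get the target distance A."""
    return W * (2 ** ID - 1)

ids = [2.00 + 0.15 * k for k in range(11)]          # 2.00, 2.15, ..., 3.50
widths_s = [4.0 - 0.2 * k for k in range(11)]       # targets S on device S
for ID, W in zip(ids, widths_s):
    print(f"ID={ID:.2f}  S: W={W:.2f} A={distance_for_id(ID, W):.2f}  "
          f"L: W={3 * W:.2f} A={distance_for_id(ID, 3 * W):.2f}")
```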
Fig. 1. Screenshot for target pointing tasks
2.5 Participants Twelve subjects participated in the experiment, but 3 of the 12 subjects could participate for the devices S and L only, due to the experiment schedule. Thus, the users' pointing log data (A, W, t) were collected with 12 subjects for the devices S and L but 9 subjects for the device M. The 12 participants were university graduate or undergraduate students. They were all novices in using devices with touch-by-stylus UIs, but they had no trouble performing the test tasks after the 12 training trials. 2.6 Logging Pointing Operations The following data were recorded for each pointing (each tap by a stylus) into log files.
• Target: 1 or 2
• Target position: (x, y) values
• Target width and height: pixels
• Tapped position: (x, y) values
• Tap time: msec
• Error: Yes or No
The tapped position and the tap time were logged when the stylus landed on the screen, and the pointing was judged as an error or not based on the tapped position. No attempt was observed in which the stylus landed on target 1, was moved onto target 2 and then lifted off.
3 Data Analyses and Findings Pointing speed and accuracy were measured by throughput [7] and error rate respectively. In this research, t is the interval from the target 1 tap time to the target 2 tap time, A is the Euclidean distance between the tapped points for targets 1 and 2, and W is the target width (= height). Throughput is defined as ID/t with ID as in Eqs. (1-2) [7]. (ID, t) could be observed for each attempt, so a throughput value could also be obtained for each attempt. To measure pointing accuracy, the error rate was defined as Error rate = (#error attempts in a task trial)/(#total attempts in the trial)
(3)
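In code form the two measures are straightforward. The sketch below assumes each logged attempt carries its movement distance A, target width W, elapsed time t in seconds, and an error flag, in line with the log fields of Section 2.6; the example numbers are illustrative only.

```python
import math

def throughput(A, W, t):
    """Throughput in bits/sec for one attempt: ID/t with ID = log2(A/W + 1)."""
    return math.log2(A / W + 1) / t

def error_rate(attempts):
    """Error rate of a task trial, Eq. (3): error attempts over total attempts.
    'attempts' is a list of dicts with an 'error' flag (assumed log format)."""
    errors = sum(1 for a in attempts if a['error'])
    return errors / len(attempts)

# Example: one attempt over 15 mm to a 3 mm target completed in 0.45 s.
print(throughput(15.0, 3.0, 0.45))   # about 5.7 bit/sec
```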
Error rate could be calculated only for the condition "error" = "acceptable" because the data under the condition "error" = "not acceptable" did not include any error attempt (if an error occurred in a trial under the condition "error" = "not acceptable", the trial was cancelled and retried). Mean and standard deviation (SD) values of the throughput and the error rate were calculated to compare user performances on targets S with those on targets L. Throughput mean and SD values were calculated from the data { tp(s, t, a(s,t)) } for all of the subjects, the task trials and the attempts in a task: tp(s, t, a(s,t)) denotes the throughput value for the a(s,t)-th attempt in the t-th task by the s-th subject. Error rate mean and SD values were calculated from the data { er(s, t) } for all of the subjects and the task trials: er(s, t) denotes the error rate value for the s-th subject and the t-th task. In addition, it was tested by t-test whether there was a significant difference between the population mean values of throughput and error rate for two conditions. It should be noted that error attempts were included in the data under the condition "error" = "acceptable". Error attempts might be faster (have larger throughput values) than successful attempts. In the following of this paper, throughput values were calculated with both successful and error attempt data. Table 2 shows the mean and SD values of the throughput, and Table 3 shows those of the error rate. Tables 4 & 5 show t-test results for throughput and error rate respectively. In Tables 4 & 5, "**"-marked t-scores are those with p<0.01, and non-marked t-scores are those with p>0.05.

Table 2. Mean and SD values of throughput (bit/sec)

                        Device S              Device M              Device L
                        Targets S  Targets L  Targets S  Targets L  Targets S  Targets L
Acceptable      Mean    5.73       5.73       5.86       5.76       5.52       4.76
                SD      1.37       1.14       1.30       1.80       1.34       0.87
Not acceptable  Mean    5.15       5.57       5.69       5.63       5.32       4.60
                SD      1.20       1.21       1.30       1.78       1.23       0.97
Table 3. Mean and SD values of error rate (%)

                        Device S              Device M              Device L
                        Targets S  Targets L  Targets S  Targets L  Targets S  Targets L
Acceptable      Mean    11.23      0.52       0.93       0.69       1.56       0.52
                SD      10.35      2.04       2.66       2.34       3.29       2.04

Table 4. T-test for throughput

              Device S                       Device M                      Device L
              Acceptable     Not acceptable  Acceptable  Not acceptable    Acceptable  Not acceptable
Targets S/L   t=3.65×10⁻³    t=-5.74**       t=0.875     t=0.514           t=11.04**   t=10.66**

Table 5. T-test for error rate

              Acceptable
              Device S    Device M    Device L
Targets S/L   t=7.03**    t=0.393     t=1.87
These tables reveal the following. • On the device L, participants could point at targets S significantly faster than at targets L, but on the devices S & M they could not. Instead, on the device S, they could point at targets L significantly faster than at targets S under the condition "error" = "not acceptable". This result indicates that, even though ID values by (2) are designed consistently among targets S & L, users' pointing speeds will not be consistent: faster for larger/smaller size & distance widgets on smaller/larger screen devices, respectively. • On the devices M & L, no significant difference was observed in the pointing accuracy between targets S & L, but on the device S participants could point at targets L significantly more accurately than at targets S. This result indicates that, even though ID values by (2) are designed consistently among targets S & L, users' pointing accuracies will not be consistent either: more accurate for larger size & distance widgets on smaller screen devices. Thus, it is found that the ID definition in (2) may not consistently capture actual pointing difficulty among target designs. The result of our experiment shows that, on a smaller/larger screen, targets with smaller/larger sizes & distances are actually more difficult to point at than those with larger/smaller ones. A/W in (2) is not appropriate with respect to screen size variations because the term causes the observed inconsistency.
4 Conclusion The index of difficulty formulation in Fitts' law was evaluated from the viewpoint of consistency across widget size & distance design variations. It was found that ID in (2) may
not appropriately capture actual difficulty: user performances on the same device were not consistent among target designs (A, W) and (nA, nW). Further research is necessary to investigate a better formulation of ID.
References 1. Fitts, P.M.: The Information Capacity of the Human Motor System in Controlling the Amplitude of Movement. Journal of Experimental Psychology 47(6), 381–391 (1954) 2. MacKenzie, I.S.: Fitts’s Law as a Research and Design Tool in Human-Computer Interaction. Human-Computer Interaction 7, 91–139 (1992) 3. MacKenzie, I.S., Buxton, W.: Extending Fitts’ Law to Two-dimensional Tasks. In: Proc. of ACM Conf. on Human Factors in Computing Systems (CHI 1992), pp. 219–226 (1992) 4. Oehl, M., Sutter, C., Ziefle, M.: Considerations on Efficient Touch Interfaces - How Display Size Influences the Performance in an Applied Pointing Task. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4557, pp. 136–143. Springer, Heidelberg (2007) 5. Ren, X., Mizobuchi, S.: Investigating the Usability of the Stylus Pen on Handheld Devices. In: Proceedings of The Fourth Annual Workshop on HCI Research in MIS, pp. 30–34 (2005) 6. Plamondon, R., Alimi, A.M.: Speed/Accuracy Trade-offs in Target-Directed Movements. Behavioral and Brain Sciences 20(2), 279–349 (1997) 7. ISO 9241, Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs) - Part 9: Requirements for Non-Keyboard Input Devices (2000)
Influence of Vertical Length of Characters on Readability in Mobile Phones Masako Omori1, Satoshi Hasegawa2, Tomoyuki Watanabe3, Shohei Matsunuma4, and Masaru Miyao5 1
Faculty of Home Economics, Kobe University, 2-1 Aoyama Higashisuma, Suma-ku, Kobe 654-8585, Japan [email protected] 2 Dept. of Information Culture, Nagoya Bunri University, 365 Maeda Inazawa-cho, Inazawa, Aichi 492-8520, Japan 3 Faculty of Psychological and Physical Science, Aichi Gakuin University, 12 Araike, Iwasaki-cho, Nisshin 470-0195, Japan 4 The Institute for Science of Labour, 2-8-14 Sugao, Miyamae-ku, Kawasaki 216-8501, Japan 5 Information Technology Center, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan [email protected]
Abstract. The visibility of Japanese characters on the liquid crystal displays of mobile phones was studied by measurements of reading time and visual distance, and by subjective evaluations. Graphic text was used to prepare samples with character heights of 1, 1.25, 1.5, 1.75, 2 and 2.25 times the width of the characters. The vertical length of the characters had a significant effect on the parameters of reading speed and the subjective evaluation of legibility. Characters with a height of between 1.5 and 2 times the width showed the highest visibility in this experiment. Keywords: readability, character size, cataract cloudiness, mobile phone e-mail and graphical characters.
1 Introduction Today, the utilization of IT devices such as mobile phones (MPs) is a focus of attention as one way to enable elderly people to remain independent and continue participating in society. We thus need to maintain an environment in which everyone, including elderly people, can enjoy the benefits of IT. However, in previous studies on the use of MPs, the subjects were mainly ordinary young people. No systematic studies have been done on how elderly people, physically handicapped people, and foreign residents of a country use, operate and are physiologically and psychologically affected by IT devices. Although the recommended character size on the video display terminals (VDTs) of personal computers has already been standardized (ISO, 1992) [1], (JIS, 2002) [2], supported by many studies [3], research on the visibility of characters on LCDs in
MPs remains insufficient. The visibility of characters displayed by fonts supported by MPs was studied previously [4]. The purpose of the present study was to examine the relationships between the visibility of mobile phone displays and aging effects or subjects’ visual functions, based on measurements of vision, including far and near eyesight, refraction of the eyes, accommodative power, cataract cloudiness, and power of eyeglasses. We investigated the reading performance of users who read characters on LCDs in MPs, and examined character size, which may be related to readability. Graphic characters in Portable Document Format (PDF) were used in this research. The use of graphic characters allows characters of various sizes, shapes and contrast beyond the display functions of the product to be displayed in an experiment.
2 Method 2.1 Subjects and Methods The subjects were 74 people aged 20 to 80 years (mean age = 39.2 ± 16.7), with normal or corrected-to-normal vision. The following visual functions of the subjects were measured: pupil distance (PD), refraction and the lens power of their glasses, far visual acuity (5 meters), near visual acuity (NV: 70 cm, 50 cm and 30 cm) and cataract cloudiness (CC) using an anterior ocular segment measuring instrument, the EAS1000™ (NIDEK Inc.). The indication of cataract cloudiness had 256 levels [5], in which 0 indicated no cloudiness and 255 maximum cloudiness. We investigated readability with 6 character heights of 1, 1.25, 1.5, 1.75, 2 and 2.25 times the width of the characters. Using the 6 character heights, we analyzed the performance of elderly people in reading Japanese sentences with each character height on MPs. Then the subjects, with either naked eyes or glasses for near visual acuity, were asked whether they could read the sentences on backlit MPs in daytime room luminance conditions (bright luminance: 700-800 lx). We used sentences selected randomly and shown in random order. We measured the time it took the subjects to finish reading the sentences (reading speed: RS), and recorded the number of figures they misread (ER). 2.2 Display Figure 1 shows the mobile phones (MPs) used in the experiment. The size of the liquid crystal display used in this experiment was 2.4 inches (48 mm × 36 mm). This is about the same as the mean size of the MP displays used by the subjects in the experiment. The LCD resolution is 240 × 320 pixels and the display is a QVGA TFT. Image characters in the Portable Document Format (PDF) were used in this experiment. Vector data has the advantage that expansion and reduction of the display is possible while the layout is maintained. The sentences used in the experiment were excerpted from an article in a newspaper. The subjects did not scroll the display but read aloud the sentences on one screen. The luminance and the contrast of the MP were adjusted to appropriate levels by the tester in advance. Window blinds prevented glare or uncomfortable light.
2.3 Character Size In this experiment, the relationship between readability and the font and size of graphical characters on the small liquid crystal displays (LCDs) of MPs was investigated. In the experiment, two different fonts (Ming type and Gothic type) were used. The base character size was a height and width of 3 mm × 3 mm. The width of the characters was fixed (3 mm); only the height of the characters was changed. Examples of the character sizes are shown in Table 1. With each font size there was a uniform ten characters per line.
(a) Mincho    (b) Gothic
Fig. 1. Samples of graphic characters (Mincho, Gothic)

Table 1. Samples of graphic characters

Type of font   Characters   Width of          Vertical length of characters (mm)
               per line     character (mm)    1 time   1.25 time   1.5 time   1.75 time   2 time   2.25 time
MS-Mincho      10           3                 3        3.8         4.5        5.3         6        6.8
Gothic         10           3                 3        3.8         4.5        5.3         6        6.8
2.4 Experimental Procedures
Experimental conditions were adjusted so that the average luminance on the horizontal plane of the desk was 800-1200 lx, and the MP display was held vertically. Sample sentences were displayed in rotated order. Reading time and the visual distance between the eyes and the LCD were measured. After each reading, subjects evaluated readability by choosing from 1 (very hard to read) to 5 (very easy to read). Because the total number of characters displayed on one screen differed according to the font size, reading speed (number of characters/sec) was calculated by dividing the number of characters displayed on one screen by the measured reading time. Two-way ANOVAs were run with the two dependent variables of RS and ER for the subjects' reading performance. In each ANOVA, two independent variables were taken from among three variables: one was always the type of font or the size of character, while the other was taken from age, CC and NV. Only the results for CC are reported.
Influence of Vertical Length of Characters on Readability in Mobile Phones
433
We classified subjects by cataract cloudiness into 3 groups as follows: (-): 0-99, (+): 100-159, (2+): 160-255.
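As a rough illustration of the analysis pipeline described above (reading speed derived from reading time, a two-way ANOVA with character height and CC group as factors), the following Python sketch shows how such an analysis could be run; the data file and column names are hypothetical, and this is not the authors' actual analysis code.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data frame: one row per trial, with columns
# chars_on_screen, reading_time_s, char_height (1.0-2.25), cc (0-255), errors
df = pd.read_csv("readability_trials.csv")

# Reading speed = characters shown on one screen / reading time
df["rs"] = df["chars_on_screen"] / df["reading_time_s"]

# Group cataract cloudiness as in the paper: (-) 0-99, (+) 100-159, (2+) 160-255
df["cc_group"] = pd.cut(df["cc"], bins=[-1, 99, 159, 255],
                        labels=["(-)", "(+)", "(2+)"])

# Two-way ANOVA: reading speed ~ character height x CC group
model = ols("rs ~ C(char_height) * C(cc_group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))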
3 Results
3.1 Results for MS-Mincho
In the two-way ANOVA, we analyzed whether reading speed or the error rate in reading was influenced by the two factors of character height on the display and CC. CC was associated with a significant difference in RS (p<0.001). In multiple comparisons of CC, significant differences were seen between 0 to 99 and 100 to 159 (p<0.001), and between 0 to 99 and 160 or more (p<0.01). Namely, RS was faster in the non-cataract-cloudiness group (-), and RS became slower as cataract cloudiness became more severe. However, no significant difference was seen for character size (Figure 2). We analyzed the number of errors (ER) in reading as well as reading speed. ER was tested statistically as the dependent variable with the two independent variables of character height and CC group. CC was associated with a significant difference in ER (p<0.001). In multiple comparisons of CC, significant differences were seen between 0 to 99 and 100 to 159 (p<0.001), and between 0 to 99 and 160 or more (p<0.001). Namely, ER was lower in the non-cataract-cloudiness group (-) and increased as cataract cloudiness became more severe. No significant difference was seen in ER for character size (Figure 3). However, when CC was 160 or more, ER was 0 at the 1.5-times character height.
Fig. 2. Relation between RS and cataract cloudiness for different character heights with the Mincho font
Fig. 3. Relation between errors and cataract cloudiness for different character heights with the Mincho font
Fig. 4. Relation between subjective evaluation and cataract cloudiness for different character heights with the Mincho font
Moreover, no significant difference was seen in visual distance for either CC or character size, although there was a tendency for visual distance to increase when CC was 100 or more. For the subjective evaluation (SE), CC was associated with a significant difference in the two-way ANOVA (p<0.05). In multiple comparisons of CC, a significant difference was seen between 0 to 99 and 100 to 159 (p<0.05). The evaluations of the non-cataract-cloudiness group (-) were good for all character sizes. No significant difference was seen for character size; however, the 1.5-times height received the best evaluation when CC was 100 or more (Figure 4).
3.2 Result of Gothic Type
Similarly, we tested statistically whether there were any associations between the size of graphic characters and readability for the Gothic type, using a two-way ANOVA with character size and the three ranks of CC as independent variables. Figure 5 shows the relation between CC and RS. CC was associated with a significant difference in RS (p<0.001). In multiple comparisons of CC, significant differences were seen between 0 to 99 and 100 to 159 (p<0.01), and between 0 to 99 and 160 or more (p<0.001). Namely, RS was higher in the non-cataract-cloudiness group (-) and decreased as cataract cloudiness became more severe. A significant difference in RS was also seen with character size (p<0.001). In multiple comparisons of character size, significant differences were seen between 1.5 times and 2.25 times (p<0.01), and between 1.75 times and 2.25 times (p<0.01). Thus, RS became slower with the 2.25-times size. Figure 6 shows the relation between CC and ER. CC was associated with a significant difference in ER (p<0.01). In multiple comparisons of CC, significant differences were seen between 0 to 99 and 100 to 159 (p<0.001), between 0 to 99 and 160 or more (p<0.001), and between 100 to 159 and 160 or more (p<0.01). Namely, ER was lower in the non-cataract-cloudiness group (-) and increased as cataract cloudiness became more severe. A significant difference in ER was also seen with character size (p<0.01). In multiple comparisons of character size, a significant difference was seen between the 1-times and 2-times sizes (p<0.05); ER was low with the 2-times size. For the SE, CC was associated with no significant difference in the two-way ANOVA (p>0.05). However, a significant difference was seen in the size of
Fig. 5. Relation between RS and cataract cloudiness for different character heights with the Gothic font
Fig. 6. Relation between errors and cataract cloudiness for different character heights with the Gothic font
Fig. 7. Relation between subjective evaluation and cataract cloudiness for different character heights with the Gothic font
characters (p<0.01). In multiple comparisons of character size, a significant difference was seen between the 1.25-times and 2.25-times sizes (p<0.05) (Figure 7). The SE was worst for the 2.25-times size.
4 Discussion
MPs with built-in cameras are useful for sending and receiving graphic data by e-mail. However, the liquid crystal displays (LCDs) in MPs are so small that the characters on them may not be adequately visible.
ISO 9241-3 (1992) [1] and the Japanese VDT guideline (2002) [2] recommend that on VDTs "the minimum character height shall be 16 minutes of arc and the maximum character height shall be 24 minutes of arc for tasks in which readability is important. Character heights of 20 to 22 minutes of arc are preferred for reading tasks." A visual angle of 20 minutes of arc corresponds to approximately 2.9 mm at a 50 cm visual distance, so a character height of about 3 mm or more is preferred. In addition, JIS S 0032 (2003) [6], Guidelines for the elderly and people with disabilities -- visual signs and displays -- estimation of minimum legible size for Japanese single characters, estimates the minimum legible character size. According to this JIS criterion, the legible size ranges from 5.4 points to 7.3 points for a Gothic typeface when people younger than 22 years of age read characters at a visual distance of 30 cm. Converted into mm, 5.4 points = 1.90 mm and 7.3 points = 2.57 mm. These standard values (minimum legible size) were set for signs, displays and pamphlets with dark characters on a white background on boards or paper, rather than for dot characters on CRTs. In this experiment, the e-mail text used a Gothic typeface and the base character size was 3 × 3 mm. Although it was displayed on an LCD rather than printed on paper, this text (3 × 3 mm) was larger than the minimum legible size. With the MS-Mincho font, readability was not influenced by character size; however, when CC was 100 or more, the readability evaluation suggested that the 1.5-times character size was readable. With the Gothic font, RS was fast and ER was low with both the 1.5-times and 2-times character sizes when CC was 100 or more, whereas RS was slow with the 2.25-times size. With both Mincho and Gothic type, legibility was judged to be good with the 1.5-times size. Converted into mm, 1.5 times = 4.5 mm, 2 times = 6 mm and 2.25 times = 6.8 mm. Therefore, character heights from 4.5 mm to 6 mm were shown to be readable. Hasegawa [7] reported that the readability of Japanese characters improved when they were vertically enlarged to approximately twice the width. The present results for character sizes from 4.5 mm to 6 mm suggest that readability can be assured for anyone reading sentences with proper near-sight glasses and appropriate cataract care. However, the 2.25-times size is not easily read; a character height of 6.8 mm is therefore an improper height for readability.
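For reference, the conversions quoted above can be checked with the standard visual-angle relation and the point-to-millimetre conversion (taking 1 pt = 25.4/72 mm); this is only a verification of the stated figures, not part of the original analysis.

h = 2d\tan(\theta/2): \quad d = 500\,\mathrm{mm},\ \theta = 20' = (20/60)^\circ \ \Rightarrow\ h \approx 2 \times 500 \times \tan(0.1667^\circ) \approx 2.9\,\mathrm{mm}
5.4\,\mathrm{pt} \times \tfrac{25.4}{72} \approx 1.90\,\mathrm{mm}, \qquad 7.3\,\mathrm{pt} \times \tfrac{25.4}{72} \approx 2.57\,\mathrm{mm}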
5 Conclusions
Today, MPs are essential IT devices for most people aged 40 to 60 years, as well as for young people. Today's MPs can send and receive not only sound but also text and pictures, and their functions will continue to progress. However, the size of the display is restricted. In thinking about the universal design of MPs, it is important to consider both the properties of MPs and users' visual functions. In the present study, readability was improved by increasing the height of the characters. It is desirable that display settings such as character height can be easily adjusted, so that elderly users can adapt the displays to their individual needs.
References
1. ISO: Ergonomic requirements for office work with visual display terminals (VDTs). ISO 9241-3 (1992)
2. Japan's Ministry of Health, Labour and Welfare: New health guideline for VDT workers (2002)
3. Miyao, M., Hacisalihzade, S.S., Allen, J.S., Stark, L.W.: Effect of VDT resolution on visual fatigue and readability: an eye movement approach. Ergonomics 32(6), 603–614 (1989)
4. Omori, M., Watanabe, T., Takai, J., Takada, H., Miyao, M.: Visibility and characteristics of the mobile phones for elderly people. Behaviour & Information Technology 21(5), 313–316 (2002)
5. Sasaki, K., Yamamura, T.: Current cataract epidemiology studies in Japan. Developments in Ophthalmology 21, 18–22 (1991)
6. JIS: Guidelines for the elderly and people with disabilities — Visual signs and displays — Estimation of minimum legible size for Japanese single character. JIS S 0032 (2003)
7. Hasegawa, S., Fujikake, K., Omori, M., Miyao, M.: Readability of characters on mobile phone liquid crystal displays. Int. J. Occup. Safety Ergonomics 14(3), 293–304 (2008)
Intelligent Photo Management System Enhancing Browsing Experience Yuki Orii, Takayuki Nozawa, and Toshiyuki Kondo Tokyo University of Agriculture and Technology 2-24-16 Naka-cho, Koganei, Tokyo, Japan [email protected], {tknozawa,t_kondo}@cc.tuat.ac.jp
Abstract. We developed a web-based intelligent photo management system which enables automatic clustering of unstructured personal digital photo collections. We conducted a user study to assess the usability of the developed photo management system (automatic photo classifier, APC) compared with versions with limited functions. The user task was to find target photographs, indicated by an experimenter, in the subject's personal photo collection or in somebody else's collection. The results show that APC is better for personal photographs, while it does not have a significant advantage for somebody else's photo collections. This suggests that the look-and-feel of a photo management system should be chosen according to whether the photographs were taken by the user him/herself or not.
1 Introduction
The recent popularization of digital cameras, together with the growing capacity and falling price of storage media, has made it easy for consumers to take huge numbers of digital photographs. According to one survey, people take more than 1,000 photographs a year. As the size of a personal digital photo collection grows, even finding a target photograph becomes difficult. In most cases, people are satisfied just with taking photographs, and consequently the flood of personal digital photos becomes dead storage on our personal computers. There have been several approaches to making these digital photographs useful for personal use. FotoFile [1] and PhotoFinder [2] use databases to support creating annotations for photographs. Although these systems have powerful search functionality, photographs have to be annotated manually by the user. PhotoMesa [3] utilizes novel layout mechanisms to maximize screen-space usage, but again, users have to organize the photographs beforehand. Using visual features to automatically group photographs is a popular approach. It would be ideal to extract high-level semantic content from photographs automatically and use it to index photographs objectively. This is, of course, a very difficult problem, and current systems can only extract low-level content such as color and texture. Rubner [4] proposed that a low-level content-based image similarity metric can be used to create an effective layout of sets of photographs, but he did not carry
out any experimental evaluation. Rodden [5] showed that users generally find a given photograph faster in such a layout than in a random arrangement. Using timestamps is also a common approach. iPhoto [6] manages the photographs imported into the computer at one time as a "Roll", and the photographs in a roll are further divided into "Events" at a manually specified interval (e.g., a day, two hours, etc.). In contrast, PhotoTOC [7] and its predecessor, AutoAlbum [8], proposed an adaptive time-gap detection algorithm that adjusts the interval automatically. The photo management system developed by Graham [9] clusters photographs with a similar function but a different user interface. Based on these user studies, most of this research concluded that clustering a digital photo collection by timestamp is an effective way to separate contextually related groups. It is also worthwhile to increase the opportunity to passively see photographs taken in past days, so as to make good use of the personal digital photo collection. Most operating systems and photo browsing applications have a slideshow function; however, it merely shows the photographs in random or predetermined order, and there has been no photo application that positively utilizes the slideshow function for efficient browsing. In this study, we developed a web-based intelligent photo management system (automatic photo classifier, APC) which enables automatic clustering of an unstructured collection of personal digital photographs. To achieve a hierarchical structure with high affinity with the human memory system, two-stage clustering (into Groups and Events) was adopted in APC. APC also offers the chance to passively see photographs taken in the past by introducing dynamic thumbnails. To assess the usability of APC and the effectiveness of its functions, we conducted a user study. The task adopted in this study was finding several target photographs chosen from the user's own or somebody else's photo collection. The results show that APC works better for the user's own photographs, while it requires more time for others' photo collections. Although it has been reported that "It is better that as many thumbnails as possible can be seen at a glance" [10], the results suggest that the look-and-feel of a photo management system should be chosen according to whether the photographs were taken by the user him/herself.
2 APC: Automatic Photo Classifier
2.1 Clustering Method
Photographs taken by digital still cameras store metadata in EXIF (exchangeable image file format) [11], which includes the timestamp, exposure time, camera settings, etc. The data is encoded into the digital photograph files and is available to applications that access these files. As reported in [10], the timestamp is highly useful in helping people browse their photographs. We therefore decided to use the timestamp in the EXIF data to cluster digital photo collections. Our photo browsing system, APC, clusters photographs in two stages. In the first stage, all photographs are clustered using the k-means clustering algorithm [12]. Here, the number of clusters, k, is given adaptively by rounding off
k = \alpha \{\log_{10}(N_1)\}^2    (1)
where N1 is the number of photographs in a collection, and α is a coefficient whose value is empirically chosen to be 1.3. Using this setting, the photos can be clustered into several similar-sized bursts, each of which generally fits in a single pane. We call each burst a Group. In the second stage, photographs from each Group are further clustered in a hierarchical bottom-up way using the NN (nearest neighbor) algorithm [12]. The clustering procedure terminates when the shortest distance between two clusters becomes more than
T = \kappa \log_{10}\!\left(\frac{H}{N_2}\right)    (2)
where H is the time difference between the first and last photos in the Group, N2 is the number of photos in the Group, and κ is a threshold coefficient whose value is empirically chosen to be 250 [sec]. With this setting, photographs in a Group can be organized into smaller bursts, reflecting the manner in which the photos were taken, and are thus expected to have higher affinity with the human memory system. We call each burst an Event.
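To make the two-stage procedure concrete, the following Python sketch clusters photos by EXIF timestamp, with k chosen as in Eq. (1) and Events split by the threshold of Eq. (2). It is an illustrative reconstruction under stated assumptions (scikit-learn for both stages, single-linkage clustering standing in for the NN algorithm), not the authors' PHP implementation, and the helper names are hypothetical.

import math
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

def cluster_photos(timestamps, alpha=1.3, kappa=250.0):
    """timestamps: photo times in seconds (e.g., from EXIF), one per photo."""
    t = np.asarray(timestamps, dtype=float).reshape(-1, 1)

    # Stage 1: k-means on timestamps, k = round(alpha * (log10 N1)^2)  -- Eq. (1)
    n1 = len(t)
    k = max(1, int(round(alpha * math.log10(n1) ** 2)))
    groups = KMeans(n_clusters=k, n_init=10).fit_predict(t)

    events = {}
    for g in set(groups):
        gt = t[groups == g]
        n2 = len(gt)
        if n2 < 2:
            events[g] = np.zeros(n2, dtype=int)
            continue
        # Stage 2: merge nearest neighbours until the smallest gap exceeds
        # T = kappa * log10(H / N2)  -- Eq. (2); guard against non-positive logs
        h = float(gt.max() - gt.min())
        thr = kappa * math.log10(h / n2) if h > n2 else kappa
        agg = AgglomerativeClustering(n_clusters=None, linkage="single",
                                      distance_threshold=max(thr, 1.0))
        events[g] = agg.fit_predict(gt)
    return groups, events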
Fig. 1. Example of clustered photo collection
Figure 1 shows an example of a clustered photo collection. After the first stage of clustering, the photo collection is divided into Groups which can correspond to “sightseeing”, “shopping”, etc. After the second stage of clustering, each Group is divided into Events which can be interpreted as “Temple 1” and “Temple 2” in the “Sightseeing” Group. 2.2 Implementation
Figure 2 shows the appearance of the developed photo management system, APC. It was developed using PHP (version 5.2) with Apache HTTP server (version 1.3.33) and JavaScript.
Fig. 2. Appearance of APC
The user interface of APC is partitioned into two panes. The bottom pane shows one page at a time, and each page contains photographs from one Group. Clicking the white arrows at the bottom scrolls through the pages. The photographs in each page are in chronological order and clustered into Events. The first photograph in each Event is shown as a bigger thumbnail, and clicking any thumbnail shows the full-sized photograph. The top pane contains representative thumbnails which represent Groups. We call these representative thumbnails dynamic thumbnails. Rather than showing one representative photograph from each Group, a dynamic thumbnail consecutively shows the first photos of the Events in the Group. Clicking a dynamic thumbnail scrolls the bottom pane to its corresponding Group page.
3 Experiments
We conducted a user study to assess the usability of dynamic thumbnails and Event clustering. We compared three browsing conditions: APC, APC without Event (APC woEv), and APC without dynamic thumbnails (APC woDT). APC without Event shows dynamic thumbnails but does not cluster photographs into Events; APC without dynamic thumbnails is the opposite: it clusters photographs into Events but does not show dynamic thumbnails. Figure 3 shows the instrumental computer layout for the user study. We used two computers: the one on the participant's right-hand side showed a target photograph to be found, while the one on the left-hand side ran the photo management system in one of the three conditions. The participants were asked to use the photo browsing
Fig. 3. Instrumental computer layout
system and find the target photographs. When a participant found one, the next target was shown. The time needed to find each target was measured as the task completion time. We tested six participants (four males and two females, age: mean 24.0 ± SD 5.0). All participants were relatively experienced computer users. The user study consisted of two parts. In the first part, participants were asked to find the target photographs in somebody else's novel photo collection (660 photographs). We divided these photographs into three groups (each containing 220 photos) in chronological order and assigned them to the three browsing conditions. For each browsing condition, participants were asked to find 11 randomly chosen photographs (the first for practice and the remaining 10 for the actual test). In the second part, the participants did the same task with their personal photo collections. Each participant provided a digital photo collection ranging from 263 to 436 photographs, taken on trips of four to seven days. We divided each participant's own photographs into three groups and asked them to find a randomly chosen 5% of each group (4–7 photos) as the targets. At the end of the user study, participants answered questionnaire sheets asking about the usefulness of dynamic thumbnails and Event clustering, the validity of the automatic clustering, etc.
4 Results
Tables 1 and 2 show the average task completion times [sec] for the novel and the participants' own photo collections, respectively. In the first part, as shown in Figure 4, the average
Table 1. Average task completion times ([sec]) for the novel photo collection

Participant   APC woEv   APC     APC woDT
1             21.17      17.92   20.98
2             16.11      17.12   19.65
3             27.92      25.43   25.31
4             15.73      16.70   20.75
5             16.06      16.42   17.25
6             15.73      17.16   20.42
Average       18.79      18.46   20.73
Table 2. Average task completion times ([sec]) for participant's own photo collection

Participant   APC woEv   APC     APC woDT
1             8.85       9.04    16.06
2             13.93      11.49   12.92
3             9.68       8.91    9.94
4             10.89      9.07    12.89
5             17.29      13.19   9.02
6             17.25      10.81   16.80
Average       12.98      10.42   12.94
Fig. 4. Task completion time for the novel photo collection
task completion times for APC and APC woEv were almost the same. We found a significant difference between APC and APC woDT using a t-test (p<0.05). In the second part, on average, participants achieved better task completion times on APC
than on APC woEv and APC woDT. We also found a significant difference between APC and APC woEv using a t-test (p<0.05). In the questionnaires, all participants stated that they found it easier to find the target photographs when they were clustered by Event (6-level Likert scale: M=5.33). In addition, five participants also stated that they found it easier to find the target photographs with dynamic thumbnails (Likert: M=4.83), although one participant did not actively use the dynamic thumbnails to move among pages.
Fig. 5. Task completion time for participant’s own photo collection
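For reference, the comparisons reported above can be reproduced from the per-participant averages in Tables 1 and 2 with a paired t-test; the following Python sketch (using SciPy) illustrates this, assuming the per-participant averages are the paired observations, which may differ from the exact test the authors ran.

from scipy import stats

# Per-participant average task completion times [sec] from Tables 1 and 2
novel_apc      = [17.92, 17.12, 25.43, 16.70, 16.42, 17.16]
novel_apc_wodt = [20.98, 19.65, 25.31, 20.75, 17.25, 20.42]
own_apc        = [9.04, 11.49, 8.91, 9.07, 13.19, 10.81]
own_apc_woev   = [8.85, 13.93, 9.68, 10.89, 17.29, 17.25]

# Paired t-tests corresponding to the comparisons reported in the text
print(stats.ttest_rel(novel_apc, novel_apc_wodt))  # APC vs. APC woDT (novel collection)
print(stats.ttest_rel(own_apc, own_apc_woev))      # APC vs. APC woEv (own collection)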
5 Discussions
The above results indicate that finding a target photo in the user's own photo collection can be made easier by using dynamic thumbnails and Event clustering. They also show that Event clustering does not have a significant effect on a novel collection. From these results, we suggest that changing the look-and-feel of the photo management system according to whether the photographs were taken by the user him/herself can improve the browsing experience. That is, for the user's own photo collection, Event clustering can provide a structure which is in good accordance with the user's memory of the events. On the other hand, a clustering method based on visual feature similarity may be more effective for the management of novel photo collections. The result of the user study did not clearly confirm the advantage of dynamic thumbnails. We believe, however, that it can be confirmed when the time scale of the photo collection grows further, because it seems worthwhile to offer more chances to passively see photographs taken in past days.
6 Conclusions In this paper, we explained our web-based intelligent photo management system (APC) which enables automatic clustering of unstructured personal digital photo
collections based on the photo-taking behaviors inferred from the metadata added to them. By introducing dynamic thumbnails, APC can also offer the chance to see photographs taken in the past without the user's active effort. The results of the user study showed that APC works better for personal photographs, while it does not make a large difference when the user browses others' photo collections. This suggests that the look-and-feel of a photo management system should be changed according to whether the photographs were taken by the user him/herself.
7 Future Works
At the moment, APC only uses timestamps to cluster photographs. We are in the process of developing a newer version of APC which uses both timestamps and visual features of the photographs. We hope to see the effectiveness of a photo management system which combines the results of time clustering and color clustering.
Acknowledgments. This research was partially supported by the Ministry of Education, Culture, Sports, Science, and Technology, Grant-in-Aid for Scientific Research on Priority Areas (No. 20033007), and the "Symbiotic Information Technology Research Project" of Tokyo University of Agriculture and Technology.
References
1. Kuchinsky, A., Pering, C., Creech, M.L., Freeze, D., Serra, B., Gwizdka, J.: FotoFile: a consumer multimedia organization and retrieval system. In: Proc. of Conference on Human Factors in Computing Systems CHI 1999, pp. 496–503 (1999)
2. Kang, H., Shneiderman, B.: Visualization Methods for Personal Photo Collections: Browsing and Searching in the PhotoFinder. In: Proc. of IEEE International Conference on Multimedia and Expo ICME 2000, pp. 1539–1542 (2000)
3. Bederson, B.B.: PhotoMesa: A zoomable image browser using quantum treemaps and bubblemaps. In: Proc. of UIST 2001, pp. 71–80 (2001)
4. Rubner, Y., Tomasi, C., Guibas, L.J.: A metric for distributions with applications to image databases. In: Proc. of the IEEE International Conference on Computer Vision (January 1999)
5. Rodden, K., Basalaj, W., Sinclair, D., Wood, K.: Evaluating a visualization of image similarity. In: Proc. of SIGIR 1999, pp. 275–276 (1999)
6. iPhoto (2008), http://www.apple.com/ilife/iphoto/
7. Platt, J.C., Czerwinski, M., Field, B.A.: PhotoTOC: Automatic Clustering for Browsing Personal Photographs. Technical Report MSR-TR-2002-17, Microsoft Research (2002)
8. Platt, J.C.: AutoAlbum: Clustering Digital Photographs using Probabilistic Model Merging. In: Proc. of Content-based Access of Image and Video Libraries (2000)
9. Graham, A., Garcia-Molina, H., Paepcke, A., Winograd, T.: Time as Essence for Photo Browsing Through Personal Digital Libraries. In: Proc. Joint Conference on Digital Libraries (2002)
10. Rodden, K., Wood, K.R.: How Do People Manage Their Digital Photographs? In: Proc. of CHI 2003, vol. 5, pp. 409–416 (2003)
11. Exchangeable image file format for digital still cameras: Exif Version 2.2 (JEITA CP-3451), Standard of Japan Electronics and Information Technology Industries Association (2002)
12. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, Inc., Chichester (2001)
Freeze TCPv2: An Enhancement of Freeze TCP for Efficient Handoff in Heterogeneous Networks Minu Park1, Jaehyung Lee1, Jahwan Koo2, and Hyunseung Choo1,* 1
School of Information and Communication Engineering, Sungkyunkwan University, Korea {minupark,jhyunglee,choo}@skku.edu 2 Computer Sciences Department, University of Wisconsin-Madison, USA [email protected]
Abstract. The advancement of mobile communications over the last few years has been accompanied by an increasingly extensive and diverse array of environments for TCP applications, which has led to a host of wireless TCP approaches that aim to provide more stability and ensure higher performance. In a heterogeneous network, as opposed to conventional wireless networks, a mobile node performs handoffs to network cells with different bandwidth and latency. Conventional wireless TCP schemes, however, are not capable of properly addressing the sudden changes in round trip time (RTT) and the packet loss resulting from handoffs within a heterogeneous network, and they experience problems such as waste of available bandwidth and frequent packet loss. As a solution to these problems, this paper proposes Freeze TCP version 2 (v2), an enhancement of Freeze TCP designed to dynamically obtain the available bandwidth in a new network cell. A comprehensive simulation study is conducted using the ns-2 network simulator to compare the proposed scheme with Freeze TCP, DEMO-Vegas, and TCP Vegas in a heterogeneous network environment with vertical handoffs, in terms of throughput versus the speed of the mobile node and versus the bit error rate of the wireless link.
Keywords: TCP, Congestion control, Heterogeneous Networks, Vertical Handoff.
1 Introduction
Transmission control protocol (TCP) is one of the most widely used Internet protocols for web browsing, email, file transfer and other applications [1], [2]. The advancement of mobile communications over the last few years has been accompanied by an increasingly extensive and diverse array of environments for TCP applications, which has led to a host of wireless TCP schemes that aim to provide more stability and ensure higher performance [3], such as TCP Westwood [4], TCP Jersey [5] and TCP New Jersey [6]. As these schemes are based on homogeneous wireless networks, each of which consists of a single wireless access network, they do not deal with the performance degradation occurring in heterogeneous networks with different link
characteristics [7], [8]. A classic example of a heterogeneous network would be a GPRS-WLAN network consisting of a general packet radio service network (GPRS) [9] and a wireless local area network (WLAN) [10]. As a mobile node changes its connection point to the network from time to time in these heterogeneous and homogeneous environments, sustained connection requires handoffs. A handoff in a heterogeneous wireless network, in particular, spans heterogeneous network cells and is thus called a ‘Vertical Handoff’ as opposed to a ‘Horizontal Handoff’, which refers to a handoff of a more conventional type [11], [12]. Conventional wireless TCP schemes, however, experience serious performance degradation in vertical handoffs due to two primary reasons: The first one is an abrupt change in RTT within a heterogeneous network. To ensure reliable transfer, TCP is designed to control the transfer rate by repeatedly sending a chunk of data and receiving an acknowledgement of it. In a heterogeneous network, however, each network cell has distinct bandwidth and latency attributes, causing the sender to receive acknowledgements in different arrival patterns than in conventional networks, which could translate into sudden changes in RTT. Consequently, the sender may transfer a series of packets before recognizing and responding to changes in RTT and obtaining the available bandwidth in the new network cell. The second one is that the conventional wireless TCP schemes do not take into account the packet loss that may occur during handoffs. Moving from one access router to another, a mobile node (MN) must undergo the discovery of a new access router, registration, and authentication, resulting in handoff latency. This in turn produces packet loss resulting from retransmission timeout (RTO), causing the TCP sender to assume congestion in the network and make unwarranted cuts in the transfer rate. In consequence, this phenomenon implies an issue of inefficiency in the use of network resources. In this respect, this paper proposes Freeze TCPv2, an RTT-based congestion control scheme designed to make efficient use of the resources of a new network and provide an appropriate transfer rate according to the network bandwidth within a heterogeneous wireless network. The proposed scheme measures RTT values prior to and after the occurrence of handoffs, and monitors trends in changes. It then uses buffer status checking mechanism [13] to find out how many packets sent by the sender to the receiver (MN) remain in the network. As the count of remaining packets is determined in this manner, the MN controls the transfer rate according to the condition of traffic existing in the link when it resumes communication in the new network cell, effectively minimizing the probability of network congestion and ensuring a higher transfer rate. Performance evaluation using ns-2 establishes the superiority of Freeze TCPv2 working in a heterogeneous network and demonstrates that the proposed scheme is capable of obtaining the network bandwidth in a more efficient manner and sustaining a higher transfer rate. This paper is organized as follows: Chapter 2 introduces recent studies on TCP schemes working in wireless and cellular network environments while Chapter 3 describes the proposed algorithm. Chapter 4 is a comparative analysis of the proposed algorithm in terms of performance evaluation through simulation. Finally, Chapter 5 concludes this paper.
2 Related Work
2.1 Freeze TCP
In contrast to conventional TCP schemes, Freeze TCP [14] prevents unnecessary packet transmission during a handoff by modifying the MN that serves as the receiver in the network. If the strength of the signal received by the MN from the base station falls below a certain level, the MN predicts the occurrence of a handoff and sends a zero window advertisement (ZWA) to the sender. Having received the ZWA, the sender sets its cwnd to 0 and fixes the values of all other variables as well, so that it temporarily ceases transmission during the handoff. Upon completion of the handoff, the MN sends a positive acknowledgement (PACK) to the sender to inform it that communication is available; the sender then restores cwnd and the other variables to the values they had prior to the handoff and resumes data transmission. This process allows Freeze TCP to prevent unnecessary packet transmission and the resulting performance degradation, and thereby to cope with handoff situations more properly than TCP Reno. The assumptions made during its development, however, considered only a homogeneous network consisting of cells with an identical bandwidth, which leads to difficulty in obtaining the available bandwidth after a handoff to a new network cell in a heterogeneous network. Furthermore, this scheme has another downside: if a ZWA or PACK is lost in the middle of a link, the sender is unable to detect the start and end of a handoff, resulting in severe performance degradation.
2.2 DEMO-Vegas
As an MN enters a new network cell within a cellular network environment, it performs a handoff to maintain communication. A TCP Vegas sender at this point invokes a routing optimization algorithm to resume communication and applies the RTTbase value as measured prior to the handoff without modifying it. The new network cell, however, has different propagation delay and bandwidth values, which raises an accuracy issue with the RTTbase measured prior to the handoff. Consequently, the Δ value based on the inaccurate RTTbase results in the inability of TCP Vegas to deal properly with handoff situations. DEMO-Vegas [15], therefore, handles a change in the care-of-address (CoA) of an MN in a mobile IP environment by using a reserved bit in the header of the acknowledgement packet sent by the MN to inform the sender of the handoff. Having received an acknowledgement with the reserved bit named 'SIG', the sender is informed of the occurrence of the handoff and temporarily ceases transmission. Upon completion of the handoff, the MN resets the reserved bit in the header. Having received this, the sender updates its RTTbase value and resumes transmission. By informing the sender of the occurrence of a handoff in this manner, DEMO-Vegas prevents unnecessary transmission and provides higher performance than TCP Vegas.
This scheme, however, has a flaw: its congestion control algorithm obtains the bandwidth available in a new network cell in the same manner as TCP Vegas, resulting in suboptimal performance when the network bandwidth undergoes a significant change.
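As context for the scheme proposed next, the receiver-side freeze/resume behaviour of Freeze TCP described in Section 2.1 can be sketched roughly as follows; the signal-strength threshold and the connection helper methods are hypothetical placeholders, not part of the original specification.

def on_signal_sample(strength, threshold, conn):
    # Handoff predicted: advertise a zero window so the sender freezes
    # cwnd, RTO and related variables instead of retransmitting blindly.
    if strength < threshold and not conn.frozen:
        conn.send_zero_window_advertisement()   # ZWA
        conn.frozen = True

def on_handoff_complete(conn):
    # Tell the sender that communication is available again; it restores
    # the saved cwnd and variables and resumes transmission.
    conn.send_positive_ack()                    # PACK
    conn.frozen = False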
3 Proposed Scheme
3.1 Dynamic Available Bandwidth Estimation (DABE)
As two different network cells in a heterogeneous network are likely to provide different network bandwidths, a sender operating in the same manner as in a homogeneous network would exhibit unsatisfactory performance. For example, if the bandwidth provided by a new network cell is larger than that of the previous cell, the available bandwidth between the sender and the receiver increases. In other words, the sender can make use of more network resources, and the RTT, the time taken to transfer a packet, becomes shorter. In this situation, however, the sender keeps the same transfer rate as prior to the handoff because it ignores the change in bandwidth. This in turn leads to inefficient use of the available bandwidth and to the inability to achieve a higher transfer rate. Conversely, if the bandwidth provided by a new network cell is smaller than that of the previous cell, the sender applies to the new network cell the larger bandwidth obtained prior to the handoff, which exceeds the available bandwidth. A considerable amount of time is then required to adjust the transfer rate accurately to the available bandwidth, generating serious network congestion and packet loss as well as a relatively longer RTT. Consequently, the sender cannot produce a higher transfer rate because of its inability to properly obtain the available bandwidth within a heterogeneous environment consisting of two neighboring network cells with different bandwidths. To address this issue, Freeze TCPv2 compares the RTT values measured prior to and after the occurrence of a handoff to predict a change in the available bandwidth, and utilizes the ZWA and PACK of Freeze TCP [14]. As in the case of Freeze TCP, if the signal strength falls below a certain level due to the mobility of the MN (step 4 in Fig. 1) while communication is ongoing (steps 1–2 in Fig. 1), the MN assumes the occurrence of a handoff and generates a ZWA before sending it to the sender (steps 5–8 in Fig. 1). Having received the ZWA message, the sender sets its cwnd to 0 and fixes the RTO and the other variables used to calculate the RTO value. This process allows the sender to halt transmission temporarily at the occurrence of a handoff, as in the case of Freeze TCP. At this point, the sender stores the RTT value for the ZWA in RTTprev in order to determine the bandwidth provided by the previous network (step 9 in Fig. 1). Upon completion of the handoff (step 10 in Fig. 1), the MN generates and sends a PACK message to inform the sender of the availability of communication (steps 11–14 in Fig. 1), which enables the sender to resume packet transmission. In addition, the sender, having
Fig. 1. DABE mechanism in Freeze TCPv2
received the PACK, restores the values of cwnd and the other variables to their original values stored before the handoff (step 15 in Fig. 1). The sender stores the RTT value for the first packet transmitted after the handoff (steps 16–21 in Fig. 1) as RTTcurr in order to monitor the changes that occurred during the handoff (step 22 in Fig. 1). If RTTprev is smaller than RTTcurr, the sender ascribes the increase of RTT to a decreased available bandwidth in the new network cell as well as to difficulties in packet transmission, and it linearly increases the window size saved prior to the handoff. If RTTprev is larger than RTTcurr, however, the bandwidth provided by the new network cell has increased and there is no problem in packet transmission, as indicated by the decreased RTT value. The sender then adjusts the transfer rate as in Eq. (1), according to the change of RTT and the size of the packet that has been sent.
cwnd \mathrel{+}= \frac{PacketSize}{|RTT_{prev} - RTT_{curr}|}    (1)
As shown above, Freeze TCPv2 is capable of accurately adjusting the transfer rate according to the change in bandwidth while monitoring the change of RTT during a handoff. Even in a homogeneous network having a constant bandwidth, detecting any change in the bandwidth provided by a new network cell makes it easy to determine the available bandwidth.
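As an illustration of the DABE rule, the following sketch (in Python, for readability) shows how a sender could adjust cwnd when it resumes transmission after a PACK; it is a paraphrase of Eq. (1) and the surrounding description, not the authors' ns-2 code, and the variable names are hypothetical.

def dabe_adjust(cwnd, packet_size, rtt_prev, rtt_curr, linear_step=1):
    """Adjust the congestion window after handoff completion (cf. Eq. (1))."""
    if rtt_prev < rtt_curr:
        # RTT grew: the new cell is assumed to offer less bandwidth,
        # so grow the saved window only linearly.
        return cwnd + linear_step
    elif rtt_prev > rtt_curr:
        # RTT shrank: more bandwidth is assumed to be available,
        # so increase cwnd in proportion to the RTT change (Eq. (1)).
        return cwnd + packet_size / abs(rtt_prev - rtt_curr)
    return cwnd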
3.2 Window Adjustments Based on Vegas Estimator (WAVE)
Since the sender and the receiver (MN) cannot be connected with each other during the handoff, data packets sent before the handoff remain in the network links. As a result, the packets buffered in the network links degrade the accuracy of the RTTcurr calculated after the handoff. To solve this problem, Freeze TCPv2 adjusts the transmission rate and checks the network congestion status based on the buffer checking algorithm of TCP Vegas. Freeze TCPv2 keeps two variables: RTTbase, measured as the minimum RTT, and the current RTT, measured from the last transmitted packet. These two values are used to deduce the Expected and Actual rates using Eq. (2). Expected indicates the transmission rate when the network is stable, and Actual is the sending rate under the current network status. The difference between the two values is represented as Δ and is calculated by Eq. (3); it indicates the amount of packets buffered in network elements. In other words, Δ implies the congestion status of the network. Moreover, Freeze TCPv2 defines α and β, which are the minimum and maximum numbers of buffered packets in the network, respectively. Using these variables, Freeze TCPv2 determines the network condition and the packets remaining in the network links, and subsequently adjusts its transmission rate based on the network congestion status after the handoff by comparing Δ with α and β.
Expected = \frac{WindowSize}{RTT_{base}}, \qquad Actual = \frac{WindowSize}{RTT}    (2)
\Delta = \left(\frac{WindowSize}{RTT_{base}} - \frac{WindowSize}{RTT}\right) \times RTT_{base}    (3)
When the sender receives a PACK, it refreshes RTTbase as RTTprev so as to check the number of remaining buffered packets; this algorithm is adapted from DEMO-Vegas. Since the communication between the sender and the MN is disconnected during the handoff, RTTbase represents the minimum RTT from the previous network cell. This RTTbase cannot be applied in the new network, because it makes it difficult for the sender to determine the available bandwidth well. Therefore, to check the network condition through the TCP Vegas estimator, the proposed scheme refreshes RTTbase as soon as it receives the PACK. After refreshing RTTbase, the sender calculates Δ and compares it to α and β. If Δ is less than α, the sender considers the network status to be stable because the network buffers are empty, and it activates its DABE algorithm. If Δ is larger than β, the sender assumes that many packets remain in the intermediate routers, so it counts the number of ACK packets received during one window transmission time; this algorithm is based on TCP Westwood, which is considered a good algorithm for ameliorating network congestion in wireless networks. As a result, Freeze TCPv2 uses the ABE algorithm from TCP Westwood to take care of congestion in the network links. If Δ is between α and β, it just initiates the original TCP Vegas. In the case that the sender receives a normal ACK without a ZWA or PACK, it follows the congestion control algorithm of TCP Reno, as in Freeze TCP.
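The WAVE decision just described can be summarised in a few lines; the sketch below follows Eqs. (2) and (3) and the α/β thresholds, with the Westwood-like and Vegas branches returned only as labels since their handlers are outside the scope of this illustration.

def wave_decide(window_size, rtt_base, rtt, alpha=14, beta=16):
    """Classify the network state after a PACK using the Vegas estimator."""
    expected = window_size / rtt_base          # Eq. (2): rate if links were empty
    actual = window_size / rtt                 # Eq. (2): currently observed rate
    delta = (expected - actual) * rtt_base     # Eq. (3): packets buffered in links

    if delta < alpha:
        return "DABE"        # links empty: probe the new cell's bandwidth
    elif delta > beta:
        return "WESTWOOD"    # many packets in flight: Westwood-style ABE control
    return "VEGAS"           # otherwise fall back to standard TCP Vegas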
4 Performance Evaluation
4.1 Simulation Setup
We evaluate the performance of Freeze TCPv2 with metrics such as throughput and fairness, using the ns-2 network simulator [16]. We examine the performance of the proposed scheme in a heterogeneous network topology against TCP Vegas, Freeze TCP, and DEMO-Vegas. The simulation topology is described in Fig. 2.
Fig. 2. Simulation topology
All simulations assume that a handoff occurs only once, in an overlapping area of the network cells. The corresponding node (CN), as the sender, transmits data to the MN through a wired connection from the CN to the access point (AP). The radio coverage of the AP is 500 m, and the MN moves from the old network cell to the new network cell at various speeds. The AP forwards the packets from the gateway (GW) to the MN through wireless links. In this scenario, we evaluate the throughput against the speed of the MN and the wireless link error rate. The queue sizes of all nodes are set to 20, and a single TCP connection running a long-lived FTP application delivers data from the sender to the receiver. The two buffer thresholds of Freeze TCPv2, α and β, are set to 14 and 16, as done in [17].
4.2 Throughput under Various Speeds of the MN in Heterogeneous Networks
In this simulation, we evaluate the performance in terms of the speed of the MN in heterogeneous networks. Two scenarios are chosen: the MN moves from a lower-bandwidth network cell to a higher-bandwidth network cell (6 Mbps to 11 Mbps), or from a higher-bandwidth cell to a lower-bandwidth cell (6 Mbps to 1 Mbps). We measure and compare the throughput during the movement of the MN, with the speed of the MN varying from 10 km/h to 100 km/h. Fig. 3 depicts the results of simulations lasting 180 seconds, averaged over 10 runs.
Compared with TCP Vegas, Freeze TCP, and DEMO-Vegas, Freeze TCPv2 achieves 40%, 8% and 14% higher throughput, respectively, when the MN moves from a previous network cell with 6 Mbps to a new network cell with 1 Mbps (Fig. 3). Moreover, when the MN moves from an old network cell with 6 Mbps to a new network cell with 11 Mbps, Freeze TCPv2 shows a performance improvement of 55% over TCP Vegas, 13% over Freeze TCP, and 17% over DEMO-Vegas. In both plots, the shorter the handoff duration, the higher the throughput of all schemes. In this situation, Freeze TCPv2 recognizes the change of network bandwidth and subsequently obtains the available bandwidth of the new network cell by activating the DABE algorithm.
(a) When MN moves from 6 Mbps network cell to 1 Mbps network cell. (b) When MN moves from 6 Mbps network cell to 11 Mbps network cell.
Fig. 3. Throughput under various speeds of the MN
4.3 Throughput under Various Wireless Link Error Rates in Heterogeneous Networks
In this simulation, we investigate the throughput of the proposed scheme and the others in terms of wireless link error rates in heterogeneous networks. With the network bandwidth changing from 6 Mbps to 1 Mbps and from 6 Mbps to 11 Mbps, we evaluate the throughput of Freeze TCPv2, Freeze TCP, DEMO-Vegas and TCP Vegas with various wireless link error rates between 1% and 5%. In this simulation, the speed of the MN is fixed at 50 km/h. The simulation results show that Freeze TCPv2 outperforms the other TCP schemes, achieving 20% - 50% improvements in goodput, as shown in Fig. 4. As the wireless link error rate increases, the throughput of all of the schemes decreases. In particular, when the MN moves from a 6 Mbps network to a 1 Mbps network, Freeze TCPv2 outperforms Freeze TCP by 7%, DEMO-Vegas by 9%, and TCP Vegas by 48%. In addition, Freeze TCPv2 shows better performance than TCP Vegas, Freeze TCP and DEMO-Vegas by 58%, 12% and 11%, respectively, when the MN moves from a previous network cell with 6 Mbps to a new network cell with 11 Mbps. These trends are owing to the WAVE algorithm of Freeze TCPv2, since the sender controls network congestion by investigating the packets remaining in the network. Consequently, the sender can attain an appropriate transmission rate and sustain a higher performance.
(a) When MN moves from 6 Mbps network cell to 1 Mbps network cell. (b) When MN moves from 6 Mbps network cell to 11 Mbps network cell.
Fig. 4. Throughput in terms of wireless link error rates
5 Conclusion
In this paper, we propose Freeze TCPv2, which uses the DABE and WAVE algorithms in heterogeneous and homogeneous networks. In the DABE algorithm, the sender checks the difference between RTTprev, measured prior to the handoff, and RTTcurr, measured after the handoff, and then adjusts its sending rate according to the estimated bandwidth change. Ordinarily, the sender and the MN cannot maintain their communication during the handoff. To solve this, Freeze TCPv2 uses the WAVE algorithm to enable the sender to maintain a stable transmission by investigating the packets remaining in the network links. As a result, the sender quickly adjusts to the appropriate available bandwidth without causing network congestion. Simulation results using the ns-2 simulator demonstrate that Freeze TCPv2 shows better performance than TCP Vegas, Freeze TCP, and DEMO-Vegas, regardless of the network characteristics. Finally, Freeze TCPv2 satisfies the fairness index criterion in terms of sharing the network bandwidth equally.
Acknowledgments. This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2009-(C1090-0902-0046)), and this work was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2008-314-D00296).
References
1. Postel, J.: Transmission Control Protocol. RFC 793 (1981)
2. Allman, M., Paxson, V., Stevens, W.: TCP Congestion Control. RFC 2581 (1999)
3. Tian, Y., Xu, K., Ansari, N.: TCP in Wireless Environments: Problems and Solutions. IEEE Radio Communications 43(3), S27–S32 (2005)
4. Casetti, C., Gerla, M., Mascolo, S., Sanadidi, M.Y., Wang, R.: TCP Westwood: Bandwidth Estimation for Enhanced Transport over Wireless Links. ACM/IEEE MobiCom, 287–297 (July 2001)
5. Xu, K., Tian, Y., Ansari, N.: TCP-Jersey for Wireless IP Communications. IEEE Journal on Selected Areas in Communications 22(4), 747–756 (2004)
6. Xu, K., Tian, Y., Ansari, N.: Improving TCP Performance in Integrated Wireless Communications Networks. Computer Networks 47, 219–237 (2005)
7. Hansmann, W., Frank, M., Wolf, M.: Performance Analysis of TCP Handover in a Wireless/Mobile Multi-Radio Environment. In: Proc. of IEEE LCN 2002 (November 2002)
8. Chakravorty, R., Vidales, P., Subramanian, K., Pratt, I., Crowcroft, J.: Practical Experiences with Wireless Integration using Mobile IPv6. ACM Mobile Computing and Communication Review 7(4) (October 2003)
9. Ghribi, B., Logrippo, L.: Understanding GPRS: the GSM Packet Radio Service. Computer Networks 34(5) (November 2000)
10. Crow, B., Widjaja, I., Kim, J., Sakai, P.: IEEE 802.11 Wireless Local Area Networks. IEEE Communications Magazine (September 1997)
11. Liao, W., Kao, C., Chien, C.: Improving TCP Performance in Mobile Networks. IEEE Transactions on Communications 53(4), 569–571 (2005)
12. Wu, X., Chan, M., Ananda, A.: TCP HandOff: A Practical TCP Enhancement for Heterogeneous Mobile Environments. IEEE ICC, 6043–6048 (June 2007)
13. Brakmo, L., Peterson, L.: TCP Vegas: End to End Congestion Avoidance on a Global Internet. IEEE Journal on Selected Areas in Communications 13(8), 1465–1480 (1995)
14. Goff, T., Moronski, J., Phatak, D., Gupta, V.: Freeze-TCP: A True End-to-End TCP Enhancement Mechanism for Mobile Environments. In: Proc. of the IEEE INFOCOM 2000, vol. 3, pp. 1537–1545 (2000)
15. Ho, C., Chan, Y., Chen, Y.: An Efficient Mechanism of TCP-Vegas on Mobile IP Networks. IEEE INFOCOM, 2776–2780 (March 2005)
16. UCB/LBNL/VINT Network Simulator, http://www.isi.edu/nsnam/ns
17. Jain, R., Chiu, D., Hawe, W.: A Quantitative Measure of Fairness and Discrimination for Resource Allocation in Shared Computer Systems. DEC Research Report TR-301 (September 1984)
Expanding SNS Features with CE Devices: Space, Profile, Communication Youngho Rhee, Hyunjoo Kang, Yeojin Kim, Juyeon Lee, and IlKu Chang San 14-1 Nongseo-Dong, Giheung-Gu, Yongin-City, Gyeonggi-Do, Korea 446712 {yh.rhee,hyunjoo.kang,yeojin81.kim,florial.lee, ilku.chang}@samsung.com
Abstract. Social network services (SNSs), which began online, are increasingly penetrating everyday life and represent a new business opportunity as networks expand. Indeed, many SNSs are now delivered on mobile devices via broadband networks. However, enormous challenges must be overcome to provide a seamless experience on these mobile devices because of the context of use. In the present paper, a new definition of SNS is proposed with a focus on CE devices, building on its general features and definitions. Accordingly, a novel concept and scenarios are presented to address users' needs and provide service and business insights.
Keywords: SNS (Social Network Service), Scenarios, CE devices, Context, Profile, Space, Communication.
1 Introduction
The recently emerging trends - social divides, diverse user needs, and technological advances - make up a brand-new social situation called social network services. People today share their lives and express their identities online, and seek fun experiences: the common features of these services - sharing, connecting, belonging, and blogging - are very faithful to what are defined as social needs. These trends embrace new business and lead technology innovation. SNSs (social network services) such as facebook, cyworld, bebo, QQ, and Mixi attract the attention of millions of users and are on the rise globally, supporting a wide range of interests and practices. A report estimated 24.9 million individual social networking visitors in August 2007 and forecasts rapid growth globally over the next few years. A growing number of websites are adding, developing, and refining social networking features and changing the ways in which people use and engage with each other. The methods by which people connect to social network services are also becoming more varied. As mobile technology rapidly advances, the features of social network services are being delivered on mobile devices, which represents an opportunity. According to an online user survey, close to half of all social networking users have now visited destinations like facebook via a mobile device: forty-six percent of social network members have visited their favorite sites on their phones. The young, who have grown up embracing the Internet and
mobile technologies, are creating new opportunities, taking advantage of emerging services and negotiating appropriate behaviors within new communities. Looking at current services, the present paper argues that the features that emerged around online SNSs have not capitalized on the unique attributes of the mobile device. Seamless connectivity and data integration between devices are preliminary tasks on the way to the situation of thousands of devices per person that Mark Weiser described. Accordingly, the roles of, and collaboration between, the machines (including CE devices) and the online services within the system should be clearly defined. From this point of view, the present paper followed a user-centered design process within a system design approach to define problems and find insights about existing online-based SNSs. Accordingly, a revised definition, a novel concept, and concept-proof scenarios are proposed to realize SNS with CE devices in an expanded network.
2 Background Research
Three main tasks were conducted in the background research: a literature review, trend research, and user research. The purpose of the literature review was to go over previous research and define the scope. The trend research helps in understanding the current landscape of SNSs in order to forecast the evolution of future services. The user research, consisting of contextual inquiry and a diary study with paid participants, refines the core features of existing online-based SNSs and yields the core assets for developing scenarios.
2.1 Literature Review
Maslow's hierarchy of needs is a theory that extends the theory of human motivation to humans' innate curiosity. It is composed of five levels of needs, and SNS sits within the social needs - a sense of belonging or building groups of people. An SNS generally embraces three key features: profiles, visualization of social relations, and connections to accomplish a task [18]. Similarly, [1] describes that an SNS allows individuals to construct a public or semi-public profile, articulate a list of other users, and view their lists of connections. It emphasizes keeping relations with people who are already part of one's extended social network rather than looking to meet new people. The profile is described as a unique page where one can "type oneself into being" and as the backbone of an SNS [17]. [12] examined the effects of profiles and visible relations between friends as a major trigger in the QQ service. Donath and [2] extended this to suggest that public displays of connection serve as important identity signals that help people navigate the networked social world. [16] found that the activity-status feature influenced how people behaved and what they chose to reveal. SNSs target homogeneous populations, for example specific geographical regions or linguistic groups. According to a report [3], social network services tend to skew in popularity across regions: MySpace.com and Facebook attract approximately two-thirds of their respective audiences in North America, but not in Asia-Pacific or the Middle East and Africa. [8] found a role of national identity in SNS use through an investigation into the Brazilian invasion of Orkut and the resulting culture clash. Ivins likewise asserted that the success of a social network service depends heavily on cultural characteristics. As SNSs continue to evolve over time, it will be
exciting to see whether they are able to overcome cross-cultural barriers and bring people from different corners of the globe together, fulfilling the truest ideal of the social network. Furthermore, with regard to cultural characteristics, the relations between socio-cultural attributes have been examined in terms of actual usage: social norm, pleasure, and usefulness. The social norm, which statistically influences enjoyment, encourages participants to form an intention to accept or use a newly deployed artifact [5]. A bulk of research concerning SNSs also focuses on participants' behaviors and roles. By examining previous research on network structure, four core patterns of recurrent roles have been identified - disconnected, onion, nexus, and butterfly - and each pattern shows how an ego's neighbors are connected to him or her [7]. The patterns show that a key player, who delivers or articulates new issues such as information or content, always exists between different groups of people. For example, in the nexus pattern a single persona shares multiple contexts with the ego; in the butterfly pattern, two groups of neighbors are linked via a single person. The model of "the arc of influence" [6] explains how topics spread and how influence causes action. The model proposes inverting the arc by putting the target audiences in control and working out how they are influenced so that they act accordingly, whether the user's role is influencer or influenced. [13] argued that the roles people play in the growth of networks can be categorized into passive members, inviters, and linkers, and similar studies have been conducted by other researchers [9,10].
2.2 Trend Research
A competitive analysis was conducted based on the territory in which each service was first initiated. A total of three areas are laid out according to birthplace and type of device. The first began online and expanded to mobile devices later; the second, vice versa, started on mobile devices and moved online; and the third is a CE (consumer electronics) based online service in the field of fun experiences. Well-known SNSs - Facebook, Flickr, del.icio.us, and LinkedIn - are all categorized into the first area, a start online, and many of them are currently being expanded to mobile devices and attracting target users (e.g., Facebook, Flickr, YouTube). The features served in these services deal with content storage and sharing. In terms of user behavior, online-based activities such as blogging, organizing, and sharing are the primary tasks, supported on mobile devices through their mobility. The features, however, become somewhat entangled on mobile devices, since they are built on a quite different usage context in terms of information architecture and interaction, which leads to a different user journey. As mobile-specific services, Aka-Aki, Loopt, MobileMe, Lifediary, and Dodgeball are all included in the second category. Presence or location information obtained via the mobile device is additionally provided and works as a tipping point for the target audiences. In these services, a mobile device works as a social mediator since it connects people via mobile-sensed information: users who have installed the same application, which uses the GPS trajectory of the mobile device, are likely to meet strangers or friends within the same area. Furthermore, the audiences of these services are likely to share their photos
or videos more easily with sync technology embedded across mobile, PC, and online. This is the primary differentiator compared with online SNSs. Finally, the third case is the CE-based SNS. Any owner of a CE device can get together online to enjoy games or hang out anonymously. Here, the Wii is a pioneer of the CE device with SNS: the console allows users to invite others and play games with them in an online space. PS3's Home and Xbox Live are followers in this stream. These services provide a virtual space in which to invite or host people once the audiences own the same device. The Home service presented by the PS3 is very faithful to the features of an SNS.
2.3 Research Conclusion
The extensive research allowed us to develop a framework for user behaviors dealing with SNSs. Although some exceptions exist, modern SNSs are very faithful to the features of enjoying and reinforcing as their main motives for usage, and they tend to focus on relations between people that already exist in the real world. An exception is uncovered via the profile: people who share the same traits - hobbies and interests - are likely to get together and create a virtual tribe; however, the goal is the same - seeking a fun experience or needed information. As for behaviors dealing with content, self-presence and knowledge sharing are the cognitive triggers of the store-and-share feature. For example, people tend by nature to store their photos with music on their online homepage. Figure 1 depicts the pattern of user behaviors as a what/as-is model. The second and third quadrants - collective intelligence and entertainment - imply groups of people who place value on the contents. On the other hand, relation and profile matching, laid out in the first and fourth quadrants, place weight on the features of reinforcing and expanding one's existing social network.
Fig. 1. The quadrant developed based on contents and social network
2.4 Emerging Technology
The key technologies were identified by examining the key features of online SNSs. RSS, for example, is a web feed format used to publish frequently updated works in a standard form. Almost all of the online sites use it to publish dates and authorship. These are less new technologies, however, than technologies that had disappeared and were reborn by user needs: technologies that faded in the Web 1.0 era (the 1990s) came back to fulfill recent user needs for participation and sharing. The key technologies underpinning SNSs are relatively consistent, while the features concerning
SNSs vary. The features are continuing to expand into applications of the mobile device, as previously cited. These research results allow us to build an as-is model of the common features and technologies that constitute the current mobile Internet SNS. In Figure 2, the outer ring of the circle depicts modern technologies such as Ajax, RSS, and open APIs that ensure the feasibility of online SNSs. A range of services such as points of interest, communication, and profile matching is built on the web with these technologies. Other technologies, including wireless networking, sensors, and broadband networks, then connect these services to devices and map them onto a mobile device. Profile matching that exists online, for example, needs to be integrated into the contact and profile applications on the mobile device. An activity log captured by a mobile device syncs online, sometimes via a PC, and is mashed up with map data; Apple recently presented the product "Nike+" using this scheme. Likewise, the communication feature that exists online migrates to messaging on a mobile device.
Fig. 2. Key Features and back-end Technologies
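As a rough illustration of the RSS-based publishing described above, the following Python sketch reads a feed and extracts the title, authorship, and date fields that most online SNSs syndicate; the feed address and field names follow the generic RSS 2.0 convention and are our own illustrative assumptions, not details of any particular service.

import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://example.com/activity.rss"  # hypothetical feed address

def read_feed(url):
    # Fetch the feed and pull out the fields an SNS client on a CE device could sync.
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    entries = []
    for item in root.iter("item"):            # RSS 2.0 <item> elements
        entries.append({
            "title": item.findtext("title", default=""),
            "author": item.findtext("author", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return entries

if __name__ == "__main__":
    for entry in read_feed(FEED_URL):
        print(entry["published"], entry["author"], entry["title"])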
2.5 User Research
The reason for conducting user research in the present study was to refine the proposition of SNS with CE devices through an understanding of user needs. To fulfill this purpose, three steps were taken: participants were recruited according to the social technographics ladder developed by Forrester (2007), they wrote down their everyday activities in a diary book for a week, and they were asked to grant a three-hour interview during a home visit. The data collected via the diary book and contextual inquiry were further analyzed according to the framework proposed by [15]. In recruiting participants, the social technographics ladder is believed to be a good starting point for determining the participant mix, since it categorizes people into a ladder of six levels of social participation ranging from creators to inactives. A total of 12 target participants were recruited by a local agency based on job profile and personal characteristics. A critic, for example, is conceived in this hierarchy of behaviors as an active user who does more than read or consume the content given in a context of use. With these guidelines, the recruiter approached power bloggers or social controversialists who are actively engaged
in online communities and asked them to participate. Participants at the remaining levels of the hierarchy were recruited by the same process. The data obtained from the semi-structured home-visit interviews and the diaries, describing their everyday behaviors, were further analyzed to illustrate personal concerns along the six-level ladder and activities across the four stages of activity. The four stages of activity proposed by Shneiderman separate the activity spectrum into collect - relate - create - contribute according to the extent to which users participate.
Fig. 3. The social technographics ladder and the four stages of activity are set up to build the basic framework for the user research in the present study
The four activities are assumed to have distinct values for each participant placed on the six-level ladder, which is built as a relationship table. Accordingly, individuals' needs and subsequent activities concerning SNS are described along the four stages of activity (Table 1). These are hypothesized as core competencies that encourage people to become involved in an SNS. Table 1 describes the detailed activities of each group. A creator group, for example, is characterized by collect and relate, whereas create and contribute are not among its concerns. The user research refined the common needs dealing with SNS and accordingly identified unique features for each group of participants. Four features - recommendation, easy expression, group communication methods, and social presence - were identified as overall SNS needs. In addition, each group of participants needs extra features that could address its specific concerns. The participants classed as inactive, for example, ask for a widget for a content push service and a content creation wizard. A group of joiners, on the other hand, requests features such as a contact management tool, a one-to-many communication method, and a profile matching function to meet new people. Critics underscore the level of attention from audiences as an important feature, seeking a knowledge network via the social network.
2.6 Insight
The extensive research into social, cultural, and technology trends visualizes the current landscape of SNSs and helps in understanding their general features through definitions and disciplines. In addition, the user research clarifies the core needs of the target audience concerning SNSs.
Table 1. Activities of the six levels of participants across the four stages (Collect, Relate, Create, Contribute)
Creator - Collect: make a record and seek for social contents. Relate: communicate with other people by the creator's own contents; assistance and collaboration between creators. Create: n/a. Contribute: n/a.
Critic - Collect: approach plenty of information earnestly. Relate: personal networking for pursuing knowledge. Create: n/a. Contribute: n/a.
Collector - Collect: manage dynamic information; seek professional information. Relate: experts pool for dealing with information. Create: n/a. Contribute: n/a.
Joiner - Collect: behaviors of seeking for information that change every moment. Relate: behaviors for successful social communities; join in a social community and its maintenance. Create: n/a. Contribute: share what people know (or have collected) with each other; make newcomers of the social network comfortable and strive to maintain their social chain as stable.
Spectator - Collect: n/a. Relate: SNS is not as important as their established lifestyle. Create: n/a. Contribute: take an interest in other people in the social network.
Inactive - Collect: n/a. Relate: n/a. Create: easier and simpler service is required. Contribute: enhance the usefulness of their work (or offline social life) with SNS.
The synthesis of this research led us to strategically select and focus on the 1st and 4th quadrants - relations and profile matching (Figure 1) - because CE devices can connect to other devices and collect real-life context information via sensors or embedded technologies. The results of the overall research identify the problems with existing SNSs and the new business opportunities toward which a future service should move. In the next chapter, the present study introduces how the concept was created in terms of the problems of existing SNSs.
3 Concept Development
In brief, SNS is considered an online-based people management service that is expanding to mobile devices through broadband connections and wireless networking technology. At this point, the present study states three challenges that clarify the problems and the concept of the present study in serving the core features defined in the research conclusion: space, profile, and communication.
3.1 Space
A space means the places where the user is able to use and enjoy SNS features. Online SNSs are limited to the space of the web, which limits their support for users' nomadic lives and the diversity of contexts. Therefore, the present study argues that the space for SNS should encompass devices that manage real life and collect a diversity of contexts over broadband connections and wireless networks. Seamless integration of information between features is considered very important once the boundary of SNS usage is expanded to CE devices: a resource belonging to one must be reusable in the others and kept in sync between objects (i.e., online, CE, and mobile devices). Information in a contact application on a mobile device, for example, must be integrated with that in the online SNS.
3.2 Profile
The backbone of the wide range of SNSs presented so far consists of a profile that displays an articulated list of friends. The attributes of a profile include age, location, interests, and an "about me" section with photos. In the case of Facebook, the user can add applications to the profile page according to his or her intentions. Most SNSs take the form of profile-centric sites, trying to replicate the early success of Friendster or to target specific demographics. However, users of most SNSs such as MySpace and LinkedIn feel put upon when asked to fill out forms containing a series of questions; this context of use leaves users frustrated with inputting personal information into the proper fields. The present study hypothesizes that information obtained through the collaboration of the end-user's devices is much more likely to fulfill the user's intention or goal across a diversity of contexts, since new information built into the profile in this way is tightly related to real life, and it concludes that such information greatly helps extend the social network.
3.3 Communication
A social network is described as a social structure made of nodes that are tied by one or more specific types of interdependency, as previously cited. It assumes that different groups of people, or individual people, inhabit the nodes of the social structure. Given this, having efficient communication methods within a community is assumed to be fairly important no matter what the type of community (e.g., disconnected or onion). In fact, an examination of the communication logs of online SNSs shows that users still rely on the old one-to-one method, which demands extra effort to keep track of a communication thread. In the context of SNS use, novel communication methods capable of one-to-many or many-to-many exchange are assumed to promote the proliferation of the social network service, since they make communication faster and more convenient.
3.4 Conclusion
By combining the three key features, the main concept of SNS with CE devices is built, which is likely to expand the social network service since it provides additional information about the context of use. It allows a real-time, flexible community to be made on the spot (location/space) and to be used for a more personalized service.
Fig. 4. The Concept of SNSs with CE devices
Figure 4 explains how the information built on device collaboration can describe the user's context. The present study argues that SNS with CE devices leans toward a context-focused social network service that reinforces and maintains the relations between people: context information automatically captured by CE devices using GPS and sensors helps people organize and manage their relationships. The key features proposed in the present study are to create, reinforce, and expand the social network, given seamless connectivity between online and CE devices.
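As a rough illustration of how such device-captured context could form an on-the-spot community, the Python sketch below groups contacts whose last reported GPS positions fall within a given radius; the contacts, coordinates, and radius are invented for the example and are not data from the study.

from math import radians, sin, cos, asin, sqrt

def distance_km(a, b):
    # Haversine great-circle distance between two (lat, lon) pairs given in degrees.
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def on_the_spot_group(me, contacts, radius_km=0.5):
    # Contacts are (name, (lat, lon)) pairs last reported by their CE devices.
    return [name for name, pos in contacts if distance_km(me, pos) <= radius_km]

contacts = [("A", (37.286, 127.057)), ("B", (37.287, 127.058)), ("C", (37.400, 127.100))]
print(on_the_spot_group((37.286, 127.056), contacts))  # A and B are nearby; C is not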
4 Scenarios
The purpose of the scenarios is to prove the concept, that is, to show how the proposed concept practically improves the user experience of the social network. As previously described, the key features in the present study are to create, reinforce, and expand the relationships between people across a diversity of contexts. The present study modifies the quadrant illustrating the community types in which people are located [14] in order to present the contexts of use obtained by CE devices. The modified quadrant is built according to the patterns and behaviors of the social network - fixed, dynamic, and content sharing (Figure 5) - reflecting the key features of each theme. A family, friend, or work group, as a fixed community, is likely to be reinforced via group communication methods, which make its members feel connected. Somewhat new features such as voting, threads, and schedule sharing are included in the group communication methods. The feature of seamless connectivity enables content to be shared via RSS technology without interruption between group members.
Fig. 5. The Scenario Themes and Role of Devices for SNSs
For the dynamic community, a profile matching feature is presented, which increases the chance of meeting new people based on a shared goal such as a hobby or business.
References
1. Boyd, d.m., Ellison, N.B.: Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication 13 (2007)
2. Boyd, d.m.: Friendster and publicly articulated social networks. In: Proceedings of the ACM Conference on Human Factors in Computing Systems, pp. 1279–1282. ACM Press, New York (2004)
3. comScore: Social networking goes global. Reston, VA (retrieved September 9, 2007), http://www.comscore.com/press/release.asp?press=1555
4. Danyel, F.: Using Egocentric Networks to understand communication. IEEE Internet Computing (2005)
5. Dickinger, A., Arami, M., Meyer, D.: The role of perceived enjoyment and social norm in the adoption of technology with network externalities. European Journal of Information Systems 17, 4–11 (2008)
6. Edelman, J.B.: Distributed Influence: Quantifying the impact of social media. An Edelman White Paper (2007)
7. Fisher, D., Dourish, P.: Social and temporal structures in everyday collaboration. In: Proc. Conf. Human Factors in Computing Systems, pp. 551–558. ACM Press, New York (2004)
8. Fragoso, S.: WTF, a crazy Brazilian invasion. In: Proceedings of CATaC 2006, pp. 255–274. Murdoch University, Murdoch (2006)
9. Herring, S.C., Paolillo, J.C., Ramos Vielba, I., Kouper, I., Wright, E., Stoerger, S., Scheidt, L.A., Clark, B.: Language networks on LiveJournal. In: Proceedings of the Fortieth Hawai'i International Conference on System Sciences. IEEE Press, Los Alamitos (2007)
10. Hsu, W.H., Lancaster, J., Paradesi, M.S.R., Weninger, T.: Structural link analysis from user profiles and friends networks: A feature construction approach. In: Proceedings of ICWSM 2007, Boulder, CO, pp. 75–80 (2007)
11. Li, C., Bernoff, J.: Social Technographics Ladder, from Forrester Research. In: Groundswell: Winning in a World Transformed by Social Technologies. Harvard Business Press (2008)
12. McLeod, D.: QQ Attracting eyeballs. Financial Mail (South Africa), 36 (2006)
13. Kumar, R., Novak, J., Tomkins, A.: Structure and evolution of online social networks. In: Proceedings of the 12th International Conference on Knowledge Discovery and Data Mining, pp. 611–617. ACM Press, New York (2006)
14. Rhee, R.H., Kiran, P.S., Lee, J.Y., Lee, J.Y.: Media sharing and collaboration within a mobile community: Self expression and socialization. In: Proceedings of the 12th International Conference on Human-Computer Interaction, Beijing (2005)
15. Shneiderman, B.: Understanding human activities and relationships. In: Leonardo's Laptop: Human Needs and the New Computing Technologies. MIT Press, Cambridge (2002)
16. Skog, D.: Social interaction in virtual communities: The significance of technology. International Journal of Web Based Communities 1, 464–474 (2005)
17. Sundén, J.: Material Virtualities. Peter Lang, New York (2003)
18. Wave 3: Power to the people - social media tracker. Universal McCann, Next Thing Now (2008)
Empirical Evaluation of Throwing Method to Move Object for Long Distance in 3D Information Space on Mobile Device Yu Shibuya, Keiichiro Nagatomo, Kazuyoshi Murata, Itaru Kuramoto, and Yoshihiro Tsujino Kyoto Institute of Technology Matsugasaki, Sakyo-ku, Kyoto 606-8585, Japan [email protected]
Abstract. In our previous work, a throwing method for moving an object a long distance in a 3D information space on a mobile device was proposed. With this method, just as we throw an object to move it far away in the real world, we can throw a virtual object in the 3D information space. This simple throwing method was improved by adding the following three functions: adjusting the direction of the moving object in real time, moving the viewpoint to follow the thrown object, and initializing the viewpoint after the movement. The purpose of this paper is to examine the performance of the improved throwing method for moving an object a long distance in a 3D information space. From the experiment, it is found that the improved throwing method is efficient for moving an object a long distance in a 3D information space on mobile devices. Keywords: mobile interaction, 3D information space, throwing method, human computer interaction.
1 Introduction
These days, many people carry their own mobile devices such as PDAs or mobile phones. Modern mobile devices have the power to process and display a 3D information space, which is usable for various purposes such as showing complex structures or representing realistic objects. However, there is no suitable interaction method for operating objects in a 3D information space on mobile devices. In our former work [1], we proposed a new interaction method named Handy Window. In that work, it was found that a hand gesture behind the mobile device was usable to move or rotate a virtual object (Fig. 1). However, in order to move an object far away, users suffer from physical fatigue or their task performance decreases, because they have to move the mobile device or their hand widely. In order to avoid such long movements, we then proposed a novel object moving method with a throwing gesture, named the throwing method [2] (Fig. 2). With this method, users can not only throw an object in the 3D information space but also adjust the direction of the moving object as they want, follow the thrown object with the viewpoint, and easily initialize the viewpoint after the movement.
From a preliminary experiment, it was found that the proposed method reduced the physical fatigue of the user, but there was no significant improvement in performance. Because the previous experiment treated short movements only, it may have been difficult to find a difference between the throwing method and the non-throwing method. This paper therefore focuses on evaluating the performance of the improved throwing method for moving an object a long distance in a 3D information space.
Fig. 1. Concept of Handy Window. The user can move or rotate the virtual object with a hand gesture behind the mobile device.
Fig. 2. Two object moving methods. With the pick-and-move method (left), the user can move the object a short distance by changing his/her hand position. With the throwing method (right), the user can move the object a long distance with a throwing gesture.
2 Throwing Method
As mentioned above, the throwing method was proposed in our previous work [2]. With this method, just as we throw an object to move it far away in the real world, we can throw a virtual object in the 3D information space in order to move it a long distance. In the method, the user first picks up an object with his/her fingers, then moves his/her hand in the desired direction with a certain velocity, and releases the object. After that, the object begins to fly toward the target position. When the user makes a picking action again, the object stops flying. The user can then adjust the location of the
object by moving his/her hand. Finally, when he/she releases the object without moving the hand, the object is fixed there. With the simple throwing method above, however, it is sometimes difficult to move the object far away. As we experience in the real world, it is difficult to throw an object accurately in the desired direction. Furthermore, because the thrown object gets smaller on the display as it moves, it is difficult to grasp the object's location accurately. To tackle the first problem, the user is allowed to adjust the direction of the flying object with a finger gesture (Fig. 3). For the second problem, the viewpoint follows the thrown object and the size of the object is not changed until it is fixed (Fig. 4). Furthermore, the user can get the viewpoint back with a simple hand gesture (Fig. 5). In this paper, we call this method the improved throwing method to distinguish it from the simple one; a simplified sketch of the interaction loop is given after Fig. 5.
Fig. 3. Adjustment of direction. The user can change the direction of the moving object by changing his/her finger direction.
Fig. 4. View control. Camera follows the object to show it in moderate size on the display.
Fig. 5. View initialization. After object movement (left), camera view point is initialized (right) with beckoning hand gesture (center).
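The interaction described above can be read as a small state machine: pick, throw with the hand's release velocity, fly while the camera follows, pick again to adjust, and release to fix. The Python sketch below is our own simplified, one-dimensional reading of that loop; the velocity threshold, frame step, and names are illustrative assumptions, not details of the authors' implementation.

class ThrownObject:
    def __init__(self):
        self.pos = 0.0        # position along the throwing direction (m)
        self.vel = 0.0        # current flying velocity (m/s)
        self.state = "idle"   # idle -> held -> flying -> adjusting -> fixed

    def pick(self):
        # Picking a flying object stops it so that its position can be adjusted.
        if self.state == "idle":
            self.state = "held"
        elif self.state == "flying":
            self.vel = 0.0
            self.state = "adjusting"

    def release(self, hand_velocity):
        if self.state == "held" and abs(hand_velocity) > 0.3:   # throw threshold (assumed)
            self.vel = hand_velocity
            self.state = "flying"
        else:
            self.state = "fixed"   # releasing without hand motion fixes the object

    def update(self, dt, steer=0.0):
        # steer: small per-frame direction correction taken from the finger gesture.
        if self.state == "flying":
            self.vel += steer
            self.pos += self.vel * dt
        return self.pos   # the camera follows this value to keep the object in view

obj = ThrownObject()
obj.pick()
obj.release(hand_velocity=2.0)        # throw the object
for _ in range(30):
    obj.update(dt=1 / 30)             # fly for one second while the camera follows
obj.pick()                            # catch the object to adjust its location
obj.release(hand_velocity=0.0)        # release without motion: the object is fixed
print(round(obj.pos, 2), obj.state)   # roughly 2.0 m away, state "fixed"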
3 Experiment
The purpose of the experiment is to examine the performance of the improved throwing method for moving an object a long distance in a 3D information space.
3.1 Compared Moving Methods
Three kinds of object moving method were evaluated in the experiment: the improved throwing method, the simple throwing method, and the non-throwing method. The non-throwing method does not allow the user to throw the object; the user must move his/her hand or body to move the picked object. We expected the efficiency of both throwing methods to become better than that of the non-throwing method as the distance to move became longer, and the improved throwing method to be better than the simple one.
3.2 Prototype System for the Experiment
A prototype system was constructed for the experiment as shown in Fig. 6. It consists of a motion tracker for tracking the position and orientation of the mobile device and the participant's hand, and a data glove for detecting the participant's hand gestures. Because of the limited detectable distance of the motion tracker, participants could not be more than one meter away from the transmitter of the tracking system. For this reason, the non-throwing method, which requires the participant's movement, was evaluated for short-distance movement only. In the prototype system, the sensed data of the motion tracker and the data glove are processed by a desktop PC, which also generates the visual feedback for the mobile device.
Fig. 6. System configuration of the experiment
3.3 Procedure
Twelve participants were recruited from our research laboratory. In the experiment, each participant was asked to pick up the object at the starting point and move it to the target point. The distance between the starting point and the target point was set at 0.5 m, 1 m, 1.5 m, 3 m, and 5 m. Task completion time and error rate were measured. A subjective evaluation was also conducted to assess the intuitiveness, ease of learning, and level of fatigue of each moving method. Fig. 7 shows a snapshot of the experiment.
Fig. 7. A snapshot of the experiment: (a) overview; (b) view over the user's shoulder
4 Results and Discussion
4.1 Task Completion Time
From the experiment, as shown in Fig. 8, it was found that the task completion time with the non-throwing method was significantly shorter than with the simple throwing method (p<0.05) while the distance to move was less than or equal to 1.5 m.
Fig. 8. Task completion time [sec] by target distance [m] for the Non-Throw, Throw, and Improved Throw methods (*: p<0.05)
However, the task completion time of the non-throwing method grew roughly linearly as the distance increased, and that of the simple throwing method also grew with the distance. On the other hand, the task completion time of the improved throwing method changed little as the distance increased. These results show that the improved throwing method performs better than the other methods for moving an object a long distance. With the improved throwing method, participants can adjust the direction of the object while it is moving, and they can easily place the object at the target point because both the object and the target have a moderate view size on the display. Furthermore, to complete the task, they can quickly initialize the viewpoint after the movement with a simple hand gesture. This is why the improved throwing method is more efficient than the others.
4.2 Error Rate
The error rate results are shown in Fig. 9. From this figure, it is found that there was no significant difference, but the error rate of the simple throwing method increased as the target distance became longer, whereas that of the improved throwing method did not change much. With the simple throwing method, it was difficult to place the object at a far target position because the target appeared very small on the display. With the improved method, the view camera in the 3D information space follows the moving object, and the user can see the target at a reasonable size on the display when the object comes close to it. This made the error rate of the improved throwing method lower than that of the simple one.
Fig. 9. Error rate [%] by target distance [m] (0.5, 1.0, 1.5, 3.0, 5.0) for the Non-Throw, Throw, and Improved Throw methods
4.3 Subjective Evaluation
Fig. 10 shows the results of the subjective evaluation, from which the following were found. First, the non-throwing method was significantly more intuitive than the others (p<0.05), but the intuitiveness of both throwing methods was moderate.
Fig. 10. Subjective evaluation (5: good - 1: bad) of Q1. Intuitiveness, Q2. Ease of learning, and Q3. Fatigue for the Non-Throw, Throw, and Improved Throw methods (*: p<0.05)
Second, all three methods were easy to learn. From these results, most users could probably become familiar with throwing a virtual object quickly, even though they were unfamiliar with it before the experiment. Finally, the level of fatigue with the improved throwing method was significantly better, namely lower, than with the others (p<0.05). With the improved throwing method, participants did not have to move themselves very much, whereas they had to move a great deal with the non-throwing method or re-throw many times with the simple throwing method.
5 Related Work
A throwing action has been used to move objects across large distances on wall-size screens [3], [4], but the action has not been used for objects in a 3D information space on mobile devices. Gilbertson et al. [5] explored a tilt interface for a 3D first-person driving game and experimentally compared it with a traditional phone joypad interface. They showed that the tilt interface was experienced as fun and attractive by players, but they focused on navigation in a virtual 3D information space. This paper focuses not only on navigation but also on the manipulation of virtual objects in a 3D information space on mobile phones.
6 Conclusion
In this paper, an empirical evaluation of the improved throwing method was conducted. From the experiment, it is found that the improved throwing method is efficient for moving an object a long distance in a 3D information space on a mobile device. Furthermore, it is also found that the three additional functions, namely adjusting the direction
while moving, moving the viewpoint to follow the thrown object, and initializing the viewpoint after the movement, are effective in improving task performance.
References
1. Shibuya, Y., Taniguchi, N., Kuramoto, I., Tsujino, Y.: Handy Window: An Interface for Intuitive Interaction of My Portable Information Terminal and the Other Ubiquitous Devices. In: Proc. HCI International 2005 (CD-ROM), vol. 5 (2005)
2. Nagatomo, K., Murata, K., Kuramoto, I., Shibuya, Y., Tsujino, Y.: Efficient Object Moving Method with Throwing Gesture in 3D Information Space on Mobile Device. In: Proc. Symposium on Mobile Interactions, pp. 53–58 (2008) (in Japanese)
3. Geissler, J.: Shuffle, Throw or Take It! Working Efficiently with an Interactive Wall. In: CHI 1998 Conference Summary on Human Factors in Computing Systems, pp. 265–266 (1998)
4. Collomb, M., Hascoët, M., Baudisch, P., Lee, B.: Improving Drag-and-Drop on Wall-size Displays. In: Proceedings of Graphics Interface 2005, pp. 25–32 (2005)
5. Gilbertson, P., Coulton, P., Chehimi, F., Vajk, T.: Using "Tilt" as an Interface to Control "No-Button" 3-D Mobile Games. Comput. Entertain. 6(3), 1–13 (2008)
Usefulness of Mobile Information Provision Systems Using Graphic Text -Visibility of Graphic Text on Mobile Phones Tomoyuki Watanabe1, Masako Omori2, Satoshi Hasegawa3, Shohei Matsunuma4, and Masaru Miyao5 1
Faculty of Psychological and Physical Science, Aich-Gakuin University, 12 Araike, Iwasaki-cho, Nisshin 470-0195, Japan [email protected] 2 Faculty of Home Economics, Kobe University, 2-1 Aoyama Higashisuma, Suma-ku, Kobe 654-8585, Japan [email protected] 3 Dept. of Information Culture, Nagoya Bunri University, 365 Maeda Inazawa-cho, Inazawa, Aichi 492-8520, Japan [email protected] 4 Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya, Aichi, 466-8555, Japan [email protected] 5 Information Technology Center, Nagoya University, Furo-cho, Chikusa-Ku, Nagoya 464-8601, Japan [email protected]
Abstract. In the broadband digital network, textual information can be sent as graphic images without being coded. By using graphic text, characters or symbols in unsupported fonts can be displayed. Graphical e-mail systems in mobile phones, designed for sending digital photographs, are useful for sending graphic text. We researched the visibility of graphic text on the liquid crystal displays of mobile phones, compared with that of the phones' built-in fonts, by measuring reading time and visual distance. We also recorded the number of errors, and subjects evaluated the visibility. Graphic text prepared in the JPEG format had nearly the same visibility as the original font. However, it must be noted that visibility deteriorates as the character size becomes smaller and as the user becomes older. We also discuss the possibility of a multilingual disaster information system on mobile phones as an application of graphic text. Graphic text enables easy display of multilingual information on ordinary mobile phones that do not support multilingual characters. Keywords: graphical character, digital photo e-mail, size of character, disaster information, multilingual text.
1 Introduction
The increasing use of broadband in the information network is making it possible to send and receive large amounts of data, and the exchange of graphic data on mobile phones and other mobile information devices is becoming common. With this
technology textual information can be sent and received as graphic images, without being coded for communication. The usefulness of optical character recognition systems that convert graphic textual information into coded text is clear, but there are also various possible applications for “graphic text” in which textual information is converted to graphic image data. The advantage of graphic text is that it is not limited by the display functions of the terminal. Thus, text can be displayed freely in terms of language, letter or character shape, and arrangement, controlled by the display layout at the information source. In addition, symbols and pictograms other than letters can be used, and combinations of pictures and letters such as in maps or comics can be easily realized. We report here the results of an assessment of the visibility [1-3] of graphic text on liquid crystal displays, with the aim of realizing a multilingual information provision system using graphic text for mobile phones [4]. We also discuss possible applications of graphic text.
Fig. 1. Examples of graphic text used in the visibility assessment experiment. Short Japanese sentences are presented in the JPEG format. The widths of these images are the same as the resolutions of the mobile phones used in the experiments. Large (L): 8 characters/line. Medium (M): 10 characters/line. Small (S): 12 characters/line. Very Small (VS): 20 characters/line.
2 Visibility Experiment
2.1 Aim The aim of this experiment was to investigate the visibility of graphic text (Fig. 1) on mobile phone liquid crystal displays (LCDs), in order to clarify the possibilities and problems for mobile phones as information supply systems using graphic text. We performed the following 3 experiments. Experiment 1: Comparison of fonts and graphic text. Experiment 2: Influence of text size on visibility. Experiment 3: Differences in visibility depending on subject age. 2.2 Methods Experiment 1: The subjects were 28 Japanese males and females aged 20-41 years (24.8 ± 6.9 years). The mobile phone used was a Sharp SH53 (240 x 320 dot, 2.4 inch
CG Silicon LCD). Four samples were used: (1) font (character size: small), (2) graphic text (small), (3) font (very small), and (4) graphic text (very small). Examples of character size are shown in Fig. 1. Experiment 2: The subjects were 24 Japanese males and females aged 21-28 years (23.2 ± 1.8 years). The mobile phone used was a Sanyo SA51 (2.1 inch, 132 dot width, TFT color LCD). Three graphic text samples were used: (1) large, (2) medium, and (3) small (Fig. 1). Experiment 3: The subjects were 88 Japanese males and females aged 20-79 years (46.3 ± 17.9 years). There were 41 people in the young age group of 20-39 years (29.3 ± 6.2 years), 14 in the middle-aged group of 40-59 years (47.6 ± 6.8 years), and 33 in the elderly group of 60-79 years (66.4 ± 4.7 years). The mobile phone used was a Sanyo SA51 (same as Experiment 2). The three types of graphic text used were (1) large, (2) medium, and (3) small (Fig. 1).
Fig. 2. Items measured in evaluating visibility. Subjects sat on a chair and read aloud the sentences displayed on the LCD of the mobile phone. Visual distance, reading time, and the number of errors were measured. A subjective evaluation was recorded after each reading.
The graphic text used in the experiments was prepared in the MS-Mincho typeface in the Joint Photographic Experts Group (JPEG) format. The character sizes were large (8 characters/line), medium (10 characters/line), small (12 characters/line), and very small (20 characters/line) (Fig. 1). The graphic text was prepared to match the resolution of the liquid crystal display of the mobile phone used in each experiment, so that it filled a single screen without scrolling. The text samples were prepared in sufficient variation so that no subject read the same text more than once, and the samples were displayed in rotation so as to avoid influence from the order and content of the text. Subjects used eyeglasses as needed according to their normal habits, and read aloud the text shown on the liquid crystal display of the mobile phone. An examiner recorded the time to read the text, the number of errors, and the visual distance (Fig. 2). After reading the text, the subjects graded the ease of reading (subjective evaluation) on a 5-point scale (5: very easy to read; 1: very difficult to read).
2.3 Results
The measured parameters shown in Fig. 2, reading speed (characters/sec) (= number of characters / reading time) and error rate (%) (= 100 × error count / number of characters), were statistically analyzed (a small computational sketch of these measures, under our own assumptions, is given at the end of Sect. 2.4). The results of Experiments 1, 2, and 3 are shown in Figs. 3, 4, and 5, respectively. Experiment 1 (see Fig. 3): The results of a two-way ANOVA with character size (small or very small) and type of text data (font or graphic text) as factors showed that only character size was significant as a main effect on subjective evaluation (p<0.0001) and visual distance (p=0.0121); neither the type of data nor the interaction had a significant effect. The p-values shown in Fig. 3 are the results of t-tests conducted on the mean values of each subject. In these tests, subjective evaluation, reading speed, and visual distance were all significantly lower for the very small size than for the small size, although the type of text data had no significant effect. Almost no errors were detected in this experiment. Experiment 2 (see Fig. 4): The effect of character size (large, medium, small) was examined; with smaller character size, the subjective evaluation of reading ease decreased, reading speed decreased, and visual distance became shorter. Significant differences by one-way ANOVA were seen in the subjective evaluation, as shown in Fig. 4. Most subjects could read perfectly, so the error rate was <0.005% on average (a few subjects made at most 2 mistakes while reading 45 characters), and the rate showed no significant difference between character sizes. Experiment 3 (see Fig. 5): The results of a two-way ANOVA with character size (large, medium, small) and age group (young, middle-aged, elderly) as factors showed that both were significant as main effects on reading speed (both p<0.0001); the interaction had no significant effect. Reading speed decreased and the error rate increased as the subjects became older. Five of the 33 people in the elderly group (15.2%) could not finish reading the small-size graphic characters because of poor vision; these five were excluded from the elderly (60-79) small-size data for reading speed and error rate. No difference was seen between the age groups in the subjective evaluation. Although visual distance became shorter as character size became smaller in the young and middle-aged groups, visual distance increased with subject age, and among the elderly subjects no significant difference was seen with character size. The p-values shown in Fig. 5 are the significant differences among character sizes in the two-way ANOVA and the differences among the age groups by one-way ANOVA for each character size.
2.4 Discussion
The decrease in the parameters of subjective evaluation, reading speed, and visual distance in young people with normal uncorrected or corrected vision was taken to be a result of decreased visibility; poor visibility should also cause an increase in the error count. It was seen from the results of Experiment 1, comparing the visibility of font and graphic text, that graphic text prepared in the JPEG format had nearly the same visibility as the font included in the mobile phone. Moreover, it was seen from
Fig. 3. Experiment 1, comparison of visibility of font and graphic text: (a) subjective evaluation, (b) reading speed, (c) visual distance
Fig. 4. Experiment 2, visibility and font size: (a) subjective evaluation, (b) reading speed, (c) visual distance
Fig. 5. Experiment 3, visibility of graphic text by age. Young: 20-39, Middle-aged: 40-59, Elderly: 60-79 years old.
Fig. 6. Multilingual graphic text (Chinese, Korean, English)
Experiment 2 that visibility decreases with smaller character size, and from Experiment 3 that visibility was lower for the elderly, especially when character size was small.
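The following Python sketch shows how the reading-speed and error-rate measures defined in Sect. 2.3 can be computed from raw trial records and how a one-way ANOVA, as used in Experiment 2, can be run on the speeds across character sizes; scipy is our own library choice and the trial values are invented placeholders, not data from the study.

from scipy.stats import f_oneway

def reading_speed(n_chars, reading_time_s):
    return n_chars / reading_time_s            # characters per second

def error_rate(error_count, n_chars):
    return 100.0 * error_count / n_chars       # percent

# Hypothetical trials: character size -> list of (characters read, seconds, errors).
trials = {
    "large":  [(45, 9.0, 0), (45, 9.5, 0), (45, 9.2, 0)],
    "medium": [(45, 10.1, 0), (45, 10.8, 1), (45, 10.4, 0)],
    "small":  [(45, 11.9, 1), (45, 12.6, 2), (45, 12.1, 1)],
}
speeds = {size: [reading_speed(c, t) for c, t, _ in rows] for size, rows in trials.items()}
print(f_oneway(speeds["large"], speeds["medium"], speeds["small"]))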
3 Possibility of Information Provision System Using Graphic Text The results for subjective evaluation of visibility in the previous section indicate that, in phone models with comparatively high resolution and brightness, the visibility of graphic text on mobile phone LCDs is nearly equivalent to that of fonts. Even with lossy compression of the JPEG format, it was found that graphic text compressed to the size that can be transmitted on today’s mobile phones has sufficient picture quality in terms of visibility for reception on a mobile terminal. However, caution is needed since visibility decreases with smaller character sizes, especially for older people. Providing information as graphic text allows the text to be displayed in fonts or display formats not handled by the mobile terminal. For example, mobile phones in most countries display text in the letters or characters of the language of that country plus alphanumeric characters. With graphic text, however, these phones can display multiple languages. Nearly all mobile phones in Japan are equipped to handle only alphanumeric characters and the Japanese language, but Chinese, Korean, or other languages sent as graphic text (Fig. 6) can be received by anyone in Japan regardless of the model of phone they are using.
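A minimal way to produce such graphic text is to rasterize a sentence into a JPEG sized to the handset's screen width. The sketch below uses the Pillow imaging library purely for illustration; the library choice, font file, and 240-pixel width are our assumptions and not details from the paper.

from PIL import Image, ImageDraw, ImageFont

WIDTH = 240                              # assumed display width in pixels
FONT_PATH = "NotoSansCJK-Regular.ttc"    # hypothetical font file covering CJK and Latin text

def render_graphic_text(text, path, font_size=16):
    # Render sender-controlled lines of text as a JPEG image any phone can display.
    font = ImageFont.truetype(FONT_PATH, font_size)
    lines = text.split("\n")             # line breaks are chosen by the sender
    height = (font_size + 4) * len(lines) + 8
    img = Image.new("RGB", (WIDTH, height), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((4, 4 + i * (font_size + 4)), line, fill="black", font=font)
    img.save(path, "JPEG")

render_graphic_text("避難してください。\nPlease evacuate now.", "notice.jpg")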
Fig. 7. Template system for translating disaster information into multiple languages. This system is a Web application. Buttons for 38 categories of disaster information and 2 listing functions are shown in the upper area of the page. Template sentences in the selected category are shown in the lower area.
Fig. 8. Examples of multilingual sentences translated by the template disaster information system (shown in Fig. 7). By filling in the blanks of the Japanese template, English, Korean, Chinese, and Portuguese text becomes available.
Fig. 9. Provision of multilingual information through graphic e-mail
The authors' group has already developed a multilingual template translation system [5] to provide information in times of disaster (Fig. 7). This system is a Web application. More than 500 sentences and their translated templates are recorded in the current version of the system. These templates are categorized into 6 main categories and 38 detailed categories concerning disaster information, especially earthquake information, as shown in Fig. 7. Buttons for these 38 categories, together with buttons to list all Japanese templates and to select information for foreigners, are
shown in the upper part of the page (Fig. 7). With Japanese as the template, information such as places or times can be input as necessary and instantly output as sentences translated into English, Korean, Chinese, Portuguese, and other languages (a minimal sketch of this template filling is given after Fig. 10). Each template sentence is related to disaster and disaster prevention, and is short enough to be sent to mobile phones as graphic text (Fig. 8). A possible flow of a mobile information provision system using graphic text is shown in Fig. 9. Moreover, in languages such as English and Korean, in which there is a space between words, appropriate line breaks have a great effect on readability; such languages become more difficult to read if line breaks occur in the middle of words. With graphic text, however, appropriate line breaks can be placed in advance by the sender in the graphic text layout, so that the displayed text is easy to read even if the receiver's phone does not have such a line break function, as is the case with ordinary models of mobile phones used in Japan now (Fig. 10). This is another example of the application of graphic text. A further advantage of graphic text is that pictograms and special symbols other than letters can be used, making it easy to combine maps and words, for example. Handwritten text can also be sent as is.
Fig. 10. Appropriate line breaks with graphic text: (a) graphic text with line breaks placed by the sender; (b) font display with breaks falling in the middle of words (sample sentence: "Money can be withdrawn at bank ABC if you present personal identification such as your ...")
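As referenced above, the sketch below illustrates, under our own assumptions, how such fill-in-the-blank templates could emit the same short sentence in several languages; the template wording and placeholder names are invented for the example and are not taken from the actual system.

# Illustrative fill-in-the-blank templates; the real system holds over 500 reviewed sentences.
TEMPLATES = {
    "ja": "{place}で{time}に給水が行われます。",
    "en": "Water will be distributed at {place} at {time}.",
    "pt": "Haverá distribuição de água em {place} às {time}.",
}

def fill(values):
    # Produce every language version of one template from the same filled-in values.
    return {lang: text.format(**values) for lang, text in TEMPLATES.items()}

for lang, sentence in fill({"place": "ABC Elementary School", "time": "15:00"}).items():
    print(lang, sentence)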
4 Conclusion and Future Issues With the aim of developing a mobile information provision system using graphic text, we experimentally assessed the visibility of graphic text on mobile phone displays, and found that graphic text is useful in displaying written information. Graphic text enables easy display of information in multiple languages, and has the advantage that text size and layout can be controlled by the sender. It should be useful in providing disaster prevention and emergency information to foreign residents of Japan, the aged, or persons with certain disabilities who may be at a disadvantage in terms of obtaining information. A multilingual information provision system using template translation will make it possible to provide information to people from other countries. However, caution is needed in displays intended for the elderly [1,3], as many have difficulty reading small characters [2,6]. To realize universal design [6-8], it may be necessary to have a text scaling function on the user’s side. Future issues will be compatibility of the display when providing information using combinations of graphic text and fonts, and realization and validation of bidirectional conversion between graphic text and fonts using character recognition systems that convert between graphic and font-based text.
References
1. Omori, M., Watanabe, T., Takai, J., Takada, H., Miyao, M.: Visibility and characteristics of the mobile phones for elderly people. Behaviour & Information Technology 21, 313–316 (2002)
2. Omori, M., Miyao, M., et al.: Visibility of mobile phones - Display characteristics and visual function. In: IEA 2003, Seoul, Korea (2003)
3. Fujikake, K., Mukai, M., Kansaku, H., Miyoshi, M., Omori, M., Miyao, M.: Readability for PDAs and LCD monitors among elderly people. Jpn. J. Ergonomics 40, 218–227 (2004) (in Japanese)
4. Hasegawa, S., Irie, Y., Omori, M., Matsunuma, S., Miyao, M.: Visibility of graphical character e-mail in multiple languages on mobile phones: ESK and JES joint symposium 2004. Jpn. J. Ergonomics 40, 50–53 (2004)
5. Sato, K., Okamoto, K., Miyao, M.: Multilingual and ubiquitous information system for disasters: ESK and JES joint symposium 2004. Jpn. J. Ergonomics 40, 88–91 (2004)
6. Miyao, M., Hacisalihzade, S.S., Allen, J.S., Stark, L.W.: Effect of VDT resolution on visual fatigue and readability: an eye movement approach. Ergonomics 32, 603–614 (1989)
7. ISO 9241-3: Ergonomic requirements for office work with visual display terminals (VDTs) (1992)
8. JIS S 0032: Guidelines for the elderly and people with disabilities - Visual signs and displays - Estimation of minimum legible size for Japanese single character (2003)
The Importance of Information in the Process of Acquisition and Usage of a Medicine for Patient Safety: A Study of the Brazilian Context Patricia Lopes Fujita and Carla Galvão Spinillo Postgraduate Program in Design, Federal University of Paraná, Rua General Carneiro, 460, 8o andar/ Curitiba / PR 80060-150, Brazil [email protected], [email protected]
Abstract. Considering the importance of oral and visual information in the acquisition and use of medication, this paper discusses the outcomes of an analytical study on the cognitive load involved in performing the task "take a medicine" during a health treatment process (acquisition, usage, and disposal of a medicine) in Brazil. The task was described in a flowchart according to the steps/actions, decision points, and expected outcomes for successful performance. This allowed the informational structure of the task during the process of taking a medicine to be identified. The cognitive activities at each moment of the process were then identified. The outcomes indicate that the highest cognitive load occurs in the acquisition of information by patients during the health treatment process, which would affect task performance. The findings confirm the relevance of information in the process of acquisition and usage of medicines by Brazilian patients. Keywords: Medicines, information, task analysis.
1 Introduction
Information transmitted through visual instructions plays an important role in the use of medicines, and good information is part of efficient communication between healthcare providers and patients [1]. Therefore, success in performing a task such as 'taking a medicine' depends on reading and understanding the information in medicine inserts (visual instructions). The medicine insert is a printed instruction document, essential to the use of any kind of medication, providing specific information such as the indication for medical treatment, pharmaceutical composition, warnings, and instructions for use [2]. A study conducted in England [3] suggests that patients have a special interest in medicine information concerning side effects, what the medicine is for, and how to take it. However, such information may not be properly available to readers, mainly regarding its graphic presentation, which plays an important role in medicine use and in the comprehension of medicine inserts by patients. Thus, the graphic presentation of information is a crucial aspect of a medicine's successful communication to patients [1].
Deficiencies in both the information content and its graphic presentation in medicine inserts may lead to erroneous medicine usage. This may cause additional (or even serious/lethal) health problems to patients, such as errors in the dosage and frequency of taking a medicine [4]. Healthcare providers (e.g., doctors, pharmacists, nurses) also play an important role in the process of acquisition of a medicine by a patient, possibly providing oral and written information and contributing to the patients’/readers’ previous knowledge acquisition [5]. In a study conducted in Brazil [6], the medicine insert was considered by patients the most relevant source of information after the medical prescription, and both the oral and the written information acquired for the use of a medicine are chiefly important for patients’ healthcare education. The authors also point out that the oral information provided by doctors is insufficient, considering that during a consultation the patient may give priority to other kinds of information (e.g., diagnosis, disease) which come prior to the medicine prescription. Furthermore, the patient may not memorize or remember all the information given by the doctor after the consultation. Therefore, visual information can supplement and reinforce patients’ knowledge about a medicine from the cognitive and memory perspective. 1.1 The Brazilian Medicine Context of Use In Brazil, in short, there are three general types of medicine: (a) Over the counter medicines (medicines that can be sold to a customer/patient without a medical prescription, such as analgesics or aspirin); (b) Medical prescription medicines (which can only be sold with a medical prescription); and (c) Controlled medicines (generally distributed for free by the Brazilian government health units for patients under treatment of chronic diseases). These types of medicine are presented in Table 1. Table 1. Types of medicine in Brazil
(a) Over the counter medicines (OTC) (can be sold without medical prescription)
Fig. a. OTC blister
Fig. b. OTC blister
Table 1. (continued)
(b) Medical prescription medicines (MPM) (can only be sold with medical prescription)
Fig. c. MPM package
Fig. d. MPM package
Fig. e. MPM package
(c) Controlled medicines (CM) Generally distributed (for free) by the Brazilian government health units for patients under treatment (e.g. HIV, tuberculosis)
Fig. f. CM package
Fig. g. CM package
Fig. h. CM package
In reference to Table 1, the (a) Over the counter medicines are usually packed only in the form of blisters (as presented in Figures a and b). The (b) Medical prescription and (c) Controlled medicines are packed in paper boxes as the external package, with an inner container inside that depends on the medicine’s pharmaceutical form (e.g. pills, solution). Despite the Brazilian regulation [10] stating that (b) Medical prescription medicines can only be acquired by customers/patients presenting a document containing the medicine prescription written and signed by a doctor, many medicines of this kind are frequently sold in pharmacies without the presentation of
any medical prescription. This fact may lead to self-medication by patients and consequently put their health at risk, depending on the medicine. In this case, however, the medicine insert is the only official source of information for the patient. This emphasizes the importance of the medicine insert as visual information in the process of acquisition and usage of a medicine, even though oral medical information is also of great relevance to the patient’s education. Taking into account the aspects mentioned above (1 Introduction and 1.1 The Brazilian medicine context of use) and considering the importance of oral and visual information in the acquisition and use of medication, this paper presents an analysis of the cognitive activity demanded to perform the task “take the medicine” according to the Brazilian context of use for a medical prescription medicine. The analysis was based on the diagram structured by Van der Waarde [1,7] (2004, 2006) describing a patient’s experience during the acquisition and use of a medicine, adapted here to the Brazilian context. From this diagram, a task decomposition structure was developed following Moraes and Mont’Alvão [8], and it was analyzed with the model developed by Militello and Hutton [9] (1998) to identify which steps demand a higher cognitive activity load.
2 Task Analysis of the Process of Acquisition and Usage of a Medicine by Patients in Brazil To provide appropriate and efficient information, it is important to consider the whole process of acquisition and use of a medicine by a patient [1,7]. This demands a deeper look at the situations in which information is more relevant/significant for patients. In order to analyze patients’ actions/experience in the context of use of prescribed medicines, Van der Waarde [1,7] (2004, 2006) structured a diagram describing the process of acquisition and use of a prescribed medicine, divided into five stages: (1) Health decision (problem); (2) Consult doctor; (3) Visit pharmacy; (4) Take the medicine; (5) Health decision (treatment results). Although this diagram focuses on the regulation, social context and cultural characteristics of the European Union, it could be adapted and applied to investigate the Brazilian patients’ context of use, taking into account the generalized character of the mentioned steps. Figure 1 presents the diagram adapted by the authors:
Fig. 1. Adapted diagram of the process of acquisition and use of a medicine (VAN DER WAARDE, 2004, p.85; 2006, p.41)
In reference to Figure 1, in the first step, (1) Health decision (problem), the patient initially needs to recognize adverse health symptoms in order to make the decision to consult a doctor. However, according to Van der Waarde [1,7] (2004, 2006), this decision may vary depending on the patient’s cultural, geographic, age and gender characteristics. In the second step, (2) Consult doctor (Figure 1), the patient must describe personal details and symptoms to the doctor. Based on this description the doctor will diagnose the health problem, ask for further examination if necessary, and prescribe a medicine(s) to start a health treatment. In this step the patient has to comprehend and memorize the oral information transmitted by the doctor. After the consultation the patient will perform the third step, (3) Visit the pharmacy (Figure 1), to acquire the prescribed medicine and eventually receive oral information from the pharmacist. It is important to notice that once more the patient has to comprehend and memorize oral information. The fourth step, (4) Take the medicine (Figure 1), initiates the process of using the medicine. For this step in particular, Van der Waarde (2004, 2006) developed a sub-diagram with four sub-steps concerning taking/using a medicine: (1) Open the package; (2) Consider the information; (3) Take the medicine; (4) Stop taking? (Health decision). This sub-diagram is presented and described in the next section (‘2.1 Decomposition of the task “Take the medicine” to analyze cognitive load’). In the fifth step, (5) Health decision (treatment results) (Figure 1), the patient has to decide whether to stop taking the medication depending on the results/feedback on his health symptoms. If more treatment is needed, the patient will need to consult the doctor and therefore perform the other steps again. 2.1 Decomposition of the Task “Take the Medicine” to Analyze Cognitive Load Considering the fourth step, (4) Take the medicine, and its sub-diagram [1,7], a task representation was developed using a task analysis decomposition model [8] in order to obtain a more detailed visualization of the process of using a medicine. This representation characterizes the task structure, defining the sequence of activities necessary to perform the task “Take the medicine” correctly. Figure 2 presents the visual representation of the task decomposition.
Fig. 2. Decomposition of the task “Take the medicine”
In accordance with Figure 2, it is observed that the second sub-step, (2) Consider the information, involves the highest number of outputs: four different sources of information (doctor; pharmacist; medicine insert; package) to be regarded by the patient. In the third sub-step, (3) Take the medicine, the three options (3a. pill; 3b. injection; 3c. suspension) refer to possible pharmaceutical forms of a prescribed medicine. In order to analyze and identify the cognitive activity load necessary to use a medicine correctly, based on the decomposition of the task “Take the medicine” (Figure 2), the model developed by Militello and Hutton [9] (1998) was used. This model allows describing and identifying the cognitive activity load during task performance in relation to six categories: knowledge, comprehension, application, analysis, synthesis and evaluation. The cognitive activities related to each of these categories are described in Table 2. Table 2. Model to analyze and identify cognitive activity load [9]
Categories and their cognitive activities:
- Knowledge: Acquire; Identify; Recognize; Define; Nominate
- Comprehension: Explain; Describe; Interpret; Illustrate
- Application: Apply; Relate; Use; Resolve; Construct
- Analysis: Analyze; Categorize; Compare; Discriminate
- Synthesis: Create; Project; Specify; Propose; Develop
- Evaluation: Validate; Argument; Judge; Recommend; Justify
As shown in Table 3, the categories presented and described above were applied to evaluate the cognitive activity load involved in the decomposition of the task “Take the medicine” (Figure 2).
Table 3. Cognitive activity during the task ‘Take the medicine’
- 1. Open the package: Knowledge (Identify, Recognize)
- 2. Consider the information: Knowledge (Identify, Acquire); Comprehension (Interpret); Application (Relate, Resolve); Analysis (Analyze, Compare)
- 3. Take the medicine: Application (Apply, Use)
- 4. Health decision: Analysis (Analyze, Compare); Evaluation (Judge)
In the first step, (1) Open the package, the initial interaction occurs between the patient and the medicine, with its visual information located on the external (e.g. paper box) and internal (e.g. blister or bottle) packages and on the medicine insert leaflet. At this moment the patient conducts cognitive activity in the category of Knowledge, identifying and recognizing the prescribed and acquired medicine (Table 3). In the second step, (2) Consider the information, the patient has to interpret and interrelate all the information gathered during the process of acquisition of the medicine. Thus, in order to process this information the patient needs to conduct the cognitive activities (presented in Table 3) of Knowledge (identify and acquire information); Comprehension (interpret the information read); Application (relate information and resolve, making decisions based on the interpreted information); and Analysis (analyze and compare the acquired information). In the third step, (3) Take the medicine, the patient has to be prepared to perform the task of using the medicine correctly. Depending on its pharmaceutical form (e.g. pill, injection, suspension), this implies different task performance actions (e.g. to swallow, to inject, to prepare a solution). Hence the cognitive activity considered here falls in the category of Application (apply and use the medicine), as presented in Table 3. In the fourth step, (4) Health decision, the patient has to evaluate the effects of the medicine on his/her health (e.g. symptoms, side effects, adverse reactions) and decide if/when he/she needs to stop taking the medicine or to consult the doctor. In this regard, the cognitive activities performed will be (as presented in Table 3) Analysis (analyze health/symptoms and compare) and Evaluation (judge).
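To make this mapping concrete, the task decomposition and its cognitive-activity assignments from Table 3 can be written down as a small data structure; counting the categories involved in each sub-step then reproduces the observation that (2) Consider the information carries the heaviest load. The following Python sketch is purely illustrative and is not part of the original study; only the step names and category assignments come from Table 3, and the load index (number of distinct categories per sub-step) is an assumption made here for illustration.

```python
# Illustrative sketch: the sub-steps of "Take the medicine" and the cognitive
# activities assigned to them in Table 3.
TASK_ANALYSIS = {
    "1. Open the package": {
        "Knowledge": ["identify", "recognize"],
    },
    "2. Consider the information": {
        "Knowledge": ["identify", "acquire"],
        "Comprehension": ["interpret"],
        "Application": ["relate", "resolve"],
        "Analysis": ["analyze", "compare"],
    },
    "3. Take the medicine": {
        "Application": ["apply", "use"],
    },
    "4. Health decision": {
        "Analysis": ["analyze", "compare"],
        "Evaluation": ["judge"],
    },
}

def cognitive_load(categories):
    """Crude load index: the number of distinct categories involved in a sub-step."""
    return len(categories)

for step, categories in TASK_ANALYSIS.items():
    print(step, "->", cognitive_load(categories), "categories:", sorted(categories))

heaviest = max(TASK_ANALYSIS, key=lambda s: cognitive_load(TASK_ANALYSIS[s]))
print("Highest cognitive load:", heaviest)  # expected: "2. Consider the information"
```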
3 Conclusion and Final Considerations Considering the importance of oral and visual information in the process of acquisition and use of a medicine, an analysis was developed of the patient’s
cognitive activity load necessary to perform the task “Take the medicine”, based on the Brazilian context of use. The second step described and analyzed, (2) Consider the information (part of the task “Take the medicine”), demanded the most cognitive activity to be performed, combining the categories of Knowledge, Comprehension, Application and Analysis. In this step, (2) Consider the information, the reading and comprehension of the information supplied in the medical prescription, medicine insert and package occur in interaction with the memorized oral information provided by the doctor (during the consultation) and the pharmacist (during the visit to the pharmacy to buy the medicine). According to this outcome it is possible to infer that the information processed in this step (Consider the information) is essential for the successful performance of the task “Take the medicine”. Therefore, problems in the step “Consider the information” may affect the success of the task performance and consequently compromise the treatment efficacy and the patient’s health. In this regard, the importance of oral and visual information in the process of acquisition and usage of medicines by Brazilian patients is confirmed.
References 1. Van der Waarde, K.: Visual information about medicines. Providing patients with relevant information. In: Spinillo, C.G., Coutinho, S.G. (eds.) Selected Readings of the Information Design International Conference 2003. Recife, SBDI | Sociedade Brasileira de Design da Informação, pp. 81–89 (2004) 2. Fujita, P.T.L., Spinillo, C.G.: Verbal Protocol as an information ergonomics tool to verify reading strategies in medicine inserts. In: AEI 2008: 2nd International Conference on Applied Ergonomics, Proceedings of the AHFE International Conference 2008, Las Vegas, Nevada, vol. 1. USA Publishing | AHFE International, Louisville (2008) 3. Dickinson, D., Raynor, T.D.K., Kennedy, J.G., Bonaccorso, S., Sturchio, J., et al.: What information do patients need about medicines? BMJ, Education and Debate 327, 861–864 (2003) 4. Fujita, P.T.L., Spinillo, C.G.: A apresentação gráfica de bula de medicamentos: um estudo sob a perspectiva da ergonomia informacional. In: Congresso Internacional de Ergonomia e Usabilidade ERGODESIGN 2006, pp. 1–6. UNESP, Bauru (2006) 5. Raynor, D.K.T.: Patient compliance – the pharmacist’s role. Int. Journal Pharm Practice 1, 126–135 (1992) 6. da Silva, T., Dal-Pizzol, F., Bello, C.M., et al.: Bulas de medicamentos e a informação adequada ao paciente. Revista Saúde Pública 34(2), 184–189 (2000) 7. Van der Waarde, K.: Visual information about medicines for patients. In: Frascara, J. (ed.) Designing Effective Communications: Creating contexts for clarity and meaning, pp. 38–50. Allworth Press, New York (2006) 8. Moraes, A., Mont’Alvão, C.: Ergonomia: conceitos e aplicações. 2AB, Rio de Janeiro (1998) 9. Militello, L.G., Hutton, R.J.B.: Applied cognitive task analysis (ACTA): a practitioner’s toolkit for understanding cognitive task demands. Ergonomics 41(11), 1628–1641 (1998) 10. Brasil. Ministério da Saúde. Resolução RDC Nº 140, de 29 de maio de 2003, http://e-legis.anvisa.gov.br/leisref/public/showAct.php?id=6311
A Proposal of Collection and Analysis System of Near Miss Incident in Nursing Duties Akihisa Furukawa and Yusaku Okada School of Science & Technology, Graduate School of Keio University 3-14-1 Hiyoshi Kohoku-KU Yokohama, Japan [email protected]
Abstract. In this study, we propose a collection and analysis system of near miss incidents (CASN) as a support tool for safety activities at medical institutions. CASN consists of supporting software and a reference list, which help risk managers who do not have specialized knowledge and skills to carry out factor analysis easily, and help nurses to write reports that are suitable for analysis. CASN accumulates the PSF data of near miss incidents and provides the tendency of PSFs at each post and for each analyst. As a result, diverse factors are found through the risk managers’ analyses, and we expect this diversity to promote the growth of the risk managers. Keywords: Human error, Accident prevention, Knowledge management.
1 Introduction Human errors, including accidents, incidents and near miss incidents, occur due to many factors. We call these factors PSF (Performance Shaping Factors). It is important to understand what kinds of PSF there are and to deal with them properly in order to prevent accidents. In addition, to prevent tragic accidents that may happen in the future, it is important to analyze in detail not only accidents that happened in the past but also near miss incidents, that is, cases that do not cause serious accidents but have potential danger. In nursing duties, the importance of collecting and analyzing near miss incidents is attracting attention because medical accidents are liable to result in miserable consequences. However, in fact many medical institutions cannot sufficiently collect and analyze near miss incidents because of the difficulty of the analysis. Therefore, the purpose of this paper is to propose a collection and analysis system of near miss incidents (CASN) as a supporting tool that enables staff at medical institutions to analyze near miss incidents easily.
2 Analysis of Near Miss Incidents in Nursing Duties At first, we surveyed how near miss incidents were collected and analyzed in medical institutions. In the collecting process, when a nurse experiences a near miss incident, he or she must record the contents of the near miss
incident (date, place, and situation) as a report, and must also consider and write down factors and improvement plans. However, it is natural that people do not want to confess their failures. Some nurses are reluctant to write a report, or write a report that reads like an apology, for example “I was careless, I will be careful from now on”. Therefore, it became clear that data suitable for analysis were not collected sufficiently. In the analyzing process, a few nurses discuss the near miss incident and consider its factors and countermeasures. However, there are many near miss incident cases in comparison with accident cases, so the amount of collected data is enormous. Moreover, the risk managers who mainly analyze near miss incidents are busy with nursing duties and do not have specialized knowledge and skills in factor analysis. It is therefore difficult for risk managers to analyze all the near miss incident data. We administered a questionnaire to about 50 risk managers at two hospitals. As a result, 65% of the risk managers answered that they could not analyze the data well enough to prevent accidents (Fig. 1).
Fig. 1. Survey about near miss incident in hospital (pie chart of the answers “neither collection nor analysis is sufficient”, “collection is sufficient, but analysis is not sufficient” and “both collection and analysis are sufficient”, with segments of 20%, 35% and 46%)
3 Collection and Analysis System of Near Miss Incident (CASN) Therefore, we proposed CASN which is consists of supporting software and reference list which helps risk managers who don’t have special knowledge and skill do factor analysis easily and nurses write good reports for analysis. 3.1 PSF Reference List PSF reference list is made up of 17 PSF which was thought about as factors of the errors in the nursing duties and incident cases which are expected to be caused by these PSF. We chose these PSF in reference to 98 PSF which were compiled by past examples and made the example sentence of the expected incident cases. These PSF are classified into four categories, Gestalt, Affordance, Preview, Workload. This concept called GAP-W is suggested by Yukimachi, Nagata. This list is added to the incident report and used for supporting when doing factor analysis. Nurses can write good reports by this list because they can imagine their
Gestalt
- Lack of skill: skill for the work is insufficient.
- Uncertain work: the work depends on personal judgment; there is no manual, or the manual is vague.
- Lack of knowledge: knowledge about the work is insufficient; basic medical knowledge is insufficient.
Affordance
- Difficulty of confirmation: the work is hard or impossible to confirm; the confirmation item is uncertain.
- Difficulty of distinction: the indication of the apparatus is poor; there are plural similar objects.
- Imperfection of apparatus: the apparatus is hard to use; how to use the operation appliance is incomprehensible.
- Lack of criterion for judgment: there is no criterion, or the criterion lacks concreteness.
Preview
- Difficulty of prediction: it is hard to get information for prediction; it is hard to predict latent danger.
- Inappropriate communication: instructions, communication and guidance from the boss are not appropriate; cooperation with the other post is not good.
- Lack of preparation: work begins with incomplete preparation; daily lack of preparation.
- Unplanned work: the work has interruptions or changes.
Workload
- Excess of information and direction: a lot of information is given from plural people or places.
- Complicated work: there is much incidental work such as preparation, records and transactions; the work needs simultaneous processing.
- Psychological burden: independent work; fear of failure is severe.
- Bad work environment: there is not enough space for the work; long duty hours; the illumination is dark.
- Physical burden: the work requires hard posture or movement; long working hours.
- Distract attention: attention cannot be distributed to the surroundings while concentrating.
Fig. 2. The list item
Risk managers can analyze cases easily through the check list made from this reference list. 3.2 Knowledge Discovery in Databases Risk managers at each section analyze the reports of their own section by using the reference list. In addition, an administrator, the general risk manager of the medical institution who has higher skill and knowledge than the other risk managers, analyzes each case again after the risk manager’s analysis. Through this check-list style of analysis, the reports can be arranged into a database. CASN discovers knowledge from the accumulated data in two ways: the characteristics of the section and the characteristics of the risk manager.
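As a rough illustration of how the checked reports might be arranged into a database, each analyzed report can be stored as one row of binary flags, one per PSF. The Python sketch below is an assumption about a possible storage format and is not a description of the actual CASN implementation; the PSF names follow Fig. 2, while the file name, report identifiers and sample values are hypothetical.

```python
import csv

# The 17 PSFs of the reference list (Fig. 2), grouped by the GAP-W categories.
PSF_LIST = [
    "Lack of skill", "Uncertain work", "Lack of knowledge",                   # Gestalt
    "Difficulty of confirmation", "Difficulty of distinction",
    "Imperfection of apparatus", "Lack of criterion for judgment",            # Affordance
    "Difficulty of prediction", "Inappropriate communication",
    "Lack of preparation", "Unplanned work",                                  # Preview
    "Excess of information and direction", "Complicated work",
    "Psychological burden", "Bad work environment",
    "Physical burden", "Distract attention",                                  # Workload
]

def report_row(report_id, section, analyst, checked_psfs):
    """Turn one analyzed near-miss report into a flat database row of 0/1 flags."""
    row = {"report_id": report_id, "section": section, "analyst": analyst}
    row.update({psf: int(psf in checked_psfs) for psf in PSF_LIST})
    return row

def append_rows(path, rows):
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["report_id", "section", "analyst"] + PSF_LIST)
        if f.tell() == 0:          # write the header only for a new file
            writer.writeheader()
        writer.writerows(rows)

# Hypothetical usage: one report analyzed by a ward's risk manager.
append_rows("casn_database.csv",
            [report_row("R-0001", "ICU", "risk_manager",
                        {"Complicated work", "Distract attention"})])
```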
Fig. 3. The concept of knowledge discovery
CASN compares the analysis of each section’s data with the analysis of the whole hospital’s data and extracts the PSFs that are checked statistically more often in that section. With this function, risk managers can learn the tendency of PSFs hidden in their sections and take more appropriate action to prevent accidents. In the comparison between the administrator and the risk managers of each section, CASN extracts the PSFs checked often by the administrator but not checked by the risk managers of the section. These PSFs are thought to be difficult for the risk manager of each section to notice. Informing the risk managers of these hidden PSFs promotes their growth, and it is expected that they will become able to carry out factor analysis from various viewpoints.
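A minimal sketch of the two knowledge-discovery steps is given below, assuming the accumulated data are simply counts of how often each PSF was checked. The choice of a one-sided Fisher exact test is my assumption for obtaining one-sided p-values of the kind reported in Table 1; the paper does not state which statistical test CASN actually uses, and all counts in the example are invented.

```python
from scipy.stats import fisher_exact

def section_characteristic_psf(section_counts, section_reports,
                               hospital_counts, hospital_reports, alpha=0.05):
    """PSFs checked significantly more often in one section than in the rest of the hospital."""
    flagged = {}
    for psf, k in section_counts.items():
        rest_k = hospital_counts.get(psf, 0) - k
        table = [[k, section_reports - k],
                 [rest_k, (hospital_reports - section_reports) - rest_k]]
        _, p = fisher_exact(table, alternative="greater")  # one-sided test
        if p < alpha:
            flagged[psf] = p
    return flagged

def overlooked_psf(admin_checked, manager_checked):
    """PSFs the general risk manager checked but the section's risk manager did not."""
    return set(admin_checked) - set(manager_checked)

# Hypothetical usage for one report re-analyzed by the administrator.
print(overlooked_psf({"Complicated work", "Psychological burden"},
                     {"Complicated work"}))
```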
Fig. 4. The extraction of the characteristic of the analyst (PSFs cross-classified by whether they were checked or not checked by the general risk manager and by the risk manager at each section)
4 Results of Examination We tested the effectiveness of the reference list. In the reports, the factors mentioned by nurses were mainly self-factors such as “did not notice, did not confirm, forgot, made a mistake” and patient factors, for example a patient who is liable to fall down; these accounted for 54% of all factors mentioned in the reports. Risk managers at each section then analyzed the reports of their own section by using the check list. In comparison with the factors mentioned in the reports, the number of extracted factors became 2.1 times larger, and various factors could be extracted by using the reference list (Fig. 5).
Fig. 5. Factors mentioned in reports and extracted by using reference list (bar chart comparing, for each factor, the number of times it was mentioned in the original reports with the number of times it was extracted using the reference list)
Table 1. One-sided p-value (significance probability) (rows: the 17 PSFs of the reference list; columns: sections A3, A4, A5, B3, B4, B5, C4, C5, ICU and OPE)
In this test, the writer of a report and the checker were not the same person, so the increase in factors does not necessarily mean that CASN helps users analyze a case more easily. However, considering that the present condition of near miss incident analysis is that even analyzing all the data is impossible, these results are valuable in that all the data could be analyzed and various factors other than self-factors were found. Table 1 shows a result example of the characteristics of each section. The figures in Table 1 are one-sided p-values (significance probabilities). A blank shows that the PSF was selected less often in that section than in the whole hospital. Nine PSFs with a significance probability of less than 5% were observed in total at four sections; at the 10% level, eighteen PSFs were observed at six sections. In addition, it became clear that some sections have many of these characteristic PSFs.
5 Conclusion In this study, we proposed a collection and analysis system of near miss incidents to help risk managers carry out factor analysis easily and to inform them of the tendencies of PSFs that each section and they themselves have. This system enabled the analysis of near miss incidents from a wider viewpoint. On the other hand, some staff commented that it still took time to analyze all the reports even with the check list. Therefore, we will improve CASN by changing the information that is presented to users, and we expect this to promote the growth of the risk managers.
References 1. Yukimachi, T., Nagata, M.: Reference List on Basis of GAP-W Concept and a Case Study. Human Factors in Japan 9(1), 46–62 (2004) 2. Yukimachi, T., Nagata, M.: Study of Performance Shaping Factors in Industrial Plant Operation and GAP-W Concept. Human Factors in Japan 9(1), 7–14 (2004) 3. Sagawa, N., Fujitsuka, R., Furukawa, A., Okada, Y.: A Study on Usability of a Reference List in Nursing Duties. In: Proceedings of the 38th Annual Meeting of Kanto-Branch, pp. 47–48. Japan Ergonomics Society (2008)
Effects of Information Displays for Hyperlipidemia Yang Gong1 and Jiajie Zhang2 1
University of Missouri, CE707 CS&E Bldg, DC006.00, Columbia, MO, 65212, USA [email protected] 2 University of Texas Health Science Center at Houston, 7000 Fannin St, Houston, TX, 77030, USA [email protected]
Abstract. How information is distributed between internal and external representations significantly affects information search performance. For a distributed information search task, data representation and cognitive distribution jointly affect the user search performance in terms of response time and accuracy. Guided by UFuRT (User, Function, Representation, Task), a human-centered framework, we propose a search model and task taxonomy. The model defines its application in the context of healthcare setting. The taxonomy clarifies the legitimate operations for each type of search task of relation data. We then developed experimental prototypes of hyperlipidemia data displays. Based on the displays, we tested the search tasks performance through two experiments. The experiments are of a within-subject design with a sample of 24 participants. The results in general support our hypotheses and validate the prediction of the model and task taxonomy. Keywords: Relational Data Display, Taxonomy, Hyperlipidemia.
1 Introduction Clinicians’ work settings are by nature information-overloaded, time-constrained environments [2]. Being able to search information efficiently to support decision-making processes is an important part of medical practice. As a tool, an Electronic Medical Record (EMR) empowers clinicians to view health record data, especially numeric data, in multiple ways [23]. Through effective representations of numeric data, we expect that clinicians’ daily tasks can be facilitated. In this project, our research interest lies in exploring effective representations for clinical numeric data, which are typically found in the lab result section of EMR systems. For typical lab results, internal and external information are two kinds of information that jointly affect a decision-making process. According to distributed cognition, internal and external information are essential in revealing the distribution pattern and the interaction between humans and artifacts [14]. In a typical information search environment, information may be distributed across humans, artifacts (tools), time and/or space. Under such a distributed environment, an information seeker needs
to process both internal information, which is stored in his brain, and external information, which is stored outside the seeker’s brain, across time or space. How external information is represented can significantly affect the decision process [22]. Specifically, when numeric data is represented by different levels of data scales(nominal, ordinal, interval, and/or ratio data), they affect user’s search performance. In this project, we investigate the relationship between data representations and search tasks at the scale levels. The fundamental principles of this research support the design and evaluation of human-centered information system in general. The search model and task taxonomy are helpful in determining task complexity and designing interfaces for relational data display in EMR. 1.1 Relational Information Displays and Distributed Cognition In an EMR, lab results are usually represented with Arabic numbers. The Arabic numbers could be further transformed into tabular, graphic formats, etc. to facilitate clinicians’ decision making process. Within a table of graph, the data and their relations jointly construct Relational Information Displays (RIDs), which are of those representing the relationship between dimensions (as shown in the format of rows or columns, X or Y axes and so on). There are two kinds of dimensions: represented dimensions and representing dimensions. The represented dimensions of a RID refer to the dimensions of an original domain in the world represented by various RIDs. The representing dimensions refer to the physical dimensions of RIDs representing the dimensions of the original domain in the world. These two dimensions have to be matched in scale so as to guarantee the efficient and accurate representation between the display and the world [24]. This is a basis of effective of data display in this project. A display of a RID represents the relations among different types of information. For example, a patient’s name along with his lab results constructs multiple relations. This relationship is now represented by a table which can be transformed into many other formats like line graphs, bar charts, pie charts, scatter plots, and matrices, etc. All these formats carry the same amount of information yet the representational effects might be different because the distributive patterns of the internal and external information are different. Distributed cognition theory puts emphasis on individuals and their environment and it views a system as a set of representations [12]. The theory models the interchange of information between these representations that can be either in the brain of the participants or represented on the artifacts in the environment. Zhang proposed a theory that external representations are not simply peripheral aids, but that they are an indispensable part of cognition [21]. According to Zhang’s theory, external information presented in an appropriate format can reduce the difficulty of a task by supporting recognition-based memory or perceptual judgments rather than recall. Proper external representations support internal memories and therefore enhance task performance. In many tasks, such as the search tasks in this study, people often use external artifacts to enhance internal memory and the artifacts are often created specifically for the purpose of remembering. For example, a patient chart is designed for reviewing a patient’s medical history. 
A typical task in an EMR system could be, for example, finding all the abnormal values on lipid panels of a patient over the past 12 months. In this task, the normal
range required, if not presented on screen, is the internal information. The observed values, which are presented in the patient’s record, are the external information. The observed values of the lipid panels, for example, can be presented in a format of table, graph and/or the mixture of the two and so on. The observed values can be also represented in different data scales. Examples are +/-(nominal), low, normal, high(ordinal), or absolute values in Arabic numbers(ratio) and so on. A variety of representations are applicable to the same set of data. In this research, we are interested in the efficiencies of some representations developed based upon the nature of tasks. 1.2 Scales and Information Presentation In a relational information display, dimensions are basic units used to describe the relationship of data. Scale types of data are finer granularity of dimensions and they provide the detail on how data are interrelated to each other. The notion of scales is important for the understanding of information search tasks [17]. Stevens proposed a distinction between four types of scales: nominal, ordinal, interval, and ratio [19]. The scale type of data determines which operations can be legitimately applied to data. The four scales each have different strengths in operations. Operations 1 to 4 listed below are accumulative, which means the bigger number of operations may also include those operations in smaller numbers. For example, interval scale, besides its determination of equality of differences, may allow all the operations that either nominal or ordinal scale allows. 1. Determination of equality of two instances on the scale (=) (nominal scale) 2. Determination of the rank-order (greater or less) of two instances on the scale (>,<) (ordinal scale) 3. Determination of equality of differences on the scale (+,-) (interval scale) 4. Determination of equality of ratios on the scale (/,*) (ratio) Applying the scale type of data to information searching tasks, each type of search tasks can be expanded as a set of operations which can be legitimately applied to data on this scale type. Examining the clinically meaningful tasks in different representation formats may help explain the reason why some tasks are more difficult in certain representation than those in seemingly isomorphic representations. 1.3 Theoretical Framework of Human-Centered Design Built upon the theory of distributed cognition and a set of analysis techniques, Zhang et al developed a method called UFuRT (User, Function, Representation, Task) for the effective design and evaluation of human-centered distributed information system [23, 6]. It provides systematic principles, guidelines, and procedures for designing human-centered information systems. Theoretically, an information search interface designed by the UFuRT process ensures that the design matches the information search task which leads to a better task performance. Applying the UFuRT process in relational search tasks, we believe that the internal information and external information required for each step can be used to predict the efficacy of search tasks.
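The cumulative hierarchy of legitimate operations described in Sect. 1.2 can be written down directly: each scale level inherits the operations of the levels below it. The snippet below is only an illustration of that hierarchy and is not code from the study.

```python
# Operations that each measurement scale legitimately supports (after Stevens, 1946).
SCALE_OPERATIONS = {
    "nominal":  {"=="},
    "ordinal":  {"==", "<", ">"},
    "interval": {"==", "<", ">", "+", "-"},
    "ratio":    {"==", "<", ">", "+", "-", "*", "/"},
}

def allows(scale, operation):
    """True if `operation` can be legitimately applied to data on `scale`."""
    return operation in SCALE_OPERATIONS[scale]

assert allows("ordinal", "<") and not allows("nominal", "<")
assert allows("ratio", "/") and not allows("interval", "/")
```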
For a type of clinical data, certain types of display are superior to other isomorphic representations in terms of search performance [5]. A variety of studies have identified that the users such as clinicians and medical researchers may use the same data set in different ways [17, 7, 15]. For example, in a clinician group, it may include but not limit to physicians, nurses, dieticians, pharmacists, and so on. In a clinical researcher group, it may include but not limit to epidemiologists and clinical statisticians. They may have common questions to get answered in the situations where they need to solve a certain problem or make a decision (treatment & diagnosis), or they need to check the background information of diseases (etiology) or they need to keep up with the latest information of or a given subject, so as to keep abreast of the professional development and continue their medical education. However, examining the same collection of medical records, they may use different approach to conduct their researches. Clinicians whose interests are typically about various aspects of a particular patient at individual level, therefore the with-in patient searches are their key tasks, whereas clinical statistician may view the patient records at collective level to reveal the trends or epidemic status of diseases. A patient record contains both free text description, which is often read in the reports or discharge summaries, and coded data, which exist typically in the lab results section. We believe they belong to the basic two types, and all other types, like x-ray reports, graphs can be converted into these two for information search purpose. In this study, we investigated examples of coded data drawn from the lipid panel lab results. We used these results to conduct our empirical studies on the effect of type of relational information display on the coded data search tasks. The figure 1 illustrates the model we followed in this study. Information search efficiency can be improved by several factors that characterize human information behaviors [12, 16], the focus of this research is on cognitive factors and their implications on human-computer interaction. We propose a search model with a special concern on the interactions between the user and computerbased information systems [18]. It is a subset of information behavior models and
Fig. 1. Human-Centered Distributed Search Model
information seeking models [1, 4]. The model also has a close connection with visual search models which are about the cognitive strategies that people use on specific displays. The models at this level could be of help in explaining the information search performance in terms of the patterns of information distributions [10, 11].
2 Search Task Taxonomy Applying the UFuRT process to the distributed information search interface design, we developed several information search prototypes based on the lab result module of an EMR system. The tasks are centered on searching a patient’s hyperlipidemia lab results. To better organize and represent the generalizability of hyperlipidemia data, we developed a search task taxonomy based upon functional analysis. In this taxonomy, search tasks are categorized into direct search and comparative search. A direct search is to find a specific value under specific conditions; it can be further divided into dimensional search and relational search. A comparative search compares values within one dimension (within-dimension search) or between two (between-dimension search). This taxonomy provides a basis for the experimental design. Examples for each type of search are included in the taxonomy shown in Table 1, and a small code sketch of the four task types follows the table.
Table 1. A taxonomy of information search tasks in relational information display
Direct Search (Dimensional Search). Definition: search for values on one dimension. Example 1: Are there any abnormal levels of cholesterol in the patient record? (search data within one dimension). Example 2: How many times was the patient’s diastolic pressure recorded as abnormal in his record? (an extended question based on Example 1; counting the abnormal values becomes part of the dimensional search).
Direct Search (Relational Search). Definition: search for values on multiple dimensions. Example 1: In which month of 2003 was the patient’s LDL level abnormal? (search data within two dimensions). Example 2: Was there any date during 2003 when both HDL and triglyceride were abnormal? (search data with three or more dimensions, if counting other lipid panel values).
Comparative Search (Within-dimension). Definition: compare values within one dimension. Example: Did the patient’s triglyceride level drop since the start of his diet treatment? (to detect trends of data distribution).
Comparative Search (Between-dimension). Definition: compare values between multiple dimensions. Example: Did the cholesterol ratio (total cholesterol/HDL) change over the past year? (calculation involved).
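The sketch below runs one query of each task type against a toy lipid-panel table. The records and the reference limit are invented for illustration; only the task categories come from Table 1.

```python
# Toy lipid-panel records for one hypothetical patient (values in mg/dL).
records = [
    {"date": "2003-01", "total_chol": 210, "hdl": 38, "ldl": 145, "triglyceride": 190},
    {"date": "2003-04", "total_chol": 198, "hdl": 42, "ldl": 128, "triglyceride": 160},
    {"date": "2003-07", "total_chol": 185, "hdl": 45, "ldl": 118, "triglyceride": 140},
]
LDL_HIGH = 130  # assumed reference limit

# Direct, dimensional search: is there any abnormal LDL value in the record?
dimensional = any(r["ldl"] >= LDL_HIGH for r in records)

# Direct, relational search: in which months was LDL abnormal?
relational = [r["date"] for r in records if r["ldl"] >= LDL_HIGH]

# Comparative, within-dimension search: did triglyceride drop over the period?
within_dimension = records[-1]["triglyceride"] < records[0]["triglyceride"]

# Comparative, between-dimension search: did the cholesterol ratio (total/HDL) change?
ratios = [r["total_chol"] / r["hdl"] for r in records]
between_dimension = round(max(ratios) - min(ratios), 2)

print(dimensional, relational, within_dimension, between_dimension)
```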
3 Research Hypotheses In this research, we studied the relationship between types of interfaces and types of search tasks in terms of effectiveness and efficiency of information search. The purpose of the experiments was to validate the model at the operation level and to determine the degree of difference in search performance under several representations (nominal, ordinal and ratio representations; table and graph representations) designed for clinically meaningful tasks (to localize, to compare and to calculate). We had the following two hypotheses. 1. Information search with more external information will yield better task performance than search with less external information. This is because the information in external representations can be picked up by perceptual processes, whereas the information in internal representations has to be retrieved from memory. 2. An exact match between the task and the data representation yields better performance than an over-representation. This is because higher levels of data scales legitimately allow more operations, which may or may not fit the search task.
4 Methods The experiment comprised two sub-experiments and was conducted as one session for each participant. Experiment I focused on nominal, ordinal and ratio displays paired with nominal questions. Experiment II focused on comparing user performance on all four types of questions represented by table and graph representations. Both experiments were of a within-subject design with two variables (question types and representing dimensions), and the dependent measures were response time and correctness of answers. To avoid the carry-over effects of a within-subject design, counterbalancing methods were used. The question sets and representing-dimension sets were ordered so as to prevent learning effects from previous trials: the same type of question (nominal, ordinal, ratio) was not asked consecutively, and the same type of representing dimensions was not used consecutively either. With the above considerations, and to achieve statistical power, the minimum number of participants required in the research was 12. 4.1 Subjects Twenty-four graduate students (12 male and 12 female; 12 with healthcare training as MD, RN, PT, etc., and 12 without healthcare training) were recruited from the University of Texas Health Science Center with approval of the Institutional Review Board (IRB) Committee for the Protection of Human Subjects at the same university. All participants agreed to participate and signed the consent forms. 4.2 Materials Participants were asked to view a set of relational information displays (Appendices I and II) comprising questions developed based on the search task taxonomy and data representations in tables with nominal, interval and ratio scales and in line graphs.
All contents were the records of the lipid panel results of hypothetical patients. Microsoft Visual Basic for Applications (VBA) codes were used to implement the interface design and capture the response time and answers to each question. 4.3 Procedure A training session was given to each participant prior to the formal experiments in order for the participant to memorize the normal value range of lipid panel and become familiar with the interfaces. Response time and answers were recorded automatically in spread sheets. After the training session, the participant was asked to perform coded data search tasks in a total of 12 trials. Within trial N-O-R(6 trials in total), there were 6 questions (search tasks) based on nominal, ordinal, ratio scale data display for each trial. The data were displayed in either table or graph format. Within trial T-G(6 trials in total), there were 8 questions (search tasks) based on nominal, ordinal, interval and ratio scale data display for each trial. The data were displayed in table format. Participants were requested to answer each and every question as quickly and accurately as possible.
5 Results Data were successfully collected and imported into SPSS statistical software. Statistical consideration was given to the power calculations of the ANOVA designs. In the coded data search experiment, the nominal data display (represented by abnormal/normal) had the shortest response time compared to the ordinal display (represented by lo, ok, hi) and the ratio display (represented by absolute numbers). The average response time for the tasks performed on the nominal display was 8.60 ± 1.25 seconds, on the ordinal display 9.90 ± 1.70 seconds, and on the ratio display 11.20 ± 2.11 seconds. As expected, there was a main effect of display type (nominal, ordinal, ratio), F(2,40) = 30.28, p < .001. This supports the prediction of hypothesis II. There was a trend that, as the data scale was upgraded to higher levels containing extra amounts of information, the nominal search tasks became harder (Fig. 2). The incorrect answers for the nominal, ordinal and ratio displays were also analyzed. There were no effects due to gender or training background, and no significant two-way interactions between display and gender were found.
Fig. 2. The response time of nominal questions on the nominal, ordinal and ratio displays (bar chart: 8.60, 9.90 and 11.20 seconds, respectively)
Fig. 3. The response time of the nominal, ordinal, interval and ratio questions searched on graph and text displays respectively (graph: 7.95, 8.70, 9.90 and 12.90 seconds; text: 8.20, 10.56, 12.53 and 14.56 seconds)
Comparing a set of search task performed on graph and text displays, the graph displays were superior to the text display when nominal, ordinal, interval, and ratio questions were asked (Fig.3). The average response time for the nominal, ordinal, interval and ratio tasks performed in graph display were 7.95±1.31, 8.70±2.05, 9.90±2.45 and 12.90±3.02 seconds respectively. The average response time for the nominal, ordinal, interval and ratio tasks performed in table display were 8.20±1.51, 10.56±2.11, 12.53±2.82, and 14.56±3.53 seconds respectively. As expected, there was a main effect of question type (nominal, ordinal, interval and ratio), F (1,21)=28.69, p<.001. These results support hypothesis I. As internal information requirement increases, the response time increases significantly. The incorrect answers were also analyzed. There were no effects due to gender and training background. No significant two-way interactions between display and gender were found.
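For a within-subject design of this kind, the main effect of display (or question) type on response time can be checked with a one-way repeated-measures ANOVA. The sketch below, which assumes the pandas and statsmodels packages are available and uses invented data in long format, only shows the shape of such an analysis; it is not the analysis script used in the study.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long-format data: one row per participant x display condition (mean RT in seconds).
df = pd.DataFrame({
    "participant": ["P01", "P01", "P01", "P02", "P02", "P02", "P03", "P03", "P03"],
    "display":     ["nominal", "ordinal", "ratio"] * 3,
    "rt":          [8.4, 9.7, 11.0, 8.8, 10.1, 11.4, 8.5, 9.8, 11.1],  # invented values
})

# One-way repeated-measures ANOVA: does display type affect response time?
result = AnovaRM(df, depvar="rt", subject="participant", within=["display"]).fit()
print(result)  # reports the F value and p value for the within-subject factor
```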
6 Discussion In healthcare, effective information search is critical to clinical outcomes, as the nature of the mission is life related and time critical. Clinicians often carry a high interactive workload which frequently involves information search activities. Artificial intelligence supported by reasoning methods would certainly facilitate the decision-making process; however, the human-computer interface is still considered a key factor that presents the results and plays an essential role in the entire process. The approach directed by UFuRT presents an efficient and feasible way of designing a human-centered data display interface for relational data. It has been confirmed that proper design of the user interface for information search can substantially increase the efficiency of human-computer interaction in terms of increased task performance, user satisfaction and knowledge retention, and decreased training time and error rate [4]. The information search model and search task taxonomy described in this paper integrate the considerations of human-centered design.
and the search task taxonomy theoretically contribute to the study of information search, distributed cognition, and the disciplines of human-centered computing. The practical contribution is an effective prediction and a better design of search interfaces with consideration of data scales and distributed nature of information.
References 1. Bystrom, K., Jarvelin, K.: Task Complexity Affects Information Seeking and Use. Information Processing and Management 2, 191–213 (1995) 2. Coiera, E.W., Jayasuriya, R.A., Hardy, J., Bannan, A., Thorpe, M.E.C.: Communication loads on clinical staff in the emergency department. In: MJA 2002, vol. 176, pp. 415–418 (2002) 3. Dervin, B.: On studying information seeking methodologically. The implications of connecting metatheory to method. Information Processing and Management 35(6), 727– 750 (1999) 4. Ellis, D., Cox, D., Hall, K.: A Comparison of the Information Seeking Patterns of Researchers in the Physical and Social Sciences. Journal of Documentation 49(4), 356–369 (1993) 5. Elting, L.S., Martin, C.G., Cantor, S.B., Rubenstein, E.B.: Influence of data display formats on physician investigators’ decisions to stop clinical trials: prospective trial with repeated measures. BMJ 318, 1527–1531 (1999) 6. Gong, Y., Zhang, J.(eds.): A human-centered design and evaluation framework for information search. In: Proceedings of AMIA 2005, Washington, DC (2005) 7. Gorman, P.N.: Excellent information is needed for excellent care, but so is good communication. Western journal of medicine 172(2000), 319–320 2003) 8. Gorman, P.N.: Information needs of physicians. Journal of the American Society for Information Science 1995 46(10), 729–736 (1995) 9. Hersh, W.R., Hickman, D.H.: How well do physicians use electronic information retrieval systems. Journal of the american medical association 280(15), 1347–1452 (1998) 10. Hornof, A.J.: Cognitive strategies for the visual search of hierarchical computer displays. Human-Computer Interaction 19(3), 183–223 (2004) 11. Hutchins, E., Klausen, T.: Distributed cognition in an airline cockpit. In: Engestrom, Y., Middleton, D. (eds.) Cognition and Communication at Work. Cambridge University Press, Cambridge (1996) 12. Hutchins, E.: Cognition in the wild. Massachusetts Institute of Technology, Massachusetts (1995) 13. Jacob, R.: User interface. In: Ralston, A., Hemmendinger, D., Reilly, E. (eds.) Encyclopedia of Computer Science, 4th edn. Macmillan, Basingstoke (2000) 14. Marchionini, G.: Interfaces for end-user information seeking. Journal of the American society for information science 42(2), 156–163 (1992) 15. Mendonca, E.A., Cimino, J.J., Johnson, S.B., Seol, Y.-H.: Accessing heterogeneous sources of evidence to answer clinical questions. Journal of biomedical informatics 34, 85– 91 (2001) 16. Norman, D.A.: Things That Make Us Smart: Defending Human Attributes in the Age of the Machine. Addison-Wesley Perseus, Massachusetts (1993) 17. Petersen, J., May, M.: Scale transformations and information presentation in supervisory control. International journal of human-computer studies (2006)
18. Saracevic, T.: Modeling interaction in information retrieval (IR): A review and proposal. In: Proceedings of the American Society for Information Science 1996, vol. 33, pp. 3–9 (1996) 19. Stevens, S.S.: On the theory of scales and measurement. Science 103(2684), 677–680 (1946) 20. Wilson, V.: The information needs of primary care physicians: digital reference service (2004), http://www.slis.ualberta.ca/cap04/virginia/ capping_exercise.htm (cited June 15, 2005) 21. Zhang, J.(ed.): The interaction of internal and external representations in a problem solving task. In: Proceedings of the Thirteenth Annual Conference of Cognitive Science Society. Erlbaum, NJ (1991) 22. Zhang, J., Norman, D.A.: A representational analysis of numeration systems. Cognition 57, 271–295 (1995) 23. Zhang, J., Patel, V.L., Johnson, K.A., Malin, J., Smith, J.W.: Designing human-centered distributed information systems. IEEE intelligent systems 17(5), 42–47 (2002) 24. Zhang, J.: A representational analysis of relational information displays. International Journal of Human-Computer Studies 45, 59–74 (1996)
Appendix: Data Display Interfaces
The Interfaces for Experiment I
The Interfaces for Experiment II
Clinical Usefulness of Human-Computer Interface for Training Targeted Facial Expression: Application to Patients with Cleft Lip and/or Palate Kyoko Ito1,2, Ai Takami2, Shumpei Hanibuchi2, Shogo Nishida2, Masakazu Yagi3, Setsuko Uematsu4, Naoko Sigenaga5, and Kenji Takada5 1
Center for the Study of Communication-Design, Osaka University 2 Graduate School of Engineering Science, Osaka University 3 The Center for Advanced Medical Engineering and Informatics, Osaka University 4 Osaka University Dental Hospital 5 Graduate School of Dentistry, Osaka University {ito,takami,hanibuchi,nishida}@nishilab.sys.es.osaka-u.ac.jp, {mgoat,suematsu,nao-sida,opam}@dent.osaka-u.ac.jp
Abstract. This study is toward introducing a treatment modality into clinical practice to manage facial expression. In particular, it focused on facial expression training, which has recently attracted attention. Facial expression training teaches the patient to move the facial muscles on his or her own initiative. The experiment was for patients with cleft lip and/or palate and the intent was to introduce facial expression training into clinical practice in order to determine the possibility of using the support interface. The study was planned and conducted in 2 phases. Phase 1 results demonstrated the possibility of introducing facial expression training using the method proposed in this study. In the future, by analyzing the results of the Phase 2 experiment, the usefulness of the support interface for facial expression training proposed in this study will be further examined. Keywords: Clinical Usefulness, Facial Expression Training, Cleft Lip and/or Palate, Interface, Medical Practice.
1 Introduction Cleft lip and/or palate are congenital facial anomalies. Together, these anomalies have a high incidence in Japan of 1 in 500 [1]. Patients are born with a cleft disorder of the lip and/or palate, and long-term treatment is required until adulthood [2]. There is a broad range of impairments, including those that are primarily functional or aesthetic. Common functional disorders are dystithia, dyslalia, and mastication disorder. Aesthetic problems include postoperative scarring, deformation due to lesions and disturbance in the development of the jaw. Treatment requires the coordinated expertise of a team of oral surgeons, plastic surgeons, orthodontists, and speech therapists [3]. Conventionally, treatment has focused on morphologic aspects, such as closure of the cleft or occlusal management. However,
treatment of facial scarring has dramatically improved and scarring is far less conspicuous, due to progress in surgical techniques. Recently, treatment for countenance and movement of soft tissues has been examined, which is a paradigm shift in treating cleft lip and/or palate. The expectation is that treatment will include countenance and flexibility of the soft tissues, that is, facial expression, unlike the past when treatment mainly focused on facial structure. However, it is difficult to introduce a new approach that does not directly lead to treatment because each specialist has a specific therapeutic role in clinical practice. Trotman et al. quantitatively analyzed the facial expressions of patients with cleft lip and/or palate. Their study is beginning the verification phase for evaluating the analysis of facial movement [4] and the configuration or movement of the lip [5], and has not yet progressed to the clinical phase. For psychological aspects, patients with cleft lip and/or palate are reported to feel dissatisfied with their facial appearance [6] and to behave more passively in personal relationships [7]. Though a direct relationship between facial expression and these psychological effects has not been demonstrated, when determining treatment it is important for the physician to consider patient distress concerning facial appearance [8], [9]. This study is toward introducing a treatment modality into clinical practice to manage facial expression. In particular, it focused on facial expression training, which has recently attracted attention. Facial expression training teaches the patient to move the facial muscles on his or her own initiative. With this training, the patient can learn how to present an expressive face. To appropriately examine the modality requiring psychological consideration of patients, carefully introducing the method is necessary. This study is considering a method to verify the possibility of introducing facial expression training into medical practice based on human interface studies. Facial expressions are important for social life and are crucial in human communication. Enhancing the patient’s use of facial expression will provide greater satisfaction with his or her appearance as well as assertiveness in personal relationships. Using facial expression training in the medical practice should also enhance the effectiveness of medical care. For this human interface study, one important issue is investigating the possibility of the interface providing a new modality for patients in the practice.
2 How to Introduce Facial Expression Training for Patients with Cleft Lip and/or Palate into Clinical Practice 2.1 Examination of How to Introduce Facial Expression Training into Clinical Practice In this section, methods of introducing facial expression training for patients with cleft lip and/or palate into clinical practice are studied. When first examining a new modality in clinical practice, it is important to provide detailed and specific content since verbal explanation is insufficient to provide an overview of the modality. Seeing if facial expression training is accepted by patients, and how the training would contribute to treatment, is also necessary. For patients with cleft lip and/or palate, it is important to consider their feelings of resistance in expressing their own facial expression.
Fig. 1. A sample screen for setting the target facial expression in iFace (rough setting)
A questionnaire, asking for patient opinion about using a specific tool in medical practice is used. It is important to present the specific tool and ask for patient opinion at the clinical practice. If there is resistance toward facial expression from patients with cleft lip and/or palate, it is examined if the new modality leading to facial expression training will be accepted, and if the new modality could be utilized in medical practice. For this study, selected is a software tool, iFace [10], a system supporting facial expression training. iFace sets target facial expressions and evaluates if the patient achieves the targets. Experiments to assess iFace have thus far been conducted with 12 dentists. Favorable results show the possibility of its usefulness in medical practice. The function that sets the target facial expressions used for training bases the expressions on a photograph of the face. Using the composed facial expression, iFace can display various facial expressions. Figures 1 and 2 show sample screens to set the target facial expression of iFace. The evaluation compares the actual facial expression of a user with the target facial expression, and then identifies the differences. With iFace, patients are expected to understand the concept of facial expression training and its objectives. With facial portraits of patients, the expectation is to confirm each patient’s reaction toward his or her expressions. The experiment is considered with iFace. Because patients with cleft lip and/or palate may resist using their own facial portraits, careful experimenting is necessary. Considering this resistance, the experiment is conducted step-by-step. In the first phase of the experiment, it is checked if iFace is acceptable to patients with cleft lip and/or palate. Then, considering the results of Phase 1, Phase 2 is conducted. In Phase 2, we studied the possible usefulness of facial expression training with iFace in medical practice.
Fig. 2. A sample screen for setting the target facial expression in iFace (detailed setting)
2.2 The Experiment First, in order to determine the possibility of training patients at the clinical practice, the experimental environment is discussed. In experiments, one difficulty is understanding the clinical context because data obtained from experiments in the laboratory are separate from clinical practice. In order to arrange the clinical context to study the subjects, who are the patients receiving medical treatment, the venue chosen was a hospital. Patients going to the hospital for treatment were chosen as subjects. To conduct the experiment within the clinical practice, a time convenient for patients is chosen. Second, items at 2 phases of the experiments are verified. In a Phase 1 experiment, to assess patient resistance to facial expression training, we confirmed their acceptance of iFace. As to its use and the results, available functions of iFace and applicable elements at medical practice were selected for feasible utilization in the medical field. For Phase 2, the effectiveness and usefulness of selected functions are verified from the Phase 1 experiment results. For the experiments, interview arrangements and the questionnaire are scheduled to avoid burdening patients. The experiments are summarized below: • Experiment objectives • Phase 1: Examination of the acceptability of facial expression training • Phase 2: Examination of the possibility of introducing facial expression training into medical practice • Institution: hospital • Subjects: patients with cleft lip and/or palate (at diagnosis) • Time: approximately 30 minutes • Method of assessment: observation, interview, questionnaire
3 Phase 1 Experiment: Assessing Possible Use of iFace 3.1 Experimental Setting At the Phase 1 experiment, by considering patient resistance to facial expression training, introducing facial expression training into the medical practice is considered. First, it should be noted that patient feelings about facial expression training with iFace may be linked to their perceptions or self-consciousness about their own facial appearances or expressions, not only toward iFace. Patient resistance might also be linked to iFace use of a composed facial image with the patient’s own facial portrait used for the experiment. At the same time, a skillful approach to the inquiry is required so patients can easily express their reactions. With this as background, a program of experiments is as follows for examining acceptability of facial expression training: • Participants actually operate iFace • Participants are interviewed • Questions are focused mainly on self-consciousness about the face and facial expression, impressions after using iFace, and willingness to try facial expression training • Interviewers are the patient’s attending physicians or similarly experienced physicians 3.2 Method of Experiment Procedures. After learning how to operate iFace, participants use iFace to set the target facial expression and use it to evaluate their achievements. Each participant selects the target facial expression to set. Participant reactions are gathered, using the questionnaire and interview. Items in questionnaire • Q1: Are you conscious of changing facial expression in public? (Yes/No) • Q2: What did you think about changing your own face on a computer? (Interesting/disinclined to/do not care/other) • Q3: Do you want to try facial expression training? (Yes /No) Interview • Supplemental explanation of items in questionnaire • Overall impressions The experiments above were approved by the Ethical Review Board at Osaka University School of Dentistry. 3.3 Results The experiment was conducted with 17 patients with cleft lip and/or palate (12 women and 5 men) who came to the hospital for diagnosis.
For Q1 (consciousness of one's own facial expression) in the questionnaire, 9 answered "yes" and 8 answered "no." For Q2 (changing one's face on a computer), 13 answered "interesting," 2 answered "disinclined to," and 2 answered "do not care." For Q3 (willingness to try facial expression training), 15 answered "yes" and 2 "no." From the interview, participant reactions to facial expression were gathered, together with explanations supplementing their responses to the questionnaire. The questionnaire and interview results demonstrated that participants' consciousness of their own facial expressions varied. Reactions regarding the facial areas participants are concerned about when with people in daily life, and whether they care about other people's facial expressions, were also divided, demonstrating individual differences. In the interview, some participants used the word "muscle," suggesting a sense of physical limitation in their facial expression. Neither of the two participants who answered "disinclined to" in Q2 (changing one's face on a computer) showed clear resistance to the experiment as a whole. Regarding iFace, several participants remarked favorably that iFace allowed them to observe their face and facial expressions that are usually not visible to them, or to watch how their facial expressions changed. Several participants stated that they were very interested in facial expression training. 3.4 Use of iFace in Medical Practice The experiment results demonstrated that participants could confirm changes of their facial expressions with the iFace function that sets the target facial expression. The results also indicated that presentation of changing facial expressions may lead to a positive opinion about using facial expression training as part of medical care. This training could help patients to visualize their facial expressions and increase their willingness to try it. Considering these results, the impact of the following factors is examined in detail in Phase 2 of the experiment, in terms of the presentation and operations provided to set the target facial expression on iFace:
• Presentation: individual facial portrait image
• Presentation: facial expressions
• Presentation: facial expression change
• Operation: facial expression search
The screen used to set the target facial expression on iFace is referred to as the "support interface for facial expression training." As regards the usefulness of the training in clinical practice, the following points are examined as the impact from the above factors. • Imaging the facial expression: To clearly imagine the facial expression that one can express • Introduction of facial expression training: To increase patient willingness to engage in facial expression training • Reaction expressed at medical care: Expression of hope and/or opinion on treatment for using facial expression training as part of medical care
4 Phase 2 Experiment: Verifying the Usefulness of the Support Interface for Facial Expression Training in Clinical Medicine
4.1 Experimental Setting
In the Phase 2 experiment, the results from using the support interface for facial expression training were verified. This process follows the Phase 1 experiment. Based on the examination in Section 3.4, the main issues are as follows: 1. Interest in facial expression change with a composed facial expression 2. Impact created by facial expression change with a composed facial expression These issues were addressed in a questionnaire and a supplemental interview before and after the experiment.
4.2 Method of Experiment
Procedures. Similar to the Phase 1 experiment.
Items in questionnaire. Before and after the experiment:
• Q1: Are there any aspects of your facial expression (shape/movement) that you are concerned about? (4-scale evaluation: not at all, a few, some, many)
• Q2: Do you want to consult your attending physician about your facial expression (shape/movement)? (7-scale evaluation: +3 yes to -3 no)
• Q3: Do you want help for anything that concerns you about your facial expression (shape/movement)? (7-scale evaluation: +3 yes to -3 no)
After the experiment:
• Q4: Was it interesting to select a facial expression with the help of a facial image? (7-scale evaluation: +3 yes to -3 no)
• Q5: Was it interesting to observe the various facial expressions? (7-scale evaluation: +3 yes to -3 no)
• Q6: Was it interesting to see how the facial expressions changed? (7-scale evaluation: +3 yes to -3 no)
• Q7: Have you learned to clearly visualize the kind of facial expressions you wish to make when happy, angry, sad, etc.? (7-scale evaluation: +3 achieved to -3 not achieved)
• Q8: Have you learned how to clearly picture the kind of facial expression you actually have? (7-scale evaluation: +3 achieved to -3 not achieved)
• Q9: Have you learned to clearly imagine the kind of impression your facial expression makes? (7-scale evaluation: +3 achieved to -3 not achieved)
• Q10: Has your interest in facial expression training increased? (7-scale evaluation: +3 increased to -3 not increased)
• Q11: Do you want to try facial expression training? (7-scale evaluation: +3 yes to -3 no)
Contents of interview
• Supplemental explanation of items in questionnaire
• Overall impressions
The experiments shown above followed procedures approved by the Ethical Review Board at the Osaka University School of Dentistry.
4.3 Results
The experiment was conducted on 10 patients with cleft lip and/or palate (9 women and 1 man) who presented at the hospital for diagnosis. The questionnaire results were summarized. Regarding the questions on medical care, no significant change was observed between before and after the experiment. As for interest in the support interface for facial expression training, the average scores of the answers to Q4-Q6 were 2.3 (0.8), 2.2 (0.9), and 2.3 (1.1), respectively; the figures in parentheses indicate standard deviations. For imaging of facial expressions, the average scores of the answers to Q7-Q9 were 1.6 (1.3), 1.5 (1.3), and 1.8 (1.0), respectively. For willingness to try facial expression training, the average scores of the answers to Q10 and Q11 were 2.3 (0.8) and 1.9 (1.2), respectively.
5 Conclusion For this study, using the interface to introduce a new modality into medical practice was considered. The experiment was for patients with cleft lip and/or palate and the intent was to introduce facial expression training into clinical practice in order to determine the possibility of using the support interface. The study was planned and conducted in 2 phases. Phase 1 results demonstrated the possibility of introducing facial expression training using the method proposed in this study. In the future, by analyzing the results of the Phase 2 experiment, the usefulness of the support interface for facial expression training proposed in this study will be further examined. The future challenge is additional research into the possibility of conducting facial expression training for patients via the support interface and the possibility of increased medical communication between physicians and patients.
References
1. Moriguchi, T. (ed.): Treatment of Cleft Lip and/or Palate, 2nd edn. Kokuseido, Tokyo (2007) (in Japanese)
2. Ogino, Y., Nishimura, Y., Tsuyoshi, T. (eds.): Treatment of Cleft Lip and/or Palate – Clinical Practice and Surgery. Kokuseido, Tokyo (2001) (in Japanese)
3. Takado, T. (ed.): Management of Cleft Lip and/or Palate. Kanehara & Co., Tokyo (2005) (in Japanese)
4. Trotman, C.A., Faraway, J.J., Phillips, C.: Visual and Statistical Modeling of Facial Movement in Patients with Cleft Lip and Palate. Cleft Palate Craniofac. J. 42(3), 245–254 (2005)
5. Ritter, K., Trotman, C.A., Phillips, C.: Validity of Subjective Evaluations for the Assessment of Lip Scarring and Impairment. Cleft Palate Craniofac. J. 39(6), 587–596 (2002)
6. Hunt, O., Burden, D., Hepper, P., Johnston, C.: The Psychosocial Effects of Cleft Lip and Palate: A Systematic Review. European J. of Orthodontics 27, 274–285 (2005)
7. Hirose, T.: A Literature Review of Psychosocial Problems of Children with Cleft Lip and/or Palate. J. Jpn. Cleft Palate Assoc. 24, 348–357 (1999) (in Japanese)
8. Kawai, M., Natsume, N.: To Understand Cleft Lip and/or Palate. Ishiyaku Pub., Tokyo (1989) (in Japanese)
9. Inudou, F.: Facening. Seishun Publishing Co., Ltd., Tokyo (1997) (in Japanese)
10. Ito, K., Kurose, H., Takami, A., Shimizu, R., Nishida, S.: Proposal of a Facial Expression Training System toward Target Expression. IEICE Technical Report HCS2007-4, pp. 19–24 (2007) (in Japanese)
The Evaluation of Pharmaceutical Package Designs for the Elderly People Akira Izumiya1, Michiko Ohkura2, and Fumito Tsuchiya3 1
Graduate School of Engineering Shibaura Institute of Technology 2 Shibaura Institute of Technology 3 Dental Hospital, Faculty of Dentistry, Tokyo Medical and Dental University
Abstract. In recent years, many medical accidents have been caused by the confusing designs of pharmaceutical packages and displays. For osteoporosis treatment, a common disorder of elderly females, improving display visibility and operability is especially important. However, since the osteoporosis drugs to be taken once a week are sold in various package designs, differences of viewability, understandability, and operability must be clarified from the differences of package design. We performed experiments as follows. Keywords: drug labels, drug of osteoporosis, medical accident, package design.
1 Introduction In recent years, many medical accidents have been caused by the confusing design of pharmaceutical packages and displays [1]. For osteoporosis treatment, a common disorder of elderly females, improving display visibility and operability is especially important. Also, the osteoporosis drug used in this research is bisphosphonate in weekly doses. Improving this drug’s display visibility is more important than other displays of PTP-sheets for the following reasons. • Long-term continued use is necessary. • The drug has specific precautions due to bisphosphonate. • If these precautions aren’t heeded, such side effects as digestive disorders are possible. • Taking a drug once a week is relatively uncommon in Japan. Thus, we have researched the display of osteoporosis drugs that are taken weekly [2]. However, since the osteoporosis drugs taken weekly are sold in various package designs, the differences of viewability, understandability, and operability must be clarified from the differences of package design. We performed the following experiments.
2 Experimental Overview Four types of osteoporosis drugs are sold in weekly doses (Figs. 1a-d). These drugs are packaged in "blister cards" with one or two tablets, because the precautions of the
drugs need many words. A "blister card," which covers the drug in PTP sheets with paper and resembles a card, comes in two package types.
• "Card-type": a tablet is removed directly from the card (Figs. 1a-b).
• "Fold in three-type": a tablet can be removed from the card after opening the fold (Figs. 1c-d).
Forty-eight senior citizens served as participants in this experiment. We showed these drugs individually to each participant and then had them complete questionnaires about the drugs. The order in which the drugs were shown was counter-balanced.
Fig. 1. All osteoporosis drugs taken weekly: (a) Drug A, (b) Drug B, (c) Drug C, (d) Drug D
3 Questionnaire The written questionnaire had 14 items (Table 1). Participants evaluated each item on a scale of 1 to 7. The oral questionnaire had two items (Table 2); for each, participants selected the best drug from the four drugs.
Table 1. Questionnaire items on a 1 to 7 scale
Q1: Do you think the display of "weekly dose" is eye-catching?
Q2: Do you think the precautions for taking this drug are eye-catching?
Q3: Do you think the characters are easy to read?
Q4: Do you think the colors are easy to view?
Q5: Do you think the display is easy to understand?
Q6: Do you think the way of coping with mistaken dosages of this drug is easy to understand?
Q7: Do you think the place for writing the date for taking this drug is right?
Q8: Do you think it is easy to remove the tablets from the package?
Q9: Do you think the package is easy to handle?
Q10: Do you think it is easy to recycle?
Q11: What do you think about its desirability?
Q12: What do you think about its tenderness?
Q13: What do you think about its familiarity?
Q14: What do you think about its complexity?

Table 2. Oral questionnaire items
Q1: Which drug do you think is the best for preventing dosage mistakes?
Q2: Which drug do you think best prevents taking the wrong medicine?
4 Experimental Results The average score for each questionnaire item is shown in Fig. 2. We performed a principal component analysis on the averaged scores. The eigenvector of each item for the first component is shown in Fig. 3, and the eigenvector of each item for the second component is shown in Fig. 4. As a result, we extracted the first and second bases. The first basis expresses the "drug's visibility," and the second basis expresses the discrimination between "look" and "handling." The principal component point of each drug is shown in Fig. 5, where the horizontal axis is the first component and the vertical axis is the second component. We obtained the following classification from Fig. 5.
• Easy to view from the physical aspect: Drug A.
• Easy to use: Drugs B and C.
• Good impression of the display: Drug D.
The scores of Drugs C and D for questions Q8 and Q9 were lower than those of Drugs A and B because the package designs of Drugs A and B are "card-type" whereas those of Drugs C and D are "fold in three-type".
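The principal component analysis described above can be reproduced with generic numerical code. The sketch below is a minimal illustration, not the authors' analysis: the item labels mirror Table 1, but the score matrix is filled with placeholder values rather than the study's data. It centers the 4 × 14 matrix of averaged ratings, extracts the first two components, and projects each drug onto them, corresponding to the loadings in Figs. 3–4 and the drug positions in Fig. 5.

```python
# Minimal sketch of the principal component analysis of Section 4, assuming a
# 4 (drugs) x 14 (questionnaire items) matrix of averaged ratings.
# The numeric values below are placeholders, not the data reported in the paper.
import numpy as np

items = ["weekly dose", "precautions", "characters", "colors", "display",
         "coping", "date place", "remove tablets", "handle", "recycle",
         "desirability", "tenderness", "familiarity", "complexity"]
drugs = ["Drug A", "Drug B", "Drug C", "Drug D"]

rng = np.random.default_rng(0)
scores = rng.uniform(3.0, 6.0, size=(len(drugs), len(items)))  # placeholder averages

centered = scores - scores.mean(axis=0)     # center each item across the drugs
u, s, vt = np.linalg.svd(centered, full_matrices=False)

loadings = vt[:2]                           # eigenvectors of the first two components (cf. Figs. 3-4)
points = centered @ loadings.T              # principal component points of each drug (cf. Fig. 5)

for name, value in sorted(zip(items, loadings[0]), key=lambda p: -abs(p[1]))[:3]:
    print(f"strong first-component loading: {name} ({value:+.2f})")
for drug, (pc1, pc2) in zip(drugs, points):
    print(f"{drug}: first component = {pc1:+.2f}, second component = {pc2:+.2f}")
```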
Fig. 2. Average score from each questionnaire
Fig. 3. Eigenvector of each item from the first component
Fig. 4. Eigenvector of each item from second component
Fig. 5. Points of principal component of each drug
We performed a two-factor analysis of variance for each question item. The factors were "gender" and "four types of drugs." The main effects of "gender" were significant for the following items.
• Q4. Do you think the colors are easy to view?
• Q6. Do you think the way of coping with mistaken dosages of this drug is easy to understand?
The main effects of "four types of drugs" were significant for the following items.
• Q1. Do you think the display of "weekly dose" is eye-catching?
• Q2. Do you think the precautions for taking this drug are eye-catching?
• Q3. Do you think the characters are easy to read?
• Q4. Do you think the colors are easy to view?
• Q5. Do you think the display is easy to understand?
• Q7. Do you think the place for writing the date for taking this drug is right?
• Q8. Do you think it is easy to remove the tablets from the package?
• Q9. Do you think the package is easy to handle?
• Q13. What do you think about its familiarity?
• Q14. What do you think about its complexity?
These results show that the effect of “four types of drugs” is stronger than the effect of “gender.”
5 Results of Oral Questionnaires The numbers of answers to the two oral questions (Table 2) are shown in Figs. 6a-b. From Figs. 6a-b, "Drug A" was selected by the most people for both Q1 and Q2 (Table 2). This result shows that "Drug A" is well organized for visibility and for preventing "mistaken dosages" and "taking the wrong medicine."
(a) Question 1   (b) Question 2
Fig. 6. Number of answers for two oral questions
6 Conclusion In this study, we experimentally clarified how viewability, understandability, and operability differ depending on the package designs of osteoporosis drugs. The results of the principal component analysis suggest the positioning of each drug based on its principal component scores. The results of the analysis of variance for
each question item suggest that the effect of "four types of drugs" is stronger than the effect of "gender." From the results of the oral questionnaires, "Drug A" was selected by the most people for both Q1 and Q2 (Table 2). "Drug A" is well organized for visibility and for preventing "mistaken dosages" and "taking the wrong medicine."
References 1. Tsuchiya, F.: Malpractice prevention and ideal way of packaging of medical products and display. Pharm Tech Japan, Jiho 19(11), 27–37 (2003) (in Japanese) 2. Izumiya, A., Ohkura, M., Tsuchiya, F.: The evaluation of blister card design of drug for single weekly dose intended for the female elderly. Collected papers of Human Interface 9(4), 481–484 (2007) (in Japanese)
Implications for Developing Information System on Nursing Administration – Case Study on Nurse Scheduling System – Mitsuhiko Karashima1 and Naotake Hirasawa2 1
School of Information and Telecommunication Engineering, Tokai University, 1117, Kita-Kaname, Hiratsuka, Japan [email protected] 2 Department of Information and Management Science, Otaru University of Commerce, Midori 3-5-21, Otaru, Hokkaido, Japan [email protected]
Abstract. This research was focused on the nurse scheduling system as the supporting system for the nursing administration. In this research as a case study, the nurse scheduling system was developed by applying the human centered design process. The head nurses claimed that it was a higher workload for them to rearrange the imperfect roster of the first automated scheduling system than to make the roster from the beginning because the mathematical solution for the system cannot always propose the roster which has no violation. The nurse required the scheduling support system which supported her heuristic scheduling instead of the automated system. The nurse scheduling support system was developed and the other nurses were more satisfied with the system than with the conventional popular automated scheduling system. From the results of this research, the approach of the development of the supporting system for administration was discussed. Keywords: nurse scheduling, human centered design, usability, administration system.
1 Introduction The Japanese government has been promoting the development and implementation of information systems and networks for the administration of health services over the last 10 years in order to realize the supply of high-quality, efficient, and low-cost medical services [1]. The situation in some western countries might be similar [2]. The electronic medical record system is a kind of medical information system that has been gradually implemented, starting with the large hospitals in Japan; about 20% of the large hospitals, which have more than 400 beds, had implemented the system by 2005 [3]. The nurse scheduling system has been a popular supporting system for nursing administration in Japan because it tends to be included in the electronic medical record system. The nurse scheduling problem has been researched in the Operational Research field since the 1970s [4], and many system developers have supplied automated nurse scheduling systems to hospitals.
However, few of the researches and the developments of the nurse scheduling system have been focused on the usability of the systems. In this research, as a case study, the nurse scheduling system is developed by applying the human centered design process [5] in order to increase the usability of the nurse scheduling system. Generally the system development approach for the nurse scheduling problem aimed the automated scheduling solution as previously stated. On the other hand, the head nurses had adopted heuristic scheduling approach on the paper. There was the other approach of the system development which aimed to support their heuristic scheduling. In this research, through the development of the nurse scheduling system, it was discussed that the system development approach influenced to the usability of the supporting system for administration.
2 Methods This research project was carried out as the case study in the period between 1999 and 2003. The project was designed that consisted of the steps, the research for the context of use for the nurse scheduling system, the definition of the nurses’ requirements for the system, the development of the system, and the estimation of the system by the nurses, and the repetition of this circular steps according to the human centered design process in ISO13407 [5]. Step1: The questionnaire about the constraints, the methods of the nurse scheduling to a head nurse was held in a public hospital in northern Japan. The questionnaire also asked the experience of the nurse scheduling. The questionnaire was consisted of items as follow; • The profile of the head nurse who solves the nurse scheduling problem. ─ Age, position, experience of nurse scheduling, and time spent in scheduling per month. • The constraints of the nurse scheduling in the section which depend on the nursing administration regulation of government or hospital regulation. ─ The number of the nurses who belong to the section. ─ The minimum required number of the nurses per night shift, per semi-night shift and per day ─ The number of the teams in the section ─ The existence of the classification of the nurses ─ The night shift time, the semi-night shift time, and the day time ─ The maximum times of the night shift and the semi-night shift in a month ─ The maximum number of the continuous days ─ The permission of the continuous night shifts and semi-night shifts ─ The minimum interval between night shifts and semi-night shifts ─ The number of the holiday in a week, in four weeks, and in a month ─ The continuous weekend holidays ─ The nurse service patterned order about holiday, working day, working night shift, and working semi-night shift.
• The constraints which depend on each nurse’s personal preferences and conditions. ─ Holiday, Annual paid leave, Required working night shift, Required working semi-night shift, Required working day, consumption rate of annual paid leaves, and the satisfaction of the requirements of the last month. • The four degrees of the priority which each constraint is kept. ─ ─ ─ ─
Grade 1: the constraint has to be kept strictly. Grade2: the constraint should be kept as much as possible. Grade3: the constraint had better be kept if possible. Grade4: the constraint does not need to be kept.
• The rules of the nurse scheduling. ─ The patterned order about holiday, working day, working night shift, and working semi-night shift. ─ The adjusting methods for conflicting constraints. ─ The adjusting methods for conflicting requirements with the nurses. • The devices for the nurse scheduling. ─ The supported equipments ─ The estimation of the outcome of the nurse scheduling • The experience of the nurse scheduling ─ The problem of the present nurse scheduling, the experience of the nurse scheduling system, the problem of the system, the necessity of the system, and the requirements of the system The observation of her behaviors of the paper based nurse scheduling was held in the public hospital in northern Japan. She was required to schedule a set of day, night shift, and semi-night shift nurses within a month and make a roster as usual. She was required to think loud during scheduling. Her behaviors and utterance were recorded by two video cameras and the characteristic behaviors and utterance were measured by the event sampling method. Step2: The requirements for the nurse scheduling system were discussed from the results of both the questionnaire and the observation. Step3: The nurse scheduling system was developed based on the requirements. Step4: The head nurse of the hospital was required to schedule a set of nurses within a month and to make a roster by using the nurse scheduling system. She was required to think loud during scheduling. Her behaviors and utterance were recorded by two video cameras and the characteristic behaviors and utterance were measured by the event sampling method. She was required to answer the satisfaction of the roster by the system and the usability of the system in comparison with the paper based nurse scheduling. As the repetition of the steps, the functional and usability problems of the proposed system and the new requirements of the officer were extracted by the results of the former step. The system itself changed from the scheduling system to the scheduling support system according to the officer’s requirements. The officers of the other
private hospital in the central area of Japan were required to estimate the roster by the support system and the usability of the system through scheduling a set of nurses within a month for two months.
3 Automated Approach 3.1 Mathematical Model of Nurse Scheduling In this section, we present the nurse scheduling model. This model is categorized as a mathematical programming approach [6]. The model is built up in three stages: input of nurse preferences; scheduling of night shift and semi-night shift working; and scheduling of day and holiday working. In the first stage, all the preferences of all the nurses are input into the system. In the second stage, the night shift and semi-night shift working schedule is arranged. In the last stage, day working and holiday working are arranged. In each of the second and third stages, the previously arranged schedule may be modified if doing so decreases the value of the fitness function. The fitness function evaluates the degree of constraint violation. The model adopts a greedy algorithm for a combinatorial optimization problem which can be formulated as integer programming. The arranged schedule is the approximate optimal schedule when the value of the fitness function is minimal. The fitness function consists of three kinds of constraint variables (Xi, Yj, Zk) and their weighted parameters (αi, βj, δk), as Eq. (1) shows. Eq. (2) gives the total number of violations counted per nurse. N is the set of nurses. If constraint "i" is satisfied for nurse "l", Xil equals 0; if not, Xil equals 1. Eq. (3) gives the total number of violations counted per day. T is the set of dates in the scheduling period. If constraint "j" is satisfied on date "t", Yjt equals 0; if not, Yjt equals 1. Eq. (4) gives the total number of violations counted per nurse and day. If constraint "k" is satisfied for nurse "l" on date "t", Zklt equals 0; if not, Zklt equals 1. If all the constraints are satisfied, the value of the function P equals zero. All the constraint variables were chosen and all the weighted parameters were decided based on the results of the questionnaire. Table 1 shows all the variables and all the weighted parameters.
P = \sum_i \alpha_i X_i + \sum_j \beta_j Y_j + \sum_k \delta_k Z_k   (1)

X_i = \sum_{l \in N} X_{il}   (2)

Y_j = \sum_{t \in T} Y_{jt}   (3)

Z_k = \sum_{l \in N,\, t \in T} Z_{klt}   (4)
Table 1. Weighted parameters for the penalties of the violations

Variables Xi (violations counted per nurse):
• Keep the number of night or semi-night shift workings under … nights per month
• Keep the working hours under … hours per … weeks
• Keep the number of holidays
• Keep the deviation of plus or minus … day from the averaged holiday-working number
• Keep consecutive holidays more than once
• Keep consecutive holidays of more than three days including a weekend
• Keep consecutive holidays of more than three days

Variables Yj (violations counted per day):
• Keep the required number of nurses per day
• Keep the required number of nurses per night shift
• Keep the required number of nurses per semi-night shift
• Avoid a team which consists of only novice nurses
• Avoid a team which consists of only practical nurses
• Avoid a team which does not keep the regulation

Variables Zk (violations counted per nurse and day):
• Avoid working consecutively in day, semi-night shift and night shift
• Keep the nurse service patterned order
• Keep the nurse service patterned order
• Keep the nurse service patterned order
• Keep the night shift or semi-night shift working interval of more than … nights
• Arrange the high-skilled nurses to the special sections
• Avoid consecutive day workings of more than … days
• Avoid consecutive day, semi-night shift or night shift workings of more than … days
• Keep the same nurse as the leader between weekdays
• Satisfy the nurse preferences
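To make Eqs. (1)–(4) concrete, the sketch below evaluates the penalty P for a candidate roster. It is only an illustration under assumed checks and weights: just three of the constraints from Table 1 are implemented, and the numeric thresholds and the weight values α, β, δ are placeholders, not the values used in the actual system.

```python
# Simplified sketch of the penalty (fitness) function of Eqs. (1)-(4).
# Only three illustrative constraints are shown; the thresholds and the weights
# alpha, beta, delta are placeholders, not the values used in the actual system.
from itertools import product

NURSES = ["n1", "n2", "n3", "n4"]
DAYS = list(range(1, 31))
# A roster maps (nurse, day) to one of "day", "night", "semi", "off".

def count_violations(roster):
    # X-type violation, counted per nurse (Eq. 2): too many night/semi-night shifts per month
    x = sum(1 for n in NURSES
            if sum(roster[(n, d)] in ("night", "semi") for d in DAYS) > 8)
    # Y-type violation, counted per day (Eq. 3): not enough nurses on the night shift
    y = sum(1 for d in DAYS
            if sum(roster[(n, d)] == "night" for n in NURSES) < 1)
    # Z-type violation, counted per nurse and day (Eq. 4): a night shift followed by a day shift
    z = sum(1 for n, d in product(NURSES, DAYS[:-1])
            if roster[(n, d)] == "night" and roster[(n, d + 1)] == "day")
    return x, y, z

def penalty(roster, alpha=10.0, beta=20.0, delta=5.0):
    x, y, z = count_violations(roster)
    return alpha * x + beta * y + delta * z   # Eq. (1): weighted sum of violation counts

# A greedy scheduler in the spirit of Section 3.1 would repeatedly choose the
# assignment that yields the smallest increase of penalty(roster).
empty = {(n, d): "off" for n in NURSES for d in DAYS}
print(penalty(empty))   # an all-"off" roster is penalized for unstaffed night shifts
```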
3.2 User Interface of the Nurse Scheduling System In this section, we present the interfaces of the nurse scheduling system. The interfaces for input are designed as satisfying the requirements from the questionnaire and the observation. Every information input can be done by pushing the command button with mouse device. The interface for output is designed which has the same layout as the paper based roster. The procedure of the usage of the system is as follows. The head nurse is required to choose the month scheduled on the start screen. Next, if she wants to change the staff information, add, or delete staffs, the head nurse pushes the staffs’ personal information command and can modify the staff information. If she wants to change the constraints’ conditions for scheduling, she pushes the initial condition command and can change the value of the constraints and the value of their weighted parameters. Next if there are some staffs’ preferences, she pushes the preference scheduling command and can enter their preferences. Then if the head nurse pushes the schedule command of a ward, the system starts the scheduling process and makes the month’s scheduled sheet of the ward. 3.3 Results from User Testing The user test for the proposed nurse scheduling system was held. The head nurse was required to schedule a set of nurses within a month and to make a roster by using the nurse scheduling system. Her behaviors and utterance were recorded by the event sampling method. By the event sampling method it was confirmed that the system could be used without any problem. It revealed that the usability of the system was sufficient. On the other hand, the roster was not the optimal schedule but the approximate optimal schedule with which some constraints were not satisfied, because the nurse scheduling problem is NP-hard [7] that the number of the constraints was huge and
the optimal solution for scheduling might not exist. She was not satisfied with the roster produced by the scheduling system. The reason for her dissatisfaction was that it was very difficult for her to rearrange the proposed schedule without introducing more violations when she wanted to change part of the imperfect roster. She reported that it felt easier to make the roster from the beginning than to rearrange it. In the end, she judged that the proposed nurse scheduling system was not useful. She wanted a support system for her heuristic scheduling, one that would supply her with input patterns reflecting the nurse service pattern and the nursing administration regulations, and with the important information on the constraints. We therefore analyzed the observation records again.
4 Heuristic Support Approach 4.1 Support System for Heuristic Nurse Scheduling The requirements for the scheduling support system were discussed from the results of both the questionnaire and the observation. The support system for heuristic scheduling was developed in consideration of many requirements the system should satisfy. Through the usage of the proposed system by three head nurses, the functional and usability problems were clarified. The system was improved in consideration of the problems. 4.2 User Interface of the Support System The proposed system for nurse scheduling supported the heuristic scheduling of the head nurse. The system was made on Microsoft Excel2000. The system supplied the officer the real time information how the roster satisfied the constraints, such as the number of staffs in each day working, the number of staffs in each night shift working, the number of staffs in each semi-night shift working, the number of day workings in each staff, the number of night shift workings in each staff, the number of semi-night shift workings in each staff, the number of holidays and their excess and deficiency in each staff, the number of annual paid leaves in each staff, and so on. The value of the constraints for the regulations of the hospital and the government could be controlled on the system. The officer could make scheduling of each staff in each date and, of course, she could also input the scheduling pattern which were predetermined and registered. By using this support system, the head nurse can do heuristic scheduling while checking how the constraints were satisfied. The interface of this system for making the roster was designed in order to be able to make scheduling in the same way the head nurses made the roster on paper. The procedure of the usage of the system is as follows. The methods to use the start screen and the screen for modifying the nurse information are the same as the former system. If the head nurse wants to change the constraints’ conditions for scheduling, she pushes the nurse information command and the night shift and semi-night shift command, and changes the value of the constraints. Next she can make scheduling of each staff in each date by choosing the item of the list in the combobox in the scheduling sheet. She can also make scheduling by pushing working type command
(Screen annotations: real-time information on how the roster satisfies the constraints; horizontal and vertical scrolling commands; nurse names; the working type "night shift" selected via the item command.)
Fig. 1. Scheduling screen of proposed scheduling support system
and the item command in the working type screen, as shown in Fig. 1. The scheduling sheet can be scrolled with the horizontal and vertical scrolling commands. The system supplied the officer with real-time information on how the roster satisfied the constraints. The scheduling sheet can be switched to a style for output or to full screen.

4.3 Result from User Testing
The user test of the proposed nurse scheduling support system was held in comparison with a popular automated scheduling system in Japan. Two head nurses, one an expert and one a beginner computer user, were required to schedule a set of nurses within a month and to make a roster by using the nurse scheduling support system for two months. Their behaviors and utterances were recorded by the event sampling method. Several nurses were required to answer a questionnaire and an interview for estimating the usability of the system in comparison with their daily used conventional automated system. By the event sampling method it was confirmed that both the expert and the beginner could use the system without any problem. The results of the questionnaire and the interview revealed that the usability of the system was sufficient and that they were more satisfied with the system than with the conventional automated system, as Table 2 shows.

Table 2. Two-way mixed-design ANOVA with Questions and Systems
Source            SS         DF   MS        F         p
Question          27.0125    49   0.5513    1.5526    p < 0.05
System            58.1405     1   58.1405   16.5011   p < 0.01
Question×System   22.2845    49   0.4548    1.2808    p > 0.10
Total             107.4375   99
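The analysis behind Table 2 can be sketched with standard statistical tooling. The code below runs a two-way repeated-measures ANOVA with the factors "question" and "system" on synthetic ratings; the respondent count, the scores, and the assumed advantage of the support system are placeholders, and the paper's own mixed-design computation is not reproduced here.

```python
# Sketch of a two-way repeated-measures ANOVA with the factors "question" and
# "system". Respondents, ratings, and the assumed advantage of the support
# system are synthetic placeholders, not the study's data.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
subjects = [f"s{i}" for i in range(8)]
questions = [f"q{i}" for i in range(50)]

rows = []
for subj in subjects:
    for q in questions:
        for system in ("support", "conventional"):
            base = 5.5 if system == "support" else 4.5   # assumed mean difference
            rows.append({"subject": subj, "question": q, "system": system,
                         "score": float(np.clip(rng.normal(base, 1.0), 1, 7))})
data = pd.DataFrame(rows)

result = AnovaRM(data, depvar="score", subject="subject",
                 within=["question", "system"]).fit()
print(result)   # F tests for question, system, and their interaction
```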
5 Discussions Through the development of the support system for nurse scheduling, it became clear that head nurses are not always satisfied with an automated nurse scheduling system based on the mathematical programming approach, because the automated system cannot propose a perfect solution to a large number of constraints and the nurses have to rearrange the roster. It was hard work for the nurses to rearrange the imperfect roster without introducing more violations so as to match the roster with their actual feelings about the priority of the constraints. The more the nurses consider the staff's preferences, the more difficult it may be for them to rearrange the roster. It may be impossible to improve the automated system so that the proposed solution matches their feelings, because a perfect solution satisfying so many constraints may not exist. Furthermore, the nurses could not articulate the priority of the constraints precisely enough for the system to propose an approximate optimal solution satisfying their feelings. The results of this research revealed that the nurses required a support system for their heuristic scheduling rather than an improved automated system. Of course, if the number of constraints is not large, or the head nurse does not need to consider the staff's preferences extensively, the head nurse may be satisfied with an automated scheduling system because it can propose the optimal solution. The results of this research suggest that the supporting system for administration should not always have an automated solution system (automated system) as its goal. When the administration belongs to the loose management type, the automated system development approach would not be appropriate, because the developed system would not propose the optimal solution and would not be used by the users. The loose management type means that the number of constraints is large or that the management considers the staff's preferences extensively. In these situations, the supporting system should instead aim at a supporting system for heuristic solution (heuristic system). On the other hand, when the administration belongs to the rigorous management type, the automated system development approach would be appropriate because the system might propose the optimal solution. The rigorous management type means that the constraints are only the government and workplace regulations and the management does not need to consider the staff's preferences. Figure 2 shows the concept of this relationship between the management types and the system development approach from the usability view. As mentioned above, the direction of the system development approach depends on the management type. In order to set the direction correctly, the actual task supported by the system should be analyzed in detail, and the context of use of the supporting system and its influence on the business and the organization should be researched. A deep understanding of the task and of the influence of the system implementation would lead the supporting system in the correct direction. For these understandings, the human centered design approach, the contextual design [8], and the socio-technical design approach [9] would be effective.
Automated system: × for loose management, ○ for rigorous management
Heuristic system: ○ for loose management, × for rigorous management
Fig. 2. Concept of relationship between management type and system development approach on usability view
6 Conclusions This research focused on the nurse scheduling system as a supporting system for nursing administration. In this research, as a case study, a nurse scheduling system was developed by applying the human centered design process. The head nurses claimed that it was a higher workload for them to rearrange the imperfect roster of the first scheduling system than to make the roster from the beginning, because the mathematical solution for the nurse scheduling system cannot always propose a roster that has no violation. The nurses required a scheduling support system that supported their heuristic scheduling instead of the mathematical programming approach. The nurse scheduling support system was developed, and the two other nurses were more satisfied with it than with the conventional popular scheduling system in Japan based on the mathematical programming approach. From the results of this research, it was suggested that the development of an automated system is not always the correct direction for the development of a supporting system for administration. The concept of the relationship between the management type and the system development approach from the usability view was proposed. It was also proposed that a deep understanding of the task supported by the system and of the influence of the system implementation would lead the supporting system for an administration in the correct direction, and that the human centered design approach, the contextual design approach, and the socio-technical design approach would be effective.
References
1. Japanese Ministry of Health, Labour, and Welfare (in Japanese), http://www.mhlw.go.jp/shingi/0112/s1226-1.html
2. Ludwick, D.A., Doucette, J.: Adopting electronic medical records in primary care: Lessons learned from health information systems implementation experience in seven countries. International Journal of Medical Informatics 78, 22–31 (2009)
3. Japanese Ministry of Health, Labour, and Welfare (in Japanese), http://www.mhlw.go.jp/toukei/saikin/hw/iryosd/05/kekka1-3.html
4. Vanhoucke, M., Maenhout, B.: On the characterization and generation of nurse scheduling problem instances. European Journal of Operational Research 196, 457–467 (2009)
5. ISO 13407: Human-Centred Design Processes for Interactive Systems (1999)
6. Smith, L.D., Wiggins, A.: A computer-based nurse scheduling system. Computers and Operations Research 4, 195–212 (1977)
7. Osogami, T., Imai, H.: Classification of various neighbourhood operations for the nurse scheduling problem. In: Lee, D.T., Teng, S.-H. (eds.) ISAAC 2000. LNCS, vol. 1969, pp. 72–83. Springer, Heidelberg (2000)
8. Holtzblatt, K., Wendell, J.B., Wood, S.: Rapid Contextual Design. Morgan Kaufmann, San Francisco (2005)
9. Eason, K.: Understanding the Organisational Ramifications of Implementing Information Technology Systems. In: Helander, M., Landauer, T.K., Prabhu, P. (eds.) Handbook of Human-Computer Interaction, 2nd, completely revised edn., pp. 1475–1495. Elsevier, Amsterdam (1997)
Analysis on Descriptions of Dosage Regimens in Package Inserts of Medicines Masaomi Kimura1, Kazuhiro Okada1, Keita Nabeta2, Michiko Ohkura1, and Fumito Tsuchiya3 1
Faculty of Engineering, Shibaura Institute of Technology, 3-7-5 Toyosu, Koto City, Tokyo 135-8548, Japan 2 Graduate School, Shibaura Institute of Technology, 3-7-5 Toyosu, Koto City, Tokyo 135-8548, Japan 3 Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo City, Tokyo 113-8510, Japan [email protected], [email protected], {m107069,ohkura}@sic.shibaura-it.ac.jp, [email protected]
Abstract. To prevent medical accidents caused by mix-up, the confirmation of usage should be the key to determining error. If a computerized order entry system for medicines shows information concerning therapeutic indications to doctors, they can subsequently avoid mix-ups of medicines such as the case in question. To investigate data which can be utilized for a database in such an entry system, we study the description patterns of the sentences in the dosage regimen portion of the SGML formatted package inserts data via a method based on the text mining technique. Based on this result, we also propose the data structure of dosage regimen information, which will be the basis of a drug information database to ensure safe usage. Keywords: medical safety, text mining, data structure.
1 Introduction To prevent medical accidents, such as mix-ups involving medicines, double dosage and insufficient dosage, it is necessary to ensure proper treatment with the right medicines, namely the 'safety of usage' of medicines. Recently, in some Japanese hospitals, fatal accidents have occurred due to mix-ups involving a steroid, Saxizon, with a similarly-titled medicine, Succine, which is a muscle relaxant. There are two conceivable ways to avoid such accidents, one of which is to prevent the naming and use of medicines resembling other medicines in their name, both in terms of appearance and sound. Another method is to confirm the medicine by checking the actual usage based on their dosage regimens. Though the former method can be realized by utilizing a name checking system such as a 'medicine similar search engine', which is provided by the Japan Pharmaceutical Information Center, or by making a rule not to adopt medicines which have confusing names, the accident is known to have occurred despite the existence of a rule to reject
Succine due to its confusing name. This suggests to us that the latter, namely the confirmation of usage, should be the key to determining error. Consider the case when a doctor inputs prescription data into a computerized order entry system for medicines. If the system shows him information concerning therapeutic indications, he can subsequently avoid mix-ups of medicines such as the case in question. To enable this, the order entry system requires a database containing information on dosage regimens so that the proper usage can be verified. As a side effect of utilizing such system, we anticipate the prevention of accidents or incidents caused by incorrect quantities. To obtain data for the database of dosage regimens, the most reliable data source is a package insert, which must be compulsorily published by pharmaceutical companies as an official document and attached to its medicine. Package inserts are, however, distributed as paper documents and unsuitable for processing by a computer system. With this in mind, we use SGML formatted package insert data instead of the original package inserts, released by the Pharmaceutical and Medical Devices Agency (PMDA), which is an extra-departmental body of the Japanese Ministry of Health, Labor and Welfare. SGML is an old-established markup language, which adds metadata and structures to data by tagging, which is defined by DTD. In fact, although the Pharmaceutical and Medical Devices Agency also discloses the DTD of SGML formatted package insert data, it is difficult to leverage the defined structure to analyze data concerning the portion of dosage regimens. This is because information concerning dosage and administration is mainly described by the sentences in tagged elements and not well structured to directly find the necessary information within the same. In other words, the structure of the portion of dosage regimens does not achieve sufficiently fine granularity to enable its effective utilization in a computer system, such as the order entry system mentioned above. In this study, we study the description patterns of the sentences in the dosage regimen portion of the SGML formatted package inserts data via a method based on that of ‘word-link’, namely the text mining technique which we have proposed. Based on this result, we also propose the data structure of dosage regimen information, which will be the basis of a drug information database to ensure safe usage.
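To illustrate the kind of check such an order entry system could perform once dosage regimen information is available in structured form, the sketch below defines a hypothetical regimen record and verifies a prescription against it. The field names, example drug, and matching logic are assumptions for illustration only; they are not the data structure proposed later in this paper.

```python
# Hypothetical sketch of an order-entry check against structured dosage regimen data.
# Field names and values are illustrative; the actual package insert data are Japanese
# SGML documents whose structure is analyzed in the following sections.
from dataclasses import dataclass

@dataclass
class DosageRegimen:
    drug_name: str
    indication: str        # therapeutic indication the regimen applies to
    route: str             # e.g. "oral", "intravenous"
    max_daily_dose_mg: float

@dataclass
class Order:
    drug_name: str
    indication: str
    route: str
    daily_dose_mg: float

def check_order(order: Order, regimens: list[DosageRegimen]) -> list[str]:
    """Return warnings if the order does not match any registered regimen."""
    matching = [r for r in regimens if r.drug_name == order.drug_name]
    if not matching:
        return [f"No dosage regimen registered for {order.drug_name}."]
    warnings = []
    if not any(r.indication == order.indication for r in matching):
        warnings.append(f"{order.drug_name} is not indicated for '{order.indication}'.")
    if not any(r.route == order.route and order.daily_dose_mg <= r.max_daily_dose_mg
               for r in matching):
        warnings.append("Dose or route is outside the registered regimen.")
    return warnings

# Example: an order whose indication does not match is flagged before dispensing.
regimens = [DosageRegimen("DrugX", "inflammation", "intravenous", 300.0)]
print(check_order(Order("DrugX", "muscle relaxation", "intravenous", 100.0), regimens))
```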
2 Target Data
In this study, as mentioned in the previous section, we analyze the SGML formatted package insert data of medicines for medical care, which can be downloaded from the PMDA web site. Since we need a list of medicines to retrieve the data, we utilize the standard medicine master data (the version released on September 30, 2007) provided by the Medical Information System Development Center (MEDIS-DC). As a by-product, we can link the package insert data with HOT9, the medical identification code, through which the name, dosage form, standard unit of quantity and pharmaceutical company can be determined, although the package inserts themselves are not included in the master data. Using the master data, we obtained 11,685 SGML files, which are our target data. We have to note that SGML, an ancestor of XML, plays two roles: one is as a document format and the other as a data structure. Though SGML
formatted package insert data are originally created from the aspect of the former role, we focus on the latter, the structure that contains data that is usable as a data source of computer systems such as an ordering system.
3 Methods
Though our target data are SGML files whose tags define the structure of descriptions concerning dosage regimens, the detailed information is described using (Japanese) sentences. On the one hand, we analyze the tagged structure to find the descriptions, but at the same time we have to apply a text-mining technique to analyze the more than ten thousand sentences in the files. We thus discuss the analytical method from the following two aspects:
• The (ambiguity of the) tag definition that defines the data structure of dosage regimens.
• The grammatical structure of sentences contained in the 'detail' elements, which describe detailed information concerning dosage regimens.
3.1 The Tag Definition Defining the Data Structure of Dosage Regimens
The DTD released by PMDA defines tags related to dosage regimens, as shown in Fig. 1. The characteristic structure of this DTD is that the 'infoindicationsorefficacy' element can also include the 'indicationsorefficacy' element, and both may contain 'doseadmin' elements. The 'infoindicationsorefficacy' element contains information concerning both effect-efficacy and dosage regimens, of which the 'indicationsorefficacy' element represents the effect-efficacy portion. Since the 'doseadmin' element provides information concerning dosage regimens in the form of the 'detail' elements therein, there are two places where the description of dosage regimens can be allocated. The structure of the 'indicationsorefficacy' element defined in the DTD suggests that the iteration of element pairs, 'variablelabel' and 'doseadmin', corresponds to the description of dosage regimens for multiple effects and efficacies. In addition, there are other structures containing a 'detail' element, such as the 'low1subitem' element, which supports itemized descriptions with multiple levels. (There are six levels of such elements, 'low1subitem', 'low2subitem' ... 'low6subitem'. The elements have nested structures, e.g. 'low1subitem' can include 'low2subitem'.) From a data processing perspective, when extracting dosage regimen information from SGML via a computer program, this variation of description makes retrieval of data while preserving its structure a complex task. Of course, if just one of the structures were adopted, this would be simplified; otherwise, we have to say that the data structure is unsuitable for utilization in a computer system. To determine the current state of use of the structures, we investigate the distribution of 'detail' elements via the following steps (a minimal sketch of this extraction follows the list):
a-i. Extract the 'detail' elements contained in the 'infoindicationsorefficacy' element from each SGML file.
a-ii. Obtain the relative location path of each 'detail' element from the root element, in the manner of XPath for XML data. Additionally, assign a sequential number to each path in the SGML file.
a-iii. Aggregate the location path with the largest sequential number in each SGML file.
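The sketch below illustrates one way steps a-i to a-iii could be implemented. It assumes the SGML files have been converted to (or are parseable as) XML and that the element names follow the DTD fragment in Fig. 1; the function and variable names are ours, not part of the PMDA data.

```python
# Sketch of steps a-i to a-iii: for each package-insert file, list the XPath-like
# location of every 'detail' element under 'infoindicationsorefficacy', then keep
# the path holding the largest sequential number per file and aggregate over files.
# Assumption: the SGML files have been converted to (or are parseable as) XML.
import glob
import xml.etree.ElementTree as ET
from collections import Counter

def detail_paths(filename):
    tree = ET.parse(filename)
    root = tree.getroot()
    # map every element to its parent so we can walk back up to the root
    parent = {child: node for node in root.iter() for child in node}
    paths = []
    for info in root.iter('infoindicationsorefficacy'):
        for seq, det in enumerate(info.iter('detail'), start=1):   # a-i
            steps, node = [], det
            while node is not None:                                # a-ii: path to root
                steps.append(node.tag)
                node = parent.get(node)
            paths.append(('/' + '/'.join(reversed(steps)), seq))
    return paths

# a-iii: per file, take the location path with the largest sequential number,
# then count how often each path occurs across all files (as in Fig. 2)
histogram = Counter()
for f in glob.glob('package_inserts/*.xml'):
    paths = detail_paths(f)
    if paths:
        last_path, _ = max(paths, key=lambda p: p[1])
        histogram[last_path] += 1
print(histogram.most_common())
```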
Elements such as 'low1subitem', which correspond to the itemization of descriptions, contain an 'item' element which sits right before each 'detail' element. Accordingly, it can be assumed that the 'item' element holds a kind of metadata about the data contained in the 'detail' element, though the nature of that metadata remains unclear. To investigate the information held by the 'item' elements right before 'detail' elements, we extract their text data, apply morphological analysis and detect frequently used nouns, which are expected to reveal the topics and offer hints about the aspects used to itemize the descriptions.
Fig. 1. Part of the DTD, the definition of the structure of the SGML formatted package insert data released by PMDA
3.2 The Grammatical Structure of Sentences Contained in ‘Detail’ Elements, Which Describe Detailed Information Concerning Dosage Regimens Since the ‘detail’ elements, which we mentioned in the previous section, describe information concerning dosage regimens in sentence form, we apply a text mining technique to them in order to extract the structured data (the structure of data items) of dosage regimens. Generally speaking, the text mining technique employs morphological analysis and/or syntax algorithms such as dependency analysis to divide sentences into words or segments and find the rules or relations between them. The authors have proposed the ‘word-link’ method, which finds the common dependency structures in sentences in descriptions in order to flexibly summarize the common sentences therein. In this study, we apply this method to descriptions in ‘detail’ elements concerning the dosage regimens in each SGML package insert. Since, as a minimum, dosage, administration and adaptation diseases will differ for each medicine, with a considerable scope of expression, our original method, whereby attempts are made to find patterns, including the use of nouns, might result in a failure to find the common sentences. We thus extend it to determine the tendency for the co-occurrence of nouns
and particles (parts of speech which play roles similar to prepositions in English) and extract structural patterns excluding noun variations. The analytical steps are as follows (a small sketch of steps b-ii to b-iv follows the list):
b-i. We retrieve the sentences in the 'detail' elements described in the 'indicationsorefficacy' elements introduced in Section 3.1 and apply dependency analysis to them.
b-ii. If a segment in the dependency structure contains a noun, we separate the noun part from the segment. The resultant characters are expected to be particles, hence we name them 'particle candidates' in this paper.
b-iii. We aggregate the nouns that appear in segments including each particle candidate and find the characteristics of the particle candidates in use. We call the part of the segment obtained by removing the particle segment the 'main part of the segment'.
b-iv. We replace the nouns found in b-iii with a symbol such as '○○○' in order to mask them, and apply the word-link method.
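The following sketch illustrates steps b-ii to b-iv on segments that have already been produced by a morphological/dependency analyzer (e.g., MeCab with CaboCha); the token format, part-of-speech labels and function names are our own simplifications, and the word-link aggregation itself is reduced here to simple counting.

```python
# Sketch of steps b-ii to b-iv: split each dependency segment into a masked
# 'main part' and a trailing 'particle candidate'. Each segment is assumed to
# arrive as a list of (surface, part_of_speech) tokens from an analyzer.
from collections import Counter

MASK = '○○○'

def split_segment(tokens):
    """b-ii: separate the noun part of a segment; what remains is the particle candidate."""
    noun_idx = [i for i, (_, pos) in enumerate(tokens) if pos.startswith('noun')]
    if not noun_idx:
        return ''.join(s for s, _ in tokens), ''          # non-noun segment, no candidate
    main = ''.join(s for s, _ in tokens[:noun_idx[-1] + 1])
    particle = ''.join(s for s, _ in tokens[noun_idx[-1] + 1:])
    return main, particle

def noun_particle_counts(sentences):
    """b-iii: tally which nouns appear with which particle candidate."""
    counts = Counter()
    for segments in sentences:
        for tokens in segments:
            _, particle = split_segment(tokens)
            for surface, pos in tokens:
                if pos.startswith('noun'):
                    counts[(particle, surface)] += 1
    return counts

def mask_sentence(segments):
    """b-iv: replace each noun-bearing main part with '○○○' but keep the particle candidate."""
    out = []
    for tokens in segments:
        main, particle = split_segment(tokens)
        has_noun = any(pos.startswith('noun') for _, pos in tokens)
        out.append((MASK if has_noun else main) + particle)
    return out
```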
If there are certain rules governing the way in which particles are used, this method extracts the common structures of the sentences and suggests the data items into which the descriptions should be converted to obtain a structured data form.
4 Results
4.1 The Tag Definition Defining the Data Structure of Dosage Regimens
Fig. 2 shows the distribution of the occurrence of 'detail' elements in 'infoindicationsorefficacy' elements, which suggests that there are eleven patterns and that about 75% of 'detail' elements appear directly inside 'infoindicationsorefficacy' elements. The fact that 'detail' elements are found inside 'indicationsorefficacy' elements in about 20% of SGML package inserts indicates that multiple ways of holding dosage regimens are actually in use in the SGML package inserts. The use of 'low1subitem' elements in more than 45% of SGML package inserts also suggests a tendency to describe multiple contents by using 'item' elements as titles of the contents. The result also shows that 798 SGML package insert files do not include a 'detail' element in the 'infoindicationsorefficacy' element. These SGML files possibly show information concerning the dosage regimens in tabular form via 'tblfordoseadmin' elements. Additionally, there are some (fortunately limited) cases in which the tables of dosage and administration data are maintained in pictorial format. Please note that it is not realistic for computer programs to extract dosage/administration data from such table images.
We show the result of the aggregation of words in 'item' elements in Fig. 3. This indicates that the word '場合', which means 'in the case', frequently surfaces and fulfils the role of leading into the content text in the 'detail' element following straight after. If we examine the other nouns, we can see there are words related to purposes such as '消毒' (disinfection) and '麻酔前投薬' (preanesthetic medication), diseases such as '十二指腸潰瘍' (duodenal ulcers) and '胃潰瘍' (gastric ulcers), persons to whom the medicine is administered such as '成人' (adult) and '小児' (child), and product names such as 'リハビックス-k2' (Rehabix-k2) and 'テレミンソフト坐薬' (Teleminsoft). Note that we have to read the list from top to bottom in order to find the product names, since they usually appear in only a few SGML files, namely the package inserts of each product. This result suggests that the perspectives in the descriptions are mainly purposes, diseases and the persons to whom the medicine is administered. We do not regard a product name as a condition for applying dosage and administration; its inclusion in the list is due to the fact that some package inserts contain information on plural products, which must be distinguished in order to specify a dosage regimen, though one SGML file should describe one product in order to prevent any mix-up involving the information on each product.
Fig. 2. The distribution of the ‘detail’ elements in ‘infoindicationsorefficacy’ elements
Fig. 3. The nouns in the sentences included in ‘item’ elements that describe dosage regimens
4.2 The Grammatical Structure of Sentences Contained in 'Detail' Elements, Which Describe Detailed Information Concerning Dosage Regimens
Fig. 4 shows the distribution chart of particle candidates with their frequencies. First, we investigate the nature of the nouns involved in the segments containing the particle candidates that appear frequently in the sentences of dosage regimens. Fig. 4 indicates that the particle candidate of more than 50% of the segments is a null character, namely these segments contain only their main part. Since the targets in Fig. 4 are all segments contained in sentences of dosage regimens, they involve not only nouns but also other parts of speech such as verbs. The particle candidate of segments whose main word is not a noun is expected to be a null character. In the following analysis, we therefore exclude segments whose main word is not a noun.
Fig. 4. The particle candidates of segments included in the 'detail' elements describing dosage regimens (top 20)
Fig. 5. The nouns whose segment has a null character as the particle candidate (top 20)
Fig. 6. The nouns whose segment has the particle candidate 'を' (top 20)
Fig. 7. The nouns whose segment has the particle candidate 'に' (at/to) (top 20)
Fig. 8. The nouns whose segment has the particle candidate 'として' (as) (top 20)
Fig. 9. The nouns whose segment has the particle candidate 'には' (for) (top 20)
Fig. 10. The nouns whose segment has the particle candidate 'により' (depending on) (top 20)
Fig. 11. The verbs included in 'detail' elements describing dosage regimens
Fig. 5 shows the nouns in the segments whose particle candidate is a null character. This indicates that such segments contain information about units of administration, such as '日' (days), '回' (times) and 'mg', the manner of administration, such as '適宜' (arbitrarily) and '通常' (usually), and conditions of age such as '年齢' (age) and '成人' (adult). We outline the nouns in the segments including each particle segment as follows:
• Fig. 6 shows the nouns in the segments including 'を' as a particle segment. We can see that they express amounts of medication such as 'mg', '錠' (tablets) and '力価' (titers).
• The nouns in the segments whose particle segment is 'に' (at/to) are shown in Fig. 7, which shows that these particle segments tend to be used with frequency-related words such as '回' (times) and '数回' (sometimes), words concerning the timing of administration such as '食間' (inter cibos) and '就寝前' (before bedtime), and administration sites such as '静脈内' (in a vein).
• The particle segment 'として' (as) is included in the segments whose main words are the nouns shown in Fig. 8. Besides the nouns in the formulaic phrases '原則として' (as a rule), '1日量として' (as a daily dosage) and '維持量として' (as a maintenance dosage), the other nouns shown in the figure represent active ingredients of medicines.
• Fig. 9 shows the nouns in the segments including the particle segment 'には' (for). These mainly comprise nouns denoting an object person such as '成人' (adult), '小児' (child) and '高齢者' (elderly person). The figure also shows names of symptoms such as '重症感染症' (severe infection) and '肝疾患' (hepatic disease).
• In Fig. 10, segments whose particle candidate is 'により' (depending on) tend to contain the word '症状' (symptom). In this figure, we can also read words such as '体重' (body weight), '年齢' (age) and '目的' (objective). These results and the meaning of the particle candidate suggest that these segments describe the conditions under which a dose is adjusted.
Fig. 12. The result of the word-link method applied to 'detail' elements (the links show co-occurrence more than 1149 times)
Based on the results shown above, we can find the tendencies of the contents in the segments including each particle segment. As explained in Section 3.2, we replaced each segment containing nouns with the symbol '○○○', to which we appended the particle candidate of the segment, and applied the word-link method to the result. Fig. 11 shows the verbs used in the sentences of dosage regimens. To absorb the differences in verb expressions, we replace verbs of similar meanings with a representative verb. For instance, the verbs '経口投与する' (dose orally) and '点滴静注する' (drip-feed intravenously) have analogous meanings in terms of medication and are hence consolidated into a single verb; in this paper, to enhance comprehension, we consolidated them into '投与・使用する' (administer/use). Moreover, we consolidated the verbs that mean increase or decrease into '増減する' (escalate) and replaced the verb '分割する' (divide) with '分ける' (split). Following this consolidation, we applied the word-link method and obtained sentence structures based on dependency relationships. Fig. 12 shows the links of dependency relationships appearing more than 1149 times. From this figure, we can read the following contents:
• Increase or decrease according to conditions such as indication (disease) and age (Part A in Fig. 12).
• Dosage based on information concerning the administration site, frequency, object person, symptoms, amount of medication and (the amount of) active ingredients (Part B).
• Daily dosage (Part C) and description of conditions (Part D).
Based on these and the fact that the verbs indicate the method of administration, we can see that a data structure to describe dosage regimens needs the following items (a possible rendering of such a record is sketched after the list):
• Indication (disease)
• Object person
• Administration site
• Amount of medication
• Amount of active ingredient
• Way of administration
• Frequency
• Conditions of increase or decrease
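As one possible, purely illustrative rendering of these items, the record below shows how a dosage regimen entry of such a database could be structured; the field names are ours and are not part of the PMDA SGML definition.

```python
# A possible rendering of the proposed dosage-regimen data structure as a
# structured record. The field names only illustrate the items listed above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DosageRegimen:
    indication: str                               # indication (disease)
    object_person: Optional[str] = None           # e.g. adult, child, elderly person
    administration_site: Optional[str] = None     # e.g. intravenous
    amount_of_medication: Optional[str] = None    # e.g. '10 mg', '1 tablet'
    amount_of_active_ingredient: Optional[str] = None
    way_of_administration: Optional[str] = None   # e.g. oral, intravenous drip
    frequency: Optional[str] = None               # e.g. '3 times a day'
    adjustment_conditions: List[str] = field(default_factory=list)  # increase/decrease conditions

# Example instance built from a typical package-insert sentence
example = DosageRegimen(
    indication='gastric ulcer',
    object_person='adult',
    amount_of_medication='1 tablet',
    way_of_administration='oral',
    frequency='3 times a day',
    adjustment_conditions=['adjust according to age and symptoms'],
)
```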
5 Conclusion
In this paper, we investigated the descriptions of the dosage regimens of medicines included in the SGML formatted package insert data provided by the Pharmaceutical and Medical Devices Agency (PMDA), and suggested the data structure of a database (or data scheme) maintaining dosage regimen data, which is expected to contribute to a system for checking the usage of medicines, such as a check function in a computerized order entry system for medicines. The definition of the tag structure of the SGML formatted package inserts tolerates two ways of describing dosage regimens, both of which we confirmed to be in actual use. Moreover, we found that the 'item' elements, each of which contains metadata about an adjacent 'detail' element, include information such as the object person, indications (disease) and usage. We also analyzed the sentences included in the 'detail' elements describing dosage regimens by applying the word-link method, and found that the data structure of the database needs to contain information concerning the indication, the object person, the administration site, the amount of medication, the amount of active ingredient, the means of administration, the frequency and the conditions of increase or decrease. We can regard this as consistent with the contents of the 'item' elements shown above. Though we suggest a data structure to describe dosage regimens in this paper, it must be evaluated by investigating its correspondence with the information in each package insert. After assessing the validity of the structure, we will construct a database of dosage regimens applicable to the prescription checking function of a computerized order entry system for medicines.
References 1. Pharmaceutical and Medical Devices Agency (PMDA), http://www.info.pmda.go.jp/ 2. Kimura, M., Furukawa, H., Tsukamoto, H., Tasaki, H., Kuga, M., Ōkura, M., Tsuchiya, F.: The Analysis of Questionnaires About Safety of Drug Use, the Application of Text Mining to Free Description Questionnaires. The Japanese Journal of Ergonomics 41(5), 297–305 (2005)
Non-intrusive Human Behavior Monitoring Sensor for Health Care System Noriyuki Kushiro, Makoto Katsukura, Masanori Nakata, and Yoshiaki Ito 5-1-1 Ofuna Kamakura Kanagawa, Japan [email protected], [email protected], [email protected], [email protected]
Abstract. This paper presents a non-intrusive human behavior-monitoring sensor for a health care system, especially for elderly persons. The sensor detects the operation of appliances from the thorn-like peaks of electrical current generated while they are working, and identifies patterns of the residents' daily behavior based on the correlation of operating appliances. The sensor reduces the system cost by avoiding the installation of massive numbers of sensors, and keeps the residents' privacy without intruding into their private space. The human behavior-monitoring sensor was implemented by utilizing an algorithm based on the wavelet transform method and installed in five real residences for a couple of weeks. The accuracy of detecting operations of appliances and of identifying life patterns was estimated through the field test.
1 Introduction
The ratio of people over 65 years of age in the population of Japan will increase to 31.8% in 2030 and reach 40.5% in 2055 [1]. About 80% of elderly persons are in good health, and about 20% of elderly households consist of an elderly couple or a person living alone, independent from their family [2]. Reflecting these social conditions, there is growing interest in introducing health care systems into residences, especially for elderly persons. A human-behavior monitoring sensor, which detects anomalies in the behavior of residents and identifies abnormal conditions of their health, is an indispensable technology for such a system. Some systems have already succeeded in detecting illness or symptoms of dementia of the resident from sensing data [3]. Most existing systems [3-4] use video systems and/or a large number of occupancy sensors. However, privacy is invaded by video systems, and the cost is increased by the installation of massive numbers of sensors. Both privacy invasion and system cost hinder penetration of such systems into the market. To address these issues, a non-intrusive human behavior-monitoring sensor is proposed in this paper.
2 Basic Approach
To avoid the cost and privacy invasion issues, the human behavior-monitoring sensor identifies anomalies in the behavior of the residents through the following two procedures:
Step 1: The sensor, attached to the distribution board in each house without intruding into the private space, detects particular appliances by pattern matching of the thorn-like peaks of electrical current generated by the appliances' operation (Fig. 1).
Step 2: The sensor identifies the residents' behavior from the correlation of appliances' operations, without installing massive numbers of sensors.
Fig. 1. Non-intrusive human-behavior monitoring sensor
Few attempts have so far been made at detecting electrical events by pattern matching of the electrical current of appliances in use. Hart [9] studied an algorithm to detect appliances from changes in power consumption. Patel [10-11] advanced an algorithm to detect electrical events from high-frequency electrical noise; the method utilizes the high-frequency noise that appears on the power line as features for pattern matching. Those two approaches focus on catching electrical events, such as the turning on or off of appliances. The issue is that these methods find it hard to detect appliances that run continuously, such as a refrigerator. Onoda's method [5] detects appliances from odd-order harmonic currents using the fast Fourier transform (FFT) and a support vector machine [8]. The method solves the above-mentioned issue; however, it has to learn an exponential number of sets corresponding to the combinations of appliances, because the harmonic currents are combined if multiple appliances run at the same time (Fig. 1). This requires much time for learning the sets. We propose a new method to reduce the time for learning by focusing on the 'thorn-like peaks' generated by each appliance.
3 Appliance Detection Method
The human behavior-monitoring sensor uses the small thorn-like peaks of the current as features for pattern matching. These features are stable even when multiple appliances are running: the thorn-like peaks a1, a2, b1 and b2 of waveforms A and B keep their positions on the time axis even if they are accumulated into a combined waveform such as A+B (Fig. 2). Therefore, the sensor reduces the learning process for the exponential number of sets corresponding to the combinations of appliances. The sensor uses the wavelet transform [6-7] for extracting features (frequency and position) from the thorn-like peaks. The wavelet transform resolves the frequency and position of a peak into multi-resolution level values as wavelet coefficients and scaling coefficients. The wavelet transform method
is suitable for implementation in embedded hardware because it requires little computational complexity and memory for calculation.
Fig. 2. Basic principle for appliances detection
The sensor identifies the operation of appliances based on three features: the frequency, level and position of the peaks. The prototype of the sensor consists of a laptop computer, an oscilloscope (20 kHz sampling rate), current transformers and a voltage transformer (Fig. 3). A high-pass filter selectively extracts thorn-like peaks from about 0.5 kHz to 10 kHz from the domestic power line.
Fig. 3. Prototype of the human behavior-monitoring sensor
3.1 Algorithm for Appliance Detection
Appliance detection is performed through the following three processes:
Process 1: Measurement of thorn-like peaks
Process 2: Calculation of the features of the peaks
Process 3: Identification of appliances from the features by pattern matching
Process 1: Measurement of Thorn-Like Peaks. Fig. 4 shows the waveforms of an induction cooker and an air conditioner (A/C). The high-pass filter extracts the feeble features (thorn-like peaks) of the higher harmonics by removing the strong component at the supply frequency.
Fig. 4. Example waveforms for an induction cooker and an air conditioner (a high-pass filter with 500 Hz cut-off frequency is inserted)
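A rough sketch of the Process 1 front end is shown below. The prototype uses an analog high-pass filter; here a digital Butterworth filter stands in for it, and the filter order and the synthetic test signal are illustrative assumptions only.

```python
# Sketch of Process 1: isolate the thorn-like peaks by removing the strong
# 50/60 Hz supply component with a high-pass filter (digital stand-in for the
# prototype's analog filter).
import numpy as np
from scipy.signal import butter, filtfilt

FS = 20_000        # oscilloscope sampling rate [Hz]
CUTOFF = 500       # high-pass cut-off [Hz], as in the prototype

def highpass_peaks(current, fs=FS, cutoff=CUTOFF, order=4):
    """Return the high-frequency residue (thorn-like peaks) of a current trace."""
    b, a = butter(order, cutoff / (fs / 2), btype='highpass')
    return filtfilt(b, a, current)

# Example on a synthetic 60 Hz current with a small spike in every half cycle
t = np.arange(0, 0.1, 1 / FS)
current = np.sin(2 * np.pi * 60 * t)
current[::FS // 120] += 0.5               # crude stand-in for an appliance's peaks
peaks_only = highpass_peaks(current)
```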
Process 2: Calculation of the Features of the Peaks. The wavelet transform method [6] is applied to calculate the features (frequency, level and position). MODWT (Maximal Overlap Discrete Wavelet Transform) [7] is utilized as the method of calculation because of the requirement to handle the discrete values produced by the prototype of the sensor. The wavelet transform resolves the frequency and position of a peak into multi-resolution level values as wavelet coefficients and scaling coefficients (Fig. 5).
Fig. 5. Results of MODWT for the induction cooker
For the moment, let us look closely at how the wavelet transform separates the peaks into features. The left side of Fig. 5 shows the electrical current of the induction cooker. The wavelet transform separates the peaks into different frequencies, shown at the right of Fig. 5: the upper right shows the high-frequency part of the peaks and the lower part shows the low-frequency part. A sharp peak appears in the higher-frequency portion and a blunt peak appears in the lower-frequency portion. The peaks are thus separated into different dimensions, called "level L". The wavelet coefficient X at level L is described as X[t, L] = W(x[t], L), where x[t] is the current at time t. Then, X[t, L] is binarized by a threshold value σ to identify the position of the peak:
$$
X[t,L,\sigma] = f(X[t,L],\sigma), \qquad
f(X[t,L],\sigma) =
\begin{cases}
1 & (X[t,L] \ge \sigma) \wedge (\sigma \ge 0) \\
0 & (X[t,L] < \sigma) \wedge (\sigma \ge 0) \\
1 & (X[t,L] \le \sigma) \wedge (\sigma < 0) \\
0 & (X[t,L] > \sigma) \wedge (\sigma < 0)
\end{cases}
$$
σ is varied over the range [-4.0, -2.0, -1.0, -0.5, -0.2, 0.2, 0.5, 1.0, 2.0, 4.0]. The range of the threshold value σ was determined empirically from a prior experiment. An example of the results for the induction cooker is shown in Fig. 6.
Fig. 6. Results of wavelet coefficients for the induction cooker, binarized with the ten thresholds
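The sketch below outlines Process 2 under the assumption that an undecimated wavelet transform is acceptable as a stand-in for MODWT: PyWavelets' stationary wavelet transform (pywt.swt) differs from MODWT only in scaling conventions. The wavelet choice, number of levels and array layout are illustrative.

```python
# Sketch of Process 2: compute undecimated wavelet detail coefficients of the
# filtered current and binarize them with the ten thresholds.
import numpy as np
import pywt

THRESHOLDS = [-4.0, -2.0, -1.0, -0.5, -0.2, 0.2, 0.5, 1.0, 2.0, 4.0]

def wavelet_levels(x, levels=4, wavelet='haar'):
    """Return detail coefficients X[t, L] for levels L = 1..levels.
    Note: pywt.swt requires len(x) to be a multiple of 2**levels."""
    coeffs = pywt.swt(np.asarray(x, dtype=float), wavelet, level=levels)
    # pywt.swt returns [(cA_n, cD_n), ..., (cA_1, cD_1)]; keep details, level 1 first
    return [cD for _, cD in reversed(coeffs)]

def binarize(detail, sigma):
    """f(X[t, L], sigma): 1 where the coefficient crosses the threshold sigma."""
    d = np.asarray(detail)
    return (d >= sigma).astype(int) if sigma >= 0 else (d <= sigma).astype(int)

def features(x):
    """Binary feature cube X[t, L, sigma], indexed here as [L][sigma][t]."""
    return [[binarize(d, s) for s in THRESHOLDS] for d in wavelet_levels(x)]
```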
Process 3: Identification of Appliances by Pattern Matching of the Features. The sensor performs pattern matching after the calculation of the features of the peaks. The features of the targeted appliances are stored in a prior learning step. Appliances are identified by pattern matching between the learnt features T_m[t, L, σ] and the measured features X[t, L, σ] (Fig. 7):

$$
M(X[t,L,\sigma], T_m[t,L,\sigma]) =
\begin{cases}
1 & \text{if } (T_m[t,L,\sigma] = 1) \wedge (X[t,L,\sigma] = T_m[t,L,\sigma]) \\
0 & \text{otherwise}
\end{cases}
$$
Fig. 7. Pattern matching
The above-mentioned matching calculation is repeated for all t and all threshold values σ. The number of coincidences h_m(L) is given by:
$$
h_m(L) = \sum_{\sigma} \left( \sum_{t} M(X[t,L,\sigma], T_m[t,L,\sigma]) \right) \tag{1}
$$
The total number used for the judgment, s_m(L), is calculated as:

$$
s_m(L) = \sum_{\sigma} \left( \sum_{t} T_m[t,L,\sigma] \right) \tag{2}
$$
The rate of coincidence p_m(L) is calculated for every wavelet level L:

$$
p_m(L) = \frac{h_m(L)}{s_m(L)} \tag{3}
$$
If p_m(L) is greater than or equal to the threshold λ_m(L) for all levels L, the appliance is regarded as "ON"; otherwise it is regarded as "OFF". The status S_m of appliance m is given by:

$$
S_m = g(p_m[L], \lambda_m[L]) =
\begin{cases}
\text{ON} & \text{if } \forall L,\ p_m[L] \ge \lambda_m[L] \\
\text{OFF} & \text{otherwise}
\end{cases} \tag{4}
$$
The value of λm is determined empirically from the prior experiment.
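A compact sketch of Process 3, i.e. Eqs. (1)-(4), is given below. It assumes the binary features are held as arrays of 0/1 indexed by level, threshold and time, as in the previous sketch, and that lambda_m holds one empirically chosen threshold per level; the function names are ours.

```python
# Sketch of Process 3: match measured binary features against a learnt template
# T_m and decide ON/OFF for appliance m according to Eqs. (1)-(4).
import numpy as np

def coincidence_rate(X, T_m):
    """p_m(L) = h_m(L) / s_m(L) for every wavelet level L (Eqs. 1-3)."""
    rates = []
    for X_L, T_L in zip(X, T_m):                 # iterate over levels
        X_L, T_L = np.asarray(X_L), np.asarray(T_L)
        h = np.sum((T_L == 1) & (X_L == T_L))    # Eq. (1): coincidences over sigma and t
        s = np.sum(T_L == 1)                     # Eq. (2): template points to match
        rates.append(h / s if s else 0.0)        # Eq. (3): rate of coincidence
    return rates

def appliance_status(X, T_m, lambda_m):
    """Eq. (4): 'ON' only if p_m(L) >= lambda_m(L) holds for every level L."""
    p = coincidence_rate(X, T_m)
    return 'ON' if all(pL >= lam for pL, lam in zip(p, lambda_m)) else 'OFF'
```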
4 Life Pattern Identification
The sensor identifies residents' behavior from the correlation and/or sequence of the residents' usage of appliances. It is assumed that the residents' behavior is related to specific appliances. For example, residents frequently use an induction cooker or a microwave oven when they prepare a meal, and use a vacuum cleaner when they clean their rooms.
4.1 Relation between Appliances and Human Behavior
The appliances used in each life event were surveyed with questionnaires given to 26 people, selected randomly from teenagers to 70-year-olds. In the questionnaire, respondents were asked about the typical events occurring in their daily life and the appliances they ordinarily use when carrying out each event. As a result, about 40 events and 30 appliances were elicited from the inquiries. The 40 events were classified into the 6 categories shown in Table 1. Table 1 shows the strength of the relation between life events and the use of appliances. From the viewpoint of distinguishing a particular life event from others, appliances like lighting are not significant, but appliances like the oven range, IH cooker, vacuum cleaner and washing machine are significant. Lighting is used in all kinds of life events, so it cannot be connected to a particular life event. In this paper, we select four kinds of appliances, the air conditioner, oven range, IH cooker and vacuum cleaner, to identify the following significant events: sleep, meal and cleaning. These events are significant for detecting anomalies in the behavior of residents and identifying abnormal conditions of health.
Table 1. Ratio (%) of appliance use for each life event category

Category / Event   Light  Air cond.  Television  Oven Range  IH Cooker  Rice Cooker  Vac. Cleaner  Washing Mach.  Dryer  Ventilator
1 Sleep            85.4   56.3       40          4.2         2.1        0            0             2.1            0      2.1
2 Meal             60.6   28.9       31.5        52.6        60.5       44.7         0             0              0      23.4
3 Bath/Wash        78.9   6.1        6.1         0           0          0            0             0              36.4   24.2
4 Washing          28.6   0          0           0           0          0            0             92.9           0      0
5 Cleaning         41.7   0          0           0           0          0            100           0              0      0
6 Others           71.2   37.3       25.4        1.7         0          0            0             1.7            0      1.7
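To make the idea concrete, the sketch below maps hourly ON/OFF logs of the selected appliances onto the three significant life events. The rules simply compress Table 1 into indicator appliances and are an illustrative assumption, not the authors' exact procedure.

```python
# Sketch of the life-pattern identification step: infer life events from
# hourly appliance-detection logs (cooking appliances -> meal, vacuum cleaner
# -> cleaning, appliance-quiet night hours -> sleep candidate).

def infer_events(hourly_log):
    """hourly_log: dict mapping hour (0-23) -> set of appliances detected ON."""
    events = {}
    for hour, on in hourly_log.items():
        if {'induction cooker', 'microwave oven'} & on:
            events[hour] = 'meal'
        elif 'vacuum cleaner' in on:
            events[hour] = 'cleaning'
        elif not on and 0 <= hour <= 5:
            events[hour] = 'sleep (candidate)'
    return events

# Tiny example log, loosely mirroring the weekday pattern of Fig. 10
log = {22: {'A/C', 'induction cooker'}, 23: {'A/C'}, 2: set(),
       7: {'microwave oven'}, 10: {'vacuum cleaner'}}
print(infer_events(log))
```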
5 Evaluation of the Sensor
We evaluated the sensor for a couple of weeks in five residences (F1-F5 in Fig. 8). The sensor learned the current peaks of four target appliances: vacuum cleaner, induction cooker, microwave oven and air conditioner (A/C). The accuracy of appliance detection was evaluated in two cases: single operation of each appliance (unit test) and multiple operations of appliances (combination test) in residences F1-F5. In Fig. 8, the upper left graph shows the schedule of the field test for the five residences.
Fig. 8. Field test for the non-intrusive human behavior-monitoring sensor
5.1 Unit Test
The sensor was attached to the distribution board in each of the five houses (Fig. 8). The four appliances were installed dispersedly in each house. There were some other appliances (e.g. lighting, a refrigerator and a television) besides the four target appliances in each house. Each appliance was turned on and off 25 times for the unit test, and we evaluated whether the sensor detected the correct appliance or not. As a result, the sensor showed 100% accuracy (Table 2).
Table 2. Results of the unit and combination tests
5.2 Combination Test
The appliances were turned on and off according to the scenarios shown in Fig. 9 to evaluate the case of multiple appliances working in the five houses. As a result, the sensor showed 95-99% accuracy in the five fields (Table 2). We confirmed that the sensor has sufficient capability for detecting appliances both for single operation and for multiple operations of appliances. Fig. 9 shows examples of the results in field 1 and field 2. There were a few errors in detecting appliances in the combination evaluation test, and the reasons for these errors were examined in detail as follows. One error happened with the air conditioner (A/C): the A/C requires a few minutes of stand-by before starting, and the electrical current of the A/C in stand-by mode is too small for the sensor to detect. The other error happened in detecting the induction cooker: the A/C generates strong, wide-spectrum noise when it starts after stand-by mode, so the sensor misdetected the induction cooker while the A/C was starting up. However, the detection error was recovered shortly after the A/C reached stable operation. This redundancy is an advantage of our proposed algorithm.
Fig. 9. Scenario of the combination test
5.3 Life Pattern Monitoring
The sensor monitored the life pattern for a week in each residence. Interviews were conducted with the residents during the field test to grasp their life patterns concurrently. Fig. 10 shows examples of the results of life pattern monitoring for both a weekday and a weekend in field 2 and field 3.
Fig. 10. Results of the field test
Significant events that identify the life patterns, such as sleep and meal, are identified from the results. The differences in the life pattern between weekday and weekend are obvious: both events occur later on the weekend than on the weekday. These results were checked against the results of the interviews with each resident. Thus, we confirmed that the sensor succeeded in identifying the life patterns of sleep, meal and cleaning by utilizing the usage history of the four kinds of appliances.
6 Conclusion
This paper describes a non-intrusive human-behavior monitoring sensor for a health care system. The sensor detects the operation of appliances from the electrical current generated by their operation, and identifies life patterns of the residents' daily behavior based on the correlation and/or sequence of operating appliances. The sensor reduces system cost by avoiding the installation of massive numbers of sensors and keeps the residents' privacy without intruding into the house. We focused on the thorn-like peaks of the electrical current for detecting electrical appliances. The sensor was implemented with an algorithm based on the wavelet transform method and installed in five real houses for a week. The accuracy of appliance detection and of life pattern identification was evaluated through the field test.
As a result, the following were confirmed: (1) the sensor achieved 100% accuracy in the unit test and higher than 95% accuracy in the combination test; (2) the life pattern of the resident is identified from the log of the appliances. There is room for further investigation in this study. For example, we should increase the number of target appliances to identify the life pattern in more detail, and should install the sensor in various homes for a long time. Concurrently, we are starting to implement the sensor on low-cost embedded hardware to accelerate the penetration of the system into practical use (Fig. 11). In the near future, we will install the sensor into residences of elderly people and try to identify anomalies in their life patterns.
Fig. 11. The human behavior-monitoring sensor on the embedded hardware
References 1. Ministry of Health, Labour and Welfare, http://www.mhlw.go.jp/wp/hakusyo/kousei/08/dl/04.pdf 2. Ministry of Health, Labour and Welfare, http://www.mhlw.go.jp/toukei/ saikin/hw/k-tyosa/k-tyosa05/1-1.html 3. Iketani, K., et al.: The detection situation of the elderly people by infrared sensor, http://www.sen.jst.go.jp/result/result_h17/Sato.html 4. Matsuoka, K.: Aware home understanding life activities. In: Proceedings of 2nd International Conference On Smart Homes and Health Telematic, ICOST 2004, pp. 186– 193. IOS press, Amsterdam (2004) 5. Onoda, T., Murata, H., Ratsch, G., Muller, K.-R.: Experimental analysis of support vector machines with different kernels based on non-intrusive monitoring data Neural Networks. In: IJCNN 2002, vol. 3, pp. 2186–2191 (2002) 6. Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial Mathematics (1992) 7. Percival, D.B., Walden, A.T.: Wavelet Methods for Time Series Analysis. Cambridge University Press, Cambridge (2000) 8. Cristianini, N., Shawe-Taylor, J.: An introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press, Cambridge (2000) 9. Hart, G.W.: Residential energy monitoring and computerized surveillance viautility power flows. IEEE Technology and Society Magazine 8(2), 12–16 (1989) 10. Patel, S.N., et al.: Detecting Human Movement by Differential Air Pressure Sensing in HVAC System Ductwork: An Exploration in Infrastructure Mediated Sensing. In: Indulska, J., Patterson, D.J., Rodden, T., Ott, M. (eds.) PERVASIVE 2008. LNCS, vol. 5013, pp. 1–18. Springer, Heidelberg (2008) 11. Patel, S.N., et al.: At the Flick of a Switch: Detecting and Classifying Unique Electrical Events on the Residential Power Line. In: Krumm, J., Abowd, G.D., Seneviratne, A., Strang, T. (eds.) UbiComp 2007. LNCS, vol. 4717, pp. 271–288. Springer, Heidelberg (2007)
Impact of Healthcare Information Technology Systems on Patient Safety
Byung Cheol Lee1 and Vincent G. Duffy1,2,3
1 School of Industrial Engineering, 2 Regenstrief Center for Healthcare Engineering, 3 Agricultural & Biological Engineering, Purdue University, 315 N Grant St, West Lafayette, Indiana, 47906, USA
[email protected], [email protected]
Abstract. Even though healthcare information systems have been introduced as a viable solution for reducing adverse drug events and medical errors, the current adoption rate is low and the impact of such systems on patient safety and quality of care is not well established. To address this problem, a new research framework with interdisciplinary approaches is suggested. The framework is based on two major characteristics of a healthcare IT system: effectiveness and efficiency. The former relates to patient safety and quality of care, and the latter is related to the resources and design of the system. The framework is mainly grounded in human factors engineering, and includes psychology, systems and safety engineering, and an information systems approach. Keywords: Healthcare information system, Patient safety, Interdisciplinary approach, Healthcare research framework.
1 Introduction
Since the Institute of Medicine reported the striking result that medical errors contribute to between 44,000 and 98,000 deaths per year, concerns about healthcare quality have significantly increased, and this has stimulated efforts to reduce medical errors and adverse drug events [1]. In spite of these efforts, four percent of all patients still experience iatrogenic injuries due to medication errors, which result in increased length of stay in hospitals [2]. Information technology (IT) was introduced as one of the solutions, and many studies have been conducted on the benefits and usefulness of IT systems [3-5]. However, some side effects and negative outcomes of IT systems have also been reported [4,6,7]. These suggest that not only the system itself but also its implementation process and environment play a significant role in achieving the goals of a healthcare IT system. A majority of related research studies have been mainly result-oriented, without considering much about the system process or human-system interaction.
The successful implementation of a healthcare IT system should consider various factors, and it is difficult to harmonize them with one single approach. Thus, an interdisciplinary approach, which may include human factors engineering, systems and safety engineering, psychology and information systems research, is required to analyze the impact of the system on patient safety [8-10]. The objective of this study is to suggest a new research framework for the implementation of a healthcare IT system to improve patient safety and quality of care.
1.1 Characteristics of the Healthcare Industry
Generally, successful implementation of a new IT system in health care requires consideration of the following characteristics of the healthcare industry. First, the healthcare industry has a fragmented nature: its facilities are geographically scattered, and clinical procedures consist of various independent processes. Second, it has a large volume of transactions; considering the number of patients in hospitals and their treatments, we can easily imagine the volume of transactions. Third, current hospitals are pursuing "evidence-based practice", which is the integration of the best practice evidence with clinical expertise [11].
1.2 Barriers to Implementation of Healthcare IT Systems
Despite such potential benefits, the adoption rate of healthcare IT systems in the U.S. is fairly low. According to a recent national survey of physicians about electronic health record (EHR) systems, 83% of survey respondents did not have such a system and only 4% reported having a fully functioning system [12]. The low adoption rate can be partially explained by several barriers to implementation [3]. First, a redesign of the workflow based on the new system is required, and it should be done incrementally to minimize reluctance to adopt. A second problem is customization: one commercial system cannot be a panacea for all healthcare facilities, and implementation of a healthcare IT system requires a customization process for each environment. Third, lack of financial incentive is another barrier. One study reveals that a military hospital needs at least $3.5 million for a Computerized Physician Order Entry (CPOE) system installation [13]. Accordingly, support from state or federal government is required to increase the adoption rate. A fourth barrier is the cost reimbursement structure for investment. Unlike other industries, the healthcare industry has a structure in which the investor (hospital) and the beneficiary (patient) differ. This unique structure means it takes a long time to recover the investment and hinders the adoption of new systems. As a minor factor, data entry requirements for physicians and high-level nurses can also lead to some reluctance to implement new systems.
2 Theoretical Basis Some human factors engineers have tried to determine the impact of healthcare IT in several ways [14-16]. To provide systematic structure of technology adoption, attitude
theory from psychology can be considered. Among many attitude theories, the Technology Acceptance Model (TAM) is initially chosen because of its simplicity and applicability to information technology systems [17]. 2.1 Technology Acceptance Model (TAM) TAM suggests that users’ decisions about acceptance of new technology or systems are dependent on a number of factors [18]. Among them, two most important factors are “Perceived usefulness” (PU) and “Perceived ease-of-use” (PEOU). The relationship between major constructs is shown in following figure [19].
Fig. 1. Technology Acceptance Model [17]
However, clear-cut definitions of PU and PEOU are difficult to establish, and this confusion results in an ambiguous relationship between them and other impacting factors [15]. Other limitations of TAM are its incompleteness and lack of practical application as a fundamental theory [20-22]. Despite those problems, TAM has had a huge impact on the development of information systems research. It also provides an integrated theoretical basis as a compact and simple model [23]. 2.2 Effectiveness vs. Efficiency of a Healthcare IT System Effectiveness and efficiency are two important measures to evaluate the impact of and satisfaction with a new system, and are directly related to the adoption of the system. Conceptually, effectiveness in healthcare can be explained by quality of care, and it can be measured by the number of medical errors. Efficiency is closely related to practitioners' resources and utility in the hospital, and to the response time of the system. Considering TAM, PU and PEOU can be mapped to effectiveness and efficiency, respectively. 2.3 System and Safety Engineering Approaches The publications from the National Academy of Engineering (NAE) and the Institute of Medicine highly recommend systems engineering approaches in healthcare delivery
in order to improve quality [24]. The systems engineering approach is concerned with the coordination, synchronization and integration of various components of complex systems using mathematical modeling and various analysis techniques [25]. There are many systems approaches that can be applied to the healthcare domain; patient-staff-machine interaction models and medical work process models are some examples [26]. Another is System-of-Systems (SoS). SoS provides a systematic perspective on problem definition and is effective when a system has a multi-hierarchical structure [27]. The other engineering approach that has potential to be utilized in the healthcare domain is safety engineering. Two safety-engineering tools that can be valuable in improving patient safety are Root Cause Analysis (RCA) and Failure Mode and Effects Analysis (FMEA) [28]. Another useful tool from safety engineering is Human Reliability Analysis (HRA). This technique was developed for the nuclear power industry to minimize the possibility of human errors [29].
3 Hypothesis There are two hypotheses to be tested in this study. 1. Effectiveness is connected to “Perceived usefulness” in TAM, and it also impacts patient safety and quality of care in healthcare setting. The first hypothesis is about structure between the concept, measurement, and perspective. According to the definition of “Perceived usefulness”, effectiveness is a corresponding concept in the healthcare domain. Specifically, as mentioned earlier, the concept of effectiveness and “Perceived usefulness” can be applied to patient safety and quality of care and can be measured indirectly by the number of errors or Adverse Drug Events (ADE’s). As these concepts and outcomes directly affect patients, effectiveness can be related to patient perspectives of healthcare IT impact. 2. Efficiency is closely related to “Perceived ease of use” in TAM and it affects workflow and workload. The second hypothesis has a similar structure to the aforementioned one. “Perceived ease of use” can be measured indirectly by time, and other resource consumption of workflow or amount of workload. Well-established Human Computer Interaction (HCI) methodologies are possible tools for quantitative and qualitative measurement for this trait. These have potential to significantly impact healthcare practitioners’ performance.
4 Results The following figure describes the research framework for impact on the healthcare system and includes a patient safety perspective.
Fig. 2. Research framework
The basic scheme of Figure 1 is modified for application in a healthcare IT environment. Healthcare IT systems have two objectives, effectiveness and efficiency, which are common to other IT systems and are shown in Figure 2; they are related to "Perceived usefulness" and "Perceived ease of use" in TAM, as explained in the previous section. Achieving a high level of "Perceived usefulness" is a primary purpose of any IT system implementation. To measure the usefulness, both qualitative and quantitative approaches can be applied. Qualitative assessment can be conducted by interviewing patients about their experience with the IT system. A quantitative measurement can be achieved using systems and safety engineering approaches. For example, based on a task analysis of the workflow, possible medical/medication errors can be categorized, and simulation tools can provide the distribution and pattern of each distinct error category [30]. The result will be fed back into the design stage of IT system development. The lower part of Figure 2 concerns efficiency. Although efficiency can be related to expectations, usability testing can be used for some efficiency measurements. Other than usability, the task fit of the work process and log-on information from the systems would be possible alternatives for testing the impact of the efficiency of healthcare IT from a user's perspective.
5 Conclusion A research framework for impact of Information Technology implementation on healthcare domain is developed. The framework is based on interdisciplinary approaches including systems and safety engineering, human factors engineering, psychology and information system research techniques. The analysis of relationships
and mutual impacts between components may not only provide optimized conditions for successful system implementation and practical strategy to boost the adoption rate but also significantly contribute to improving patient safety and quality of care. As further research, empirical experiments are needed to validate the suggested research framework. Possibly, experiments can be divided into two parts. The first is related to measuring effectiveness of the IT system. Through an analysis of specific IT system data and healthcare practitioners’ work processes, a matrix can be developed and error profiles can be established. The other approach is measuring efficiency. Duration of each work process is major component for evaluating efficiency for work process and system. Based on this information, work and system efficiency can be measured and the result may be significantly correlated with the error distribution or profile in certain categories.
References 1. Kohn, L.T., Corrigan, J.M., Donaldson, M.S.: To Err Is Human: Building a Safer Health System. Institute of Medicine Committee on Quality of Health Care in America. National Academy Press, Washington (1999) 2. Cohen, M.: Medication errors. American Pharmacists Association, Washington, DC (2007) 3. Doolan, D.F., Bates, D.W.: Computerized Physician Order Entry Systems In Hospitals: Mandates and Incentives. Health Affairs 21(4), 180–188 (2002) 4. Weiner, M., Gress, T., Thiemann, D.R., Jenckes, M., Reel, S.L., Mandell, S.F., Bass, E.B.: Contrasting Views of Physicians and Nurses about an Inpatient Computer-based Provider Order-entry System. J. AM. Med. Inform. Assoc. 6, 234–244 (1999) 5. Koppel, R., Metlay, J.P., Cohen, A., Abaluck, B., Localio, A.R., Kimmel, S.E., Strom, B.L.: Role of Computerized Physician Order Entry Systems in Facilitating Medication Errors. JAMA 293(10), 1197–1203 (2005) 6. Han, Y.Y., Carcilo, J.A., Venkataraman, S., Clark, R.B., Watson, R.S., Nguyen, T.C., Bayir, H., Orr, R.A.: Unexpected Increased Mortality After Implementation of a Commercially Sold Computerized Physician Order Entry System. Pediatrics 116(6), 1506– 1512 (2005) 7. Ash, J.S., Berg, M., Coiera, E.: Some Unintended Consequences of Information Technology in Health Care: the Nature of Patient Care Information System-related Errors. J. Am. Med. Inform. Assoc. 11, 104–112 (2004) 8. Duffy, V.G.: Improving Efficiencies and Patient Safety in Healthcare through Human factors and Ergonomics (working paper) 9. Cacciabue, P.C., Vella, G.: Human Factors Engineering in Healthcare Systems: The Problem of Human Error and Accident Management. International Journal of Medical Informatics (2008) (in press) 10. Saleem, J.J., Patterson, E.S., Militello, L., Render, M.L., Orshansky, G., Asch, S.M.: Exploring Barriers and Facilitators to the Use of Computerized Clinical Reminders. J. of Medical Inform. Assoc. 12(4), 438–447 (2005) 11. Hancoch, W.M.: Hospital Systems: Impacts on Cost and Quality, http://www.purdue.edu/dp/rche/events.php 12. DeRoches, C.M., Campbell, E.G., Rao, S.R., Donelan, K., Ferris, T.G., Jha, A., Kaushal, R., Levy, D.E., Rosenbaum, S., Shields, A.E., Blumenthal, D.: Electronic Health Records in Ambulatory Care – A National Survey of Physician. N. Engl. J. Med. 359(1), 50–60 (2008)
13. Wilson, J.P., Bulatao, P.T., Rascati, K.L.: Satisfaction with a Computerized Practitioner Order Entry System at Two Military Health Care Facilities. AM. J. Health-Syst. Pharm. 57, 2188–2195 (2000) 14. Karsh, B.T.: Beyond Usability: Designing Effective Technology Implementation Systems to Promote Patient Safety. Qual. Saf. Health Care 13, 388–394 (2004) 15. Karsh, B.T., Beasley, J.W., Hagenauer, M.E.: Are Electronic Medical Records Associated with Improved Perceptions of the Quality of Medical Records, Working Conditions, or Quality of Working Life? Behavior & Information Technology 23(5), 327–335 (2004) 16. Wakefield, D.S., Halbesleban, J.R., Ward, M.M., Qiu, Q., Brokel, J., Crandall, D.: Development of a Measure of Clinical Information Systems Expectations and Experiences. Medical Care 45(9), 884–890 (2007) 17. Venkatesh, V., Davis, F.D.: A Theoretical Extension of the Technology Acceptance Model: Four Longitudinal Field Studies. Management science 46(2), 186–204 (2000) 18. Davis, F.D.: User Acceptance of Information Technology: System Characteristics, User Perceptions and Behavioral Impacts. Int. J. Man-Machine Studies 38, 475–487 (1993) 19. Davis, F.D.: Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly 13, 319–340 (1989) 20. Benbasat, I., Braki, H.: Quo Vadis, TAM? Journal of the Association for Information System 8(4), 211–218 (2007) 21. Goodhue, D.L.: Comment on Benbasat and Barki’s “Quo Vadis TAM” article. Journal of the Association for Information System 8(4), 219–222 (2007) 22. Bagozzi, R.P.: The Legacy of the Technology Acceptance Model and a Proposal for a Paradigm Shift. Journal of the Association for Information System 8(4), 244–254 (2007) 23. Venkatesh, V., Davis, F.D., Morris, M.G.: Dead or Alive? The Development, Trajectory and Future of Technology Adoption Research. Journal of the Association for Information System 8(4), 267–286 (2007) 24. Reid, P.P., Compton, W.D., Grossman, J.H., Fangjiang, G.: Building a Better Delivery System. National Academy Press, Washington (2005) 25. Kossiakoff, A., Sweet, W.: System Engineering Principles and Practice. Wiley, New York (2003) 26. Carayon, P., Friesdorf, W.: Human Factors and Ergonomics in Medicine. In: Salvendy, G. (ed.) Handbook of Human Factors and Ergonomics, pp. 1517–1537. John Wiley & Sons, Inc, New York (2005) 27. DeLaurentis, D.A., Crossley, W.A.: A Taxonomy Based Perspective for Systems of System Design Methods. In: Conference Proceedings for IEEE International Conference on systems, Man and Cybernetics, vol. 1, pp. 86–91 (2005) 28. Senders, J.W.: FMEA and RCA: the Mantras of Modern Risk Management. Qual. Saf. Health Care 13, 249–250 (2004) 29. Yang, C., Lin, C.J., Jou, Y., Yenn, T.: A Review of Current Human Reliability Assessment Methods Utilized in High Hazard Human-system Interface Deigns. In: Harris, D. (ed.) HCII 2007 and EPCE 2007. LNCS (LNAI), vol. 4562, pp. 212–221. Springer, Heidelberg (2007) 30. Macal, M.C., North, J.M.: Tutorial on Agent-based Modeling and Simulation Part 2: How to Model with Agents. In: Proceedings of the 2006 winter Simulation Conference, pp. 73–83 (2006)
Patient Standardization Identification as a Healthcare Issue
Mario Macedo1 and Pedro Isaías2
1 IPT-Escola Superior Tecnologia Abrantes, Rua 17 de Agosto de 1808, 2200-370 Abrantes, Portugal
[email protected]
2 Universidade Aberta, Rua da Escola Politécnica, 141-147, 1269-001 Lisbon, Portugal
[email protected]
Abstract. Healthcare organizations use information systems with several different types of data and user interfaces. The lack of standardization means a loss of efficiency and effectiveness and limits the expected quality of healthcare services. Some of the difficulties of this standardization are known; however, there are models that can respond to the complexity of this area of science and evolve with the development of knowledge. One problem common to several organizations is the lack of automatic identification of patients. Another is how to deal with information duplicated in different databases. The purpose of this paper is to show the importance of the standardization of clinical data and of the development of unique models of identification that will enable unique access keys to be set and all clinical data to be interconnected. The empowerment of systems that support clinical decisions and the use of workflows for treatment plans that involve more than one healthcare organization will only be possible if they use standard models, open technologies and unique patient identification. Keywords: EHR, Medical Guidelines, Healthcare Plan Workflow.
1 Background The prime objective of having a unique ID for identifying a patient and accessing his/her clinical data is to avoid clinical records becoming sidelined and to ensure the correct correlation of each individual's data. Each individual's historical records, along with those of his/her forebears, constitute essential background information for the evaluation of his/her state of health and the likelihood of future pathologies. The storage, integration and standardization of clinical data also make it possible to provide personalized healthcare. The supply of personalized clinical data makes it possible to make more accurate diagnoses and to prescribe the treatment most suitable for each pathology and each individual. In order to assist with diagnosis it is possible to develop systems to assist in clinical decisions. There exist three levels of system to assist in clinical decisions.
According to the HL7 CDS Project Update (2008) [1], those levels are information, rules and computer-interpretable guidelines. At the information level only information is provided. At the rules level alarms, data interchange and data validation become available. In addition, according to Shabo (2005) [2], having the possibility to include genetic data in the electronic record of clinical data for each patient increases the amount of knowledge on which to base healthcare provision decisions. According to the same author, Shabo (2007) [3], there are three essential hurdles in the way of complete recording of all of a patient's clinical data:
− Because of data protection legislation, each hospital generates its own policies for data security and filing methods compatible with preserving the privacy and confidentiality of clinical information. For this reason it is impossible for a patient who attends different hospitals on different days to have all his/her data integrated.
− Another hurdle is a time-based one. An individual's average life span is far greater than the maximum time for which data is usually stored. So, if an individual lives for 70 years, it is very unlikely that a hospital will be able to keep records for that long.
− Another hurdle derives from the fact that, even if clinical terminology were fully standardized between the various health care units, it would be extremely difficult to maintain semantic compatibility over a period of several years, because the terminology itself is also in permanent evolution.
To those three hurdles can be added the question of genetic data, which has evolved in structure and complexity as science itself has evolved. Those hurdles aside, it is self-evident that genetic information needs to be included in the electronic record of clinical data. The SNOMED standard already includes genetic terminology, thus opening the door to the creation of genetic data archetypes. For the HL7 standard a working group was formed to develop a limited model for the storage of chromosome data. That data is referenced by a set of metadata stored in a RIM platform (Reference Information Model). This model is still used in only a limited fashion to communicate data between hospitals and the pharmaceutical industry. OpenEHR works in this area but no defined genotype model as yet exists. Meanwhile, another question has to be raised. If clinical data need to be kept for a long time, and if they need to retain all data concerning genetics, pathologies and treatment given to every individual, what will the storage infrastructure need to be like? There will have to exist either distributed databases or clinical data banks; this is the FEHR (Federation of Electronic Health Records) concept. The domain of the data is another highly important aspect. What will be the nature and type of data that constitute the identification of an individual, and what data quality frameworks will need to be put in place? Some types of data can identify a specific individual unequivocally, whereas other data are secondary or of less importance. The characterization and definition of such models is rather complex.
Access to clinical data is limited to specific users. There will be a need for various levels of access interconnected with temporal windows. Access to personalized data will be available only for the purpose of providing care to the patient. Another consideration is whether only public entities shall have access to a patient's data or whether, on the contrary, private entities will also have access to these data. The identification of those users permitted access to clinical data needs to be protected with secure authentication, and must in no way permit one user's identification to be used by any other person. In addition, access to the system by non-identified users should not be possible. Legal protection relating to the use and communication of clinical data needs to prevent unauthorized use and the transfer of data to third parties. As an example, let us examine the case of prescriptions to each individual. From the medication prescribed it will be possible to deduce each individual's pathologies and their frequency of occurrence. Is this information, which is available to pharmacies, actually protected? The storage systems for each individual citizen's identification are also extremely significant in relation to the architecture of the entire system. Clearly, each patient's identification will need to be stored in a central database available to all players in the health system. However, if there are public entities, private entities and other entitled entities, what will need to be the nature of the central file identifying all users? There are writers who argue that clinical data should be de-identified. What this means is that, after being used in a medical episode, the data should be detached from the individual identification of each person. But how and where would this function be carried out? In the event of it being necessary to access the patient's historical data again, what should the re-personalization process be like?
2 Security Security has several dimensions, such as privacy and confidentiality, identity verification, user identification and authentication. These concepts can have different meanings.
Privacy. According to Kent (2002) [4], privacy is the right of an individual to decide for himself or herself when and on what terms his or her attributes should be revealed. According to the Department of Health (2007) [5], patient information is generally held under legal and ethical obligations of confidentiality; information provided in confidence should not be used or disclosed in a form that might identify a patient without his or her consent. There are a number of important exceptions to this rule, but it applies in most circumstances.
Identity. According to Kent (2002) [4], the identity of X according to Y is a set of statements believed by Y to be true about X.
According to the Department of Health (2007) [5], patient-identifiable information includes name, address, full post code, date of birth, pictures, photographs, video, images, NHS number and anything else that may be used to identify a patient directly or indirectly.
Identification. According to Kent (2002) [4], identification is the process of determining to what identity a particular individual corresponds. According to the United Kingdom Parliament (n.d.) [6], citing the Data Protection Act 1998, personal data is defined as: "Data which relate to a living individual who can be identified from those data, […]"
Authentication. According to Kent (2002) [4], authentication is the process of confirming an asserted identity.
Patient identification and data archives should be compliant with all of these issues. Our proposed model for a Federation of Electronic Health Records should include the features necessary to address them.
3 The Patient Identification Domain For any individual there exist several possible IDs: for example, the NHS, Medicare or healthcare number, identity card, passport number, driving licence number, Inland Revenue or IRS number, or even just a number generated for the specific purpose. There are, however, some considerations to be taken into account. The first is that not all of the above IDs are available at the time of the individual's birth. For this reason, only a code generated for each individual will act continuously and without fail throughout an individual's life. The genetic code is, a priori, an element unique to, and permanently present in, every individual. The principal advantage of using the DNA code as a key to access each individual's clinical data is that it is unique and works across all existing systems. In addition, analysis of gene mutations can help in the identification of pathologies or the likelihood of pathologies occurring. For these reasons, the use of genetic data to assist in clinical decisions is of the utmost importance. The HL7 organization has introduced a standard called Clinical Genomes Level 7 (Clinical Genomics, 2009) [7]. The model put forward by HL7 includes a layer of associations between genotype and phenotype, entitled the Clinical Genomics Standard. The models for recording genetic data are somewhat more complicated than the archetypes for recording other clinical data. The main reasons for this are:
− The quantity of data
− The complexity of representing the DNA molecule and its variants
− The semantic transcription of the genotype/phenotype association.
Accessibility to genetic data even makes it possible to develop genomic-oriented applications to assist in clinical decisions. These applications can possess parsers for identifying sequences of significant genes for any study taking place. The use of DNA data in the electronic records of clinical data represents an unprecedented advance in medicine and in the provision of medical care. It will be possible not only to identify patients unequivocally and access their entire history but also to take preventative action. It is even possible to observe genetic changes through systems based on artificial intelligence. According to Marko (2005) [8], the challenges of creating an EHR that integrates an organization's clinical record system with a biorepository and a genomic information system involve complex organizational, social, political and ethical issues that must be resolved. In fact, if, on the one hand, it is going to be possible to analyze the likelihood of a patient succumbing to a particular illness, on the other hand that patient's privacy must be guaranteed lest society discriminate against certain individuals. According to Nakaya (2007) [9], the elemental techniques of the data collection platform are the information model, the ontology and the data format. According to this author, the Genomic Sequence Variation Markup Language (GSVML) is a markup language and data exchange format for genomic sequence variation data, intended mainly for use in human health. This norm should become a standard in the near future.
4 Proposed Technologies The proposed model uses some technologies that should be compliant with standards and industry best practices.
Communications. The IETF (Internet Engineering Task Force) (n.d.) [10] develops norms and standards for communication on the Internet. The standardization documents are designated as RFCs (Requests for Comments). RFC 2821 defines SMTP (Simple Mail Transfer Protocol) and RFC 2616 defines HTTP (Hypertext Transfer Protocol). RFC 3335 specifies how EDI (Electronic Data Interchange) messages can be transmitted securely via a peer-to-peer link. This standard, in addition, ensures communication of messages according to the protocols for Electronic Data Interchange (EDI – either the American Standards Committee X12 or UN/EDIFACT, Electronic Data Interchange for Administration, Commerce and Transport), XML or other data used for business-to-business data interchange (Request for Comments 3335, Network Working Group, 2002) [11]. The standard specifies several messages, such as the format of the message delivery receipt with or without digital signature, the non-repudiation-of-receipt message, the format of the message envelope (MIME) with or without signature, and the body of the EDI message with or without cryptography. Using this technology it is possible to define a peer-to-peer archetype communication relationship. These archetypes can contain the clinical data necessary for the EHR.
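As an illustration of this communication layer, the following is a minimal sketch, not taken from the cited standards, of how an XML archetype payload could be wrapped in a MIME envelope for SMTP-based exchange, using Python's standard email package. The addresses, file name and payload are invented, and a real AS1/AS2 exchange would additionally apply an S/MIME digital signature and encryption rather than the simple digest header used here.

```python
from email.message import EmailMessage
from email.utils import make_msgid
import hashlib

# Hypothetical XML archetype instance carrying clinical data (illustrative only).
payload = b"<composition><ingredient>etilefrine hydrochloride</ingredient></composition>"

msg = EmailMessage()
msg["From"] = "his@hospital-a.example"        # sending system (invented address)
msg["To"] = "fehr@master-index.example"       # receiving ID FEHR node (invented address)
msg["Subject"] = "AS1 clinical-data interchange"
msg["Message-ID"] = make_msgid()

# Attach the XML payload as an application/xml MIME part.
msg.add_attachment(payload, maintype="application", subtype="xml",
                   filename="episode-archetype.xml")

# A real AS1 exchange would add an S/MIME signature and encryption;
# here a digest header merely stands in for the integrity check.
msg["X-Content-SHA256"] = hashlib.sha256(payload).hexdigest()

print(msg.as_string()[:200])  # the MIME envelope ready for SMTP transport
```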
Archetypes. The word "archetype" comes from Greek and means "original pattern". According to Soley (2004) [12], an archetype is a primordial thing or circumstance that recurs consistently and is thought to be a universal concept or situation. The concept of "archetype" defined in this way makes it possible to define business objects suitable for any and every activity. These business objects can be any kind of data model stereotype. Object-oriented (OO) information technology reflects the archetype application domain. In this way, an archetype model can be constructed and then applied to cases with real data. The archetypes define, for each type of data, the various possible dimensions and methods available. Archetypes can even contain rules for coherence and inter-association. Archetypes also have the property of pleomorphism, which enables different instances of each archetype to be created. Archetype models are specified in the UML (Unified Modeling Language) (2009) [13], for which several modeling tools exist. Some of these tools even enable UML models to be transposed into physical models. Archetype patterns can also be defined. An archetype pattern contains optional elements that can be implemented or not implemented. The name "pattern configuration" is attributed to each instance of an archetype pattern. Both well-formed and ill-formed configurations can exist. In order to avoid ill-formed configurations, a set of rules has been created, to which the name "Pattern Configuration Rules" has been applied. According to Soley (2004) [12], a Pattern Configuration Rule is a formal language for expressing the rules for a well-formed pattern configuration. Some party archetype patterns are standardized. For instance, ISO 3166 contains country codes and country names, and ISO 5218 contains a representation of the human sexes. In the health area there exist two different approaches to information system architectures, HL7 (Health Level 7) (2009) [14] and OpenEHR (2009) [15]. Both approaches present a model designed for object programming and a reference model. OpenEHR also puts forward a language called the "Archetype Definition Language" for defining archetype models.
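To make the archetype idea concrete, the following is a minimal sketch, not drawn from HL7 or OpenEHR, of a party archetype with mandatory and optional elements and ISO-coded fields. The class name, fields and well-formedness rule are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative archetype pattern for a "party" (person) record. Optional elements
# model the idea that a pattern configuration may or may not implement them.
@dataclass
class PartyArchetype:
    family_name: str
    given_name: str
    country_code: str                      # ISO 3166 alpha-2 code, e.g. "PT"
    sex_code: int                          # ISO 5218: 0=not known, 1=male, 2=female, 9=n/a
    date_of_birth: Optional[str] = None    # optional element of the pattern
    nhs_number: Optional[str] = None       # one of several possible identification keys

    def is_well_formed(self) -> bool:
        """A toy 'pattern configuration rule': mandatory elements must be present
        and coded fields must use their ISO value sets."""
        return (bool(self.family_name) and bool(self.given_name)
                and len(self.country_code) == 2
                and self.sex_code in (0, 1, 2, 9))

patient = PartyArchetype("Silva", "Ana", country_code="PT", sex_code=2)
print(patient.is_well_formed())   # True for this configuration
```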
5 Proposed Model The correct registration, treatment and integration of clinical data are of utmost importance for the provision of health care. Integration of clinical data makes it possible to monitor public health indicators and to carry out epidemiological research and scientific investigation. It is of the greatest importance to develop systems that enable patients to be treated collaboratively and that simultaneously provide data for other levels of tactical, strategic and scientific management.
Fig. 1. Benefits of Patient ID Normalization, (Authors Proposal)
The model proposed is intended to create an integration framework for all the clinical data for each patient. Clinical data can be integrated into repositories called data pools. These data pools are in turn indexed in a database called the Master Patient Index, where all data are stored. The de-identification process enables data to be depersonalized once each episode has been closed. In this way the data from each closed-episode data pool are guaranteed not to contain any data comprising personal information; however, via the Master Index the data can be re-personalized. Access to de-personalized data is controlled by search filters that possess no authentication or access authority. Data relating to episodes still open are only available to be consulted by the service that opened the episode, and this authority can be passed on only if the patient has been transferred to another service. The policies relating to access and to personalized data search procedures will be approved by a privacy and data protection commission, and authorization will need to be granted case by case. With this model the various actors involved in healthcare provision will be able to share data about each episode. Messages will have to be transmitted under the AS1 or AS2 protocol with digital signature and data encryption. In this way authentication, confidentiality and interoperability between the various information systems within each organization can be ensured. The Master Index will even act as a Federation of Electronic Health Records. This Master Index will control the relationship between the various keys (DNA, NHS,
Healthcare Service number, ID, passport number and Tax Number) and, for each system, will establish which keys are necessary for indexing it. In addition, it is proposed that an ontology language be created which will set out the search rules to be enacted in order to ensure citizens' privacy and the security of their personal data. Interoperability among the various systems is ensured via communication protocols that allow online and offline communication between systems. At the same time, encryption and the authenticity of data must be guaranteed. The protocol proposed is AS1 over SMTP. The advantage of this protocol is that it is an asynchronous message protocol carrying XML.
Fig. 2. The Proposed Model, (Authors Proposal)
This protocol can be used to communicate among systems built on various technologies and, in addition, it employs a message technology, SMTP, which is already widely deployed in the market. When a patient presents to the system, the Hospital Information System queries the ID FEHR (Federation of Electronic Health Records) to find an identification, any associated open episode and all clinical data related to the patient. Patient identification and the data network are resolved with data mining algorithms. Identity resolution is intended to find who is who and to create links between data that belong to the same patient. The data used are demographic data and the background clinical history. If there is some proximity of data attributes around a cluster centroid, it may be possible to say that all of those data belong to the same patient. Relationship resolution is intended to find all correlations between the data of different patients. The clusters can be built using a data mining algorithm; inside a cluster, all the data that lie less than a δ distance from the cluster center belong to related individuals.
The types of relationships include clinical data such as:
− Pathologies and diagnostics
− Drugs and treatments prescribed
− Hospitals where patients were treated
and demographic data such as:
− Nationality, gender, date of birth and race
− Family relationships
− Living habitats
− Professions
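The paper does not specify the clustering algorithm; the following is a minimal sketch of the δ-threshold identity-resolution rule described above, applied to simple demographic feature vectors. The record identifiers, features, weights and the value of δ are all invented for illustration.

```python
import math

# Toy demographic feature vectors: (year of birth, gender code, postal-area code).
# In practice many more attributes and proper string-similarity measures would be used.
records = {
    "rec-001": (1952, 2, 2200),
    "rec-002": (1952, 2, 2200),   # plausibly the same patient as rec-001
    "rec-003": (1987, 1, 1269),
}

def distance(a, b):
    # Simple weighted Euclidean distance between two demographic vectors.
    weights = (1.0, 10.0, 0.01)   # invented weights, for illustration only
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def same_patient(cluster_center, record, delta=5.0):
    # Identity-resolution rule from the text: a record closer than delta to the
    # cluster centre is taken to belong to the same patient.
    return distance(cluster_center, record) < delta

center = records["rec-001"]
for rid, vec in records.items():
    print(rid, same_patient(center, vec))
```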
The relevant clinical and demographic data are presented to the clinician as long as the treatment episode is open, and are uploaded to the data pool when the episode is closed. In the data pool, a hash algorithm performs the de-identification of the clinical data. The de-identification process is intended to safeguard the privacy and confidentiality of clinical data. The clinical data primary key is substituted by a hash key and can only be resolved by the master index algorithm. This master index algorithm is one of the functionalities of the ID FEHR. The master index in the ID FEHR can be addressed by all sorts of patient identification keys, including genome coding and the National Security Number, among others. Besides this, the master index keeps track of the associated identification data and the coded primary keys of the data pool clinical records. The ID FEHR is also responsible for user authentication and for the retrieval ontologies. These ontologies are used each time a query of the data pools is needed. When a patient does not belong to an ID FEHR, a negotiation with other ID FEHRs is initiated.
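A minimal sketch of the hash-based de-identification step and the master-index lookup described above follows. The secret, key names and identifiers are invented, and a production system would rely on a vetted key-management and pseudonymization service rather than this toy code.

```python
import hmac, hashlib

MASTER_INDEX_SECRET = b"held-only-by-the-ID-FEHR"   # illustrative secret

def deidentify(primary_key: str) -> str:
    """Replace the clinical-record primary key by a keyed hash when an episode
    is closed; without the master-index secret the hash cannot be resolved."""
    return hmac.new(MASTER_INDEX_SECRET, primary_key.encode(), hashlib.sha256).hexdigest()

# Master index kept by the ID FEHR: maps the hashed key back to the patient's
# identification keys (NHS number, passport, genome-derived code, ...).
master_index = {}

episode_key = "hospital-A/episode/42"              # invented identifier
pool_key = deidentify(episode_key)
master_index[pool_key] = {"nhs": "943-476-5919", "episode": episode_key}

# Re-personalization is possible only through the master-index lookup:
print(master_index[pool_key]["episode"])
```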
6 Conclusions The model proposed is founded on three fundamental aspects:
− An architecture already widely deployed in the market
− Use of existing technology allowing the interconnection of heterogeneous systems that incorporate privacy and security guarantees
− Use of alternative search keys and ontologies with data access rules
The reasoning behind this proposal is that it is inconceivable to render obsolete the many existing systems, all with their own different characteristics, and to develop one single, global information system. Additionally, the existence of only one single data repository would potentially increase the vulnerability of the data. Development based on existing technologies also potentially reduces the necessary development lead-time and reduces the cost.
Further research is required to find out: How much data will need to be stored in the Master Index to identify a patient unequivocally with a high degree of confidence? What algorithm should be implemented to refine the matching of different patient records?
References
1. HL7 CDS Project Update: Virtual Medical Record (vMR). In: Clinical Genomics (2008), http://www.hl7.org/library/committees/clingenomics/HL7%20Phoenix%20-%20May%2008%20-%20CDS%20Genomics%20Jt%20Session.pdf
2. Shabo, A.: The Implication of Electronic Health Records for Personalized Medicine. Future Medicine (2005), http://www.hhs.gov/healthit/documents/Tab3part2Implications103106.pdf
3. Shabo, A.: Health Record Banks: Integrating clinical and genomic data into patient-centric longitudinal and cross-institutional health records. Future Medicine (2007), http://www.futuremedicine.com/doi/pdf/10.2217/17410541.4.4.453?cookieSet=1
4. Kent, S.T., Millet, L.I.: IDs - Not That Easy: Questions About Nationwide Identity Systems. Committee on Authentication Technologies and Their Privacy Implications, National Research Council (2002)
5. The Department of Health: Patient confidentiality and Access to Health Records (2007), http://www.dh.gov.uk/en/Managingyourorganisation/Informationpolicy/PatientConfidentialityAndCaldicottGuardians/DH_4084181
6. The United Kingdom Parliament (2009), http://www.parliament.uk
7. Clinical Genomics. HL7 (2009), http://www.hl7.org/Special/committees/clingenomics/docs.cfm
8. Marko, P.G., Wine, M., Joanne: Genomic Information Systems and Electronic Health Records (EHR). In: Virtual Medical World (2005), http://www.hoise.com/vmw/05/articles/vmw/LV-VM-10-05-1.html
9. Nakaya, J.: Clinical Genome Informatics (CGI) and its Social. IJCSNS International Journal of Computer Science and Network Security 7(1) (January 2007), http://paper.ijcsns.org/07_book/200701/200701A08.pdf
10. The Internet Engineering Task Force (2009), http://www.ietf.org/
11. Request for Comments: 3335, Network Working Group, MIME-based Secure Peer-to-Peer. In: Network Working Group (2002), http://www.ietf.org/rfc/rfc3335.txt
12. Soley, R.M.: Enterprise Patterns and MDA. Addison-Wesley, USA (2004)
13. Unified Modeling Language. UML Resource Page (2009), http://www.uml.org/
14. Health Level 7 (2009), http://www.hl7.org/
A Proposal of a Method to Extract Active Ingredient Names from Package Inserts
Keita Nabeta1, Masaomi Kimura2, Michiko Ohkura2, and Fumito Tsuchiya3
1 Graduate School of Technology, Shibaura Institute of Technology, 3-7-5 Toyosu, Koto Ward, Tokyo 135-8548, Japan
[email protected]
2 Shibaura Institute of Technology, 3-7-5 Toyosu, Koto Ward, Tokyo 135-8548, Japan
{masaomi,ohkura}@sic.shibaura-it.ac.jp
3 Tokyo Medical & Dental University, 1-5-45, Yushima, Bunkyo Ward, Tokyo 113-8549, Japan
[email protected]
Abstract. Recently, medical accidents caused by drugs have attracted attention in Japan. To prevent this from affecting medical experts, it is effective to improve the ‘safety of drug usage’ via information systems. Although package insert information in SGML format is provided by PMDA, it is not easy to utilize this information for such systems because they lack a suitable structure to provide direct access to the required information. In this study, we propose certain methods to extract the active ingredient name from such unstructured SGML data. Keywords: Medical safety, Package inserts, Keyword extraction, Edit distance.
1 Introduction Recently, much attention has been attracted to measures intended to prevent medical accidents caused by drugs. To do so, it is considered effective to improve the safety of drug usage via information systems. Since the late 1990s, attempts have been discussed to utilize the information described in ethical drug package inserts in systems that ensure the accuracy of prescriptions. The package inserts are exclusive legal documents used to describe detailed information for each drug, including the composition, efficacy, dosage and cautions. Following such attempts, the PMDA (Pharmaceuticals and Medical Devices Agency) [1] has provided package insert information in SGML (Standard Generalized Markup Language) format since 2003. However, our investigation [2] indicated that SGML data are not provided for certain drugs and, moreover, that their structure is unsuitable for direct access to the information required for
A Proposal of a Method to Extract Active Ingredient Names from Package Inserts
577
confirmation. Additionally, several studies clarified two problems, one of which is the lack of standardization for the terms used in package inserts [3] and another that the SGML structure only defines the layout of the text data [4]. These problems make it difficult to retrieve and extract information from the SGML data, although the latter comprises text files, which are suitable for display and allowing information of package inserts to be accessed via a computer system. This makes it necessary to establish a drug information database suitably structured to leverage information in systems ensuring medical safety. To collect information which should be stored in the database, we discuss the method to extract information, such as active ingredient names, from the SGML data. The reason why we focus on active ingredient names is that, although they are important drug information elements, no list of the same for all drugs has yet been established in Japan. The purpose of this study is to analyze the descriptive structure of items containing active ingredient names and propose certain methods to extract the active ingredient name from the SGML data.
2 Acquisition Object SGML Data We retrieved and downloaded the SGML data from the PMDA Web site, utilizing the 'YJ code' as a key, which is a drug identification code included in the 'standard drug master (9/30/2007)' provided by MEDIS-DC (The Medical Information System Development Center) [5]. Additionally, we excluded Chinese herbal medicines, crude drugs and test drugs, because the manner of their description in the drug package inserts differs from that of other drugs; as a result, we downloaded 10186 SGML files.
3 Investigation of Items Describing Active Ingredient Names in Package Inserts In package inserts, active ingredient names are described in items including the ‘generic name’, ‘composition’ and ‘physicochemical properties of the active ingredient’. We investigate and compare the descriptions in the items in order to identify the optimal way to extract active ingredient names from the same. 3.1 Data Structure and Content of Each Item The ‘generic name’ is the drug name, which consists of an active ingredient name and a dosage form, such as ‘etilefrine hydrochloride tablet’, where ‘etilefrine hydrochloride’ is the active ingredient name and ‘tablet’ the dosage form. Since the data structure containing a generic name is simple, as shown in Fig. 1, it might be easy to extract an active ingredient name by excluding the name of the dosage form from the string contained in the structure. We must, however, note that we do not have a complete list of the dosage form and it is difficult to exclude all of them.
578
K. Nabeta et al.
Fig. 1. Sample of generic name (an SGML fragment giving the generic name "Etilefrine hydrochloride tablet" / 塩酸エチレフリン錠 together with a <detail> element)
'Composition' is an item describing detailed information on the composition of a medicine, including ingredient names. There is also the problem of its potential to have various structures, such as those in Figs. 2 and 3, the former of which expresses each piece of information and its item name by assigning them 'item' and 'detail' tags, whereas the latter describes all of the information in the form of natural-language sentences within a 'detail' tag. Obviously, the latter case hampers efforts to extract the target information via simple processing. The 'physicochemical properties of the active ingredient' describe the Japanese ingredient name, its English name, the chemical formula and the chemical properties. Active ingredient names are usually described in a 'detail' tag right behind an 'item' tag containing the string '一般名' (standard name).
Fig. 2. Sample of composition 1 (item/detail pairs giving the active ingredient name 塩酸エチレフリン / Etilefrine hydrochloride and the content, 5 mg of etilefrine hydrochloride in one tablet)
Fig. 3. Sample of composition 2 (a single <detail> element containing the natural-language description: "Hexatron injection 250mg is included in the Japanese Pharmacopoeia as a tranexamic acid injection and contains 250mg of tranexamic acid in a tube (5 mL).")
A Proposal of a Method to Extract Active Ingredient Names from Package Inserts
579
Fig. 4. Sample of physicochemical properties of active ingredient (item/detail pairs giving the standard name 塩酸エチレフリン / Etilefrine Hydrochloride, followed by the chemical name)
3.2 Distribution of Items Describing Active Ingredient Names To identify the portion from which we extract the names of active ingredients, we investigate the emergence patterns of the tags expressing each of the foregoing items, namely the 'genericname', 'composition' and 'physchemofactingredients' tags, which exist in each SGML data. Table 1 shows the result of the investigation, where the sign '✓' indicates that the item is found, '✗' that it is not found, and the values shown in the far right column are the numbers of SGML files categorized in each pattern. This result indicates that there is composition data in almost all SGML data, which means it is the remaining portion that could be used to extract the active ingredient name from each SGML file in a unified way. However, it is also important to remember that it is difficult to extract the active ingredient name from it in a straightforward manner, given the variation in expression shown in the previous section. Initially, therefore, we extract from the portion of 'physicochemical properties of active ingredient', since it covers about 80% of the SGML files and its descriptive structure is simpler than that of composition data.
Table 1. Result of the investigation

genericname                 ✓     ✗     ✗     ✓    ✓   ✗   ✗   ✓     (total 7638)
composition                 ✓     ✓     ✓     ✓    ✗   ✗   ✗   ✗     (total 10179)
physchemofactingredients    ✓     ✗     ✓     ✗    ✓   ✗   ✓   ✗     (total 8512)
Number of SGML data        6949  1275  1269  686   3   2   2   0     (total 10186)
4 Active Ingredient Name Extraction from SGML Data 4.1 Extraction from the Physicochemical Properties of the Active Ingredient Initially, we extract the text data in the 'detail' tag right behind the 'item' tag containing the string '一般名' (standard name) from the data in the 'physchemofactingredients' tag, and separate the text using delimiter characters such as '、', ',', '（', '）', '「', '」', '[', ']', '【', '】', ':' and '&enter;', which expresses a line feed. We also exclude words which are not active ingredient names, such as 'JAN' (Japanese Accepted Name), 'INN' (International Nonproprietary Name) and '別名' (alias name). Though some of the excluded words may be ranked high on a list of the obtained words in descending order of frequency (Table 2), it is also necessary to take into account the low-frequency words to be excluded, such as '英名' (English name) and '略名' (abbreviated name). To find such words, we repeatedly seek character strings containing substrings of the exclusion candidate words previously acquired, such as '名' (name). Since we confirmed that the resultant words in our data do not refer to an active ingredient, we exclude them and finally obtain the active ingredient names of the medicines whose SGML file contains the 'physchemofactingredients' tag.
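The following is a minimal sketch of this splitting-and-exclusion step. The delimiter set and exclusion list are abbreviated versions of those described in the text, the "ends with 名" rule is a simplification, and the sample input string is invented.

```python
import re

# Delimiters mentioned in the paper (Japanese comma, brackets, colon) plus line feeds.
DELIMITERS = r"[、,（）()「」\[\]【】:\n]"
# Words that are not active ingredient names (a shortened version of the paper's list).
EXCLUDED = {"JAN", "INN", "日局", "別名", "英名", "略名"}

def extract_candidates(detail_text: str):
    """Split the text of the <detail> element that follows the 一般名 (standard name)
    item and drop known non-ingredient words."""
    words = [w.strip() for w in re.split(DELIMITERS, detail_text) if w.strip()]
    return [w for w in words
            if w not in EXCLUDED and not w.endswith("名")]  # treat '...名' strings as item names

print(extract_candidates("塩酸エチレフリン（JAN）\n別名:エチレフリン塩酸塩"))
# -> ['塩酸エチレフリン', 'エチレフリン塩酸塩']
```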
Table 2. Extracted words in descending order of frequency
Extracted word                                            Frequency
JAN                                                        1330
日局 (Japanese Pharmacopoeia)                                284
INN                                                         130
別名 (Alias name)                                            115
日局別名 (Alias name of Japanese Pharmacopoeia)               112
遺伝子組換え (Genetic recombination)                           92
Genetical recombination                                      76
Ketotifen Fumarate                                           73
塩酸アンブロキソール (Ambroxol hydrochloride)                    73
ジクロフェナクナトリウム (Diclofenac sodium)                     68
In order to ascertain whether the extracted words are actually active ingredient names, we counted the number of extracted words that appeared in the composition text. As a consequence, active ingredient names were extracted from 8383 SGML files, and 8965 out of 9747 words (91.98%) were identified as active ingredient names. However, since not all SGML data include the item of physicochemical properties of the active ingredient, because it is optional, we propose a complementary way to extract active ingredient names, namely extraction from the portion of 'composition', which is described in most SGML data. 4.2 Extraction from Composition Now, let us extract active ingredient names from the composition of the 1803 SGML data (hereinafter 'Remaining data'), from which the active ingredient names were not
obtained from the physicochemical properties of the active ingredient. For that purpose, we propose a method to specify the locations of active ingredient names by matching the text against patterns created from the locations of the previously acquired ingredient names in the composition text of the 8383 SGML data for which we have already succeeded in extracting active ingredient names in Section 4.1 (hereinafter 'Extracted data').
Generation of the Description pattern list. We append the text data extracted from both the 'item' tag and the 'detail' tag in the 'composition' tag of the Extracted data, joined by a delimiter string, and replace the number, unit, active ingredient name and brand name with dedicated placeholder labels (we regard each label as a single letter in subsequent analyses), in order to absorb the differences in description that originate in the information unique to each medicine, such as the active ingredient name and content (hereinafter 'Description pattern'). To replace them, we used the regular expression '[0-9](([0-9]|[,./ ][0-9])+)?' to match the numbers, a unit list to match units, the list of active ingredient names obtained in Section 4.1 to match active ingredient names, and the brand name extracted from the 'approvalbrandname' tag in each SGML file to match the brand names. Consequently, the composition descriptions extracted from the 8383 SGML data were aggregated into 785 patterns. To match the 1803 Remaining data with the patterns obtained, we also replace their numbers, units and brand names with the labels introduced above (hereinafter 'Object data').
Fig. 5. Example of description pattern generation (based on the composition entry 成分・含量 / ingredient・content: "Enflurane 1mL in 1mL")
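A minimal sketch of the Description pattern generation step follows. Since the paper's placeholder label strings are not reproduced here, circled letters are used as stand-in single-character labels; the unit, ingredient and brand lists are short invented excerpts.

```python
import re

NUMBER_RE = re.compile(r"[0-9](([0-9]|[,./ ][0-9])+)?")       # the paper's number pattern
UNITS = ["mg", "mL", "錠"]                                     # shortened unit list
INGREDIENTS = ["トラネキサム酸", "塩酸エチレフリン"]               # excerpt of the Section 4.1 list
BRANDS = ["ヘキサトロン注"]                                     # from the approvalbrandname tag

def to_description_pattern(text: str) -> str:
    """Replace numbers, units, ingredient names and brand names by single-character
    placeholders so that descriptions of different drugs collapse onto one pattern."""
    for brand in BRANDS:
        text = text.replace(brand, "Ⓑ")
    for ing in INGREDIENTS:
        text = text.replace(ing, "Ⓘ")
    for unit in UNITS:
        text = text.replace(unit, "Ⓤ")
    return NUMBER_RE.sub("Ⓝ", text)

print(to_description_pattern("ヘキサトロン注250mgはトラネキサム酸250mgを含有する。"))
# -> 'ⒷⓃⓊはⒾⓃⓊを含有する。'
```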
Retrieval method of approximate pattern. A simple retrieval method for approximate patterns is string pattern matching based on regular expressions, whereby Object data are compared with the Description patterns after we replace the ingredient-name label in the latter patterns with '.+', which means an arbitrary string. However, when we tried to apply this method to the Remaining data, we found that half of the Object data did not match any Description pattern. This is mainly caused by variations of expression such as '含有' and '含む', both of which mean 'contain' in Japanese. Since this suggests the difficulty of exact pattern matching, we focus on identifying string patterns which have structures similar to the pattern in question, based on the 'edit distance' [6], which is the degree of similarity between two strings, defined as the number of times characters are inserted into or deleted from one of the strings until it coincides with the other.
In this study, we extended the edit distance to allow for matching patterns while neglecting any characters of the Object data located at the wild-card position '*' in each Description pattern. Using such an edit distance, we can expect to measure the similarity of the substrings other than the active ingredient name, by replacing the ingredient-name label with '*'. For instance, the extended edit distance between 'Object data A' and 'Description pattern A' in Fig. 6 is 4. Unfortunately, this extended distance is still affected by another problem. Imagine the case of comparing the distance between 'Object data A' and 'Description pattern A' with the distance between 'Object data A' and 'Description pattern B'. Though 'Object data A' is more similar to 'Description pattern A' than to 'Description pattern B', the former distance, 4, is longer than the latter, which is 1. This problem is due to the fact that the label is a wild card in this method and matches all characters from the beginning to the letter right before the label in Object data A. With this in mind, we charge an insert cost according to the length of the string which corresponds with the wild card. (It is preferable that the insert cost be minimized.) We take the insert cost into account, and additionally extend the edit distance by charging a tiny value ε multiplied by the length of the character string that corresponds with the wild card '*'.
Fig. 6. Approximate string pattern matching by extended edit distance (the edit distance between Object data A, which contains the active ingredient butylscopolamine bromide, and Description patterns A and B is calculated)
Our algorithm used to calculate the extended edit distance is based on that for the commonly used edit distance, utilizing dynamic programming. When two strings A(a1, a2, …, am) and B(b1, b2, …, bn) are given, we calculate the cost Ci,j for character ai (i = 1, 2, …, m) and character bj (j = 1, 2, …, n), which is inductively given by eq. (4.1) as the minimum among Ci,j−1, Ci−1,j and Ci−1,j−1 plus the corresponding incremental cost, as is done for the commonly used edit distance. To extend the incremental costs so as to charge the insert cost, we redefined each incremental cost as in (4.2), (4.3) and (4.4).
C_{i,j} = \min\left( C_{i,j-1} + \Delta_j,\; C_{i-1,j} + \Delta_i,\; C_{i-1,j-1} + \Delta_{i,j} \right)    (4.1)

\Delta_j = \begin{cases} \varepsilon & (a_i = \text{'*'}) \\ 1 & (\text{otherwise}) \end{cases}    (4.2)

\Delta_i = \begin{cases} \varepsilon & (b_j = \text{'*'}) \\ 1 & (\text{otherwise}) \end{cases}    (4.3)

\Delta_{i,j} = \begin{cases} 0 & (a_i = b_j) \\ \varepsilon & (a_i = \text{'*'} \text{ or } b_j = \text{'*'}) \\ 2 & (\text{otherwise}) \end{cases}    (4.4)
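The recurrence (4.1)-(4.4) can be implemented directly with dynamic programming. The sketch below reproduces the worked example in the text ('computer' vs. 'cu*mer' with ε = 0.01 gives 2.05); the boundary rows and columns are charged the ordinary insertion cost of 1, as in Table 3.

```python
def extended_edit_distance(a: str, b: str, eps: float = 0.01) -> float:
    """Extended edit distance of eqs. (4.1)-(4.4): insertion/deletion costs 1,
    substitution costs 2, and any step involving the wildcard '*' costs eps."""
    m, n = len(a), len(b)
    C = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        C[i][0] = float(i)                        # boundary: plain insertions
    for j in range(1, n + 1):
        C[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d_i = eps if b[j - 1] == "*" else 1.0             # eq. (4.3)
            d_j = eps if a[i - 1] == "*" else 1.0             # eq. (4.2)
            if a[i - 1] == b[j - 1]:
                d_ij = 0.0                                     # eq. (4.4), match
            elif a[i - 1] == "*" or b[j - 1] == "*":
                d_ij = eps
            else:
                d_ij = 2.0
            C[i][j] = min(C[i][j - 1] + d_j,
                          C[i - 1][j] + d_i,
                          C[i - 1][j - 1] + d_ij)              # eq. (4.1)
    return C[m][n]

print(round(extended_edit_distance("computer", "cu*mer"), 2))   # 2.05
print(extended_edit_distance("computer", "customer"))           # 6.0 (ordinary indel distance)
```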
As an example, the extended edit distance between 'computer' and 'cu*mer' is given as 2.05 (ε = 0.01), although the normal edit distance between 'computer' and 'customer' is 6.
Table 3. Sample of the cost matrix
         ""     c      o      m      p      u      t      e      r
  ""    0.00   1.00   2.00   3.00   4.00   5.00   6.00   7.00   8.00
  c     1.00   0.00   1.00   2.00   3.00   4.00   5.00   6.00   7.00
  u     2.00   1.00   2.00   3.00   4.00   3.00   4.00   5.00   6.00
  *     3.00   2.00   1.01   1.02   1.03   1.04   1.05   1.06   1.07
  m     4.00   3.00   2.01   1.01   2.01   2.04   2.05   2.06   2.07
  e     5.00   4.00   3.01   2.01   3.01   3.04   3.05   2.05   3.05
  r     6.00   5.00   4.01   3.01   4.01   4.04   4.05   3.05   2.05
Table 3 shows the process of calculating Ci,j for this example; the cells along the shortest path used to calculate the edit distance are highlighted in the original table. It is possible to have the extracted string correspond with '*' by pursuing this path. The resultant extended edit distances of 'Object data A', where ε = 0.0001, are 4.0011 to 'Description pattern A' and 1.0022 to 'Description pattern B'. The approximate patterns are ranked high on the list by separating the integer portion of the distance from the fractional portion and sorting the latter in ascending order. In this result, the active ingredient name '臭化ブチルスコポラミン' (butylscopolamine bromide) can be extracted from the cost matrix which measures the extended edit distance between 'Object data A' and 'Description pattern A'.
Decision of extraction words. We additionally found a problem whereby the obtained Description pattern can split an active ingredient name, as in Table 4, or add extra letters to it. To improve our method, we focused on the fact that the second or lower approximate Description patterns in the list (Table 5) succeed in extracting the active ingredient name, and added a process that takes a vote over the words extracted by the top 20 patterns in the list.
Table 4. Example of failure

Object data:       本品はイソプロパノール以上を含む。
Obtained pattern:  本品は で 以上を含む。
Edit distance:     1.008
Extracted words:   'イソプロパノー', 'ル'
Table 5. Approximate pattern list

Rank   Description pattern              Extracted words
1      本品は で 以上を含む               イソプロパノー, ル
2      本剤は 以上 を含む。               イソプロパノール
3      本品は を含む。                    イソプロパノール
4      組成 本品は 以上を含む             イソプロパノール
5      本品は定量するとき 以上を含む。     イソプロパノール
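The exact voting rule is not given in the paper; one plausible reading, sketched below, is a majority vote over the word lists produced by the top-ranked approximate patterns, so that a single badly split pattern (as in Table 4) cannot determine the result.

```python
from collections import Counter

def decide_by_vote(candidates_per_pattern, top_k=20):
    """Majority vote over the words extracted by the top-k approximate patterns."""
    votes = Counter()
    for words in candidates_per_pattern[:top_k]:
        votes[tuple(words)] += 1
    return list(votes.most_common(1)[0][0])

# Word lists extracted by the ranked patterns of Table 5.
ranked = [["イソプロパノー", "ル"], ["イソプロパノール"], ["イソプロパノール"],
          ["イソプロパノール"], ["イソプロパノール"]]
print(decide_by_vote(ranked))   # ['イソプロパノール']
```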
Discussion. We confirmed the validity of our method on the 1803 SGML data for which the active ingredient names were not extracted from the physicochemical properties, by selecting 100 SGML data at random and verifying whether the extracted words were active ingredient names. Consequently, we obtained 242 active ingredient names among 305 extracted words, though we expected to obtain 334 active ingredient names as the right answer. From these results, we can estimate the precision of our method at 0.80 and the recall at 0.72, which indicates that our method can extract active ingredient names from composition data with reasonable accuracy. Additionally, we found that the Object data matching some approximate pattern were properly processed. In particular, we observed that the active ingredient names which appear in patterns such as an iteration of the ingredient-name label were extracted with relatively high precision. We must also note that Object data not matching any of the obtained approximate patterns tended to fail in the extraction of the active ingredient name, though this is not a fundamental problem from the perspective of estimating the effectiveness of our keyword extraction method.
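For reference, the reported figures follow directly from the counts above:

\[ \mathrm{precision} = \frac{242}{305} \approx 0.8, \qquad \mathrm{recall} = \frac{242}{334} \approx 0.72 \]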
5 Conclusion In this study, we proposed a method to extract active ingredient names from SGML-formatted package insert data, which can contribute to the development of a drug information database. The active ingredient names are described in three portions of
the data: 'generic name', 'composition' and 'physicochemical properties of the active ingredient'. Initially, we extracted from the 'physicochemical properties of the active ingredient', since it is easier to extract from there than from the other portions. We obtained active ingredient names from 8383 SGML data by excluding words such as the names of data items, collected by character matching, and symbols such as parentheses. Additionally, since 1803 SGML files remained to be analyzed, we proposed a method to extract active ingredient names from 'composition', which can be found in most of the SGML data, based on the description patterns created from the 8383 SGML data from which we had already succeeded in extracting active ingredient names and their locations. To define the measure of similarity between patterns, we extended the edit distance between a pair of pattern strings so that an active ingredient name can be extracted by corresponding substrings with the ingredient-name label lying at the same position in the patterns. Evaluation of this method shows that the precision is 0.80 and the recall is 0.72. As a result of application to our target data, we succeeded in extracting active ingredient names from 9849 (96.70%) SGML data. We must note that compositions given in tabular form were out of the scope of this study, since it is difficult to extract data from tables, and another method capable of dealing with this problem must be considered in the future. Although our method succeeded in obtaining active ingredient names with considerable accuracy from our target SGML data, which are not well suited to information extraction, drug information must be reliable and comprehensive. A reliable and comprehensive drug information database with a structure facilitating the extraction of information must therefore be established as soon as possible. Since the knowledge, methods and data used in this study can be the basis for such a database, we will conduct further investigations into package inserts. In order to guarantee that the information is both reliable and complete, we require the cooperation of the pharmaceutical companies.
References
1. Pharmaceuticals and Medical Devices Agency, http://www.pmda.go.jp
2. Nabeta, K., et al.: The investigation into problems of utilization of drug information in package inserts to ensure safety of drug usage. In: Proceedings of AHFEI 2008 (2008)
3. Ohtsuki, C., et al.: Study on Inconsistency in Epileptic Seizure Terms Used in Package Inserts. Japanese Society of Pharmaceutical Health Care and Sciences 31(1), 65–71 (2005)
4. Hamada, M., et al.: Creation of Drug Information Database based on Pharmaceuticals Markup Language (PML). Japanese Society of Pharmaceutical Healthcare and Sciences 33(6), 502–509 (2007)
5. The Medical Information System Development Center, http://www.medis.or.jp
6. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10(8), 707–710 (1966)
Examination of Evaluation Method for Appearance Similarity of PTP Sheets
Yoshitaka Ootsuki1, Akira Izumiya1, Michiko Ohkura1, and Fumito Tsuchiya2
1 Shibaura Institute of Technology
{l05023,m708101,ohkura}@sic.shibaura-it.ac.jp
2 Dental Hospital, Faculty of Dentistry, Tokyo Medical and Dental University
[email protected]
Abstract. In recent years, many accidents concerned with medicine have been caused by the confusing design of pharmaceutical packages and displays. We concentrate on the appearance similarity of PTP sheets, which are most commonly used for wrapping tablets in Japan, to clarify the factors and the degrees of their effects on appearance similarity. This paper describes our experiments that examined evaluation methods of the appearance similarity of PTP sheets. Keywords: appearance similarity, evaluation method, medical accident, PTP sheet.
1 Background and Objective In recent years, many accidents concerned with medicine have been caused by the confusing design of pharmaceutical packages and displays [1]. The Japanese Ministry of Health, Labor and Welfare is now using a similar name search engine to ban new names for medicine that resemble existing names. However, no such action exists for appearance similarity. We concentrate on the appearance similarity of PTP sheets, which are most commonly used for wrapping tablets in Japan, to clarify the factors and the degrees of their effects on appearance similarity. Our goal is to establish standards to evaluate appearance similarity. This paper describes our experiments that examined evaluation methods of the appearance similarity of PTP sheets.
2 PTP Sheets of Our Experiment The design of the PTP sheets of "Starsis" from Astellas Pharma Inc. was changed in July 2005, because pharmacists pointed out their appearance similarity to the PTP sheets of "Harnal D" of the same company. However, the appearance was changed again in April 2006, because pharmacists still pointed out that their similarity to "Harnal D" was excessive. Since we believe that this is a good example to clarify the factors of the appearance similarity of PTP sheets, we performed experiments with them.
Fig. 1. Harnal D and Starsis (the changed portions of the Starsis design are marked)
3 Experimental Method In the experiment, the participants were shown pairs of PTP sheets and were instructed to evaluate their similarities one by one. Three kinds of PTP sheets were used (Fig. 2). The questionnaire used in the experiments was a seven-point-scale evaluation (Fig. 3).
Fig. 2. Three kinds of PTP sheets (Harnal D, Starsis H17 and Starsis H18)
Fig. 3. Questionnaire to evaluate similarity (a seven-point scale with anchors from 1 = completely similar, 2 = very similar, 3 = similar, 4 = neither, through increasingly dissimilar ratings up to 7 = not similar at all)
The experiments were performed three times. The conditions of each experiment are described in Table 1.

Table 1. Conditions of experiments

Factors        Presented images                     Participants             Presentation methods
Experiment 1   whole images, 4-tablet images        students                 simultaneous presentation
Experiment 2   whole images and 4-tablet images     students, pharmacists    simultaneous presentation
Experiment 3   whole image (front image only)       students                 non-simultaneous presentation
Each experiment is described in detail as follows. 1. Experiment 1: difference of presented images We examined whether any differences exist between the evaluation scores when showing the whole and 4-tablet images. The images used for Experiment 1 are shown in Fig. 4.
Fig. 4. Presented images in Experiment 1 (whole images and 4-tablet images of the front and back sides of (a) Harnal D, (b) Starsis H17 and (c) Starsis H18)
2. Experiment 2: difference of participants We examined whether any differences exist between the evaluation scores of students, who may become patients, and pharmacists who are responsible for preparing medicine.
3. Experiment 3: difference of presentation methods We employed simultaneous presentation in Experiments 1 and 2 and non-simultaneous presentation in Experiment 3. These two presentation methods are described in detail as follows: • simultaneous presentation (Experiments 1 and 2) 1. Two images simultaneously presented.
Fig. 5. Simultaneous presentation
• non-simultaneous presentation (Experiment 3) 1. Presented first image. 2. Presented the time remaining for the next image. 3. Presented second image.
Fig. 6. Non-simultaneous presentation
Comparing the results of these experiments, we examined whether any differences exist between the evaluation scores for the two types of presentation methods. Each experimental procedure is described in detail as follows. • Experiments 1 and 2: 1. Two of the three images simultaneously presented for one second. 2. Similarity evaluated by questionnaire. 3. Steps (1) and (2) repeated 36 times.
• Experiment 3: 1. First image presented for one second. 2. The time remaining until the next image presented (for five seconds). 3. Second image presented for one second. 4. Similarity evaluated by questionnaire. 5. Steps (1) to (4) repeated three times.
4 Experimental Results 1. Difference of shown images Experiment 1 was performed with nine students who did not know its purpose. Fig. 7 shows the averaged scores of the similarity evaluation for the front sides of Harnal D and Starsis H17 in Experiment 1. The t-test result showed that the difference of the scores between the whole and 4-tablet images was significant at the 1% level. The presented images thus affected the appearance similarity scores [2]. Figure 8 also shows the averaged scores of the similarity evaluations between Harnal D and Starsis H17 and between Harnal D and Starsis H18, both for the front and back sides. The t-test result shows that the difference of the scores was significant at the 1% level for the front side. The appearance similarity of the front side between Harnal D and Starsis was greatly improved by the design change to Starsis H18. On the other hand, there was no such significant difference for the back side. However, both scores averaged more than six, revealing that no appearance similarity exists for the back sides of Harnal D and Starsis. The appearance similarity problem of this example was therefore considered to lie in the front side, not the back.
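The scores are compared with t-tests. As a minimal sketch (the actual response data are not reproduced in the paper, and the ratings below are invented), such a comparison could be run as follows, using a paired test for the within-subject comparison of whole and 4-tablet images; an independent-samples test would be used for the student-versus-pharmacist comparison of Experiment 2.

```python
from scipy import stats

# Invented ratings on the 7-point scale (1 = completely similar) from nine students
# who rated the Harnal D / Starsis H17 pair under both image conditions.
whole_image = [2, 1, 2, 3, 2, 1, 2, 2, 3]
four_tablet = [4, 3, 5, 4, 4, 5, 3, 4, 4]

t, p = stats.ttest_rel(whole_image, four_tablet)   # paired t-test
print(f"t = {t:.2f}, p = {p:.4f}")                 # p < 0.01 corresponds to "significant at 1%"
```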
Fig. 7. Difference of presented images (average evaluation scores, on the 1-7 scale, for the whole and 4-tablet images of Harnal D vs. Starsis H17 and Harnal D vs. Starsis H18)
Fig. 8. Difference between front and back sides (average evaluation scores, on the 1-7 scale, for Harnal D vs. Starsis H17 and Harnal D vs. Starsis H18)
2. Difference of participants Experiment 2 was performed with six students and six pharmacists who did not know its purpose. Fig. 9 shows the averaged scores of the similarity evaluation for the front sides between Harnal D and Starsis H17 of Experiment 2. The t-test result indicates that the difference of scores between the whole and 4-tablet images was significant at 5% for the students. However, the difference of scores between the whole and 4-tablet images was not significant for the pharmacists, implying that different participants perceived appearance similarity differently [3].
Fig. 9. Difference of participants (average evaluation scores of students and pharmacists, on the 1-7 scale, for the whole and 4-tablet images)
3. Difference of presentation methods Experiment 3 was performed with six students who did not know its purpose. Fig. 10 shows the averaged scores of the similarity evaluation for the front sides of Harnal D and Starsis H17 for the students of Experiments 2 and 3. The t-test result shows that the difference of the scores between the simultaneous and non-simultaneous presentations was not significant. The presentation method therefore did not affect the appearance similarity scores [4].
Fig. 10. Different presentation methods
5 Conclusion To examine evaluation methods for appearance similarity, we focused on PTP sheets and experimentally clarified the factors and their degrees of effect on appearance similarity. The results confirmed that the evaluation scores of appearance similarity differed due to the presented images and the different participants. In addition, simultaneous and non-simultaneous presentation had no effect on the evaluation scores of appearance similarity. The establishment of an evaluation method for appearance similarity through more experiments and analyses will be our future work. Acknowledgement. We received generous support from Prof. Murayama, the director of the Showa University Hospital.
References 1. Tsuchiya, F.: Malpractice prevention and ideal way of packaging of medical products and display. Pharm Tech Japan 19(11), 27–37 (2003) (in Japanese) 2. Ootsuki, Y.: Examination of method of displaying medicine to prevent human error (10). The Japanese Journal of Ergonomics 44, 76–77 (2008) (in Japanese)
3. Ootsuki, Y.: Examination of method of displaying medicine to prevent human error (12). The Journal of Japanese Society for Quality and Safety in Healthcare, Enlargement 3, 215 (2008) (in Japanese) 4. Ootsuki, Y.: Examination of method of displaying medicine to prevent human error (13). In: Proceedings of the 2009 IEICE General Conference, ESS (2009) (in press) (in Japanese)
Identifying Latent Similarities among Near-Miss Incident Records Using a Text-Mining Method and a Scenario-Based Approach Tetsuo Sawaragi, Kouichi Ito, Yukio Horiguchi, and Hiroaki Nakanishi Graduate School of Engineering, Kyoto University Yoshida Honmachi, Sakyo, Kyoto 606-8501, Japan {sawaragai,horiguchi,nakanishi}@me.kyoto-u.ac.jp
Abstract. This research focuses on supporting an analyst's activity of interpreting the contents of existing incident reports. During this activity, analysts continually predict expected scenarios of the incidents at hand and compare them with the actual development of the incidents reported therein. In order to learn lessons from a particular prior experience, analysts should become aware of the latent similarities among incidents and should experience a breakdown called "expectation failure" so that the incident is firmly imprinted in their memory. To let human analysts experience this breakdown, our system introduces the theory of Memory Organization Packets (MOPs) as a framework for explaining the dynamic memory structure of humans. By utilizing this idea as a basis for the scenario-based expectations of human analysts and by integrating it with a text-mining method, a system for supporting incident analysis is developed for the domain of medical incidents. Results of experiments using the proposed system are presented, in which the subjects were nurses working at a hospital. Based on those results, the effectiveness of the system is discussed from various viewpoints by investigating the protocols gathered from the subjects of the experiments. Keywords: Text-mining, knowledge management for safety, learning by failure, knowledge creation, semiosis.
1 Introduction
As recognized in the "Year 2007 Problem" in Japan, the decrease in opportunities for expertise and skill transfer within organizations is a considerable social concern, and urgent countermeasures are sought, especially for the purpose of safety management in organizations. It is said that most of the troubles that organizations encounter could be avoided if prior experiences of analogous troubles were shared by the members of the organization and generalized lessons were learned from them. For this purpose, many approaches have attempted to construct databases that store failures and/or near-miss incidents so that they can be shared by the members of a community. However, these databases have not been utilized effectively. This is due to the fact that accessibility to
the constructed database is not fitted to users' needs, and the utility of such a database depends greatly on the user's ability to interpret the items stored therein. For this database to be utilized more actively to enhance the creation of knowledge for safety management, a more advanced support tool is needed. Generally speaking, what can be stored in such a database is restricted to only a small portion of the know-how to be shared, and the transfer and sharing of most of it depend on the abilities of the individual analysts who access those storages. Once stored, items within a database are fixed and assumed to be true; in principle they are not allowed to vary. What is more important, however, is how to make people interpret the pre-stored cases and discover their own meanings within them. Whether analysts can obtain any lessons from past failures may depend greatly on their proactive commitment during this interpretation phase; lessons can only exist in relation to analysts' conceptions and with awareness of their sources. Therefore, the tools supporting this interpretation phase should be designed to assist the analysts' "editing" process: to discover relations between what is known and what is not known, to find latent relations among seemingly irrelevant items, and to reconstruct a new meaning based on newly recognized information fragments. In the remainder of this article, we develop a system that assists analysts in interpreting prior failures and/or near-miss incidents stored in the database. For this purpose, we introduce the theory of Memory Organization Packets (MOPs) as a framework for explaining the dynamic memory structure of humans; by utilizing this idea as a basis for the scenario-based expectations of human analysts and by integrating it with a text-mining method, we develop a conversation system in which active interactions between analysts and an incident database are assumed and analysts' learning from prior cases is promoted more actively.
2 Design Principles for Nurturing Safety Conceptions
2.1 Semiotics, Editorial Engineering and Breakdowns
In order to assist analysts in discovering meanings fitted to their current concerns about the targeted incidents, we propose that two technical issues be resolved. One is to assist the phase of browsing prior incident reports that include ambiguous and/or specific terminologies while subconsciously struggling to relate them to others. This subconscious effort inspires analysts to find ways of linking the factual fragments of a report into an original whole, as well as into another analogous incident that is outside their expectation. The other is to assist the analysts' scenario formation concerning how and why the incident occurred, since the actual chronology of events provides the primary way we construct meaning in general and is a human universal on the basis of which trans-cultural messages about the nature of a shared reality can be transmitted [1]. A motivation for introducing these two techniques is to promote analysts' learning. In particular, we focus on the importance of "breakdowns" [2]. At the heart of learning theory lies the concept of breakdowns. When trying something for the first time we experience breakdowns in our usage of it, and we use these breakdowns to
experiment with reality. When something goes wrong we are given an opportunity to learn, as the breakdown reminds us of the discrepancy between our expectations and the actual reality. This is also true for the analysts of incident reports; they can obtain lessons from prior incidents only when they expect what has occurred therein and experience the failure of that expectation when comparing it with the actual incident reported. This expectation failure brings about a shift in the focus of their analysis and promotes their learning about what they lack and what is missing in their prior knowledge.
2.2 Memory Organization Packets (MOPs)
In order to realize such a learning environment, the system should have the ability to support the analysts' expectation formation as well as to help the analysts recognize why and how their expectation fails. For the first purpose, we introduce Schank's memory structure of MOPs (Memory Organization Packets). In 1977, Schank and Abelson proposed that our general knowledge about situations be recorded as scripts that allow us to set up expectations and perform inferences [3]. Schank then investigated the role that the memory of previous situations and situation patterns plays in problem solving and learning [4]. The primary function of MOPs was to provide top-down expectations, and MOPs had additional advantages. They provided a method for sharing knowledge between structures that was lacking in earlier theories of scripts. Schank's model [4][5] of 'dynamic memory' also includes 'scenes' as the basic level of conceptual structure. Schank defined scenes as 'general structures that describe how and where a particular set of actions take place' [5]. Scenes are combined to form the larger structures of MOPs, and different versions of the same scene will be activated depending on the specific context. In this work, we design a system with a function for generating a variety of MOPs based upon the accumulated incident reports; thus the system supports analysts' expectation formation. Then, the system lets analysts check whether their expectation matches the actual incidents stored in the platform, while analysts may encounter expectation failures due to the existence of unexpected events. On discovering this, to update their understanding, analysts try to create a new MOP that includes an expectation predicting what was previously the anomalous event. This process is iterated, and thus analysts' perspectives and knowledge are enlarged through a conversation with the system. People depend on top-down structures to understand incidents. Thus, analysts' creativity is required to stretch those structures even for cases in which they do not quite fit.
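To make the notion of a MOP-based expectation and its failure concrete, the following Python sketch (not part of the original system; all class names and example scenes are illustrative assumptions) represents a MOP as an ordered list of expected scenes and flags the points at which an actual incident departs from that expectation.

from dataclasses import dataclass

@dataclass
class MOP:
    """A Memory Organization Packet: an ordered expectation over scenes."""
    name: str
    expected_scenes: list

    def expectation_failures(self, actual_scenes):
        """Return (position, expected, actual) triples where the reported incident
        deviates from the expectation -- the 'breakdowns' that trigger learning."""
        return [(i, exp, act)
                for i, (exp, act) in enumerate(zip(self.expected_scenes, actual_scenes))
                if exp != act]

# Hypothetical example: the analyst expects simple forgetting, but the report
# instead describes mixing up materials while preparing a solution.
mop = MOP("drip administration error",
          ["insufficient precaution", "forgot to administer drip"])
report = ["insufficient precaution", "mixed up glucose and normal saline"]
print(mop.expectation_failures(report))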
3 Text-Mining Method for Extracting Causal Relations 3.1 Swanson’s ABC Model Text mining usually involves the process of structuring the input text (e.g. morphological analysis and parsing), deriving patterns within the structured data, and finally evaluation and interpretation of the output. ‘High quality’ in text mining usually refers to some combination of relevance, novelty, and interestingness. Relevance among the documents is often evaluated using co-occurrence relations
among terms appearing in the documents. By way of definition, co-occurrence networks are the collective interconnection of terms based on their paired presence within a specified unit of text. Networks are generated by connecting pairs of terms using a set of criteria defining co-occurrence. For example, terms A and B may be said to "co-occur" if they both appear in a particular document. Another article may contain terms B and C. Linking A to B and B to C creates a co-occurrence network of these three terms. Text-mining methods have been adopted for the purpose of analyzing incident reports. Most text-mining methods extract indexing terms by means of morphological analysis and represent individual documents as a collection of those terms. Since the written materials are broken into a collection of terms, information such as relations with the contexts and the causalities existing in the original materials is lost during processing. This limitation is critical for the purpose of our system, since a higher-level memory structure like a MOP is meant to represent relations of causality among events and to be activated depending on the specific context. Thus, in order to utilize text-mining methods for our purpose, we introduce an extended text-mining method that can extract relations among terms while preserving the causalities among them and can lead to hypothesis generation. The idea of a text-mining approach towards hypothesis generation, known as Swanson's ABC model, consists of discovering complementary structures in disjoint journal articles. This model assumes that when one literature reports that agent A causes phenomenon B, and a second literature reports that B influences C, we could propose that agent A might influence phenomenon C [6]. To find published evidence leading to undiscovered knowledge, the A and C literatures should have few or no published articles in common. In this way, Swanson discovered, among others, several relationships that connected migraine and decreased levels of magnesium. There are two approaches to discovery that we have defined as open and closed. Closed discovery starts with known A and C. This may be an observed association, or an already generated hypothesis. The discovery in this situation concerns finding novel Bs that may explain the observation. The open discovery process starts in the knowledge structure in which the scientist takes part (A). The first step is to find potential B-connections. These will likely be found within the domain. The crucial step, however, is from B to C, which is most likely outside the scientist's scope and might therefore be at any point of the knowledge space of science. In most cases, an open discovery concerns generating a hypothesis that is then evaluated in a closed discovery process. Thus, for the purpose of our work, open discovery is more appropriate for generating expectation structures like MOPs. On the other hand, the weakness of Swanson's ABC model is that the discovered connections do not always represent the chronological causalities among the events. Our solution to overcome this is given in the next section.
3.2 Design of an Assisting Tool for Analysts of Incident Reports
The original incident reports dealt with in this work are structured. For each incident, the report is described according to the following four progressive stages: "cause", "situation", "course" and "consequence". We structure each incident according to the chronological progression of a failure.
That is, first the cause takes place, followed by
its inevitable effect, or result (situation). As a developing failure becomes evident, a person takes action to deal with the unfolding sequence of events (course). In addition, a variety of related developments take place that are described as the consequence. Then, a text-mining method is applied to the collection of descriptions gathered for each stage. Thus, the description of each stage making up an individual incident report is represented as a set of indexing terms. Next, co-occurrence relations within the same incident report are investigated between the indexing terms that appear in chronologically neighboring stages: between the stages of "cause" and "situation", "situation" and "course", and "course" and "consequence". For instance, as shown in Fig. 1, an indexing term "insufficient precaution" appearing in the cause stage co-occurs with another indexing term "incorrect administration" appearing in the situation stage; thus a chronological causality from "insufficient precaution" to "incorrect administration" can be inferred, while the reverse relation is not inferred between them. From the frequency of co-occurrence, the potential causal relationships can be extracted out of the collection of all incident reports together with a degree of confirmation. These relationships are shown in diagrams such as that at the bottom of Fig. 1. The analyst first selects a keyword from a repertoire of indexing terms appearing in the cause stage. Then, using the text-mining method, the system extracts a set of
Fig. 1. Extracting chains of indexing terms as analysts’ expectation structures
Fig. 2. An overview of the system
candidate indexing terms that are inferred as the resultant situation caused by the selected indexing term. Analysts are then requested to choose one or more indexing terms from the candidate set; the system processes these in the same way and extracts a set of candidate indexing terms that are inferred as the course triggered by the chosen indexing terms. This process is iterated, and the system provides the analysts with potential causal scenarios to be expected according to their specific concerns, which are conveyed to the system through the selection of indexing terms at each stage. At the same time, analysts are assisted in stretching their expectations with the aid of the presented indexing terms. Since the entire set of incident reports is stored within the database of the system, the system also presents to the analysts the original incident reports that contain the selected causal relationships. By comparing these actual incidents with what they expected, analysts can recognize whether their expectation is right or not; if they recognize it is wrong, a breakdown of the expectation occurs and they can investigate why and how their expectation was violated by forming and checking other possible scenarios using the system. An overview of the system is shown in Fig. 2. In terms of MOPs, the chronological causalities extracted using the text-mining method are the relations connecting the different chronological scenes making up a particular MOP. Note that the system can neither extract nor present the structure of the MOP itself, but can only make the analysts infer the MOP from the presented cues of indexing terms; the MOP exists only within the analysts' minds. However, with this expectation in mind, analysts are encouraged to proactively interpret the prior incident in a top-down way. Then, by looking at the actual incidents and comparing them with their expectation, they can recognize what knowledge for safety management is missing.
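A minimal Python sketch of the stage-wise co-occurrence counting described above is given below. It is an illustration under assumptions rather than the authors' implementation: incident reports are assumed to be available as dictionaries of indexing terms per stage, and the candidate "causal" links are simply co-occurrence counts between chronologically neighbouring stages, from which next-stage terms can be ranked and chained (in the spirit of Swanson's A-to-B-to-C discovery).

from collections import Counter, defaultdict

STAGES = ["cause", "situation", "course", "consequence"]

def build_links(reports):
    """Count term co-occurrences between chronologically neighbouring stages.
    reports: iterable of dicts mapping each stage name to a set of indexing terms."""
    links = defaultdict(Counter)          # links[(stage, term)][next_stage_term] = count
    for report in reports:
        for stage, next_stage in zip(STAGES, STAGES[1:]):
            for a in report.get(stage, ()):
                for b in report.get(next_stage, ()):
                    links[(stage, a)][b] += 1
    return links

def candidates(links, stage, term, top=5):
    """Rank candidate terms of the next stage given a term selected by the analyst."""
    return links[(stage, term)].most_common(top)

# Hypothetical toy reports; the indexing terms are invented for illustration only.
reports = [
    {"cause": {"insufficient precaution"}, "situation": {"incorrect administration"},
     "course": {"double check"}, "consequence": {"no harm"}},
    {"cause": {"insufficient precaution"}, "situation": {"wrong dose"},
     "course": {"report to doctor"}, "consequence": {"observation"}},
]
links = build_links(reports)
print(candidates(links, "cause", "insufficient precaution"))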
4 Experiments for Medical Incident Reports
4.1 Outline of the Experiments
Our system contains 3,690 incidents that actually occurred in a particular general hospital in Japan between 2002 and 2007 (six years). Each of the original incident
reports is described in a structured way according to as many as 110 attributes, out of which we choose four important attributes corresponding to the four stages of cause, situation, course and consequence, as mentioned in the previous section. The users of the system are analysts of the incidents (i.e., nurses and a pharmacist) working for the hospital where the incidents were gathered. The providers of the original incident reports are different from these analysts, so the analysts are requested to infer what happened and how the incidents occurred and progressed, referring to the prior cases stored within the database. The details of the experiments are summarized as follows.
1. Subjects: Seven professional nurses and one pharmacist (all female) with years of experience varying from 10 months to more than 20 years.
2. Tasks requested of the subjects: Subjects are requested to choose one or more indexing terms of the cause stage that are of concern to them and to browse the incidents using the system (Fig. 3). During the session, subjects are requested to verbalize what incidents they expect from the indexing terms presented by the system at each of the stages, and those protocols are recorded. At the same time, subjects are instructed to refer to the actual incident reports stored in the database and to tell how their expectations are violated or confirmed in comparison with what they expected from the presented indexing terms.
In these experiments, data were gathered during 17 sessions performed by eight subjects. During the sessions, the experimenter requested the subjects to select up to four incident reports that most attracted their interest and asked them why and how. The number of these protocols was 39 in total from the 17 sessions. Each of the gathered protocols was transformed into a structured graphic representation using the gIBIS (graphical Issue-Based Information System) method [7]. For discussion of the results obtained in the experiments, we classify the subjects according to Dreyfus's typology. Dreyfus and Dreyfus [8] developed a useful five-stage typology of developing expertise, with characteristics for each stage: Novice, Advanced Beginner, Competent, Proficient and Expert. This model has been extremely influential, particularly in the field of nursing; thus we classify the eight subjects into one of these typologies and discuss the effectiveness of our system for subjects with expertise at each stage.
4.2 Results of the Experiments
New Knowledge Acquisition Triggered by Expectation Failures. The result obtained from subject 1 is illustrated here. The protocol was obtained at the time when the subject had constructed some expectations (i.e., MOPs) in mind while observing a chain of indexing terms displayed by the system. At this time, the subject referred to the actual incident reports stored in the database in which those indexing terms are present in their descriptions at each of the progressive stages. The expertise level of subject 1 is "Proficient." From the presented chain of indexing terms, she thought of an incident in which carelessness simply caused a nurse to forget to administer an intravenous drip to a patient. However, the actual incident report retrieved from the database concerned an incident in which a nurse mixed up the kinds of materials in making solutions for instillation; she made a solution with the wrong material, glucose, while the right material was normal saline. Finding this
Fig. 3. A hard copy of the display presented to the analyst
unexpected incident, the subject explained why she could not think of such an incident. She said, "I have known the precaution of checking the material when making solutions for instillation, but I guessed at other types of incidents caused by wrong procedures, and it is really unexpected that such a mix-up occurs at such a stage." This shows that her expectation was formed from the indexing terms based upon knowledge biased by her prior experiences, but the presented incident was outside her expectation. In other experiments, especially in the sessions by subjects classified as Proficient and Expert, the subjects were very good at associating prior incidents with the presented indexing terms, but their associations were sometimes quite biased, which caused them to stretch their expectation in a wrong direction during their sessions. Here, noticing their expectation failures by referring to the actual incident reports is very important and contributes to resetting and expanding their biased focus.
Absence of Expectation Failures. In another session, no expectation failure occurred at all. This was observed in subject 3, who is classified as "Advanced Beginner" or "Competent." Subjects with this proficiency are quite good at memorizing all the incidents, but this is simply due to rote learning, meaning that they cannot relate individual incidents to each other and they fail to grasp the contextual factors affecting the incidents. In constructing their expectations through the conversation with the system, they are apt to make up scenarios exhaustively by picking many indexing terms in many ways during their sessions. Consequently, the actual incident reports presented to the subject are all within the scope of their expectations, and no expectation failure occurs.
Failures in Forming Expectations. The other typical observation in the experiments is that subjects failed to form their expectations. This was mostly found in the sessions by subjects (e.g., subject 8 and subject 7) classified as
“Novice” or “Advanced Beginner”. At the stage of stretching their expectations, subjects are required to select appropriate indexing terms at each progressive stage. At this point, with their limited expertise, they cannot imagine any possible scenarios from the displayed indexing terms and thus fail to construct any expected scenarios. With such incomplete expectations, even when the actual incident reports were presented to the subjects, they could not understand why they were presented, and comparison with what they expected could not be made. As a result, they frequently give up continuing the session and no learning occurs at all.
4.3 Discussion of the Results of the Experiments
The results of the experiments shown in the previous section reveal that the ways of interacting with the system vary among the analysts depending upon their developing expertise. First, novices can be characterized by their rigid adherence to taught rules or plans, little situational perception and no discretionary judgment. For those analysts with less experience, our proposed system is less effective, since they fail to form holistic scenarios. As the novice gains experience actually coping with real situations, he begins to note perspicuous examples of meaningful additional aspects of the situation. After seeing a sufficient number of examples, advanced beginners can learn to recognize these new aspects, but their situational perception is still limited and all attributes and aspects are treated separately and given equal importance. These characteristics improve for competent people; with more experience, the number of potentially relevant elements that the learner is able to recognize becomes overwhelming. At this point, however, since a sense of what is important in any particular situation is missing, performance becomes exhausting. This fact is reflected in the results of the experiments for the analysts ranked as advanced beginner and competent; they could make up their expectation structures well and exhaustively, but they failed to focus their views. Thus they could find many of the prior incident cases, but these were all within the scope of their expectations, so learning did not occur. At the proficient stage, this ability is drastically reconstructed; to cope with this overload and to achieve competence, people learn through experience to devise a plan or choose a perspective that then determines which elements of the situation are to be treated as important and which ones can be ignored. They can see situations holistically rather than in terms of aspects, and can see what is most important in a situation. This is typically observed in the results of the experiments in which expectation failures occurred and new knowledge acquisition was attained. Such analysts are able to perceive deviations from the normal pattern and to cope with those deviations by reorganizing the expectation structures of the MOP in an adaptive way even when they encounter expectation failures. Hence, learning by expectation failure occurs in the most effective way.
5 Conclusions and Future Perspectives
Of immediate importance in our proposed system is the Peircean notion of semiosis, which emphasizes the role of the interpreting instance and of the context in which this "interpretant" operates. The core of the analysts' activity during the session with the
system is, in the first place, a "checking against" one's own foreknowledge, one's prior observations and experiences. During a conversational session of interacting with the system, existing knowledge that was acquired before the session interferes with the information supplied on the spot by the system. It is exactly this interference that creates meaning within the analysts; meaning does not solely exist within the written materials of the incident reports. Meaning thus largely depends on the cognitive structure of the analysts and on the nature of their expectations. The innovative design principle proposed in this article is the radical shift of emphasis from the production side, where an incident report is required to be a coherent arrangement of elements, to the reception side of the communication schema, in which the viewing of the actual incident, and not the reading of a text, acts as the drive for interpretation by the analysts. Our proposed system contributes to enhancing analysts' sense-making [9], i.e., placement of items into frameworks, comprehending, redressing surprise, constructing meaning, interacting in pursuit of deep understanding, and patterning for safety management.
References
1. White, H.: The Content of the Form: Narrative Discourse and Historical Representation. Johns Hopkins UP, Baltimore (1987)
2. Winograd, T., Flores, F.: Understanding Computers and Cognition: A New Foundation for Design. Ablex Pub. Addison-Wesley (1987)
3. Schank, R., Abelson, R.: Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. Lawrence Erlbaum, Hillsdale (1977)
4. Schank, R.: Dynamic Memory: A theory of reminding and learning in computers and people. Cambridge University Press, New York (1982)
5. Schank, R.: Dynamic Memory Revisited. Cambridge University Press, New York (1999)
6. Swanson, D.R.: Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge. Perspectives in Biology and Medicine 30(1), 7–18 (1986)
7. Conklin, E.J., Begeman, M.L.: gIBIS: A Hypertext Tool for Exploratory Policy Discussion. In: Proceedings of CSCW 1988, pp. 140–152. ACM, New York (1988)
8. Dreyfus, H.L., Dreyfus, S.E.: Mind over Machine: the power of human intuition and expertise in the era of the computer. Basil Blackwell, Oxford (1986)
9. Weick, K.: Sensemaking in Organizations. Sage Publications, Thousand Oaks (1995)
Patient Safety: Contributions from a Task Analysis Study on Medicine Usage by Brazilians Carla Spinillo, Stephania Padovani, and Cristine Lanzoni PostGraduate Program in Design - Universidade Federal do Paraná –Departamento de Design R. Gal. Carneiro, 460, - Edf. D. Pedro I - sala 811 - Centro – Curitiba – PR. Brazil [email protected], [email protected], [email protected]
Abstract. Medicine misuse in Brazil is one of the most relevant health issues, affecting millions of people. This paper discusses the results of a study on the usage of five different medicines by 60 adult Brazilians. Problems in task performance occurred with all medicines, especially those requiring the measuring of doses and object manipulation. Deficiencies in the design of the medicine inserts were also found. The outcomes lead to the conclusion that improvements in the design of instructions for patients and of medicine inserts, as well as of medicine bottles/containers, are necessary to facilitate their use by Brazilians. Keywords: Medicine usage, Medicine inserts, Task analysis, Brazil.
1 Introduction
Brazil has a population of about 190.7 million people and produces around 10,972 million medicine units per year [1]. Although their purchase is regulated by the Ministry of Health [2], one may easily acquire medicines without prescriptions or assistance from health professionals. As a result, Brazilians are poisoned due to the misuse of medicines. In the latest survey, from 1993 to 1996, 57,748 cases of medicine poisoning were registered in the country [3]. Taking a medicine may be a challenging task, demanding the handling of objects/devices (e.g., syringes, applicators) and the following of instructions for dosage, storage and disposal, generally available in the medicine inserts. However, deficiencies in the graphic presentation of inserts may negatively affect patients' comprehension of the task [4-9]. Considering the aspects above, this paper briefly presents the results of a study conducted in Brazil on the effects of medicine inserts on task performance.
2 The Study
Sixty male and female adults participated in the study divided across the medicines tested (12 participants per medicine). Five medicines differing in the manner they
should be taken were selected together with their inserts: (1) oral inhale capsule; (2) vaginal cream (female participants); (3) oral suspension (paediatric antibiotics); (4) insulin injection; and (5) nasal spray. For participants' safety, medicines were taken/used in a simulated manner demanding the same actions as the real task. Figure 1 shows the medicines used in the experiments and their inserts.
Fig. 1. Medicines used in the study: (1) oral inhale capsule; (2) vaginal cream (female participants); (3) oral suspension; (4) insulin injection; (5) nasal spray
Initially, each task was described according to the information available in the medicine inserts. Afterwards, participants were asked to take/use the medicine following the instructions in its insert. When they considered the task over, a semi-structured interview was conducted regarding their impressions of their task performance and of the medicine inserts. Task performance was analysed according to the classification of human errors: (1) information processing, (2) action, and (3) verification [10] [11]. Subcategories were added to these classifications to address the particularities of the study, as shown in Table 1. Data were analysed in a qualitative manner. The numbers presented here only indicate trends for discussion purposes.

Table 1. Classification of human errors used in the study and the subcategories proposed for analysing the results on task performance by participants

1 – Information processing errors
Internal (individual repertoire)
Pi 1 | Wrong/mistaken assumption
External (insert/package/product)
Pi 2 | Information was not read/searched
Pi 3 | Information was incompletely read/searched
Pi 4 | Wrong information searched
Pi 5 | Information was searched but not found
Pi 6 | Information was searched and found but not understood

2 – Action errors
A 1 | Task/action was not performed
A 2 | Task/action was incompletely performed
A 3 | Task/action was performed at a wrong/inappropriate moment
A 4 | Very long or very short task/action
A 5 | Task/action performed in a very small or very large amount/quantity
A 6 | Task/action in wrong direction
A 7 | Wrong alignment
A 8 | Right task/action on wrong/mistaken object
A 9 | Right task/action but on a wrong part/component of the right object
A 10 | Wrong task/action on a right object
A 11 | Wrong task/action on a wrong object
A 12 | Selection not done
A 13 | Wrong selection done

3 – Verification errors
V 1 | Verification not done
V 2 | Verification incompletely done
V 3 | Verification at a wrong moment
V 4 | Right verification on a wrong object
V 5 | Wrong verification on right object
V 6 | Wrong verification on wrong object
V 7 | Verification of a very small or large amount/quantity
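One way to picture how this taxonomy is applied is as a simple coding scheme: each observed error is assigned a subcategory code (Pi, A or V plus a number) and the codes are then tallied per medicine and per top-level class, which yields counts of the kind reported in Table 2. The Python snippet below is only a hypothetical illustration; the observation records are invented.

from collections import Counter

# Hypothetical coded observations: (medicine, error code) pairs.
observations = [
    ("insulin injection", "Pi 6"),   # information found but not understood
    ("insulin injection", "A 5"),    # wrong amount/quantity
    ("oral suspension",   "V 1"),    # verification not done
    ("nasal spray",       "A 6"),    # task/action in wrong direction
]

CLASSES = {"Pi": "Information processing", "A": "Action", "V": "Verification"}

tally = Counter((medicine, CLASSES[code.split()[0]]) for medicine, code in observations)
for (medicine, error_class), n in sorted(tally.items()):
    print(f"{medicine:20s} {error_class:25s} {n}")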
3 Results and Discussion
In the tasks' descriptions, the highest number of actions necessary to take/use a medicine was found for the oral inhale capsule (N=20). For the decision-making process the highest figure was for the insulin injection (N=9), which also presented the highest figure (N=5) for conditional situations, together with the oral suspension. Regarding the experiments, the results confirmed participants' difficulties in understanding medicine insert information, particularly regarding the selection and verification of dosage. The highest figures for human error were found for the insulin injection (N=162) and the oral suspension (N=75), as shown in Table 2. In addition, the inserts of these medicines showed major drawbacks in the graphic presentation of text and a lack of clarity in the visual instructions, which led participants to lose interest in undertaking the task and to misinterpret instructions.

Table 2. Errors in task performance

Medicine                   Information processing   Action   Verification   Total
(1) Oral inhale capsule    20                       30       2              52
(2) Vaginal cream          7                        13       0              30
(3) Insulin injection      60                       68       34             162
(4) Oral suspension        32                       39       4              75
(5) Nasal spray            2                        29       12             43
Total                      121                      179      52
The results suggest that the cognitive load expected during the process of taking a medicine might go beyond what the usage instructions in the insert support. This possibly leads patients to act intuitively, inferring the ways the medicine can be taken/used. As a consequence, errors in the manipulation/preparation of a medicine and in determining its right dosage occurred. It is worth highlighting that if patients do not comply with the conditions for taking/using/manipulating the medicine or its components, or do not make the proper decisions, their health may be severely compromised.
4 Conclusions
Despite the limited number of participants in this study, the results indicated that Brazilians in general have difficulties in taking/using medicines, particularly those demanding the manipulation of objects and the preparation of doses. The outcomes also call into question the effectiveness of medicine inserts produced in Brazil in communicating usage procedures. Moreover, this study ratifies previous findings on the effect of the graphic presentation of inserts on comprehension, and corroborates their influence on task performance. Finally, it is worth pointing out that the findings also suggest the design of medicine bottles/containers affects the way instructions are followed. Although they were not the focus of this study, they influenced task performance in using/taking medicines.
Thus, improvements not only in the design of medicine inserts, but also of medicine containers, seem to be necessary from a user-centred approach.
Acknowledgments. We would like to thank CNPq, the funding agency of the Brazilian Ministry of Science and Technology, and the Brazilian Ministry of Health for supporting this research. We are also grateful to the participants for volunteering for the experiments.
References
1. IBGE. Instituto Brasileiro de Geografia e Estatística, Ministério do Planejamento. Dados estatísticos em saúde, http://www.ibge.gov.br/home/estatistica/economia/economia_saude/
2. BRASIL, Ministério da Saúde. Portaria n.110, de 10 de março (1997), http://www.anvisa.gov.br/legis/portarias/110_97.htm (accessed February 20, 2009)
3. Oliveira, E.A., Labra, M.E., Bermudez, J.: A produção pública de medicamentos no Brasil: uma visão geral. Cadernos de Saúde Pública, Rio de Janeiro 22(11), 2379–2389 (2006) (accessed February 20, 2009)
4. Spinillo, C.G., Padovani, S., Miranda, F., Fujita, P.T.L.: Instruções visuais em bulas de medicamentos no Brasil: um estudo analítico sobre a representação pictórica da informação. In: 3º Congresso Internacional de Design da Informação. SBDI, Curitiba (2007) (CD-ROM)
5. Spinillo, C.G., Padovani, S., Miranda, F.: Graphic and information aspects affecting the effectiveness of visual instructions in medicine inserts in Brazil. In: Proceedings of the AHFE International Conference 2008. USA Publishing, Louisville (2008)
6. Sless, D., Tyers, A.: Case history # 5 | Panadol 24 Pack: new instructions for consumers. CRIA (2004a), http://www.communication.org.au/cria_publications/publication_id_89_1290110197.html
7. Sless, D., Tyers, A.: Labelling code of practice: designing usable non-prescription medicine labels for consumers (2004b), http://www.communication.org.au/cria_publications
8. Waarde, K.: Visual information about medicines: providing patients with relevant information. In: Spinillo, C.G., Coutinho, S.G. (orgs.) Selected Readings of the Information Design International Conference 2003, pp. 81–89. SBDI, Recife (2004)
9. Wright, P.: Printed Instructions: Can research make a difference? In: Zwaga, H.J.G., Boersema, T., Hoonhout, H.C.M. (eds.) Visual information for everyday use: Design and research perspectives, pp. 45–66. Taylor & Francis, London (1999)
10. Barber, C., Stanton, N.A.: Human error identification techniques applied to public technology: predictions compared with observed use. Applied Ergonomics 27(2), 119–131 (1996)
11. Rasmussen, J.: Human Error. In: Information Processing and human-machine interaction, pp. 140–169. North Holland, New York (1986)
Remote Consultation System Using Hierarchically Structured Agents Hiroshi Yajima1, Jun Sawamoto2, and Kazuo Matsuda3 1
Faculty of School of Science and Technology for Future Life, Tokyo Denki University, Japan [email protected] 2 Faculty of Iwate Prefectural University, Japan [email protected] 3 School of Science and Technology for Future Life, Tokyo Denki University, Japan
Abstract. In fields of technological innovation the speed of advance is fast; while it is difficult for some people to keep up, there are few experts in the new technologies. Since consultation is concentrated on a small number of experts, phenomena such as being unable to obtain sufficient information in a timely manner occur, and these are among the major reasons for the increasing social-technological divide. This paper proposes a 2-level hierarchical remote consultation system using two types of agent. The system has the features that, because multiple agents respond to consultations first, experts can focus on only the complex questions, and in addition, consultees' waiting times are reduced. Its effectiveness is demonstrated experimentally. Keywords: remote consultation system, agent, remote communication, expert, TV conferencing.
1 Introduction
Society has been aging in recent years, and service functions for poorly informed aged persons and patients will be sought. Because the number of healthcare professionals is small, remote healthcare consultation that is efficient and yet maintains an appropriate level of service is being sought. In addition, as forms of employment diversify, models of employment such as the teleworking remote office are gathering attention. Further, in fields of technological innovation the speed of advance is fast; while it is difficult for some people to keep up, there are few experts in the new technologies. Since consultation is concentrated on a small number of experts, phenomena such as being unable to obtain sufficient information in a timely manner occur, and these are among the major reasons for the increasing social-technological divide, in which the benefits of advancing technology cannot be fully realized. As a policy for resolving this issue in present-day society, research on efficient remote communication support is important. In particular, support for fostering communication among disparate groups of people is essential.
So far, remote consultation systems have been implemented via TV conferencing and so on [1,2,3,4]. However, in such cases it has been usual for consultation to be conducted with one consultee exclusively occupying the services of one expert. Regarding information sharing, there are also remote conferencing systems [5] such as Skype, and functions for visualizing the topic of a discussion among its members in a shared manner have also been proposed. However, remote conferencing has been centered on discussions of a common theme among all members, and such systems are inefficient for situations in which experts possessing knowledge and information in a given field present solutions to laypersons lacking such information. In order to solve these problems, this paper proposes a 2-level hierarchical consultation model using two levels of agent. The two types of agent established are Service Agents (SAs), which interact with the clients, and a Supervisor Agent (SVA) placed between the SAs and the experts, which provides easily understood support by responding to requests for support from SAs when it is able to do so, or otherwise forwarding the existing message history to the experts. This system has the features that experts are able to focus on only the complex questions, and in addition, consultees' waiting times are reduced.
2 Problematic Points
2.1 Existing Remote Consultation
Remote consultation over the internet increasingly involves communication among people from different cultures and institutions. This is because the internet generation, new technologies, new organizations and new establishments are being developed, constructed and disseminated on a daily basis, and it has become necessary to assimilate this flow rapidly. In remote consultation, there are synchronous and asynchronous models. Synchronous models are those such as the telephone, where both parties exchange discourse during the same period of time. Asynchronous models are those such as email, in which discourse may be exchanged without adopting a specific time period. Asynchronous models are mainly realized by means of email, but with the rapid speed of business in the present day, there is an increasing need for synchronous models. The objective of this research is a synchronous remote consultation system over the internet among these kinds of disparate groups and individuals. Figure 1 shows an example of an existing remote consultation system that has already been investigated [6,7]. Basically, consultees initiate consultation from a convenient location, while a small number of experts oversee these consultations from a central office and respond to complex queries. Remote consultation is currently being conducted in many fields; PC user support, for example, is widely provided. Remote consultation has also come to be provided in the financial and healthcare fields. Along with this model, the provider model has also diversified. At present, remote consultation services are being provided by email, homepages, TV, telephone, and
Fig. 1. Existing model of a remote consultation system
Fig. 2. Phases of remote consultation
models combining these technologies. However, services using asynchronous communication models such as email incur a time-lag between the receipt of a consultation and the response, so problems cannot be solved immediately. For reasons such as this, the telephone, with its synchronous communication model, is the main channel for the provider model of remote consultation services. It is thought that the general flow of consultation may be broadly divided into three phases [6], and in this research the following definition is adopted (see Figure 2).
2.2 Problems with Existing Systems
Remote consultation has the following features (communication patterns). Basically, partners from different cultures (clients and experts) communicate with one another. The disparate groups may include, for example, a) groups of experts and laypersons, b) intradepartmental and interdepartmental staff groups, and c) groups of company staff and non-company persons, which thus constitute groups of people with different values, knowledge and objectives. People belonging to heterogeneous cultures often have different levels of knowledge, and the range and content of their basic assumptions also often differ, yielding obstacles to communication. Also, the number of consultees is usually overwhelmingly greater than the number of experts, so if experts respond to consultees on a one-to-one basis, the efficiency of consultation is poor.
3 Solution Strategy
3.1 Concept
This paper proposes a scheme for conducting remote consultation in which experts and agent systems are combined. In the proposed basic model, consultation is first conducted between consultees and service agents (SAs), and the SAs are supported by experts. This allows consultation to be conducted between consultees and agent systems, without the need for one-to-one consultation between consultees and experts. The multiple SAs seek support from the experts when they are unable to respond themselves. However, when multiple SAs seek support simultaneously, the experts must deal with multiple support requests at once. An agent with different functions (meta-knowledge and scheduling functions), the supervisor agent (SVA), is therefore placed between the experts and the SAs. By constructing the agent system in two layers (SVAs and SAs), consultation is made efficient. With the SVAs included, experts need only deal with a single SVA, rather than multiple SAs.
3.2 System Structure
The consultation model of this research is shown in Figure 3.
Fig. 3. The model proposed in this research
Adopting this structure gives rise to the following advantages.
• The problem arising when multiple SAs directly request support from experts simultaneously, thus increasing the burden on experts and decreasing the efficiency of consultation, is avoided.
• Also, the problem associated with consultee stress arising when multiple SAs send requests for support simultaneously, and one SA must wait for another SA's support to be concluded, thus increasing their consultee's waiting time, can be solved.
The function of each agent is as follows.
SA: conducts information exchanges with consultees. In this research, SAs question consultees regarding the essential items and obtain their replies. When SAs are unable to respond themselves, these replies are forwarded to SVAs as requests for support.
SVA: provides support for experts, acting between the SAs and the experts. When requests for support from SAs are within the range it can respond to, the SVA responds itself; in cases when it cannot respond, the requests are scheduled according to importance and presented to the experts in an easily understood manner along with the message history to date.
4 Remote Consultation Utilizing Hierarchical Agents
4.1 Processing
The following procedure is proposed as a method for realizing the concept (a minimal sketch of this escalation logic is given after Fig. 4).
1. Consultation proceeds between SAs and consultees. SAs ask questions of the consultees, and the consultees return their replies. Only the SAs respond during this process, without involving the experts.
2. When the consultees' replies are correct, the SAs present the next question.
3. The SAs send the consultees' replies to the SVAs, and the SVAs process the data, presenting individual SA consultation cases to the experts. Under this process, the experts only observe the data reported to them.
4. When replies from a consultee require exceptional handling, i.e., when the content of the replies cannot be processed by the SAs, the SAs request support from the SVAs.
5. When requests for support received by SVAs can be handled using the meta-knowledge they maintain, they return replies to the SAs. When they are unable to reply themselves, the preceding message history is attached, and support is requested from the experts.
6. Experts receiving requests for support send replies to the SVAs.
7. Messages from the experts are sent, via the SVAs, to the SAs originating the requests, and presented to the consultees. After receiving these messages, SAs resume questioning.
8. When consultees are satisfied, consultations are concluded.
4.2 Specific Flow of Consultation
Existing consultation systems have mainly used audio, but in this research audio is not used; consultation is conducted in a chat format using free text. The consultation advances as the agent poses questions to the consultee, and the consultee returns the answers or asks questions. When the consultee is satisfied, the final result is displayed on the consultee side and the consultation ends. The flow of consultation is shown in Figure 4.
Fig. 4. Flow of consultation
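The escalation logic of Sections 3 and 4 can be pictured with the minimal Python sketch below. It is a schematic assumption, not the implemented system: an SA answers from its own procedural knowledge when it can, otherwise it forwards the question and message history to the SVA, which answers from its meta-knowledge of exceptional cases or, failing that, queues the request (with history) for the expert.

class SupervisorAgent:                  # SVA: sits between the SAs and the expert
    def __init__(self, meta_knowledge, expert_queue):
        self.meta_knowledge = meta_knowledge
        self.expert_queue = expert_queue

    def support(self, question, history):
        if question in self.meta_knowledge:            # exceptional-case knowledge
            return self.meta_knowledge[question]
        # Cannot answer: schedule for the expert together with the message history.
        self.expert_queue.append({"question": question, "history": history})
        return "forwarded to the expert"

class ServiceAgent:                     # SA: talks directly with one consultee
    def __init__(self, knowledge, sva):
        self.knowledge = knowledge
        self.sva = sva
        self.history = []

    def consult(self, question):
        self.history.append(question)
        if question in self.knowledge:                 # SA can answer by itself
            return self.knowledge[question]
        return self.sva.support(question, list(self.history))   # escalate to the SVA

# Hypothetical usage; knowledge bases and questions are invented placeholders.
expert_queue = []
sva = SupervisorAgent({"lost seal": "re-issue procedure"}, expert_queue)
sa = ServiceAgent({"required documents?": "ID card and application form"}, sva)
print(sa.consult("required documents?"))   # answered by the SA
print(sa.consult("lost seal"))             # answered by the SVA
print(sa.consult("complex tax question"))  # queued for the expert
print(expert_queue)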
5 Experimental Assessment
5.1 Experimental Objectives
In the remote consultation system using agents, the case in which both SAs and SVAs are utilized and the case in which only SAs are utilized are compared, and the variation in the burden on the experts is ascertained. The number of consultees for each SA is taken to be 3.
5.2 Experimental Conditions
Condition 1. Consultation is conducted with an agent system in which 3 SAs respond to the 3 consultees. Experts respond to all of the requests for support from the SAs.
Condition 2. Consultation is conducted in a hierarchically structured system with an SVA added for the 3 SAs. SVAs automatically reply when they are able to respond using their own knowledge, and send the problems to which they cannot respond to the experts as requests for support, along with the preceding message history.
Consultation is conducted using only text, without audio, in both Conditions 1 and 2.
5.3 Experimental Task
As the task, consultation was conducted regarding the bureaucratic procedures involved in registering for a new insurance policy. Consultees have no relevant knowledge, the SAs have procedural knowledge, and the SVAs have knowledge of exceptional cases.
5.4 Experimental Subjects As experimental subjects, there were 1 expert and 5 groups of 3 consultees, making a total of 16 people. The subjects were students, and all had experience using a PC. 5.5 Experimental Results (1) Data. In the experiments, as an indicator for measuring the burden on the expert, the expert's operating time was determined. The expert's operating time is shown in Figure 5. The average operating time of the expert in Condition 1 was 1570 seconds, and in Condition 2, it was 1150 seconds, so when SVAs were included, the result was a drop of about 27%. Also, the total number of messages to the expert was 245 in Condition 1, and 117 in Condition 2, so the result was a decrease of about 47%.
Fig. 5. Expert's operating time
Fig. 6. Consultees' average waiting time
Regarding the consultees' waiting times, these were measured as the period during which they could not conduct their own operations, i.e., the processing time of each consultee's agent and the expert's operating time. The consultees' average waiting times in each condition are shown in Figure 6. The waiting time in Condition 1 is 915 seconds, and in Condition 2 it is 412 seconds, a drop of about 57%.
Table 1. Results of the consultees' questionnaire

                             With SVAs   Without SVAs
Smoothness of consultation   1.6         3.3
Level of concentration       2.6         3.3
Atmosphere                   3           3
Reliability                  2.6         2.3
Ease of consultation         2.3         2.6
Degree of stress             2.6         3.3
Level of satisfaction        2           3
Table 2. Results of the experts' questionnaire

                                  With SVAs   Without SVAs
Ease of use                       3           3
Ease of information acquisition   2           3
Level of concentration            3           2
Level of stress                   3           2
A questionnaire was completed after the experiment by both consultees and experts. The experiment was evaluated on a scale of 1 to 5 (1 was best, and 5 was worst). The results of the questionnaires are shown in Tables 1 and 2.
6 Discussion
6.1 Expert's Operating Time
Looking at the expert's operating time, when SVAs are present the time is reduced in comparison to when SVAs are not present. It was thus shown that the presence of SVAs reduces the expert's burden. However, there is a large gap between the 47% reduction in the expert's messages and the 27% reduction in operating time. The content of the questions directed at the expert is therefore classified in Table 3. According to these data, it can be seen that when SVAs are present, there is a reduction in questions regarding phrasing, which do not take the expert long to answer, and an increase in other types of time-consuming question, particularly those regarding the service. According to the consultees' post-experiment questionnaire, consultation is smooth when SVAs are present, which means there is an environment in which it is easy to ask questions. It was thus understood that, while there are individual differences, making the consultation smooth may increase the consultees' motivation to ask questions.
Table 3. Total number of questions in each classification

Evaluation items            With SVAs   Without SVAs   Average response time (seconds)
1. Phrasing                 62          77             17.055556
2. Price                    32          30             31.875
3. Service                  15          6              33.125
4. Personal circumstances   18          16             32
6.2 Consultees' Waiting Times
Looking at the results regarding the consultees' waiting times, the waiting times are reduced when SVAs are present in comparison to when they are not. It was thus understood that consultees' waiting times can be reduced through the use of SVAs.
6.3 Questionnaire Results
Looking at the results of the questionnaire shown in Table 1, the consultees' overall evaluation improves when SVAs are present. In particular, the evaluation of the smoothness of consultation improves considerably. However, while the overall evaluation improves, the evaluation of reliability worsens. According to the post-experiment questionnaire, this reflects some resistance to the fact that the responses to questions come from a computer. It was thus understood that, in contrast to the increase in the efficiency of consultation, there is a drawback in that the perceived reliability ends up decreasing. Looking at the results of the questionnaire shown in Table 2, when SVAs are present the expert finds it easier to acquire information, so it can be seen that consultation has also been made easier for the expert. However, the evaluations of the degree of stress and level of concentration worsen. This is thought to be related to the fact that the expert's operating time is decreased, so their free time is increased, which may affect their levels of stress and ability to concentrate.
7 Conclusion This paper proposed a 2-level hierarchical remote consultation system with 2 levels of agent. The proposed system is established with SA agents who respond to clients, and SVA agents existing between the SAs and experts, who respond to requests for support from SAs when they are able, or if not, request support from an expert by sending an easily understood request along with the preceding message history. Experimental evaluations proved that the establishment of SVAs shortens experts' operating times, and that the system is applicable as a one-to-many remote consultation system.
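To make the escalation flow concrete, the following minimal Python sketch illustrates how an SVA might either respond to an SA's support request itself or forward it to an expert together with the preceding message history. This is not the authors' implementation; the class names, the keyword-based lookup, and the summary format are illustrative assumptions only.

from dataclasses import dataclass, field


@dataclass
class SupportRequest:
    question: str
    history: list = field(default_factory=list)  # preceding SA-consultee messages


class Expert:
    def answer(self, summary: str) -> str:
        # placeholder for the human expert's reply channel
        return f"[expert reply to] {summary.splitlines()[0]}"


class SVA:
    """Second-level agent: replies to an SA itself when it can,
    otherwise escalates to an expert with the message history attached."""

    def __init__(self, known_answers: dict, expert: Expert):
        self.known_answers = known_answers
        self.expert = expert

    def handle(self, req: SupportRequest) -> str:
        for keyword, answer in self.known_answers.items():
            if keyword in req.question:          # SVA can respond on its own
                return answer
        # escalate: an easily understood request plus the preceding history
        summary = "Support needed: " + req.question + "\n" + "\n".join(req.history)
        return self.expert.answer(summary)


sva = SVA({"opening hours": "The branch is open 9:00-17:00."}, Expert())
print(sva.handle(SupportRequest("What are the opening hours?")))
print(sva.handle(SupportRequest("Which plan suits me?", ["client: I travel a lot"])))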
References 1. Matsuura, S., Matsumoto, T., Kiyosue, Y., Sugawara, T., Masaki, S.: Development and Evaluation of the NetForum Simplified Multipoint Television Conferencing System. In: Proceedings of the Information Processing Society of Japan, vol. 41(11), pp. 3142–3151 (2000) 2. Kobayashi, S., Iwaki, T.: Home Teleworking Experiment using a Multipoint Internet Conferencing System, Technical Report of the Institute of Electronics, Information and Communication Engineers of Japan, OIS2002-11 (May 2002) 3. Obata, T., Sasaki, R.: OfficeWalker: Video Image Transmission System Supporting Incidental Conversations in Distributed Offices. In: Proceedings of the Information Processing Society of Japan, vol. 40(2), pp. 642–651 (1999) 4. Ido, S., Inoue, K.: Development of a Video Conference Minuting System Based on Nodding, Technical Report of the Information Processing Society of Japan, Vol. 34, pp.19– 24 (March 2006) 5. Grayson, D.M., Monk, A.F.: Are you Looking at Me? Eye Contact and Desktop Video Conferencing. ACM Transactions on Computer-Human Interaction 10(3), 221–243 (2003) 6. Tanaka, T., Koizumi, Y., Yajima, H.: Proposal of a Dialogue Support Environment for Remote Consultation using an Unmoderated Communication Mode. In: Proceedings of the Human Interface Society, vol. 4(3) (2002) 7. Tanaka, T., Mizuno, H., Tsuji, H., Kojima, H., Yajima, H.: Remote Consultation System Supporting Asynchronous Communication in Distributed Environments. In: Proceedings of the Information Processing Society of Japan, vol. 40(2)
How Mobile Interaction Motivates Students in a Class?

Akinobu Ando¹ and Kazunari Morimoto²

¹ Miyagi University of Education, 149 Aramaki-Aoba Aoba-ku Sendai, Japan
² Kyoto Institute of Technology, Matsugasaki Sakyo-ku Kyoto, Japan
[email protected], [email protected]
Abstract. The purpose of this study is to show a way that students can become more active in the classroom. We tried to use a mobile phone as a teaching and learning tool. A mobile phone is a familiar device to Japanese students: most of them have one and use it every day to write e-mails, browse web pages, take pictures and make calls. This study showed statistically how the use of mobile phones as a teaching and learning tool affected students' motivation. As a result, we found that students' communication-charge plans did not affect the evaluation of this method. The most effective factors were "the effect on checking attendance" and "remembering what students learned at the end of the lecture". The method thus helps a teacher in two ways: it allows better use of lecture time by shortening attendance checking, and it allows students to write comments on their mobile phones during the lecture, so the teacher does not have to use lecture time for feedback. Therefore, not only traditional paper and pencil but also the mobile phone should be regarded as a school tool. Keywords: mobile phone, motivational model, anonymity, picture and LMS (Learning Management System).
1 Introduction
In Japan, there is a growing movement to improve higher education. A recent report by the Central Education Council mentions the necessity of doing so. The report states, "It is important for teachers of higher education to let students who have little desire to learn or lack a sense of purpose have motivation to actively engage themselves in systematic instruction. For example, interactive classrooms and active experience; it is necessary for every university to think back and re-check their ways of teaching." [1]. Today it is no longer enough for university teachers to convey knowledge alone. A teacher expects students to ask questions and to actively speak their own minds. However, in most cases these expectations are unfortunately disappointed. In particular, even when Japanese students participate actively and positively in class, they seldom express their comments, questions and thoughts in a large classroom. This seems to be part of the "Japanese national character". It is not that students have no ideas, comments, questions or thoughts; in general, many Japanese students are shy, find it hard to speak up in front of many people, and only whisper when a teacher calls on them.
Some previous studies address this. Students do not recognize the usefulness of expressing their comments in a class when many students are participating [2]. Therefore, if a teacher lets students write a handwritten report, they describe their thoughts more freely. Surely, this situation is not suitable for all students. However, to let students participate actively in class is needed. We think that there are two ways to accomplish this. Obviously, it is important to enhance the contents of teaching. Second, we have to establish methodology as the educational engineering. We understand that it is better to adopt game elements and involve students’ experience in the contents of teaching from a traditional approach. Our study puts forward that the “Mobile phone” is one of the best tools to use to target university students. In Japan, mobile phones penetration rate is 95%, in particular 99% of university students own one [3]. Today, all mobile phones have the functions of e-mail, web browser and digital camera. We can connect to the internet via only a mobile phone. It is so easy for every people to send an e-mail with an attached of a picture. Some Japanese university teachers try to adopt the use of mobile phones in large classrooms. This aim has two points as follows. First, it is used to reduce a teacher’s workload e.g. to confirm students’ attendance and to record students’ learning and performance [4]. Another aim is to help interaction with students in class. Dr. Tamura is the one of the teachers who has taken the usefulness of a mobile phone into account from an early date. He has practiced the way to combine e-mail data from a students’ mobile phone with MS-Excel to manage students’ attendance [5]. Dr.Miyata has suggested the use of the comment database system where students can write comments onto web pages via a mobile phone [6]. We have tried to actively use a mobile phone as a teaching tool since 2003 [2][4][7]. We have studied the way for a teacher and students to use a mobile device as a “school supplies (tools)” which were enhanced electronically. The essential functions are four. These are described as follows: First function is to collect and grasp electronically students’ hand written reports and drawing. Until now, there is no way for a teacher to know and compare students working situations except when a teacher watches them directly. This function can allow us do it in a short time without special equipment. Second is to ask questions freely in class. Our teacher is asked some questions in front of a classroom after lecturing. Some of these questions, we would like to introduce them to all of the students. Third is to be able to make students to express their comments without feeling nervous. This function makes students express anonymously what they think via an e-mail. Depending on conditions, it might be better to know who expressed these thoughts. In such a case, a teacher could tell students to write their signature at the end of an e-mail. Using this function, we can collect and read comments immediately. Last function is LMS (Learning Management System) i.e. marking the absence and managing score. So far, our system and method has been held in high repute. In our class of 2007, 86% of students answered “Good” and in the remaining 14% of students, there were no bad remarks. Many of students told us this method allowed them to participate freely in class. Is this only a reason that they use a mobile phone? 
The purpose of this study is to show a way that students can become more active in class.
2 Aspect of the Practice and Theoretical Background This “Using mobile phone” approach has already some clear advantages. The first point is the advantage of using a mobile phone. Although a personal computer diffusion rate is 85% in March 2008 in Japan, there aren’t computers in every classroom [1]. Even if there were, students might say, “Teacher, I don’t know the way to …”, “Teacher, may I click ‘YES’ button on this dialog box?” and “Teacher, my PC has something wrong…Is this Freeze?” After all, we should teach operation and follow a procedure in a computer class. However, if a mobile phone class, a teacher simply tells a “purpose” e.g. “Please send an e-mail to me” or “Access a web page using your mobile phone”. Perhaps, a student says, “My battery goes flat!” instead of asking a question. Therefore, we need the most basic alternative. That is paper. The second point is that a mobile phone is a student’s possession. So, students should pay communication charge. Many students are not concerned about the charge because they subscribe to “packet fixed-communication-charge”. However, a minority of students are concerned because they do not subscribe to the packet fixed communication. It is therefore important for the students to understand why the mobile phones are to be used. Next, we’d like to review the idea of anonymity. Of course, for some teachers, to use e-mail and BBS to express students’ comments may bring discomfort. But unless we adopt this method, we can not process many comments in a short time. By reading these comments, a teacher can know what the students feel and think. If a teacher wants to know more, he/she only asks, “Who wrote this comment? Please explain in detail”. Whereas a teacher may feel uncomfortable with this method, students feel “It is interesting in seeing many comments” and “I can express myself open-heartedly”. Finally, we’d like to describe the “Motivational Model”. The “Motivational Model” is called “ARCS Model”, which is advocate by Dr. J.Keller. According to Keller, there are four elements necessary required for a student to gain their motivation for learning. The first one is “Attention”. The “Attention” makes the students excited and makes the students interested. The second one is “Relevance”. If a student feels the contents of a lecture doesn’t relate to him/her, a student loses interest. The third one is “Confidence”. For example, if the content of a lecture is too difficult, students feel it is impossible to understand and they stop thinking about it. The last one is “Satisfaction”. Students become satisfied when they are able to fully understand and utilize the content of a lecture. Thus, the ARCS motivational model comes from Attention, Relevance, Confidence and Satisfaction. This is one of the educational models employed in the field of educational engineering and instructional design.
3 System Architecture and Target Class Figure 1 shows the outline of this system. Students can use five functions i.e. “Step of attendance”, “Step of feedback today’s lecture evaluation”, “Confirm numbers of attendance”, “Confirm test points” and “Real-time question”. A teacher can control two functions i.e. “Anonymous Comments” and “Visual Presenter”. More details are explained in the following sections.
Fig. 1. Outline of this system
3.1 How to Use the Functions At the beginning of this course, every student who wants to attend the class should submit their ID number, e-mail address, name and password via a mobile phone. This information remains private. The system server creates a database table based on this information. Using this data, a teacher can monitor every student’s activity – attendance, performance, achievement and satisfaction. Every class, a teacher gives “Today’s special keyword” – attendance code – as “Step of attendance”, when he/she begins the lecture. This keyword consists of five characters e.g. “Aj38#”. This is a procedure to ensure correct attendance figures. Only students in a classroom will know the keyword and so students not in attendance will not be able to key in. Students are waiting to input “Today’s special keyword” after they login into the system (see figure 2). The system will close 20 seconds after the teacher gives “Today’s special keyword”. We think 20 second is enough time for every student to input the keyword. After inputting the keyword, the system records the date of input and time. The teacher can monitor the recorded time on this system. At the end of each lecture, students will login into the system again, inputting “Today’s degree of satisfaction”, “Today’s achievement”, “Important words” and “Feeling on the today’s lecture”. We expect students to remember about today’s study. During a lecture, if students have questions, they can write it on “Anytime question BBS” at any time. If a teacher would like to know students’ comments or students’
Fig. 2. Screenshot of the step of attendance
written e.g. pictorial notes, he/she may use the functions; “Correcting comments” and “Correcting pictures”. We use e-mail to collect students’ comments and pictures. On “Anonymous comments”, the system can display only e-mails body text. Thus, all students’ comments can be read anonymously. Only the teacher knows whose e-mail is on this system. On “Visual presenter”, a teacher can also display pictures taken and sent via mobile phones the same way as the e-mail body text. Because this function can display not only pictures but also messages written in e-mail body part, students can add supplements to picture. In addition, the system can create web pages from these contents, a teacher and students can look at them after the class. 3.2 Learning Activity A university first grade “Information Science” class was used to demonstrate the system. In the class, the teacher lectured about aspects of information technology using scientific language. Nearly a half of students had low motivation as this class is one of the most difficult. We used this system and method two years ago in a class of approximately 110 students. This paper is based on research carried out in 2007. Every class, we let students use the functions - “Step of attendance”, “Step of feedback today’s lecture evaluation”, “Confirm numbers of attendance”. We told
Fig. 3. Sample screen shot shows student’ comment
Fig. 4. Students’ works are shown into the screen
Fig. 5. Enlarged screen shot
students that if you have any questions, you should use the "Real-time question" function at any time. Sometimes during a lecture, we wanted to collect comments anonymously, such as thoughts about their own experience of bullying, the question of what impact they think technology will have, or how they feel about giving speeches (see Figure 3). The "Visual Presenter" was used for showing pictures, converted from analog to digital form, to all students (see Figures 4 and 5).
3.3 Examination Method
The questionnaire was completed anonymously. Table 1 shows the content of the questionnaire. We asked students to evaluate three points: first the whole system, second the individual functions, and third the ARCS motivational model.

Table 1. The contents of the questionnaire

Total evaluation of this system
- Have you subscribed to the "packet fixed-communication-charge"?
- Did you mind paying the communication charge?
- Are you a heavy user of a mobile phone?
- Is using the mobile phone difficult?
- Was the function of confirming attendance useful?
- Was the function of confirming test points useful?
- Could you remember what you learned at the end of the lecture?
- Could this method shorten the time for checking attendance?
- Did you think that your comments had any effect on the lecture?
- What is your total evaluation for this system?
About the functions (from 1: worst to 7: greatest)
- What do you think about the "Anonymous Comment" function?
- What do you think about the "Visual Presenter" function?
- What do you think about the "Real-time question" function?
- What do you think about the "Step of attendance (LMS: Learning Management System)"?
About the ARCS model (from 1: weak to 7: strong)
- How would you rate your attention using the system?
- How would you rate the relevance of the lecture?
- How would you rate your confidence after using the system?
- How would you rate your satisfaction after using the system?
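As a rough illustration of the "step of attendance" described in Sect. 3.1, the following Python sketch checks a submitted keyword against the one announced in the classroom and records the submission time. The five-character keyword and the 20-second window are taken from the text; everything else (how submissions reach the server, the data kept per student) is an assumption for illustration.

import secrets
import string
import time


def issue_keyword(length: int = 5) -> str:
    """Teacher side: generate 'today's special keyword', announced only in the classroom."""
    alphabet = string.ascii_letters + string.digits + "#"
    return "".join(secrets.choice(alphabet) for _ in range(length))


class AttendanceStep:
    """Server side: accept the keyword only within the 20-second window and
    record when each student submitted it."""

    def __init__(self, keyword: str, window_s: float = 20.0):
        self.keyword = keyword
        self.opened_at = time.time()
        self.window_s = window_s
        self.records = {}          # student_id -> submission time

    def submit(self, student_id: str, typed: str) -> bool:
        now = time.time()
        if now - self.opened_at > self.window_s:
            return False           # the system closes 20 seconds after the keyword is given
        if typed != self.keyword:
            return False           # wrong keyword: the student was not in the classroom
        self.records[student_id] = now
        return True


step = AttendanceStep(issue_keyword())
print(step.submit("s001", step.keyword))   # True: present and within the window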
4 Result
Figure 6 shows the result of this questionnaire: 88% of students had subscribed to the "packet fixed-communication-charge". Figure 7 shows the cross-tabulation of "charge plan" against "students' comments about the communication charge". We can see that 71% of students (77 persons) feel that the system causes no problem, but students who subscribed to the "pay as you go" plan worried about the communication charges. To investigate the differences between these plans, we compared them using a nonparametric analysis. The result showed no significant differences between the plans; thus, the effectiveness of the method does not depend on which plan students subscribed to.
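The paper only states that "a nonparametric analysis" was used to compare the two charge plans. A Mann-Whitney U test is one plausible choice; the sketch below runs it on invented toy ratings (not the survey data) after printing a cross-tabulation of the kind shown in Fig. 7.

import pandas as pd
from scipy import stats

# Invented stand-in ratings (1-5) of concern about the communication charge.
df = pd.DataFrame({
    "plan":    ["flat-rate"] * 6 + ["pay-as-you-go"] * 6,
    "concern": [1, 2, 2, 3, 1, 2, 4, 5, 3, 5, 4, 5],
})

print(pd.crosstab(df["plan"], df["concern"]))        # cross-tabulation as in Fig. 7

flat = df.loc[df["plan"] == "flat-rate", "concern"]
payg = df.loc[df["plan"] == "pay-as-you-go", "concern"]
u, p = stats.mannwhitneyu(flat, payg, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.4f}")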
Fig. 6. The result of this questionnaire
Fig. 7. The cross-tabulation graph about “Charge plan” and “Students’ comments about the communication charge”
Figure 8 shows the total evaluation value of this system. The figure shows that the system was highly evaluated; the average value is 8.3 (S.D. = 1.4). However, a few students did not evaluate it as "good". To understand what differs between students who evaluated it highly (average of 9 points or more) and those who evaluated it lower (average of 8 points or less), we used the Kruskal-Wallis test. Table 2 shows this result.
It appears that the students who rated the system highly felt better about three points: "Could you remember what you learned at the end of the lecture?", "Could this method shorten the time for checking attendance?" and "Did you think that your comments had much effect on the lecture?"
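For a concrete picture of this group comparison, the sketch below applies the Kruskal-Wallis test mentioned in the text to invented example scores (the real item ratings are not reproduced here); the item name is a hypothetical stand-in for one questionnaire item.

import pandas as pd
from scipy import stats

# Invented example scores: overall rating (0-10) and one questionnaire item (1-5).
df = pd.DataFrame({
    "overall":          [9, 10, 9, 8, 7, 8, 9, 10, 6, 7, 9, 8],
    "shorten_checking": [5,  5, 4, 4, 3, 4, 5,  5, 3, 3, 5, 4],
})
high = df.loc[df["overall"] >= 9, "shorten_checking"]
low = df.loc[df["overall"] <= 8, "shorten_checking"]

h, p = stats.kruskal(high, low)    # compares the two rating groups on this item
print(f"H = {h:.2f}, p = {p:.4f}")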
Fig. 8. Total evaluation for this system

Table 2. Comparison by t-test

Item                                                              Average (≦8 points)  Average (≧9 points)  t value  significance probability (p)
Did you care about the communication charge?                      4.0                  4.2                  0.949    n.s.
Are you a heavy user of a mobile phone?                           3.4                  3.5                  0.430    n.s.
Is using the mobile phone troublesome?                            3.2                  3.2                  0.036    n.s.
Was the function of confirming attendance useful?                 3.9                  4.6                  1.615    n.s.
Was the function of confirming test points useful?                4.3                  4.5                  1.038    n.s.
Could you remember what you learned at the end of the lecture?    3.9                  4.6                  3.948    ***
Could this method shorten the time for checking attendance?       3.9                  5.0                  7.901    ***
Did you think that your comments had much effect on the lecture?  4.3                  4.7                  3.107    **
significance probability: p<0.001 ***, 0.001≦p<0.01 **, 0.01≦p<0.05 *, 0.05≦p n.s.
In order to consider the evaluation of this system, we carried out a multiple regression analysis of the evaluation point of this system, using each question. We calculated the standard partial regression coefficient (β). Table 3 shows this result. “Advantage for usefulness to check attendance” and “Remember what you learned at the end of lecture” related to evaluation point of this system. This means literacy with mobile phones did not affect the system evaluation. Table 4 shows an average of each factor of the ARCS model which students answered. Each evaluation answer was chosen from “1: disagree strongly” to “7: agree strongly”. This table shows students highly evaluated “Attention” and “Confidence”. Finally, we analyzed the relation between each function of this system and each factor of the ARCS model. We used a multiple regression analysis method to do this. Table 5 shows this result.
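Standardized partial regression coefficients (β) can be obtained by fitting an ordinary least-squares model on z-scored variables. The sketch below shows this on invented stand-in data (not the survey responses); the column names are hypothetical labels for the two explanatory items.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Invented stand-in data for two explanatory items and the overall evaluation.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "attendance_useful":  rng.integers(1, 6, 60).astype(float),
    "comments_reflected": rng.integers(1, 6, 60).astype(float),
})
df["overall"] = 0.5 * df["attendance_useful"] + 0.3 * df["comments_reflected"] + rng.normal(0, 1, 60)

z = (df - df.mean()) / df.std(ddof=0)              # z-scoring makes the coefficients standardized betas
X = sm.add_constant(z[["attendance_useful", "comments_reflected"]])
fit = sm.OLS(z["overall"], X).fit()
print(fit.params)        # standardized partial regression coefficients (beta)
print(fit.rsquared)      # analogous to the reported contribution rate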
Table 3. Multiple regression analysis of the evaluation point of this system

Explanatory variable                               Standard partial regression coefficient (β)
Advantage for usefulness to confirm attendance     0.503 ***
Reflection of comments to improve lecture          0.257 **
Contribution rate: 0.39 ***
significance probability: p<0.001 ***, 0.001≦p<0.01 **, 0.01≦p<0.05 *
Table 4. Average of each factor of the ARCS model

Factor         Average   Standard deviation
Attention      5.6       1.6
Relevance      4.6       1.6
Confidence     5.1       1.2
Satisfaction   4.4       1.4
Table 5. Multiple regression analysis toward each function

Function                           Contribution rate   Explanatory variable   Standard partial regression coefficient (β)
Anonymous Comments                 0.84 ***            Attention              0.77 ***
                                                       Relevance              0.08 n.s.
                                                       Confidence             0.35 **
                                                       Satisfaction           0.06 n.s.
Visual Presenter                   0.31 ***            Attention              0.89 ***
                                                       Relevance              0.33 **
                                                       Confidence             0.11 n.s.
                                                       Satisfaction           0.14 n.s.
Learning Management System (LMS)   0.45 ***            Attention              -0.03 n.s.
                                                       Relevance              0.12 n.s.
                                                       Confidence             -0.26 n.s.
                                                       Satisfaction           0.43 **
significance probability: p<0.001 ***, 0.001≦p<0.01 **, 0.01≦p<0.05 *
The function of “Anonymous Comment” related to A (Attention) and C (Confidence). A (Attention) was valued 2.2 times as much as C (Confidence) was valued. The function of “Visual Presenter” related to A (Attention) and R (Relevance). A (Attention) was valued 2.7 times as much as R (Relevance) was valued. The
function of "Real-time Question" did not seem to relate to the ARCS model directly. Finally, the function of "Mobile-LMS" related to S (Satisfaction). From these results, we conclude that our system included all the ARCS factors that affect student motivation.
5 Conclusion
We have tried to use a mobile phone as a teaching and learning tool in class. A mobile phone is a familiar device for Japanese students: most of them have one and use it every day to write e-mails, browse web pages, take pictures and make calls. This study showed statistically how our method affected students' motivation. As a result, we found that the type of students' communication-charge plan did not affect the evaluation of this method. The most effective factors were "the advantage of shortening attendance checking" and "remembering what was learned at the end of the lecture", which means that students felt these two factors were valuable. Moreover, we showed the relationship between our method and the motivational model. According to our results, it is useful for a teacher to point out two things: this method makes better use of lecture time by shortening attendance checking, and students' comments, not only the lecture itself, affect the contents of the lecture. The "Attention" factor of the ARCS model relates to the "Anonymous Comment" and "Visual Presenter" functions, so it is better for a teacher to draw on these functions at the beginning of each lecture, for example to confirm students' knowledge. The "Confidence" factor was affected by the "Anonymous Comment" function: by viewing many comments, students learn other ideas and can view their own comments more objectively. A previous study reported that Japanese students are averse to making mistakes in front of many people [7] and feel uncomfortable speaking up; if there are students who are afraid to speak, a teacher can use the "Anonymous Comment" function. The "Relevance" factor had a great effect on the "Visual Presenter"; a writing task may be a good chance to recall matters relevant to the lecture. The "Satisfaction" factor had an effect on the "LMS (Learning Management System)"; it is a waste for precious time to be spent checking attendance, and it is useful for students to easily confirm how often they attended and their test scores. Obviously, mobile phones can be used in every classroom without special equipment. Not only traditional paper and pencil but also the mobile phone should be regarded as a school tool. We expect that this method will establish a new form of interaction.
References 1. Ministry of Internal Affairs and Communications: Information and Communications in Japan (2008), http://www.soumu.go.jp/s-news/2008/pdf/080418_4_bt.pdf 2. Ando, A., Abiko, H., Kinefuti, M.: Enhancing Interaction in School Hours by Using Cellular Phones. International Ergonomics Association , XVth Triennial Congress, vol. 3 (2003)
3. Cabinet Office Director-General for Policy Planning: 5th Research report of consciousness about information society and young people, http://www8.cao.go.jp/youth/kenkyu/jouhou5/2-1-3.html#2-1-3-1 4. Ando, A., Morimoto, K.: A New Method for Teachers and Students to Record Daily Progress in a Class. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007, Part II. LNCS, vol. 4558, pp. 245–251. Springer, Heidelberg (2007) 5. Tamura, H., Choui, M., Ueniinai, S.: Introduction of Cellular Networking into University Classes. In: Proc. of Symposium on Mobile Interactions and Navigation, pp. 99–104 (2003) 6. Miyata, H.: Development and Evaluation of Web-based Photo Database System to Support Knowledge Sharing in Large-scale Lecture Classes Utilizing a Cell Phone. Japan Journal of Educational Technology 31 (suppl. 20080210), 173–176 7. Ando, A., Abiko, H., Yamada, T.: A Statistical Evaluation of Participants’ Awareness of Using e-mail with Cellular-phone during a University Class. In: International Association of Societies of Design Research. Abstracts of International Design Congress (2005)
Sensation Seeking, Self Forgetfulness, and Computer Game Enjoyment

Xiaowen Fang¹ and Fan Zhao²

¹ School of Computing, College of Computing and Digital Media, DePaul University, USA
[email protected]
² Lutgert College of Business, Florida Gulf Coast University, USA
[email protected]
Abstract. This paper investigates the relationship between enjoyment of computer game play and two personality traits (sensation seeking and self-forgetfulness). Hypotheses were proposed based on a review of computer game enjoyment, game characteristics, personality theories, and effects of computer game play. A survey was conducted in two US universities. Results and implications are discussed. Keywords: Sensation seeking, self forgetfulness, personality, computer game, enjoyment.
1 Introduction The majority of prior psychological game research has focused on two specific areas of investigation: (a) the effects of excessive playing on children and adolescents and (b) whether or not playing (violent or nonviolent) video games makes children and adolescents more violent ([1], [2], and [3]). However, computer game play has become a prominent form of entertainment and a comprehensive framework for examining the interaction between player characteristics and game features is needed for a better understanding of the process of game play and its impacts on users. As a first step towards building such a comprehensive framework, this research attempts to investigate the impact of two personality traits (sensation seeking and self forgetfulness) on enjoyment of computer game play. The following sections discuss prior research on computer game enjoyment and personality, theoretical framework, method, and results.
2 Background Literature Prior research in the following allied fields was examined: enjoyment of computer game play and personality. 2.1 Computer Game Play and Enjoyment In several comprehensive studies, Sherry and his colleagues [4] have enumerated a set of factors of video game uses related to gratifications. Their studies used focus group M.J. Smith and G. Salvendy (Eds.): Human Interface, Part II, HCII 2009, LNCS 5618, pp. 632–641, 2009. © Springer-Verlag Berlin Heidelberg 2009
research and surveys of over 1,000 participants ranging in age from 10 to 24 years old. They have identified that five factors -- competition, challenge, social interaction, diversion, and fantasy – could lead to a sense of gratification. Grodal [5] explains that much of the fascination with video games can be attributed to the ability of players to control the game in terms of outcomes (i.e., deciding how the "plot" will unfold), the speed at which the game progresses, and mastery of the game or mastery over other players. Vorderer, Hartmann, and Klimmt [6] have provided support for the idea that game play is more enjoyable when there are a variety of ways to solve a challenge offered in a video game. Agarwal and Karahanna [7] propose a model of deep involvement with software. They analyze user intentions to use IT technology and emphasize the cognitive determinants. Hoffman and Novak [8] present a model of flow in computer–mediated environments. The flow model involves “positive affect,” “exploratory behaviors,” and “challenge/arousal,” which could be considered as elements of enjoyment. A stream of recent studies has used the flow model to interpret and understand user experience during game play [9], [10], [11], [12]. Flow is widely considered to have eight elements: concentration, challenge, skills, control, clear goals, feedback, immersion, and social interaction. Fang, Chan, Brzezsinksi, and Nair [13] develop an instrument to measure enjoyment of computer game play based on tripartite media enjoyment model ([14]). This instrument measures three types of reactions during computer game play: cognitive, behavioral, and affective. 2.2 Personality Personality can be defined as a stable set of tendencies and characteristics that determine the commonalities and differences in people’s psychological behavior (thoughts, feelings and actions) that have continuity in time [15]. Over the years, the five-factor model (e.g. [16], [17], [18], [19], and [20]) has gained acceptance among researchers because it establishes a common taxonomy [21]. It contains the following five dimensions of personality: Extraversion - outgoing and stimulation-oriented vs. quiet and stimulation-avoiding; Neuroticism - emotionally reactive, prone to negative emotions vs. calm, imperturbable, optimistic; Agreeableness - affable, friendly, conciliatory vs. aggressive, dominant, disagreeable; Conscientiousness - dutiful, planful, and orderly vs. laidback, spontaneous, and unreliable; Openness to experience - open to new ideas and change vs. traditional and oriented toward routine. In an alternative five-factor model, Zuckman and his colleagues [22] add Impulsive Unsocialized Sensation Seeking, Aggression-Hostility, and Activity, to Sociability and Neuroticism-Anxiety. Sensation seeking is a personality trait defined by the need for varied, novel, and complex sensations and experiences and the willingness to take physical and social risks for the sake of such experience [23]. Cloninger, Przybeck, and Svrakic [24] describe a psychobiological model of the structure and development of personality that account for dimensions of both temperament and character. There are three character dimensions in this model: selfdirectedness, cooperativeness, and self-transcendence. Self-transcendence refers generally identification with everything conceived as essential and consequential parts of a unified whole. The staple of Self-forgetfulness has been described as the same as
experienced transiently by people when they are totally absorbed, intensely concentrated, and fascinated by one thing. In such one-pointed concentration, people may forget where they are and lose all sense of the passage of time.
3 Theoretical Framework Some researchers [25] have suggested that two personality traits, sensation seeking and self-forgetfulness, may lead to higher engagement in computer game play. Sensation seeking is a personality trait defined by the need for varied, novel, and complex sensations and experiences and the willingness to take physical and social risks for the sake of such experience [23]. Computer games are designed to offer thrills and excitement. It is likely that highly sensation seeking players may find a computer game more entertaining. Therefore, it is hypothesized that: H1. Sensation seeking is positively related to enjoyment of computer game play. Self-forgetfulness has been described as the same as experienced transiently by people when they are totally absorbed, intensely concentrated, and fascinated by one thing [24]. In such one-pointed concentration people may forget where they are and lose all sense of the passage of time. Given these characteristics, highly self-forgetful individuals would be expected to experience higher presence when playing computer games and thus perceive higher level of enjoyment. H2. Self-forgetfulness is positively related to enjoyment of computer game play.
4 Method A survey was conducted in two U.S. universities to investigate the relationships between enjoyment of computer game play and the two personality traits: sensation seeking and self-forgetfulness. In total, 173 students responded to the survey. Table 1 shows the demographic information of the participants. There are two sections in the questionnaire. Section 1 contains 21 items about participant’s demographic information and personality traits. Responses from the first 6 items were summarized in Table 1. Sensation seeking trait was measured by 12 items (e.g., “I sometimes like to do things that are a little frightening.”) taken from the sensation seeking scale introduced by Zuckerman [23]. Self-forgetfulness trait was measured by 3 items taken from the Temperament and Character Inventory [24]. All questions about personality traits were rated on a 7-point scale, ranging from 1 (strongly disagree) to 7(strongly agree). Section 2 of the survey questionnaire contains questions about enjoyment of computer game play. Enjoyment was measured by 15 items from the computer game enjoyment scale developed by Fang et al. [13]. This instrument has 3 subscales: affect, behavior, and cognition. A printed questionnaire was handed to students in classes, in a gaming lab, and in other public places on campus. The survey could be completed on the spot or completed at the participant’s leisure time and returned to the research team via campus mail. A gift certificate was provided as an incentive for participation. All the responses were kept anonymous. Participants were required to answer all the 21 questions in Section 1. In Section 2, participants were asked to rate their enjoyment of
Table 1. Demographic Information of Participants

Variable                                                                      Value
Gender: Male (%) / Female (%)                                                 59.5 / 40.5
Age: Mean / Std.                                                              22.8 / 5.02
How long have you been playing computer/video games? Mean (years) / Std.     12.1 / 5.95
How many hours on average do you play? Mean / Std.                           2.0 / 2.02
How often do you play? Daily / Weekly / Monthly / Seldom (%)                  8.1 / 25.4 / 30.1 / 35.3
On average, how many hours do you play in each week? Mean / Std.              5.8 / 8.27
five categories of games. The five categories of computer games (Action/ Adventure/ Shooting/Fighting, Role playing, Sport Games/Racing, Family Entertainment/ Simulation, and Strategy) were primarily derived from the classification scheme adopted by the Entertainment Software Association (http://www.theesa.com). For each category of games, some sample games were listed. Participants were instructed to choose a game they played most frequently and answer questions about their experience with it. If they had never played any game in a particular category, they might skip this category. They were not allowed to skip any questions though once they started to evaluate a game category.
5 Results 5.1 Data Analysis Procedure For responses to each of the five categories of games, the following data analysis procedure was applied. A factor analysis was conducted to establish the discriminant and construct validity. Only items highly loaded (loadings > 0.5) on one of the following constructs were retained in the analysis: sensation seeking, self-forgetfulness, affect, behavior, and cognition. In some cases, a whole construct might be excluded from the analysis if none of its items converged together. Complex items loaded on multiple constructs were also excluded from the analysis. Subsequently, reliability analysis was performed. Cronbach’s Alpha values were calculated to check the internal consistency of the items. Only constructs with a Alpha value of greater than 0.7 were retained and used for further analysis. Finally, a correlation matrix was computed and linear regression was conducted to explore the relationships between enjoyment of computer game play and the two personality traits: sensation seeking and self-forgetfulness. In the following 5 subsections, the data analysis results for each of the five categories of games will be presented and discussed in detail.
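To illustrate the reliability step of this procedure, the following Python sketch computes Cronbach's alpha for a hypothetical three-item construct and applies the retention rule stated above (alpha greater than 0.7). The item ratings are invented stand-ins, not the survey data, and the factor-analysis step that precedes this in the paper is omitted for brevity.

import numpy as np
import pandas as pd


def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a block of item columns (internal consistency)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)


# Invented 1-7 ratings for a three-item construct, standing in for the survey responses.
rng = np.random.default_rng(1)
core = rng.integers(1, 8, 80)
construct = pd.DataFrame({
    "item_1": core,
    "item_2": np.clip(core + rng.integers(-1, 2, 80), 1, 7),
    "item_3": np.clip(core + rng.integers(-1, 2, 80), 1, 7),
})

alpha = cronbach_alpha(construct)
print(f"alpha = {alpha:.2f}")
if alpha <= 0.7:
    print("construct dropped from further analysis")   # the retention rule used in the paper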
5.2 Action/Adventure/Shooting/Fighting Games 154 participants evaluated this category. Constructs affect, behavior, cognition, and sensation seeking passed both the factor and reliability analyses. Self-forgetfulness was excluded due to low reliability of the items (Cronbach’s Alpha value=0.417). Tables 2 and 3 present the correlation matrix and the regression analysis results respectively. Table 2. Matrix of Action/Adventure/Shooting/Fighting Games
                    Affect   Sensation Seeking   Behavior        Cognition
Affect              1        .080 (.325)         .218** (.007)   .177* (.028)
Sensation Seeking            1                   .256** (.001)   -.055 (.501)
Behavior                                         1               -.103 (.205)
Cognition                                                        1
(Pearson correlations; two-tailed significance in parentheses; * p<0.05, ** p<0.01)

Table 3. Regression Analysis of Action/Adventure/Shooting/Fighting Games

Model: Behavior = Sensation Seeking + Errors   R-Square = 0.059   Beta = 0.256   T-Value = 3.267   P Value = 0.001
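The quantities reported in Table 3 (and in the analogous Tables 5, 7 and 9) come from simple regressions with a single predictor, where the standardized beta coincides with the Pearson correlation. The sketch below reproduces these quantities on invented stand-in scores, not the actual survey data.

import numpy as np
from scipy import stats

# Invented scores standing in for sensation seeking and the behavior subscale (n = 154).
rng = np.random.default_rng(2)
sensation = rng.normal(4.0, 1.0, 154)
behavior = 0.25 * sensation + rng.normal(0, 1.0, 154)

res = stats.linregress(sensation, behavior)
r_square = res.rvalue ** 2            # variance explained (the reported R-Square)
beta = res.rvalue                     # with one standardized predictor, beta equals Pearson's r
t_value = res.slope / res.stderr      # t statistic for the slope
print(f"R^2 = {r_square:.3f}  beta = {beta:.3f}  t = {t_value:.2f}  p = {res.pvalue:.4f}")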
Both the correlation and regression analyses suggest that sensation seeking is positively related to and has a significant effect on behavior (correlation coefficient=0.256, p-value=0.001; r-square=0.059, p-value=0.001). This result implies that a higher sensation seeking personality leads to more engagement in computer game play. Because behavior is a sub-factor of enjoyment of computer game play ([13]), indications of higher engagement suggest higher enjoyment. Therefore, hypothesis 1 is supported for action/adventure/shooting/fighting games. Because self-forgetfulness construct was excluded from the analysis due to low reliability, it is impossible to assess its relationship with enjoyment. In the future study, more items on this construct may improve its reliability and make it possible to examine its effect on enjoyment. 5.3 Role Playing Games 76 participants evaluated this category. Constructs affect, behavior, sensation seeking, and self-forgetfulness passed both the factor and reliability analyses. Cognition was excluded because its items didn’t converge in the factor analysis. Tables 4 and 5 present the correlation matrix and the regression analysis results respectively.
Table 4. Correlation Matrix of Role Playing Games

                    Affect   Behavior       Sensation Seeking   Self-Forgetfulness
Affect              1        -.003 (.977)   -.037 (.750)        -.068 (.562)
Behavior                     1              .258* (.025)        .300** (.009)
Sensation Seeking                           1                   .363** (.001)
Self-Forgetfulness                                              1
(Pearson correlations; two-tailed significance in parentheses; * p<0.05, ** p<0.01)

Table 5. Regression Analysis of Role Playing Games

Model: Behavior = Self-Forgetfulness + Errors   R-Square = 0.077   Beta = 0.300   T-Value = 2.702   P Value = 0.009
The correlation analysis suggests that behavior is significantly correlated to both sensation seeking (correlation coefficient=0.258, p-value=0.025) and self-forgetfulness (correlation coefficient=0.300, p-value=0.009). A further regression analysis (rsquare=0.077, p-value=0.009) shows that self-forgetfulness has a bigger effect than sensation seeking and sensation seeking didn’t enter the model. These results support both hypotheses 1 and 2. They also suggest that a higher sensation seeking personality results in higher engagement in computer game play and a higher self-forgetfulness personality also leads to higher engagement in computer game play. The higher the engagement, the higher the enjoyment [13]. 5.4 Sport and Racing Games 125 participants evaluated this category. Constructs affect, behavior, and sensation seeking passed both the factor and reliability analyses. Both cognition and selfforgetfulness were excluded due to low reliability of the items (Cognition: Cronbach’s Alpha value=0.597; Self-forgetfulness: Cronbach’s Alpha value=0.493). Tables 6 and 7 present the correlation matrix and the regression analysis results respectively. Both the correlation and regression analyses show the significant relationship between sensation seeking and behavior (correlation coefficient=0.276, p-value=0.002; r-square=0.076, p-value=0.002). This result suggests that a higher sensation seeking personality is associated with higher engagement in computer game play. Higher engagement implies higher enjoyment [13]. Therefore, hypothesis 1 is supported. Because self-forgetfulness construct was excluded from the analysis due to low reliability, it is impossible to assess its relationship with enjoyment. In the future study, more items on this construct may improve its reliability and make it possible to examine its effect on enjoyment.
Table 6. Correlation Matrix of Sport and Racing Games

                    Affect   Sensation Seeking   Behavior
Affect              1        .067 (.457)         .354** (.000)
Sensation Seeking            1                   .276** (.002)
Behavior                                         1
(Pearson correlations; two-tailed significance in parentheses; ** p<0.01)

Table 7. Regression Analysis of Sport and Racing Games

Model: Behavior = Sensation Seeking + Errors   R-Square = 0.076   Beta = 0.276   T-Value = 3.182   P Value = 0.002
5.5 Family Entertainment/Simulation Games 136 participants evaluated this category. Constructs affect, behavior, cognition, and sensation seeking passed both the factor and reliability analyses. Self-forgetfulness was excluded due to low reliability of the items (Cronbach’s Alpha value=0.426). Tables 8 and 9 present the correlation matrix and the regression analysis results respectively. Both the correlation and regression analyses indicate that sensation seeking has a significant effect on cognition (correlation coefficient=0.171, p-value=0.047; r-square=0.022, p-value=0.047). This result suggests that a high sensation seeking personality is linked to a higher cognition value in computer game play. Because cognition is a sub-factor of enjoyment of computer game play [13], a higher cognition value implies higher enjoyment. Therefore, sensation seeking is positively related to enjoyment of computer game play and hypothesis 1 is supported. Table 8. Correlation Matrix of Family Entertainment and Simulation Games Affect Affect
                    Affect   Behavior       Cognition       Sensation Seeking
Affect              1        .209* (.015)   .492** (.000)   .044 (.610)
Behavior                     1              .114 (.188)     .150 (.081)
Cognition                                   1               .171* (.047)
Sensation Seeking                                           1
(Pearson correlations; two-tailed significance in parentheses; * p<0.05, ** p<0.01)

Table 9. Regression Analysis of Family Entertainment and Simulation Games

Model: Cognition = Sensation Seeking + Errors   R-Square = 0.022   Beta = 0.171   T-Value = 2.008   P Value = 0.047
It is interesting to observe that sensation seeking influences enjoyment through cognition for family entertainment/simulation games, instead of through behavior as for most of the other game types. This may suggest that one primary reason for enjoying this type of game is the perceived value of game characters' actions. Because the self-forgetfulness construct was excluded from the analysis due to low reliability, it is impossible to assess its relationship with enjoyment. In a future study, more items on this construct may improve its reliability and make it possible to examine its effect on enjoyment.

Table 10. Correlation Matrix of Strategy Games

                    Affect   Behavior      Sensation Seeking
Affect              1        .263 (.052)   .080 (.560)
Behavior                     1             .153 (.265)
Sensation Seeking                          1
(Pearson correlations; two-tailed significance in parentheses)
5.6 Strategy Games 55 participants evaluated this category. Constructs affect, behavior, and sensation seeking passed both the factor and reliability analyses. Cognition was excluded because its items didn’t converge in the factor analysis. Self-forgetfulness was excluded due to low reliability of the items (Cronbach’s Alpha value=0.389). Table 10 presents the correlation matrix. No significant correlation was found in the analysis. Because the sample size (55) is much smaller than other types of games, it probably doesn’t have enough statistic power to test the hypotheses. In the future study, a larger sample should help boost the power and make it possible to test the hypotheses.
6 Conclusion This paper reports a survey that the authors conducted to investigate the impact of two personality traits (sensation seeking and self forgetfulness) on enjoyment of computer game play. Major findings from this survey include: 1) Sensation seeking has a significant and positive effect on enjoyment of computer game play through enhanced engagement during game play for action/ adventure/shooting/fighting, role playing, and sport/racing games. 2) Sensation seeking has a significant and positive effect on enjoyment of computer game play through enhanced cognition values for family entertainment/simulation games. 3) Self-forgetfulness has a significant and positive
effect on enjoyment of computer game play through enhanced engagement during game play for role playing games. Previous research has primarily focused on impact of exposure to violent video games on game player’s behavior ([1], [2], and [3]). There have been few studies investigating how personality is linked to computer game enjoyment. This study attempts to start the process of filling this gap. It shows that personality traits can be linked to computer game enjoyment and the type of games plays an important role. Our endeavor probably raises more questions than painting a complete picture but we definitely see strong research potential. Future research is called on expanding the personality traits associated with computer game play and improving the measurement of both personality traits and computer game enjoyment.
References 1. Anderson, C.A., Bushman, B.J.: Effects of violent video games on aggressive behavior, aggressive cognition, aggressive affect, physiological arousal, and pro-social behavior: a meta-analytic review of the scientific literature. Psychological Science 12(5), 353–359 (2001) 2. Anderson, C.A., Dill, K.E.: Video games and aggressive thoughts, feelings, and behavior in the laboratory and in life. Journal of Personality and Social Psychology 78(4), 772–790 (2000) 3. Uhlmann, E., Swanson, J.: Exposure to violent video games increases automatic aggressiveness. Journal of Adolescence 27, 41–52 (2004) 4. Sherry, J.L., Desouza, R., Greenberg, B., Lachlan, K.: Relationship between developmental stages and video game uses and gratifications, game preference, and amount of time spent in play. Paper presented at the International Communication Association annual conference, San Diego, CA (2003) 5. Grodal, T.: Video games and the pleasure of control. In: Zillmann, D., Vorderer, P. (eds.) Media entertainment: The psychology of its appeal, pp. 197–213. Erlbaum, Mahwah (2000) 6. Vorderer, P., Hartmann, T., Klimmt, C.: Explaining the enjoyment of playing video games: The role of competition. In: Proceedings of the Second International Conference on Entertainment Computing. Carnegie Mellon University, Pittsburgh (2003) 7. Agarwal, R., Karahana, E.: Time flies when you’re having fun: cognitive absorption and beliefs about information technology usage. MIS Quarterly 24(4), 665–694 (2000) 8. Hoffman, D., Novak, T.: Marketing in Hypermedia Computer – Mediated Environments: Conceptual Foundations. Journal of Marketing 60(3), 50–68 (1996) 9. Aderud, J.: Flow and Playing Computer Mediated Games – Conceptualization and Methods for Data Collection. In: Proceedings of the Eleventh Americas Conference on Information Systems, pp. 472–478 (2005) 10. Aderud, J.: Measuring Flow While Playing Computer Mediated Games: A Pilot Study. In: Proceedings of the Twelfth Americas Conference on Information Systems, pp. 611–620 (2006) 11. Sweetser, P., Wyeth, P.: GameFlow: A Model for Evaluating Player Enjoyment in Games. ACM Computers in Entertainment 3(3), 1–24 (2005)
12. Wan, C.S., Chiou, W.B.: Psychological Motives and Online Games Addiction: ATest of Flow Theory and Humanistic Needs Theory for Taiwanese Adolescents. Cyberpsychology & Behavior 9(3), 317–324 (2006) 13. Fang, X., Chan, S., Brzezinski, J., Nair, C.: Measuring enjoyment of computer game play. In: Proceedings of the Fourteenth Americas Conference on Information Systems (AMCIS 2008) (2008) 14. Nabi, R.L., Krcmar, M.: Conceptualizing media enjoyment as attitude: implications for mass media effects research. Communication Theory 14(4), 288–310 (2004) 15. Maddi, S.R.: Personality theories: a comparative analysis, 5th edn. Dorsey Press, Chicago (1989) 16. Digman, J., Takemoto-Chock, N.: Factors In The Natural Language Of Personality: ReAnalysis, Comparison, And Interpretation Of Six Major Studies. Multivariate Behavioral Research 16(2), 149 (2008) (Academic Search Premier database (1981)) (retrieved October 11, 2008) 17. Peabody, D., Goldberg, L.: Some determinants of factor structures from personality-trait descriptors. Journal of Personality and Social Psychology 57(3), 552–567 (1989) (retrieved October 11, 2008), doi:10.1037/0022-3514.57.3.552 18. Thurstone, L.L.: The vectors of the mind. Psychological Review 41, 1–32 (1934) 19. Tupes, E.C., Christal, R.E.: Recurrent personality factors based on trait ratings. USAF ASD Tech. Rep. No. 61-97, Lackland Airforce Base, TX: U. S. Air Force (1961) 20. McCrae, R., Costa, P.: Comparison of EPI and Psychoticism scales with measures of the five factor model of personality. Personality and Individual Differences 6, 587–597 (1985) 21. Goldberg, L.R.: An alternative ’description of personality’: The Big-Five factor structure. Journal of Personality and Social Psychology 59(6), 1216–1229 (1990) 22. Zuckerman, M., Kuhlman, D.M., Thornquist, M., Kiers, H.: Five (or three): Robust questionnaire scale factors of personality without culture. Personality and Individual Differences 12, 929–941 (1991) 23. Zuckerman, M.: Sensation Seeking: Beyond the Optimal Level of Arousal. Lawrence Erlbaum Associates, Inc., Hillsdale (1979) 24. Cloninger, C.R., Przybeck, T.R., Svrakic, D.M.: A psychobiological model of temperament and character. Archives of General Psychiatry 50, 975–990 (1993) 25. Ravaja, N., Salminen, M., Holopainen, J., Saari, T., Laarni, J., Järvinen, A.: Emotional Response Patterns and Sense of Presence during Video Games: Potential Criterion Variables for Game Design. In: Proceedings of the third Nordic conference on Humancomputer interaction, pp. 339–347. ACM, New York (2004)
Development of an Annotation-Based Classroom Activities Support Environment Using Digital Appliance, Mobile Device and PC

Yoshiaki Hada and Masanori Shinohara

National Institute of Multimedia Education, Research & Development Department, 2-12 Wakaba, Mihama, Chiba 261-0014, Japan
{hada,shino}@nime.ac.jp
Abstract. An annotation-based classroom activities support environment with home network technology is proposed. The environment includes digital appliances, mobile devices, and PCs to facilitate a face-to-face class. The resulting system enables the sharing and playing of learning contents among the teacher and students in the class. The teacher and students can annotate the learning contents without file operation to share. In addition, the system operates devices on a network to support a class and has a user management feature. The system design and the prototype under development are described. Keywords: ubiquitous learning, home networking, pen-based interface, blended e-learning, face-to-face class.
1 Introduction Information and communication technology (ICT) is rapidly progressing, and many educational researchers want to apply these various technologies in the learning space [1, 2]. In the learning environment, small-scale electronic equipment, such as mobile phones or handheld gaming devices, are used. The devices are taken outdoors or to facilities for learning, such as museums. However, with the downsizing of computer terminals, most ICT research focuses on ubiquitous computing for daily life and the everyday environment [3]. There have been few studies on achieving a ubiquitous learning environment. With the progress of computer technology, digital appliances with network functions, such as VCRs, TVs, and digital cameras, have appeared. The network function of most digital appliances is implemented in relation to the Digital Living Network Alliance (DLNA) specification [4]. Digital appliances are not only easier to operate than PCs but also enable the user to collect, accumulate, and browse learning contents more easily than with a PC. For a learning system, the network function can connect several services and contents on PCs, mobile devices, and digital appliances. For example, by using the network function, a teacher can output contents recorded on a VCR through a projector or TV in a classroom by using a laptop in the M.J. Smith and G. Salvendy (Eds.): Human Interface, Part II, HCII 2009, LNCS 5618, pp. 642–649, 2009. © Springer-Verlag Berlin Heidelberg 2009
classroom. Moreover, students can use the service and the contents by using the network functions of their own devices. We propose an annotation-based class activities support environment with home network technology. In the environment, users can use digital appliances, PCs, and mobile devices seamlessly through the home network specification. The environment uses the functions of digital appliances, e.g., collecting, recording, browsing, sending, and receiving of contents, via a network. In addition, for teachers and learners to handwrite comments and to share learning contents in real time, we developed a prototype on a PC. In designing the prototype, we targeted PCs, digital appliances, and mobile devices. Therefore, we considered their compatibilities in trying to develop unique learning support functions. In addition, to consider ease of use, a pen-based userinterface was used to support learning activities, e.g., operating multimedia contents and making presentations in class.
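As a highly simplified sketch of the share-and-render flow on which the proposed environment relies, the following Python code models a storage device that publishes lesson contents and a display device that renders them. This is not actual DLNA/UPnP code; the class names, the lesson-based grouping, and the URLs are illustrative assumptions only.

from dataclasses import dataclass


@dataclass
class Content:
    title: str
    media_type: str       # "html" | "picture" | "movie"
    url: str              # location on the classroom network


class MediaServer:
    """Stands in for a DLNA-style storage device that lists its contents on the network."""

    def __init__(self):
        self._lessons = {}                     # lesson id -> list of Content

    def publish(self, lesson: str, item: Content) -> None:
        self._lessons.setdefault(lesson, []).append(item)

    def browse(self, lesson: str) -> list:
        # the prototype groups material by lesson rather than by media type
        return list(self._lessons.get(lesson, []))


class DisplayDevice:
    """Stands in for a renderer (TV, projector PC, mobile device)."""

    def play(self, item: Content) -> None:
        print(f"rendering {item.media_type}: {item.title} ({item.url})")


server = MediaServer()
server.publish("lesson-03", Content("Field photo", "picture", "http://camera.local/img01.jpg"))
for item in server.browse("lesson-03"):
    DisplayDevice().play(item)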
2 Design of Learning Environment 2.1 Use of ICT in Classroom In a face-to-face class, a teacher and students spend time in a classroom. The teacher explains the course content to students by using, for example, a blackboard, projector, or PC. In addition, to obtain student opinions and impressions, a teacher may put questions to the students. To gain a further understanding of the class, the teacher often offers group discussion or similar. To make class activities easy to understand for students, some teachers try to use ICT as a tool, e.g., a projector or PC. Recently, some researchers have been focusing on a learning theory called active learning [5] by theorizing the use of ICT, e.g., mobile phones or PCs, for classroom teaching. Classes have tried to use ICT for active learning [6, 7]. In those classes, the shapes and the disposition of desks were designed to make ICT tools user-friendly. However, only the ICT tools themselves make learning activities easy for students to understand. Thus, we focus on improving an ordinary classroom with home network technology to facilitate the use of ICT tools. Home network technology enables the teacher and students to use ICT tools more comfortably. Therefore, we propose an environment to facilitate using that technology in an ordinary classroom. To adopt a ubiquitous environment in a face-to-face class, we considered a teacher’s actions in a class. We focused on the following actions to design functions to facilitate a face-to-face class. 1) Teacher’s explanation to students: In a class, most teachers use a blackboard or PowerPoint presentation to give explanations. Handouts are often circulated to students in advance and are uploaded on a Web site. When using PowerPoint, teachers may use a laser pointer to highlight something and use the blackboard for a supplementary explanation. 2) Feedback from students: Teachers grasp the understanding of students from student behavior or from quizzes in class. In addition, teachers may incorporate the opinions of students as teaching material in the class.
3) Discussion: To complete the active learning in class, the teacher may set students a task for group discussion. Students may determine a definite topic in class, or the teacher may introduce the topic of group discussion in the class.
4) Procedures for class: Procedures include taking attendance, grouping students for group work, and operating equipment in class.
To support these four actions in a face-to-face class, we designed the following functions: sharing learning contents, annotating learning contents, using a pen-based interface, and managing attendance and groups.
2.2 Sharing Learning Contents
This function enables learning contents to be shared electronically instead of on paper. Terminals that are compatible with the home network, e.g., PCs, mobile devices, and TVs, can handle multimedia contents, e.g., music, pictures, and movies. The interfaces of most home network clients show the contents sorted according to media type. The learning contents in a class, however, consist of learning materials of several media types. Therefore, the prototype handles all learning materials as a unit per class session rather than classifying them by media type. The supported media are HTML documents, pictures, and movies. HTML documents are prepared by using a conversion program on documents created by a word processing tool; HTML documents from Web sites can also be used as learning material. Pictures are prepared by using a conversion program to create picture files from PowerPoint files. Movies are prepared from video clips used to give explanations accompanied by animated visuals.
2.3 Annotation and Sharing
Annotation of learning contents is a function for drawing on the currently displayed learning content. The user can make annotations using not only a mouse but also a pen with a tablet. Annotations can be made on movie files while they are playing, and the playback time at which each annotation was written is recorded. The annotations are shared with other students and the teacher as files via the network. The annotation file contains the annotation history, such as the user, group, color of the annotation, time of the annotation (when made on video media), and so on. The system aims to make the learner's thinking process visible through the annotation file.
2.4 Pen-Based Interface
The interface is designed to be easy to use with a pen on a tablet PC or tablet device. The user can select the learning content to annotate and can annotate the learning contents while they are played. Other devices can be controlled via the network function and infrared remote control through user operations on the screen.
2.5 User Management
Users register before using the system, which allows the system to take attendance in conjunction with the network function or other devices. In addition, the system can manage group work by grouping the students.
Fig. 1. Overview of the home network and learning function
3 System Overview
3.1 Adding Class Support Functions
The mechanism of the home network specification is shown in Fig. 1. The mechanism enables the contents in storage devices to be shown on display devices via the network. Devices are digital appliances, mobile devices, or PCs. Regardless of the location of the storage device, the environment provides contents via the network without complications such as copying the contents. For example, pictures taken by a digital camera can be shared with other devices via the network in this environment. However, the home network is hard to use because the contents are classified by media type, e.g., picture, music, or video, in the home network player. Moreover, functions for facilitating education, e.g., the ability to annotate contents, are not available. Therefore, we added functions to support a face-to-face class, as shown in Fig. 2. The added functions are as follows.
1) Sharing teaching materials: This is a function for sharing contents in the class among the teacher and students. For example, the contents taken by a digital camera can be shared in class in real time.
2) Annotation: This function enables annotations to be made on playing contents. It can record the history of annotations. Students can present the
results of student discussion with annotations. The annotations recorded with the teaching materials can be shared with the teacher and other students. Therefore, not only can students view the annotations of other groups, but the teacher can get feedback from the results.
3) Pen-based interface: The system uses a pen-based interface that tablet PCs or some mobile devices have for easy operation. For example, the user can make annotations by hand and can change the status of devices connected by the network.
4) Managing attendance and grouping: This function takes attendance. It can also manage the grouping of students.
Fig. 2. Development area of proposed system
3.2 System Flow
The system flow is shown in Fig. 3. To use the system, a wireless LAN is set up in the classroom. As needed, a projector, PC, digital camera, and so on are put in place. The flow is as follows.
1) Creating and uploading learning contents: The teacher creates teaching materials for a class and uploads them to the system.
2) Preparing the classroom: The wireless LAN router is placed in the classroom before the class starts. The teacher connects the teacher's PC to the wireless LAN. In addition, the teacher prepares the devices to be used in the class.
3) Starting the teacher's system: The teacher starts his or her system.
4) Connecting student devices to the network: Students connect their devices to the wireless LAN.
5) Using the system in class: The teacher conducts the class using the system.
Fig. 3. System flow
4 Example of Usage
The purpose of the system is to improve the use of learning contents in class and to gain feedback from students. A teacher can use the system as a presentation tool. In addition, to deepen their understanding of the class, students can actually operate the learning contents through annotation and so on. For example, if a teacher explains the motion of a piece of equipment, the teacher may use a video clip. With this system, the teacher can annotate the video clip while it is playing, and the annotation and the video clip can be shared. Therefore, students can review the video together with the annotations. In group work, the system helps organize the students' ideas over a finite period of time, because the students can draw on the HTML documents and share them with the teacher. Therefore, the result can be used as feedback in the class.
5 Implementation
5.1 System Architecture
The prototype was developed using Java on Windows. The system uses CyberLink for Java, the Java Media Framework (JMF), and the Communication API as libraries. The prototype consists of three parts: the system controller, the annotation editor, and the user manager. The controller can get and set device statuses and controls external devices. The editor can make annotations and share learning contents from storage devices on the network. The user manager provides management of user attendance and grouping of students.
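As an illustration of the annotation data described in Section 2.3, the sketch below shows how an annotation entry (user, group, color, playback time, and pen strokes) could be represented and written to a shareable file. The class and field names are our own illustrative assumptions, not the actual prototype code, and the sketch deliberately uses only the standard Java library rather than CyberLink for Java or JMF.

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// One handwritten annotation on a piece of learning content (hypothetical structure).
record Annotation(String userId, String groupId, String color,
                  long mediaTimeMillis,          // playback position for video annotations
                  List<double[]> strokePoints) { // pen trajectory as (x, y) pairs

    // Serialize the entry as one CSV-like line so it can be shared as a file over the network.
    String toLine() {
        StringBuilder sb = new StringBuilder();
        sb.append(userId).append(',').append(groupId).append(',')
          .append(color).append(',').append(mediaTimeMillis);
        for (double[] p : strokePoints) {
            sb.append(',').append(p[0]).append(':').append(p[1]);
        }
        return sb.toString();
    }
}

public class AnnotationFileWriter {
    // Append annotations to a per-content history file.
    static void append(Path historyFile, List<Annotation> annotations) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(
                historyFile, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            for (Annotation a : annotations) {
                out.println(a.toLine());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Annotation a = new Annotation("student07", "groupA", "red", 42_500,
                List.of(new double[]{10.0, 12.5}, new double[]{11.2, 13.0}));
        append(Path.of("lecture01-annotations.csv"), List.of(a));
    }
}

In an actual deployment, such entries would be exchanged through the content-sharing functions of the home network described above rather than as local files.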
5.2 Structure of System
Devices used in the system consist of three parts: (1) an interface for users, (2) an application that provides services to other devices on the network, and (3) a network function (Fig. 4). For example, a TV that does not have storage consists of a content player, display, and controller. A VCR consists of only a content server. The architecture of the annotation editor is shown in Fig. 5. This architecture is part of the architecture in Fig. 4 and adds functions to support a face-to-face class.
Fig. 4. Architecture of ordinary device
Fig. 5. System structure of proposed system
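To make the device decomposition of Fig. 4 concrete, the following sketch models the device roles as interfaces and composes them into device types such as the storage-less TV and the VCR mentioned above. The interface and method names are illustrative assumptions on our part, not the prototype's actual classes.

import java.util.List;

// Role of a device that stores and serves contents (e.g., a VCR or media server).
interface ContentServer {
    List<String> listContentIds();
    byte[] fetchContent(String contentId);
}

// Role of a device that renders contents on a display (e.g., a TV).
interface ContentPlayer {
    void play(String contentId, ContentServer source);
}

// Role of a device that changes the status of other devices over the network.
interface DeviceController {
    void sendCommand(String deviceId, String command);
}

// A TV without storage: it can play and control, but does not serve contents.
class NetworkedTv implements ContentPlayer, DeviceController {
    @Override public void play(String contentId, ContentServer source) {
        byte[] media = source.fetchContent(contentId);
        System.out.println("Rendering " + contentId + " (" + media.length + " bytes)");
    }
    @Override public void sendCommand(String deviceId, String command) {
        System.out.println("Command to " + deviceId + ": " + command);
    }

    public static void main(String[] args) {
        ContentServer vcr = new ContentServer() {
            @Override public List<String> listContentIds() { return List.of("lecture01"); }
            @Override public byte[] fetchContent(String contentId) { return new byte[1024]; }
        };
        new NetworkedTv().play("lecture01", vcr);
    }
}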
5.3 Prototype Interface
A snapshot of the prototype interface is presented in Fig. 6. The system enables the sharing and annotating of learning contents in class. The interface has a content player for annotation and a user management function.
Fig. 6. Snapshot of system interface
6 Concluding Remarks We proposed a system that supports activities in face-to-face classes with home network technology to provide ubiquitous learning. We will next adjust the system design to further develop the prototype. In addition, we will consider evaluating the use of the prototype in an actual face-to-face class. Acknowledgements. This work was supported by Grant-in-Aid for Young Scientists (KAKENHI) (B) (18700669).
References 1. Ogata, H., Matsuka, Y., Bishouty, M., Yano, Y.: LORAMS: Capturing, Sharing and Reusing Experiences by Linking Physical Objects and Videos. In: International Workshop on Pervasive Learning 2007, pp. 34–42 (2007) 2. Hada, Y., Shinohara, M., Shimizu, Y.: K-tai Campus: University-sharing Campus Information System Applicable to Mobile Phone and PC. In: Proceedings of IEEE WMTE 2005, pp. 164–168 (2005) 3. Abowd, G.D., Mynatt, E.D.: Charting Past, Present, and Future Research in Ubiquitous Computing. ACM Transaction on Computer Human Interaction 7(1), 29–58 (2000) 4. DLNA specification, http://www.dlna.org/home/ 5. Johnson, D.W., Johnson, R.T., Smith, K.A.: Active Learning: Cooperation in the college classroom. Interaction Book Company (1998) 6. TEAL, http://icampus.mit.edu/TEAL/ 7. KALS (in Japanese), http://www.kals.c.u-tokyo.ac.jp/
An Empirical Investigation on the Effectiveness of Virtual Learning Environment in Supporting Collaborative Learning: A System Design Perspective Na Liu, Yingqin Zhong, and John Lim School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543 {disln,diszyq,jlim}@nus.edu.sg
Abstract. This study theoretically develops and empirically tests a model that explains how virtual learning environment (VLE) characteristics can ultimately influence learning effectiveness by directly affecting the learner's control over the learning material and the learner's communication and interaction with peers and the instructor. The findings of the empirical study show that system accessibility, ease of navigation, system interactivity, and system support serve different roles in achieving effective learning outcomes. This study also demonstrates the feasibility of incorporating different learning theories into VLE design and shows how they would inform different perspectives of learning. Keywords: Virtual learning environment (VLE), system characteristics, self-controlled learning, communication-based learning.
1 Introduction
Fuelled by the knowledge economy, the importance of education and training to individuals and society has been widely recognized. In 2007, the total US education and training market reached $980.2 billion [1]. With the swift advancement of computer hardware, software, and network technologies, learning is enhanced by various information and communication technology supports. Technology-mediated learning has received increasing attention in the field of information systems research [2]. To better inform the use of information technology in education, researchers have pointed out the necessity of understanding the underlying pedagogical assumptions better [3]. In the educational psychology literature, the constructivist learning model emphasizes the importance of self-controlled learning, and collaborativism highlights the role of communication-oriented learning. However, most research has focused on how to realize a single pedagogical purpose using different technologies, and few studies have looked at how effectively information technologies could facilitate various pedagogical goals at the same time. As a sophisticated form of e-learning, virtual learning environments (VLEs) are computer-based environments allowing interactions and encounters with other learners and providing access to a wide range of resources [4]. In this study, we will look at how the system characteristics of VLE
could help to realize different pedagogical approaches informed by constructivism and collaborativism.
2 Literature Review
2.1 Learning Theories and Pedagogical Approaches
Learning theories are widely used and explored to provide guidance for instructional and learning practice. In this section, two well-cited learning theories are reviewed and related pedagogical approaches are discussed.
Constructivism and Self-controlled Learning. Constructivism is a psychological and philosophical perspective stating that individuals form or construct by themselves much of what they learn and understand [5]. According to constructivism, information is not learning; instead, learning involves processing all the instructional inputs to develop, test, and refine mental models in long-term memory [6]. Therefore, learners' self-control in processing learning material at their own time and pace contributes to learning effectiveness [7]. Self-controlled learning refers to the ability of the learner to exert control over the pace, sequence, and content of instruction in a learning environment [8]. Learning systems are usually equipped with learning control mechanisms to facilitate self-controlled learning, such as instructional designs helping learners make their own decisions concerning the path, flow, or events of instruction [9].
Collaborativism and Communication-based Learning. Collaborativism states that learning emerges through the shared understandings of more than one learner [10], as social interaction stimulates the elaboration of conceptual knowledge [11]. Hence, the corresponding pedagogical approaches include promoting communication and socialization [3]. According to collaborativism, communication among learners is an important component of an effective learning process. This communication process consists of the learner's interaction with peers and the instructor. Learner-instructor interaction has been shown to significantly affect learning in both regular classrooms and online [12]. Learner-learner interaction could provide support for cognitive and affective objectives of learning [12]. In particular, learner-learner interaction supports cognitive learning objectives through its ability to instigate, sustain, and support critical thinking in a group of learners.
2.2 System Characteristics of VLE
The principal components of a VLE include systems that can map a curriculum, track student activity, and provide online student support and electronic communication [13]. VLEs could provide "increased convenience, flexibility, currency of material, student retention, individualized learning and feedback over traditional classrooms" [14]. In the field of IS research, system characteristics are shown to be important in predicting user beliefs and technology acceptance [15, 16], which in turn affect the real usage and the success of the information system [17]. With a comprehensive
review of the literature on both information systems and educational psychology, we selected four characteristics that are considered to be critical for the system functionality of VLE. They are (1) information accessibility, (2) ease of navigation, (3) system interactivity, and (4) system support.
3 Research Model and Hypotheses
Researchers have suggested that technology offers the possibility to accommodate various pedagogical approaches in one integrated system [18]. VLE is believed to offer such flexibility by facilitating both self-controlled learning and communication-based learning. This study will investigate how VLE system characteristics would affect self-controlled learning and communication-based learning, and whether they are helpful in achieving better learning outcomes. The research model is presented in Figure 1 below.
Fig. 1. Research Model: VLE system characteristics (information accessibility, ease of navigation, system interactivity, system support) feed into the pedagogical approach (self-controlled learning, communication-based learning), which in turn leads to the learning outcomes (learning skill development, subject-related learning)
Information Accessibility. Accessibility refers to the ease with which information can be accessed or extracted from the system [19]. Perceived accessibility has been a critical factor affecting information system use and success [20], and the level of information access has been observed to affect the choice and use of an information system [21]. Researchers have noted that information accessibility is an important factor for the study of virtual learning environments [22]. Researchers have also pointed out that "no matter how well the e-learning system integrates various media and allows for interactivity, the system will not be able to achieve best learning outcomes if it has poor" response time and accessibility [23, p. 225]. As VLE systems with high information accessibility give learners the flexibility to access relevant information in a short time, learners would have more control over their own learning. Hence, we hypothesize that:
H1. Information accessibility of VLE is positively related to self-controlled learning.
Ease of Navigation. The navigation of a VLE concerns the evaluation of the links to needed information. Navigability is defined as "the sequencing of pages, well organized layout, and consistency of navigation protocols" [24]. It measures whether there are adequate links, whether the descriptions for links are clear, whether information is easy to locate, whether it is easy to go back and forth in the system, etc. [25]. Navigational supports are developed and used to help users easily navigate information and reduce cognitive overhead and disorientation (i.e., becoming "lost in the hyperspace") [26]. Navigation is important to a VLE as it makes various information and tools easier to locate [27]. Thus, a system which is easier to navigate provides more flexibility in the user's preferred way of locating the information and tools needed. Hence, we hypothesize that:
H2. VLE's ease of navigation is positively related to self-controlled learning.
System Interactivity. Interactivity as used in this study is defined as the extent to which the system allows participants to act as both senders and receivers of verbal and nonverbal messages and feedback [28]. This definition of interactivity is widely used in e-learning studies [23]. The multiple communication platforms provided in a VLE let students share their ideas, learn from one another, "to exchange emotional support, information, and foster a sense of belonging" [29, p. 44]. As stated by Palloff and Pratt [30], the "key to the learning process are the interactions among students themselves, the interactions between faculty and students, and the collaboration in learning that results from these interactions" (p. 5). However, communications and interactions do not just happen; they must be intentionally incorporated into a web-based learning design [31]. Therefore, only when the various communication tools and facilitators are intentionally built into the VLE to achieve high system interactivity can the communication and collaboration of learners happen during the learning process.
H3. System interactivity of VLE is positively related to communication-based learning.
System Support. System support refers to the technical support and support personnel that help learners, facilitators, and professors to use and access the learning environment [32]. Once online learning is "up and running" there is a continued need for technical support. Lack of technical support is often seen as a barrier to designing, developing, and delivering any web-based learning system [33]. Prior studies have stressed the importance of providing a high quality of support service [34] in an e-learning system. Support is necessary to ensure the stability and reliability of the system, to reduce errors in operating the system, and to prevent system breakdown, so that learners can use the VLE smoothly without any disruption. Thus, learners will find it easier to control their own learning time, pace, and material. The reliability of VLE functionalities is guaranteed with good system support, including all the functionalities facilitating interaction among learners and interaction with the instructor. Thus, we hypothesize that:
H4. System support of VLE is positively related to self-controlled learning.
H5. System support of VLE is positively related to communication-based learning.
Self-controlled learning is important because learners best know their own instructional needs, so that they can construct their own knowledge in the context of their own needs and experiences [7]. Therefore, based on the assumptions of cognitivism, technologies are widely used to provide structured knowledge representation, personalized learning systems [35], and information seeking through search engines [18] to enhance learners' control over their own learning. Hence, if learners perceive higher self-control of learning in a VLE, it will be helpful in achieving better learning effectiveness in terms of improvement in learning skills and subject-related knowledge. Thus, we hypothesize that:
H6. Self-controlled learning is positively related to learning skill development.
H7. Self-controlled learning is positively related to subject-related learning.
The quality of interaction determines whether real learning takes place [36]. Collaborativism implies that learning is a socially mediated process and that all learning is mediated by tools such as language, symbols, and signs [37]. Interaction with members in the VLE has been shown to play a crucial role in knowledge acquisition and the development of cognitive skills, and such interaction is intrinsic to effective instructional practice and individual discovery [38, 39]. Picciano [40] found that students' perceived learning from online courses was related to the amount of discussion actually taking place in them as well as their interaction with instructors. Hence, we hypothesize that:
H8. Communication-based learning is positively related to learning skill development.
H9. Communication-based learning is positively related to subject-related learning.
4 Research Methodology
The survey method was adopted for this study, as surveys enhance the generalizability of results [41]. More specifically, we used a Web-based survey as it allows more flexibility in approaching more subjects and returns completed questionnaires quickly [42].
4.1 Operationalization of Constructs
Where available, all constructs were measured using questions adapted from prior studies to enhance validity [43]. Elsewhere, new questions were developed based on a review of the previous education and information systems literature. All items were measured using a seven-point Likert-type scale with anchors from "Strongly disagree" to "Strongly agree".
4.2 Survey Design and Administration
A Web-based survey was designed for this study. Two public universities using virtual learning environments in daily teaching practice were selected as the target for sending
out the survey invitation. In total, 1000 emails were sent out to students randomly selected from the universities. 143 students responded to the survey, with 110 completing it. The response rate is 14.3%, and the abandonment rate is 23%.
5 Data Analysis
Partial least squares (PLS), as a structural equation modeling (SEM) technique, was used to assess both the research model and the psychometric properties of the scales.
5.1 Measurement Model
The measurement model of PLS is assessed by examining the convergent [44] and discriminant validity [45] of the research instruments. In PLS, three tests are used to determine the convergent validity of measured constructs: the reliability of items, the composite reliability of constructs, and the average variance extracted by constructs. The assessment of the measurement model is reported in Table 1.
Table 1. Assessment of measurement model

Construct | Composite Reliability | Cronbach's Alpha | Variance Extracted
IA        | 0.907362              | 0.862908         | 0.715363
NAV       | 0.928741              | 0.885010         | 0.812999
SI        | 0.902677              | 0.857283         | 0.698735
SS        | 0.934101              | 0.904586         | 0.780622
SCL       | 0.935967              | 0.914162         | 0.745664
CBL       | 0.927766              | 0.896270         | 0.762828
LSD       | 0.957367              | 0.944707         | 0.818250
SRL       | 0.894806              | 0.844337         | 0.684029
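For reference, the composite reliability (CR) and average variance extracted (AVE) reported in Table 1 are conventionally computed from the standardized indicator loadings \lambda_i of a construct with p indicators (standard PLS formulas, not restated in the paper):

\mathrm{CR} = \frac{\left(\sum_{i=1}^{p}\lambda_i\right)^{2}}{\left(\sum_{i=1}^{p}\lambda_i\right)^{2} + \sum_{i=1}^{p}\bigl(1-\lambda_i^{2}\bigr)}, \qquad \mathrm{AVE} = \frac{\sum_{i=1}^{p}\lambda_i^{2}}{\sum_{i=1}^{p}\lambda_i^{2} + \sum_{i=1}^{p}\bigl(1-\lambda_i^{2}\bigr)} = \frac{1}{p}\sum_{i=1}^{p}\lambda_i^{2}.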
To ensure the discriminant validity, the squared correlations between constructs should be less than the average variance extracted for a construct. Table 2 reported the results of discriminant validity, which was checked by comparing the diagonal to the non-diagonal elements; all items fulfilled the requirement.
Table 2. Discriminant validity of constructs

Construct | IA     | NAV    | SI     | SS     | SCL    | CBL    | LSD    | SRL
IA        | 0.8458 |        |        |        |        |        |        |
NAV       | 0.4463 | 0.9017 |        |        |        |        |        |
SI        | 0.1369 | 0.3597 | 0.8359 |        |        |        |        |
SS        | 0.3035 | 0.4350 | 0.4214 | 0.8835 |        |        |        |
SCL       | 0.2904 | 0.4649 | 0.1869 | 0.4478 | 0.8635 |        |        |
CBL       | 0.1498 | 0.2573 | 0.3904 | 0.1711 | 0.1787 | 0.8734 |        |
LSD       | 0.0435 | 0.2284 | 0.3219 | 0.2211 | 0.4070 | 0.4295 | 0.9046 |
SRL       | 0.2047 | 0.3840 | 0.1868 | 0.2436 | 0.4653 | 0.4074 | 0.6272 | 0.8271
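The diagonal entries of Table 2 are the square roots of the AVE values in Table 1, and discriminant validity holds when each diagonal value exceeds the correlations in its row and column. For example, for information accessibility:

\sqrt{\mathrm{AVE}_{IA}} = \sqrt{0.715363} \approx 0.8458 > \max(0.4463,\,0.1369,\,0.3035,\,0.2904,\,0.1498,\,0.0435,\,0.2047).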
5.2 Structural Model
The path coefficients and explained variances for the model were calculated using a bootstrapping procedure. Each hypothesis corresponded to a path in the structural model. Hypotheses were tested at the 5 percent significance level. The results of hypothesis testing are summarized in Table 3.
Table 3. Summary of hypothesis testing

Hypothesis (path)   | Path Coefficient | t-Value | p-Value  | Hypothesis supported?
H1. IA → SCL (+)    | 0.063            | 0.638   | 0.524824 | No
H2. NAV → SCL (+)   | 0.309            | 2.705   | 0.007939 | Yes
H3. SI → CBL (+)    | 0.387            | 2.675   | 0.008635 | Yes
H4. SS → SCL (+)    | 0.294            | 4.102   | 0.000080 | Yes
H5. SS → CBL (+)    | 0.008            | 0.067   | 0.946706 | No
H6. SCL → LSD (+)   | 0.341            | 3.012   | 0.003233 | Yes
H7. SCL → SRL (+)   | 0.405            | 3.631   | 0.000433 | Yes
H8. CBL → LSD (+)   | 0.369            | 4.146   | 0.000068 | Yes
H9. CBL → SRL (+)   | 0.335            | 2.822   | 0.005682 | Yes
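In PLS analyses of this kind, the t-value reported for each path is typically obtained from the bootstrap resamples (the specific resampling settings are not stated in the paper) as

t_j = \frac{\hat{\beta}_j}{\widehat{\mathrm{SE}}_{\text{boot}}\bigl(\hat{\beta}_j\bigr)},

where \hat{\beta}_j is the estimated path coefficient and the denominator is the standard deviation of that coefficient across the bootstrap samples.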
6 Discussion and Implications
Based on the findings, the system characteristics of VLE are shown to have a positive influence on self-controlled learning and communication-based learning, which are positively related to the learner's skill development and subject-related knowledge. In particular, the system's ease of navigation and system support lead to better self-controlled learning, and system interactivity is positively related to communication-based learning, as predicted. However, information accessibility is not shown to affect self-controlled learning. This may be because, with the fast development of network technology, the variation in loading time of various VLE components is quite small and thus has only a minor influence on learners. System support is not shown to influence communication-based learning as predicted. One possible reason is that it is less difficult to communicate with peers and the instructor using the channels provided by the VLE than to search for and locate useful and relevant information in the VLE. Hence, high system support may not be necessary for learner interaction. This study advances theoretical development in the area of technology-mediated learning in general and VLE research in particular. It demonstrates that VLE system factors derived from IS theories have a positive impact on pedagogical approaches derived from learning theories, which in turn lead to different learning outcomes. The results of this study shed light on how different system quality factors serve different educational goals. This study also demonstrates the feasibility of having different educational approaches in one VLE. Although different educational theories, such as behaviorism, cognitivism, and constructivism, have competing assumptions and goals, VLE provides a space to accommodate all these educational needs. This is consistent with previous research efforts in proposing an integrated learning platform [18]. As this study has
also shown the effectiveness of different system features, system designers and course managers can emphasize different features to serve different goals.
7 Conclusion
This study theoretically develops and empirically tests a model that explains how VLE system characteristics can ultimately influence learners' learning effectiveness by directly affecting the learner's control over the learning material and the learner's communication and interaction with peers and the instructor. This study also demonstrates the feasibility of incorporating different learning theories into VLE design and shows how they would inform different perspectives of learning. In a future characterized by volatile environments, effective leverage of learning technology would be a factor differentiating more effective from less effective teaching and learning. As the VLE is an advanced form of e-learning, its design and implementation need to align with pedagogical approaches and be guided by educational theories. As educational institutes invest more resources in VLEs, it is imperative that research on the alignment of VLE design with educational needs, such as this study, continues to generate findings that inform practice.
References 1. Adkins, S.S.: Ambient insight’s the US corporate market for learning services: 2007-2012 forecast and analysis, Ambient Insight (May 2007) 2. Alavi, M., Leidner, D.E.: Research Commentary: Technology-Mediated Learning–A Call for Greater Depth and Breadth of Research. Informs. Systems. Res. 12, 1–10 (2001) 3. Leidner, D.E., Jarvenpaa, S.L.: The Use of Information Technology to Enhance Management School Education: A Theoretical View. MIS Quart. 19, 265–291 (1995) (Special Issue on IS Curricula and Pedagogy) 4. Wilson, B.G.: Constructivist Learning Environments: Case Studies in Instructional Design. Educational Technology Publications, Englewood Cliffs, NJ (1996) 5. Bruning, R.H., Schraw, G.J., Ronning, R.R.: Cognitive psychology and instruction, 2nd edn. Merrill, Upper Saddle River (1995) 6. Sheull, T.J.: Cognitive conceptions of learning. Review of Edu. Res. 56, 315–342 (1986) 7. Liegle, J.O., Janicki, T.N.: The Effects of Learning Styles on the Navigation Needs of Web-based Learners. Computers in Human Behavior 22, 885–898 (2006) 8. Milheim William, D., Martin, B.L.: Theoretical Bases for the Use of Learner Control: Three Different Perspectives. J. of Computer-Based Ins. 18, 99–105 (1991) 9. Williams, M.: Learner control and instructional technologies. In: Jonassen, D. (ed.) Handbook of research on educational communications and technology, pp. 957–983. Scholastic, New York (1996) 10. Slavin, R.E.: Achievement Effects of Ability Grouping in Secondary Schools: A BestEvidence Synthesis. Review of Edu. Res. 60, 494 (1990) 11. van Boxtel, C., van der Linden, J., Kanselaar, G.: Collaborative learning tasks and the elaboration of conceptual knowledge. Learning and Instruction 10, 311–330 (2000) 12. Moore, M.G.: Three types of interaction. American J. of Distance Edu. 3, 1–6 (1989) 13. McKimm, J., Jollie, C., Cantillon, P.: ABC of learning and teaching: Web based learning. BMJ 326, 870–873 (2003)
14. Piccoli, G., Ahmad, R., Ives, B.: Web-Based Virtual Learning Environments: A Research Framework and a Preliminary Assessment of Effectiveness in Basic IT Skills Training. MIS Quarterly 25, 401–426 (2001) 15. Davis, F.D.: User acceptance of information technology system characteristics, user perceptions and behavioral impacts. Intel. J. of Man-Machine Studies 38, 475–487 (1993) 16. Igbaria, M., Guimaraes, T., Davis, G.B.: Testing the determinants of microcomputer usage via a structural equation model. J. of Management. Inform. Systems 11, 87–114 (1995) 17. DeLone, W.H., McLean, E.R.: The DeLone and McLean Model of Information Systems Success: A Ten-Year Update. J. of Management. Inform. Systems 19, 9–30 (2003) 18. Mishra, S.: A design framework for online learning environments. British J. of Edu. Tech. 33, 493–496 (2002) 19. Wixom, B.H., Todd, P.A.: A Theoretical Integration of User Satisfaction and Technology Acceptance. Informs. Systems. Res. 16, 85–102 (2005) 20. Culnan, M.J.: The dimensions of accessibility to online information: implications for implementing office information systems. ACM Tran. on Office Informs. Systems 2, 141– 150 (1984) 21. Swanson, E.B.: Information channel disposition and use. Decision Sci. 18, 131–145 (1987) 22. Ang, D.B.S., Chan, H.C., Lim, J.L.H.: Learning communities in cyberspace: a proposed conceptual framework. In: Cumming, G., et al. (eds.) Advanced Research in Computers and Communications in Education, New Human Abilities for the Networked Society, vol. 1, pp. 600–607. IOS Press, Netherlands (1999) 23. Pituch, K., Lee, Y.: The influence of system characteristics on e-learning use. Computers & Education 47, 222–244 (2006) 24. Palmer, J.W.: Web site Usability, Design, and Performance Metrics. Informs. Systems. Res. 13, 151–167 (2002) 25. McKinney, V., Yoon, K., Zahedi, F.M.: The measurement of web-customer satisfaction: an expectation and disconfirmation approach. Informs. Systems. Res. 13, 296–315 (2002) 26. Chiu, C.M.: Towards integrating hypermedia and information systems on the web. Information and Management 40, 165–175 (2003) 27. Machlis: Site Designs Keep it Simple. Computerworld 32, 43–44 (1998) 28. Burgoon, J.K., Bonito, J.A., Bengtssonb, B., Cederbergb, C., Lundebergc, M., Allspachd, L.: Interactivity in human–computer interaction: a study of credibility, understanding, and influence. Comp. in Human Behavior 16, 553–574 (2000) 29. Hiltz, S.R., Wellman, B.: Asynchronous learning networks as a virtual classroom. Communicat. of the ACM 40, 44–49 (1997) 30. Palloff, R.M., Pratt, K.: Building learning communities in cyberspace: Effective strategies for the online classroom. Jossey-Bass Publishers, San Francisco (1999) 31. Berge, Z.L.: Interaction in post-secondary Web-based learning. Educational Tech. 39, 5– 11 (1999) 32. MacDonald, C.J., Gabriel, M.A.: Toward a Partnership Model for Web-Based Learning. Internet and Higher Education 1, 203–216 (1998) 33. Daugherty, M., Funke, B.L.: University Faculty and Student Perceptions of Web-Based Instruction. J. of Distance Edu. 13, 21–39 (1998) 34. Liu, C., Arnett, K.P.: Exploring the factors associated with Web site success in the context of electronic commerce. Infor. & Management 38, 23–33 (2000) 35. Conole, G.M., Dykea, M., Oliver, Sealea, J.: Mapping pedagogy and tools for effective learning design. Computers & Edu. 43, 17–33 (2004) 36. Draves, W.A.: Teaching online. River Falls. LERN Books, Wisconsin, USA (2000)
37. Karpov, Y., Haywood, H.: Two ways to elaborate Vygotsky’s concept of mediation. American Psych. 53, 27–36 (1998) 38. Sims, R.: Interactivity: a forgotten art? Comp. in Human Behavior 13, 157–180 (1997) 39. Richardson, J.C., Swan, K.: Examining social presence in online courses in relation to students’ perceived learning and satisfaction. J. of Asynchronous Learning Networks 7, 68–88 (2003) 40. Picciano, A.G.: Beyond student perceptions: Issues of interaction, presence and performance in an online course. J. Asynchronous Learning Networks 6, 24–40 (2002) 41. Dooley, D.: Social Research Methods. Prentice Hall, Englewood Cliffs (2001) 42. Couper, M.P., Traugott, M.W., Lamias, M.J.: Web Survey Design and Administration. Public Opinion Quar. 65, 230–253 (2001) 43. Stone, E.: Research Methods in Organizational Behavior. Goodyear Publishing Company, Santa Monica (1978) 44. Cook, M., Campbell, D.T.: Quasi-Experimentation: Design and Analysis Issues for Fields Settings. Houghton Mifflin, Boston (1979) 45. Campbell, D.T., Fiske, D.W.: Convergent and discriminant validation by the multitraitmultimethod matrix. Psycho. Bulletin. 56, 81–105 (1959)
Personalization for Specific Users: Designing Decision Support Systems to Support Stimulating Learning Environments Laura Măruşter, Niels R. Faber, and Rob J. van Haren Faculty of Economics and Business, University of Groningen P.O. Box 800, 9700 AV Groningen, The Netherlands [email protected], [email protected]
Abstract. Creating adaptive systems becomes increasingly attractive in the context of specific groups of users, such as agricultural users. This group of users seems to differ with respect to information processing, knowledge management and learning styles. In this work we aim to offer directions toward increasing decision support systems usability, by tailoring toward user learning styles. The results show that decision support systems need to be redesigned toward providing agricultural users with a more efficient time management and study environment, and facilitating group interaction.
1 Introduction
There is an increasing trend of taking learning and cognitive styles into account when personalizing computer systems. Because the "one-size-fits-all" approach does not seem to work effectively in practice, the idea of personalizing and creating adaptive systems becomes increasingly attractive. Research in Adaptive Hypermedia tries to integrate knowledge from hypermedia systems development and user modeling, where educational hypermedia systems and online information systems predominate [1]. Although some early research pointed out that cognitive-style-based personalization of management information systems (MIS) and decision support systems (DSS) is not a relevant issue [2], more recent research shows that cognitive and learning styles have a significant effect on technology acceptance and usage [3]. In the context of personalizing for differentiated users, it becomes increasingly important to address specific user groups [4]. Agricultural users can be considered an example of a specific IT user group, which seems to be becoming an important research issue [5]. According to research in the agricultural domain, farmers can be characterized based on their economic characteristics; subsequently, different farmer groups can be identified [6]. Moreover, it seems that farmer groups also differ with respect to information processing, knowledge management, and learning styles. Also, the usage of agricultural DSSs by farmers has been found unsatisfactory [7]. Nowadays, in all business sectors there is an increasing push towards innovativeness and competitiveness. Agricultural users, as an example of specific users, face many challenges, such as an increasing complexity concerning their farm management,
the need to address climatologic changes, etc. [6]. In order to support agricultural users, various initiatives emphasize learning and knowledge transfer. For instance, in the Netherlands, different IT systems, such as decision support systems, have been developed. Unfortunately, not all systems are used by agricultural users as intended [6, 9]. The proposed approach is an attempt to address the (re)design and personalization aspects of DSS dedicated to a specific group of users. The problem is the gap that exists between the DSS developed by designers and the low usability of these systems, which emphasizes the importance of the relationship between the DSS developer and the potential user [7]. With our approach we aim to offer an instrument to increase DSS usability by tailoring DSS design toward user learning styles. In this research we propose an approach for personalizing decision support systems for specific groups of users by taking the user's learning style into consideration. The redesign and personalization aspects tackle both the interface (adaptive presentation and adaptive navigation support, in Brusilovsky's terminology) and decision-making aspects. In Section 2, we present the theoretical basis underlying our research, the instruments used, and data collection aspects. The third section is dedicated to describing the results, which are further used in Section 4 to provide design guidelines for DSS.
2 Methods and Theoretical Underpinnings
Agricultural users need to be typified according to their learning styles. To assess the corresponding learning style, two instruments are used: (i) the Motivated Strategies for Learning Questionnaire (MSLQ) [11] and (ii) Kolb's Learning Styles Inventory (KLSI) [12]. Although criticisms exist concerning the validity and reliability of the KLSI instrument, it is often used for determining persons' learning styles and for individual profiling in training tasks [13]. We chose the MSLQ because we intend to address learning styles not only from an individual but also from a group perspective. Based on a selection of constructs from these instruments, a questionnaire was sent to 1800 starch potato growers in the Netherlands. The questionnaire was sent out on paper. Of the 1800 questionnaires that were sent to growers, so far 97 usable questionnaires have been returned, which means a response rate of 5%. An adapted form of the Task-Technology Fit instrument (TTF) [8] is used to determine the fit between a DSS used by farmers and the cultivar selection tasks. As shown in [9], some agricultural users have difficulties in fulfilling the DSS goal, indicating a possible misfit between the DSS design and the tasks to be supported by the DSS. The problematic TTF dimensions determined are then translated into specific personalization actions, depending on the learning style. The questionnaire consists of a selection of constructs from the Motivated Strategies for Learning Questionnaire (MSLQ) (Pintrich & de Groot, 1990) and Kolb's Learning Styles Inventory [12]. The selection of constructs from the MSLQ follows Pieters' (2005) findings. Pieters' study of learning styles in the agricultural domain in the Northern Netherlands reveals that the resource management strategy constructs 'peer learning' (5 items) and 'help seeking' (5 items) correlate with grower performance (expressed as yield quantity and yield quality). In addition, the MSLQ construct 'management time and
study environment’ (5 items) has been added. Kolb’s Learning Styles Inventory primary scales ‘concrete experience’ (CE), ‘reflective observation’ (RO), ‘abstract conceptualization’ (AC), and ‘active experimentation’ (AE), as well as the combination scales ‘abstractness over concreteness’ (AC-CE) and ‘action over reflection’ (AE-RO) have been included in the questionnaire. Finally, general questions were posed to typify the agricultural user and his business. Respondents were asked to specify their age, size of the farm, percentage of rented land, etcetera. The research is organized based on the conceptual model, shown in Figure 1. Following the question concerning the profile of agricultural users in terms of learning styles, we aim to test the following hypotheses: H1: Learning styles influence task-technology fit. H2: Learning styles influence decision support systems usage. H3: Task characteristics influence task-technology fit. H4: Task-technology fit determines decision support systems usage.
Fig. 1. Conceptual model
3 Results
We distinguish between two types of farming support systems, namely Management Information Systems (MIS) and Decision Support Systems (DSS). The first group (MIS) consists of MIS_System1, MIS_System2, MIS_System3, MIS_System4, and MIS_System5, while the second group (DSS) consists of DSS_System6, DSS_System7, DSS_System8, and DSS_System9 (the names of the systems remain undisclosed because of confidentiality reasons). The overview of MIS and/or DSS usage is shown in Table 1.
Table 1. Usage of ICT support

ICT support                          | N  | Percent
Management Information Systems use: |    |
  MIS_System1_Use                    | 23 | 26.4%
  MIS_System2_Use                    | 19 | 21.8%
  MIS_System3_Use                    | 30 | 34.5%
  MIS_System4_Use                    |  7 |  8.0%
  MIS_System5_Use                    |  8 |  9.2%
  Total MIS use                      | 87 | 100.0%
DSS use:                             |    |
  DSS_System6_Use                    |  4 | 15.4%
  DSS_System7_Use                    |  1 |  3.8%
  DSS_System8_Use                    |  1 |  3.8%
  DSS_System9_Use                    | 20 | 76.9%
  Total DSS use                      | 26 | 100.0%
MIS_System3, MIS_System4, and MIS_System5 also provide advising modules, which can be considered as modules used for decision-making activities. Therefore, we further classify MIS into MIS including decision-making modules (MISwithDMM) and MIS excluding decision-making modules; the latter are disregarded in the remainder of the analysis. In our analysis we select the responses corresponding to those users who use (i) MISwithDMM and/or (ii) DSS. It is worthwhile to remark that MIS are used more than DSS; a possible explanation is that they have been available for a longer period of time than DSS. In the case of MIS, the maximum period of availability is 35 years, while for DSS it is 8 years.
3.1 Reliability of Constructs
Task characteristics are measured by means of two concepts: task dependency (2 indicators) and task complexity (3 indicators). For the measurement of task-technology fit (TTF), two constructs are used, namely data quality and training. Because not all indicators are reliable, data quality is constructed as an aggregation of 8 indicators, and training is measured by four indicators. Finally, an aggregated measure for TTF is obtained.
Table 2. Reliability of used constructs - Cronbach α scores

Constructs           | Measure                              | No of items | Reliability (Cronbach α)
Task characteristics | Task dependency                      | 2           | .705
                     | Task complexity                      | 3           | .793
                     | Task characteristics (aggregated)    | 2           | .581
TTF                  | Data Quality                         | 8           | .608
                     | Training                             | 4           | .601
                     | TTF (aggregated)                     | 2           | .521
Learning Styles      | Time management & study environment  | 6           | .792
                     | Help Seeking                         | 5           |
                     | Peer Learning                        | 3           |
Finally, the construct Usage is measured as a single cumulative count of the systems used. Table 2 provides the reliability measures for the constructs used. Apart from the aggregated constructs Task characteristics and TTF, all the other constructs show reliability values above the threshold of 0.6.
3.2 Learning Styles Based on Kolb's Method
The Kolb learning styles 'converger' (high scores on AC and AE), 'diverger' (high scores on CE and RO), 'assimilator' (high scores on RO and AC), and 'accommodator' (high scores on CE and AE) could be ascribed to 45 of the respondents, using the respondents' reports on Kolb's dimensions CE, RO, AC, and AE. Table 3 provides an overview of the Kolb learning styles in the sample regarding growers' age, farm size, percentage of land rented, and participation in starch potato, sugar beet, consumer potato, and seed potato growing.
Table 3. Grower characteristics per Kolb learning style (mean, with standard deviation in parentheses)

                            | Converger (N=13) | Diverger (N=8) | Assimilator (N=17) | Accommodator (N=6)
Age                         | 48.9 (11.2)      | 44.9 (10.0)    | 48.1 (10.3)        | 39.8 (12.7)
Total area                  | 99.5 (110.7)     | 78.0 (49.4)    | 88.4 (45.4)        | 63.0 (22.6)
Percentage rented           | 52.0 (33.0)      | 15.0 (20.2)    | 31.1 (34.9)        | 54.2 (32.6)
Area starch potato growth   | 39.2 (51.0)      | 29.9 (25.1)    | 32.3 (22.8)        | 30.7 (15.0)
Area sugar beet growth      | 12.1 (6.9)       | 10.9 (15.3)    | 12.5 (10.6)        | 10.5 (9.2)
Area consumer potato growth | 4.2 (3.2)        | .0 (.0)        | .3 (1.2)           | 3.5 (8.6)
Area seed potato growth     | 11.5 (7.5)       | .0 (.0)        | .8 (1.9)           | .0 (.0)
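The assignment to the four Kolb styles follows directly from the two combination scales, abstractness over concreteness (AC-CE) and action over reflection (AE-RO). A minimal sketch of this classification rule is shown below; the method and the handling of ties are our own illustrative assumptions, not the scoring procedure used in the study.

public class KolbClassifier {

    // Classify a respondent from the combination scales AC-CE and AE-RO.
    // Positive acMinusCe means abstract conceptualization dominates concrete experience;
    // positive aeMinusRo means active experimentation dominates reflective observation.
    static String classify(double acMinusCe, double aeMinusRo) {
        if (acMinusCe > 0 && aeMinusRo > 0) return "converger";     // AC and AE dominate
        if (acMinusCe <= 0 && aeMinusRo <= 0) return "diverger";    // CE and RO dominate
        if (acMinusCe > 0) return "assimilator";                    // AC and RO dominate
        return "accommodator";                                      // CE and AE dominate
    }

    public static void main(String[] args) {
        System.out.println(classify(5.0, 3.0));   // converger
        System.out.println(classify(-4.0, 6.0));  // accommodator
    }
}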
In our sample the group of convergers (AC-AE) is the oldest group and has the largest mean area in their business. This group also has the highest mean area dedicated to starch potato, consumer potato, and seed potato growth. Assimilators (RO-AC) assign the largest area to sugar beet growth. These differences are not significant but indicative. The percentage of land that is rented shows differences between the learning styles (F(3, 40) = 3.002, p < .05). The accommodator (CE-AE) and converger groups report significantly more of their area as rented than the diverger (CE-RO) and assimilator groups. The accommodator and converger groups focus more strongly on active experimentation. This result might be an indication that the type of land is one of the factors these groups vary in their experiments.
3.3 Learning Styles and Task-Technology Fit (H1)
A regression model is built in order to identify those learning style dimensions impacting task-technology fit. Two approaches are used: first, the task-technology
fit measure is used as a dependent variable, and second, the components of the TTF construct (e.g., data quality and training) are used separately as dependent variables. The best regression model turns out to be the one with training as the dependent variable. The independent variables are the learning style dimensions, namely time management & study environment, peer learning, and help seeking. At an α level of 5%, the model is significant (F(3, 44) = 10.67, p-value < .01) and explains 42.1% of the variance. Time management & study environment is the only significant factor (t = 5.158, p-value < .01) in the regression model; therefore, the following model is proposed:

Estimated Training = 1.558 + 0.777 × (time management & study environment)    (1)
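As a quick illustration of model (1), substituting hypothetical scores on the seven-point scale (these example values are ours, not observed data) gives:

\text{score } 6:\ \widehat{\text{Training}} = 1.558 + 0.777 \times 6 \approx 6.22, \qquad \text{score } 3:\ \widehat{\text{Training}} = 1.558 + 0.777 \times 3 \approx 3.89.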
The interpretation of this model is the following: the better one manages one's time and study environment, the more knowledge and skills to use a DSS one will develop.
3.4 Learning Styles and Usage (H2)
In order to assess the influence of learning styles in terms of the MSLQ dimensions, a regression model has been developed. The dependent variable is the cumulative count of systems used, and the independent variables are the MSLQ learning style dimensions. At an α level of 5%, the model is significant (F(3, 57) = 4.04, p-value < .05) and explains 17.5% of the variance. Help seeking (t = -2.272, p-value < .05) and time management & study environment (t = 2.289, p-value < .05) are significant factors in the regression model (α level of 5%); therefore, the following model is proposed:

Estimated Usage = 0.622 - 0.338 × (help seeking) + 0.367 × (time management & study environment)    (2)
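Again with hypothetical scale scores (ours, not survey data), model (2) predicts roughly two systems in use for a grower who scores high on time management and low on help seeking, and essentially none in the opposite case:

0.622 - 0.338 \times 2 + 0.367 \times 6 \approx 2.15, \qquad 0.622 - 0.338 \times 6 + 0.367 \times 2 \approx -0.67\ (\text{i.e., } \approx 0).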
We notice that help seeking has a negative effect on system usage, while time management & study environment has a positive effect; the two effects are quite comparable in absolute magnitude. Thus, the better one manages his/her time and study environment and the less one depends on the help of others, the more systems s/he will use.
3.5 Task Characteristics and Task-Technology Fit (H3)
ANOVA testing of the relation between task characteristics and task-technology fit shows no effects, using task complexity and task dependency split at their means as grouping variables. No differences are found between high and low task complexity with respect to task-technology fit (F(1,48) = 1.238, p = .271) or between high and low task dependency with respect to task-technology fit (F(1,46) = 1.123, p = .295). Within the investigated agricultural domain, the results provide no indication that the task characteristics complexity and dependency are predictors of the task-technology fit of available decision support systems.
3.6 Task-Technology Fit and Usage (H4)
Task-technology fit shows a relation with the number of years decision support systems are used. A higher reported task-technology fit corresponds with a longer period of use of decision support systems (F(1,48) = 3.435, p < .10). This relation is mainly influenced by the number of years that management information systems equipped
with decision support modules have been used (F(1,48) = 3.852, p < .10). In contrast, task-technology fit is not a predictor of the number of decision support systems that are used by a grower (F(1,48) = .720, p = .400). These results lead to the conclusion that decision support systems that provide good support for a farmer's specific tasks lead to usage over a longer period, particularly when the decision support is provided by specific modules of the agricultural users' management systems. Furthermore, agricultural users are selective in their choice of decision support. Decision support systems are only used if they provide support for tasks for which an agricultural user desires support; a decision support system's availability and a high task-technology fit will not by themselves lead to it being used.
4 Conclusions
This study provides some footholds for the redesign of decision support systems for supporting tasks in the agricultural domain, providing personalized decision support to growers. Personalization of decision support systems is realized through altering the presentation of the system and changing the system's interface. Additionally, personalization can be achieved by changing the decision support focus of the system. The latter form of personalization relates to changing the target of the decision support process, for instance by changing the optimization criteria. Changes to a decision support system affect the task-technology fit of the system, which in its turn should affect the system's use. Based on Kolb's learning style approach, we profiled agricultural users into four categories: 'converger', 'diverger', 'assimilator', and 'accommodator'. More research is needed in order to find out how to use this approach for DSS redesign purposes. Based on the MSLQ approach, we found relations between learning styles and task-technology fit on the one hand, and the use of decision support systems on the other. They provide indicators for redesign not only at a technical level but also at the level of the context in which these systems are used. Our first hypothesis led to the development of a regression model. According to this model, it seems that learning style dimensions affect task-technology fit (more specifically, the training dimension). Therefore, more effort should be spent on redesigning DSS to facilitate efficient time management and a good study environment in the case of agricultural users. The second hypothesis was also confirmed based on a regression model. The first regression factor, help seeking, provides an indication that decision support systems are used by agricultural users who learn in solitude. The investigated decision support systems connect to this orientation of agricultural users; existing decision support systems focus on supporting individual agricultural users. In order to increase the use of these systems, an orientation towards supporting groups of agricultural users might be considered. A way to provide support within groups is to facilitate discussions between group members as part of the decision support system. The second regression factor operates at the level of the use context of decision support systems. Only if agricultural users organize their time and study environment will they be inclined to use these systems. Therefore, to support usage of decision support systems, some incentive needs to be put in place to convince agricultural users to use them. The exact form of
such an incentive depends on the motivation of agricultural users to learn. Additional information is required about the motivation, intrinsic or extrinsic, of agricultural users to learn in relation to the use of decision support systems. The results presented in the previous section provide no support for changing the available decision support systems or the management information systems that have a decision support module. No relations have been found between task characteristics and task-technology fit (the third hypothesis), or between task-technology fit and usage of these systems (the fourth hypothesis). Hence, the results provide no indication of the effects of changing the interface or the decision support focus of the systems. As a possible limitation, the results reported in this work should be considered preliminary (only 5% of the questionnaires have been filled in). Therefore, more data should be collected and analyzed in order to arrive at a complete picture concerning learning styles. This research connects with other learning-based initiatives that aim to support agricultural users, currently under development by Gielen [10], namely the "Stimulating & Inspiring Learning Environment", in which 12 learning environments are developed for agricultural entrepreneurs: (1) masterclass, (2) clinic, (3) workshop, (4) lab, (5) academia, (6) general repetition, (7) entrepreneur café, (8) boxing ring, (9) kitchen table, (10) utopia, (11) study club, and (12) expedition. Acknowledgements. This research is sponsored by the KodA foundation and Agrobiokon. Within the KodA (Kennis op de Akker; knowledge on the field) foundation, growers and industry co-operate in the development and dispersion of knowledge and experiences concerning various agricultural crops. The partners of the KodA foundation are Agrifirm, AVEBE, Cosun, CSV, CZAV, the Arable Board (HPA), IRS, Koninklijke Maatschap de Wilhelminapolder, LTO, Meneba, and Nedaba. AGROBIOKON is financially supported by the Northern Netherlands Provinces (SNN), the Arable Board (HPA), and AVEBE. Additionally, AGROBIOKON has been funded from the INTERREG III A-programme of the Eems-Dollard region by the European Union, the Land Niedersachsen and the Northern Netherlands Provinces (Samenwerkingsverband Noord-Nederland, abbreviated to SNN), and has also been made possible by the Dutch Ministry of Agriculture, Nature and Food Quality (Kompas/UIL-NN).
References 1. Brusilovsky, P.: Adaptive hypermedia. User Modeling and User Adapted Interaction 11(1/2), 87–110 (2001) 2. Huber, G.P.: Cognitive style as a basis for MIS and DSS designs: much ado about nothing? Management Science 29(5) (May 1983) 3. Chakraborty, I., Hu, P.J., Cui, D.: Examining the effects of cognitive style in individuals’ technology use decision making. Decision Support Systems 45(2), 228–241 (2008) 4. Song, Q., Shepperd, M.: Mining web browsing patterns for e-commerce. Comput. Ind. 57, 622–630 (2006) 5. Thysen, I.: Agriculture in the information society. J. Agric. Eng. Res. 76, 297–303 (2000)
6. Faber, N., van Haren, R.: Knowledge systems for sustainable innovation of starch potato production – achieving more with less. In: Jorna, R.J. (ed.) Sustainable Innovation – The organizational, human and knowledge dimension, pp. 204–226. Greenleaf Publishing (2006) 7. McCown, R.L.: Changing systems for supporting farmers’ decisions: problems, paradigms, and prospects. Agricultural Systems 74, 179–220 (2002) 8. Goodhue, D.L., Thomson, R.L.: Task-Technology Fit and individual-performance. MIS Quarterly 19(2), 213–236 (1995) 9. Maruster, L., Faber, N., Jorna, R., van Haren, R.: Analysing agricultural users’ pat-terns of behaviour: The case of OPTIRas, a decision support system for starch crop selection. Agricultural Systems 98(3), 159–166 (2008) 10. Gielen, P., Biemans, H., Mulder, M.: Inspirerende leeromgevingen voor ondernemers: Aanwijzingen voor ontwerpers en begeleiders. Wageningen: leerstoelgroep Educatie- en competentiestudies, Wageningen Universiteit (trad. Inspiring learning environment for entrepreneurs: indications for designers and coaches) (2006) 11. Pintrich, P.R., Smith, D.A., Garcia, T., McKeachie, W.J.: Reliability and Predictive Validity of the Motivated Strategies for Learning Questionnaire (MSLQ). Educational and Psychological Measurement 53, 801–813 (1991) 12. Kolb, D.A.: Experiential Learning. Prentice-Hall, Englewood Cliffs (1984) 13. Liegle, J.O., Janicki, T.N.: The effect of learning styles on the navigation needs of Webbased learners. Computers in Human Behavior 22(5), 885–898 (2006)
Construction of Systematic Learning Support System of Business Theory and Method
Yoshiki Nakamura (1) and Katsuhiro Sakamoto (2)
(1) Nihon University, Department of Industrial Management, 1-3-2 Misaki-cho, Chiyoda-ku, Tokyo 101-8360, Japan, [email protected]
(2) Aoyama Gakuin University, Department of Industrial Systems and Engineering, 5-10-1 Fuchinobe, Sagamihara-shi, Kanagawa 229-8558, Japan, [email protected]
Abstract. Business administration and industrial engineering expect their students to understand many theories and techniques and to be able to utilize this acquired knowledge in the workplace after entering business. However, the current educational system prevents these expectations from becoming a reality. The purpose of this research is to develop an educational simulator that teaches theories and methods and changes the effects and results generated by business decisions according to the learner's degree of understanding. To confirm its validity, the developed simulator is tested with students and its effectiveness is evaluated. Keywords: Supply chain management, educational simulator, business theory and technique, decision making.
1 Introduction
To survive competition, manufacturers are now required to satisfy the diversified demands of consumers while reducing inventory and cutting lead times at the same time. Business administration and industrial engineering offer solutions to this problem [7]. Production control is one theme of business administration and industrial engineering. In relation to production control, universities expect their students to understand inventory control systems and demand forecasting methods and to be able to utilize this knowledge in the workplace after entering business. However, the current educational system prevents these expectations from becoming a reality, because it is unable to show cases in which the learned theories and methods are utilized in corporate activities. In addition, teaching the theories and methods individually, although they are related to each other, prevents learners from applying them to actual problems. Learning with an enterprise simulator helps to narrow the gap between education and business [6], [7]. It provides the learning effects of (a) quasi-experiences of corporate management, (b) training in cooperating with others, and (c) training in solving problems. However, there is a problem where
decision-making becomes more dependent on hunch, and the reasons for success and the effects generated by the flow of corporate activities and business decisions are ignored during the learning process (although this causes no visible problems as long as the decisions bring positive results). Learning with an enterprise simulator should, in essence, involve making business decisions using the learned theories and methods and analyzing the results and the competitors' activities for subsequent decision-making. This research is therefore designed to develop an educational simulator that teaches theories and methods and changes the effects and results generated by business decisions according to the learner's degree of understanding. The developed simulator is tested with students to confirm its effectiveness.
2 Overview and Features of Educational Simulator The simulation consists of suppliers, makers, distributors, retail store and markets (Fig. 1). Users join the simulation as makers and compete with three competitors (four makers in total) for market share. Competitors are automatically operated by the simulator. The makers purchase product parts from three suppliers. There are two markets. Each manufacturer produces two types of products that are delivered to ten retailers through one distributor and are offered in the market. The simulator has eight decision-making items that are done for 48 periods (Table 1). The decision-making process results in the output of market share, sales results,
production results, and inventories for each period.

Fig. 1. Fundamental frame of the simulator (three suppliers, makers A–C, one distributor, ten retail stores and markets I and II for products X and Y; within a maker, components are supplied, manufactured, stored and marketed as products)

Table 1. Decision making items

Department      DM items
Supplier        Planned procurement volume
Manufacturing   Planned production volume; the number of employees; QC/IE cost
Sales           Planned sales volume; marketing cost; IT cost
In the simulation, learners have two target indicators: the reduction of the bullwhip effect and the achievement of improvement targets. The bullwhip effect refers to the amplification of fluctuations in planning information as it is communicated from one section to the next. For example, a small gap between supply and demand at the sales level is amplified as the information is passed on to makers and suppliers, which results in a huge gap in inventory (a minimal numerical sketch of this amplification is given at the end of this section). In the simulation, ideal figures are set for sales, product inventory, and procurement volumes for each period; learners are rated higher on the degree of understanding when the gap between the ideal figures and the actual figures is smaller. The achievement of improvement targets refers to four figures that learners are required to achieve through the simulation: reduction in inventory (initial value: 50%), reduction in opportunity loss (40%), cutting lead time (20%), and improvement in the accuracy of demand forecasting (20%), all of which are adjusted according to the learners' degree of understanding. These targets are achieved by improving the accuracy of decision-making through learning and utilizing the related theories and methods. The following sections describe the theory and method related model used to develop the simulator and the framework that reflects the learners' degree of understanding of the theories and methods in the simulation results.
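The amplification described above can be made concrete with a small numerical experiment. The following sketch is purely illustrative and is not part of the Java-based simulator described in this paper: the three-tier chain, the order-up-to rule, the four-period moving-average forecast and all figures in it are hypothetical assumptions, chosen only to show how a bounded jump in market demand produces ever larger order swings upstream.

```python
def simulate_bullwhip(periods=24, cover=3, window=4):
    """Toy three-tier chain (retail -> maker -> supplier) facing a one-off
    jump in market demand from 10 to 14 units per period.

    Each tier forecasts its incoming orders with a moving average and uses
    an order-up-to rule (order = forecast + stock shortfall), which
    transiently amplifies the jump as it travels upstream -- the bullwhip
    effect.  All parameters are hypothetical and chosen for illustration.
    """
    demand_stream = [10.0] * 8 + [14.0] * (periods - 8)

    tiers = ["retail", "maker", "supplier"]
    seen = {t: [10.0] * window for t in tiers}     # recent incoming orders
    stock = {t: cover * 10.0 for t in tiers}       # start at target stock
    placed = {t: [] for t in tiers}                # orders placed upstream

    for demand in demand_stream:
        incoming = demand
        for t in tiers:
            seen[t].append(incoming)
            forecast = sum(seen[t][-window:]) / window
            order = max(0.0, forecast + (cover * forecast - stock[t]))
            placed[t].append(order)
            # toy material flow: ship what was requested, receive what was ordered
            stock[t] += order - incoming
            incoming = order           # this tier's order is the next tier's demand

    return {t: max(orders) for t, orders in placed.items()}


if __name__ == "__main__":
    for tier, peak in simulate_bullwhip().items():
        print(f"{tier:9s} peak order per period: {peak:5.1f}")
```

With these toy settings the peak orders are roughly 18, 26 and 42 units at the retail, maker and supplier tiers, although market demand never exceeds 14 units; this is the kind of amplification the simulator asks learners to suppress.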
3 Developing the Theory and Method Related Model and Setting the Degree of Understanding For the relationship between decision-making items and theories and methods, a model was formulated to compel learners to study theories and methods efficiently and to improve the accuracy of their decision-making (Fig. 2). As a first step, theories and methods related to the decision-making items were selected. As for selected theories, input and output items were defined and connected with those of other theories. For instance, the planned order volume has a strong relationship with ordering methods and demand forecasting methods. In addition, it is clear that demand forecasting methods depend on the product life cycle. Like the above, the theories and methods related to decision-making are selected and the methods are connected. At the production level, the decision-making items include planned production volume, the number of employees and QC/IE cost. In determining the items, learners take into account demand forecasts, which are based on demand forecasting methods
at the distribution level, and optimal order quantity, which is based on product inventory.

Fig. 2. The relationship between decision-making items and theories and methods (the figure links each decision-making item, i.e. planned procurement volume, planned production volume, the number of employees, planned sales volume, marketing cost, IT cost and QC/IE cost, to related theories and methods: order system, MRP, demand forecasting, lifecycle, inventory control, marketing, scheduling, manufacturing management, manufacturing systems, quality control, kaizen, educational methods and the sales information system)
In this case, it is also necessary to determine a schedule for production. Concretely, the production schedule is based on the program evaluation and review technique (PERT) or on job shop scheduling. Learners also determine a production method; the available production methods are the line production method, the job shop production method, and the build-to-order (BTO)/assemble-to-order (ATO) methods. In addition, the 7 QC tools, six sigma, motivation theory, etc., are provided in the simulation.
At the distribution level, theories such as demand forecasting methods, life cycle analysis, and marketing methods are provided to determine the planned sales volume. Demand is forecast using the arithmetic average method, the moving average method, the exponential smoothing method, or the regression line method, the choice of which depends on the period of decision-making (an illustrative sketch of these methods is given at the end of this section). The demand forecast is related to the life cycle, which has an effect on the sales volume. The simulation therefore provides theories and methods for forecasting demand and also permits learners to study the relationships of these methods with others.
For the procurement and product part inventory levels, theories and methods justifying the planned production volume and the planned procurement volume are taught. First, part requirements are computed through material requirements planning (MRP): entering the planned production volume and the delivery time of products results in the output of part requirements and the delivery time of parts. In addition, a bill of materials (BOM) is utilized for product information. The ordering method for product parts, which includes the fixed order quantity method and the fixed order period method, is determined based on ABC analysis. Furthermore, a safety stock of inventory can be calculated and reflected in order control. For each theory, data is entered separately, i.e., learners can study each theory and make decisions in consideration of the results of the calculations.
To teach the above theory and method related model efficiently, the time at which theories and methods are provided depends on the period of the simulation (Fig. 3). For example, MRP and the scheduling method are provided during the fifth period as theories related to planned production volume.
Fig. 3. An example of theory expansion consonant with time expiration (the lists of theories provided at the 5th, 15th and 25th periods)
During the 15th period, the relationship between the ordering method, MRP, and the scheduling method is presented to learners, while strongly related theories and methods are suggested in the 25th period. This support, which changes and is strengthened step by step, allows learners to gradually study the theories and methods that help with decision-making, the relationships between those theories and methods, and the relationships between the decision-making items. This also leads to improvements in the accuracy of decision-making. Finally, the degree of understanding of these theories is reflected in the results of the simulation. The targets tied to the degree of understanding are the reduction of the bullwhip effect and the obtained share of sales. The former gradually decreases once the theories are recognized as understood, that is, when an estimate based on the theories and methods for calculating sales volume is close to the decision-making figure. The latter depends on the degree of understanding of the theories concerning the sales-to-advertising-expenses ratio, the favored rate of corporate image, and return on investment.
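As an illustration of the forecasting and inventory methods named in this section, the sketch below implements the four demand-forecasting rules and a common textbook safety-stock formula. It is written in Python purely for illustration and is not taken from the Java-based simulator; the sales figures, the smoothing constant and the service-level factor z = 1.65 are assumptions.

```python
import math

def arithmetic_average(history):
    """Forecast = mean of all observed periods."""
    return sum(history) / len(history)

def moving_average(history, window=3):
    """Forecast = mean of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing(history, alpha=0.3):
    """Forecast = exponentially weighted average; larger alpha weights recent periods more."""
    forecast = history[0]
    for actual in history[1:]:
        forecast = alpha * actual + (1 - alpha) * forecast
    return forecast

def regression_line(history):
    """Least-squares trend line, extrapolated one period ahead."""
    n = len(history)
    xs = list(range(1, n + 1))
    x_mean, y_mean = sum(xs) / n, sum(history) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
             / sum((x - x_mean) ** 2 for x in xs))
    return y_mean + slope * ((n + 1) - x_mean)

def safety_stock(demand_std, lead_time, z=1.65):
    """Common textbook rule: z * sigma_demand * sqrt(lead time); z = 1.65 is roughly a 95% service level."""
    return z * demand_std * math.sqrt(lead_time)

if __name__ == "__main__":
    sales = [96, 104, 101, 110, 108, 115]          # hypothetical sales per period
    print("arithmetic average   :", round(arithmetic_average(sales), 1))
    print("moving average (3)   :", round(moving_average(sales), 1))
    print("exponential smoothing:", round(exponential_smoothing(sales), 1))
    print("regression line      :", round(regression_line(sales), 1))
    print("safety stock         :", round(safety_stock(demand_std=6.0, lead_time=2), 1))
```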
4 The Simulator
Learners use the Java-based simulator on a PC. The administrator answers users' questions on the use of the simulator. On the page for entering business decisions, learners enter the planned production volume and the planned sales volume of products, the planned procurement volume of product parts, QC/IE cost, the number of employees, and information system expenses (Fig. 4).

Fig. 4. Decision making screen of the simulator (decision items, with "Reference" and "Effect" buttons)

Planned production volume and planned sales volume are entered for two
types of products, respectively. The same as above, the planned procurement volume is entered for all three types of product parts. The number of employees is in terms of the number of people and the unit of QC/IE cost, marketing cost, and IT cost is in tens of thousands of yen. For all items, the results for the previous period are represented. Clicking on the “Reference” or “Effect” button on the page for entering business decisions displays the theories and methods related to the decision-making items. Clicking on the “Reference” button displays a reference page on decision-making information. For example, the reference page concerning planned production volume provides lists of demand forecasting methods that generate demand volume and of investment control theories that provide inventory information. Learners use these theories and methods to make decisions. Clicking on the “Effect” button displays a page of information on the effects of decision-making (Fig. 5). On that page, figures on decision-making provide the results and effects from corporate activities through figures calculated using the theories and methods. For example, the determination of planned production volume provides product part requirements through MRP and lead-time through the scheduling method. Learners understand the relationship between the planned production volume and the product part requirements through MRP based on the product part volume as well as the relationship between the production volume and the procurement volume. The theories are gradually updated according to the decision-making period (Fig. 6). This allows learners to study the relationship between theories or decision-making items effectively, in addition to each theory. Concretely, there are no pages on theory support and only results are available until the fourth period. From the fifth period, theories related with decision-making are listed. Learners use the theories to make decisions. Which results are affected by the decision-making theories is also revealed. From the 15th period, theories and methods directly related to decision-making as well as
those in common with other decision-making items are available.

Fig. 5. The page displayed after clicking the "Effect" button

Fig. 6. Updated screens of the reference button (lists of theories for periods 5–14, 15–24 and 25 to the last period)
For example, planned production volume generates product part requirements through MRP. Ordering methods and inventory control theories, which justify the planned procurement volume, are also displayed. During the 25th period, how the decision-making theories affect corporate management is revealed.
Fig. 7. Final results of the simulator
The final results page (Fig. 7) outputs not only financial information, including an income statement and balance sheet, but also the planned production time (the result of scheduling), production throughput time, procurement lead time, delivery throughput time, the number of stores for which delivery was completed, inventories of products and their parts, actual production volume, planned production volume, actual sales volume, and the trend of the bullwhip effect.
5 Review of Simulation
To review the effectiveness of the simulator, we gave a questionnaire to five students (Table 2). Based on the results of the questionnaire, the simulator won high appreciation for learning theories. Regarding the degree of understanding of SCM, the survey indicated that the relationships between SCM, decision-making, and the bullwhip effect were also understood. Learners are likely to have estimated the results of their decision-making, as they took the theories into account when making decisions. In our view, this helps the study of the relationship between production activities and their theories and methods, which is the target of this research. In the open response section, there were some positive opinions: "A representation of the theories required by each item helps to understand the relationship" and "I know how to determine figures and what effect the figures have." In contrast, we also saw some negative opinions: "It may be misunderstood that there is no cap on expenses for investment items", "It is difficult to understand the relationship with the balance sheet" and "The representation of theories is complex and inconvenient".
Table 2. Questionnaire and answers (average scores; question 6 reports numbers of checks)

1 Do you tend to make decisions by your hunch as time passes?   overall 4.3 (period 5: 4.4; period 15: 4.8; last period: 3.8)
2 Can you understand the relationships between the methods?     overall 4.2 (period 5: 4.4; period 15: 4.4; last period: 3.8)
3 Do you know when you understand theories?                     overall 3.73 (period 5: 3.6; period 15: 3.8; last period: 3.8)
4 Do you understand SCM?                                        3.83
5 Do you understand the bullwhip effect?                        4.3
6 What do you focus on for reducing the bullwhip effect (number of checks)?
  Marketing 0; Demand forecast 2; Stock theory 2; Order system 1
7 Is the system user-friendly?                                  3.2
8 Do you understand the model?                                  4
6 Results and Future Tasks
The current educational system is not able to show cases in which learned theories and methods are utilized in corporate activities. Therefore, the purpose of this research was to develop an educational simulator that teaches theories and methods and changes the effects and results generated by business decisions according to the degree of understanding. To confirm its validity, the developed simulator was tested with students. The questionnaire yielded several positive opinions; as a result, we consider that this study has produced a useful simulator for learning theories and methods. Future tasks include the selection of theories when expanding coverage to non-production sections, such as a management section, as well as the expansion of the theories covered.
Acknowledgment
We are deeply grateful to Yousuke Motose, a graduate student at Aoyama Gakuin University. This research was also partially supported by a Grant-in-Aid for Young Scientists (B) (19700646) in Japan, by a Research Grant of the Institute of Business Research at Nihon University, and by a Research Grant of the Research Institute of Economic Science at Nihon University.
References
1. Ickerott, I., Witte, T.: Analysis of Collaborative Supply Networks using an Agent-based Simulator. PPS Manage. 12(2), 16–19 (2007)
2. Jou, J.J., Liu, C.K.: Computer-aided Analysis of Distortions in SCM Systems with EDFAs including Chirping Effect. Aeu-Int. J. of Elec. and Com. 61(4), 243–248 (2007)
3. Hon, K.K.B.: Performance and Evaluation of Manufacturing System. CIRP Annals Manuf. Tech. 54(2), 139–154 (2005)
4. Neely, A., Gregory, M., Platts, K.: Performance Measurement System Design. Int. J. Operations & Production Manag. 15(1), 80–116 (1995)
5. Stewart, G.: Supply-chain Operations Reference Model (SCOR): The First Cross-industry Framework for Integrated Supply-chain Management. Logistics Info. Manag. 10(2), 62–67 (1997)
6. Sakamoto, K., Nakamura, Y., Sato, H.: Development of Educational SCM Simulator. In: The Proceedings of the 11th Int. Conf. on HCI, 9 pages (2005)
7. Nakamura, Y., Sakamoto, K.: Design of an SCM Simulator Model for Educational Purposes and the Effects of This Design. In: The Proceedings of the 11th Int. Conf. on Industrial Eng. Theory, Applications, and Practice, pp. 1175–1180 (2006)
8. Edum-Fotwe, F.T., Thorpe, A., McCaffer, R.: Information Procurement Practices of Key Actors in Construction Supply Chains. Europ. J. of Purchasing & Supply Manag. 7(3), 155–164 (2001)
9. Xue, X., Li, X., Shen, Q., Wang, Y.: An Agent-based Framework for Supply Chain Coordination in Construction. Automation in Construction 14(3), 413–430 (2005)
10. Yun, Y., Gen, M.: Advanced Scheduling Problem using Constraint Programming Techniques in SCM Environment. Comp. & Industrial Eng. 43(1-2), 213–219 (2002)
11. Stadtler, H.: Supply Chain Management and Advanced Planning. Europ. J. of Op. Res. 163(3), 575–588 (2005)
12. Baker, A., Navarro, E.O., Hoek, A.: An Experimental Card Game for Teaching Software Engineering Processes. J. of Sys. and Software 75(1-2), 3–16 (2005)
13. Carrano, A.L., Kuhl, M.E., Marshall, M.M.: Integration of an Experiential Assembly System Engineering Laboratory Module. Int. J. of Eng. Edu. 24(5), 1012–1017 (2008)
14. Strunz, K., Louie, H.: Cache Energy Control for Storage: Power System Integration and Education Based on Analogies Derived From Computer Engineering. IEEE Trans. on Power Sys. 24(1), 12–19 (2009)
15. Davidovitch, L., Parush, A., Shtub, A.: Simulation-based Learning in Engineering Education: Performance and Transfer in Learning Project Management. J. of Eng. Edu. 95(4), 289–299 (2006)
16. Wilson, J.R., Goldsman, D.: Alan Pritsker's Multifaceted Career: Theory, Practice, Education, Entrepreneurship, and Service. IIE Trans. 33(3), 139–147 (2001)
Learning by Design in a Digital World: Students' Attitudes towards a New Pedagogical Model for Online Academic Learning Karen Precel, Yoram Eshet-Alkalai, and Yael Alberton The Open University of Israel and Chais Research Center for the Integration of Technology in Education [email protected]
Abstract. Despite the fact that the blended learning model is considered today the preferred model for online course design, there is still ambiguity regarding its implementation in educational systems and regarding the optimal proportions between online learning and F2F meetings in various learning scenarios. The present research examined students' perceptions of pedagogical and design issues of a fully-online course at the Open University of Israel, which offers a new model for blended online learning. Fifty-eight of the course's students completed a questionnaire regarding three major aspects of the course's design: (1) pedagogy, (2) textbook format (print vs. digital) and (3) usability issues in designing the course's learning environment. Results illustrate the importance of a particular in-advance pedagogical and visual design of online learning and the potential of the course's model in creating meaningful learning, which takes into account the state-of-the-art knowledge on the major pedagogical considerations in online learning. Keywords: online learning, blended-learning model, usability, pedagogical model.
1 Introduction
In the last decade, the large-scale penetration of communication technologies into educational systems (schools and universities), industry and organizations, along with the availability of effective learning management systems, has led to an increase in online learning and training [3, 4]. However, recent studies report that the integration of online learning environments in academia faces a wide range of problems, which leads to disappointment with the limited effect of these technologies on the institutes' teaching and learning culture [5, 8]. Recent studies suggest that the limited success of instructional technologies in online learning results from the following major reasons: (1) Reading academic texts in a digital format is problematic for most learners [10, 16]; (2) Students report that the feelings of loneliness and social detachment associated with the online setting have a negative effect on their learning [13]; (3) Teachers and students lack the cognitive skills necessary for making effective use of online technologies
(Eshet, 2004) [9, 10, 15]; and (4) Most online courses adopt pedagogical approaches that fit the traditional-frontal teaching and learning, and do not employ pedagogical approaches that fit online learning [1, 6, 9, 11]. As of today, the Blended Learning Model, which combines online and F2F components in the learning process, is considered the most effective model for online learning [1, 14]. However, despite numerous studies that investigated the implementation of the Blended Learning Model in online academic studies, much ambiguity still exists regarding its utilization in real-life situations and the optimal proportion of its components in different instructional settings [5]. The common model for course design, development and instruction in most open universities worldwide (e.g. Israel and the UK) contains some paradoxes [12], the most central of which is the fact that courses are developed and written by experts, who do not teach them, and that the actual instructors of the courses are not involved in writing the textbooks and the learning-guides. As pointed out by various scholars (e.g. [12, 13], this kind of course-delivery model creates a gap between the course developer, the course instructor and the students, and has a severe negative effect on the learning process and on students' satisfaction. Guri-Rosenblit [12] emphasizes the importance of making special efforts to close this gap in the design of online courses in open universities. The present study focused on a special case of a blended online course at the Open University of Israel in which online learning technologies were utilized to address the major problems that are involved in online learning in order to create an effective and satisfactory online learning environment. The paper presents results from a study of students' attitudes toward the interface and the pedagogic design of an M.A.-level online course at the Open University of Israel. 1.1 Course Description and Pedagogical Model As of today, the use of online components in the learning process in most Open University's courses is relatively limited, consisting mainly of a course homepage, instructor's announcements, syllabus, assignments, occasional online resources and a forum for online discussions. In addition, in most cases, the online elements are considered "nice to have" and not mandatory or central to the learning process. Usually, they are added after the course development is completed. Consequently, the online elements usually don't have a significant impact on the learning process. The course investigated in the present study, is an M.A.-level online course titled "Design Principles of Computer-Based Learning Environments". The course focuses on the major aspects of designing technology-based learning environments. Contrary to the typical "nice to have" online components in most online courses at the Open University, this course was designed as a fully-online course, in which all of the learning materials (e.g. online lectures, readings, textbook, time-table, assignments and exercises) are available online, and a major portion of the learning itself takes place online, in the especially-designed course's online learning environment. The course design and development was based on the state-of-the-art knowledge on the major problems that underlie online teaching and learning in universities in
general and in open universities in particular [12]. Following recent research findings on problems involved in reading academic texts from digital displays [10], the course's textbook is made available to students in both digital and print formats. This allows students to choose their preferred mode of learning and to navigate freely between the text and the online environment. In order to bridge the above-noted gap between the course writer and the students [12], the course offers a variety of video lectures by the course writer. The course pedagogy follows the constructivist approach [7]. Accordingly, learning focuses on the students' ability to solve real-life authentic problems in an academic context, and the course assignments require the analysis of Internet-based learning environments and the design of user interfaces and educational simulations. The course's computerized learning environment (CLE) is designed according to state of the art usability standards, flexibility in navigation and principles in designing hyper textual learning environments. [2]. The course instruction is based on the Blended Learning Model [5, 14]): Most of the learning is done online, complemented by six optional face-to-face orientation meetings. According to the blended learning principles [5], the online learning in the course is more central for topics that emphasize practical issues (e.g. interface design, databases or simulation design), for which authentic tasks are assigned, whereas in the more theoretical topics (e.g. learning theories), face-to-face learning is more dominant. As a course that focuses on the major aspects of designing computer-based learning environments and their underlying learning processes, it consists of five learning units (i.e. theoretical aspects of learning with technology, hypertext and hypermedia learning environments, user-interface design, designing data-bases and educational simulations). Each unit combines a discussion of the theoretical and the practical aspects of the topic. The theoretical background is provided by the course's textbook and the assigned articles for each unit. The course's tasks and assignments are designed to help students implement their theoretical knowledge in authentic, real-life situations. As noted above, designed as a "virtual classroom", the course's web-site serves as the major learning environment in which discussions take place, face-to-face meeting's summaries are posted, and assignments and tasks are submitted – making the content and the online learning processes inseparable. 1.2 The Study's Goals The main goals of the present study were to examine students' attitudes toward the following topics: 1. Course pedagogy, including the pedagogical aspects that concern the design of the online and printed textbooks. 2. Issues which relate to text-format reading (print versus digital reading). 3. Usability issues in designing the course's learning environment.
2 Method Participants. Fifty-eight of the course's students participated in the study during three semesters (Table 1).
682
K. Precel, Y. Eshet-Alkalai, and Y. Alberton Table 1. Distribution of the study's participants according to semester Semester Fall 2006 Fall 2007 Spring 2007
Number of participants 21 14 23
Tools. A structured attitudes’ questionnaire was developed in order to examine students' attitudes toward the following issues: the course's instructional pedagogy, the technological tools and the learning materials and their influence on students' learning. The questionnaire consists of 78 questions that refer to the students' use of the various learning components, their usability (i.e. friendliness, ease of use and orientation) and their perceived contribution to learning. The questionnaire which was distributed during the Fall and Spring 2007 semesters was updated to include questions that were absent in Fall 2006 semester. Procedure. Data was collected during three semesters in 2006-2007. Students completed the questionnaire during the last F2F course-meeting, or submitted it via electronic mail.
3 Results Results are presented in respect to the study's three major goals: 1. Course pedagogy (pedagogical aspects of the online textbook's design, online video lectures and online discussion groups). 2. Issues which relate to text-format reading (print versus digital reading). 3. Usability issues in designing the course's learning environment (i.e. the online textbook and the course's website). 3.1 The Course's Pedagogy Students' attitudes toward the pedagogical value of various instructional and learning components in the course were examined in the current study (Table 2). As can be seen in table 2, the instructional components that were perceived as most contributing to learning were the course's tasks (mean=4.72), the printed textbook (mean=4.54), the meeting's presentations (mean=4.42) and the F2F meetings (mean=4.15). The online video lectures were not found as a highly contributing component to learning (mean=3.83), however, 87% of the participants indicated that they would not give them up. The 'personal notebook' (an online annotation tool, which enables students to annotate selected sections of the online textbook) was the most unused component and was perceived as insignificant to learning (only 7.3% used it frequently; mean of contribution to learning = 1.6). The online textbook was considered an average contributor to learning (mean=3.32). However, almost half of the participants (46.5%) indicated that they used it frequently.
Learning by Design in a Digital World
683
Table 2. Students' attitudes toward the pedagogical value of various instructional and learning components
Learning components Online textbook Printed textbook Video lectures Online time-table Personal notebook Discussion groups Meeting's presentations List of links in the textbook Tasks*** F2F meetings***
No. of participants 56 37 54 55 48 53 55
Contribution to learning Mean* Stdv
Frequency of use (%) High**
Low**
3.32 4.54 3.83 3.45 1.60 3.51 4.42
1.42 0.87 1.24 1.26 1.11 1.12 0.94
46.5 83.8 73.7 63.2 7.3 56.1 86.2
53.5 16.2 26.3 35.1 92.7 43.9 13.8
55
3.93
1.09
82.5
17.5
29 26
4.72 4.15
0.8 1.26
*The answer's scale was 1 – "no contribution" – 5 "high contribution" **High frequency – "continuously", "frequently"; Low frequency – "seldom", "never ***Items that were not included in the Fall 2006 questionnaire
3.2 Pedagogical Aspects of the Textbook and the Video Lecture's Design Following constructivist principles of learning via problem-solving in authentic situations [7], the online textbook contains links to brainwork exercises, performance tasks and links to articles and authentic examples on the internet. The current study examined students' attitudes to the above components. As table 3 shows, items 1-3 assessed the actual use of the brainwork exercises by the students. It was found that the more demanding the tasks, the less students favored them (from high preference for examples (mean=3.94) to medium preference for performance tasks (mean=3.17). Nevertheless, the contribution of these components to students' understanding and motivation was found to be high (mean=4.2). Note that the 'components contribution to learning' measure was calculated as a mean of the scores of items 4-9 (Table 3). These items measure the contribution of knowledge construction, relevance to the learning themes, dynamic learning, understanding and internalization of the learning material, satisfaction and fulfillment from the learning and the level of interest in the texts. These six items were found to have high internal validity (Cronbach Alpha=0.91). 3.3 Video Lectures and Discussion Groups Video lectures, given by the course developer, as well as discussion groups, led by the course instructor, were included in the course's CLE in order to bridge the gaps between the course developer and the course instructor, the course instructor and the students and the students and their peers. More than 90% of the respondents reported that they observed at least one lecture and 87% of the respondents reported on the necessity of the video lectures. In addition, the possibility to listen to the lectures,
combined with the presentation and the examples, was found to contribute the most to learning (mean=4.4; Table 3).

Table 3. Students' attitudes toward the pedagogy of the course's instruction and the influence of the learning environment design and content on learning processes

                                                                         Number of respondents   Mean*   Stdv
Exercise components in the textbook: to which extent did you:
1. perform the exercises in the textbook?                               48                      3.17    0.97
2. stop and think about the questions and issues raised?                47                      3.40    0.90
3. stop and examine the examples the text refers to?                    48                      3.94    0.81
The components' contribution to the learning process:
4. Knowledge construction                                               47                      4.15    0.81
5. Was relevant to the learning themes                                  46                      4.33    0.70
6. Dynamic learning                                                     46                      4.22    0.81
7. The level of interest in the text                                    48                      4.23    0.81
8. Gratification from learning                                          48                      4.21    0.82
9. Understanding and internalization of the learning material           47                      4.15    0.81
Total 'components' contribution to learning' measure**                  48                      4.21    0.67
10. The online textbook's functional design led you to refer to,
    think of, or understand the course's contents.                      45                      3.67    1.13
Video lectures
11. The acquaintanceship with the course developer contributed
    to the learning experience                                          50                      3.78    1.18
12. The lectures contributed to learning focalization in each unit      50                      4.00    1.16
13. Listening to the lectures combined with the presentation and
    examples contributed to understanding the learning material         32                      4.40    0.80
Discussion groups (DG)
14. The satisfaction from the level of discussions                      49                      3.59    0.84
15. Organizing the discussion groups according to units contributed
    to focalization of discussions in the DG.                           49                      3.96    0.96
16. Organizing the discussion groups according to units contributed
    to receiving assistance when needed.                                49                      3.86    1.10

*The answer's scale was 1 – "not at all" – 5 "Very much"
**The measure was calculated as a mean of items 4-9 (internal validity, Cronbach Alpha=0.91)
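For reference only (the formula is not quoted in the paper), the internal-consistency index cited in the footnote, Cronbach's alpha for k items, is conventionally defined as

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
```

where sigma^2_{Y_i} is the variance of item i and sigma^2_X is the variance of the summed scale; here k = 6 (items 4-9).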
Almost all respondents (98.2%) visited the discussion groups. Most of them (67.9%) reported that they continuously followed the activity in the discussion groups, while others visited them occasionally. Of all the respondents, 28.6% reported active involvement in the discussion groups. The students' satisfaction with the level of the discussions was found to be higher than average (mean=3.59).
3.4 Course Textbook Format: Reading from Print versus Digital Displays
The optimal format for presenting the course's learning materials – in a printed or a digital textbook – was examined in relation to three different learning assignments: reading, task implementation and preparation for the final exam (Table 4). As can be seen from Table 4, the overall preference of more than half of the respondents (57.9%) was for combining the printed and the digital textbook.
Table 4. Students' preferences regarding the textbook's format (printed, digital or combination) in relation to various learning assignments
                                                               Digital textbook %   Printed textbook %   Combination %
How do you usually read the course's textbook?                 10.3                 50                   39.7
Which book do you usually use to prepare the course's tasks?   15.8                 57.9                 26.3
Which book do you prefer to use prior to the final exam?       14                   59.7                 26.3
General preference*                                            5.26                 36.84                57.9
*This measure integrates the respondents' preferences of the three learning assignments into one measure in the following way: students who preferred the digital textbook in all assignments – "Digital textbook", students who preferred the printed textbook in all assignments – "Printed textbook", all other preferences – "Combination".
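As an aside, the aggregation rule stated in the footnote above can be written as a small function. This is an illustrative sketch of the rule as described, not code used in the study; the category labels and argument names are assumptions.

```python
def general_preference(reading, tasks, exam):
    """Collapse three per-assignment preferences ('digital', 'printed' or
    'combination') into the single 'general preference' category of Table 4:
    all-digital -> Digital textbook, all-printed -> Printed textbook,
    any other mix of answers -> Combination."""
    answers = {reading, tasks, exam}
    if answers == {"digital"}:
        return "Digital textbook"
    if answers == {"printed"}:
        return "Printed textbook"
    return "Combination"

# Example: a student who reads the printed book but prepares tasks and the
# exam with both formats is counted in the 'Combination' category.
print(general_preference("printed", "combination", "combination"))
```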
Table 5. Print versus Digital – reasons that influence the respondents' preferences
                                                                   High/large influence %   Little influence %   No influence %   Number of respondents
It is hard to read long texts from the computer's screen          56.8                     35.1                 8.1              37
I'm used to reading and studying from printed textbooks           59.5                     24.3                 16.2             37
The digital textbook enables easy access to examples of
  computerized learning environments and other references         62.9                     22.9                 14.2             35
The printed textbook can be read everywhere                       64.9                     24.3                 10.8             37
It is easy to navigate in the digital textbook                    47.1                     32.3                 20.6             34
You can't mark or write notes in the digital textbook             55.9                     32.3                 11.8             34
The digital textbook contains interesting information that
  cannot be found in the printed version                          20.0                     34.3                 45.7             35
In the printed textbook you can find what you want easily         62.2                     29.7                 8.1              37
The reading in the digital textbook requires time investment      32.4                     50.0                 17.6             34
Of all the respondents, 36.84% preferred the printed textbook solely, while few students (5.26%) preferred the digital textbook solely. Results (Table 5) show that for most of the respondents (more than 60%), the most influential factors in choosing the printed textbook were the convenience of the printed book's accessibility and the ease of finding information. The major reasons for choosing the digital textbook were the fast access to online examples of computer-based learning environments and the easy access to other links, which are embedded in the text.
Table 6. Students' attitudes toward the design of the digital textbook and the video lectures

                                                          Number of respondents   Mean*   Stdv
The digital textbook:
1. Text's design – chosen font                            46                      3.41    0.62
2. Text's design – font's size                            46                      3.41    0.62
3. Text's organization in layers                          46                      3.46    0.69
4. Tasks' integration                                     45                      3.74    0.63
5. Links and examples' integration                        45                      3.64    0.53
6. Navigation                                             44                      3.39    0.75
Total usability measure**                                 46                      3.50    0.50
The video lectures:
7. Functional design of the video lecture's interface     49                      3.45    0.68
8. Time length of the video lectures                      48                      3.25    0.86

*The answer's scale was 1 – "is not suitable for learning" – 4 "Very suitable for learning"
**The measure was calculated as a mean of items 1-6
3.5 Usability Aspects of Course Design Students' attitudes toward usability issues in designing the course's online textbook and website were examined in the current study (Table 6). Results indicate the students' high satisfaction from various usability aspects of the course's CLE (i.e. its ease of use and friendliness) and the digital textbook. Results show that the organization of the course's digital contents facilitated navigation and reading (mean=4.28). High scores were given to specific design elements, such as the font's type and size (mean=3.41 for both), text's organization (mean=3.46), the integration of tasks and examples in the course's CLE (mean=3.64) and the ease of navigation through the text and the CLE (mean=3.39). The general usability measure, as calculated from items 1-6 in table 6, was high (mean=3.5). In addition, the navigation in the course's CLE, which offers the students flexibility in reaching the course's contents 'from everywhere', was found to be highly usable and the students used this flexibility wisely and in various ways. For example, half of the respondents reached the course's readings via the "Articles" button in the course's CLE homepage, while 20% of the respondents reached it via links in the digital textbook or the "time-table" area in the course's CLE.
4 Discussion and Conclusions The purpose of the current study was to examine students' perceptions of pedagogical, design and usability issues regarding a fully-online course and its learning environment. Results of the current study make a meaningful contribution to our understanding of students' perceived value of learning and instruction in online environments. Students' high rating of the pedagogical and design course elements illustrates the great importance of 'designing in advance' (contrary to 'designing in retrospect'), which takes into account the problems involved in online learning in present-days academic courses [15]. The students' strong preference for the Blended Learning model, which was found in the current study, is compatible with reports by
most current studies on online learning models. Our findings illustrate the need to adjust the instructional model to the content and the learning objectives [6]. Findings of the current study indicate students' high evaluation of the interactive learning components, such as discussion groups and constructivist tasks. Results of the current study reinforce the widely-reported students' preference for reading academic texts in a print format compared to the digital one [16], mainly because of navigation, availability and ownership reasons. Only few students preferred the digital over the printed textbook because of the accessibility to the online examples. Nevertheless, as indicated in many studies, (e.g. [10]), our knowledge of the nature of digital reading is not yet clear, emphasizing the need for solid research data in order to reach conclusions regarding the preferred format for reading academic texts. The high satisfaction from the usability components of the course's CLE, which was found in the current study, is exceptional compared to the general low satisfaction of LMS sites reported by many studies (e.g. [3, 4]). Extremely high satisfaction (mean=4.7 on a 1-5 scale) of the course's CLE was also found in the general course's instruction surveys that was given to students at the end of each semester. We believe that this high satisfaction is an outcome of the major investment in designing the pedagogical and usability elements of the course 'in advance'. Nevertheless, the usability of some components (i.e., the personal notebook) was evaluated as low. Further research is needed to clarify the reasons for these evaluations. It should be noted that results of the current study have a few limitations: (1) the sample was relatively small (2) participants were M.A. students in an educational technology graduate program, and many of them have higher computer skills than the average student. Thus, the high level of satisfaction found in the research might not represent students from other disciplines and (3) even though the questionnaire utilized in the study was modified from the Open University's standard instruction satisfaction questionnaire, it did not undergo through a large-scale validation process. In futures studies, after validating the questionnaire, special emphasis should be put on testing a larger group size, comparing students' attitudes from various disciplines and of different proficiency levels, and comparing online courses that are based on different pedagogical models. Notwithstanding, results of the current study shed new light on our understanding of the proper design of a blended online academic course: in-advance pedagogical and visual design of online learning. In addition, results indicate the potential of the current model in bridging the gap that is typical of online learning between students and instructors and students and their peers, and in creating meaningful learning by employing "online pedagogical considerations" to course design.
References 1. Andrews, R., Haythornthwaite, C.: The Sage Handbook of E-Learning Research. Sage Publications, L.A (2007) 2. Balcytiene, A.: Exploring individual processes of knowledge construction with hypertext. Instructional Science 27, 303–328 (1999)
3. Bonk, C.J.: The perfect e-storm: emerging technology, enormous learner demand, enhanced pedagogy, and erased budgets. Part 1: Storms # 1 and #2. The Observatory on Higher Education (2004a), http://www.obhe.ac.uk/products/reports/ publicaccesspdf/Bonk.pdf 4. Bonk, C.J.: The perfect e-storm: emerging technology, enormous learner demand, enhanced pedagogy, and erased budgets. Part 2: Storms # 3 and #4. The Observatory on Higher Education (2004b), http://www.publicationshare.com/part2.pdf 5. Bonk, C.J., Wisher, R.A., Lee, J.: Moderating learner-centered e-learning: Problems and solutions, benefits and implications. In: Roberts, T.S. (ed.) Online collaborative learning: Theory and practice, pp. 54–85. Idea Group Publishing (2003) 6. Bonk, C.J., Graham, C.R. (eds.): Handbook of blended learning: Global Perspectives, local designs. Pfeiffer Publishing, San Francisco (2006) 7. Bransford, J.D., Sherwood, R.D., Hasselbring, T.S., Kinzer, C.K., Williams, S.M.: Anchoredinstruction: Why we need it and how technology can help. In: Nix, C., Spiro, R. (eds.) Cognition, Education and Multimedia: Exploring Ideas in High Technology, pp. 115–141. Lawrence Erlbaum Associates, Hillsdale (1990) 8. Cuban, L., Kirkpatrick, H., Peck, C.: High access and low use of technology in high school classrooms: Explaining an apparent paradox. American Educational Research Journal 38(4), 813–834 (2001) 9. Eshet, Y.: Teaching online: survival skills for the effective teacher. Inroads-The SIGCSE Bulletin 39(2), 16–20 (2007) 10. Eshet, -A.Y., Geri, N.: Does the medium affect the message? The influence of text representation format on critical thinking. To be appear in Human Systems Management 26(4) (2007) 11. Graham, C.R.: Blended Learning Systems: Definition, Current Trends, and Future Directions. In: Bonk, C.J., Graham, C.R. (eds.) Handbook of blended learning: Global Perspectives, local designs. Pfeiffer Publishing, San Francisco (2006) 12. Guri-Rosenblit, S.: Eight paradoxes in the implementation process of e-learning in higher education. Higher Education Policy 18, 5–29 (2005) 13. Lazenby, K.: Technology and educational innovation: A case study of the virtual campus of the University of Pretoria. Doctoral Dissertation. The University of Pretoria: Pretoria, South Africa (2003), http://upetd.up.ac.za/thesis/available/etd03172003-094954 14. Osguthorpe, R.T., Graham, C.R.: Blended learning environments: Definitions and directions. The Quarterly Review of Distance Education 4(3), 227–233 (2003) 15. Shemla, A., Nachmias, R.: How Do Lecturers Integrate the Web in Their Courses? WebSupported Courses at Tel-Aviv University. In: Pearson, E., Bohman, P. (eds.) Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006, pp. 347–354. AACE, Chesapeake (2006) 16. Spencer, C.: Research on learners preferences for reading from a printed text or from a computer screen. Journal of Distance Education 21(1), 33–50 (2006)
Promoting a Central Learning Management System by Encouraging Its Use for Other Purposes Than Teaching Franz Reichl and Andreas Hruska E-Learning Centre, Vienna University of Technology, Gusshausstr. 28/E015, A-1040 Wien, Austria {Franz.Reichl,Andreas.Hruska}@elearning.tuwien.ac.at
Abstract. Vienna University of Technology’s E-Learning Centre introduced Moodle in 2006 as the university’s central learning management system. Custom interfaces to existing IT infrastructure as well as modules developed and deployed according to user’s needs provide for a seamless digital workflow in the university’s teaching, learning and organisational processes. Encouraging the use of the LMS as a multi-purpose tool to support all kinds of co-operation and communication activities in addition to curricular teaching led to a rapid and significant increase in user numbers and encouraged university teachers to deal with educational questions and to develop innovative learning and teaching solutions, based on the LMS. Keywords: e-learning, learning management systems, Moodle, learning communities, communities of practice, collaborative work, business integration.
1 Background
Vienna University of Technology's administrative software TUWIS has grown over more than 40 years; modules to satisfy urgent needs and demands have been added step by step, resulting in a very heterogeneous and expensive-to-maintain enterprise information technology landscape with complex interfaces. Concerning e-learning, the situation was similar until a few years ago. Entrepreneurial teachers and researchers at Vienna University of Technology had carried out many successful projects and activities beyond self study which were able to demonstrate that immense benefits can be achieved by media-supported learning, e.g.:
• iChemEdu [1, 2] developed an internet-based laboratory information and management system, iChemLab, an e-book based e-content pool, iChemLecture, and an e-self-assessment tool, iChemExam; an ever growing database application initially containing more than 450 detailed synthetic experimental protocols has been developed by extracting and revising information from some 8,600 student work reports collected over the last years;
• MODULOR and virtual campus for architecture [3]: to support face-to-face learning, the faculty for architecture started to implement a virtual campus, consisting of a learning portal, a media database, and groupware and courseware tools;
• blended learning in continuing education, with active facilitation [4].
Many of the successful and sustaining projects have developed their own hardware and software solutions. While some of these systems contain interfaces to the university’s administration system, many of them were implemented as stand-alone solutions with their own registration and authentication modules – thus resulting in redundant and partly inconsistent data. 1.1 Evolving E-Learning and E-Teaching Strategies In 2004, a new University Law released the Austrian universities into autonomy and gave them the opportunity and the obligation to develop their own strategies. Vienna University of Technology founded an E-Learning Centre as an organisational and administrative structure to consolidate earlier e-learning and e-teaching initiatives and to make the experiences and developments sustainable and applicable for a wider group of users. The E-Learning Centre supports teachers and students in all departments and all fields of study and maintains and permanently improves e-education tools. The university’s strategy aims at improving the quality and efficiency of their study offers: the application of new media shall intensify learning processes, improve learner’s perception of complex subjects, and it shall enable particular groups of students to participate in courses who would otherwise be disadvantaged. The university’s medium to long term goal is to support each basic course in initial studies by means of e-learning – not by replacing the face-to-face courses, but by establishing blended learning as the standard in learning and teaching. In parallel, learning provisions are improved with respect to quality management. Establishing quality standards follows state of the art processes as defined by ISO 9000 and ISO 9126 and described by various publications, e.g. [5]. According to Hagner’s characterisation of technology adaptors (distinguishing between entrepreneurs, second wave, reward seekers, and reluctants [6]), Vienna University of Technology aimed at creating a large group of second wave adopters by consolidating entrepreneurs’ initiatives and by building a “learning community of practice” among the university’s teaching professionals. To take the reward seekers on board, the rectorate has initiated an annual E-Learning Award for outstanding achievements in teaching with new media, encouraging teachers to share their achievements with colleagues. To create synergies, three significantly different universities in Vienna (University of Technology, University of Natural Resources and Applied Life Sciences, Academy of Fine Arts) collectively developed, implemented and evaluated their e-learning and e-teaching strategies; they joined forces in the project Delta 3 [7, 8, 9, 10, 11] which was carried out from October 2005 to September 2007 and was co-funded by the Austrian Federal Ministry for Education, Science and Culture during the tendering for e-teaching/e-education strategies at universities and Fachhochschulen [12]. 1.2 Implementing E-Learning Services and Infrastructure A successful blended learning strategy requires comprehensive offers for support, consultation, and qualification of university teachers, students and managers. Since the university followed a holistic approach for implementing e-learning strategies,
dealing with strategic, organisational, administrative, financial, and legal aspects, didactics and curricular integration, competence and expertise, acceptance and incentives, public relations and marketing, and quality assurance (compare [13]), efficient and effective implementation of such a strategy requires developing users' competencies in many of these areas. Thus, the E-Learning Centre developed a portfolio of information and qualification offers that can be rapidly adapted to the different needs and demands of users at the university. Integrated support structures have been created which offer a wide spectrum of individualised, problem-oriented (or rather solution-oriented), low-threshold services to university teachers, adequate for any need and any amount of available time, ranging from online FAQs and a helpdesk to consultancy, information events and workshops, as well as coaching and guidance [11].
2 Implementation and Application of Moodle as the Central LMS

The use of a centrally managed and maintained learning platform significantly increases efficiency for teachers and students, leads to a more homogeneous presentation of learning provisions, and reduces management and support overhead. The E-Learning Centre of Vienna University of Technology implemented the LMS (Learning Management System) Moodle (Modular Object-Oriented Dynamic Learning Environment) as its central LMS under the university-specific "brand name" TUWEL (Technische Universität Wien E-Learning), with specific adaptations to the university's corporate design, with interfaces to existing services, and with features and functions to support specific teaching and learning processes. Moodle is open-source software with currently more than 28 million users on approximately 50,000 registered validated sites in more than 200 countries [14].

The most important extensions to Moodle were features allowing the integration of the LMS TUWEL into the university's existing administration systems, thus enabling the use of existing data without having to duplicate them. Access to TUWEL is controlled via the university's centrally managed authentication system, so that all staff members and students can use the learning platform immediately, without additional administrative entry barriers. Information provided by TUWIS upon login is used by TUWEL to determine the authorisation and rights of users, so that teachers have access to all the courses provided by their department. Upon every login, user data in TUWEL are updated from data in the central administration system. Additional interfaces connect the already existing e-learning tools (e.g. iChemLab, iRecord) to TUWEL and thus also to the administration system. Data concerning authentication, authorisation, enrolment, group management and meta-data on lectures are imported into TUWEL, and grades can be calculated from the results of learning processes and exported from TUWEL back into the administration system TUWIS. All these features are implemented in a Moodle block "TUWEL Toolbox" so that they can be accessed easily from within each TUWEL course [15, 16]. Conversely, TUWIS (which contains all the relevant information) has been enhanced with a menu entry to generate and announce a course in TUWEL based on available data from TUWIS; this feature also automatically generates links between course entries in TUWEL and in TUWIS [15, 17].
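To make this data flow concrete, the following sketch illustrates the kind of per-login synchronisation described above. It is a minimal illustration only: the record layout, field names and function names are assumptions made for this sketch and do not reflect the actual TUWIS or TUWEL (Moodle) interfaces.

```python
# Illustrative record as the administration system might deliver it at login
# time (all field names are assumptions for this sketch).
tuwis_record = {
    "matriculation_id": "0125678",
    "name": "Example Student",
    "role": "student",                      # or "teacher"
    "enrolled_courses": ["186.814", "188.399"],
}

def sync_user_on_login(lms_users, lms_enrolments, record):
    """Create or update the LMS user from the administration record and
    align course enrolments, so that no data has to be maintained twice."""
    uid = record["matriculation_id"]
    lms_users[uid] = {"name": record["name"], "role": record["role"]}
    for course in record["enrolled_courses"]:
        lms_enrolments.setdefault(course, set()).add(uid)
    return lms_users[uid]

users, enrolments = {}, {}
sync_user_on_login(users, enrolments, tuwis_record)
```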
2.1 Specific Features

The standard open-source software package Moodle has been enhanced with special features oriented towards educational requirements and based on users' needs, in order to support faculty-specific learning processes, e.g.:

• "checkmark assignments": in many courses, students have to solve exercises at home and have to declare (by entering checkmarks into a list of exercises) before the start of a face-to-face group meeting which exercises they were able to solve; during the meeting, randomly selected students have to demonstrate their solutions; the process of submitting the "checkmarks" and the grading of the assignment are represented in TUWEL's assignment type "checkmark assignment";
• advanced "working group" handling: for splitting large numbers of learners into groups, student working groups can be built in TUWEL in addition to the groups defined in TUWIS; teachers define the number of groups and the maximum number of participants per group; students can then form groups with their colleagues, and a learning product jointly developed for an assignment can be uploaded by one of the students; teachers are able to grade such an assignment for the whole group;
• "activity reports": this feature provides teachers with the possibility to anonymously contact groups of students depending on their activity (or non-activity) level with regard to specific assignments or downloads of documents;
• advanced display features, e.g. mathematical notation, rendering of LaTeX notation, molecular rendering of Jmol descriptions as 3D visualisations, syntactic highlighting of programming language code, and displaying mind-maps created with the open-source software freemind via a Flash plug-in [15, 18];
• advanced scheduling: a powerful scheduler enables teachers to announce a number of time slots to their students, e.g. for oral examinations, for discussing the results of their exercises, or for meetings during office hours; several teachers and tutors of a course may generate time slots, and students are able to see all time slots provided by all relevant teachers and to register for a specific time;
• iRecord: a media-based e-portfolio has been developed by the faculty of architecture; a specific TUWEL assignment feature allows the submission of documents from TUWEL assignments directly into the iRecord system [15].

2.2 Promoting the LMS for E-Learning, E-Teaching – and beyond

Low-threshold usage of the centrally maintained learning platform was encouraged, aiming first at increasing the quantity of e-learning provisions; later on, every course shall use the centrally supported learning management system and apply those of its functions (administration, content presentation, organisation of learning activities, communication, feedback, assessment) which are relevant for the specific course. We correctly expected that awareness of the quality of (online) learning and teaching would rise at a later stage, after teachers had made their first attempts; like other universities, we experienced that the application of an LMS and the availability of support and training have encouraged university teachers to deal with educational questions and to develop innovative solutions, thus influencing teaching in a positive way [11].
Fig. 1. TUWEL LMS use by semester (number of teachers includes staff and tutors)
TUWEL went online in March 2006 with fewer than 50 courses created by teachers of the computer science faculty. The big challenge was – and still is – to promote the benefits of deploying the platform for teaching purposes. As of October 2008, TUWEL provides approximately 350 different courses per year (see Fig. 1). The platform is in use at almost any time and by up to 5,870 different users per day (see Fig. 2). As at many other universities, the centrally supplied LMS is used by approximately 15% of the teachers. However, many learning activities applying new media have evolved in parallel (e.g. [19, 20]).

As expected, the platform was most intensively used by the faculties and departments with the largest number of students (Informatics: 5,277 students, 874 beginners in winter term 2008/09; Architecture: 3,968 students, 842 beginners in winter term 2008/09 [21]). The largest "regular" courses for students, with up to 800 participants, are – as expected – from these two faculties.

However, the LMS TUWEL is not only used for courses in regular curricula but also for courses for the university's staff members, e.g.:

• a welcome event "getTUgether" is offered twice per year and provides new staff members with information about the university's structure and organisation, services and service units;
• specific information and training is offered for security officers and fire wardens;
• the university's library offers introductory courses to students and researchers;
• an introduction to the LMS's features is also provided in TUWEL in several formats (e.g. content creation tutorials, TUWEL tutorials).
Fig. 2. Increase in the number of different TUWEL users per day
Interestingly enough, the university's management and the central university offices and service departments very soon discovered the capabilities and benefits of such a platform, which is easily available and accessible to all university employees and students. They thus use the TUWEL platform regularly for communication, co-operation and administrative purposes, e.g.:

• the E-Learning Centre itself uses TUWEL to administer the annual E-Learning Award: general information is provided as "online resources", and the upload of applications is handled by TUWEL's assignment feature, which allows the jury members to download and evaluate the applications;
• in 2006, the university had to decide whether to build a new campus on the outskirts of Vienna or to expand the capacity of its buildings located downtown; the rector's office used TUWEL's "online resources" to present the different concepts, invited staff members and students to discuss pros and cons in TUWEL's forums, and deployed Moodle's "feedback" module to survey university members' opinions and preferences;
• so far, there is no other centrally maintained tool for content management and for computer-supported co-operative work at the university; various commissions and planning committees thus use TUWEL's upload and download features (online resources and forum entries with attachments) to exchange and co-operatively work on documents;
• the medical service department organises appointments with staff members for health checks and specific programmes, e.g. vaccination campaigns, courses, and consulting services: staff members take the role of course participants, while the medical staff act as teachers; during the summer term 2008, more than 600 appointments were scheduled, deploying TUWEL's advanced scheduling;
• deploying Moodle's "feedback" module, the teachers' and researchers' association ran a survey to investigate job satisfaction, making use of TUWEL's possibility to guarantee that only staff members were able to fill out the form (and only once per person) and that the replies were completely confidential;
• physics teachers co-ordinate the contents and learning material of their courses: they use Moodle's "database" module to collect and retrieve data on the relationship between courses, teachers and learning material and to organise peer review and co-ordination of their teaching material.

We originally expected the departments with the largest number of students to be the ones offering the courses with the largest number of students – but this is not the case. Some non-curricular activities which are implemented in the TUWEL LMS as courses have well over 1,000 registered participants – they are thus larger than all the "regular" courses offered to students (see Fig. 3). The largest (by number of participants) currently offered course is the TUWEL Tutorial introducing teachers to the use of the LMS, with 1,362 participants, followed by the "Medical Services" course with 1,214 participants. The largest course in TUWEL so far has been the "Location Debate" in 2006 with 2,925 participants, who generated 13,500 "action log entries" for this specific course on the most intensive single day. Especially the "Location Debate" (which started shortly after TUWEL had been launched) involved many of the university's staff members, and it thus provided more than 1,000 teachers with a positive first contact with the LMS; within only a few days, the number of TUWEL users increased by approximately 1,500, providing the E-Learning Centre with a "jump start" for many of the university's established teachers.
Fig. 3. Largest TUWEL courses by number of participants

 #   Course                                             Curriculum               Participants
 1   Location Debate (Rector's Office)                  non-curricular           2,925
 2   TUWEL Tutorials (E-Learning Centre)                non-curricular           1,362
 3   Medical Service Department                         non-curricular           1,214
 4   Software Engineering and Project Management        Informatics                778
 5   Beginning Students' Information and Orientation    Architecture               772
 6   Algorithms and Data Structures                     Informatics                749
 7   Building Theory                                    Architecture               745
 8   Mathematics 1                                      Informatics                707
 9   Social Issues in Computing                         Informatics                669
10   Urban Development                                  Architecture               574
11   Fundamentals of Computer Science                   Informatics                554
12   Introduction to Computer Programming               Mechanical Engineering     415
13   Teachers' and Researchers' Association             non-curricular             406
In 2006 and 2007, the E-Learning Centre of Vienna University of Technology provided essential e-teaching training for two persons of (nearly) every department, who now act as multipliers and provide first-level support for their department's teachers, further increasing LMS user numbers. Since 2007, all new staff members come into contact with TUWEL through the welcome events "getTUgether".
3 Outlook

In January 2008, Vienna University of Technology started the Enterprise Application Integration project TISS (TU Wien Information Systems and Services, [22, 23]), which will replace the old TUWIS system step by step with a common technological architecture and will enable interactive application management, supporting the addition of new services. Existing systems will be integrated into a homogeneous, consolidated platform. TISS will provide TUWEL, which has grown into an important tool for learning and teaching, with interfaces for integration. Features currently offered in parallel by TUWIS, TUWEL, and other platforms will be harmonised further. TUWEL's features will thus become even easier to use, more effective and more efficient. This development will increase university teachers' motivation to use the LMS to improve their teaching provisions.
4 Conclusions

In order to utilise the full potential of a centrally managed and maintained learning management system for increasing efficiency for teachers and students and for achieving a more homogeneous presentation of learning provisions, the E-Learning Centre of Vienna University of Technology promoted the benefits of applying the platform for teaching purposes by integrating the LMS into the university's central administration system, by adapting the LMS to the specific needs and demands of its users, and by encouraging the application of the LMS for various purposes. The learning management system, which had originally been introduced to support teacher-student communication for learning purposes, has thus become a multi-purpose tool supporting all kinds of other co-operation and communication activities, due to the benefit of having a well-established interface. The E-Learning Centre actively encourages this originally unintended use of the learning platform to further promote its application for teaching purposes. Teachers who had not previously come in touch with efficient and effective use of online communication and co-operation experience the benefits at first hand. This gets them to start thinking about introducing ICT-supported processes into their course concepts, consequently creating new courses and increasing the number of courses available, but also dealing with questions of educational quality and increasing the quality of the courses.
References

1. Fröhlich, J., Krebs, H., Lohninger, H., Untersteiner, F., Gärtner, P.: iChemEdu - the e-Learning concept of the Faculty of Technical Chemistry at the Vienna University of Technology. ZIDline 12 (2005), http://www.zid.tuwien.ac.at/zidline/zl12/
2. http://www.ichemlab.at/ (March 6, 2009)
3. http://modulor.tuwien.ac.at/ (February 27, 2009)
4. Reichl, F., Vierlinger, U.E., Obermüller, E.: Active Learner Support for eLearning in Continuing Engineering Education: Theory and Practice. In: CIEC - Conference for Industry and Education Collaboration, Savannah (2005)
5. Ehlers, U.-D., Goertz, L., Hildebrandt, B., Pawlowski, J.M.: Quality in e-learning. Use and dissemination of quality approaches in European e-learning. A study by the European Quality Observatory. Office for Official Publications of the European Communities, Luxembourg (2005), http://www2.trainingvillage.gr/etv/publication/download/panorama/5162_en.pdf (February 27, 2009)
6. Hagner, P.R.: Interesting practices and best systems in faculty engagement and support. National Learning Infrastructure Initiative (NLII) White Paper (2001), http://www.educause.edu/ir/library/pdf/NLI0017.pdf (February 27, 2009)
7. http://www.delta (February 27, 2009)
8. Csanyi, et al. (AutorInnenkollektiv des Projekts Delta 3): Delta 3. Ein eStrategie-Projekt der Technischen Universität Wien, Universität für Bodenkultur Wien & Akademie der bildenden Künste Wien. In: Seiler Schiedt, E., Kälin, S., Sengstag, C. (eds.) E-Learning - Alltagstaugliche Innovation?, pp. 97–107. Waxmann, Münster (2006)
9. Fröhlich, J., Herbst, I.R., Reichl, F.: Delta 3 – ein interuniversitäres Projekt zur Entwicklung und Umsetzung von e-Learning-/e-Teaching-Strategien an den Partnerinstitutionen. ZIDline 13 (2009), http://www.zid.tuwien.ac.at/zidline/zl13/delta3.html (February 27, 2009)
10. Henkel, B., Herbst, I., Krameritsch, J.: Delta3. Ein Strategieprojekt der Technischen Universität Wien, Universität für Bodenkultur Wien & Akademie der bildenden Künste Wien. 11. Europäische Jahrestagung der Gesellschaft für Medien in der Wissenschaft, Zürich (2006)
11. Reichl, F., Csanyi, G.S., Herbst, I.R., Hruska, A., Obermüller, E., Fröhlich, J., Michalek, C.R., Spiegl, A.: Delta 3 - A Strategic E-Education Project Creating Added Value from Complementarity. In: Luca, J., Weippl, E. (eds.) Proceedings of ED-MEDIA 2008 - World Conference on Educational Multimedia, Hypermedia and Telecommunications, pp. 465–473. Association for the Advancement of Computing in Education (AACE), Chesapeake (2008), ISBN 1-880094-65-7
12. http://strategie.nml.at/ (February 27, 2009)
13. Kleinmann, B., Wannemacher, K.: E-Learning an deutschen Hochschulen. Von der Projektentwicklung zur nachhaltigen Implementierung. HIS Hochschul-Informations-System GmbH (2004)
14. http://www.moodle.org (February 27, 2009)
15. Hruska, A., Potocka, K., Reichl, F.: Mit zwei Klicks zum neuen TUWEL-Kurs. ZIDline 18 (2008), http://www.zid.tuwien.ac.at/zidline/zl18/tuwel_kurs/ (February 27, 2009)
16. Hruska, A., Potocka, K., Reichl, F.: E-Learning mit TUWEL - Tendenz stark steigend. ZIDline 16 (2007), http://www.zid.tuwien.ac.at/zidline/zl16/tuwel/ (February 27, 2009)
17. Hruska, A., Potocka, K., Reichl, F.: TUWEL - News WS2008 & Moodle Konferenz. ZIDline 19 (2009), http://www.zid.tuwien.ac.at/zidline/zl19/ (March 6, 2009)
18. http://freemind.sourceforge.net/wiki/index.php/Main_Page (February 27, 2009)
19. Csanyi, G.S., Jerlich, J., Pohl, M., Reichl, F.: Formal and Informal Technology Enhanced Learning for Initial and Continuing Engineering Education. In: IACEE 11th World Conference on Continuing Engineering Education (WCCEE), Atlanta (2008)
20. Csanyi, G.S.: Informal Learning in the Context of Formal Academic Education - Some Factors of Success. In: Szűcs, A., et al. (eds.) New Learning Cultures - How do we learn? Where do we learn? EDEN Annual Conference 2008, Lisbon. EDEN - European Distance and E-Learning Network, Budapest (2008)
21. http://www.tuwien.ac.at/wir_ueber_uns/zahlen_und_fakten/daten/ (February 27, 2009)
22. Kleinert, W., Grechenig, T., Költringer, T., Bernhart, M., Knarek, A., Schönbauer, F.: The Making of TISS: Juni 2008. ZIDline 18 (2008), http://www.zid.tuwien.ac.at/zidline/zl18/ (March 6, 2009)
23. Suppersberger, M., Bachl, S., Staud, P., Knarek, A., Kleinert, W.: TISS – Planen der Straßen und Roden im Dickicht. ZIDline 19 (2009), http://www.zid.tuwien.ac.at/zidline/zl19/ (March 6, 2009)
Framework for Supporting Decision Making in Learning Management System Selection

Yuki Terawaki

The University of Tokyo, Graduate School of Arts and Sciences
3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
[email protected]
Abstract. When introducing computer systems such as e-Learning systems, including the Learning Management System (LMS), requirements analysis is very important. In recent years, it has been noted that most of the people who introduce e-Learning systems in Japan are non-professionals in information technology. Generally, requirements analysis is well known to be very difficult for information technology professionals, and it is even more difficult for non-professionals. Therefore, this paper proposes a framework for supporting decision making in Learning Management System selection for the non-professional. Using the Analytic Hierarchy Process (AHP) and Quality Function Deployment (QFD), this framework recommends an LMS based on the priority of requirements.

Keywords: Requirements Engineering, RE, decision making, Analytic Hierarchy Process, AHP, Quality Function Deployment, QFD, Learning Management System, LMS.
1 Introduction

In Japan, information systems used for education support, such as e-Learning, have been integrated into higher education. Unfortunately, the number of people who are qualified to introduce e-Learning systems is low. Furthermore, those people are not always information technology professionals. This creates a unique situation which requires unique attention. When introducing e-Learning systems, requirements analysis is as important as it is for other computer systems. Requirements analysis must include identifying stakeholders, acquiring their needs, and modelling these in defined languages. There are a number of inherent difficulties in this process. Generally, requirements analysis is well known to be very difficult for information technology professionals, and even more difficult for non-professionals. Therefore, this paper presents a framework for supporting decision making in Learning Management System (LMS) selection for the non-professional. Using the Analytic Hierarchy Process (AHP) and Quality Function Deployment (QFD), this framework recommends an LMS based on the priority of requirements. The framework is developed as a web-based application, and its appropriateness is then discussed through experimentation.
This paper is organized as follows. Section 2 outlines AHP and QFD and reviews some related Requirements Engineering (RE) work in order to consider the appropriateness of these techniques. Section 3 describes the proposed framework. Section 4 presents the LMS selection support system developed as a web-based application named JULY. Section 5 describes the experimentation. Section 6 reports the experimental results. Section 7 discusses the availability of this work. Finally, Section 8 concludes this research.
2 Supporting Decision Making for Learning Management System

In our daily lives we have to make decisions very regularly and choose from a myriad of options. We choose the best candidate in consideration of various factors such as knowledge of the chosen object, standards, speculation, and preference. However, it cannot be assumed that all factors will be considered when a decision is made. Moreover, if the knowledge and the standards about the chosen object are clear, it is comparatively easy to choose; if not, bias will occur when comparing the options. A major decision-making technique to deal with such problems is the AHP.

Furthermore, this research aims to support non-professionals in information technology and people who lack expertise with LMS functions and the requirements for an LMS. To support these people, this research adds QFD to AHP. QFD helps to decrease language mismatches between customer needs and LMS features. By using QFD, a subjective requirement of non-professionals can be correlated with the features of LMS functions and the professionals' objective LMS information. As a result, the non-professional is able to understand the functional elements of the LMS and can utilize the AHP. The next section outlines AHP and QFD, and the appropriateness of the use of these techniques is discussed by referring to earlier studies in RE.

2.1 AHP: Analytic Hierarchy Process

AHP was developed by T.L. Saaty in the 1970's [1]. AHP is a structured technique for helping people deal with complex decisions. AHP is also a theory of measurement through paired comparisons and relies on the judgments of experts to derive priority scales. AHP first decomposes the user's decision problem into a hierarchy. The hierarchy is composed of three elements: the goal, the criteria and the alternatives, with the goal at the top, the alternatives at the bottom, and the criteria in the middle. Users of the AHP obtain rankings and weightings through paired comparisons of items for each criterion, which are converted to normalized rankings using the eigenvalue method. This means that the relative rankings of the alternatives are presented as ratio-scale values that total one.

2.2 QFD: Quality Function Deployment

In the 1970's QFD was developed by Yoji Akao, Shigeru Mizuno and others. QFD is recognized as a way to expand and implement the view of quality [3]. QFD has shown itself to be particularly beneficial for interdisciplinary communication, clear understanding of customer and/or user requirements, consent about the solutions found, complete documentation of all steps taken, a profitable product and satisfied
customers. It has since been widely applied in many industries worldwide, such as automobiles, electronics, food processing, and computer hardware and software. QFD gradually decomposes the work forming the quality by purpose and means. QFD uses matrices to organize and relate pieces of decomposed data to each other. These matrices are often combined to form a basic tool of QFD. In the RE and Software Engineering (SE) communities, QFD is called Software Quality Function Deployment (SQFD) [2][3]. SQFD focuses on improving the quality of both the software development process and the product. It has been applied to improve software quality in many large organizations. As a result, it is recognized as being able to enhance communication between customers and software developers and testers. It also improves customer satisfaction.

2.3 To Calculate Weight of User's Requirements

AHP has received some interest in the RE and SE communities. For instance, AHP is used as part of the development of systems based on COTS (Commercial-Off-The-Shelf) products [4][5]. Maiden et al. [5] not only confirmed the usefulness of AHP but also debated its issues when proposing a template-based method called PORE (Procurement-Oriented Requirements Engineering) to support off-the-shelf systems selection. According to Maiden et al., AHP should be used to weight customer requirements, but not to determine product compliance with these requirements. This research agrees on weighting customer requirements. Moreover, they pointed out the following problems: (i) the assumption for using AHP that all criteria are independent is not satisfied; (ii) as the number of judgments increases, the number of paired comparisons also increases. These problems are discussed not only in RE communities but also in communities that consider the calculation methods of AHP, such as Operations Research. As previously described, AHP normalizes the weights so that they sum to one. For this reason, when there are dependencies between criteria, or when criteria are newly added, the original weights will change. However, the assumption that all criteria are independent is not always satisfied, and generally it is difficult to remove such dependencies completely. Therefore, this research uses the technique proposed by Ichihashi [6], which normalizes the criteria weights so that the maximum value is one.

2.4 Decomposition of Users' Requirements

When users of AHP make paired comparisons, they use knowledge about the criteria. Therefore, the use of AHP alone cannot support non-professionals in information technology. Hence, it is important to provide requirements in non-technical language for the users of this framework (the target users). Moreover, paired comparisons become possible by determining these requirements in non-technical language. Indeed, the problem of vocabulary is a critical issue to be treated. According to Finkelstein et al. [4], there is a language mismatch between COTS feature descriptions and customer needs, and this mismatch increases the chances of the selection failing. Therefore, it is necessary to decompose the requirements contained in unclear user needs. Also, it is important to disclose the relationship between the decomposed user requirements and the LMS functions (in other words, the relationship between non-technical language and technical language).
In RE communities, Goal-Oriented Requirements Engineering (GORE) has been recognized as a leading technique for eliciting and decomposing unclear customer needs. To elicit the requirements for an expected system from stakeholders, GORE uses a hierarchical tree structure. Directly borrowed from problem reduction methods in Artificial Intelligence, AND/OR graphs may be used to capture goal refinements. For instance, KAOS (Knowledge Acquisition in autOmated Specification), developed by van Lamsweerde et al. [7], is the leading method of GORE. KAOS not only has the features of GORE but also allows formal verification by applying temporal logic to each description of a goal. GORE would have several advantages if this research used it. However, QFD and GORE are similar in some respects. To obtain concrete requirements, GORE gradually decomposes a customer's requirements from the high-level goal into sub-goals (low-level goals). QFD decomposes requirements expressed in an ordinary customer's voice and makes a hierarchical list of the customer's requirements. The hierarchical list of customer requirements is combined with the hierarchical list created by technical personnel through decomposing data such as function, operability, and flexibility. Therefore, through the creation of the matrices (QFD), a representation equivalent to the hierarchical tree of GORE is possible. To avoid the mismatch of language, and to make AHP usable, this research needs to provide the target user with requirements expressed in a user's (or customer's) voice. Moreover, a requirement in the target user's voice demands a translation into the features of LMS functions. Therefore, the proposed framework uses some ideas from QFD.
3 Concept and Calculation

This framework proposes a recommended LMS based on the priority of the target user's requirements, even if the target users are not familiar with the functions and requirements of an LMS. For calculating the priority of the target user's requirements, paired comparisons by the target user become possible because the requirements, expressed in a user's (or customer's) voice, are associated with the functional elements of the LMS. Fig. 1 shows a conceptual diagram of the framework.
Fig. 1. Conceptual Diagram
Strictly speaking, the typical framework user is either a non-professional of information technology or technical personnel (persons skilled in information technology and LMS). To provide requirement statements associated with functional elements to non-professionals of information technology (the target users), this framework requires the knowledge of technical personnel in the first stage. In the first stage, technical personnel need to create two data files for this framework: one is the "Requirements-Function Table" and the other is the "LMS-Function Table" (in this research I created the two data files). These data files are created in order to approach the requirements of the target user by using QFD. The "Requirements-Function Table" represents the correlations between user (or customer) requirements and LMS functions. The "LMS-Function Table" represents the correlations between LMS names and LMS functions. The "Requirements-Function Table" and the "LMS-Function Table" group the requirements into categories such as BBS, Quiz, and Material using the KJ method. Thus, the decision hierarchy is built.

In the next stage, the "Requirements-Function Table" and the "LMS-Function Table" are utilized to provide the recommended LMS. The framework presents one pair of requirement statements at a time from the "Requirements-Function Table" to the target user. The target user then makes a decision about them, for instance which one is more important; that is, the target user makes the paired comparisons himself. The priority of the target user's requirements is then calculated. Furthermore, a recommended LMS is calculated by associating the priority of the target user's requirements with the "LMS-Function Table". The calculation method is as follows.

The priority vector of the target user's requirements obtained by using AHP is defined as

$W_r = [w_1 \ w_2 \ \cdots \ w_m]^T$   (1)

The matrix $R$, the "Requirements-Function Table", has the number of requirements $m$ as rows, the number of functions $n$ as columns, and entries $r_{mn}$:

$R = [r_{mn}]$   (2)

The matrix $L$, the "LMS-Function Table", has the number of LMSs $m$ as rows, the number of functions $n$ as columns, and entries $l_{mn}$:

$L = [l_{mn}]$   (3)

The weights of the functions are calculated by

$W_f = R^T \cdot W_r$   (4)

The weights of the LMSs are calculated by

$W_l = L \cdot W_f$   (5)

In addition, the framework arranges the weights $W_l$ in descending order and then presents them to the target user as the recommended LMSs.
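The following sketch illustrates this calculation with small NumPy arrays standing in for the "Requirements-Function Table" R and the "LMS-Function Table" L. It is a minimal illustration only: the matrices, their sizes, and the function name are assumptions made for this sketch and are not taken from JULY's implementation; the requirement priorities are derived with the classical eigenvector method and sum-to-one normalisation, whereas the framework itself relies on Ichihashi's maximum-value normalisation [6].

```python
import numpy as np

def ahp_priorities(pairwise: np.ndarray) -> np.ndarray:
    """Derive priority weights from an AHP pairwise comparison matrix
    via the principal eigenvector (Saaty's eigenvalue method)."""
    eigvals, eigvecs = np.linalg.eig(pairwise)
    principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    weights = np.abs(principal)
    return weights / weights.sum()           # normalised so the weights sum to one

# Hypothetical pairwise comparisons over three requirement statements
# (Saaty's 1-9 scale; entry [i, j] = importance of requirement i over j).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
W_r = ahp_priorities(A)                       # eq. (1): requirement priorities

# Hypothetical Requirements-Function table R (3 requirements x 4 functions).
R = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)     # eq. (2)

# Hypothetical LMS-Function table L (2 candidate LMSs x 4 functions).
L = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1]], dtype=float)     # eq. (3)

W_f = R.T @ W_r                               # eq. (4): function weights
W_l = L @ W_f                                 # eq. (5): LMS weights
ranking = np.argsort(W_l)[::-1]               # descending order = recommendation
print(W_r, W_f, W_l, ranking)
```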
4 Implementation

This section presents the development environment and the system features of the web-based application JULY.

Development environment: The framework was developed as a web-based application called JULY. JULY is organized as follows. Server-side programs are written in the Ruby programming language. The two data files, the "Requirements-Function Table" and the "LMS-Function Table", are stored on the server side. Client-side programs use Asynchronous JavaScript + XML (Ajax). Firefox and Internet Explorer are assumed as the web browsers used by the target user on the client side.

System features (disclosure of requirement statements and recommended LMS): First, a target user chooses a category and then makes paired comparisons with the requirement statements contained within the chosen category. JULY presents only one pair of requirement statements on the screen at a time, drawn in random order from the "Requirements-Function Table". Depending on the context of the requirement statements, the target user may need to change the measurement scale of AHP when making a paired comparison: for instance, the measurement scale may need to change to another convenient verbal form, such as "likable". Thus, to support intuitive paired comparisons, JULY utilizes graphics to represent the measurement scale. Finally, JULY provides a result screen including the recommended LMS, the priority of the functions and the priority of the requirements.
5 Experimentation

Subjects for the experiment were university staff with the potential to introduce an LMS. There were a total of 20 subjects: 3 in their 20s, 13 in their 30s, 3 in their 40s, and one person in their 60s. 5 were bachelor's degree holders, 10 were master's degree holders, and 5 were doctorate degree holders. The ratio of men to women was 3:1 and the ratio of Arts to Sciences backgrounds was 2:3.

Methods: Based on a scenario, the experiment was conducted by selecting a pseudo LMS. The scenario is described briefly as follows: a teacher who takes charge of a 300-student class wants to use the LMS to distribute materials online and to conduct an online quiz. There are two kinds of scenario: in Scenario A the priority of the requirements about distributing materials is high, and in Scenario X the priority of the requirements about the quiz is high. Depending on the scenario given to the examinee, the framework will calculate a different recommended LMS. The pseudo LMSs were created based on actual products. There are 5 pseudo LMSs: one has more functions and is suitable for the scenario; one has fewer functions and is suitable for the scenario; one has more functions and is moderately suitable for the scenario; one has fewer functions and is moderately suitable for the scenario; and one is not suitable for the scenario. Each examinee executes the selection twice: once using JULY, and once without using JULY. The first experimental result is kept secret from the examinee.
If the examinee does not use JULY, the functions forming each LMS are given to them on an A3 card. This information is the same as JULY's data. In addition, when LMSs match the requirements of a scenario equally, this experiment assumes that the more multifunctional LMS should be ranked higher.

Collection method of data: There are two types of analysis data: one is the server-side log of JULY, and the other is the time taken by the examinee until obtaining a recommended LMS. The server-side logs collected the examinee's action record: they show which category was clicked, how much time each pair of requirement statements needed, and which measurement scale was clicked. The time taken by the examinee was measured with a stopwatch, both when the examinee selects an LMS using JULY and when selecting without using JULY.
6 Experimental Result

Tendency of selected LMS: Fig. 2 shows the number of examinees who obtained a ranking of LMSs matching the requirements of the scenario. When using JULY, 14 examinees obtained a ranking of LMSs which matched the requirements of the scenario. When the examinees selected an LMS without using JULY, there were 8 examinees whose selection matched the requirements of the scenario. Fig. 2 shows that examinees who failed to select an LMS suited to the requirements without JULY tended to select an LMS suited to the requirements when using JULY.
Fig. 2. The number of examinees who obtained the LMS meeting the requirements of the scenario
Required time to answer a pair of requirement statements: Fig. 3 shows the change in the time the examinees required to answer a pair of requirement statements. Fig. 3 shows that from the first to the tenth question, examinees took from 15 to 45 seconds per question. Thereafter, examinees took from 5 to 8 seconds per question. Initially, an examinee needs more time to understand JULY's operation. However, once the examinee understood JULY's operation, paired comparisons could be made within a relatively short time.

The time taken to select an LMS: Fig. 4 shows the time taken when the examinee selects an LMS using JULY and without using JULY, grouped for the calculation of the t-statistics.
Fig. 3. The change in the time the examinee required to answer a pair of requirement statements
Fig. 4. Result of T-statistics
The difference in the time between the two populations is significant: $t_0 = 6.114 > t(19, 0.02) = 2.861$. From this result, it is clear that less time is needed when using JULY than when an LMS is selected without using JULY.
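As a minimal illustration of this comparison, the sketch below runs a paired t-test on per-examinee selection times. The function name and the default significance level are assumptions made for this sketch (the exact convention used in the paper is not reproduced); the timing arrays are supplied by the caller rather than being the paper's raw measurements.

```python
import numpy as np
from scipy import stats

def compare_selection_times(times_without_july, times_with_july, alpha=0.01):
    """Paired t-test on selection times measured twice for the same examinees
    (n pairs, n - 1 degrees of freedom)."""
    a = np.asarray(times_without_july, dtype=float)
    b = np.asarray(times_with_july, dtype=float)
    t_stat, p_value = stats.ttest_rel(a, b)
    # Two-sided critical value; for df = 19 and alpha = 0.01 this is about 2.861.
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=len(a) - 1)
    significant = abs(t_stat) > t_crit
    return t_stat, t_crit, p_value, significant
```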
7 Discussion

This research shows the following results:
1. Examinees who failed to select an LMS suited to the requirements without JULY tended to select an LMS suited to the requirements when using JULY.
2. It is clear that less time is needed when using JULY than when an LMS is selected without using JULY.

When examinees do not use JULY, they often cannot select an LMS corresponding to the requirements of the scenario. However, when examinees use JULY, they tend to be able to select an LMS corresponding to the requirements of the scenario. In addition, the time necessary for selecting an LMS is reduced by using JULY. This means that comparing five pseudo LMSs all together is too difficult. When examinees do not use JULY, the requirements are complexly intertwined with related factors such as fondness, speculation and comprehension of the LMS. However, if
examinees use JULY, there are only two requirement statements to compare at a time. They were able to assess many requirement statements efficiently in a short time. In addition, the requirement statements are easy for examinees to understand because QFD is used. This means that the "Requirements-Function" tables were created appropriately.
8 Conclusion

There are a lot of existing LMSs, including commercial LMSs and open-source LMSs. Even if we restrict ourselves to accessible LMSs when introducing an LMS, it is still difficult to make a decision about the necessary functions by examining all the functions of each LMS. It is exceedingly difficult to compare a lot of functions all together and to make a coherent estimate if the knowledge and the standards are not clear. In addition, selecting an LMS takes a lot of time. To make up for the shortfall in human resources, this research proposed a framework consisting of adding QFD to AHP. This framework allows the recommendation of an LMS based on the priority of the user's requirements for the LMS. Utilizing either AHP or QFD alone is not enough to address the staffing shortage for introducing the LMS representing the e-Learning system, as described in Section 1. In this framework, the multi-criteria decision-making method AHP is made available to non-professionals in information technology themselves, by providing non-technical terms for them and by using the customer-oriented method QFD. The fusion of these methods was effective, and utilizing this framework will contribute to solving the staffing shortage for introducing e-Learning systems in Japan.
References

1. Saaty, T.L.: How to Make a Decision: The Analytic Hierarchy Process. European Journal of Operational Research 48 (1990)
2. Liu, X.F.: Software quality function deployment. IEEE Potentials 19(5), 14–16 (2001)
3. Herzwurm, G., et al.: QFD for Customer-Focused Requirements Engineering. In: Proceedings of the 11th IEEE International Conference on Requirements Engineering (2003)
4. Alves, C., Finkelstein, A.: Challenges in COTS Decision-Making: A Goal-Driven Requirements Engineering Perspective. In: Proceedings of the 14th International Conference on Software Engineering and Knowledge Engineering, pp. 789–794 (2002)
5. Maiden, N.A., Ncube, C.: Acquiring COTS Software Selection Requirements. IEEE Software 15(2), 46–56 (1998)
6. Ichihashi, H.: On a normalization of the grade of importance whose maximum value attains 1. In: 15th FUZZY System Symposium, Kobe, pp. 307–312 (1989) (in Japanese)
7. Darimont, R., van Lamsweerde, A.: Formal Refinement Patterns for Goal-Driven Requirements Elaboration. In: Proceedings of the 4th ACM Symposium on the Foundations of Software Engineering (FSE-4), pp. 179–190, San Francisco (1996)
Statistics-Based Cognitive Human-Robot Interfaces for Board Games – Let's Play!

Frank Wallhoff, Alexander Bannat, Jürgen Gast, Tobias Rehrl, Moritz Dausinger, and Gerhard Rigoll

Department of Electrical Engineering and Information Technology, Institute for Human-Machine Communication, Technische Universität München, 80290 Munich, Germany
Abstract. The archetype of many novel research activities is called cognition. Although different definitions of a technical cognitive system exist, it is typically characterized by the (mental) process of knowing, including aspects such as awareness, perception, reasoning, and judgment. This especially includes the question of how to deal with previously unknown events. In order to further improve today's human-machine interfaces, which often suffer from a lack of flexibility, we present a cognitive human-robot interface using speech and vision. The advances over regular rule-based approaches become apparent through its new interaction strategies, which are explained for the use case of a board game and a robot manipulator. The motivation behind the use of cognition for human-machine interfaces is to learn from and adapt to the user, leading to an increased level of comfort. For our approach, it proved effective to separate the entire process into three steps: the perception of external events, the cognition including understanding, and the execution of an appropriate action.1
1 Introduction
The excellence research cluster Cognition for Technical Systems (CoTeSys) is working on establishing more intelligent and useful behavior of technical systems, in particular for unpredictable events in their surroundings. Therefore, cognition [1,2] could be the way to surpass the current constraints (rule-based and static behavior) incorporated in nearly all kinds of today's operating technical systems. Such a cognitive system will monitor its environment and respond to changes. Furthermore, its skills will improve over time using generic learning algorithms. In general, there are many possibilities where technical systems equipped with cognition can cover new areas of application. The cluster of excellence focuses on ambient living [3] and advanced robotics [4], especially in an industrial context [5].
1 All authors contributed equally to the work presented in this paper.
However, the two scenarios mentioned before (ambient living and advanced robotics) entail many restrictions and constraints due to the technical state of the art as well as the field of application; therefore, we decided to work on a further field of application focusing on pure cognitive processing (playing games).

The rest of this paper is organized as follows: Section 2 gives a brief description of the hardware set-up in action. Section 3 introduces the underlying system architecture for the presented cognitive game application. Afterwards, Section 4 takes a closer look at the actual interaction between the human player and the cognitive playing counterpart with its actuator in the form of an industrial robot manipulator arm. The paper closes with a summary and an outlook on the next planned steps.
2 Hardware Set-Up
The constructed hardware set-up consists of two major components. First, the actuator used to interact in the board game is an industrial robot arm. Second, for the presentation and scene observation, a framework is mounted above the game interaction plane; this plane is equipped with the industrial robot arm. The computations required for the cognitive game-playing interactions and the manipulation of the pawns are executed in a distributed manner, involving two processing units (standard PCs with an AMD Phenom 2.2 GHz quad-core and four gigabytes of RAM). One of these units provides remote access to the robot controller as well as the displaying module. The remaining unit is in charge of the cognitive game processing as well as scene surveillance.
2.1 Cognitive Interaction Unit
The industrial robot manipulator arm currently involved in the cognitive game-playing process is a Mitsubishi RV-3SB robot. This robotic interaction unit has one arm with six degrees of freedom, and it can lift objects with a maximum weight of up to three kilograms. The effective radius for performing gripping interactions is limited to a total of 0.642 m around the robot's trunk. Its tool point is equipped with a force-torque sensor and a tool-change unit. The communication between the robotic arm manipulator and the game control unit is implemented via a network interface provided by a middleware similar to CORBA. This approach facilitates the interaction between different operating systems as well as different programming languages. Additionally, the outsourcing of the robotic control unit into its own module allows for easy substitution and integration of other robotic manipulators in the future. A client-server model has been chosen as the communication paradigm. The implemented server provides the client with three-dimensional access to all reachable points of the robot arm. However, for the game operation we constrained the allowed movements to two parallel x-y-planes, see Figure 1. One plane is used for the translation of the pawns from one game field point to another. The pick-up and place operation is performed in the second movement layer, closer to the actual game plane.
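As a minimal sketch of how this two-plane strategy could look on the client side, the snippet below decomposes a pawn move into approach, grip, translation and release. The RobotClient stub, its method names and the plane heights are assumptions made for illustration and are not the actual remote interface of the robot server.

```python
class RobotClient:
    """Stand-in for the remote robot server interface (assumed methods)."""
    def move_to(self, x, y, z):
        print(f"move_to({x:.3f}, {y:.3f}, {z:.3f})")
    def close_gripper(self):
        print("close_gripper()")
    def open_gripper(self):
        print("open_gripper()")

TRANSLATION_Z = 0.15   # assumed height of the translation plane (m)
PICK_PLACE_Z = 0.02    # assumed height of the pick-up and place plane (m)

def move_pawn(robot, src_xy, dst_xy):
    """Pick a pawn at src_xy and place it at dst_xy using the two planes."""
    (sx, sy), (dx, dy) = src_xy, dst_xy
    robot.move_to(sx, sy, TRANSLATION_Z)   # approach above the source field
    robot.move_to(sx, sy, PICK_PLACE_Z)    # descend into the pick-up plane
    robot.close_gripper()
    robot.move_to(sx, sy, TRANSLATION_Z)   # lift back into the translation plane
    robot.move_to(dx, dy, TRANSLATION_Z)   # translate above the target field
    robot.move_to(dx, dy, PICK_PLACE_Z)    # descend to the place plane
    robot.open_gripper()
    robot.move_to(dx, dy, TRANSLATION_Z)

move_pawn(RobotClient(), (0.30, 0.10), (0.35, 0.12))
```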
Fig. 1. Overview of the working x-y-planes (labelled elements: table, projection unit, camera, cognitive interaction unit, translation plane, pick-up and place plane)
2.2 Projection and Observation Units
A projection unit is attached to the framework mounted above the interaction plane. Via this projection unit, the board game field and the required interaction buttons are displayed onto the workbench at a total resolution of 1024 × 768 pixels. The employed vision sensor consists of a regular webcam with a resolution of 640 × 480 pixels running at 15 frames per second. Furthermore, a microphone with sufficient sound quality for speech recognition is also embedded in this device, leading to a handy set-up. These sensors allow for the recognition of speech commands, the recognition of movements and actions on the desk, as well as the detection of gaming pieces and the number of eyes on the dice.
3 System Architecture
The connection and interaction of the above-described sensors and actuator is formed by a well-structured system architecture. This architecture has a modular concept, based on the approach presented in [6,7,8]. The real-time database presented therein is capable of handling huge amounts of data from different input modules, each having a different update rate. The desired module interaction is achieved together with the Internet Communications Engine (ICE) middleware. We will have a closer look at these participating modules in the following.
3.1 Dialog Manager
The management and adjustment of the information exchange between the user and the system are realized by the dialog manager. In this case we use a commercial dialog management system for speech recognition and speech synthesis. According to the current step within the game and perceived events,
the system can utter information for the user or comment on these events. The speech recognition is primarily required at the start of the game for initializing the interaction with the player. Here, simple decisions have to be made, e.g. which color the pawns of the player should have. In these first interaction steps, the required speech recognition grammar can be adapted at run-time to enhance the recognition rate. Afterwards, the state machine transitions into the playing mode, where the game logic and the related robotic actions are performed. To enable a cognitive playing interaction, the "Cognitive Playing Strategy Module" then takes over control to create reasonable playing moves.
3.2 Cognitive Playing Strategy Module
This module is responsible for determining the next playing moves. At first, the rules of the board game "Mensch ärgere dich nicht" have to be incorporated into the knowledge database. This database is composed of a PROLOG-based rules collection, which can be accessed via a C++ wrapper. The database is managed and kept up to date with the current positions of the pawns in order to infer the next possible allowed moves. After extracting all allowed moves, a Bayesian Network is used to determine the best choice. Since the training of the Bayesian Network is done entirely with data retrieved from previous games, the inference will improve its performance continuously. With the result obtained, the related gripping and movement actions are carried out by the robot.
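For illustration, the sketch below shows a strongly simplified stand-in for this selection step: candidate moves (assumed to be supplied by the rule base) are scored by win frequencies learned from earlier games instead of a full Bayesian Network. All data structures and function names are assumptions made for this sketch and do not reflect the actual implementation.

```python
from collections import defaultdict

# feature -> [wins, games]; the Laplace-style prior avoids division by zero.
win_counts = defaultdict(lambda: [1, 2])

def move_feature(move, board):
    """Abstract a concrete move (pawn, source, target) into a coarse feature,
    e.g. whether it captures an opponent pawn or merely advances."""
    pawn, src, dst = move
    return "capture" if board.get(dst) == "opponent" else "advance"

def choose_move(allowed_moves, board):
    """Pick the allowed move whose feature has the highest estimated
    winning probability learned from previous games."""
    if not allowed_moves:
        return None
    def score(move):
        wins, games = win_counts[move_feature(move, board)]
        return wins / games
    return max(allowed_moves, key=score)

def update_statistics(used_features, won):
    """After a finished game, update the counts for all features used."""
    for f in used_features:
        wins, games = win_counts[f]
        win_counts[f] = [wins + (1 if won else 0), games + 1]
```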
3.3 Displaying Module
The content shown by the projection unit is controlled by the displaying module. This module is implemented as a server and offers remotely callable procedures to display images at given coordinates and sizes onto the workbench. Highlighting of objects and of special regions for interaction, e.g. for rolling the dice, can also be done with the incorporated functionalities. Finally, this module can also project Soft-Buttons from above for interaction purposes. The content of these buttons can be freely chosen and their location can be arbitrarily selected. The spatial information about these buttons is written back into the real-time database for the image-processing-based detection of button activation.
3.4 Dice Recognition Unit
To achieve a more natural feeling and mood when playing with the presented system, a dice recognition unit is also implemented. The dice has to be tossed in a dedicated area on the game surface, highlighted by the table projection unit. In the following, the required steps are described more precisely.
1. First of all, the user has to toss the dice.
2. In the first processing step, the top plane of the dice is recognized. This is performed by applying a thresholding filter operation to the observed image. We are currently using a dark dice with light eyes on a white surface. Therefore, the dark area of the dice can easily be extracted from the filter
image. After applying morphological operations to the filter image, only those areas that contain the dice remain.
3. In the next processing step, the previously extracted dice area is further analyzed. The inverse of the filter operation described above is applied to the dice region. Additionally, noisy elements in the filter image are removed. At this step, only those surfaces related to the number that has been tossed remain visible.
4. At this stage, the filter image can be analyzed further to detect the actual number of eyes. Due to the chosen setting, only the eyes of the dice are visible in the filter image. By applying the connected components algorithm [9], these areas are labeled. By counting the labeled surfaces, one obtains the actual tossed value of the dice.
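A minimal OpenCV sketch of steps 2–4 is given below; the threshold values, kernel sizes and function name are placeholder assumptions for this sketch, not the parameters of the actual system.

```python
import cv2
import numpy as np

def count_dice_eyes(frame_bgr: np.ndarray) -> int:
    """Count the pips of a dark die with light eyes on a light surface,
    following the threshold / morphology / connected-components steps."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # Step 2: extract the dark die body by inverse thresholding and clean it up.
    _, die_mask = cv2.threshold(gray, 80, 255, cv2.THRESH_BINARY_INV)
    kernel = np.ones((5, 5), np.uint8)
    die_mask = cv2.morphologyEx(die_mask, cv2.MORPH_OPEN, kernel)
    die_mask = cv2.morphologyEx(die_mask, cv2.MORPH_CLOSE, kernel)

    # Step 3: inside the die region, the inverse operation keeps only the light eyes.
    _, eyes = cv2.threshold(gray, 160, 255, cv2.THRESH_BINARY)
    eyes = cv2.bitwise_and(eyes, eyes, mask=die_mask)
    eyes = cv2.morphologyEx(eyes, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

    # Step 4: label connected components; each remaining blob is one eye.
    num_labels, _ = cv2.connectedComponents(eyes)
    return num_labels - 1          # subtract the background label
```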
3.5 Robot Server
This module runs on the second computing unit and offers an interface to control the robot. However, it is not only a simple server performing all received orders; it also provides a certain degree of safe handling, because it blocks harmful commands that would result in damage to the hardware set-up. In addition, the system can be interrupted at any time by the user and will continue with the last requested action on demand.
3.6 Controlling Unit
Since all participating modules are built as stand-alone units, a central controlling unit is implemented to keep track of all relevant game data. The main task of this unit is to synchronize the data streams provided by the real-time database with the delays caused by the user and the movement of the robot. Besides the current game status, adjustments to the visual output such as scaling or rotation are stored in this module as well. Thereby, the robot server can perform the actual movement based on the transformation matrix that is used for the visual output.
4 Let's Play
In the first of the two subsequent sections, we present a short overview of the game constituting the foundation for the cognitive interaction scenario, in which a cognitive computer system takes part in a contest of "Mensch ärgere dich nicht" against a human player, see Figure 2. In Section 4.2, the adaptations required for the realization of the presented human-machine contest are delineated in greater detail.
4.1 Rules
"Mensch ärgere dich nicht" is a classical German board game. It exhibits similarities to the Indian game Pachisi, the American game Parcheesi, and the English game Ludo.
Fig. 2. "Mensch ärgere dich nicht" with a robot
In the classical style, a wooden board is used for the game interaction. On this board all pawns are placed. For each player there are four pawns, which are color-coded (e.g. red, green, blue, yellow) to avoid any possibility of confusion during play. The game can be played by 2, 3 or 4 players – one player per board side. At each board side a so-called home area is situated. This home area is the starting place, where the four pawns of each player are placed at the beginning of the game. The objective of the game is the following: every player tries to bring the four pawns into the so-called "home row". The starting point for each player is the home area. The player has to move his pawns clockwise around the game board from the home area into the home row. One player starts by tossing a dice. If he tosses a six, he puts one of his four pawns on the "start field". The player then has to toss again, and the number of eyes on the dice determines the number of fields he can advance with his pawn. If a pawn of an opponent is on the destination field, this opponent pawn has to be moved back to its home area. If he does not toss a six, he can toss the dice twice more to obtain a six. If no six has been tossed, the turn moves over to the next player.
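As a small illustration of these movement rules, the sketch below checks a single pawn move on a simplified circular track. The board representation and function names are assumptions made for this sketch, the home row is omitted for brevity, and this is not the PROLOG rule base used in the actual system.

```python
def allowed_target(pawn_pos, dice, own_positions, opponent_positions,
                   track_length=40):
    """Return the target field of one pawn for a dice value, or None if the
    move is not allowed. pawn_pos is None while the pawn is in its home area,
    otherwise its index on the common track (0 .. track_length - 1)."""
    if pawn_pos is None:
        # A pawn may only leave the home area onto the start field with a six.
        return 0 if dice == 6 and 0 not in own_positions else None
    target = (pawn_pos + dice) % track_length
    if target in own_positions:
        return None                      # own pawns may not share a field
    return target                        # an opponent pawn on the target is captured

def apply_move(pawn_pos, dice, own_positions, opponent_positions):
    """Apply a legal move; a captured opponent pawn is sent back home."""
    target = allowed_target(pawn_pos, dice, own_positions, opponent_positions)
    if target is None:
        return None
    if target in opponent_positions:
        opponent_positions.remove(target)    # back to the opponent's home area
    if pawn_pos is not None:
        own_positions.remove(pawn_pos)
    own_positions.add(target)
    return target
```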
4.2
The Robot Way of Playing
Due to today's state of the art in robotic gripping and observation techniques, some constraints and restrictions are imposed on the current interaction setup. First, the robot is unable to toss a dice; therefore, a random number generator delivers the robot's numbers for its moves in the board game. Second, because of the small size of the pawns and the current vision-based surveillance, all moves in the board game (the robot's and the player's) are carried out by the robot. Therefore, a touch screen display is used as input for the human's moves: the player selects a field of the board game displayed on the touch screen and thus initiates the corresponding movement of the pawn, which is carried out by the robot. In addition, the board game is projected via the table projection unit.
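A small sketch of this interaction loop; the touch-screen, controller, and robot interfaces below are hypothetical placeholders, not the actual module APIs:

```python
import random

def robot_dice():
    """The robot cannot toss a physical dice, so its value is generated."""
    return random.randint(1, 6)

def human_move(touch_screen, controller, robot):
    """The human selects source and destination fields; the robot moves the pawn."""
    source = touch_screen.wait_for_field_selection()
    target = touch_screen.wait_for_field_selection()
    if controller.is_legal_move(source, target):
        robot.move_pawn(source, target)   # the physical move is always done by the robot
    else:
        touch_screen.show_message("illegal move, please select again")
```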
5
Conclusion and Future Work
In this paper we presented a new form of cognitive interaction scenario. The focus was placed on the cognition required for a human-machine contest in playing the German game “Mensch ärgere dich nicht”. The implemented playing strategy exhibits great potential for becoming a worthy counterpart for a human. However, the current set-up still has many constraints that we would like to overcome in the future. First, the human should be allowed to move the pawns on his own, enabling him to cheat as well as to disadvantage the system by misplacing the robot's pawns. For the realization of this vision, the vision-based pawn detection and recognition has to be improved considerably. On the other side, the cognitive system should also attempt to cheat from time to time, to improve its winning chances and make the game more natural.
Acknowledgment This ongoing work is supported by the DFG excellence initiative research cluster Cognition for Technical Systems – CoTeSys, see www.cotesys.org for further information and details. The authors further acknowledge the great support of Matthias Göbl for his explanations and granting access to the RTDB repository. Furthermore, we want to thank all our partners within our CoTeSys-Projects for the fruitful discussions and implementation work to make our visions and ideas become reality.
References 1. Vernon, D., Metta, G., Sandini, G.: A Survey of Artificial Cognitive Systems: Implications for the Autonomous Development of Mental Capabilities in Computational Agents. IEEE Transactions on Evolutionary Computation 11(2), 151–180 (2007), http://dx.doi.org/10.1109/TEVC.2006.890274
2. Anderson, D.M.L.: Embodied cognition: A field guide. Artificial Intelligence 149(1), 91–130 (2003), http://cogprints.org/3949/
3. Beetz, M., Stulp, F., Radig, B., Bandouch, J., Blodow, N., Dolha, M., Fedrizzi, A., Jain, D., Klank, U., Kresse, I., Maldonado, A., Marton, Z., Mösenlechner, L., Ruiz, F., Rusu, R.B., Tenorth, M.: The assistive kitchen — a demonstration scenario for cognitive technical systems. In: IEEE 17th International Symposium on Robot and Human Interactive Communication (RO-MAN), Muenchen, Germany (2008) (invited paper)
4. Lenz, C., Suraj, N., Rickert, M., Knoll, A., Rösel, W., Bannat, A., Gast, J., Wallhoff, F.: Joint actions for humans and industrial robots: A hybrid assembly concept. In: Proc. 17th IEEE International Symposium on Robot and Human Interactive Communication (August 2008)
5. Zäh, M.F., Lau, C., Wiesbeck, M., Ostgathe, M., Vogl, W.: Towards the Cognitive Factory. In: Proceedings of the 2nd International Conference on Changeable, Agile, Reconfigurable and Virtual Production (CARV), Toronto, Canada (July 2007)
6. Goebl, M., Färber, G.: A real-time-capable hard- and software architecture for joint image and knowledge processing in cognitive automobiles. In: Intelligent Vehicles Symposium, pp. 737–740 (June 2007)
7. Stiller, C., Färber, G., Kammel, S.: Cooperative cognitive automobiles. In: 2007 IEEE Intelligent Vehicles Symposium, June 2007, pp. 215–220 (2007)
8. Thuy, M., Göbl, M., Rattei, F., Althoff, M., Obermeier, F., Hawe, S., Nagel, R., Kraus, S., Wang, C., Hecker, F., Russ, M., Schweitzer, M., Leon, F.P., Diepold, K., Eberspächer, J., Heißing, B., Wünsche, H.-J.: Kognitive Automobile – neue Konzepte und Ideen des Sonderforschungsbereiches/TR-28. In: Aktive Sicherheit durch Fahrerassistenz, Garching bei München, April 7-8 (2008)
9. Jankowski, M., Kuska, J.-P.: Connected components labeling – algorithms in Mathematica, Java, C++ and C#. In: Mitic, P., Jacob, C., Crane, J. (eds.) New Ideas in Symbolic Computation: Proceedings of the 6th Mathematica Symposium, Positive Corporation Ltd., Hampshire, UK (2004)
The Design and Development of an Adaptive Web-Based Learning System Chian Wang Department of Information Management National Changhua University of Education Changhua, Taiwan [email protected]
Abstract. Currently, most web-based learning systems do not differentiate the content materials presented to the various types of learners. Content adaptation enables dynamic presentation generation based on the learner's preferences, e.g., knowledge level, gender, age, language, or past visits. The goal of content adaptation is to take the heterogeneous and changing needs of the learners into account and thus to provide the most appropriate contents and the best learning satisfaction. To handle content adaptation and dynamic presentation generation, an XML-based content description mechanism called ADAM is proposed in this paper. ADAM's goal is to enhance learning effectiveness by providing the most appropriate materials under changing learners' requirements and preferences while simplifying the process of presentation composition. Keywords: content description, multimedia, web-based learning, XML.
1 Introduction
The rapid development of the Internet not only creates new types of information, but also changes how people access information. At the same time, many kinds of applications are inspired by the openness and the robustness of the Internet. Among them, web-based learning (WBL) systems have been one of the hottest research topics. The explosive growth of the Internet and the increasing amount of multimedia contents have made WBL systems an important information source for many people. The major benefit of these systems is that they allow learners from anywhere to learn with rich multimedia materials at any time. This kind of growth also comes with increasing diversity and heterogeneity in terms of the learners' capabilities, backgrounds, and preferences [1, 3]. However, the contents of most WBL systems are static and are designed with a single type of learner in mind. That is, all the learners are provided with the same web pages and hyperlinks. In many cases, these systems are not suitable for the diverse types of learners coming from all over the world. Thus, they cannot truly satisfy an individual learner's needs. Given the huge number of learners on the Internet, there is considerable interest in systems that are able to satisfy the diverse needs of different types of learners.
Though more and more learning applications provide rich multimedia information, they do not differentiate the content materials presented to the various types of learners. For example, learners with a lower knowledge level or foreign learners may experience frustration due to improper content presentations or an inability to understand certain materials. Multimedia-rich contents can certainly enhance learning results but may not be satisfactory for learners with specific requirements, such as dubbed audio and subtitles. As a result, most WBL systems are targeted at certain types of learners, and others usually experience frustration or feel lost when accessing these systems. The lack of content adaptation to accommodate this kind of variety or heterogeneity raises challenging research topics for enabling more effective learning over the Internet. Content adaptation enables dynamic presentation generation based on the learner's preferences, e.g., knowledge level, gender, age, language, or past visits. The actual contents generated for a given learner are thus a combination of his own preferences and the adaptive criteria of the corresponding media objects. The goal of content adaptation is to provide the most suitable and personalized materials to each learner [2, 4]. To handle content adaptation and dynamic presentation generation, a content description mechanism called ADAM is proposed in this paper. Adaptive content generation is a mechanism that can dynamically compose the corresponding media objects into a presentation according to the learner's preferences to enhance browsing results. The goal of content adaptation is to take the heterogeneous and changing needs of the learners into account and thus to provide the most appropriate contents and the best learning satisfaction. Content adaptation also has beneficial business implications beyond just providing a better browsing result. One of the main benefits is to increase the web site visit time of the users. This also means that the users are more likely to stay at the site, thus resulting in a greater profit for e-commerce sites. Currently, HTML (HyperText Markup Language) is the presentation platform for most web-based applications. HTML provides a simple and efficient way for content description. However, HTML lacks some capabilities when used in certain applications. First, HTML cannot handle adaptive presentations, because from a single HTML file the browser can generate only one presentation. If the user wants to make minor changes to the presentation, e.g., changing the subtitles from English to Chinese, modifying the HTML file is required. This makes the development of adaptive learning systems very inefficient, because the instructor must compose several versions of HTML files for different types of learners. Second, HTML cannot describe the temporal relationships of the media objects in a presentation, e.g., playing an audio file when a specific video segment is over. With these shortcomings, HTML is not suitable for generating dynamic presentations. For an adaptive learning system, a more robust and dynamic content description mechanism is required. That is, the system should be able to dynamically generate presentation contents according to the learners' requirements and preferences. Also, it is preferred that the instructor be relieved of the burden of composing presentations as much as possible.
With ADAM, the instructor only needs to compose a base version of the content description file and specify the adaptive criteria in the file. After that, ADAM generates the adaptive presentations by including only the media objects with the conditions evaluated to be true. ADAM’s goal is to enhance learning effectiveness through providing the most
appropriate materials under changing learners' requirements and preferences while simplifying the process of presentation composition. With the popularity of WBL systems and the increasing diversity of learner backgrounds, there is a need to adapt the learning presentations. Typical examples include the materials for learners with different knowledge levels and language preferences. For an instructor, it takes a lot of effort and time to compose suitable presentations for diverse types of learners to achieve adaptive learning. He/she must consider each individual's knowledge background, preferences, interests, and other criteria when composing the presentations. ADAM's main feature is that it requires only one XML-based description file to specify the adaptive structures and to generate the corresponding presentations. That is, ADAM is a general framework for dynamic presentation generation and can handle the adaptation needs of web-based learning. XML is adopted because it (i) is a W3C (World Wide Web Consortium) standard, (ii) allows the specification of a document independent of its final presentation, and (iii) is platform independent. Though adaptive presentations can certainly enhance learning effectiveness, they also increase the complexity of presentation composition. ADAM helps to relieve this burden because the instructor only needs to edit a single content description file and ADAM will generate the adaptive presentations accordingly. That is, ADAM helps instructors compose courseware that has different renderings for different types of learners. We propose ADAM to specify a presentation scenario through three dimensions: spatial, temporal, and adaptive. In this paper, we also describe the development of a prototype WBL system to demonstrate the feasibility of ADAM. When a learner requests a presentation, ADAM handles the request by parsing the corresponding content description file and the learner's preferences and then makes adaptation decisions to generate the final presentation.
2 Design of ADAM
As suggested by researchers, adaptive multimedia presentations are among the important factors in keeping students engaged in learning. This section describes the detailed design of ADAM. We start with how the adaptive presentations are generated.
2.1 Generation of Adaptive Presentations
To support adaptivity, ADAM aims to adapt the materials presented to a learner according to his/her preferences, knowledge level, and the adaptation criteria specified by the instructor. Conditional media objects are used to accomplish the generation of adaptive presentations. That is, each constituent media object of a presentation is associated with a condition indicating which types of learners should be presented with it. In this way, several variants of the presentations associated with a specific learning subject are prepared. Each variant presents the materials in a different style to satisfy the learner's needs. Fig. 1 depicts the flow of generating adaptive presentations.
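A rough sketch of this conditional-media-object mechanism; the profile fields and the condition format below are illustrative, not ADAM's actual syntax:

```python
def matches(condition, learner):
    """Return True if a media object's condition holds for this learner."""
    return all(learner.get(key) == value for key, value in condition.items())

def generate_presentation(media_objects, learner):
    """Keep only the media objects whose conditions evaluate to true."""
    return [obj for obj in media_objects if matches(obj["condition"], learner)]

learner = {"language": "Chinese", "knowledge_level": "beginner"}
media_objects = [
    {"id": "video_lesson", "condition": {}},                      # shown to everyone
    {"id": "subtitles_zh", "condition": {"language": "Chinese"}},
    {"id": "subtitles_en", "condition": {"language": "English"}},
    {"id": "basic_notes",  "condition": {"knowledge_level": "beginner"}},
]
print([obj["id"] for obj in generate_presentation(media_objects, learner)])
# -> ['video_lesson', 'subtitles_zh', 'basic_notes']
```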
Fig. 1. Generation of adaptive presentations
In Fig. 1, the learner management module is responsible for collecting a learner's basic information such as gender, age, language, education, etc. This information can be obtained when the learner registers. Also, when the user logs in for the first time, he/she is given some tests so that the system can determine his/her knowledge level about the subject that he/she is going to learn. In addition, the learner has some other options to accommodate his/her needs, e.g., the language of subtitles and audio, and the video resolution. For the instructor, the system should provide a friendly user interface and some management functions to assist the composition of learning presentations and the specification of adaptive criteria. With the criteria, ADAM can dynamically change and adapt the presentation contents for various types of learners. Customization of the contents gives the learners more opportunities for different learning styles. After that, an XML-based content description file is generated. When generating a presentation, ADAM takes both the learner's preferences and the adaptive criteria specified in the description file into account. Only the media objects whose corresponding adaptation rules evaluate to true are included in the generated presentations. For example, if the learner prefers Chinese, then the Chinese version of subtitles and audio will be presented. As revealed in Fig. 1, the instructor only needs to compose the content description file once. Also, a single content description file can be used to generate various types of presentations. Each type of presentation is specifically tailored to accommodate a variety of individual differences and requirements.
2.2 Overall System Architecture
As described previously, the literature has revealed the importance of adaptation to students' learning performance. As a case study, an English learning system based on ADAM was developed to support the course "English Conversations" that is offered to the freshmen in the Department of Information Management at National Changhua University of Education, Taiwan.
Fig. 2. System architecture: the authoring process produces a content description script file, which the adaptive presentation engine (ADAM) combines with the learner model to generate presentations #1–#n, delivered through web browsers to learners #1–#n.
The main characteristic of the system is
that the presented contents are adapted to the learner's preferences and knowledge level. Fig. 2 depicts the system architecture. As depicted in Fig. 2, the system is developed in the form of three modules: the content description script file, the learner model, and the adaptive presentation engine, i.e., ADAM. The content description script file is XML-based and is composed by the instructor to specify how the presentations are generated. As to ADAM, its job is to adapt different aspects during the learning process, e.g., adapting the content according to the learner's prior knowledge, generating the presentations through the selection and combination of appropriate media objects, and modifying the corresponding hyperlinks, etc. The learner model is a simple data structure that reflects the characteristics of different learners. Currently, the learner model contains two categories of information. (i) The personal profile includes static data such as account name, password, real name, student ID, gender, birth date, e-mail address, etc. (ii) The knowledge profile identifies the learner's knowledge level about a specific subject.
2.3 The Content Description Language
A multimedia presentation is composed of a set of media objects, e.g., video, audio, text, images, etc. With most application systems, it is preferred that the media objects can be reused to produce different presentations. Thus, a presentation's specification should be separated from its actual content. In this subsection, we introduce an XML-based content description language, called Adaptive Multimedia Markup Language (AMML). AMML's purpose is to facilitate the specification requirements of ADAM. Because AMML is based on XML, it also allows the specification of document structures independent of their final presentations, which is a basic requirement of
WBL systems.
Fig. 3. The tree structure of AMML's tags: the root element contains a head block (layout and root-layout with region elements carrying attributes such as id, width, height, top, and left), a body block (par and seq time containers with start, end, and dur attributes, holding streaming media objects such as audio and video and non-streaming objects such as image and text, with src, size, rate, clipstart, and clipend attributes), and a class block (interaction, content, quality, and language switches with attributes such as play, stop, pause, resume, jump, degree, size, bitrate, sys_language, and usr_language).
In order to fully benefit from XML's flexibility and expandability, AMML's syntax is formally described as an XML Document Type Definition (DTD), and therefore AMML can take full advantage of all the existing XML tools. With AMML, the description of an adaptive multimedia presentation is organized around three dimensions: spatial, temporal, and adaptive. The tree structure of AMML tags is depicted in Fig. 3. In the remainder of this section, we describe AMML from each of these three dimensions and explain their usages. As shown in Fig. 3, an AMML description file consists of three major blocks. 1. The head block describes a presentation's properties in the spatial dimension, i.e., the layout of the media objects and how the presentation looks. 2. The body block specifies a presentation's properties in the temporal dimension, i.e., the relative timing sequences of the media objects. This allows the organization of the media objects in a presentation over time. 3. The class block describes the adaptive requirements.
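A minimal sketch of such a description file and of how it might be parsed into a parse table; the tag and attribute names below form a hypothetical AMML-like skeleton for illustration only, not the published DTD:

```python
from xml.dom.minidom import parseString

# Hypothetical AMML-like document with the three blocks described above.
AMML_DOC = """
<amml>
  <head>
    <layout>
      <region id="video_area" width="640" height="360" top="0" left="0"/>
      <region id="text_area" width="640" height="80" top="360" left="0"/>
    </layout>
  </head>
  <body>
    <par start="0" dur="120">
      <video src="lesson1.mpg" region="video_area"/>
      <text src="subtitles_zh.txt" region="text_area"/>
    </par>
  </body>
  <class>
    <language usr_language="Chinese"/>
  </class>
</amml>
"""

dom = parseString(AMML_DOC)
# Build a simple parse table: one row of attributes per media object.
parse_table = []
for tag in ("video", "audio", "image", "text"):
    for node in dom.getElementsByTagName(tag):
        parse_table.append({
            "type": tag,
            "src": node.getAttribute("src"),
            "region": node.getAttribute("region"),
        })
print(parse_table)
```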
To simplify the specification of a multimedia presentation, the underlying model of AMML is interval-based. That is, the presentation is composed of a set of media objects that have some temporal relationships, and each media object has a corresponding time interval characterized by "start", "duration", and "end" attributes. In AMML, the synchronization among media objects is specified by both the composite objects and their temporal relationships. The composition of media objects is used to temporally group interval elements and is described by the "container" tag depicted in Fig. 3. Currently, two temporal relationship tags are provided by AMML: (i) the "seq" tag means that the media objects are presented sequentially; (ii) the "par" tag means that the media objects are presented in parallel. As to the adaptive criteria, they are specified in the class block of an AMML document. Currently, four elements are supported: user interaction, content, video quality, and language. 1. The so-called user interactions are control functions similar to those of VCRs, e.g., pause, fast forward, and rewind. With these user interactions, the learner can control the presentation flow and speed at any time during the presentations. However, in an adaptive WBL system, not all levels of learners should be provided with the same user interactions. Instead, the system should provide a proper set of user interactions to each level of learners. For example, for higher-level learners, more user interactions can be provided, because they have more background knowledge and should be able to control what they want to view. On the contrary, for the lowest level of learners, only the basic pause function is provided to restrict their browsing behaviors. In this way, the instructor can retain control over the learners' learning states and behaviors. 2. Adaptation by content means that the system can generate presentations about the same subject but with varying difficulty. For example, in a math course, higher-level students can have materials about 4-digit addition while lower-level students can have materials about 2-digit addition. When a learner selects a course, his/her information in the learner model is used to generate the most appropriate materials. In this way, the instructor can examine each learner's learning state more easily. 3. Because the learners' communication environments vary a lot, it is reasonable that the system delivers the media objects to the learners with an appropriate quality. That is, if there are several versions of a video object, the system will choose the one that has an adequate screen resolution and frame rate after detecting the network bandwidth to the learner. If the detection is not successful, a default version of the media object can be provided. 4. With WBL systems, learning is no longer bound by nationality or geographical limits. Because the learners come from all over the world, language becomes a major consideration of WBL systems. To improve efficiency, ADAM can dynamically adapt the language part of a presentation, e.g., subtitles and speech, instead of generating the whole presentation from scratch.
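A small sketch of the interval-based timing model described above, assuming each object carries an explicit duration (a real AMML document would of course be driven by its start/end/dur attributes):

```python
def schedule(node, start=0.0):
    """Assign (start, end) intervals to a tree of par/seq containers and media objects."""
    kind, children = node["kind"], node.get("children", [])
    if kind == "media":
        end = start + node["dur"]
        node["interval"] = (start, end)
        return end
    if kind == "seq":                      # children play one after another
        t = start
        for child in children:
            t = schedule(child, t)
        node["interval"] = (start, t)
        return t
    if kind == "par":                      # children play in parallel
        end = max(schedule(child, start) for child in children)
        node["interval"] = (start, end)
        return end
    raise ValueError("unknown node kind: " + kind)

presentation = {"kind": "seq", "children": [
    {"kind": "par", "children": [
        {"kind": "media", "name": "video", "dur": 60.0},
        {"kind": "media", "name": "subtitles", "dur": 60.0},
    ]},
    {"kind": "media", "name": "quiz_image", "dur": 30.0},
]}
schedule(presentation)   # video/subtitles: (0, 60); quiz_image: (60, 90)
```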
3 System Development As mentioned previously, we developed a real WBL system to demonstrate the feasibility of ADAM. This section describes its implementation. Our goal is that
ADAM and AMML can be integrated seamlessly to further improve learning effectiveness.
3.1 System Components
Fig. 4 depicts the components of our system. There are three major modules in Fig. 4: the XML parser, the synchronization processor, and the media content processor.
Fig. 4. System components: the CMML script file and the course database feed the XML Parser; the Synchronization Processor (Temporal Controller and Spatial Controller) and the Media Content Processor (Media Controller, media buffer, decoder, rendering) retrieve media units from the multimedia database and drive the user interface display.
The operation flow of the system is as follows.
1. The instructor composes the course's content description files in AMML to specify the temporal, the spatial, and the adaptive properties of the presentations. These description files are stored in the course server's database.
2. When the learner wants to view a presentation, its content description file is retrieved from the server and passed to the XML parser for content generation.
3. The XML parser parses the AMML tags in the description file into a parse tree using the document object model (DOM) and keeps the corresponding attributes of each media object, including media type, start time, end time, and spatial location, in a parse table.
4. The synchronization processor contains the temporal controller and the spatial controller. Together they allocate each media object in a presentation according to the object's temporal and spatial properties stored in the parse table.
5. Finally, the media controller of the media content processor retrieves the corresponding media objects from the multimedia database. At the client side, the media objects are first stored temporarily in the receiving buffer. When the media objects are about to be presented, they are passed to the decoder, then the renderer, and finally the user interface display, to be played out under the control of the temporal controller and the spatial controller.
3.2 An Illustrated Presentation
As mentioned previously, we developed an ADAM-based WBL system to support the "English Conversation" course. This subsection demonstrates some major results.
Fig. 5. An illustrated adaptive presentation (areas 1–3 and control button groups a–c are described below)
When a learner logs in for the first time, he/she is asked to fill in some personal information. After that, a pre-class test is given to determine his/her knowledge level about the course. The purpose of these steps is to gather the attributes of the student model for each learner. ADAM then utilizes the information stored in the student model to support adaptivity. To accomplish this goal, each media object is associated with a condition indicating which type of learners should be presented with it. Also, ADAM supports adaptive navigation: the selection and the color of hyperlinks are adapted to the individual learner by taking into account the information in the student model and the instructional strategy. Fig. 5 shows an illustrated presentation. In Fig. 5, the presentation can be divided into three main areas. They are described as follows. 1. Video area: This area is used to place video and audio objects. The audio is associated with the video such that different versions of speech, e.g., Chinese or English, can be included. 2. Image area: This area is used to place image objects, e.g., JPEG files. ADAM also provides a slide-show feature for image objects; that is, the displayed image is automatically changed as time goes by. 3. Text area: Text and audio objects can be placed in this area. The text can be the subtitles of the video or some annotations. In addition to the three areas for placing media objects, Fig. 5 also shows three sets of control buttons, i.e., a, b, and c. These buttons provide VCR-like user interactions. As mentioned previously, the web browser does not provide these user interactions; they are implemented in JavaScript. Also, some of the buttons may be disabled because the corresponding functions are not provided to this learner.
4 Conclusion and Future Work In this paper, we proposed ADAM, an adaptive multimedia content description mechanism. The main feature of ADAM is that only a content description file in
AMML is required to generate various presentations. In order to demonstrate the feasibility of ADAM, an English-learning WBL system was developed. After developing ADAM and the WBL system, two evaluations, i.e., an expert review and a small-group evaluation, were conducted to judge the value of our work. According to the results, the experts and the learners have positive attitudes towards the perceived efficacy and enjoyment of the system. From the viewpoint of instructors, with ADAM it is easier to reuse the materials to compose adaptive presentations that accommodate the needs of various types of learners.
References 1. Benson, V., Frumkin, L., Murphy, A.: Designing Multimedia for Differences: e-Lecturer, eTutor, and e-Student Perspectives. In: Proceedings of the Third International Conference on Information Technology and Application, vol. 2, pp. 159–164 (2005) 2. Chen, C.M., Liu, C.Y., Chang, M.H.: Composing a Complex Biological Workflow through Web Services. Personalized curriculum sequencing utilizing modified item response theory for web-based instruction. Expert Systems with Applications 30(2), 378–396 (2006) 3. Gu, Q., Sumner, T.: Support Personalization in Distributed E-learning Systems through Learner Modeling. Information and Communication Technologies 1, 610–615 (2006) 4. Lin, C.B., Young, S.S.C., Chan, T.W., Chen, Y.H.: Teacher-oriented adaptive Web-based environment for supporting practical teaching models: a case study of “school for all”. Computers and Education 44(2), 155–172 (2005)
Human-System Interface (HSI) Challenges in Nuclear Power Plant Control Rooms Jo-Ling Chang1, Huafei Liao1, and Liang Zeng2 1 Bechtel Power Corporation 5275 Westview Drive, Frederick, MD 21703 {jjchang,hliao}@bechtel.com 2 School of Industrial Engineering Purdue University West Lafayette, IN [email protected]
Abstract. This study uses factor analysis to examine 30 errors due to human-system interface (HSI) in nuclear power plant control rooms. The results are used to validate the factor structure and the Decision-Action Model developed in this paper. Ten U.S. commercial operating nuclear plants, a total of 18 units, had participated in this study at the time this paper was written. The result is a five-factor structure: Operations Uncertainties, Design Improvements, Misoperations, Equipment Control, and Human Factors Redesign. The completed Decision-Action Model provides current operating plants with suggested corrective actions for each type of potential HSI error. Keywords: Human error, human-system interface, control room, human performance, factor analysis, corrective action.
1 Introduction
With the increasing demand for clean and reliable energy in recent years, there is much attention on new construction and increasing output for nuclear power plants in the United States. Concerns for nuclear safety have also risen in view of the numerous modifications current operating plants are installing. This study aims to provide additional insights through operator experience on control room HSI. Human errors, when systematically analyzed and evaluated, provide information on the causes and correction methods. Among the human error analysis/evaluation methods, statistical analysis is effective in identifying error categories, occurrence patterns and trends, and revealing the hidden interrelationships between errors and their causal factors. Such knowledge can significantly improve human error prevention and corrective measure evaluation.
2 Background Improving currently operating nuclear power plant control room design is a continuous effort. A review of recent operating experience (OE) from the Institute of Nuclear
Table 1. Control Room Human-System Interface Hypothetical Factor Structure
Operation Based (items 1–8):
1. Operation movements. Operation requires small movements or jerking/unsmooth motion.
2. Simultaneous operation. Operator required to multi-task.
3. Control room/simulator discrepancies. Trained actions are not applicable to real scenarios.
4. Operate equipment incorrectly. Due to inattention to details/distractions.
5. Inappropriate compensation. From lack of trust in equipment.
6. Over reliance. From over trusting equipment.
7. Defeated safety features. Manual override of safety feature.
8. Inexperience. From lack of operating hours on equipment.
Controller Design Based (items 9–17):
9. Operate on wrong equipment. Due to similarity.
10. Controls too far apart. Need excess movement to operate consecutive actions.
11. Controls too close together. Poor design leads to inadvertent operation.
12. Incorrect function allocation – Manual actions designed to be automated.
13. Incorrect function allocation – Automated actions designed to be manual.
14. Equipment allowing failures. Allowing operation outside of design parameters.
15. Work-arounds. Known defects that require operators to take less direct action.
16. Time limit to operation. Operation cannot be completed within the allowed time.
17. No operator intervention allowed. To abort or assume control as necessary.
Deficient Indication Based (items 18–24):
18. No alarm noting abnormal conditions and/or failures.
19. Insufficient plant information.
20. Boolean indication. Indication without level of severity.
21. Unreliable indication. Indication known to reflect plant condition imperfectly.
22. No feedback. Action is performed with no confirmation.
23. No projection. No indication on anticipated result from action.
24. No trending. No indication on equipment failing over a prolonged time period.
Ambiguous Indication Based (items 25–30):
25. Control panel visually crowded. Cannot take in presented information at a glance.
26. Color/Sound coordination. Many indications of the same color/sound or all indications having different colors/sounds.
27. Over-indication. A single failure represented by more than one alarm.
28. Non-intuitive control.
29. Display challenges. Display font size/color or inconsistency in acronyms/labeling/terminology.
30. Data searching. Extensive navigation needed to look for known existing data.
Source column (as printed): [4] [5] [6] [7] [8] [5] [9] [10] [11] [12] [13] [14] [14] [15] [16] [17] [18] [19] [20] [21] [22] [17] [3] [23] [24] [17]
Power Operation (INPO) database shows that control room HSI design may still need to be improved to increase safety and reduce the cost of operation. Between 1991 and 2008, numerous plant events resulting in either a plant trip/transient or technical specification violation were still being reported to INPO. The cost of these errors could be as high as a million dollars per day for repairs and rework. The United States Nuclear Regulatory Commission (USNRC) provided Human Factors Engineering (HFE) guidelines in NUREG-0700 [1] and identified different areas of HSI in both advanced and conventional control rooms. The HFE guideline provides the following categories: Information Display, User-System Interface, Process Control and Input Devices, Alarms, Analysis and Decision Aids, Inter-Personnel Communication, Workplace Design, and Local Control Stations. The OE search conducted by the authors in August 2008 on the Institute of Nuclear Power Operations (INPO) OE database revealed 146 plant events between 12/3/1990 and 4/24/2008. These events were listed under control room operator work group and man-machine interface causal factor. Each plant event was reviewed for contributing factors; many were used in the hypothetical factor structure. The hypothetical factor structure utilized the USNRC HFE categories as a starting point, and items were expanded or deleted compared to the actual events found in OE. Categories developed for control centers in nuclear or other industries were also examined. Examples of these categories include Davey's discussion of factors important for structuring review criteria [2] and Grozdanovic's research on the control center of railway traffic in Yugoslavia [3]. This study reviewed literature available on control room HSI and related issues to create the hypothetical factor structure listed in Table 1. This table lists one source for each item. For some of these items, the factors contributing to the errors are discussed in several different sources. For items such as these, a representative reference is selected and listed. Items that do not have a reference source listed are simplified or paraphrased from the literature review and cannot be pinpointed to a single source document.
3 Model Development
A review of recent plant events relating to control room HSI showed that the majority of the incidents, approximately 70%, did not lead to any immediate corrective actions. The focus of this study, in addition to developing a factor structure, is to provide corrective action guidelines for current operating plants. As such, a Decision-Action Model was developed (see Table 2) to offer current operating plants suggested corrective actions for each of the items in the purified factor structure, which effectively lists the causes of all OE submitted by operating plants.
Table 2. Decision-Action Model
Group I (no incident): Correct Decision + Correct Action    Group II: Incorrect Decision + Correct Action
Group III: Correct Decision + Incorrect Action              Group IV: Incorrect Decision + Incorrect Action
Future errors may
be associated with one or more of these items, and the suggested corrective actions may help to prevent similar errors from recurring. Decision, in this model, represents cognitive errors. These errors pertain to knowledge and judgment of the operator. This type of error is similar to the concept of "mistakes" described by Norman, 1981 [25]. It is described as representing defects in the formulation of strategy, generated only during the planning process as the result of inappropriate knowledge of the relations between parts of the plant or between physical quantities. Action, in this model, represents the physical errors, or "slips" in Norman's words, which are imperfections of attention monitoring or errors that occur while implementing intended plans [25]. This model is influenced by Chen-Wing and Davey [26], which illustrates the roles of the Designer, Operations, and Human/Technical Resources in error reduction. Since the Decision-Action Model focuses on current operating plants, as opposed to designing new constructions, the Designer's role is eliminated. Suggested corrective actions are also based on the same study. These suggested actions are as follows:
• Group I Suggested Corrective Action: N/A
• Group II Suggested Corrective Action: Improvement to operations procedure, general guidelines, and pre-job briefing.
• Group III Suggested Corrective Action: Additional operator training, peer check, management oversight.
• Group IV Suggested Corrective Action: Control room modification with human factor re-evaluation to the extended condition.
4 Research Method 4.1 Procedure A survey was developed based on the hypothetical factor structure shown in Table 1 to examine power plant operators’ opinions on HSI errors. Each survey question consisted of two parts: a 7-point Likert scale ranging from Strongly Disagree (1) to Strongly Agree (7), and an option to select the contribution of the error from Operator Decision, Operator Action, or both. The survey contained 32 questions, which included two paired questions to estimate the internal consistency of participants’ responses. Ten commercially operating nuclear power plants, 18 units total, participated in this ongoing study at the time this paper was written. Several methods, e.g., telephone, email, and/or post, were used to contact the head of the Operations Department at each plant. Participants were directed to either forward the online survey URL to qualified licensed operators or mail back completed hard copies of the surveys. By the time when this paper was written, 138 responses were received, out of which eight were completed through paper surveys, and the remainder were collected online. 4.2 Profile of Participants The first 138 responses were analyzed for this study. A single operating unit is expected to have around 20 licensed operators. The approximate number of operators at each unit was verified through operations training instructors at several plants.
From this information, the study's current participation rate is around 38%. As indicated above, surveys were distributed by the head of the Operations Department. While some plants sent the surveys to all operations personnel, others selected small groups of individuals to participate in this study. This accounts for the low participation rate of this study. Out of the 138 responses, 14 were discarded due to low internal consistency. Of the remaining 124 responses, 3.8% were from USNRC Region I, 21.5% from USNRC Region II, 18.5% from USNRC Region III, and 53.4% from USNRC Region IV. The remaining responses did not provide enough plant information to identify their regions. These regions are designated by the USNRC to oversee the operation of power-producing and non-power-producing reactors in the United States [27]. Participants' ages ranged from 27 to 63 (mean = 47.0, standard deviation = 7.55). 18.1% had 1 to 10 years of operations experience, 20.3% had 11 to 20 years, and 61.7% had over 20 years.
4.3 Analysis
Descriptive Statistics. The overall internal consistency as estimated by Cronbach's alpha coefficient was 0.70, which indicates that the survey has acceptable internal consistency. The general characteristics of the survey results on the items loaded on the purified factor structure (see discussion later in this section) were examined. The mean scores for the items were between 3.6 and 6.2, and the standard deviations were between 0.87 and 1.60. The overall mean of all the items in the purified factor structure is 5.1. Items scoring above the overall mean indicate relatively strong preferences of the participants and are in bold font. Item 12, incorrect function allocation – manual actions designed to be automated, has the lowest mean value. This suggests that operators believe that designing manual actions to be automated would reduce, rather than increase, the chances of making an error. A three-way ANOVA shows that there are no significant effects of age, experience, or plant of employment on any of the 18 loaded items. Factor Analysis. Maximum likelihood factor analysis with varimax rotation was conducted to explore the hidden factor structure determined by the correlations among survey items. Five factors, whose eigenvalues are larger than 1.0, were retained (see Table 3). Factor loadings are presented in Table 3. Items with loadings higher than, or close to, 0.50 are considered significant. Items 29, 18, and 25 are very close to 0.50 and are therefore considered significant as well. The items loaded on Factor 1 pertain to errors caused by doubts in information presented on the job; thus, Factor 1 is classified as "Operations Uncertainties". Factor 2 contains items describing existing designs in need of improvement and is labeled "Design Improvements". Factor 3 includes operation-based errors and is categorized as "Misoperations". The two items loaded onto Factor 4 are equipment allowing failures and over reliance. As such, Factor 4 is labeled "Equipment Control". Factor 5 includes problems arising from basic human factors issues and is labeled "Human Factors Redesign".
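A rough numpy sketch of the two computations reported here, Cronbach's alpha over the item responses and the Kaiser criterion (retaining factors whose correlation-matrix eigenvalues exceed 1.0); the array shape and the random data are illustrative only:

```python
import numpy as np

def cronbach_alpha(responses):
    """responses: participants x items matrix of Likert scores."""
    k = responses.shape[1]
    item_variances = responses.var(axis=0, ddof=1)
    total_variance = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1.0)) * (1.0 - item_variances.sum() / total_variance)

def kaiser_retained_factors(responses):
    """Number of factors with eigenvalue > 1.0 in the item correlation matrix."""
    corr = np.corrcoef(responses, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return int((eigenvalues > 1.0).sum())

# e.g. 124 retained respondents answering 30 items on a 7-point scale
responses = np.random.randint(1, 8, size=(124, 30)).astype(float)
print(cronbach_alpha(responses), kaiser_retained_factors(responses))
```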
Table 3. Descriptive Statistics and Rotated Factor Pattern
Item No.a  Factor 1  Factor 2  Factor 3  Factor 4  Factor 5  Meanb  SD    Action (%)  Decision (%)  Both       χ2(1)
21         0.69      -0.01     0.23      0.33      0.09      6.0    1.02  11 (8.9)    85 (69.1)     27 (22.0)  57.04***
22         0.66      0.21      0.18      0.23      0.03      5.4    1.20  29 (23.6)   63 (51.2)     31 (25.2)  12.57**
28         0.55      0.17      0.20      0.03      0.12      5.5    1.20  47 (38.2)   40 (32.5)     36 (29.3)  0.56
19         0.50      0.24      0.28      0.32      0.10      5.8    0.99  12 (9.8)    79 (64.2)     32 (26.0)  49.33***
29         0.49      0.32      0.28      0.10      0.12      5.2    1.29  33 (26.8)   63 (51.2)     27 (22.0)  9.38**
18         0.49      0.28      0.33      0.12      0.16      5.7    1.14  44 (35.8)   54 (43.9)     25 (20.3)  1.02
24         0.26      0.56      0.32      0.29      -0.08     4.9    1.26  16 (13.0)   78 (63.4)     29 (23.6)  40.89***
16         0.09      0.52      0.09      0.14      0.12      5.0    1.11  44 (35.8)   40 (32.5)     39 (31.7)  0.19
26         0.11      0.51      0.19      0.04      0.15      5.0    1.23  32 (26.0)   57 (46.3)     34 (27.6)  7.02**
20         0.38      0.50      -0.06     0.42      0.12      4.9    1.34  17 (13.8)   91 (74.0)     15 (12.2)  50.70***
4          0.13      -0.06     0.51      0.17      0.12      6.2    0.87  30 (24.4)   50 (40.7)     43 (35.0)  5.00*
7          0.22      0.24      0.50      0.26      0.03      5.3    1.57  23 (18.7)   53 (43.1)     47 (38.2)  11.84**
25         0.32      0.42      0.49      0.14      0.30      5.1    1.19  58 (47.2)   37 (30.1)     28 (22.8)  4.64*
14         0.16      0.07      0.20      0.57      0.14      5.2    1.40  29 (23.6)   70 (56.9)     24 (19.5)  16.98***
6          0.04      0.11      0.17      0.51      0.21      4.3    1.46  15 (12.2)   90 (73.2)     18 (14.6)  53.57***
12         0.03      0.09      0.01      0.30      0.69      3.6    1.40  52 (42.3)   51 (41.5)     20 (16.3)  0.01
11         0.11      0.33      0.46      -0.04     0.57      4.9    1.37  87 (70.7)   22 (17.9)     14 (11.4)  38.76***
10         0.24      0.17      0.25      0.37      0.54      4.6    1.50  90 (73.2)   21 (17.1)     12 (9.8)   42.89***
Eigenvalue                              20.0   2.7    2.4    1.9    1.6
Variance explained by each factor       6.9    5.1    4.8    4.0    4.0
% Variance explained by each factor     20.9   15.5   14.6   12.0   12.2
Cumulative % total variance explained   20.9   36.4   51.0   63.0   75.2
Factor Mean                             5.6    5.0    5.5    4.8    4.4
***: p < 0.0001; **: p < 0.01; *: p < 0.05
a: Reference Table 1 for item description.
b: Means above the overall mean (5.1) are in bold type. Factor loadings in bold type are considered to be significant. Five factors explained 75.2% of total variance.
Compared to the hypothetical factor structure in Table 1, Factors 1 and 2 in the purified factor structure include a shuffling of items from the original Deficient Indication and Ambiguous Indication categories. Factor 3 maps closely to the items in the Operation Based category, and Factor 5 includes only items from the Controller Design Based category. Model Population. Chi-square tests were performed to determine whether the responses lean towards Operator Action or Operator Decision. Table 3 shows the number of responses that selected Action, Decision, or both for each loaded item. The Decision-heavy items are populated into Group II of the Decision-Action Model, Action-heavy items are populated into Group III, and the remaining items are populated into Group IV. The final model is shown in Table 4.
Table 4. Populated Decision-Action Model
Group I (no incident): Correct Decision + Correct Action. Suggested Corrective Action: N/A.
Group II: Incorrect Decision + Correct Action. • Unreliable indication • No feedback • Insufficient plant information • Display challenges • No trending • Color/Sound coordination • Boolean indication • Operate equipment incorrectly • Defeated safety features • Equipment allowing failures • Over reliance. Suggested Corrective Action: Improvement to operations procedures, general guidelines, and pre-job briefing.
Group III: Correct Decision + Incorrect Action. • Control panel visually crowded • Controls too close together • Controls too far apart. Suggested Corrective Action: Additional operator training, peer check, management oversight.
Group IV: Incorrect Decision + Incorrect Action. • Non-intuitive control • No alarm noting abnormal conditions and/or failures • Time limit to operation • Incorrect function allocation – Manual actions designed to be automated. Suggested Corrective Action: Control room modification with human factor re-evaluation to the extended condition.
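As a rough illustration of the chi-square classification used to populate the model, this sketch compares an item's Decision and Action counts against an equal-split expectation; the counts for item 21 are taken from Table 3, while the 0.05 cut-off and the "not significant maps to Group IV" rule are an illustrative reading of the procedure described above:

```python
from scipy.stats import chisquare

def classify_item(decision_count, action_count, alpha=0.05):
    """Assign an item to Group II, III or IV of the Decision-Action Model."""
    stat, p = chisquare([decision_count, action_count])   # expected: equal split
    if p >= alpha:
        return "Group IV"                      # neither Decision- nor Action-heavy
    return "Group II" if decision_count > action_count else "Group III"

# Item 21 ("unreliable indication"): 85 Decision vs. 11 Action responses.
print(classify_item(85, 11))   # chi-square ~= 57.04 -> Decision-heavy -> Group II
```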
5 Conclusion This study collects nuclear power plant operator opinions on control room HSI errors. The factor means, shown in Table 3, indicate that the highest priority should be
placed on the category of Operations Uncertainties. The remaining four factors, in the order of importance, are: Misoperations, Design Improvements, Equipment Control, and Human Factor Redesign. The populated Decision-Action Model (Table 4) also provides some insight to control room design. A large number of items are populated in Group II. This may be interpreted as suggesting that improvements in the planning process will have better results in reducing HSI type errors. Operating Experience was revisited to compare this study’s finding to documented plant events. Out of the 106 plant events originally evaluated, 17% fall under Operations Uncertainties, 21% fall under Design Improvements, and 22% fall under Misoperations. No event strictly falls into the category of Equipment Control or Human Factor Redesign. This comparison confirms that this study’s conclusion of category importance ranking is valid.
Acknowledgments The authors would like to acknowledge the support and directions offered by Dr. Gavriel Salvendy and Mr. Karey Kimmel.
References 1. United States Nuclear Regulatory Commission: Human Factors Engineering. In: Standard Review Plan for the Review of Safety Analysis Reports for Nuclear Power Plants, NUREG-0800, ch. 18, rev. 2, U.S. NRC (2007) 2. Davey, E.: Criteria for Operator Review of Workplace Changes. In: Canadian Nuclear Society Conference (2000) 3. Grozdanovic, M.: Methodology for Research of Human Factors in Control and Managing Centers of Automated Systems. The Scientific Journal Facta Universitatis: Working and Living Environmental Protection 1(5), 9–22 (2000) 4. Millstone 2: Automatic reactor scram after a steam generator feed pump trip. Plant Event # 336-040315-1, Institute of Nuclear Power Operations, March 15 (2004) 5. Sheridan, T.B., Parasuraman, R.: Human-Automation Interaction. In: Reviews of Human Factors and Ergonomics, vol. 1, pp. 89–129. Human Factors and Ergonomics Society (2005) 6. Monticello: Half scram due to cold water transient during valve operation at Monticello Nuclear Generating Plant. Plant Event # 263-070314-1, Institute of Nuclear Power Operations, March 14 (2007) 7. Darlington 2: Unit 2 turbine leading (normal) mode inadvertently entered. Plant Event # 932-041101-1, Institute of Nuclear Power Operations, Feburary 8 (2005) 8. Gentilly 2: Recirculated service water diesel motor pump 7131-P36 damaged. Plant Event # 851-001126-1, Institute of Nuclear Power Operations, November 26 (2000) 9. Cernavoda 1: Inadvertent draining of in-service fire water tank 7140-TK1 results in start and damage of the diesel engine driven pump. Plant Event # 121-030612-1, Institute of Nuclear Power Operations, June 12 (2003) 10. Cooper 1: Unintended increase in reactor power due to misoperation of reactor recirculation pump speed control. Plant Event # 298-020212-1, Institute of Nuclear Power Operations, Feburary 12 (2002)
11. Trillo 1: During routine tests, start-up of emergency diesel generator GY60 activated by reactor protection system. Plant Event # 715-071128-1, Institute of Nuclear Power Operations, November 28 (2007) 12. Xiaoming, C., Zhiwei, Z., Zuying, G., Wei, W., Nakagawa, T., Matsuo, S.: Assessment of Human-Machine Interface Design for a Chinese Nuclear Power Plant. In: Reliability Engineering and System Safety, vol. 87, pp. 37–44. Elsevier Science S.A, Amsterdam (2005) 13. Nine Mile Point 1: Two control rods scrammed during rod scram timing test. Plant Event # 220-040504-1, Institute of Nuclear Power Operations, May 4 (2004) 14. Hugo, J., Engela, H.: Function Allocation for Industrial Human-System Interfaces. In: 4th International Cyberspace Conference on Ergonomics, International Ergonomics Association 15. Perry 1: Reactor operation in unanalyzed region. Plant Event # 440-060709-1, Institute of Nuclear Power Operations, July 9 (2006) 16. Point Lepreau 1: Containment isolation system button-up during degassing of the degasser condenser. Plant Event # 908-990426-1, Institute of Nuclear Power Operations, April 26 (1999) 17. Naito, N., Itoh, J., Monta, K., Makino, M.: An Intelligent Human-Machine System Based on an Ecological Interface Design. In: Nuclear Engineering and Design, vol. 154, pp. 97– 108. Elsevier Science S.A, Amsterdam (1995) 18. Point Lepreau 1: Primary heat transport (PHT) thermal transient. Plant Event # 908040606-1, Institute of Nuclear Power Operations, June 6 (2004) 19. Gentilly 2: Local radiological alert due to a moderator leak in upgrading plants. Plant Event # 851-020212-1, Institute of Nuclear Power Operations, Feburary 12 (2002) 20. Duane Arnold 1: High temperature in fuel pool because of procedure use problem. Plant Event # 331-000111-1, Institute of Nuclear Power Operations, January 11 (2000) 21. Bugey 3: Two bank rod assemblies blocked in low position. Plant Event # 783-031230-1, Institute of Nuclear Power Operations, December 30 (2003) 22. Burns, C., Vicente, K.J.: A Participant-Observer Study of Ergonomics in Engineering Design: How constraints drive design process. In: Applied Ergonomics, vol. 31, pp. 73–82. Elsevier Science S.A, Amsterdam (2000) 23. Gentilly 2: 3481-TK2 tank draining and 3481-P1 and P2 pump cavitations. Plant Event # 851-050530-1, Institute of Nuclear Power Operations, July 12 (2005) 24. Susquehanna 2: B circulating water pump shutdown instead of B condensate pump. Plant Event # 388-050527-1, Institute of Nuclear Power Operations, May 27 (2005) 25. Norman, D.A.: Categorization of action slips. Psychological Review 88, 1–15 (1981) 26. Chen-Wing, S.L.N., Davey, E.C.: Designing to Avoid Human Error Consequences. In: Workshop on Human Error, Safety, and System Development, Paper Session 5 (1998) 27. United States Nuclear Regulatory Commission, http://www.nrc.gov
The Impact of Automation Assisted Aircraft Separation on Situation Awareness Arik-Quang V. Dao1, Summer L. Brandt1, Vernol Battiste1, Kim-Phuong L. Vu2, Thomas Strybel2, and Walter W. Johnson1 1 NASA Ames Research Center Moffett Field, CA 94035, United States of America {quang.v.dao,summer.l.brandt,vernol.battiste-1, walter.w.johnson}@nasa.gov 2 California State University Long Beach, Dept of Psychology 1250 N Bellflower Blvd, Long Beach, CA 90840, USA {kvu8,tstrybel}@csulb.edu
Abstract. This study compared situation awareness across three flight deck decision aiding modes. Pilots resolved air traffic conflicts using a click and drag software tool. In the automated aiding condition, pilots executed all resolutions generated by the automation. In the interactive condition, automation suggested a maneuver, but pilots had the choice of accepting or modifying the provided resolution. In the manual condition pilots generated resolutions independently. A technique that combines both Situation Global Assessment Technique and Situation Present Awareness Method was used to assess situation awareness. Results showed that situation awareness was better in the Manual and Interactive conditions when compared to the Automated condition. The finding suggests that pilots are able to maintain greater situation awareness when they are actively engaged in the conflict resolution process. Keywords: automation, conflict resolution, situation awareness, cockpit display of traffic information, CDTI, cockpit situation display, CSD.
1 Introduction Without additional tools and automation, increases in air traffic densities during the next few decades will exceed the capabilities of current-day controllers to manage them [1]. However, adding tools and automation to the air traffic management system will require a new distribution of roles and responsibilities across ground control, the flight deck, and automation. In response, the Joint Planning and Development Office (JPDO), in its vision of the Next Generation Air Transportation System (NextGen), has suggested multiple concepts of operation for distributing roles and responsibilities, including the delegation of aircraft separation responsibilities to flight crews, aided by automation, or to automation alone. Automated systems have been successfully used in many areas of transportation to increase operator workload capacity and system safety. However, automation has
been shown to impact operators’ information acquisition, information analysis, decision making, and action [2]. Thus, a major issue in implementing automation in a system where humans remain “in-the-loop” is its impact on operator situation awareness (SA). As a first step towards addressing this issue, researchers from the NASA Ames Flight Deck Display Research Laboratory (FDDRL) and California State University Long Beach conducted a study comparing SA of commercial pilots performing a conflict resolution task across three levels of automation aiding. 1.1 Automated Separation Assurance on the Ground and the Flight Deck One responsibility of an air traffic controller is to detect and resolve upcoming conflicts (losses of legal separation) between aircraft. Since the number of conflicts increases with traffic density, this can lead to an unacceptable increase in controller workload. One solution is to provide conflict detection and/or resolution automation that take some of the work off the controller. In addition, researchers have been investigating the possibility of delegating separation responsibility to appropriately equipped flight decks [3]. On such flight decks, the pilot would be provided with technology that supports conflict detection and resolution. As one of the primary proponents of such automation, Erzberger [4], at the NASA Ames Research Center, has been developing an “auto-resolver”. While his efforts have been aimed mainly at a ground-based implementation, several concepts of operation could be envisioned which would utilize this automation. For example, Homola [5] examined controller acceptance of concepts where this automation would be used 1) to autonomously generate resolutions without involving pilots or controllers (fully automated mode), and 2) to allow air traffic controllers to request conflict resolutions on demand, and then modify them as they wished. In turn then, the general goal of the present study was to examine pilot acceptance of this same automation, and for this paper, to look at its impact on pilot SA. 1.2 The Impact of Automation on Situation Awareness Situation awareness refers to the operator’s understanding of the state of the relevant environment and his or her ability to anticipate future changes and developments in that environment [6]. A widely cited definition of SA specifies three levels of the construct in terms of 1) perception, 2) comprehension, and 3) projection [7]. Factors that impact SA in the context of automation include automation complacency, automation mistrust, workload, and automation transparency. Automation complacency occurs when the human falls out-of-the-loop due to over trust in the system [8]. In the extreme case, the human no longer actively processes information to maintain an awareness of the system state, diminishing his or her ability to recover from automation failure. Automation mistrust occurs when the human perceives the automation to be unreliable and devotes excessive attention to monitoring the automation. SA can also be diminished when workload is very high, resulting in attentional tunneling [2] where all attentional resources are drawn to the primary task, reducing the amount of resources available for perceiving and processing other information in the environment. SA is also impacted by the requirement to evaluate choice alternatives when interacting with decision support automation [9]. In these,
operators must often integrate the information provided by the automation with their assessment of the situation. This additional workload could also reduce the resources available for maintaining SA.

A system is “transparent” when the underlying reasons and information behind the behaviors of the automation are understood by the operator [10]. In a fully transparent system, an operator may be led to attend to too much system information, with resulting high workload and diminished SA [11]. At the other extreme, a system that does not display information or provide adequate feedback regarding system behavior may reduce workload at the cost of transparency. Thus, the lack of transparency can also lead to diminished SA. A system that supports good SA provides transparency at a manageable workload level.

1.3 SA Probing Techniques

SA measures are classified as probing techniques, rating scales, or performance-correlated measures. Probe techniques have been identified as the most promising of these measures because they are sensitive to the operator and task environment [12] and can provide diagnostic information regarding the cause(s) of poor SA. The Situation Awareness Global Assessment Technique (SAGAT) and the Situation Present Assessment Method (SPAM) are two commonly used SA probe techniques [7][13].

SAGAT questions are administered by stopping the scenario at random intervals so that SA probes do not interfere with the current tasks. Because the operator does not have access to the display during the simulation pauses, this technique is highly dependent on his or her memory for information in the simulated environment. Unlike SAGAT, SPAM does not index SA solely on the basis of the information the operator can remember. Instead, SPAM probes are administered during the course of the scenario while operators have access to information on their displays. This technique is based on the notion that information can be stored in memory or looked up on the display. If SA information, or the location of the information, is in the operator’s awareness, then response times are expected to be faster than when the information has to be located. One drawback of SPAM is that the questions may interfere with the operator’s primary task [14].

The present study attempted to avoid the problems associated with SPAM and SAGAT probes by using short, 3-minute scenarios in which a conflict was presented and resolved prior to answering probe questions. The probe questions were administered after the conflict resolution and while pilots had full access to their displays, so much of the information in the environment did not have to be held in memory. And, because the scenario had stopped, answering the questions imposed no additional workload on the operator’s primary tasks.

1.4 The Current Study

Commercial pilots viewed scenarios requiring conflict resolutions under three aiding conditions: automated, interactive, and manual. In the automated condition, pilots were required to evaluate resolutions generated by the automation; they were not allowed to modify these resolutions. In the interactive condition, pilots were given the automated resolution but were free to modify it with a Route Assessment Tool (RAT) prior to executing it. The RAT allowed the pilot to
graphically stretch and bend the flight plan to generate a preferred conflict-free path, but did not itself provide a recommended resolution. In the manual condition, pilots resolved the conflict using the RAT alone (no automation aiding).

Using the above SA probe technique, the current study assessed pilot SA across the three automation levels (manual, automated, and interactive). We predicted that workload in the manual condition would impose a greater demand on the pilot’s attentional resources, reducing SA compared to the conditions with automation aiding. We also expected that SA in the fully automated condition would be lower than in the interactive condition due to complacency: we reasoned that the SA gained by actively examining alternatives to the suggested resolutions in the interactive condition would more than offset any loss due to the increase in workload.
2 Method

2.1 Participants

Seventeen commercial airline pilots with glass cockpit experience participated in this study and were compensated $25/hr. Data from five of the pilots were excluded due to difficulty in learning to use the display and conflict resolution tools.

2.2 Apparatus

All scenarios were presented on a 30” monitor using a Cockpit Situation Display (CSD; see Figure 1) developed by the NASA FDDRL.
Fig. 1. CSD with conflict alerting and route proposal
The CSD, a PC-based 3D volumetric display, provided pilots with the location of surrounding aircraft, plus the ability to view the expected 4D trajectories of ownship and all traffic [15]. Embedded within the CSD was logic that detected and highlighted conflicts. Pilots were told that conflict detection was 100% reliable. In addition, the CSD had pulse predictors that emitted synchronous bullets of light that traveled along the displayed flight plans at a speed proportional to the speeds of the associated aircraft. In this way, a prediction of up to 20 minutes into the future was added to the display and provided a graphical way of confirming how close ownship was expected to come to any other aircraft.
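The conflict-detection and 20-minute prediction behavior described above can be illustrated with a small sketch. This is an illustration only, not the CSD's actual logic: the trajectory representation, the 5 NM / 1000 ft separation minima, and the sampling step are assumptions made here for exposition.

```python
import math

LATERAL_NM, VERTICAL_FT, HORIZON_S = 5.0, 1000.0, 20 * 60  # assumed minima and look-ahead window

def position_at(traj, t):
    """Interpolate (x_nm, y_nm, alt_ft) along a 4D trajectory given as a
    time-ordered list of (t_s, x_nm, y_nm, alt_ft) waypoints (times strictly increasing)."""
    for (t0, x0, y0, a0), (t1, x1, y1, a1) in zip(traj, traj[1:]):
        if t0 <= t <= t1:
            f = (t - t0) / (t1 - t0)
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0), a0 + f * (a1 - a0))
    return traj[-1][1:]          # hold the last waypoint beyond the end of the plan

def first_predicted_conflict(ownship, traffic, step_s=15):
    """Return the first time within the look-ahead window at which predicted
    separation falls below both the lateral and vertical minima, else None."""
    for t in range(0, HORIZON_S + 1, step_s):
        ox, oy, oalt = position_at(ownship, t)
        tx, ty, talt = position_at(traffic, t)
        if math.hypot(ox - tx, oy - ty) < LATERAL_NM and abs(oalt - talt) < VERTICAL_FT:
            return t
    return None
```

The pulse predictors can be thought of as a graphical rendering of the same interpolation: a marker drawn at the position returned for a cyclically advancing time sweeps along each displayed flight plan.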
A version of Erzberger’s auto-resolver was implemented to aid pilots with the conflict resolution task in the automated and interactive conditions [4]. In addition, in the interactive and manual conditions participants were asked to resolve conflicts by modifying their flight plan using the RAT [16]. This tool was linked to conflict detection software, allowing the pilots to find conflict-free paths. Proposed resolutions created by the automation or the operator were color coded in gray to distinguish them from the current route (amber if in conflict; magenta in nominal conditions).

A pool of candidate questions was generated to capture two important dimensions of SA: time frame and processing category. A question’s time frame refers to whether it queried awareness involving past, present, or future events. A question’s processing category was either recall or comprehension. Recall refers to queries where the pilot had to recall the information, or where the information was located on the display. Comprehension refers to queries where the pilot had to process information before a response could be made. Together, these dimensions generated six distinct probe types (see Appendix A for sample questions). All questions were checked by a commercial airline pilot for relevance to the task. The six types of probes were counterbalanced so that each aiding condition received four probes from each category. Presentation order was randomized for each trial.

2.3 Design and Procedure

The SA data were collected within a larger simulation that examined pilots’ acceptance of, and preferences for, automated conflict resolutions, and the impact of these automated resolutions on SA [17]. This paper analyzes the SA probes as a function of 3 aiding conditions, 3 question time frames, and 2 question processing categories.

Participants completed 3 blocks of 16 trials. Each block presented only one level of aiding. For the automated and interactive conditions, half of the proposed resolutions were vertical and half horizontal. All participants received the automated condition in the first block due to constraints imposed by the goals of the larger study [17]. Following the first block, participants were trained on how to use the RAT to create or modify conflict resolutions. Half of the participants then received the manual condition followed by the interactive condition, while the other half received the opposite order. Each trial lasted about 3 minutes, with the entire experiment lasting approximately 4 hours. Participants were provided breaks after each block of trials.

During the initial 15 seconds of each trial participants could manipulate the display in any manner, but could not modify the route. The automation-proposed resolution was then displayed in the automated and interactive conditions; pilots could modify this route in the interactive condition. No automation resolution was shown in the manual condition, but the pilots used the RAT to create a resolution. For all aiding conditions, pilots had a 90-second window in which to examine, modify, or create a resolution. Pilots terminated this window by executing the resolution. Pilots were encouraged to switch their display between 2D and 3D views when analyzing the conflict to fully visualize the possible resolutions, especially those generated by the automation. Transitioning between 2D and 3D views prevented the pilots from making perceptual errors in differentiating vertical from horizontal resolutions.
The pilots were also trained to use a number of decluttering features and could zoom in or out of the situation by varying the horizontal and vertical range.
At the end of each trial, pilots were given three SA questions. It was emphasized that the questions were being used to assess SA. Therefore, participants were NOT to deliberately search for potential information being queried but to focus on the primary task, conflict resolution, during the active trial. The display remained accessible while the probe questions were presented in order to allow pilots to look up information.
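A minimal sketch of how end-of-trial probe administration and latency scoring could be implemented is shown below. The function names and the answer-matching rule are hypothetical; the study's actual experiment software is not described at this level of detail.

```python
import time

def administer_probe(question, correct_answer, get_pilot_response):
    """Present one SA probe after the trial is frozen (display still visible)
    and record both accuracy and response latency."""
    start = time.monotonic()
    response = get_pilot_response(question)          # blocks until the pilot answers
    latency_s = time.monotonic() - start
    return {
        "question": question,
        "correct": response.strip().lower() == correct_answer.strip().lower(),
        "rt_s": latency_s,
    }

# Example usage for the three probes given at the end of a trial
# (prompt_pilot is a hypothetical input routine):
# probes = [("What is your current heading?", "270"), ...]
# trial_results = [administer_probe(q, a, prompt_pilot) for q, a in probes]
```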
3 Results

Responses to the SA questions were reviewed for anomalies and missing values, and three comprehension questions were excluded as unscorable because definitive answers could not be determined from the information available on the display at the end of the trial. The review also uncovered missing data points (4%) due to computer error and incomplete trials. These data were replaced with unweighted means calculated from the responses for each individual participant for the specific condition. No participant was missing all responses for any condition.

3.1 Analysis of Correct Responses to SA Probes

The percent of correct responses to the SA questions was analyzed in a 3 (aiding condition: automated, manual, interactive) x 3 (time frame: past, present, future) x 2 (process: recall, comprehension) within-subjects analysis of variance (ANOVA). Means and standard deviations are presented in Table 1. The main effect of aiding condition was not significant, F(2, 22) = 1.37, p = .27. Participants answered a similar number of the questions correctly in the automated (73%), manual (72%) and interactive (69%) conditions.

Table 1. Means and standard deviations for percent correct responses and reaction times (in seconds) to SA questions

                     % Correct     RT
Aiding Condition
  Automated          73 (8)        27.17 (9.01)ab
  Manual             72 (8)        18.20 (5.60)a
  Interactive        69 (10)       19.62 (6.33)b
Time Frame
  Past               78 (7)a       18.12 (4.40)a
  Present            78 (9)b       17.38 (4.52)b
  Future             58 (8)ab      29.49 (8.33)ab
Process
  Recall             80 (7)a       19.07 (5.79)a
  Comprehension      63 (10)a      24.26 (5.50)a

Note. Means in the same column within a condition that share a subscript differ at p < .05.
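The missing-data replacement and the cell means entering the 3 x 3 x 2 ANOVA described above can be sketched as follows. The file and column names are assumptions for illustration; they are not the authors' actual data coding.

```python
import pandas as pd

# Hypothetical long-format data: one row per probe response.
df = pd.read_csv("sa_probes.csv")   # assumed columns: pilot, aiding, time_frame, process, correct, rt

cell = ["pilot", "aiding", "time_frame", "process"]

# Replace a missing response with the unweighted mean of that pilot's remaining
# responses in the same condition cell, as described in the text.
for col in ["correct", "rt"]:
    df[col] = df.groupby(cell)[col].transform(lambda x: x.fillna(x.mean()))

# One mean per pilot per cell; the within-subjects ANOVA is then run on these means.
cell_means = df.groupby(cell, as_index=False)[["correct", "rt"]].mean()
print(cell_means.head())
```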
There was a significant main effect of time frame on percent correct, F(2, 22) = 71.29, p < .001. Post-hoc comparisons performed using the Bonferroni adjustment for multiple comparisons showed that significantly fewer questions were answered correctly for future (58%) than for past (78%) or present (78%) queries, p < .001. The low accuracy for future questions may be a result of pilots not needing to know the “big” picture (as compared to air traffic controllers). In addition, the RAT automatically provided information about potential conflicts along proposed routes. If pilots focused their attention solely on the immediate task of resolving a conflict, processing of information about potential future traffic events may have been prevented.

A main effect of process on percent correct was also found, F(1, 11) = 31.18, p < .001. Accuracy was lower for comprehension (63%) than for recall (80%) questions. This is consistent with the idea that the information required by recall queries was readily available from memory or the display, whereas the comprehension queries required a more in-depth understanding of the situation.

There was a significant interaction between time frame and process, F(2, 22) = 28.05, p < .001. More recall queries were correctly answered than comprehension queries in the past and present time frames, but there was no difference in recall and comprehension accuracy for the future queries (see Figure 2). The low accuracy for future recall questions is surprising because the information needed to answer the query could have been found directly on the display.
Fig. 2. Mean percent correct responses for SA categories
3.2 Analysis of Response Latencies to SA Probes

A 3 x 3 x 2 ANOVA was performed on the response time (RT, in seconds) to SA queries (see Table 1). The p-values were adjusted using the Greenhouse-Geisser correction for violations of sphericity, where appropriate. There was a main effect of aiding, F(2, 22) = 8.90, p < .001: RT was longer in the automated (27.17 s) condition than in the manual (18.20 s) and interactive (19.62 s) conditions, ps < .01, which, in turn, did not differ, p = 1.00. The high RTs in the automated condition support the notion that factors such as automation complacency or low system transparency have a negative impact on SA. SA appears higher in the interactive and manual conditions, where pilots were engaged in manually modifying or creating new resolutions.

There was also a main effect of time frame on RT, F(2, 22) = 57.17, p < .001. Consistent with the accuracy data, pilots had the most difficulty with future queries. RTs for past (18.12 s) and present (17.38 s) questions did not differ, p > .99, but both were faster than for future (29.49 s) questions, p < .001. Similarly, there was a main effect of process on RT, F(1, 11) = 41.43, p < .001, with faster responses to recall (19.07 s) queries than to comprehension (24.26 s) queries.

The aiding by time frame interaction was significant, F(4, 44) = 4.37, p < .01 (Figure 3, left panel). Response times for past and present questions were significantly faster than for future questions in all aiding conditions (p < .01), and the effect of aiding was greater for the future than for the past or present time frames, p < .001. The time frame by process interaction was also significant, F(2, 22) = 42.40, p < .001. Again, recall queries were responded to more quickly than comprehension queries for past and present queries, p < .001, but the reverse was true for future queries, p < .01 (see Figure 3, right panel).

The SA response latencies for correct and incorrect responses to the SA questions were then reviewed. Although not statistically analyzed, trends in the data show that response times were shorter when a participant answered the question correctly than when questions were answered incorrectly. This implies that response latencies do reflect SA and are not due to a speed-accuracy trade-off.
Fig. 3. Reaction time to SA probe questions as a function of aiding and time frame (left panel) and processing category and time frame (right panel)
Given that the automated condition was always presented first, poorer performance in the automated condition could have been due to a general learning effect. To see whether learning was occurring, performance on the first and second halves of the automated trials was compared. There was no significant difference in percent correct between the first half (M = 71%, SD = 12%) and the second half (M = 77%, SD = 10%) of the automated aiding trials, F(1, 11) = 1.81, p = .21, or in SA probe latencies (M = 26.05 s, SD = 9.86 s for the first half and M = 26.80 s, SD = 10.56 s for the second half), F(1, 11) = .06, p = .82. Thus, it is likely that the aiding effects reported are due to the automation level and not to learning.
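A sketch of the first-half versus second-half comparison reported above, using the same hypothetical data layout as the earlier analysis sketch and assuming a trial-number column. A paired t-test is shown as an equivalent of the one-degree-of-freedom F-test in the text; this is an illustration, not the authors' exact analysis code.

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("sa_probes.csv")                    # hypothetical file, as in the earlier sketch
auto = df[df["aiding"] == "automated"].copy()        # automated block only
auto["half"] = (auto["trial"] > auto["trial"].median()).map({False: "first", True: "second"})

# Per-pilot means for each half, then a paired comparison for each measure.
by_half = auto.groupby(["pilot", "half"])[["correct", "rt"]].mean().unstack("half")
for measure in ["correct", "rt"]:
    t, p = stats.ttest_rel(by_half[(measure, "first")], by_half[(measure, "second")])
    print(f"{measure}: t = {t:.2f}, p = {p:.3f}")
```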
4 Discussion

The goal of this study was to examine SA across two levels of automation and a fully manual condition. SA queries were designed to probe pilot recall of information, as well as comprehension of information about past, present, and future events in the scenarios. There were no overall differences between aiding levels in SA probe accuracy, but there was a difference in SA response latencies. This finding implies that response time may be a more sensitive index of SA than response accuracy. Response times were faster in the interactive and manual conditions than in the automated condition, suggesting that fully automated conflict resolution does not support SA equally well. The diminished SA in the automated condition may
have been due to complacency, perhaps exacerbated by the lack of transparency in the flight deck implementation of the auto-resolver. The relatively better SA in the interactive and manual conditions implies that conflict resolution systems may profit from keeping the human operator actively engaged in the task. However, the lack of a difference between the manual and interactive conditions shows that conflict resolution automation can be implemented without a cost to SA.

Unlike aiding, probe accuracy varied as a function of time frame. Pilots scored lower on probes for information about the future status of an event than on probes for present and past information. It is possible that the 90-second time pressure of the conflict resolution task required pilots to focus more on the immediate traffic situation. We speculate that pilots have a more egocentric awareness and do not require the “big picture” of expert controllers. In addition, the increased frequency of response errors to future questions may have been due to limited access to information that would otherwise provide more clues about future traffic events. Although pilots had access to information on the display while responding to the SA probes, the traffic in the scenarios was frozen. Therefore, pilots were not able to access additional information that could have been obtained if a traffic event had continued. Consistent with the results for the frequency of correct responses, analysis of response times by time frame revealed slower response times for future questions than for past and present questions. Patterns in the data reveal that latencies were greater when accuracy scores were lower; thus the data do not reflect a speed-accuracy trade-off but imply that the response latencies reflect SA [13].

In general, an operator can experience diminished SA when control of system state changes is delegated either to automation or to other human operators [8]. Future studies will assess team SA in scenarios where air traffic management responsibilities are coordinated across automation, pilots, and controllers.

Acknowledgement. This simulation was partially supported by NASA cooperative agreement NNA06CN30A.
References 1. Joint Planning and Development Office: Next Generation Transportation System: Concept of Operation V 2.0. Washington D.C: Government Printing Office (2007) 2. Parasuraman, R., Wickens, C.D.: Humans: Still vital after all these years of automation. Human Factors 50, 511–520 (2008) 3. Lee, P.U., Mercer, J.S., Martin, L., Prevot, T., Shelden, S., Verma, S., Smith, N., Battiste, V., Johnson, W., Mogford, R., Palmer, E.: Free Maneuvering, Trajectory Negotiation, and Self-Spacing Concept In Distributed Air-Ground Traffic Management. In: USA/Europe Air Traffic Management R&D Seminar, Budapest, Hungary (2003) 4. Erzberger, H.: Automated Conflict Resolution For Air Traffic Control. In: Proceedings of the 25th International Congress of the Aeronautical Sciences, Hamburg, Germany (2006) 5. Homola, J.: Analysis of Human and Automated Separation Assurance at Varying Traffic Levels. Master’s Thesis, San Jose State University, San Jose, CA (2008) 6. European Air Traffic Management Programme: The Development of Situation Awareness Measures in ATM Systems. HRS/HSP-005-REP-01 (2003)
7. Endsley, M.R.: Measurement Of Situation Awareness In Dynamic Systems. Human Factors 37(1), 65–84 (1995) 8. Parasuraman, R., Sheridan, T.B., Wickens, C.D.: A Model For Types And Levels of Human Interaction with Automation. IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans 30(3), 286–296 (2000) 9. Endsley, M.R., Bolte, B., Jones, D.G.: Designing for Situation Awareness: An Approach to User-Centered Design. Taylor & Francis, New York (2003) 10. Mark, G., Kobsa, A.: The Effects Of Collaboration And System Transparency. Presence 14(1) (2005) 11. Duggan, G.B., Banburry, S., Howes, A., Patrick, J., Waldron, S.M.: Too Much, Too Little Or Just Right: Designing Data Fusion For Situation Awareness. In: Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting, pp. 528–532 (2004) 12. Strybel, T.Z., Vu, K.-P.L., Kraft, J., Minakata, K.: Assessing The Situation Awareness Of Pilots Engaged In Self Spacing. In: Proceedings of the Human Factors and Ergonomics Society 52nd Annual Meeting, pp. 11–15 (2008) 13. Durso, F.T., Bleckley, M.K., Dattle, A.R.: Does Situation Awareness Add To The Validity Of Cognitive Tests? Human Factors, 721–733 (Winter 2006) 14. Pierce, R.S., Vu, K.-P.L., Nguyen, J., Strybel, T.: The Relationship Between SPAM, Workload, and Task Performance on Simulated ATC Task. In: Proceedings of the Human Factors and Ergonomics Society 52nd Annual Meeting, pp. 34–38 (2008) 15. Granada, S., Dao, A.Q., Wong, D., Johnson, W.W., Battiste, V.: Development And Integration of a Human-Centered Volumetric Cockpit Display for Distributed Air-Ground operations. In: Proc.12th International Symposium on Aviation Psychology, Oklahoma City, OK (2005) 16. Canton, R., Refai, M., Johnson, W., Battiste, V.: Development And Integration Of HumanCentered Conflict Detection And Resolution Tools For Airborne Autonomous Operations. In: Proc.12th International Symposium on Aviation Psychology, Oklahoma City, OK (2005) 17. Battiste, V., Johnson, W.W., Dao, A.-Q., Brandt, S., Johnson, N., Granada, S.: Assessment of Flight Crew Acceptance of Automated Resolution Suggestions and Manual Resolution Tools. In: 26th International Congress of the Aeronautical Sciences, Anchorage, AK (2008)
Appendix A: Sample Situation Awareness Questions

Past Recall: What was your heading at the start of the trial?
Past Comprehension: What was the difference in altitude of Ownship and the Intruder prior to the resolution?
Present Recall: What is your current heading?
Present Comprehension: Currently, what is the difference in altitude between Ownship and the nearest proximal traffic?
Future Recall: What will be your groundspeed at the push point?
Future Comprehension: How much additional time will it take you to reach the next original waypoint, given the executed resolution?
Separation Assurance and Collision Avoidance Concepts for the Next Generation Air Transportation System

John P. Dwyer1 and Steven Landry2

1 Boeing Research & Technology
2 School of Industrial Engineering, Purdue University
[email protected]
Abstract. A review was conducted of separation assurance and collision avoidance operational concepts for the next generation air transportation system. The concepts can be distributed along two axes: the degree to which responsibility for separation assurance and collision avoidance is assigned to the controller versus the pilot(s), and the degree to which automation augments or replaces controller and pilot functions. Based on an analysis of the implications of these concepts from a human factors standpoint, as well as the technological readiness of the concepts, it appears that some form of supervisory control of separation by controllers is the most viable concept.

Keywords: Air traffic control, separation assurance, automation, roles and responsibilities.
1 Introduction

To accommodate two to three times current air traffic demand, significant changes to the roles and responsibilities of air traffic controllers and pilots must occur. Cursory observation of the system, discussions with controllers, and experimental studies confirm this statement. Specifically, experimental work by the NASA Airspace Operations Lab (AOL) has demonstrated that beyond about 1.5 times current-day traffic density, air traffic controller performance at the primary function of assuring proper separation declines precipitously [19]. Under higher capacity scenarios, therefore, the allocation of roles and responsibilities for separation assurance must change.

An analysis of options for separation assurance roles and responsibilities has been conducted. Implications of these different concepts, including the likely impact on separation assurance situation awareness, the likely impact on workload, and feasibility impacts, are discussed in this document. In addition, several specific, existing concepts are examined in detail. Experimental work is underway to provide empirical evidence to support the impact estimates.
2 Separation Assurance Concepts

There are a number of plausible changes to separation assurance roles and responsibilities. First, the separation assurance function may be shared between pilots
and controllers, without support from new automation. Second, some form of automation may aid or replace the controller’s function. Third, on-board aircraft automation may carry out a separation assurance function or aid the pilot in doing so. Finally, some combination of these options may be used. Each of these general concepts is outlined in the next section, followed by a review of several tested concepts.

2.1 Pilot/Controller Shared Separation Assurance (No New Automation)

In one sense, the current air traffic control system already shares responsibility for safe operation between controllers and pilots. Air traffic controllers provide separation, keeping aircraft apart by a procedurally mandated distance. Collision avoidance is provided by an Airborne Collision Avoidance System (ACAS) on board the aircraft, supplemented by “see-and-avoid” – pilots visually scanning for potential collisions. The two systems work independently from one another, with no required coordination between them, although pilots may indicate to controllers when they are following ACAS avoidance maneuvers. However, while providing additional safety, ACAS is not sufficient to increase the capacity of the air traffic management system since it has no direct impact on the controller’s function. ACAS is a backup system and makes no contribution to procedurally mandated separation. It instead aids in the prevention of collisions, typically after proper separation has already been lost.

To improve capacity, pilots must share in the separation responsibility currently allocated solely to controllers. The most aggressive form of this strategy is “free flight,” in which pilots assume primary separation assurance responsibility [1]. However, no free flight concept has been advanced that does not also require new on-board automation, such as a cockpit display of traffic information (CDTI) or an advanced ACAS [3, 23]. For this approach, the CDTI would display all traffic within the selected range of the instrument, and would likely be integrated with an advanced ACAS.

Current ACASs work well as tactical collision avoidance backups, but lack capabilities required to perform separation assurance. ACASs work by interrogating the transponders of nearby aircraft and calculating relative positions and closure rates. If the system detects an excessive closure rate, it alerts the pilot. The most severe level of alert also includes a resolution maneuver, which can be coordinated with the intruding aircraft (if that aircraft is properly equipped). However, current ACAS has no capability to resolve multiple-aircraft intrusion problems, does not consider strategic needs (such as the requirement to return to course), and does not work well in high-density airspace (e.g., the terminal area). For these reasons (among others), it would have to be radically modified to perform well as a primary separation assurance tool.

We know of no concept that, in the absence of advanced automation, would augment the controller’s separation assurance function by shifting some responsibility to the flight deck. Moreover, it seems unlikely that, given current air crew workload, crews could shoulder more of this burden without automation assistance. In short, this change in responsibility would likely transfer situation awareness from the controller to air crews at the cost of significantly increasing their workload.
In addition, under future demand scenarios, this concept would seemingly not reduce the controllers’ workload to an acceptable level. In the remainder of this document, this concept will be referred to as “Shared Responsibility – No Automation.”
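As an illustration of the closure-rate logic described above, the sketch below projects relative motion to the closest point of approach and raises an alert when that point is both imminent and close. The thresholds are placeholders, not actual ACAS/TCAS parameters, and real systems add altitude logic, maneuver coordination, and many special cases.

```python
import math

def closure_alert(own_pos, own_vel, intr_pos, intr_vel,
                  alert_window_s=40.0, protect_radius_nm=1.0):
    """Return True when the intruder is converging, will reach its closest point
    of approach within the alert window, and would pass inside the protection
    radius (horizontal plane only; thresholds are illustrative)."""
    rx, ry = intr_pos[0] - own_pos[0], intr_pos[1] - own_pos[1]      # relative position (NM)
    vx, vy = intr_vel[0] - own_vel[0], intr_vel[1] - own_vel[1]      # relative velocity (NM/s)
    rel_speed_sq = vx * vx + vy * vy
    if rel_speed_sq == 0.0:
        return False                                                  # no relative motion
    tau = -(rx * vx + ry * vy) / rel_speed_sq                         # time of closest approach
    if tau <= 0.0 or tau > alert_window_s:
        return False                                                  # diverging, or too far in time
    miss_nm = math.hypot(rx + vx * tau, ry + vy * tau)
    return miss_nm < protect_radius_nm
```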
2.2 Air Traffic Control Automation

Since automation appears to be the only solution to expanding the capacity of controlled airspace to the extent desired, one option is to augment or replace the controller’s separation assurance function with automation. Such a concept is being considered and developed by NASA under the Advanced Airspace Concept (AAC) [8]. (Other concepts are being developed but are further behind in terms of technology readiness.) One possible implementation of AAC uses only a centralized, air traffic control-based automation system. Such an implementation involves little change to pilot roles, so it would have minimal to no direct impact on their situation awareness and workload. Under this implementation [9], an automated system would detect likely future losses of separation up to 15 minutes prior to the event. Another part of the automated system (the “autoresolver”) would then identify a trajectory that would resolve the (primary) conflict. This system would look strategically at conflicts, incorporating intent (via the flight plan) and ensuring that secondary conflicts do not occur when resolving a primary conflict. A shorter separation-horizon tactical system would provide an independent backup for unresolved short-range conflicts, using only projected current states and not considering secondary conflicts. The concept then diverges into several possibilities regarding the alerting and resolution functions.

One possibility is that the system only identifies the possible separation problem to the controller, who would then be responsible for resolving it. Considerable work has been done on automation that can identify conflicts [2, 22], although no system is sufficiently ready that it could be fielded in the near term. Such a system essentially replaces (or augments) the conflict detection role of the controller but leaves the controller’s conflict resolution role unchanged. However, given that it is both detection and resolution that proved difficult for controllers under high capacity scenarios [1], such a concept seems insufficient due to excessive workload for controllers. This concept will be referred to as “Conflict ID Only.”

A second possibility is that the system identifies the conflict, and the controller uses an automated tool to manually identify a resolution. This concept replaces/augments the conflict detection role and augments the conflict resolution role of the controller. Such a concept is being investigated empirically, as will be discussed toward the end of the paper. This additional conflict resolution automation, while utilizing the same conflict detection system described above, would pose a currently unresolved human factors challenge regarding the display of this function. It also seems that the workload associated with using the tool would quickly exceed the capability of the controller should multiple short-term conflicts arise simultaneously or in very high-density situations. Nonetheless, via the highly integrated controller role, this concept would seem to retain a great degree of controller situation awareness, whereas more completely automated concepts would begin to reduce it. This concept will be referred to as “Conflict ID with Resolution Tools.”

A third possibility is that the system identifies the conflict and at least one resolution to the controller, who would have the option of accepting or modifying the resolution. This concept is similar to the previous one, except that the controller is under no obligation to develop a resolution (i.e., the controller can simply accept the automation’s resolution). Given its possibly low impact on controller situation
awareness, along with its reduction in workload compared to the less-automated approaches described above, this option is being pursued as the current primary candidate for an operational concept, and will be referred to as “Conflict ID and Resolution Option.” This concept includes relieving the controller of responsibility for losses of separation, and instead places that responsibility on the automation.

A fourth possibility is that the system identifies the conflict and computes a resolution that is automatically implemented, without controller intervention. Such a system would significantly reduce controller workload. This concept is also being seriously pursued, although the significant impact on the situation awareness of the controller is problematic. In addition, it is expected that such a concept is impractical unless the automation can be demonstrated to be 100% reliable, which at the current time seems unlikely. This concept will be referred to as “Conflict ID and Automatic Resolution.”

This “Conflict ID and Automatic Resolution” option can be modified in a number of ways. For example, the responsibility for separation can be shared with the flight deck (to be discussed in the next section). Alternatively, the controller can be made responsible for some of the aircraft in the sector (including any of the other air traffic automation options), while automation handles the remaining aircraft. This “segmentation” can be on the basis of a number of factors, including equipage, time to conflict, and “nominal” versus “off-nominal” groupings. Some means of distinguishing aircraft under the control of the automation from those under the control of the air traffic controller (such as dimming the datatags of those handled by automation) would need to be defined. This idea is particularly appealing for the more highly automated concepts, as it holds the possibility of recapturing some of the controller situation awareness lost under those concepts.

In the near term, many aircraft in the airspace system will not be equipped with the technologies required for the automation schemes discussed above to be effective [18]. In a basic implementation of any of the above concepts, all aircraft currently controlled under radar can be tracked and resolved. However, it is likely that additional technologies, such as a satellite-based positioning system, a position and intent broadcast system (such as Mode S or ADS-B), data communications, and advanced Flight Management System (FMS) computers, would be helpful for increasing the capacity of the system under high-density scenarios. One option is therefore that automation would resolve conflicts for equipped aircraft, while controllers handled unequipped aircraft.

Another proposal is to segment aircraft on the basis of time to loss of separation (LOS). At present, it seems likely that AAC will not be able to resolve 100% of conflicts. In particular, controllers may be required to resolve conflicts with either a short time to LOS (inside of a few minutes, so as not to rely on tactical backups) or a time to LOS longer than that handled by the strategic autoresolver. Such aircraft would be the responsibility of the controller, who may use automation tools to assist with resolutions. Another concept is that responsibility for “off-nominal” aircraft, such as emergencies or other high-priority aircraft, may be shifted to the controller.
Such a concept was investigated, but controllers found substantial difficulty in accomplishing this task without resolution automation assistance. Even so, comments by the participant controllers in the experiment seemed to suggest that they were not capable
of understanding whether computer-based resolutions were accurate or not, making their role in such a control loop of questionable benefit.

2.3 Flight Deck Automation

A large number of flight deck automation concepts have been proposed [3, 11, 10], although the distinction between these concepts is mainly captured by the algorithms used by an airborne separation assurance system (ASAS). These concepts are typically referred to as “Distributed Control” concepts. They have the desirable feature of reducing controller workload, although this reduction comes at the expense of much lower controller situation awareness. Unfortunately, no mature concept has been proposed. However, one can speculate that aircraft with an ASAS could handle separation from each other (and possibly from unequipped aircraft as well). The ASAS would identify conflicts to the pilot, who would handle them in a manner similar to the controller options outlined above (ID only, resolution tools, resolution option, and automatic resolution). For such concepts, the mix of aircraft is particularly problematic. Currently conceived algorithms often require (at least) an ASAS to be on board both aircraft, and often work best when the aircraft are ADS-B equipped. This is similar to the current situation with ACAS, which works best when both aircraft are equipped.

2.4 Combination of Centralized and Distributed Separation Assurance Systems

What seems more likely is that centralized and distributed systems will act in concert to provide both separation assurance and collision avoidance. Certainly some form of collision avoidance will be on board all commercial aircraft, as it is in today’s system. It is unclear as yet, however, what form such distributed systems would take and what function they would serve in a “Mixed Concept.”

One possibility arises from the idea that a future system concept will need to demonstrate that it is at least as safe as today’s system. One way of accomplishing this is to ensure that, in the event of failures, the worst-case mode of the system is today’s system. In other words, the system would need to be able to degrade gracefully from an automated system to a fully manual (i.e., today’s) system. One method of accomplishing this is to ensure that ASASs can “pick up” the separation function should the centralized system fail. ASASs could run in parallel with the centralized system, perhaps deferring to it in normal operation. If the centralized system fails (e.g., as indicated by the loss of a “heartbeat” signal), the ASAS would resolve separation issues. Under conditions of more general failures of the centralized system, the ASASs could be designed to migrate the system from high-capacity (i.e., 3x) to current-level capacity (i.e., 1x), which could then be managed manually by an air traffic controller.

2.5 Pilot and Controller Roles and Responsibilities under Automation Schemes

Under the automation schemes presented, the roles and responsibilities of controllers and pilots would clearly change. The evolution path to these new concepts from today’s system is undefined, as is the effect that the various schemes would have on the performance of pilots and controllers. Empirical research is underway to
answer these questions. As of now, two options appear most likely: automation monitoring, and supervisory control (airspace management). These functions are in addition to the assumption that some responsibility for separation assurance and collision avoidance would remain with the pilot and controller.

Under a typical automation monitoring concept, the controller would track the automation, intervening when it failed to detect or resolve a conflict, and would take over in case the system failed [21]. However, in the present case this seems highly implausible, since the system has been automated precisely because the controller is unable to accomplish the task. Removing the controller from the control loop (radically lowering the controller’s situation awareness) with re-insertion under conditions of automation failure would simply not work. Humans are (in general) poor monitors of reliable automation [20], and the abrupt and unanticipated shift in workload from near zero to extremely high is conducive to very poor performance. Moreover, the controller would most likely be unable to detect system failures for the same reasons. Therefore, a typical automation monitoring concept seems ill-advised.

One variation of the automation monitoring concept attempts to keep the controller “in the loop.” Using “conflict ID and resolution option” and “segment by equipage,” controllers would be shown only those aircraft that are expected to conflict. (Other aircraft would be “grayed out” to reduce clutter.) Automation would identify the conflict and a resolution to the controller, who can choose to implement the resolution or not address the conflict immediately. If the conflict is left unresolved, the automation would transmit the resolution to a properly equipped flight deck once a certain time to loss of separation was reached. Controllers would not be responsible for losses of separation; that responsibility would remain with the automation. However, controllers would be responsible for communicating resolutions to unequipped aircraft. This concept has been tested at NASA with substantial success.

Under one supervisory control concept, controllers would “manage” the airspace by setting global flow factors for the sector, such as separation minima and acceptance rates along sector routes. The automation would control the traffic subject to these constraints. When off-nominal events (such as adverse weather) were likely to impact (or were currently impacting) the sector, the controller would “throttle back” the traffic using these settings. Such a concept would keep the controller aware of the conditions in the sector and keep workload at a reasonable level. However, controller situation awareness of the actual traffic could be severely reduced, making it impossible to detect automation errors or to take over in case of system failure.

Another supervisory control concept arises out of the observation that controllers currently mitigate many conflicts, whereas proposed automation merely detects and intervenes. Controllers intervene not only to resolve projected conflicts (or near conflicts), but also to ensure that there is sufficient time to intervene should some off-nominal event occur. For example, controllers might place an aircraft that is climbing to one thousand feet below a second aircraft’s altitude on a parallel, rather than intersecting, course, just to be sure that, should the aircraft fail to level off at the assigned altitude, it will not rapidly pose a LOS or even collision danger.
Such strategies are extremely common for controllers, and suggest a possible supervisory control role. Controllers might monitor near, but not actual, LOS aircraft pairs (which would be resolved by the automation), and could instruct the automation to resolve such pairs if the controller felt that there was sufficient likelihood that an
adverse situation could arise quickly in the event of an unfortunately timed off-nominal event. The controller could even be kept apprised of the number of such pairs, and could intervene to “lessen the pressure” on the sector by (for example) reducing capacity, or by creating contingency plans.

2.6 Summary of General Classes of Concepts

Based on the above analysis, a summary of currently proposed automation schemes, along with their technological readiness (with “low” readiness equating to “needing much more development”), impact on controller and pilot situation awareness and workload, and an assessment of their viability, is shown in Table 1. Viable concepts include those that utilize ATC automation, with or without flight deck automation.

Table 1. Taxonomy of separation assurance concepts and projected impact on situation awareness and workload (expected impact comparing the NextGen environment to the current environment)

Concept | Expected technological readiness | Controller situation awareness | Controller workload | Pilot situation awareness | Pilot workload | Concept appears viable?
Shared responsibility - no automation | High | Small reduction | Excessive | Moderate increase | Excessive | No
ATC automation: Conflict ID only | Medium-High | Neutral | Excessive | Neutral | Neutral | No
ATC automation: Conflict ID with resolution tools | Medium | Neutral | Very large increase | Neutral | Neutral | Yes
ATC automation: Conflict ID with resolution options | Medium | Small reduction | Moderate increase | Neutral | Neutral | Yes
ATC automation: Conflict ID with autoresolver | Medium | Large reduction | Moderate reduction | Neutral | Neutral | Yes
Distributed control | Low | Very large reduction | Large reduction | Large increase | Large increase | No
Mixed concepts | Medium-Low | Moderate reduction | Moderate increase | Moderate increase | Moderate increase | Yes
3 Concept Implementations Currently Under Study

Several versions of the concepts described above are under investigation by researchers at NASA. These candidate schemes are versions of the air traffic automation discussed herein. In the three concepts outlined in this section, automation detects aircraft-pair LOS conflicts, calculates candidate conflict resolutions, and can (depending on the extent of air traffic controller involvement) deliver the resulting resolution clearances to flight decks for pilot or FMS implementation.

3.1 Candidate Concept 1: Conflict ID with Resolution Tools

Simulations at NASA have been conducted using an operational concept that provides controllers with automated conflict identification and conflict resolution tools [17]. The simulations included arrival spacing tasks, and controllers used the conflict resolution tools to help manage spacing and resolve conflicts.
Under this concept, controllers had available to them decision support tools for scheduling and trajectory planning. Aircraft were equipped with an FMS and data communications system, an airborne separation assistance system, and a flight deck display of traffic information. Controllers used substantively the same procedures as today, augmented by the above-mentioned decision support tools and communications system. In one version of the concept, voice communication was still necessary to hand off aircraft to the next sector controller, and some of the data communication information was not used by the ground automation. In a subsequent version, these shortcomings were removed.

The results of this simulation indicated that without the ability to uplink trajectories electronically to flight decks, the concept, while manageable, did not provide capacity improvements. With more tightly integrated tools, the system was able to manage a 50% increase in traffic capacity with little workload increase over current-day operations. A subsequent simulation of this concept (with no arrival task) found that operations with 200% traffic were “somewhat manageable,” and 300% traffic was unmanageable [19]. However, these simulations were conducted using 100% equipage with the improved automation. In conditions of less than complete equipage, it would be expected that this increased traffic capacity would be correspondingly lessened.

3.2 Candidate Concept 2: Conflict ID with Resolution Options

Recent simulations at NASA have been using an operational concept that provides controllers with automated conflict identification and a resolution option [19]. The resolution option was displayed when requested by the controller, and could be accepted and uplinked or modified. This concept was implemented in en-route airspace (no arrival task), but otherwise was similar to the configuration for candidate concept 1. A slightly updated controller display was used.

Controller workload was reduced using this concept, which appeared feasible at even the 300% traffic level. Controller acceptance of the automated resolutions was high. However, situation awareness was not tested, nor was system disruption (e.g., system failures); moreover, there was no operational concept for handling system disruptions. Additional simulations were run using this concept at different levels of equipage. The basic finding is that controller workload is driven by the number of unequipped aircraft, and initial results suggest that some unequipped aircraft could be handled under these conditions, but more data analysis is needed to confirm this.

3.3 Candidate Concept 3: Conflict ID with Auto Resolution (with Segmentation)

One last scheme evaluated at NASA is a fully automated concept in which automation closes the loop by electronically transmitting a conflict resolution to the appropriate aircraft [19]. That resolution is then implemented by the pilot or FMS. In such a case, controllers are “out of the loop,” and are only responsible for unequipped aircraft. The results for this concept are virtually the same as for candidate concept 2 (again, situation awareness was not measured). Resolutions have been found to be acceptable to controllers, and workload at 300% traffic levels is acceptable. Tests are underway to evaluate the effect of disruptions on this (and other) concept(s).
Acknowledgments. This research was supported by NASA Ames Research Center under cooperative agreement NNA06CN30A (Walter Johnson, Technical Monitor).
References 1. Andreatta, G., Brunetta, L., Guastalla, G.: From ground holding to free flight: An exact approach. Transportation Science 34, 394–401 (2000) 2. Barhydt, R., Eischeid, T.M., Palmer, M.T., Wing, D.J.: Use of a prototype airborne separation assurance system for resolving near-term conflicts during autonomous aircraft operations. In: Proceedings of the AIAA Guidance, Navigation and Control Conference, Austin, TX, USA, AIAA (2003) 3. Bilimoria, K.D., Sheth, K.S., Lee, H.Q., Grabbe, S.R.: Performance evaluation of airborne separation assurance for free flight. Air Traffic Control Quarterly 11, 85–102 (2003) 4. Consiglio, M.C., Carreno, V., Williams, D.M., Munoz, C.: Conflict prevention and separation assurance in small aircraft transportation systems. Journal of Aircraft 45, 353 (2008) 5. Consiglio, M.C., Hoadley, S., Wing, D., Baxley, B.: Safety performance of airborne separation: Preliminary baseline testing. In: Proceedings of the 7th AIAA Aviation Technology, Integration and Operations Conference, Belfast, Ireland (2007) 6. Dowek, G., Munoz, C., Carreno, V.: Provably safe coordinated strategy for distributed conflict resolution. In: Proceedings of the AIAA Guidance Navigation, and Control Conference and Exhibit, San Francisco, California (2005) 7. Eby, M.S., Kelly III, W.E.: Free flight separation assurance using distributed algorithms. In: Proceedings of the 1999 IEEE Aerospace Conference (1999) 8. Erzberger, H.: The automated airspace concept. In: 4th USA/Europe Air Traffic Management R&D Seminar, Santa Fe, NM (2001) 9. Erzberger, H.: Transforming the NAS: The Next Generation Air Traffic Control System. In: 24th International Congress of the Aeronautical Sciences, Yokohama, Japan (2004) 10. Galdino, A.L., Munoz, C., Ayala-Rincon, M.: Formal verification of an optimal air traffic conflict resolution and Rrecovery algorithm. In: Leivant, D., de Queiroz, R. (eds.) WoLLIC 2007. LNCS, vol. 4576, pp. 177–188. Springer, Heidelberg (2007) 11. Hill, J.C., Archibald, J.K., Stirling, W.C., Frost, R.L.: A multi-agent system architecture for distributed air traffic control. In: AIAA Guidance, Navigation, and Control Conference AIAA 2005-6049, San Francisco, CA, pp. 1–11 (2005) 12. Hwang, I., Kim, J., Tomlin, C., McNally, D., Gong, C., Rantanen, E.M., Naseri, A., Neogi, N.: Protocol-Based Conflict Resolution for Air Traffic Control. Air Traffic Control Quarterly 15, 1–34 (2007) 13. Jackson, M.R.C., Sharma, V., Haissig, C.M., Elgersma, M.: Airborne technology for distributed air traffic management. In: Proceedings of the 44th IEEE Conference on Decision and Control, Seville, Spain, pp. 3947–3954 (2005) 14. Johnson, W.W., Battiste, V., Delzell, S., Holland, S., Belcher, S., Jordan, K.: Development and demonstration of a prototype free flight cockpit display of traffic information. In: Proceedings of the 1997 SAE/AIAA World Aviation Conference (1997) 15. Kuchar, J.K., Yang, L.C., Mit, C.: A review of conflict detection and resolution modeling methods. IEEE Transactions on Intelligent Transportation Systems 1, 179–189 (2000) 16. McNally, D., Gong, C.: Concept and Laboratory Analysis of Trajectory-Based Automation for Separation Assurance. Air Traffic Control Quarterly 15, 35–63 (2007)
17. Prevot, T., Battiste, V., Callantine, T., Kopardekar, P., Lee, P., Mercer, J., Palmer, E., Smith, N.: Integrated air/ground system: Trajectory-oriented air traffic operations, data link communication, and airborne separation assistance. Air Traffic Control Quarterly 13, 201–229 (2005) 18. Prevot, T., Callantine, T., Lee, P., Mercer, J., Battiste, V., Johnson, W., Palmer, E., Smith, N.: Cooperative air traffic management: A technology enabled concept for the Next Generation Air Transportation System. In: 5th USA/Europe Air Traffic management Research and Development Seminar, Baltimore, MD (2005) 19. Prevot, T., Homola, J., Mercer, J.: Human-in-the-loop evaluation of ground-based automated separation assurance for NEXTGEN. In: 8th AIAA Aircraft Technology, Integration, and Operations Conference, Anchorage, Alaska (2008) 20. Senders, J.W.: The human operator as a monitor and controller of multidegree of freedom systems. Ergonomics: Major Writings (2005) 21. Sheridan, T.B.: Telerobotics, Automation, and Human Supervisory Control. The MIT Press, Cambridge (1992) 22. Tomlin, C., Mitchell, I., Ghosh, R.: Safety verification of conflict resolution manoeuvres. IEEE Transactions Intelligent Transportation Systems on 2 2, 110–120 (2001) 23. Yang, L.C., Kuchar, J.K.: Prototype conflict alerting system for free flight. Journal of Guidance, Control, and Dynamics 20, 768–773 (1997)
Analysis of Team Communication and Collaboration in En-Route Air Traffic Control

Kazuo Furuta1, Yusuke Soraji1, Taro Kanno1, Hisae Aoyama2, Daisuke Karikawa3, and Makoto Takahashi3

1 Department of Systems Innovation, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
{furuta,soyraji,kanno}@cse.sys.t.u-tokyo.ac.jp
2 Air Traffic Management Department, Electronic Navigation Research Institute, 7-42-23 Jindaiji-higashi-machi, Chofu 182-0012, Japan
[email protected]
3 Department of Management Science and Technology, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba, Sendai 980-8579, Japan
{daisuke.karikawa,makoto.takahashi}@most.tohoku.ac.jp
Abstract. Ethnographic field observation was carried out at the Tokyo Air Traffic Control Center to obtain video, radio communication, verbal conversation, and journal records. By analyzing these data based on the cognitive model of a radar controller from our previous work and the notion of Team Situation Awareness (TSA), we established a cognitive model of an ATC team. The analysis revealed that instantiation of TSA relies heavily on verbal communication, but that role assignment among team members is determined implicitly and smoothly once TSA has been established. The team cognitive process of ATC controllers is therefore well described by TSA development and Naturalistic Decision-Making (NDM).

Keywords: Aviation safety, air traffic control, human factors, team cognitive model, task analysis.
1 Introduction

Due to increasing air traffic demands, it is expected that the workload of Air Traffic Control (ATC) tasks will also increase. Prevention of human errors in ATC is therefore a key issue for maintaining a high level of air traffic safety and reliability. Cognitive aspects of ATC have not yet been studied in depth; for example, no comprehensive and concrete measures of ATC performance have been proposed. Against this background, our research project aims at constructing a cognitive model of ATC and proposing a quantitative measure of ATC performance, ultimately to establish the technological basis of ATC human factors.

This paper focuses on team collaboration within an ATC team in en-route air traffic control, which handles air traffic in the cruising phase of flight. For en-route air traffic control, air routes are divided into several segments of controlling areas
called sectors. An ATC team of two controllers, a radar controller and a coordinator controller, takes charge of each sector. Another controller may join the team to acknowledge flight clearances in a busy sector, but this study considers only an ATC team composed of two members, as shown in Fig. 1. Our previous study showed that the cognitive process of a radar controller can be described well with the concept of a routine [1]. Since an ATC team performs ATC tasks in collaboration, modeling individual cognitive processes is insufficient to understand ATC tasks. This study therefore aims at constructing a cognitive model of the team collaboration process of controllers based on ethnographic field observation of en-route ATC tasks.
Fig. 1. Work situation of en-route air traffic control
2 Field Observation We carried out ethnographic field observation to obtain basic data for analyzing ATC tasks. Data were obtained at the Tokyo Air Control Center from May 7 to 11, 2007, during daytime periods when a certain level of workload was imposed on the controllers due to relatively heavy traffic. Two video cameras were used to record the activities of the controllers, and another one recorded the radar screen of an auxiliary control console showing the same radar image as that seen by the controllers in charge. An IC voice recorder installed above the radar screen recorded conversation between the controllers. In addition, records of radio communication between the radar controllers and pilots, as well as the positions, ground speeds, and altitudes of the air traffic, which were shown on the radar screen and stored in a computer of the control center, were obtained. The target sector of observation is called “Kanto-north” (T03), which covers the northern area of Tokyo. A great deal of air traffic passes through this sector while departing from and arriving at two large international airports, Haneda and Narita, as well as smaller
airports and air bases. For this reason, this sector is well suited for observing various types of en-route ATC tasks.
3 Data Analysis and Results The obtained records of the controllers’ conversation and actions were transcribed, segmented according to units of basic ATC instruction, and then analyzed by goal-means task analysis [2] or distributed cognition analysis [3]. The aim of the analysis is to understand the cognitive process of en-route air traffic controllers, in particular the collaboration between the radar and coordinator controllers, and the coordination with neighboring sectors carried out by the coordinator. Since the cognitive process of a radar controller can be described with a routine, the process by which the controller team formed a shared recognition of a common routine was the focal point of team collaboration. We carried out this analysis relying on a notion of Team Situation Awareness (TSA) that is defined based on mutual beliefs [4]. The example cases described below show how an ATC team develops TSA in a specific ATC situation through verbal or non-verbal communication. These cases are based on a situation that actually occurred over about 15 minutes starting at 14:45 on May 7, 2007, at the Tokyo Air Control Center.
Fig. 2. Situation of air traffic in Case 1
3.1 Case 1 In the first case, six aircraft bound for Narita airport were about to enter Sector T03 from the north at almost the same time (Fig. 2). The controllers had to hand off
traffic to Narita Approach at FL150 with a separation of more than 10 NM in trail. Both the radar and the coordinator controller recognized the situation of the six aircraft independently from the flight plans and the display on the radar screen. The controllers communicated with each other to share situation awareness and the spacing strategy to be used. This process is shown in the protocol of Table 1. They discussed verbally in what order the descending aircraft should be lined up in trail. From the contents of the communication and the positions and speeds displayed on the radar screen, they shared an understanding of the situation. At this early stage, since the aircraft were still flying within the neighboring sector, T02, the controllers of Sector T03 were not allowed to control them directly. They wanted, however, to control the aircraft before they entered Sector T03 in order to achieve smooth spacing. The coordinator proposed requesting an early hand-off from Sector T02 and suggested its proper timing. In this process the ATC team decided the arrival order of the six aircraft and decided to request early hand-off of four of them from Sector T02. At the end of the process the coordinator started to request the early hand-off of JAL3042 from the coordinator of Sector T02.

Table 1. Team communication in Case 1 (14:45-14:47)

Speaker       Contents of communication
Radar         Consideration on the order of arrival while watching the radar screen
Coordinator   Proposal of early hand-off from Sector T02
Radar         Presenting intention to keep vertical separation with an aircraft bound for Haneda
Radar         Consideration on the order of arrival of the aircraft coming from the northeast
Coordinator   Presenting opinion on the order of arrival
Radar         Agreement with the coordinator’s opinion, but presenting concern about changes in speed
Coordinator   Speed monitoring and proposal of 15 NM separation in trail
Coordinator   Confirmation of the 4th and 5th aircraft to arrive
Coordinator   Rearrangement of flight data strips according to the agreed order of arrival
Radar         Statement that they should already start spacing if 15 NM separation is to be adopted
Coordinator   Suggestion of early hand-off from Sector T02
Radar         Request for coordination for early hand-off of three aircraft
Radar         Presenting intention on control strategy
Coordinator   Presenting decision to request hand-off of JAL3042
3.2 Case 2 Having finished coordination of the early hand-off with Sector T02, the coordinator performed his own part of the ATC tasks while sharing situation awareness with the radar controller. Here the coordinator drew the radar controller’s attention to traffic that might interfere with the aircraft under control. In addition, they discussed the means of control instruction. The radar controller communicated his own thinking process to
the coordinator before issuing control instructions to the pilots. This action contributed to a shared understanding of the situation between the controllers. Since wind would affect the flight paths of the aircraft in this situation, the controllers had to adjust their predictions of the future while watching the situation. Sharing the strategy and the means to be used is very important for achieving such adjustment smoothly, because delay in issuing control instructions will cause subjective difficulties in ATC tasks, such as an increase in workload. In this process the coordinator proposed coordination with Sector T02 on the heading and altitude of the aircraft under control, and the communication was the process of forming consensus on the overall spacing strategy. Thereafter the radar controller issued control instructions to the pilots following the agreed strategy, and the coordinator conducted coordination with Sector T02. Having finished their individual tasks separately, they achieved smooth spacing of the target aircraft. From the viewpoint of distributed cognition, the sharing of situation awareness and the spacing strategy, as well as a proper division of tasks among team members, led to successful achievement of the ATC tasks.

Table 2. Team communication in Case 2 (14:48-14:51)

Speaker       Contents of communication
Radar         [Having heard coordination of the hand-off of aircraft flying in the area south of JAL3042] Acknowledgement
Coordinator   Drawing attention to the aircraft to be kept in mind
Radar         Identification of an aircraft with no relation to the targets
Coordinator   Concern about change in speed
Radar         Consideration on the means of separation
Coordinator   Proposal of a heading request to Sector T02
Radar         Disagreement with the proposal, explaining the relation with another aircraft that might interfere
Coordinator   Agreement
Coordinator   Proposal to change the heading of ANA736
Radar         Agreement on directing ANA736 to AY (Kumagaya)
Coordinator   Proposal of destination and tentative altitude
Radar         Agreement
Coordinator   Confirmation of tentative altitude
Radar         Agreement
3.3 Case 3 Having been allowed to instruct all of the aircraft to be spaced, the ATC team communicated to follow the changing situation so that they could maintain the common spacing strategy. The communication shown in Fig. 3 represents this process. The radar controller requested that the coordinator carry out coordination with Sector T02 on the heading of an aircraft there in order to follow the strategy they had adopted. The coordinator responded by starting the coordination immediately. Some changes in the means adopted for separation were necessary, but the controllers responded properly by monitoring the situation and by communicating frequently.
14:50:55            (with other sectors) From T03 to T02: Request directing ANA736 to AY (Kumagaya) and changing altitude to FL200. From T02 to T03: Acknowledgement.
14:51:43 (1 sec)    (radar-coordinator) Coordinator: Informs of the details of the coordination.
14:51:50 (1 sec)    (radar-coordinator) Radar: Notes that there is little room left for modifying the strategy. Coordinator: Agreement.
14:52:50 (15 sec)   (radar-coordinator) Radar: Mental simulation of another strategy option. Coordinator: Infers the radar controller’s simulation and considers the feasibility of the option. Radar: Confirmation that it is too late to adopt another strategy. Coordinator: Agreement.
14:53:05            (with other sectors) From T01 to T03: Request hand-off. From T03 to T01: Acknowledgement.
                    (radar-coordinator) Radar: Acknowledgement of the contents of the coordination.
14:53:35 (11 sec)   (radar-coordinator) Radar: Requests coordination with Sector T02 to guide ACA001 to the west side of the route. Coordinator: Confirmation of the appropriate heading.
14:53:58 (6 sec)    (with other sectors) From T03 to T02: Request heading ACA001 to 240. From T03 to T01: Acknowledgement.

Fig. 3. Sequence of communication and coordination in Case 3 (14:50-14:54)
4 Discussion We revealed the role of the coordinator controller and the features of teamwork in en-route ATC from the data obtained by ethnographic field observation.
Establishing TSA of the air traffic in the target sector is the most important task for smooth collaboration within an ATC team. If TSA can be obtained from information in the environment that is observable by both team members independently, such as the radar screen and flight data strips, verbal communication is unnecessary for obtaining TSA. Such a case, however, applies only to relatively simple situations. In more complicated situations, such as those shown in the case study, more active communication is used to develop TSA, where one member explicitly states his or her own awareness of the situation. Once the team has established TSA, they usually decide the task assignment implicitly and then proceed to task execution. The radar and coordinator controllers distribute tasks between them in this phase, and it seems that they have agreed on the task allocation beforehand in the case of complicated situations or heavy traffic. This suggests that the Naturalistic Decision-Making (NDM) [5] model can apply to the distributed cognitive process in the team collaboration of the controllers. In NDM, recognition of the situation leads directly to decision making for the recognized situation without assessing and comparing many options. A routine, which is the cognitive model of a radar controller, also plays an important role in the NDM of an ATC team. This is because both the radar and the coordinator controller, who are trained and qualified in common, are well aware of the same routines. In summary, team collaboration in en-route ATC can be well described by a model in which an ATC team develops TSA based on commonly shared routines and then makes decisions following the NDM model based on the obtained TSA.
5 Conclusion We analyzed the data obtained by field observation at the Tokyo Air Control Center based on the cognitive model of a radar controller and a notion of TSA. It has been shown that an ATC team actively tries to shape and share TSA through verbal communication and that the team members decide task assignment smoothly and implicitly once they have established TSA. From these findings, the team cognitive process of controllers can be modeled as TSA development and NDM. Acknowledgements. The authors wish to express their thanks to the Japan Railway Construction, Transport, and Technology Agency, which supported this work under the Program for Promoting Fundamental Transport Technology Research.
References 1. Inoue, S., Aoyama, H., Kageyama, K., Furuta, K.: Task Analysis for Safety Assessment in En-Route Air Traffic Control. In: Proc. 13th Int. Symp. on Aviation Psychology, Oklahoma, USA, pp. 253–258 (2005) 2. Hollnagel, E.: Human Reliability Analysis, Context and Control, pp. 220–228. Academic Press, London (1993)
3. Artman, H., Garbis, C.: Situation Awareness as Distributed Cognition. In: Proc. Euro. Conf. Cognitive Ergonomics, pp. 151–156 (1998) 4. Shu, Y., Furuta, K.: An inference method of team situation awareness based on mutual awareness. Int. J. Cognition, Technology, and Work 7(4), 272–287 (2005) 5. Klein, G.: The Recognition-Primed Decision Model: Looking Back, Looking Forward. In: Zsambok, C.E., Klein, G. (eds.) Naturalistic Decision Making, pp. 285–292. Lawrence Erlbaum, Mahwah (1997)
Comparison of Pilot Recovery and Response Times in Two Types of Cockpits Vishal Hiremath1, Robert W. Proctor2, Richard O. Fanjoy3, Robert G. Feyen4, and John P. Young 3 1
Gulfstream Aerospace Corp., 500 Gulfstream Road, M/S B-16 Savannah, GA 31407, USA 2 Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907, USA 3 Department of Aviation Technology, Purdue University, 1401 Aviation Drive, West Lafayette, IN 47907, USA 4 Department of Mechanical and Industrial Engineering, University of Minnesota − Duluth, 1305 Ordean Court, Duluth MN 55812-3042 [email protected], {rproctor,rofanjoy,jpy}@purdue.edu, [email protected]
Abstract. According to general aviation manufacturers, all aircraft rolling off the assembly line are or will be equipped with next-generation electronic flight instrument cockpits, called ‘glass’ cockpits. Because most pilots were trained with older analog displays, it becomes imperative to find out what human factors issues the pilots will encounter when they transition to glass displays. A comparative study was carried out in a general aviation aircraft simulator between instrumentation of the type used in conventional and glass cockpits for recovery from unusual attitudes. Glass displays showed longer recovery time than round-dial displays. Low-time pilots judged analog displays as more usable than glass displays. Suggestions are made to design a hybrid display of round dial and vertical tapes as well as examine unusual attitude training methods more closely. Keywords: glass cockpits, displays, digital displays, aviation, analog displays.
1 Introduction Accident rates for advanced-technology aircraft are lower than for comparable conventional aircraft. Nevertheless, pilots, scientists, and aviation safety experts have expressed concerns about flight deck automation: Pilots may place too much confidence in automation, they may lose manual flying skills, and pilot-automation interfaces may be poorly designed [1]. Recent accidents involving advanced-technology aircraft have served to emphasize those concerns. An example of automation error that has implications for the general aviation industry bears noting: the American Airlines accident in Cali, Colombia [2, 3]. A programming error in the aircraft flight management system (FMS) led the aircraft
automation to mistake one navigation beacon for another. This caused a change in heading that went unnoticed by the crew, resulting in impact with the side of a mountain. Unlike many general aviation pilots, these professional pilots flew often and received recurrent training in the use of their equipment. This accident was found to be the result of an aircraft FMS interface error and lack of pilot situational awareness – problems often experienced in commercial and general aviation [4]. Research has suggested that problems in airline pilot transition to glass cockpits may also apply to general aviation aircraft [5]. Between 70% and 80% of aviation accidents can be attributed, in part, to human error [6]. This error frequently reflects pilot confusion due to ambiguous cockpit displays or misinterpretation of automated flight modes. In recent years, research into airline cockpit design has served to make displays more usable. More recently, computerized cockpits with advanced-technology displays have become popular in general aviation aircraft. General aviation is dominated by pilots who fly for recreation, with most having little flight time compared to airline pilots. As new glass cockpits are introduced to general aviation, there are lessons to be learned from similar equipment in airline category aircraft. A particular area where pilots have had problems with glass cockpit instrumentation is the vertical tape display used to show airspeed and altitude. Collins [7] illustrates examples of different kinds of instrumentation such as vertical tapes and graphical navigation displays. During initial training, most pilots used airspeed indicators and altimeters with round-dial presentations. When they transition to more advanced aircraft with linear-tape displays, they may find it difficult to adapt to such displays. The present study compared performance of general aviation pilots when using linear-tape and round-dial displays to identify differences in performance with the two display types.
2 Problem Statement Advances in aircraft instrumentation have led to highly integrated instrument displays. Prior research suggests that general aviation pilots may have difficulty with the interpretation of advanced displays, which can lead to a lack of situational awareness [4]. The problem addressed by the present study was how pilots respond to new glass displays during an emergency such as upset recovery, after learning to fly in older airplanes. This research was conducted in response to increasing sales of integrated glass cockpits in general aviation. Prior research into advanced displays of transport aircraft suggests safety concerns for general aviation counterparts. In this study, we used flight scenarios of unusual attitudes to evaluate this difference in performance, since accident reports show that a majority of fatal general aviation accidents are caused by flight from Visual Meteorological Conditions into Instrument Meteorological Conditions, with resulting spatial disorientation [8]. When pilots lose visual cues they must rely on their instruments to maintain safe flight. Avionics companies claim that simpler, user-friendly displays help pilots maintain situational awareness. The present research addresses that claim.
The current study analyzed whether training in unusual-attitude recovery and basic flight instrumentation will help pilots cope with advanced-technology cockpits. The method was based on those used in several previous studies. Beringer et al. [9] described a method of analyzing different displays during recovery from unusual attitudes and conducted a multivariate analysis to determine whether displays superimposed with terrain images had an effect on pilot performance. Their project evaluated the time for recovery, which ends at the time instant of recovery (the time when the airplane reaches a level flight attitude) plus three seconds. The additional three seconds is to confirm a steady recovery state. The present study emulated this convention. Liggett and Venero [8] tested nine military pilots for response time in three different types of displays during recovery from unusual attitudes. The three displays were a standard 2-dimensional Primary Flight Display (PFD), PFD with an auditory display, and a PFD with a 3-dimensional display. Casner [10] describes an experiment to investigate usability and training issues when teaching students to fly GPS approaches. His methodology used two groups of pilots to fly scenarios using two different learning techniques. The instructor occupied the right seat and controlled the experiment, a procedure used in the present study. Pilots were put into an unusual attitude by the experimenter, from which they had to recover. This was done with both simulated glass and analog displays. The prediction was that time to the initial response (response time) and time to recover (recovery time) would be longer when flying the simulated glass display than with the analog display. This prediction was based on human factors literature mentioned in the above paragraph suggesting that glass displays are not the best way to convey information.
3 Method A research platform was developed and housed in an aircraft shell. Two computers were networked to run in parallel, with two monitors connected to each system. This network allowed identical flight simulation software on the two computers to be linked together, relying on the network feature of X-Plane to move data quickly between the computers. Each computer was connected to two monitors by a cable that split the signal from each computer to display the image on both monitors. One computer was configured to display the instrument panels, and the other to display the external view (see Fig. 1). For the external view, two 19-in. flat CRT monitors were mounted outside the shell, just outside the forward windows. External images were spread across the two monitors to present the simulated outside world visible from the cockpit. For the in-cockpit view, two 17-in. monitors were fitted into the instrument panel, one on each side of the cockpit, to present the simulated electronic and analog displays. Participants were presented with simulated emergency situations requiring recovery from unusual attitudes. Two main forms of data were gathered: (1) pilots’ performance while flying the simulator to recover from the unusual attitudes; (2) information gathered from the pilots regarding their perceptions of the ease and usability of the different instrument configurations. These data were obtained through a post-flight questionnaire.
Fig. 1. Simulator display set-up
The independent variable was type of cockpit display: (1) conventional round-dial (analog) instrumented display or (2) new glass cockpit display with vertical tape altitude and airspeed representation. Recovery time and response time were dependent variables. Recovery time was defined as the time taken by a pilot to recover the aircraft from the simulated unusual attitude to a safe attitude. A safe attitude was attained when the aircraft had returned to level flight and was stabilized at target airspeed. If the aircraft was within +/-5° of bank, +/- 5° of pitch and 90 knots +/-5 knots, the aircraft was considered to be in a recovered state. Recovery time was captured from the data output page generated by the flight simulator software. As soon as all three conditions defining a safe state were satisfied, the corresponding elapsed time was noted and recorded as the recovery time for that trial. Response time was defined as the time taken by the pilot to make the first response, either with the control yoke or the throttle. Response time was captured by flight simulator software that calculated the rate of change of flight controls. Eighteen flight students (12 male and six female) who had earned their private pilot license but had not received instrument rating were recruited as participants. Their mean flying time was 119 hrs, and average age was 21 years. The order in which the participants received the simulated events during the flight profile was randomized so that participants would not be able to anticipate the sequence of maneuvers. All participants flew the scenarios with both displays. Half the participants flew the analog cockpit first and the glass cockpit second, and half flew the glass first and analog second. The specific experimental group to which the participants were assigned was determined randomly. Each participant was exposed to four unusual attitude scenarios: nose high/wings level, nose low/wings level, nose high/steep left bank and nose low/steep right bank. The scenarios were representative of unusual attitudes experienced by general aviation pilots during loss of control events. Each attitude consisted of pre-established pitch (+/-30°) and bank (+/-45°) angles selected by the researcher. Predefined tolerance limits for the upset configuration were +/- 5° pitch and +/- 5° bank. If the aircraft was within these limits it was considered established in the simulated unusual attitude. Initial trials were carried out to validate simulator operation, gather initial data to estimate participant standard deviation, and refine the methodology. Five flight
students volunteered to test fly the simulator and offer feedback. Their responses indicated that the control sensitivity needed to be adjusted. The fidelity of the device was validated by four flight instructors who test flew the simulator to determine how well it replicated flight performance parameters. The Director of Safety of the Aviation Technology Flight Department, an experienced pilot and instructor, flew the simulator and gave feedback on the system. He expressed satisfaction with the handling of the simulator, but noted its limitations with regard to feel and motion. The flight instructors who test flew the simulator agreed that it was adequate to simulate real airplane behavior, consistent with the software manufacturer’s statement that the aerodynamic flight model of the airplane is similar to a real aircraft. Based on the feedback, the simulator software and controls were refined and adjusted to improve similarity to real airplane performance. The initial trials were conducted in the exact same way as the actual experiment was done. Data were collected, and the mean and standard deviation of recovery times were calculated. These were then used to calculate sample size for the main experiment. Participants in the pilot study had, on an average, 200 more hours of flight time than those in the main experimental group. Other factors, like age range and educational experience, were similar. A questionnaire, which asked for demographic data about total flying hours, gender, and age, was given to participants after they completed the simulator trials. Participants were also asked to describe the procedure that they would use to recover from an unusual attitude. This was followed by six questions for which participants responded with a rating of 1 (easy) to 5 (difficult): how easy it was to recognize present (a) airspeed and (b) rate of change of airspeed with a vertical tape airspeed indicator and with a round dial indicator, and how difficult it was to recover using each display indicator. Finally, four open-ended questions asked what features of the glass display they liked and disliked, which design features they would modify, and thoughts about the display color coding that was used. 3.1 Experimental Protocol Step 1. Simulator operation was explained to the participants, who then flew the simulator for 10 min to get familiar with its operation. During this time, they were asked to climb 500 ft with full power and level off. Then they were asked to do a left 30° banking turn to a heading of 90° from the original heading and recover to a wings level state. They were then asked to perform a right 30° banking turn to a 90° heading from the original heading and descend 500 ft with idle power and level off. Finally, a stall entry and recovery was performed. These practice maneuvers were repeated once with each display. Step 2. After the practice flight, the researcher began the trial and asked the participant to accelerate to a speed of 90 knots. He then asked the participant to put on an instrument flying hood provided by the researcher. The participant was asked to release the yoke and locate an airfield on a map. While the participant was looking for the airfield, the researcher took control of the yoke and put the aircraft in an unusual attitude. Step 3. As soon as the aircraft reached the predetermined upset attitude, the researcher released the yoke, which returned to its neutral position. Initial trials indicated that
there was a slight change in aircraft attitude when the yoke returned to the neutral position but that it stayed within the predefined tolerance levels (+/- 5° pitch and bank) for the scenario. At this point the researcher said, “Recover to level flight and 90 knots.” This indicated that the participant was to look up, perceive his or her present attitude, and begin the recovery. At the time the researcher alerted the participant, recovery time measurement was initiated. If the participant did not respond immediately to the instructor, the trial was rerun. Step 4. As soon as the participant brought the aircraft within the recovery envelope (+/- 5 degrees of pitch, +/- 5 degrees of bank, and 90 knots +/- 5 knots), the researcher terminated the recovery time measurement. An electrical toggle switch hooked to the simulator allowed the software to capture the recovery time parameter and make it available on the software output page for analysis. Data Analysis. Recovery time was obtained by subtracting the trial start time from the time at which the aircraft was recovered. Response time was obtained by calculating the rate of change of the yoke and throttle inputs. The time of the first significant change in flight control movement minus the trial start time was the response time.
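As a minimal illustration of these timing computations (not the authors' analysis code), the following sketch derives both measures from logged time histories; the array layout, sampling assumptions, and the threshold used to define a "significant" control movement are assumptions made for this example.

```python
import numpy as np

def recovery_time(t, pitch, bank, airspeed, t_start):
    # Recovery envelope from the study: +/-5 deg pitch, +/-5 deg bank,
    # and 90 +/-5 knots. Returns the elapsed time from the trial start to
    # the first sample at which all three conditions hold, or NaN if the
    # aircraft never satisfies them.
    recovered = (np.abs(pitch) <= 5) & (np.abs(bank) <= 5) & (np.abs(airspeed - 90) <= 5)
    idx = np.flatnonzero(recovered & (t >= t_start))
    return t[idx[0]] - t_start if idx.size else np.nan

def response_time(t, yoke, throttle, t_start, rate_threshold=0.05):
    # First significant change in yoke or throttle position after the trial
    # start, detected from the rate of change of the control signals.
    # The rate_threshold value is an assumed placeholder, not a value
    # reported in the paper.
    dt = np.gradient(t)
    rate = np.maximum(np.abs(np.gradient(yoke) / dt),
                      np.abs(np.gradient(throttle) / dt))
    idx = np.flatnonzero((rate > rate_threshold) & (t >= t_start))
    return t[idx[0]] - t_start if idx.size else np.nan
```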
4 Results Mean recovery times for the two display designs in the four flight scenarios are presented in Fig. 2. There was neither a main effect of scenario nor an interaction with display design, Fs(3, 136) < 1.43, ps > .24. However, the main effect of display design was significant, F(1, 136) = 7.977, p < .05, indicating that the pilots recovered in a shorter time with the analog cockpit across all unusual flight situations.
Fig. 2. Mean recovery times in seconds for the analog and glass displays in the four flight scenarios (bank +/-30°; pitch +/-45°)
The mean response time as a function of the two factors is presented in Fig. 3. The ANOVA did not indicate any significant effects on response time, Fs < 1.
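One plausible way to reproduce this kind of two-factor analysis (display type by scenario) on the per-trial recovery or response times is sketched below; the long-format table layout and column names are assumptions for the example, and this repeated-measures formulation is not the authors' original analysis script.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Assumed long-format data: one row per participant x display x scenario,
# with columns 'participant', 'display' ('analog' or 'glass'),
# 'scenario' (one of the four upset attitudes), and 'recovery_time' (s).
df = pd.read_csv("recovery_times.csv")

anova = AnovaRM(df, depvar="recovery_time", subject="participant",
                within=["display", "scenario"]).fit()
print(anova)  # F and p values for display, scenario, and their interaction
```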
Fig. 3. Mean response times in seconds for the analog and glass displays in the four flight scenarios (bank +/-30°; pitch +/-45°)
For the questionnaire, there was no significant difference in responses to the questions of which airspeed indicator was easier to interpret, although there was a tendency for the round-dial display to be judged as easier (M = 1.5) than the tape indicator (M = 2.0), p = .11. For rate of change of airspeed, the difference between the analog and glass airspeed indicators was significant, p < .025: Analog dials (M = 1.5) were rated as easier than glass tapes (M = 2.3). Although round dials tended to be rated as better for judging airspeed and airspeed trend, for the questions about how easy or difficult it was to recover, there was no significant difference between the round-dial (M = 1.78) and linear-tape (M = 1.67) displays. Analysis of the open-ended questions showed the following.

List recovery step by step. The three major recovery components mentioned by the participants were throttle adjustment, level wings, and return to flight attitude. 89% of the respondents mentioned throttle adjustment, and 94% mentioned recovery to level wings and pitch to safe attitude. Determining spatial orientation before initiating recovery action was mentioned by 44% of respondents.

What features did you most like about the glass cockpit? The feature the respondents liked most was the relatively large attitude and direction indicator, also known in industry as the coast-to-coast artificial horizon because it stretches from one end of the display to the other. This instrument makes it very easy to recognize attitude and was listed by 77% of the respondents. Other significant items mentioned were the precise values of airspeed and altitude obtained from the digital tape readouts (38% of the respondents conveyed this).

What features did you not like about the new display? The vertical airspeed tape was disliked by 38% of respondents. They mentioned that it is difficult to spot an airspeed trend when the tape is moving, unlike with a round dial. About 33% of the respondents said that they did not like the vertical altitude tape because they were unable to view the whole altitude range. They also found it difficult to differentiate between up and down movement.

What do you think of the color coding used? Overall, most respondents were satisfied with the color coding.
What design features do you want changed or modified? A design change was needed in the vertical airspeed and altimeter tapes, according to 38% of the respondents. Their opinion was that it is hard to get a value in a quick scan, which makes it difficult to get trend information quickly. This is because, in a round-dial display, the pilot gets an idea of his or her airspeed just by looking at the position of the needle. For example, if the needle is in the 3:00 position the airspeed is low. This quick information is not provided on a vertical tape. Another interesting recommendation of participants was to design a display that grows in size as airspeed increases. This would take the form of an inverse trapezoid whose width at the top increases as airspeed increases, which might help in easier identification of speed. Some participants also found it difficult to read the altimeter and suggested that it be marked in hundreds of feet so that they would get a better sense of the approaching altitude.
5 Discussion and Recommendations Analysis of the flight simulation data indicated that recovery times were significantly longer for the glass cockpit display than for the round-dial display, which suggests that glass cockpit displays may not be as beneficial for pilot performance as has been claimed. Because pilots are qualified to fly glass cockpits without extensive training, more attention is needed towards training or a modification of the design to allow for easier operation. There was no statistically significant difference in response times across the flight scenarios. A possible explanation for the difference in recovery time for the pitch-down scenarios might be elevated stress levels when pilots realized that they were flying towards the ground and therefore directed less attention to the desired airspeed. It may be that in an emergency situation like an upset, pilots find it harder to read tape displays than analog dials. Analog dials seem to be easier to read than glass displays because the position of the needle relative to the whole range of numbers in the airspeed indicator can be picked out at a quick glance. In the glass display the whole range is not visible, so, to get an idea of the airspeed, the pilot has to focus longer on the numerical readout to perceive the airspeed. A majority of respondents included the essential functions or steps that needed to be accomplished during an unusual attitude recovery. All participants completed most of the aircraft attitude upset recovery steps. None of the participants crashed the simulated airplane. Most airlines use a recovery technique that includes adjusting pitch, roll rate, and thrust settings and leveling the aircraft [12]. Most of the study participants mentioned these steps in their responses. However, there was no particular order or consistency in the steps. This suggests that training does not emphasize the order of the steps as much as the general recovery. The sequence of recovery may be an interesting issue for further investigation. There was a significant difference between the two types of displays for questions involving the ease of airspeed trend recognition. This difference suggests that participants had a greater problem comprehending changing airspeed with the vertical tape than with the round-dial airspeed indicator. The human factors literature also suggests that moving tapes are not the best way to display rate-of-change information. There is a need to redesign this aspect of the glass cockpit. At the medium-level altitude used in this study, trend comprehension may be relatively unimportant, but during critical phases
of flight like approaches, landings, takeoffs and emergencies, trend comprehension might mean the difference between safe activity and a mishap. The majority of respondents liked the big and colorful indication of the glass cockpit ADI or artificial horizon that gave them a good indication of their situation awareness. They also liked the digital readouts of the glass cockpit tapes but did not like their reduced ability to convey trend information. This suggests that current glass cockpit flight displays enhance situational awareness over older versions. Tape displays received a mixed response from pilots who liked the exact information regarding airspeed and altitude but disliked poor trend information. A redesign of glass displays is needed to improve access to trend information that leads to improved safety and situational awareness. The most common suggestion by participants was that the vertical tape airspeed and altimeters should be redesigned to benefit low-time pilots transitioning from round dial instrumentation. One solution is to combine precise readouts of tape displays with trend information depicted on round dials in a hybrid display that uses both features. The resulting display can be color coded to show the approximate speed (like in a pie chart) with a digital readout beneath the display (Fig. 4). A suggested design is for the airspeed tape to be converted to a round dial display with a pink color to indicate present airspeed. This presentation gives the pilot an idea of the current airspeed at a glance. If the exact value is needed, the pilot can refer to the box beneath the display for precise readout. The outside circumference of the airspeed dial will be labeled with numbers like in traditional cockpits. We also recommend that a closer look be taken at unusual attitude and upset training for light airplanes. The fact that no two participants gave the same order for upset recovery suggests that training does not emphasize the order in which to recover. It might be a good idea to test participants trained in the exact order versus participants trained to recover the aircraft without particular attention to the order and compare their performance. Future studies can investigate the following issues. Does age or gender have an effect on transition from analog to glass displays? Is there any difficulty in reverse transition back to gauge displays in general aviation? This is a potential problem since future pilots will learn to fly on glass but older airplanes will always be around. What is
Fig. 4. Redesign recommendation
the effect on performance during approach, landing or takeoff? In addition, newly developed small jets are being introduced with a flight management system. Like newer piston singles, they have more complicated navigation displays that require a lot of display programming to call up checklists and set navigation waypoints. With the advent of datalink in commercial aviation, it is only a matter of time before that enhancement adds further complexity to general aviation. Further research is needed to investigate the difference in recovery times with pitch down scenarios. There is also a need to investigate whether a redesigned display will improve performance and be accepted by future pilots.
References 1. Funk, K.H.: Cockpit Task Management: Preliminary Definitions, Normative Theory, Error Taxonomy, and Design Recommendations. Int. J. of Aviation Psychology 1, 271–285 (1991) 2. NTSB Abstract: Safety recommendations concerning the American Airlines 757 accident near Cali, Colombia, December 20, 1995 (1996), http://www.rvs.uni-bielefeld.de/publications/Incidents/DOCS/ComAndRep/Cali/NTSB/COPY/961001.htm (downloaded February 22, 2009) 3. Gerdsmeier, T., Ladkin, P., Loer, K.: Analyzing the Cali Accident with a WB-Graph. Paper presented at the Human Error and Systems Development Workshop. Glasgow, UK (1997) 4. Adams, C.A., Hwoschinsky, P.V., Adams, R.A.: Analysis of adverse events in identifying GPS human factors issues. In: 11th International Symposium on Aviation Psychology (2001) 5. Lyall, B., Niemczyk, M., Lyall, R., Funk, K.: Flightdeck Automation: Evidence for Existing Problems. In: Proceedings of the 9th Int. Symp. on Av. Psychol., Columbus, OH (1997) 6. Shappell, S.A., Wiegmann, D.A.: The Human Factors Analysis and Classification System – HFACS (DOT/FAA/AM-00/7). Office of Aviation Medicine: Washington, DC (2000) 7. Collins, R.: How to Avoid Deadly Distractions (2005), http://www.flyingmag.com/article.asp?article_id=522&print_page=y (downloaded February 7, 2005) 8. Liggett, K., Venero, P.: The Effects of Helmet-mounted Display Symbology on Flight Performance. In: Proceedings of Human Machine Systems Symposium, Atlanta, GA (2004) 9. Beringer, D.B., Ball, J.D., Brennan, K., Taite, S.: The Effect of Terrain-depicting Primary-flight-display Backgrounds and Guidance Cues on Pilot Recoveries from Unknown Attitudes. In: Proceedings of the 13th Int. Symp. on Av. Psychol., Oklahoma City, OK, pp. 46–51 (2005) 10. Casner, S.M.: Flying IFR with GPS: How Much Practice is Needed? Int. J. of Applied Aviation Studies 4, 81–97 (2004) 11. Proctor, R.W., Young, J.P., Fanjoy, R.O., Feyen, R.G., Hartman, N.W., Hiremath, V.V.: Simulating Glass Cockpit Displays in a General Aviation Flight Environment. In: Proceedings of the 13th International Symposium on Aviation Psychology, Oklahoma City, OK, pp. 481–484 (2005) 12. Schlimm, K.: Unusual attitude recovery: Reacting quickly in an over-banked situation, http://www.avweb.com/news/airman/190089-1.html (downloaded July 18, 2005)
Information Requirements and Sharing for NGATS Function Allocation Concepts Nhut Tan Ho1, Patrick Martin2, Joseph Bellissimo2, and Barry Berson2 1
California State University Northridge (CSUN), Dept of Mechanical Engineering 2 CSUN Dept of Human Factors and Applied Psychology 18111 Nordhoff Street, Northridge, CA 91330-8348, USA {nhuttho,patrick.martin.855,joseph.bellissimo.51}@csun.edu
Abstract. To support the evaluation of feasible function allocation concepts for separation assurance systems, and to develop a better understanding of the specific information requirements for key tasks (resolving conflicts, avoiding weather, and merging and spacing), air traffic controllers and commercial pilots were interviewed for their goals, sub-goals, and the individual and shared information needed to perform the tasks. The key information requirements obtained can be used as input to ascertain which information is most needed for probing when measuring individual and shared situation awareness. The elicitation also provided insights into the interaction among the controllers, pilots, and automation, and their perception of the concepts’ feasibility. Keywords: information requirements, NGATS, allocation function, shared situation awareness, automation interaction.
1 Introduction In the next two decades, the air traffic in the National Airspace System (NAS) is projected to increase two- to three-fold. To accommodate this growth, transformations in the air traffic management (ATM) architecture and operational concepts have been proposed for implementation in the NAS under the Next Generation Air Transportation System (NGATS) by 2025 [1], [2]. Some of the new NGATS key capabilities include: 1) trajectory-based operations (TBO) that enable users to dynamically assess changes in four-dimensional trajectories and allocate the resources to meet their demands; and 2) advanced net-centric and shared situational awareness (SA) systems for providing and sharing real-time weather, traffic, and flight information among all users. TBO implementation assumes that automation will take on a larger role in managing real-time operations, and that air traffic control shifts from tactical control of individual aircraft to strategic management of traffic flow and separation, while tactical separation assurance may be delegated to pilots or to airborne or ground-based automation systems. The implementation of these shared SA systems offers controllers and pilots a common picture of the operational information necessary for them to perform their allocated roles and responsibilities.
To transition to TBO operations it is crucial to determine the appropriate function allocation among the controller, the pilot, and automation in the flight deck and on the ground, and to determine what information should be shared between the controller and the pilot so that they can acquire and maintain sufficient shared situation awareness without being overloaded with information. These issues were preliminarily explored in this paper through the identification of the most relevant individual and shared information requirements (IR) for pilots and controllers for three specific tasks, and their interaction with each other as well as with automation for three viable function allocation concepts that are currently being investigated by NASA researchers [3],[4]. The results of this effort can be used as input to develop measurements of individual and shared situation awareness by probing the information that is most relevant to specific tasks, and to design experimental simulation studies that evaluate the viability of the function allocation concepts. The rest of the paper is organized as follows. The next two sections describe the three function allocation concepts and the method used to elicit information from subject matter experts. The last section discusses the key findings and recommends ways to incorporate them into future studies and system design concepts.
2 Function Allocation Concepts Three function allocation concepts that are currently being investigated by NASA and academic researchers were used as the context to elicit information from subject matter experts. These concepts were designed with a human-centered approach that allocates different functions (via roles and responsibilities) and workload levels among pilots, controllers and the automation in the flight deck and on the ground. Depending on the function allocation, the interaction between the controller and the pilot will be different, providing a rich area to study the sharing of SA information. These are the common assumptions for the three concepts: 1. All aircraft have the capability to communicate and exchange information with ATC through Controller Pilot Data Link Communications (CPDLC) and Automatic Dependent Surveillance Broadcasting (ADSB). 2. All aircraft have a cockpit situation display (CSD) on board integrated with a route assessment tool (RAT) and a 3D-weather display [5]. Using the RAT, the pilot can manually make changes to the trajectory to avoid conflicts and weather. 3. There are two groups of aircraft operating in the airspace: a) trajectory flight rule (TFR) aircraft have the capability to detect and resolve conflicts; and b) managed rule aircraft (MMR) are managed by ATC and do not have conflict detection capability. 4. Both TFR and MMR aircraft are involved in merging and spacing operations. 5. The ground and airborne auto-resolver system uses the NASA Advanced Airspace Concept (AAC) [6], [7] algorithm for detection and resolution of conflicts more than 12 minutes from loss of separation (LOS), and the Tactical Separation Assisted Flight Environment (TSAFE) algorithm for avoidance of conflicts less than 3 minutes to LOS. The auto-resolver tools on the ground and in the flight deck do not take weather into account; thus, pilots must ensure all resolutions are weather free.
6. To resolve a conflict, the TFR pilot can use either the airborne auto-resolver on board to generate a conflict-free resolution and check that it is weather-free, or the RAT to do the same tasks. Similarly, the controller can use either the ground-based auto-resolver to generate resolutions and check that they are weather-free, or the manual trial planning tool to do the same tasks. 7. Rules of the road: TFR aircraft, when in conflict with MMR aircraft, are burdened to resolve the conflict. MMR aircraft are managed by ATC. Within 3 minutes to LOS, TSAFE generates conflict-free resolutions for all aircraft and informs the controller. While these assumptions constitute an over-simplification of the system and its operation, they offer an effective means to explore the different levels of roles and responsibilities that can be allocated to controllers, pilots, and automation systems; a simple sketch of the resulting allocation logic is given after the concept descriptions below. Concept 1: Shared Separation Assurance between ATC and Flight Decks. In this concept, the separation assurance responsibility is shared between the flight deck and ATC, as shown in Figures 1a-1b.
Fig. 1a. Flight deck is responsible for resolving conflicts 12-15 minutes to LOS between TFR and TFR aircraft, and TFR and MMR aircraft
Fig. 1b. ATC is responsible for resolving conflicts 12-15 minutes to LOS between MMR and MMR aircraft
Concept 2: Separation Assurance by ATC with Delegation to Ground Automation. In this concept (shown in Figures 2a-2b), ATC is responsible for separation assurance for all aircraft; however, to ease workload, ATC delegates the resolution of conflicts between TFR and TFR aircraft to the ground auto-resolver, while ATC itself resolves conflicts between TFR and MMR aircraft, and between MMR and MMR aircraft.
Fig. 2a. ATC is responsible for separation but delegates conflict resolution to ground autoresolver to resolve conflicts 12-15 minutes to LOS between TFR and TFR aircraft
Fig. 2b. ATC is responsible for resolving conflicts 12-15 minutes to LOS between MMR and MMR aircraft, and between MMR and TFR aircraft
Concept 3: Separation Assurance by Ground Automation. In this concept (shown in Figures 3a and 3b), the ground auto-resolver has the responsibility for resolving conflicts between TFR and TFR aircraft, and between TFR and MMR aircraft. ATC is responsible for resolving conflicts between MMR and MMR aircraft.
Fig. 3a. Ground auto-resolver is responsible for resolving conflicts 12-15 minutes to LOS between TFR and TFR aircraft, and between TFR and MMR aircraft
Fig. 3b. ATC is responsible for resolving conflicts 12-15 minutes to LOS between MMR and MMR aircraft
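Read together, the three concepts differ mainly in which agent a detected strategic conflict is handed to; the tactical layer (TSAFE inside 3 minutes to LOS) and the handling of MMR-MMR conflicts by ATC are common to all of them. The sketch below is a hedged illustration of that allocation logic under the stated assumptions; the function and type names are invented for this example and do not correspond to any actual NASA tool interface.

```python
from enum import Enum

class Agent(Enum):
    FLIGHT_DECK = "flight deck (airborne auto-resolver / RAT)"
    ATC = "controller (ground trial-planning tools)"
    GROUND_AUTO = "ground auto-resolver"
    TSAFE = "TSAFE tactical resolution"

def responsible_agent(concept, ac1_is_tfr, ac2_is_tfr, minutes_to_los):
    # Illustrative mapping of a detected conflict to a responsible agent.
    # concept is 1, 2, or 3; ac*_is_tfr flags whether each aircraft flies
    # under trajectory flight rules (TFR) rather than managed rules (MMR).
    if minutes_to_los <= 3:
        return Agent.TSAFE            # tactical layer applies to all aircraft
    both_mmr = not ac1_is_tfr and not ac2_is_tfr
    if both_mmr:
        return Agent.ATC              # MMR-MMR conflicts stay with ATC in every concept
    if concept == 1:
        return Agent.FLIGHT_DECK      # TFR-TFR and TFR-MMR resolved by the flight deck
    if concept == 2:
        # Ground auto-resolver only for TFR-TFR; ATC handles TFR-MMR itself
        return Agent.GROUND_AUTO if (ac1_is_tfr and ac2_is_tfr) else Agent.ATC
    if concept == 3:
        return Agent.GROUND_AUTO      # TFR-TFR and TFR-MMR delegated to ground automation
    raise ValueError("unknown concept")
```

For example, responsible_agent(2, True, False, 14) returns Agent.ATC, matching Fig. 2b, while the same TFR-MMR conflict under concept 1 or concept 3 would be handed to the flight deck or the ground auto-resolver, respectively.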
3 Elicitation Method
The following three steps were used to develop the scenarios and tasks for the pilot and controller, and to elicit individual and shared IRs.

1. Create scenarios to depict the concepts and the three key tasks: Three key tasks formed the basis for developing the scenarios: 1) resolving aircraft conflicts; 2) avoiding weather; and 3) merging into a flow and spacing behind another aircraft. The scenarios, created and documented in video clips and pictures, describe the task flow diagrams (shown in Figures 1-3), the role and responsibility allocations, the interaction between the controller and the pilot, and the information and automation tools (e.g., AAC, TSAFE, CSD) available to perform the three tasks. In addition, the operation was assumed to take place in the Kansas City (ZKC) and Indianapolis (ZID) centers and the Louisville International airport (KSDF) TRACON, with 36 aircraft (twice the current traffic level) flying a trajectory that goes through two high-altitude sectors in ZKC, one high-altitude and one low-altitude sector in ZID, and the KSDF TRACON, and lands at KSDF airport.

2. Compile an information requirements list tractable for the elicitation: A list of IRs, shown in Tables 2 and 3, was compiled by extracting IRs documented in previous studies [8],[9],[10], which identified goals, sub-goals, and their associated individual and shared IRs for the commercial aircraft pilot and the air traffic controller. We selected the IRs to include in the list by their relevance to the three tasks, and asked the SMEs to provide any additional IRs that were not on the list.

3. Elicit SMEs in individual sessions and in a group session to obtain the information requirements: Two active airline pilots and two retired ATCs participated as subject matter experts (SMEs). The two ATP-rated pilots had an average of 8000 flight hours, and the two retired controllers had an average of 37 years of experience. All SMEs were familiar with both the ground and airborne automation tools and advanced displays (i.e., AAC, TSAFE, CSD, RAT) through their participation in past studies at NASA. In the individual sessions, the SMEs were interviewed for four hours and were given an introduction to the objective of the study and a detailed explanation of each function allocation concept (via task flow diagrams, pictures, and videos). Then, for each function allocation concept, the SMEs articulated their goal and sub-goal(s) for the tasks, identified any missing IRs in the provided list, and rated the level of necessity of the IRs (the rating scale is from 0 to 6, with 0 being not necessary and 6 being absolutely necessary). The SMEs were also queried as to: a) how the IRs would change with each concept; b) their preference between using the automated auto-resolver versus the manual tools; and c) their perception of safety, efficiency, and workload for the concepts. In the group session, the pilots and controllers were brought together and were queried as to: a) how they share information and work together to resolve a conflict, avoid weather, and merge aircraft; b) the extent to which the controller takes into consideration the objectives of the pilot, and vice versa; and c) whether the controller wants to be informed of TSAFE avoidance maneuvers.
4 Results and Discussion The goals and sub-goals for the three tasks in each concept are presented in Table 1 below. There was good agreement between the two pilots even though they fly for different airlines. The two controllers were also in good consensus. However, for the same high-level goals, the sub-goals of the pilots and the controllers have more differences than similarities. For instance, for the conflict resolution task, the pilots based their decisions on the impact of the trajectory change on factors such as fuel consumption, time to destination, secondary conflicts, and passenger comfort. The controllers, on the other hand, based their decisions on factors such as the effect that the trajectory change has on the entire traffic flow, additional conflicts induced, and workload. Thus, while the controllers and the pilots shared common goals at the highest level, the controllers’ motivations are system-centric, while the pilots’ are aircraft-centric. This was further reinforced in the group interview session, in which the controllers confirmed that they rarely take into account the sub-goals that are important to the pilots, while the pilots indicated that they prefer the controllers to take into account as much as possible the sub-goals that are important to them. The individual IRs for pilots and controllers, and the shared IRs, shown in bold, are listed in Tables 2 and 3. In general, across all concepts, altitude, heading, and speed were reported to be extremely necessary information, even in concept 3, where automation is responsible for handling the resolutions. These results ascertain which information is most needed for the three specific tasks, and should be useful in determining the most relevant information that should be probed when measuring individual and shared SA. Some additional observations were noted regarding the feasibility of the concepts. First, for concept 1, the controllers noted that delegating separation responsibility to TFR aircraft reduces performance monitoring (a workload reduction); however, when assessing the costs and benefits of flight changes and negotiating traffic management with the automation, they expressed more concern about the number of aircraft in the sector, handoffs, traffic inflow, and altitude limits in this concept than in concepts 2 and 3. Second, in concept 2, ATC requested shared information about aircraft state, conflicts, and planned changes when managing traffic. This request is consistent with their comments that they perceived this concept to be more operationally complicated and workload intensive. This is because, in situations in which the controllers have to step in to resolve a conflict, they thought they would not have enough time to react, because a significant amount of time is used up for the pilot to determine a weather- and conflict-free resolution and for the ground auto-resolver to verify that the resolution is conflict-free. Thus, concept 2 was perceived to be the least favored of the three concepts due to the time delays. Third, in concept 3, when the ground automation is delegated merging and spacing management, the pilot necessity ratings for the information about their aircraft and other aircraft were very low. This implies that the pilots’ awareness can be low when they are not directly engaged with the merging and spacing operation, and that the pilot may be over-reliant on the automation.
In the group session, the pilots and controllers indicated that maintaining continuous communications and exchanging information are key to developing a collaborative relationship and for effective decision making. But it was interesting to note that although the pilots preferred to share their sub-goals with the controllers, the controllers indicated that such sharing is not feasible because it would increase their
workload. Hence, the pilots and controllers agreed that controllers do not always take into account pilots' sub-goals and that, as a result, pilots may have to fly routes that are non-optimal. It is also interesting to note that in situations such as concept 1, where the TFR pilots do not have to interact or share information with the controllers, the controllers indicated that they would still like to be aware of TFR trajectory changes so that, if the TFR aircraft should merge with MMR aircraft, they can anticipate and plan for it. Finally, pilots and controllers indicated that the IRs themselves do not change significantly with the concepts, but the priority of the information does. These results suggest that when one measures shared SA, care must be taken to ensure that the shared information being probed is relevant to the intention of the human operators.

Table 1a. Pilots' Goals and Sub-goals Across Function Allocation Concepts
Task 1: Resolve aircraft conflict
  Goals. Concept 1: Aware of conflicts near and far; avoid secondary conflict. Concepts 2 and 3: Avoid conflict.
  Sub-goals. Concept 1: Fuel management; time management; manage altitude/heading; communicate with ATC. Concepts 2 and 3: Fuel management; passenger comfort; communicate with ATC.
Task 2: Resolve weather conflict
  Goals. Concepts 1-3: Resolve conflict.
  Sub-goals. Concepts 1-3: Minimize weather impact on passenger comfort; fuel management.
Task 3: Merging and spacing
  Goals. Concept 1: Spacing; avoid creating conflicts. Concepts 2 and 3: Spacing.
  Sub-goals. Concepts 1-3: Fuel management; time management.
Table 1b. ATC Goals and Sub-goals Across Function Allocation Concepts
Task 1: Resolve aircraft conflict
  Goals. Concepts 1-3: Separation.
  Sub-goals. Concept 1: Reduce workload (communicate less). Concept 2: Reduce workload (communicate less); sector traffic. Concept 3: Fuel levels for sequence priority; reduce sector workload.
Task 2: Resolve weather conflict
  Goals. Concepts 1-3: Separation; weather avoidance.
  Sub-goals. Concept 1: Weather details (i.e., temperature, precipitation, wind). Concept 2: Communicate; monitor weather.
Task 3: Merging and spacing
  Goals and sub-goals. Concept 1: Separation; fuel management; sequence priority; know aircraft speed, altitude and heading. Concept 2: Communicate less; reduce sector workload; know vertical speed; spacing; fuel management; sequence priority. Concept 3: Spacing; fuel management; sequence priority.
Table 2. Pilot Information Requirements. Columns: Concept 1 (Traffic, Weather, M & S), Concept 2 (Traffic, Weather, M & S), Concept 3 (Traffic, Weather); entries are SME necessity ratings on the 0-6 scale.
Current Ownship state ID Heading Speed Ver. Speed Alt. Attitude Immediate destination Route
3.5 4.5 4.5 4.5 6 4 1.5 5
1.5 6 5.5 5 6 5 2 4.5
2.5 5.5 5.5 3.5 6 4.5 2.5 4.5
3 5.5 5.5 4 6 5 3.5 5
3 5.5 5 4 6 4.5 4 4.5
3 5 5 3 5.5 5 4 5
3 5 5.5 4 5.5 4.5 3.5 4.5
3.5 5.5 5.5 5 6 4.5 3 4
Ownship Planned Changes Heading changes Speed changes Altitude changes Route changes
2.5 2 2.5 2.5
0 0 0 0
0 0 0 0
2 2 3 2.5
2 2 2.5 2.5
2 2 2.5 2.5
5 5 5.5 4.5
5 5 5.5 4.5
Future Ownship State Future Hor. Pos. Future heading Future Speed Future ver. Speed Future altitude Future destination Future route
3 5 4.5 2.5 5 5 5
2 5 5 4.5 5 3.5 4.5
2.5 5 5 3.5 5.5 3 4
2 4.5 4.5 3.5 5 4.5 5
2 5 5 4 5.5 4.5 4.5
2 5 5 3.5 5.5 4.5 4.5
2 5 4.5 3 5.5 4.5 5
2 5 5 4 5.5 4.5 4.5
Cost/benefit of change in Lateral Flight Vertical flight Speed profile Holding vs. diverting level of automation
5.5 5.5 5 5.5 0
4 4 4 5.5 3.5
3.5 3.5 3 4.5 0
5 5 5 5 0
3.5 3.5 3.5 5 4
3 3 3 4.5 0
4.5 4.5 4.5 5 0
3.5 3.5 3 5 3
Other Aircraft State ID Hor. Pos. Heading Speed Vert. Speed altitude Immediate destination Route Priority
3.5 3 5.5 5.5 5 6 3 4.5 5
0 0 0 0 0 0 0 0 0
4 2.5 5 5 2.5 5.5 3.5 4 3.5
4 2.5 5 5 3 6 4 4 4
0 0 0 0 0 0 0 0 0
6 3 5 5 3.5 6 3.5 4 4
4.5 2.5 5 5 3.5 6 2.5 4 4.5
0 0 0 0 0 0 0 0 0
Other Aircraft Future State Future Hor. Pos. Future Heading Future Speed Future Vertical speed Future Altitude Future Immediate Destination Future Route Future Priority
2.5 5.5 5.5 4.5 5.5 3 4.5 5
0 0 0 0 0 0 0 0
2 5 5 3 5.5 3.5 4.5 3.5
2 4.5 4.5 3 5.5 3.5 4 4
0 0 0 0 0 0 0 0
2.5 5 5 3 5.5 3 3 4
2.5 5 5 3.5 5.5 3.5 4.5 5
0 0 0 0 0 0 0 0
Weather Location of Weather cells Altitudes affected Temperature Dewpoint Precipitation level Precipitation type Visibility Ceiling Wind direction Wind speed Wind rate of change Wind altitudes Wind gusts Wind crosswind component Conf. in weather cond. Timeliness of info Path of min. weather exposure Distance to weather areas Bearing to weather areas
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 5.5 3.5 1.5 4.5 5 5 5 5.5 5.5 4.5 5 5.5 4 5.5 5.5 5.5 5.5 5.5
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 5.5 4 3.5 5 5 5.5 5.5 4.5 4.5 4.5 4.5 5.5 3 5 5 5 5 5
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 6 4 3.5 5.5 5.5 5 5 5.5 5.5 4.5 5 5.5 3.5 5.5 4.5 5 5.5 5.5
Table 3. ATC Information Requirements. Columns: Concept 1 (Traffic, Weather, M & S), Concept 2 (Traffic, Weather, M & S), Concept 3 (Traffic, Weather); entries are SME necessity ratings on the 0-6 scale.
Aircraft States ID Hor. Pos Heading Speed Ver. Speed Altitude Im. Destination Route
3 5.5 5.5 5.5 0.5 3 3 5.5
3 3 6 3 0.5 5.5 3 6
3 2.5 6 6 2.5 6 3 3
3 2.5 5.5 5 1 5.5 3 5.5
3 2.5 6 3 3 6 3 5
3 2.5 5.5 6 3 4 3 4.5
3 1.5 4.5 4.5 0.5 5.5 2.5 1.5
3 2.5 5.5 2.5 1.5 6 3 5.5
Aircraft in Conflict Time to loss of sep. Distance to collision Pos. of Aircraft Time until man. Req. Closure rate perf. cap.
6 5.5 3 4.5 4 4
5 5 5.5 2.5 0.5 4
2.5 2 5.5 4.5 1.5 5.5
6 3 3 6 6 5
5.5 3 3 6 5.5 5.5
5.5 3 6 3 3 6
4 1 2 4 4.5 4
5 1.5 4.5 2.5 1.5 5
Aircraft Planned Changes Heading Changes Speed changes Altitude Changes Imm. Dest. Changes of ownship Route changes
5.5 5 6 3 5.5
6 3 6 0 6
6 6 6 0 3
5.5 4.5 6 0 3
6 3 6 0 6
6 6 3.5 0 5.5
5 1.5 4.5 0 2.5
6 3 5.5 0 6
Future State of Aircraft Future Hor. Pos. Future heading of Future Speed Future ver. Speed Future altitude Future Route
2.5 5.5 5 1.5 6 5.5
2.5 6 3 0.5 6 6
2.5 6 6 2.5 3 6
2.5 5.5 5 1 6 5.5
2 6 5 2.5 6 6
2 5.5 6 3 3.5 6
1.5 4 4.5 1.5 4.5 2
2.5 6 3 2.5 6 5.5
Cost/Benefits of Change degree of change from route amount of coordination req. #of a/c in sector hand-offs inflow sector capacity sector saturation #of potential conflicts pilot intentions flight plan of all a/c affected by conflict restricted airspace boundaries alt. limits
4.5 4.5 6 5.5 6 6 6 6 5 3 3 3 6
3 2.5 6 2.5 3 3 6 6 5.5 6 3 3 3
4 4.5 5 3 6 6 6 6 4.5 3 3 6 3
3 6 3 2.5 2.5 6 6 5.5 5.5 3 3 3 3
3 5.5 6 3 5 6 6 6 6 5 3 3 3
3 4.5 3 4.5 3 6 6 5.5 5 6 3 5 3
2.5 4 2 1 1 5.5 5.5 4 5 1.5 2.5 5 2.5
5.5 3 5 3 3 5.5 5.5 6 5 5.5 3 5 3
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 6 1 0.5 3 3 3 6 5.5 5.5 5.5 6 6 6 4.5 5 5.5 3 3
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 6 2 2 2.5 3 2.5 5.5 5 5 2.5 5 3 3 4.5 4.5 6 5 2.5
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 5.5 2 0.5 6 5 2 5 3 5 2 5.5 3 3 5 6 5.5 5.5 5
Weather Formation Avoidance Location of Weather cells Alt. affected Temperature Dewpoint Prec. Level Prec. Type Visibility Ceiling Wind Direction Wind Speed Wind rate of change Wind altitudes Wind gusts Wind crosswind component Conf. in weather info Timeliness of info Path of min. weather exposure Dis. to weather areas Bearing to weather areas
The pilots and controllers also provided input on their interaction with the automation. In general, the SMEs believed that trust must be developed in the automation tools; otherwise these concepts would not be feasible. In situations with three minutes to LOS, when TSAFE has to generate conflict avoidance maneuvers, both the pilots and the controllers felt strongly that a good automation design and function allocation would have TSAFE notify the controllers of the resolution, so that they can prevent the aircraft from getting into further conflicts and prepare to insert the aircraft back into the flow. In addition, during high-pressure situations the pilots and controllers preferred to use the auto-resolver (airborne or ground) over the manual RAT tool, which makes the automation tools even more critical. In future simulation studies in which the concepts are evaluated, it would be fruitful to compare these perceptions and observations provided by the SMEs against the simulation's quantitative data. The comparison would help identify discrepancies or agreements, and provide further insights into the feasibility of the concepts. Acknowledgement. The authors would like to thank Dr. Walter Johnson, Mr. Vernol Battiste, and Mr. Quang Dao of NASA Ames for the development and clarification of the function allocation concepts.
References 1. Joint Planning and Development Office: Next Generation Transportation System: Concept of Operation V 2.0. Washington D.C: Government Printing Office (2007) 2. Next Generation Air Transportation System (NGATS) Air Traffic Management (ATM) Airportal Project, http://www.aeronautics.nasa.gov/nra_pdf/ airportal_project_c1.pdf 3. McNally, D., Gong, C.: Concept and Laboratory Analysis of Trajectory-Based Automation for Separation Assurance. AIAA-2006-6600, AIAA Guidance, Navigation and Control Conference and Exhibit, Keystone, CO, August 21-24 (2006) 4. Prevot, T.: Controller-in-the-Loop Evaluation of Ground-Based Automated Separation Assurance. In: NASA Airspace Systems Technical Interchange Meeting (2008) 5. NASA Ames Flight Deck Display Research Laboratory, http://human-factors.arc.nasa.gov/ihh/cdti/cdti.html 6. Erzberger, H.: Automated conflict resolution for air traffic control. In: Proceedings of the 25th International congress of the aeronautical sciences (ICAS), Hamburg, Germany, September 3-8 (2006) 7. Erzberger, H.: Transforming the NAS: The next generation air traffic control (2004) 8. Endsley, M.R., Farley, T.C., Jones, W.M., Midkiff, A.H., Hansman, R.J.: Situation Awareness Information Requirements for Commercial Airline Pilots (ICAT-98-1). Massachusetts Institute of Technology International Center for Air Transportation, Cambridge (1998) 9. Endsley, M.R., Jones, D.G.: Situation awareness requirements analysis for TRACON air traffic control (TTU-IE-95-01). Lubbock, TX: Texas Tech University (1995) 10. Endsley, M. R., Rodgers, M. D.: Situation awareness information requirements for en route air traffic control (DOT/FAA/AM-94/27). Washington, D.C.: Federal Aviation Administration Office of Aviation Medicine (1994)
HILAS: Human Interaction in the Lifecycle of Aviation Systems – Collaboration, Innovation and Learning David Jacobson1, Nick McDonald2, and Bernard Musyck3 1
DCU Business School, Dublin City University, Dublin 9, Ireland [email protected] 2 Aerospace Psychology Research Group, Trinity College Dublin, Ireland 3 School of Economic Sciences & Administration, Frederick University, 1036 Nicosia, Cyprus
Abstract. The aims of the paper are to describe a particular network in the European aviation sector, to explain what is innovative about this network and to describe ways in which the network may evolve in the future. The paper describes the current state of the literature on human factors in aviation and shows how HILAS partners collaborate to innovate in the field of human factors. The paper highlights to what extent the HILAS partnership is novel, and wherein specifically lies that novelty. The network is sectoral rather than locational. It is an inter-organisational, cross-national, intra-sectoral, virtual cluster of actors, brought together for the purpose of a particular innovative project. The paper is about both the network and the nature of its innovation. Keywords: Innovation, networks, human factors, risk management system.
1 Introduction The paper draws on fieldwork carried out in the context of a European Union-funded project called HILAS (Human Integration into the Lifecycle of Aviation Systems) [1]. The aim of the project is to develop a model of good practice for the integration of Human Factors (HFs) across the lifecycle of aviation systems. The idea is to improve flight safety through the integration of HF knowledge into all aviation related activities. This will reduce the risk of human error and the danger that human error can pose for safety in the operation of aircraft. In substance, the project is a formal network focused on innovation in HF management and integration. The change process involves incorporating HF knowledge into innovation in general, whether in flight operations, maintenance processes, technological design, organisational or industrial change. Of the total of 40 partners in the project, most are industrial partners from each of these strands of the sector; there is a small number of university and research centre partners. HILAS is a project financed by the Sixth Framework Programme (FP6) of the European Commission. Like previous Framework Programmes its objective continues to be the development of a Europe-wide scientific community and the promotion of Europe's international competitiveness through the strengthening of its scientific and technological base. The programme also aims at solving major societal questions and
at supporting the formulation and implementation of other EU policies [2]. The programme's expenditure will reach €17.5 billion. This represents the third largest operational budget line within the EU's overall budget after funds dedicated to the agricultural policy and funds promoting cohesion among the 27 countries of the bloc [3]. To reach the necessary critical mass at the EU level, FP6 introduced new instruments, such as networks of excellence and integrated projects, HILAS being an example of the latter. Among the seven targeted thematic priorities of FP6 was the one in which HILAS is situated, "aeronautics and space" (including "aircraft safety" and "increasing operational capacity and safety of the air transport system"). The HILAS project is expected to contribute to the European strategy for air transport ("European Aeronautics: A Vision for 2020"), which, in view of the substantial growth in traffic, has set a target of an 80 percent reduction in aircraft accidents; ensuring effective and reliable human performance would be a key contribution to the projected accident reduction [4].
2 What Are Human Factors? The origin of the concept of human factors dates from World War II when psychologists were asked to investigate airplane accidents [5]. It was revealed that pilots had certain expectations of how things should work (how they could locate the landing gear and activate it, for example) and that these expectations were violated by aircraft designers [6]. So human factors were relevant in design; better design could reduce the likelihood of pilot error. Human factors is a systems-oriented discipline, which promotes a holistic approach in which considerations of physical, cognitive, social, organizational, environmental and other relevant factors are taken into account [7]. Most or all are of relevance in the study of safety in aviation. Human error is present in 100 percent of aviation accidents [8]. Errors can occur at the level of the maintenance of aircraft, their design or their operations. The major cause of all aviation accidents is pilot-error [9] though accidents are often a result of a chain of events in which the pilot is the last link in the chain. Traditionally, accident investigations led research to focus on finding factors related to pilot-error, rather than examining the chain of events and, more broadly, systemic factors. This was a reactive approach. Such an approach would be based on the statistical study of accidents. Pilot-error would typically be associated with environmental factors, aircraft factors, airline-specific factors and pilot-specific factors [9]. These traditional approaches have been challenged in recent years; as Maurino [10] puts it, ‘there is a need to attack causes rather than symptoms of safety deficiencies’. The author argues that: ‘Paramount to a revised paradigm is to distance from the allocation of blame those at the operational end of events in favour of the macro appraisal of the aviation system…’ [10] Thus all components of the aviation system have to be taken into account, not just those that failed. The approach must become more pro-active and systemic; it must be understood that where there is failure – for example by pilots in the context of their ‘normal’ activities – this is an expression or symptom of deficiencies of the deep foundations of the system [11]. A new safety paradigm must rest on the consideration of safety as a social construct in which human and organizational performance are
inseparable from the contexts within which they take place. In short, ‘pro-action must replace reaction’ and conventional views on human error must be challenged [10]. Equipment is not ‘culture neutral’. Aircraft are designed with particular safety solutions in mind from a particular cultural perspective. The use of aircraft and their maintenance and of aspects of their design in different contexts may become problematic since there are cultural issues involved in the transfer of technology. Designs follow the originating cultural standards but users do not always share these. Thus one cannot expect to successfully export a safety solution to a different context without taking into account culture, social bias, perceptions of status, mental models and education. Different contexts generate specific problems and require culturally specific solutions [10]. A good illustration of the importance of cultural factors is offered by Braithwaite et al [8]. In their study of Australia’s reputation as a ‘lucky’ country in terms of aviation safety, they analyse the influence of culture at national, industry and organisational levels. At the national level, for example, they conclude that rather than luck, it is the high levels of individualism and ‘low power distance’ in Australia that are important. These result in junior crew members feeling unconstrained about pointing out errors on the flight deck. Elsewhere failure to cross-check actions in the flight deck is among the highest rated human factor problems in aviation. An aspect of culture at the enterprise or sectoral level is the relationship between management and pilots. Airlines in which there is a high level of co-operation between management and pilot associations enjoy fewer safety incidents and lower pilot turnover [12]. To sum up and following Reason [11], human factors studies need to focus on what makes organisations (and culture) relatively safe instead of focussing upon their moments of un-safety. Thus there is a need for safety investigations to go beyond accident investigation and to look at how positive human factor analysis can help us design much safer socio-technical systems.
3 What Is Novel about the HILAS System? HILAS is a formal, innovation-focused network. The project aims at the transformation of the aviation industry by improving flight safety through the integration of human-factor knowledge into all aviation-related activities. What this means is the reduction of risk of human error and the danger that this can pose for safety in the operation of aircraft. IPR and trust issues arise, particularly in relation to technological – mainly software – R&D and innovations in HILAS. Where necessary, such innovations will be protected by IPRs. The innovative – though non-technical – organizational and informational aspects of the HILAS project are just as important. Here it is less possible to protect such innovation. There is awareness in HILAS that sharing best practices and the subsequent improvements in safety are more desirable than the protection of such innovations. HILAS has therefore been able to develop a prototypical innovation cluster around the notion of human factors knowledge as a source of innovation in aviation systems. This cluster is illustrated in Figure 1. So far there are no national authorities or regulatory organisations involved in this cluster. This framework of collaboration has
facilitated the sharing of knowledge and supported the implementation of HILAS processes across the different industrial partners, with a range of support from human factors and software development partners.

Fig. 1. Structure of the HILAS Innovation Cluster: airlines, MROs, manufacturers, universities, R&D organisations, maintenance organisations, knowledge services and software companies, linked through knowledge materials, access to experience, training and learning, information sharing, and evaluation
4 The HILAS System The HILAS project has developed an integrated set of processes for designing, managing and improving the people part of the aviation system. It has three components: 1. Management of operational performance, risk and change in flight operations and maintenance; 2. Human factors evaluation in cockpit system design; 3. Inter-organisational sharing, learning and innovation. 4.1 Operational Performance, Risk and Change Fig. 2 below describes the three processes – real-time operational support, tactical management and strategic management – that are linked in a collaborative learning system at the industry level. These processes are facilitated by a set of software modules (see Section 6, below). For example, in the real-time operational support process, time-critical risk information is fed into an intelligent flight plan; task support for maintenance operations is provided; and data for operation reports are gathered. In the tactical process reported events generate risk assessments, established channels for change, and routine system performance monitoring. At the strategic level, risk analyses lead to the identification of complex system problems, system change and evaluation. This integrated organisational system more than satisfies new regulatory requirements (ICAO SMS requirements to be translated into European regulation). It provides the basis for a resilient operational system, adapting and transforming itself to meet competitive and regulatory requirements. It also creates the capability to produce the data and knowledge which are critical to new integrated system design and development.
Fig. 2. The HILAS System of Operational Support, Tactical and Strategic Management and Organisational Learning
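The information flow in Fig. 2 can be read as a simple pipeline: an operational report raised in real time is assessed tactically for risk and escalated to the strategic level when a systemic problem is suspected. The toy model below only illustrates that reading; the class names, fields, severity scale and escalation rule are assumptions, not a description of the HILAS software modules.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class OperationalReport:
    """A report raised during real-time operational support (flight or maintenance)."""
    source: str       # e.g. "flight crew" or "maintenance" (illustrative)
    description: str
    severity: int     # assumed scale: 1 (minor) .. 5 (safety critical)

@dataclass
class RiskAssessment:
    """Tactical-level judgement attached to a single operational report."""
    report: OperationalReport
    risk_level: str   # "routine" or "high" in this toy model

@dataclass
class StrategicIssue:
    """A suspected systemic problem flagged for strategic analysis and change."""
    theme: str
    evidence: List[RiskAssessment] = field(default_factory=list)

def tactical_review(report: OperationalReport) -> RiskAssessment:
    # Illustrative rule: severity alone drives the tactical risk level.
    return RiskAssessment(report, "high" if report.severity >= 4 else "routine")

def strategic_review(assessments: List[RiskAssessment]) -> List[StrategicIssue]:
    # Illustrative escalation rule: repeated high-risk reports from the same source
    # suggest a complex system problem rather than an isolated event.
    issues = []
    for source in {a.report.source for a in assessments}:
        high = [a for a in assessments
                if a.report.source == source and a.risk_level == "high"]
        if len(high) >= 2:
            issues.append(StrategicIssue(f"recurring high-risk events: {source}", high))
    return issues

if __name__ == "__main__":
    reports = [
        OperationalReport("flight crew", "unstable approach", 4),
        OperationalReport("flight crew", "go-around after late clearance", 4),
        OperationalReport("maintenance", "missing torque record", 2),
    ]
    for issue in strategic_review([tactical_review(r) for r in reports]):
        print(issue.theme, "-", len(issue.evidence), "reports")
```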
4.2 Human Factors Evaluation in System Design A comprehensive set of human factor tools and methods supports human factors evaluation of new applications of technologies. These can be deployed either in a flight simulation environment or in operational settings. This fulfils EASA human factors requirements for certification. 4.3 Innovation and Learning between Organisations The HILAS Knowledge Management System is a collaborative exchange for all documentation. The KMS facilitates sharing of organisational experience and operational data, and is designed to manage training and advanced learning programmes. It currently supports the active collaboration and learning between partners. Establishing this collaborative learning framework is a major achievement of the project – creating a form of ‘social capital’ which can leverage innovation and change. Emerging exploitation plans are built around this framework continuing and intensifying after the project is completed.
5 The ‘Human Factor Support Tools’ How did HILAS evolve? To achieve the double task of improving their competitiveness through lean and cost-effective flight operations while at the same time enhancing safety and reliability, any new tool or methodology in aviation must encompass HF-related information to continuously improve processes. ‘Research
suggests that the design of such tools takes second place to continuous improvement behaviour itself. This involves a suite of behaviours, which evolve over time, rather than a single activity' [13]. These behaviours cluster around several core themes, for example, the systematic finding and solving of problems, monitoring and measuring processes, and strategic targeting [14]. As an example of continuous improvement behaviour in aviation, Cahill and Losa [15] show how the definition of new 'task support tools' for crew evolved within the HILAS framework, through a participatory process involving several airlines. Information was gathered in a variety of ways (including in-depth interviews, jump-seat observations and workshops) from participants working in flight planning, active flight operation (dispatch, cabin crew and maintenance), safety and quality. It is clear that the introduction of 'task support tools' involves changes to task practices and their overall process; it is not just a matter of installing a new piece of technology into the cockpit. The need for social, organisational and routine changes is understood outside HILAS, too. Rather than just reacting to accidents and incidents, and to regulatory changes, airlines are developing preventive safety-management approaches, including: scientifically based risk-management methods, a non-punitive culture of incident and hazard reporting, company commitment to the management of safety, the collection and analysis of safety-related data from normal operations, and the sharing of safety lessons and best practices. A variety of risk/safety management systems, and associated monitoring and evaluation tools, have been developed in response to these needs. What had not been developed until recently was a system that would allow airlines to gather, integrate, analyse and communicate all airline information (commercial, operational and safety) in real time. The HILAS system offers this, supporting the airline's safety strategy by providing analysis of information on safety and risk, and also improving the airline's safety culture through better reporting and sharing of safety-related information [15]. HILAS proceeded by defining four HF 'tools' or modules (A, B, C and D). These tools, the key technical innovations of HILAS, aim to help airline operators and maintenance companies understand why certain events take place and to draw conclusions and learn lessons that can lead to the design of new systems that take into account human needs and characteristics. These tools gather (A), store and integrate (B), facilitate analysis (C) and provide a new framework for organisation and change (D). Here we have space only to discuss A. Tool A concentrates on communication and information gathering. The tool offers task support, performance feedback and reporting capability. It can be used by flight and cabin crews as well as maintenance and ground operators. In substance, Tool A is the main tool for allowing operational crew (both in the air and on the ground) to communicate (including feedback and reporting) amongst themselves1. To perform these functions, various hardware devices can be used, including Electronic Flight Bags (EFBs)2, workstations in the office, Personal Digital Assistants (PDAs) and mobile phones.
Flight data will be recorded continuously during each flight and feedback will be provided to the crew if deviations from the norm are identified. In these cases – when reporting is mandatory – pilots will be able to report electronically instead of on paper as is the case today. The tool makes it possible for pilots to explain why their performance did not comply with the benchmark parameters. This means that in addition to providing raw data, the system aims to collect qualitative data, which could reveal 'important latent conditions and human factors issues' [16]. These crucial qualitative data may explain why certain flight parameters or (unsafe) events deviate from the norm. In turn, once this is done, it opens the way to establishing common causes for certain situations; procedural changes to tackle problems at the systemic level can then be developed. The tool also allows pilots to see, either immediately after the flight or later, what their performance was; this facilitates their own learning process (a kind of real-time benchmarking) [16]. The performance feedback is especially relevant when pilots are required to complete a mandatory HF report, possibly linked to a safety-critical event discovered in the Flight Data Monitoring (FDM) where flight technical data are recorded [17]. The above process of data collection and benchmarking is also complemented with the ability to file optional reports electronically. In these reports, pilots can offer feedback, highlight weaknesses in operations, suggest known 'workarounds' or make other, experience-based, constructive suggestions. Archived reports are also available: crews have access to their prior reports, can update and edit them, and can get information as to what the safety department is doing about the issues raised. This is an improvement in the information loop (reporting and getting feedback from the report). Tool B is essentially the server, and Tool C is a data analysis/mining and reporting tool; it offers risk analysis, process improvement and information flow analysis. Overall, Tool C is both an organizational and a technology tool for safety and risk management. Tool D is not a technology tool but rather an overall organizational system that defines safety management, process improvements, procedures, roles, safety culture and responsibilities. Tool D defines the information flow logic for the other three tools, and its users will include everybody in the airline.
1 Tool A also has reporting functionality for roles other than flight crew, e.g. cabin crew and dispatch, and thus the tool is more than 'flight-crew-centric'. The tool also includes an observer-reporting facility [17].
2 On EFBs, see [17], [15], [18], [19], [20], [21].
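As a concrete reading of the Tool A behaviour described above (continuous flight-data recording, feedback when a deviation from benchmark parameters is detected, and an electronic report to explain it), the sketch below shows what such a deviation check could look like. The parameter names, limits and reporting rule are invented for illustration and do not describe the actual HILAS implementation.

```python
# Hypothetical benchmark envelopes for a handful of monitored flight parameters.
BENCHMARKS = {
    "approach_speed_kt": (130, 145),      # (min, max) -- illustrative limits
    "touchdown_vertical_fpm": (0, 600),
    "bank_angle_deg": (0, 30),
}

def check_flight_data(samples):
    """Compare recorded samples against benchmarks; return detected deviations."""
    deviations = []
    for name, value in samples:
        low, high = BENCHMARKS.get(name, (float("-inf"), float("inf")))
        if not (low <= value <= high):
            deviations.append((name, value, (low, high)))
    return deviations

def crew_feedback(deviations):
    """Feed back deviations to the crew and note when an electronic report is expected."""
    for name, value, (low, high) in deviations:
        print(f"Deviation: {name} = {value} (benchmark {low}-{high})")
    if deviations:
        print("A human-factors report can be filed electronically to explain the deviation(s).")

if __name__ == "__main__":
    recorded = [("approach_speed_kt", 152), ("bank_angle_deg", 22)]
    crew_feedback(check_flight_data(recorded))
```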
6 Theoretical Framework There is a literature on systems of innovation – within the broad field of institutional economics – that helps to explain and evaluate many aspects of the HILAS project. The idea of a system of innovation was first applied to national economies in the late 1980s [22]. The concept evolved as analysts found it informed their study of innovation in regions (regional systems of innovation) and, recently, of innovation in sectors (sectoral systems of innovation – SSIs) [23]. The SSI approach offers a multidimensional, integrated and dynamic view of sectors in order to analyze innovation. The notion of SSI places great emphasis on knowledge, learning, interactions (through the market or outside markets) and institutions. Firms are active participants at the centre of a web of interactions that shape their technological and market environment.
We can consider HILAS to be an embryonic element of a sectoral system of innovation. Although there have been research plans involving specific participants, the actual networks that have evolved within the consortium have to varying extents been voluntary, cutting across the plans and strands of the project. What HILAS attempted is new and innovative; it has no precedent in the aviation sector. Among the concomitants of this is a great deal of uncertainty, with participants unsure about whether what they are doing is correct or appropriate. The theoretical backdrop of SSI provides some support for the direction that HILAS took and validation for the type of multi-networking around which the new operational system is evolving. What the literature suggests about the process of diffusion of organisational innovation is that there is a combination of relationship and operational factors that facilitate and accelerate this diffusion. As expressed by Liou and Liou [24], 'Firms in clusters benefit from linkages (among firms, workers, financiers, and so forth) and spillovers as well as complementary assets in skills, technology, and economic information'. Among the key relationship factors is trust, which enhances the flows of information and the veracity of the knowledge among the participants. The absence of trust does not necessarily obviate these flows, but in its absence other factors must be present. For example, contractual undertakings can, in some cases, substitute for trust. Operational factors that facilitate – or impede – innovation diffusion include regulatory regimes, supply chain interdependence, and technological standards. There have been high levels of trust, tight networking relationships, and progress in development of and participation in HILAS processes among sub-sets of the 40 members of the consortium. However, the whole system has not been implemented by all members of the consortium. Can this embryonic element of a system of innovation survive and evolve beyond the life of the EU-supported project? Among the possible evolutionary paths of the HILAS system is some form of commercialisation of the results. The literature on private, club and public goods [25] suggests ways in which this might occur. A private good is one that is "ownable", i.e. a property right to it can be held. There are goods that are not ownable; such goods (or services) are known as public goods. A "club" good (or service) is one that is available only to members but, once a member, it can be freely used. The HILAS system is generated by a multiplicity of private and club goods. Small networks of firms in HILAS, clustering around centres of excellence, derive the benefit of the private knowledge – for example about risk management – of a leading partner. This small network now has a shared knowledge, or club good. When all club goods of the small HILAS networks are combined, they contribute to the HILAS system, which then becomes a good (or service) available to all the members of HILAS. The HILAS club can, if the system is in some sense desirable to other, non-HILAS firms in aviation, sell it on. HILAS becomes an owner of a system that is sold, as a private good (or service), to non-club members. Alternatively, other firms can join the HILAS club and use the system as a club good. All this is contingent on the HILAS system being seen as desirable, both by the HILAS members and, as a consequence, by other, non-HILAS firms in aviation.
The better – more attractive to other firms in the sector – the HILAS sub-sectoral system of innovation, the more likely it will be that the result will be commercially sustainable. Even if HILAS is seen by some to be an improvement, it will require a great deal of effort for its results to be sustained and developed.
7 Description of the Commercial Future of HILAS The key HILAS competence is capability to use human factors knowledge as a source of innovation in the aviation sector. Three interlocking but distinct activities constitute a virtuous cycle for continuing to develop this capability (see Fig. 3): • Knowledge services support the interchange of process knowledge (including about risk), learning from others’ experience and sharing operational data (where it is advantageous to do so); • Training and education (particularly at Masters and Professional doctorate level) develops the capacity to absorb and use the knowledge services in a process of organisational development and change, and provides a framework which can support implementation and evaluation of such change. This is a significant knowledge gap in the industry. • Continuing research and development further develops and refines the cutting-edge knowledge that ensures that the knowledge services are constantly advancing the state-of-the-art. The research component of a professional doctorate can help ensure the maximum industrial impact of a continuing RTD programme (particularly where the in-house RTD capacity is currently very weak, as in the operational sector). Each corner of the triangle is, in principle, relatively independent but each potentially interpenetrates the others and supports and enhances their activities. Fourth Level education, for example, has a strong research component and can lead to the provision of knowledge services. Each part of the triangle contributes to a benign cycle of learning and development that could not happen so effectively with isolated initiatives. Some elements of this triangle are public, some are club goods, and some are private goods. Where firm-specific consultancy services are offered, these constitute private goods, paid for at market prices. Sharing data and the analytical possibilities
Fig. 3. Knowledge-Based Innovation: a triangle whose corners are Research (basic/applied and industrial/developmental), Education (4th-level MSc/Prof.D., training, consultancy) and Business/Knowledge services, linked by exploration/evaluation/analysis, strategic research and development, capability development and implementation support
emerging from shared data are more appropriately developed as club goods. Some of the research – particularly on safety – has to be diffused as a public good, in the public interest. Overall, the elements of the triangle – public and private, governmental and corporate, educational and club based – will all enhance the sectoral system of innovation in aviation.
8 Conclusion Innovations, both in technologies and in ways of organising and operating, are essential for the objectives of increased efficiency and safety to be realised. A research, development and implementation programme has to address three aspects of this nexus: the design process; the operational system; and how the two can work together, managing knowledge, to transform the capability to deliver large scale social goals. Improvement in operational aspects of aviation should drive design, both of aircraft and of the organisation of aviation. This requires much more sophisticated data and analyses of the functional characteristics of operational systems. This is precisely the kind of operational data and knowledge which is required to manage and regulate large integrated operational systems in a proactive and strategic manner. Feedback up the lifecycle stages thus becomes an imperative despite competitive relationships – and despite prisoner’s dilemma impediments to such flows. However, this in turn can create the capability of integrating the design, operational and regulatory phases in a new manner – design for safety, security and environmental sustainability can become natural extensions of the design for operability and maintainability. Creating an integrated framework for design, management and regulation will be necessary to leverage the possibilities of system transformation to deliver radically improved performance across the range of demands. Creating an innovation process capable of transforming and transferring this knowledge across the system lifecycle (design, operations, maintenance and regulation) can synergise the development and transformation of such systems to meet these demanding and urgent social goals. HILAS and other European projects have provided a platform upon which to develop this trajectory: modelling and analysis methodologies; organisational systems for managing performance, analysing risk and managing change; human factors evaluation of new technologies and design for operability concepts; collaborative systems between organisations that create learning and innovation. This is only the first step in developing a sectoral system of innovation which can enable the aviation system to deliver the change in design, operations and regulation which society requires. Acknowledgments. The authors would like to thank the HILAS consortium members and the European Commission for funding this work.
References 1. Human Integration into the Lifecycle of Aviation Systems (2009), http://www.hilas.info/mambo/ (accessed March 13, 2009) 2. European Commission, The Sixth Framework Programme in brief, edition (December 2002), http://ec.europa.eu/research/fp6/pdf/fp6-in-brief_en.pdf (2007)
3. European Commission (2004:9) Participating in European Research, 2nd ed., http:// ec.europa.eu/research/fp6/pdf/how-to-participate_en.pdf (accessed December 12, 2007) 4. McDonald, N. (ed.): HILAS Human Integration into the Life-cycle of Aviation Systems. A Proposal for the 6th European Framework Program, call Identifier: FP6 2003 – Aero-1, Submission date: 31st March 2004 (version July 10, 2004) 5. Wogalter, M.S., Rogers, W.: Human factors/Ergonomics: using psychology to make a better and safer world. Eye on Psi Chi 3(1), 23–26 (1998) 6. Ward, M. and McDonald, N. Human factors in aircraft maintenance – problems and possibilities, HILAS theoretical discussion paper, Trinity College Dublin (2006) 7. International Ergonomics Association, The discipline of ergonomics, (2000), http://www.iea.cc (accessed January 15, 2008) 8. Braithwaite, G.R., Caves, R.E., Faulkner, J.P.E.: Australian aviation safety – observations from the ‘lucky’ country. Journal of Air Transport Management 4(55), 55–62 (1998) 9. McFadden, K., Towell, R.: Aviation human factors: a framework for the new millennium. Journal of Air Transport Management 5, 177–184 (1999) 10. Maurino, D.: Human factors and aviation safety: what the industry has, what the industry needs. Ergonomics 43(7), 952–959 (2000) 11. Reason, J.: Managing Risks of Organisational Accidents, Avebury, Aldershot, England (1997) 12. Learmount, D.: Union recognition ‘is good for safety’, Flight International, December 511, p.13 (2006) 13. Bessant, J., Caffyn, S., Gallagher, M.: An evolutionary model of continuous improvement behaviour. Technovation 21, 67–77 (2001) 14. Cahill, J., et al.: HILAS Flight Operations Research. In: Proceedings of HCI International (2007) 15. Cahill, J., Losa, G.: Flight Crew Task Performance and the Design of Cockpit task support tools. In: Proceedings of the European Conference on Cognitive Ergonomics (2007) 16. Ulfvengren, P.: HILAS tools for continuous improvement in aviation? Mimeo, KTH, The Royal Institute of Technology, Stockholm, Sweden (2007) 17. Cahill, J.: Personal communication. Trinity College Dublin, Department of Psychology 6 (November 2007) 18. Learmount, D.: Lufthansa Systems tests Class 2 EFB on A340-600 Flight International, 18th edn., June 12–18 (2007) 19. Moores, V.: Approval nears for ATR 42/72 Class 2 EFB Flight International, July 17–23, p. 13 (2007) 20. Croft, J. Warning signals Flight International, July 10-16, pp. 30–33 (2007) 21. Hughes, D.: Finding yourself: Lower cost electronic flight bags, with own-ship position, may proliferate on airlines Aviation Week & Space Technology, June 18, pp. 158–62 (2007) 22. Jacobson, D.: The Technological and Infrastructural Environment. In: Nugent, N., O’Donnell, R. (eds.) The European Business Environment, Macmillan, London (1994) 23. Malerba, F. (ed.): Sectoral Systems of Innovation. Cambridge University Press, Cambridge (2004) 24. Liou, D.Y., Liou, J.D.: Knowledge Creation and Diffusion in Innovation Networks by System Viewpoint. In: IEEE International Conference, pp. 1950–1954 (December 2007) 25. McNutt, P.: Public Goods and Club Goods. In: Bouckaert, B., deGeest, G. (eds.) Encyclopedia of Law and Economics. Edward Elgar Publishers (1999)
Redefining Interoperability: Understanding Police Communication Task Environments Gyu H. Kwon1, Tonya L. Smith-Jackson1, and Charles W. Bostian2 1
Grado Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, 250, Durham hall, Blacksburg, VA, 24061 USA 2 Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 302, Whittemore hall, Blacksburg, VA, 24061 USA {ghkwon,smithjack,bostian}@vt.edu
Abstract. The goal of this research is to understand police communication environments as they relate to interoperability issues. Interoperability is critical to both inter- and intra-organizational communication, and its importance is especially apparent in emergency operations that bring multiple groups together at the same place. This study used semi-structured interviews to examine police communication issues and to reconceptualize interoperability in the police communication domain. Based on the interviews, we identified three important concepts that characterize interoperable groups. First, highly distributed decision-making processes make it difficult to meet multiple communication needs. Second, a police team self-organizes at the scene. Finally, operational boundaries are tentative and depend on contextual information. Based on these main concepts, we provide high-level suggestions for designing a police communication system based on cognitive radio technology. Keywords: Interoperability, Emergency Communication, Police, Cognitive Radio, Public Safety, Distributed decision making, Self organization, Contextual Information.
1 Introduction Communication systems for public safety workers play very important roles in both daily and emergency operations. They can be a link among team members at the same place as well as a bridge among remote teams or other agencies [1]. Although communication needs have increased dramatically, the limitations of communication technology and the lack of understanding of the design space have constrained the task performance of police, especially when new technologies are introduced. Since the cognitive radio concept was first presented by Mitola and his colleague [2], advances in technology have enabled cognitive radios to scan available networks and give appropriate feedback to agents. Ideally, a full cognitive radio can consider every observable parameter available to a wireless node or network in order to address interoperability issues [1]. Interoperability is critical to ensuring cooperation within a team or between different agencies in
large operations such as unexpected simultaneous incidents, massive public event control, and natural disasters. However, the current concept of interoperability emphasizes the technical possibility of communication among devices. A previous ethnographic study identified the limitations of this current concept of interoperability [3]. Thus, there is a strong need to reconceptualize interoperability, considering organizational and operational aspects, to enhance cognitive radio capabilities in the public safety domain. The main objective of this study is to specify the concept of interoperability in the context of police emergency communication. Based on this understanding, we suggest a set of recommendations for designing a public safety cognitive radio. In order to address these issues further, we examined police communication and investigated the needs and definitions of interoperability, reviewing cognitive radio as an enabling technology. We conducted a series of interviews with the Virginia Tech (VT) police force. Finally, based on the results of the interviews, we suggest implications for designing a police communication system with cognitive radio technology. 1.1 Context of Police Communication In general, the work categories of police are diverse, ranging from the simple patrol task to large-scale disaster management. Although there are some degrees of difference, all public safety work deals with emergency and life-critical situations. Situations that are unknown and uncertain may include danger and may be considered risky and time-pressured [4]. Even when public safety workers are carrying out their daily duties, their particular goals and tasks emerge from the situations they encounter [5]. Ron Westrum [6] has categorized three different kinds of threats according to their frequency. Regular threats are those that occur often enough to trigger certain standard response procedures. In these cases, public safety workers can respond according to their standard training. They can fully identify and understand the situation from the initial dispatch, and thus can handle the situation. Most of the communication in this phase may follow normal police communication procedures. Irregular threats, on the other hand, represent more challenging situations for which the police may not be fully prepared. This type of event needs more collaborative work among agencies to resolve the problem in question. The last type of situation is the totally unexpected event; the terrorism of 9/11 is a prime example of such an event. In this case, public safety workers require a megashift in their mental framework. In the case of Hurricane Katrina, most of the specialized agencies multi-tasked and collaborated heavily in order to save many lives. Their tasks and work environments restricted the communication alternatives available for achieving shared goals in the situation. From all the above, we conclude that understanding the context and situation surrounding a particular threat should be the first activity in designing a public safety communication system. 1.2 Interoperability and Cognitive Radio From the Columbine High School massacre and Hurricane Katrina, to the recent April 16 VT massacre, many news media and official reports on those disasters indicated
there were common needs for interoperability among agencies. The Congress of the United States has also stated its concern to resolve the interoperability issues that were exhibited during such national disasters as the tragedy of the 9/11 attack [7]. Interoperability was originally defined as a technological term [8] and thus related to technical issues that apply to specific types of hardware or systems. The Alliance for Telecommunications Industry Solutions (ATIS) defines interoperability in five different ways that are also adopted by the American National Standard [9]:
• The ability of systems, units or forces to provide services to and accept services from other systems, units, or forces and to use the services so exchanged to enable them to operate effectively together.
• The condition achieved among communications-electronics systems or items of communications-electronics equipment when information or services can be exchanged directly and satisfactorily between them and/or their users. The degree of interoperability should be defined when referring to specific cases.
• Allows applications executing on separate hardware platforms, or in multiprocessing environments on the same platform, to share data and cooperate in processing it through communications mechanisms such as remote procedure calls, transparent file access, etc.
• The ability of a set of modeling and simulation to provide services to and accept services from other modeling and simulation, and to use the services so exchanged to enable them to operate effectively together.
• The capability to provide useful and cost-effective interchange of electronic data among, e.g., different signal formats, transmission media, applications, industries, or performance levels.
These definitions deal with broad concepts of interoperability, but they do not include context-specific issues. Recently, in many conferences on technical interoperability among the various public safety entities, semantic interoperability (operational interoperability) has been emphasized. Even though the technical dream of Mitola has not yet been achieved, his concept has greatly impacted many aspects of public safety communication. However, the Department of Homeland Security (DHS) listed cognitive radio technology as one of the most important communication technologies needed to solve the interoperability problem in the first national emergency communications plan in July 2008 [10]. Virginia Tech's Center for Wireless Communications Team (CWT2) has developed core cognitive radio technologies as part of a National Institute of Justice research project. This cognitive radio architecture can scan available spectra, automatically sort channels and connect them. It is also capable of bridging two networks. To maximize these capabilities in the public safety domain, it is necessary to understand the contexts of communication tasks and how we can assure the quality and usability of such a device. To tackle these issues, the cognitive radio has knowledge domains including user, policy, and security domains. The user domain should include organizational,
operational, technological and environmental issues in the public safety domain. Figure 1 shows the architecture of a cognitive radio communication system proposed by Virginia Tech CWT2. For effective police communication, the user domain should be designed carefully, considering the multiple aspects of police tasks. From this study, we can collect pertinent inputs for the design requirements of the user domain as part of a public safety communication system.

Fig. 1. A User Model in the VT Cognitive Architecture Model
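To make the scanning, sorting and bridging capabilities discussed in Section 1.2 more tangible, the following sketch shows the kind of logic involved. It is a toy model under simplifying assumptions (channels characterised only by network, frequency and a signal-quality score) and does not represent the CWT2 architecture or any real radio interface.

```python
from dataclasses import dataclass

@dataclass
class Channel:
    network: str            # e.g. "VTPD tactical", "county EMS" -- illustrative names
    frequency_mhz: float
    signal_quality: float   # assumed metric: 0.0 (unusable) .. 1.0 (excellent)

def scan(spectrum):
    """Keep only channels with a usable signal."""
    return [ch for ch in spectrum if ch.signal_quality > 0.2]

def sort_channels(channels):
    """Order candidate channels from best to worst signal quality."""
    return sorted(channels, key=lambda ch: ch.signal_quality, reverse=True)

def bridge(net_a, net_b, channels):
    """Pick the best available channel on each network so traffic can be relayed."""
    best = {}
    for ch in sort_channels(channels):
        best.setdefault(ch.network, ch)
    if net_a in best and net_b in best:
        return best[net_a], best[net_b]
    return None

if __name__ == "__main__":
    spectrum = [
        Channel("VTPD tactical", 460.1, 0.9),
        Channel("county EMS", 155.3, 0.6),
        Channel("county EMS", 155.4, 0.1),
    ]
    print(bridge("VTPD tactical", "county EMS", scan(spectrum)))
```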
2 Study Semi-structured interviews were conducted as an existential-phenomenological study that examines the knowledge gained from the participants’ experience [11, 12]. Since phenomenology as a branch of philosophy assumes that our perception is highly related with the objects perceived, and human epistemology is affected by the object of experience, this approach attempts to holistically understand the essence of human experience in context-specific settings [13]. In this study, we performed a series of interviews with a group of public safety workers with a view to focusing on the interoperability issues with the Virginia Tech Police Department (VTPD). Structured questions guided the outline of our discussion, followed by informal questions exploring in-depth points based on their specific experience, position and needs. All procedures--from the recruiting to the interviews--were conducted under the approval of the Virginia Tech Institutional Review Board. Since the subject matter of these interviews was very sensitive, based on the recommendations of the IRB, the interviewers did not record any video or audio data. Instead, at least two interviewers were involved in a single interview and each took notes.
2.1 Participants The members of the Virginia Tech Police Department (VTPD) participated in the interviews. Six participants volunteered for this study. Before the main study, we interviewed a public safety worker as a preliminary interview, which provided us with the basic knowledge about public safety work domains. Although they had different roles in their regular organization, they performed various roles in the incident organization based on the situation. Their roles varied from the captain to the dispatcher to the patrol officer. We carefully recruited participants through a combination of telephone and email screenings. We called the chief of the VT police. With his approval, we contacted participants. Involvement in a large number of incidents was not a pre-condition for participating in the study. However, the people we interviewed had more than two years of experience in emergency medical service as well as had experienced at least one large incident. Each had highly utilized the current communication devices. Nevertheless, they did not have enough knowledge to customize their portable communication devices. 2.2 Method We developed three different types of representative task scenarios: daily work, massive public event control to highlight cooperative work, and supportive work for other agencies. They were generated based on the interview participants’ past experiences and plausible events to ensure ecological richness. These participants had a variety of experience and performed various roles in police organizations, thereby making it easy to probe organizational issues and draw upon plentiful experiences. Structured questions guided the boundaries of topics discussed, while open-ended questions were addressed to the participants based on their responses. Although our scenarios included administrative tasks, we focused on the core operational policing at the scene of the incident in question [4]. Each interview lasted about one and a half hours, during which time three prepared scenarios were described and the participants were asked 5- 10 questions per scenario, based on their operational task procedures. During an interview session, we explored four different perspectives of their main communication tasks: technology, information, organization, and operation.
3 Results
The analyses followed a systematic procedure using Atlas.ti (Version 5.4). We set five a priori codes, and an additional 13 codes emerged during the analyses. We adopted grounded theory for our analysis [14]. Table 1 shows the codes used in this analysis and their descriptions; the codes emerged from the interviews and relate to overall communication problems. From two brainstorming sessions, we distilled three primary findings that define the concept of interoperability: distributed decision making, self-organization, and dynamic operational boundaries.
Table 1. Codes and Description

Blind Spots: Any difficulty with service with the two-way radios; for example, being in the basement of one of the dorms and not being able to use the radio because there is no signal.
Button Usability: Difficulty with buttons being too easy to press or too difficult to use.
Design Limitations: Aesthetic problems with radios or other equipment; e.g., radios being too heavy or too small, ear pieces being uncomfortable.
Dispatching Limitations: Any hindering of communication from calls made by those calling 911 to the dispatcher, and troubles with dispatching emergency personnel.
Environmental: Problems in particular environments or service problems in an entire area.
Functional Limitations: Problems with a radio not performing functions that it was designed to perform.
Informational: Any problems related to sending and receiving any type of information.
Need for Texting: Any situation in which written information would help clarify the communication.
Noise Problems: Subset of environmental problems with the same meaning; noise in the area hindering communication over the radio.
Operational: Any problems with the radio not working due to task procedural issues in operational situations.
Organizational: Trouble receiving or giving information because of organizational reasons.
Presentation of Information: Trouble understanding information in the way that it is presented to the user.
Technology: Trouble with the hand-held radios or other equipment due to hardware problems.
Privacy: Any information that can be seen or heard by an outside party for whom it was not intended.
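Purely as an illustration of how coded interview data of this kind can be summarized (the authors' analysis was performed in Atlas.ti; the data and code below are hypothetical and only sketch the idea), code frequencies per interview and overall can be tallied as follows:

```python
# Hypothetical tally of the Table 1 codes across interview notes.
# The coded_notes mapping is invented for illustration only.
from collections import Counter

coded_notes = {
    "interview_1": ["Blind Spots", "Noise Problems", "Organizational"],
    "interview_2": ["Blind Spots", "Need for Texting", "Privacy", "Organizational"],
}

per_interview = {name: Counter(codes) for name, codes in coded_notes.items()}
overall = Counter(code for codes in coded_notes.values() for code in codes)
print(overall.most_common())  # most frequently applied codes across all interviews
```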
3.1 Distributed Decision Making
At the scene of an emergency, police officers' decision-making is critical to saving lives and preventing major disasters. Like any other police force, the VT police attempt to make quick decisions based upon the local information they can obtain. They try to get as much information as possible from all available sources.
Dispatchers, people in vehicles, or people at headquarters collect information from the entire police database system, including data from other people in relevant places. At the scene, officers sometimes make locally optimized decisions to respond to the situation. However, due to the limitations of their current radio communication systems, the police can only transfer highly abstracted information by voice. Thus, they frequently use alternative ways to expand their communication capacity. In our scenario cases, the police officers interviewed stated they have had trouble transferring the information that they collect and making decisions based on incomplete information. The intrinsic characteristics of police tasks and communication capacity have rendered their decision-making distributed. In addition to communication capability, there are many barriers that hamper collaboration: environmental issues such as blind spots or noise, organizational issues such as individual and organizational culture, and privacy issues.

3.2 Self-organization
Public safety teams, especially police teams, tend to self-organize at the scene. Their long-term, planned operations, such as a football game day operation, are performed by the formal organization. However, in some emergency situations, their organizational structure emerges on the basis of physical and contingent constraints. For example, incident command teams consist of volunteers, firefighters and emergency medical service providers. The decision maker is not necessarily the highest-ranked person present; it depends on the situation. If the situation becomes more settled, the emergent organization reverts to the formal organization.

3.3 Dynamic Operational Boundaries
Operational boundaries are tentative and depend on contextual information. For example, on a football game day, when some 65,000 people have gathered at the same place, the public safety team is in charge of allocated sections; as usual, the operational boundary is defined by geographical constraints. However, if a large incident occurs, the operational boundary changes dynamically. In the case of multiple incidents at the same place, police officers could be overloaded with the communication they are receiving and giving, much of which might be irrelevant or of low importance. In this case, it is necessary to set up different networks for each incident to avoid information overload. For emergency cases, these officers operate more than two tactical channels simultaneously. However, sharing the information on tactical channels with other relevant personnel becomes problematic.
4 Implications
Highly distributed decision-making processes are problematic because they create multiple communication needs. Decision-making in this context is geographically and organizationally distributed: decisions can be made by people at the scene, in dispatch centers, in vehicles, and at other control centers at different levels of the organization.
In every decision-making task, the communication group shares situational knowledge. Thus, we characterize the concept of interoperability as a unit of decision-making in terms of joint cognitive systems [15, 16, 17]. Each unit needs enough capability to collect and transfer information in order to yield a seamless decision-making process. The cognitive radio communication system should therefore support the transfer of multi-modal data, including text, images or video. In addition, device usability is important when the situation becomes increasingly complex. Second, interoperability can be described in terms of self-organization characteristics: emergent, distributed and non-specific [18]. It is essential to present organizational information on the communication devices in order to support effective communication. For example, people in the operation need to identify the person who is in charge of the case; this is especially critical in multi-group operations. In addition, the functional unit also needs to be represented in the communication system to support direct communication between stakeholders [19]. Finally, the communication system should identify current operational boundaries. The police, for instance, use three or four communication networks for a single operation. In general, those communication networks support communication within an organization. Yet, even though the police have tactical channels, these are not utilized much. The people in the operation need to know who is involved in a particular operation and how to contact them. Therefore, the communication system should present the boundaries of the operation geographically as well as organizationally.
5 Conclusion
Understanding these characteristics of interoperability provides opportunities to improve police communication task performance. We can describe an interoperable group in terms of decision, organizational and operational groups. The results can be directly used to display and prioritize the networks that combine interoperable groups in cognitive radio devices. In addition, the results enable us to build richer decision-making process models in the public safety domain from the perspective of joint cognitive systems. Cognitive radio systems can include these results as a user model in their cognitive architecture. They can also suggest ways of integrating and allocating functions and other information within multiple-device communication environments. Finally, this research on interoperability can be used to identify the contextual requirements of safety-critical systems that interact with their respective environments.

Acknowledgments. This work was supported by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice under Award No. 2005-IJ-CXK017, by the National Science Foundation under Grant No. CNS-0519959, and by DARPA under grant W31P4Q-07-C-0210. The opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of these sponsors or the official policy or position of DARPA, the Department of Defense, the U.S. Department of Justice, or the National Science Foundation.
References 1. Smith, B., Tolman, T.: Can we talk: Public safety and the interoperability challenge. National Institute of Justice Journal 243, 16–26 (2000) 2. Mitola III, J., Maguire Jr., G.Q.: Cognitive radio: making software radios more personal. IEEE Personal Communications 6(4), 13–18 (1999) 3. Imran, A., Smith-Jackson, T.L.: Naturalistic Observation of the operational policing for massive public events. Unpublished Technical report, Virginia Tech, Blacksburg (2005) 4. Manning, P.K.: Policing Contingencies. University of Chicago Press, Chicago (2003) 5. Greene, J.: Community policing in America: Changing the nature, structure, and function of the police. Criminal Justice 3, 299–363 (2000) 6. Westrum, R.: A Typology of Resilience Situations. In: Hollnagel, E., Woods, D., Leveson, N. (eds.) Resilience Engineering: Concepts and Precepts. Ashgate Publishing, Ltd., Burlington (2006) 7. National Task Force on Interoperability, Why Can’t We Talk?: Working Together to Bridge the Communications Gap to Save Lives: A Guide For Public Officials. Technical Report (2003) 8. Federal Communication Commission, Tech Topic 1: Interoperability, http://www.fcc.gov/pshs/techtopics/tech-interop.html 9. Alliance for Telecommunications Industry Solutions, Interoperability ATIS Telecom Glossary (2007), http://www.atis.org/glossary/ 10. Department of Homeland Security, National Response Framework, http://www.dhs.gov/xprepresp/committees/editorial_0566.shtm 11. Patton, M.Q.: Qualitative evaluation and research methods, 2nd edn. Sage Publications, Newbury Park (1990) 12. Rossman, G., Rallis, S.: Learning in the Field: An Introduction to Qualitative Research. Sage Publications, Thousand Oaks (2003) 13. Thompson, G.J., Locander, W.B., Pollio, H.R.: Putting consumer experience back into consumer research: The philosophy and method of existential-phenomenology. Journal of Consumer Research 16(2), 133–146 (1989) 14. Glaser, B.G., Strauss, A.: Discovery of Grounded Theory. Strategies for Qualitative Research. Aldine Transaction, Chicago (1967) 15. Cannon-Bowers, J., Salas, E.: Reflections on shared cognition. Journal of Organizational Behavior 22(2), 195–202 (2001) 16. Hutchins, E.: Cognition in the Wild. The MIT Press, Cambridge (1995) 17. Hollnagel, E., Woods, D.D.: Joint Cognitive Systems: Foundations of Cognitive Systems Engineering. CRC Press, Boca Raton (2005) 18. Vicente, K.J.: Cognitive Work Analysis: Toward Safe, Productive & Healthy Computerbased Work. Lawrence Erlbaum Associates, Mahwah (1999) 19. Department of Homeland Security, National Incident Management System, Technical Report (2004)
Unique Reporting Form: Flight Crew Auditing of Everyday Performance in an Airline Safety Management System
Maria Chiara Leva2, Alison Kay2, Joan Cahill2, Gabriel Losa1, Sharon Keating3, Diogo Serradas3, and Nick McDonald2
1 Iberia Airlines, P.O. Box E-28042 Azi Ni Barajas - Edif. Operaciones 114, Madrid, Spain
2 Aerospace Psychology Research Group, School of Psychology, Trinity College Dublin (TCD), Ireland
3 Aircraft Management Technologies, Malahide, Dublin, Ireland
Abstract. This paper presents the proposed prototype for a Unique Report form, which will constitute the basis for all operational and safety-related reports completed by Flight Crew. This reporting form provides an opportunity for operational personnel to audit their own company's processes and procedures, and has been developed in collaboration with a major Spanish airline as part of the Human Integration into the Lifecycle of Aviation Systems (HILAS) project. This research involved extensive fieldwork, including process workshops, task analysis and collaborative prototyping of new concepts. Traditionally, airlines use performance monitoring tools to evaluate human performance and, by implication, their organizational/system safety. Feedback from these tools is used to direct improvements (re-designing procedures, enhancing training, etc.). The Line Operations Safety Audit (LOSA) methodology constitutes the current state of the art in performance monitoring. Building on this concept, the end-user requirements elicited during the fieldwork were the main focus for the design of this reporting form.

Keywords: Human Factors, performance monitoring, threat & error management, task support, safety management systems.
1 The Unique Reporting Form within the Context of a Safety Management Systems Framework
Until recently, airline approaches to safety have reflected a reactive model (e.g. complying with regulatory requirements and prescribing measures to prevent the recurrence of undesirable events). Current models follow a more proactive safety management approach. According to the International Civil Aviation Organization (ICAO), this is characterized by a number of factors [1] including:
− The application of scientifically-based risk management methods
− Senior management's commitment to the management of safety
− A non-punitive environment to foster effective incident and hazard reporting
− Systems to collect and analyze safety-related data arising from normal operations
− Sharing lessons learned and best practices through active exchange of information
From ICAO's perspective, this is supported by the development of appropriate safety management systems (SMS), defining the required organizational structures, accountabilities, policies and procedures [1]. In this regard, most airlines have developed (or are in the process of developing) safety management systems in accordance with regulatory guidance.
Currently, airlines use a range of paper- and technology-based tools to monitor and evaluate human performance (and, by implication, organizational/system safety). Feedback from these tools is used to direct system safety improvements (e.g. process/procedure re-design, enhanced training, etc.). Traditionally, these tools have fallen into two types: those that focus on gathering human performance information, using either self-report or observer-based methodologies (e.g. Air Safety Reports, Line-Checks and Line Operations Flight Training), and those that focus on gathering aircraft performance information (e.g. Flight Operations Quality Assurance). Crucially, these tools fail to provide a real-time picture of routine operations supporting predictive risk management [2]. The use of many discrete tools presents additional information management challenges. Much valuable data is gathered about the operation, yet this data is gathered, analyzed and stored in different formats, making it difficult to obtain an overall integrated safety/risk picture. Although useful from a data gathering perspective, these tools fall short of providing adequate data integration and analysis support. To this end, airlines are interested in developing tools which provide a real-time and continuous picture of the operation. Furthermore, many airlines are focusing on improving knowledge integration both internally (e.g. within the airline) and externally (e.g. with authorities, other airlines, etc.). Arguably, little or no attention has been paid to the development of tools which embed crew reporting in the Flight Crew task and link directly to airline safety/risk monitoring and process improvement activities.
The HILAS project is part of the Sixth Framework Programme for aeronautics and space research, sponsored by the European Commission. Its overall objective is to develop a model of good practice for the integration of human factors across the lifecycle of aviation systems [3]. The flight operations strand is aimed at developing a new methodology for monitoring and evaluating overall system performance to support improved information sharing, performance management and operational risk management. Its current research suggests that the first step is to gather the right information from operational personnel. Potentially, allowing operational personnel to 'audit' their own company's processes and procedures may provide safety and risk personnel in an airline with the necessary feedback to determine the safety/risk status of the operation. Furthermore, such an approach may motivate operational personnel to report both routinely and above and beyond what is required by regulatory bodies.
2 Performance Monitoring of Flight Crew Airlines use performance monitoring tools to evaluate human performance and by implication their organizational/system safety. Feedback from these tools is used to direct improvements (re-design procedures, enhance training etc.). The Line Operation Safety Audit (LOSA) methodology constitutes the current state of the art in
terms of performance monitoring. LOSA evaluations have been successfully undertaken by many airlines to assess routine flight operations. The purpose of LOSA is to identify threats to safety, minimize the risks that such threats may generate, and implement measures to manage human error in operational contexts [4]. In a LOSA evaluation, trained observers watch real-life operations and provide feedback about Flight Crew threat and error management skills. Observers (a) document external threats, (b) record flight crew errors in terms of their type, management response, and outcome (e.g. aircraft states), and (c) rate the crew on several Crew Resource Management behavioural markers [5]. Evaluations are conducted under strict non-jeopardy conditions: crews are not at risk for observed actions. Helmreich's Threat and Error Management (TEM) model [6] provides the theoretical basis for LOSA evaluations. The LOSA model distinguishes (a) threats, both external threats (including latent threats) and internal threats (crew performance), (b) error types, (c) error responses/countermeasures and (d) error outcomes (in terms of aircraft states) [5]. Critically, LOSA has highlighted the fact that error and violation are normal occurrences in operational systems and must be managed. The LOSA framework and methodology has many benefits which should be considered in the design of a future tool. These include (a) the evaluation of non-technical and technical skills (albeit these are separated in LOSA), (b) the attempt to link technical performance (procedures, aircraft handling), non-technical performance (CRM, TEM behaviours) and aircraft states, (c) the observation of real operational practice and (d) non-jeopardy evaluation (de-identified, confidential, non-disciplinary data collection), which fosters a sense of trust between operational personnel and management.
3 Empirical Research: A Brief Excursus on the Methodology The fieldwork for this research was carried out with five airlines using participants from across the three flight operations sub-processes: flight-planning, active flight operations and change/safety/quality process. Table 1 presents a summary of the steps undertaken during the fieldwork.
4 Development of the Unique Report
Input from flight crew led to the further development of the LOSA concept, and the following end-user requirements were elicited:
Requirement 1: Establish a reporting framework linked to the journey log of each flight that would allow the pilots to report on threats and errors encountered and managed during each flight.
Requirement 2: Enable an improved logical reconstruction of occurrences and their links to flight phases and potential (crew) corrective actions.
Requirement 3: Provide an established channel for reporting threats and occurrences in parallel to LOSA. In this sense, the structure of the reporting form should give a clear focus on the actual accounts of the main facts constituting the event, introducing an "ad hoc" space for the crew to express their analysis at the end. The analysis can be directed towards the elements of operational performance upon which the company has leverage.
Requirement 4: Develop an intelligent flight plan in order to provide feedback on performance management concepts.
The report is embedded in the Extended Journey Log that the flight crew compiles at the end of each flight, and is contained in an electronic flight bag (EFB). The data collected through the report will allow safety personnel to derive a reliable picture of the main threats and hazards faced in everyday operations, along with the TEM strategies used by crew. Reporting data can be used to generate a picture of the threats associated with a specific flight, and the associated TEM guidance. The unique report not only provides crucial quantitative data for the safety management process, but also provides the flight crew with a tool that is easily accessible (some reports may be saved and completed later in the crew room, hotel room or at home) and more efficient (only one report is required for completion, as supplementary reports are pre-populated with data from the unique report), and that therefore promotes an integrated data format for reporting and storing data related to the overall systems process.

Table 1. Summary of fieldwork
1. Process derivation workshops were conducted with the airlines to map the active flight operations process and understand Flight Crew task performance within the context of this process.
2. A detailed task analysis was undertaken with Flight Crew from two partner airlines. This research was directed at understanding the nature of Flight Crew task performance (and associated information requirements) and identifying how this might be facilitated by the design of improved situation assessment and reporting tools.
3. Seven jump-seat observations and de-brief interviews, with one pilot from each airline.
4. Two detailed task analysis case studies involving one pilot from each airline.
5. Semi-structured interviews with twelve pilots.
6. Twelve semi-structured interviews with other operational personnel (e.g. Dispatch, Cabin Crew and Maintenance) from two airlines. The purpose of these interviews was to identify other roles which feed information to Flight Crew and, correspondingly, Flight Crew information outputs to these roles.
7. Ten interviews with Flight-planning Personnel and Quality/Safety personnel from both airlines. This research investigated dependencies between the active flight operation and the flight-planning and safety/quality processes.
8. A more in-depth task analysis exploring certain critical activities (e.g. flight-planning and briefing, reporting), conducted with Flight Crew from one partner airline. This resulted in the identification of three high-level Flight Crew scenarios and associated tool concepts.
9. Collaborative prototyping activities conducted with Flight Crew from a Spanish airline, using participatory design methodologies [7]. This research focused on modelling low-fidelity prototypes of the Graphical User Interfaces with the airline-designated working group.
5 Section A: Event Structure
Pilots are asked to classify the event as either a single event or a chain of events. This logical distinction is necessary in order to correctly reconstruct the occurrence sequence. If the occurrence is a chain of events, all sections of the report structure will be completed cyclically until the chain is complete.
6 Section B: When Did It Happen?
Pilots are asked to select the process phase in which the event occurred:
• It was observed, by reviewing a sample of reports, that a substantial majority of the narrative accounts start by stating in which process phase a certain event happened.
• The pilots involved in the brainstorming sessions stated that they do not like to classify occurrences before starting an actual account of the event.
• The classification of the event can become confusing, since it can be assigned according to the main actors involved, the main causes, or the consequences.
• Currently, most accident or incident reporting forms force users to classify events at the beginning, which often leads to misclassification and misuse of the information.
Linking threats and errors to information contained in a process map of the Flight Operations (see Fig. 1) can help in building up a systemic repository of information. This generates a living picture of the threats and hazards most recurrent in every process phase that could, in turn, be used as the foundation of a safety management system based upon a systemic view of risk. This section utilises the Operational Process Model of Flight Operations, which was previously developed through field research.
Fig. 1. Event structure and timing
7 Section C: "What Happened" – Setting the Scene
This section gives the pilots an opportunity to list all the actors who were involved in the event, and to provide further information about them as linked to each process phase identified.
a. Users are presented with a list of all Actors and are able to select one or more actors involved by using check boxes.
b. A narrative section enables users to describe what happened during the event by using a combination of selected Threats, Human Errors (based on the LOSA classification) and Delay Codes (company specific), and free text.
c. The threats and issues list is linked to the process phases previously selected and can also be screened according to the actors selected.
d. If the event reported is classified as part of a chain of events, threats highlighted or added in previous phases, and the outcomes of previous phases, are listed as possible threats for subsequent flight phases that the user can select.
e. The users can input a qualitative assessment of the perceived criticality of each threat. Five levels of gravity can be reported: Catastrophic, Hazardous, Major, Minor, Negligible (see Fig. 2). This data assists in rating the importance of each threat on the basis of historical data, and facilitates a review of how many high/critical threats are successfully managed on a daily basis.
Fig. 2. “What happened: threats and issues”
8 Section D: Actions Taken
In this section, users are asked to describe the actions taken. This section also highlights features of TEM: actions that prevent undesired outcomes within a chain of events from turning into a worse outcome at the end of the chain are symptomatic of effective Threat and Error Management.
• For each event the user can input one or more actions.
• For each action it is possible to select the threats to which the action responds, and the main actor should be selected from the actors list.
• If the actor is a member of the crew, the actions taken can be selected from:
• Action according to ECAM procedures (ECAM: the Electronic Centralised Aircraft Monitor is a system that monitors aircraft functions and communicates them to the pilots; it also produces messages detailing failures and lists suggested procedures to correct the problem)
• Non-ECAM procedures (Normal, Abnormal, Emergency)
• Action taken in accordance with company procedures
• Action taken in accordance with Authority Regulation
• None of the above ("I'm not sure")
• No action taken
A narrative section enables users to describe what action(s) they took by using a combination of key words and free-text. An example is reported in Fig. 3.
Fig. 3. Actions Taken
9 Section E: Consequences/Outcome
Users are asked to indicate the consequences and outcomes for every element of a chain, for both safety and operational issues. They are presented with pre-defined lists and are asked to select the relevant issues. The lists are taken from the ECCAIRS classification scheme [8] of potential events and are linked directly to all phases selected at the beginning of the report. Should further levels of classification of the outcome be required, a pop-up box will appear with a short list from which pilots can choose the relevant item.
• More than one item can be selected (as events can have numerous consequences).
• The outcome of one stage of a flight can be selected as a threat for subsequent flight phases in the report.
10 Final Outcome
Pilots are asked to describe the final outcomes of events for both safety and operational issues. The final outcomes can be selected from the same ECCAIRS event classification scheme for each flight phase. Delays are also possible outcomes.
Users can provide an estimate of the gravity of the outcome. The gravity can be associated with a single item or can be an overall property. Part of the screenshot for this section is reported in Fig. 4.
Fig. 4. Final Outcome
11 Section F: Analysis
This section of the report is optional. It provides pilots with an opportunity to report and rate factors which affected an event (see Fig. 5). All previous sections appear in a summary format for ease of reference. Users are asked to evaluate contributory factors in terms of: (a) Blockers (contributory factors which have a negative impact on performance) and (b) Facilitators (contributory factors which have a positive impact on performance). The importance of each contributory factor can be rated (1 = least important and 4 = most important). Users can also enter information about possible consequences for an event. This provides input on TEM actions that had a positive impact on the chain of events, or on near misses that should be analysed further. Pilots can save this section of the report for review and submission at a later date.
Fig. 5. Analysis of the event
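Taken together, Sections A-F imply a structured record for each report. The sketch below is only an illustration of such a data model, not the HILAS prototype itself; the class and field names are hypothetical:

```python
# Hypothetical data model suggested by Sections A-F of the Unique Report.
# All names are illustrative; they are not taken from the HILAS prototype.
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Severity(Enum):            # Section C: perceived criticality of a threat
    CATASTROPHIC = 5
    HAZARDOUS = 4
    MAJOR = 3
    MINOR = 2
    NEGLIGIBLE = 1

@dataclass
class Threat:                    # Section C: a threat/issue linked to a phase
    description: str
    severity: Severity

@dataclass
class EventElement:              # one element of a single event or chain (Sections B-E)
    process_phase: str           # e.g. "pre-flight", "climb" (illustrative labels)
    actors: List[str]
    threats: List[Threat]
    actions_taken: List[str]     # Section D, e.g. "action according to ECAM procedures"
    outcomes: List[str]          # Section E, ECCAIRS-style outcome labels

@dataclass
class ContributoryFactor:        # Section F: optional analysis
    name: str
    is_blocker: bool             # blocker (negative impact) vs. facilitator
    importance: int              # 1 = least important ... 4 = most important

@dataclass
class UniqueReport:              # Section A: a single event or a chain of events
    flight_id: str
    is_chain: bool
    elements: List[EventElement]
    final_outcomes: List[str]
    analysis: List[ContributoryFactor] = field(default_factory=list)
```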
12 Envisaged Use of the Data: Some Examples
The performance management concept is based on the relationship between flight crew task activities in the cockpit (e.g. flight-planning/briefing, ongoing TEM, reporting) and the supporting operational and organizational processes on the ground. Reporting data can be integrated with a range of safety and operational data, and analyzed by Safety/Risk personnel. For example, flight-planning can receive information about problems/threats to be managed in flight-planning activities. Flight crew can receive a threat/risk picture for their specific flight. Current performance management processes within airlines often neglect operational feedback. Within HILAS, it has been stressed that feedback to operational processes is a central part of best practice in performance management activities, which cannot be conceived in isolation from other safety management processes and functions. For example, the provision of TEM information to Flight Crew (e.g. task support) links to broader risk management activities (e.g. report data analyzed as part of reactive safety management and ongoing proactive strategic safety management activities). Therefore, the provision of report feedback and safety case studies relates to broader organizational learning processes. Specifically, the critical organizational areas considered for the possible usage of the data were the following:
(a) Task Support, which involves supporting the safe, competent, effective and timely execution of individual and collaborative work tasks/activities in relation to the achievement of the operational goal.
(b) Tactical Risk Management, which is performed through routine processes such as reporting, investigation, assessment, analysis, recommendations, implementation and monitoring.
(c) Strategic Risk Management, which is performed through strategic policy decisions for the organization, systemic assessment of events, monitoring trends against boundaries, analyzing and prioritizing systemic risks, policy decisions, and organizational change.
(d) Organizational Learning, which entails knowledge acquisition, information distribution, information interpretation, and organizational memory [9].
Table 2 provides a summary of the main high-level functionalities linked to this reporting data, as suggested by flight crew as part of the evaluation of this reporting form.

Table 2. Summary of envisaged use of data collected through the Unique Report
1. Report feature: At the beginning of the report, users select whether they are reporting on a single event or a chain of events.
Related use of data: Strategic Safety Management: the data can provide information on process dependencies for each high-severity reported Threat or Issue, highlighting the need for specific safety barriers and their placement where they might be most effective. Organizational Learning: further reports on specific event chains that result in unacceptable aircraft states can be used for training purposes.
2. Report feature: Users are not required to classify the nature of the event (e.g. event types) or error types.
Related use of data: Legal Requirements: event and error classifications can remain flexible with respect to different accident classification schemes; the ECCAIRS classification was taken as a benchmark. Organizational Learning: the data can be queried according to the specific needs of the company, choosing freely between threat categories, routes, actors involved, and contributory factors examined.
3. Report feature: Users are required to state single events, or the elements of a chain of events, according to the process phases they refer to.
Related use of data: Strategic Safety Management: the data collected about threats, errors, delays, etc. through the form is linked to a map of the relevant airline processes. This feature can build up a living picture of the threats and hazards most recurrent in every process phase. Organizational Learning: the process map can be periodically updated, taking into account suggestions from report users whenever they need to add a process not well depicted in the map.
4. Report feature: The report captures information about contributory factors (e.g. fatigue, poor situation assessment, information availability and so forth) and their estimated relevance with respect to the final outcome.
Related use of data: Tactical Safety Management: the indication of the relevance of contributory factors can be used to structure an index for decisions regarding the allocation of resources to possible corrective actions; this process also requires an estimate of the expected impact of the corrective action (in terms of prevention of the loss associated with the related risk factor) and its foreseeable costs. Strategic Safety Management: feedback on follow-up activities, and on how they took into consideration the notes provided about contributory factors, can be made available to the reporters; this is recognised as a factor stimulating a proactive attitude among front-line staff.
5. Report feature: The report captures information about the application of Standard Operating Procedures (SOPs).
Related use of data: Tactical/Strategic Safety Management: reports can be filtered according to SOP usage and problems; threats and routes can be associated with the frequency of issues highlighted with respect to the use of procedures. Organizational Learning: SOPs can be updated considering suggestions made by crew (via reports).
6. Report feature: The Unique Report allows the association of threats with specific routes and flight phases.
Related use of data: Tactical/Strategic Safety Management: report information, along with information from other data sources, can be used to determine risk ratings for specific types of flights, flight routes, operations, etc. Organizational Learning: reports pertaining to threats that have been successfully managed can be reviewed and ranked in terms of the effectiveness of the actions reported. Task Support: the list of threats that pertain to specific routes can be provided in the intelligent flight plan, together with additional information on how threats have been successfully managed in the past.
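One of the envisaged uses in Table 2 (building a living picture of the threats most recurrent in each process phase) can be illustrated with a short aggregation sketch over report objects like those in the earlier data-model sketch; again, all names are hypothetical:

```python
# Illustrative aggregation of threats per process phase across Unique Reports.
# Assumes report objects shaped like the hypothetical data model sketched earlier.
from collections import Counter, defaultdict

def threats_by_phase(reports):
    """Count reported threat descriptions per process phase across all reports."""
    counts = defaultdict(Counter)
    for report in reports:
        for element in report.elements:
            for threat in element.threats:
                counts[element.process_phase][threat.description] += 1
    return counts

# Example use: counts["climb"].most_common(5) would list the five most
# frequently reported threats during the climb phase.
```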
References 1. ICAO Safety Management Manual Doc 9859 -AN/460 (2006 a) 2. Cahill, J., Losa, G., McDonald, N.: HILAS Flight Operations Research: Development of Risk/Safety Management, Process Improvement and Task Support Tools. In: Harris, D. (ed.) HCII 2007 and EPCE 2007. LNCS (LNAI), vol. 4562, pp. 648–657. Springer, Heidelberg (2007) 3. McDonald, N.: Human Integration in the Lifecycle of Aviation Systems. In: Harris, D. (ed.) HCII 2007 and EPCE 2007. LNCS (LNAI), vol. 4562, pp. 760–769. Springer, Heidelberg (2007) 4. FAA LOSA Advisor Circular, resubmitted to AFS-230, FAA, January 13 (2005) 5. Klinect, J.R., Murray, P., Merrit, A., Helmreich, R.: Line Operations Safety Audit (LOSA): Definition and operating characteristics. In: Proceedings of the Twelfth International Symposium on Aviation Psychology, pp. 663–668. The Ohio state University, Dayton (2003) 6. Helmreich, R.L., Klinect, J.R., Wilhelm, J.A.: Models of Event, error, and response in flight operations. In: Jensen, R.S. (ed.) Proceedings of the Tenth International Symposium on Aviation Psychology, pp. 124–129. The Ohio State University, Columbus (1999) 7. Muller, M., Kuhn, S.: Special issue on Participatory Design. Communications of the ACM 36 (1993) 8. ICAO. ECCAIRS 4.2.6 Data Definition Standard (2006b), http://www.icao.int/anb/aig/Taxonomy/ (accessed August 4, 2008) 9. Huber, G.P.: Organizational Learning: The Contributing Processes and the Literatures. Organization Science 2(1), 88–115 (1991)
Pilot Confidence with ATC Automation Using Cockpit Situation Display Tools in a Distributed Traffic Management Environment
Sarah V. Ligda2, Nancy Johnson2, Joel Lachter2, and Walter W. Johnson1
1 Flightdeck Display Research Laboratory, NASA Ames Research Center
2 San Jose State University, NASA Ames Research Center
Moffett Field, CA 94035, United States of America
{sarah.v.ligda,nancy.h.johnson,joel.b.lachter,walter.w.johnson}@nasa.gov
Abstract. NASA’s Flight Deck Display Research Laboratory recently investigated air traffic automation designed to alleviate groundside workload in high traffic environments. This paper examines the data from post-experiment debriefings. We found that pilots are comfortable reviewing automated conflict resolutions, as well as modifying those resolutions before execution. The pilots were less comfortable with an automated system that had no pilot or controller human-in-the-loop review process. This traffic management concept will not be optimally achieved if pilots do not trust automation without a human review process in every conflict situation. While initial development of these systems should focus on ways to effectively enable such reviews of the automation, confidence can be expected to increase as pilots develop first-hand experience with the system. Keywords: automation, conflict resolution, cockpit display of traffic information, CDTI, cockpit situation display, CSD.
1 Introduction
Air travel's increasing demand requires a change in the role of air traffic controllers (ATC) that reduces their per-flight workload. To accomplish this, part of the responsibility for maintaining safe separation distances between aircraft must be transferred from ATC to the flight deck crew or to automation. NASA's Flight Deck Display Research Laboratory (FDDRL) recently investigated a form of air traffic automation designed to provide automated conflict resolutions. In this study, the FDDRL examined pilots' acceptance of resolutions produced by automation developed in the Aviation Systems Division at NASA Ames Research Center [1], and how these resolutions compare with pilot-created resolutions. The automation system used in the present study is designed so that conflict detection and resolution (CD&R) functions would be performed by some combination
of ground-based automation, air-based automation, and pilots, thereby substantially relieving ATC of that particular function. In this paper, we discuss the results from a debrief administered to pilots after completion of the study. The debrief assessed pilots' confidence in, and the acceptability of, the automated resolutions, the employment of these resolutions, and the flight deck tools used to support the pilots' activities. Other analyses from this study have been presented elsewhere [2][3]. This study also complements a similar study that examined ATC performance in a manual and interactive environment and ATC acceptability of automated resolutions [4].
A critical element of the concept under investigation is that all conflict detection responsibility and most conflict resolution responsibility is transferred from ATC to automation. The CD&R capabilities automatically detect a predicted loss of separation from Ownship (incursion of another aircraft into the Ownship aircraft's protected zone, within 5 nm lateral and 1000 ft vertical) while the other aircraft is 8 or more minutes away from loss of separation. After detection, the CD&R automatically calculates a proposed resolution. Multiple concepts have been proposed for the use of this type of automated resolution system [4]. For example, if calculated on the ground, this resolution could be sent electronically (datalinked) to the aircraft for review and execution. Alternatively, if calculated on the flight deck, a resolution could be datalinked to the ground for approval prior to execution.
The primary flight deck tool for evaluating, modifying, and/or creating resolutions was the three-dimensional Cockpit Situation Display (3D CSD), a tool developed by the FDDRL to support flightdeck traffic awareness and flight path replanning [5]. The 3D CSD contains a conflict resolution tool that provides the pilot an automated resolution for evaluation, and a Route Assessment Tool (RAT) that can be used to modify the automated resolution or create a novel resolution. Additionally, the 3D CSD provides a three-dimensional display of Ownship. Three levels of automation were examined:
• Automated: the automation resolver generated and displayed a conflict resolution on the 3D CSD that the pilot was not allowed to modify, only execute.
• Interactive: the automation resolver generated and displayed a suggested conflict resolution on the 3D CSD, which the pilot could execute or modify manually utilizing the RAT.
• Manual: the automation resolver did not propose a conflict resolution on the 3D CSD, and the pilot needed to generate a solution to resolve the conflict using the RAT alone.
In order to provide conflict detection and flight path information, we assume that future aircraft will have access to near real-time information about surrounding traffic (location, altitude, speed, heading, and flight plan). The present study assumes all aircraft are equipped with a version of Automatic Dependent Surveillance-Broadcast (ADS-B) capable of transmitting this information to a range of 160 nm.
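As a concrete illustration of the protected-zone check described above (separation minima of 5 nm laterally and 1000 ft vertically, with alerts limited to a look-ahead horizon), the following sketch shows one way such a check could be written. It is not the automation used in the study, and the trajectory representation is hypothetical:

```python
# Minimal, illustrative loss-of-separation check (not the study's CD&R automation).
# Trajectories are assumed to be pre-sampled at matching times; names are hypothetical.
from dataclasses import dataclass
from math import hypot

LATERAL_MIN_NM = 5.0      # protected-zone radius, nautical miles
VERTICAL_MIN_FT = 1000.0  # protected-zone half-height, feet

@dataclass
class State:
    t_min: float   # minutes from now
    x_nm: float    # east position, nautical miles
    y_nm: float    # north position, nautical miles
    alt_ft: float  # altitude, feet

def first_loss_of_separation(ownship, intruder):
    """Return the earliest predicted time (minutes) at which both separation
    minima are violated, or None if no conflict is predicted."""
    for own, other in zip(ownship, intruder):
        lateral = hypot(own.x_nm - other.x_nm, own.y_nm - other.y_nm)
        vertical = abs(own.alt_ft - other.alt_ft)
        if lateral < LATERAL_MIN_NM and vertical < VERTICAL_MIN_FT:
            return own.t_min
    return None

def should_alert(ownship, intruder, horizon_min=12.0):
    """Alert only for conflicts predicted within the look-ahead horizon
    (e.g. 12 minutes, as in the study's alerting described in Sect. 2.2)."""
    t_los = first_loss_of_separation(ownship, intruder)
    return t_los is not None and t_los <= horizon_min
```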
2 Methods 2.1 Participants Seventeen air-transport rated (ATP) pilots participated in the study. The average glass-cockpit experience of the pilots was approximately 6000 hours, with a range of
1000 to 13,000+ hours. One pilot was retired (less than 5 years). Two subjects reported having previous FDDRL 3D CSD experience. All participants reported owning personal computers. The study required the pilots to have a sufficiently high level of skill with the CSD tools to be able to easily assess the automated resolutions and generate unique resolutions. Five pilots failed to adequately master the CSD features during training and thus were excluded from data analysis, leaving twelve participants.
2.2 Equipment
Pilots were tested individually in a quiet, dimly-lit room. The testing station consisted of one 30" monitor on which the 3D CSD was displayed [see Fig. 1]. Pilots manipulated the CSD using a standard two-button mouse. The experimenter sat in another room and monitored the pilots using a digital video camera while the pilot's display was recorded using screen-capture software. The CD&R automation was composed of a conflict detection component and a resolution-generation component. The detection and resolution components utilized broadcast intent (flight plans) and deterministic prediction logic to calculate conflicts and resolutions. Since no noise or uncertainty was modeled in this study, the automation provided error-free predictions. However, it only issued alerts for predicted conflicts within 12 minutes of the anticipated loss of separation. Conflict resolution responsibility was always assigned to Ownship (the pilot's aircraft). The automated resolution component was a version of a system developed by Erzberger [1], and resolutions were generated for conflicts identical to those found in the ATC study [4]. In addition to the automated resolution component, the pilots were provided with a Route Assessment Tool (RAT) in some conditions. The RAT allowed pilots to modify their flight plans by inserting/deleting waypoints, stretching lateral routes, or inserting/deleting climb and descent segments. Consistent with the constraints on the automated resolutions, no change in flight plans closer than 90 seconds from Ownship at the start of the trial was permitted. This was required to allow sufficient time for approval of plans by groundside automation. A more complete review of the present study's equipment, training, design and task is presented elsewhere [6].
2.3 Training
Each pilot was briefed on the flight concept before engaging in hands-on training. During the hands-on training phase, each pilot was taught the basic functions of the 3D CSD, the CD&R tool, and the RAT, which was used to manually make flight path modifications. Once trained, the pilots completed the experimental trials.
2.4 Design
The experimental design was a 3 (level of automation: automated [no RAT], interactive [automated suggestion plus RAT], manual [RAT only]) by 2 (time to loss of separation – LOS: near-term [8 minutes], far-term [12 minutes]) by 2 (automated resolution type: vertical, lateral) within-subjects factorial design.
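The factorial structure just described can be made concrete with a small enumeration; the condition labels below are illustrative, and four repetitions of each condition yield the 48 trials per pilot described below:

```python
# Illustrative enumeration of the 3 x 2 x 2 within-subjects design (4 repetitions
# per condition = 48 trials per pilot). Labels are paraphrased from the text.
from itertools import product

automation_levels = ["automated", "interactive", "manual"]
los_times_min     = [8, 12]                  # near-term vs. far-term loss of separation
resolution_types  = ["vertical", "lateral"]  # type of automated resolution
repetitions       = 4

trials = [
    {"automation": a, "time_to_los_min": t, "resolution": r, "rep": k}
    for a, t, r in product(automation_levels, los_times_min, resolution_types)
    for k in range(1, repetitions + 1)
]
assert len(trials) == 48  # 3 * 2 * 2 * 4
```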
Fig. 1. The 3D Cockpit Situation Display presenting a conflict alert
In each scenario, there was three times normal traffic density in a double-sized sector composed of two sectors: ZKC 90 sector from the Kansas City center and ZID 91 sector from the Indianapolis center [4]. Forty-eight trials were presented to each pilot (16 trials with each automation level; within each automation level, there were 4 trials at each combination of time to LOS and automated resolution type). The automated condition was presented in the first block of trials, while the order of the interactive and manual conditions in the last two blocks was counterbalanced across the participants. Ownship was always responsible for resolving the conflict and was the aircraft which the automation selected (or would have selected) to maneuver to avoid the conflict [7]. After each trial, pilots rated the acceptability of the proposed and executed resolutions on the trials, using a five point scale ranging from unacceptable to excellent. The pilots also rated the complexity of the conflict situation, using a five point scale ranging from very simple to very complex, and answered questions relating to their situation awareness [see 6]. 2.5 Task In the automated trials, pilots were required to review and execute the proposed conflict resolution, but could not modify the proposed resolution. In the interactive trials, pilots were required to review the proposed resolution and could (optionally) use the RAT to modify that resolution or create a new resolution before execution. In the manual trials, pilots were required to use the RAT to create their own resolution, then review and execute that resolution. Pilots were required to evaluate the acceptability of the proposed resolutions in the automated and interactive conditions, as well as the acceptability of the manually-created resolutions in the interactive and manual conditions. For all cases, the pilots had up to 90 seconds to view and/or
modify the resolutions. In addition, pilots were asked to verbalize the rationale for their actions as they performed each task. The pilot ratings of the acceptability of the proposed automated resolutions will be analyzed in the following section. 2.6 Debrief After completing the trials, each pilot was debriefed by a researcher. All pilots were given the same series of questions and asked to provide their answers verbally. They were questioned about their acceptance of the concept, resolution strategy, and trust in the automation. The questions asked in this debrief are described along with the responses in the following section.
3 Results and Discussion
3.1 Acceptability of Proposed Conflict Resolutions
In the automated and interactive conditions, pilots evaluated the acceptability of the proposed conflict resolutions generated by automation. Pilots rated the resolutions on a five-point scale:
1) Unacceptable - ATC coordination required. You believe the resolution was unacceptable and would reject it because it compromises the safety of flight or you are unable to comply. ATC coordination is required to find a new resolution.
2) Poor - ATC coordination sought. You believe the resolution is poor and would definitely seek ATC coordination because a new resolution is highly desired.
3) Marginal - ATC coordination probably sought. You believe the resolution is marginal and would probably seek ATC coordination because a better resolution is possible.
4) Good - ATC coordination probably not sought. You believe that the resolution is good, although there might be a better one. You would probably not seek ATC coordination.
5) Excellent - ATC coordination unnecessary. You believe that the resolution is excellent and would not seek ATC coordination.
Of particular interest are the unacceptable, poor, and marginal ratings that the pilots gave to the resolutions generated by the automation. 384 automated resolutions were presented to the 12 pilots in the automated and interactive conditions; of those, 115 resolutions were rated 3 or below (30%). Pilots were asked to provide verbal comments explaining the rationale for their ratings. Pilots' explanations generally fell within (but were not limited to) four categories:
1) Safety (e.g., this resolution was less safe because… or could be safer if…)
2) Efficiency (e.g., this resolution was not efficient or could have been more efficient because…)
3) Comfort/severity of maneuver (e.g., not comfortable making the maneuver required, or the maneuver required could have been less severe if…)
4) Preference (e.g., I would prefer to go vertically or laterally…)
Not all of the verbal comments fit in these four categories. Of the 115 comments that were rated 3 or below, 87 resolution comments fit into these categories. However, because a few resolution comments fit into multiple categories, a total of 117 resolution comments were analyzed. Frequencies and percentages of resolution comments in the four categories are presented in Table 1. Table 1. Frequencies and percentages of resolution comments rated marginal or below
Category            Frequency      Percentage
Safety              26 comments    22%
Efficiency          57 comments    49%
Comfort/Severity    11 comments    9%
Preference          23 comments    20%
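The percentages are computed over the 117 categorized comments: for example, efficiency accounts for 57/117 ≈ 49%, and safety plus comfort/severity together account for (26 + 11)/117 ≈ 31%, the figure discussed below.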
As illustrated in Table 1, 31% of the low acceptability (marginal or lower) ratings related to safety or comfort/severity of maneuver. In the context of all of the comments made in response to the automated resolutions, 7% of the comments were devoted to these two areas (5% safety, 2% comfort/severity). The majority (69%) of the low acceptability comments given to the automated resolutions occurred in the context of pilot preference and efficiency. More specifically, the pilots might have preferred a lateral maneuver rather than a vertical one to conserve fuel, or a maneuver that had less deviation from the original flight path. These findings suggest that future development of these automated systems should focus on methods to provide safer and more efficient proposed maneuvers.
3.2 Acceptability of Concepts
The first series of debrief questions concerned pilot acceptance of several different air traffic management concepts, each providing the cockpit with conflict resolutions. The pilots were questioned on their level of comfort with four resolution concepts that differed in whether the pilot and/or ATC were allowed to review automated resolutions prior to their implementation:
1) An automated system that detects conflicts and generates resolutions which are reviewed by ATC before being datalinked to the aircraft, where the pilot does a final review using the 3D CSD.
2) An automated system that detects conflicts and generates resolutions which are NOT reviewed by ATC before being datalinked to the aircraft, where the pilot does a final review using the 3D CSD.
3) An automated system that detects conflicts and generates resolutions which are reviewed by ATC before being datalinked to the aircraft, where the pilot does a final review based on the datalinked route only (no flightdeck CD&R).
4) An automated system that detects conflicts and generates resolutions which are NOT reviewed by ATC before being datalinked to the aircraft, where the pilot does a final review based on the datalinked route only (no flightdeck CD&R).
Almost all pilots reported being comfortable with the first concept, which allows for prior review of automated resolutions by both pilots and controllers (92% comfortable, 8% somewhat comfortable).
[Figure: mean comfort rating (uncomfortable, somewhat comfortable, fully comfortable) as a function of pilot review (yes/no), with separate lines for controller review (yes/no)]
Fig. 2. Graph of pilot comfort with varying concept levels
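Fig. 2 summarizes the comfort ratings analyzed in the following paragraph. Purely as an illustration of how such a 2 x 2 within-subjects analysis can be run on dummy-coded ratings (0 = not comfortable, 1 = somewhat comfortable, 2 = comfortable), the sketch below uses statsmodels' AnovaRM; the data frame values are invented, not the study's data:

```python
# Hypothetical 2 (Pilot Review) x 2 (Controller Review) repeated-measures ANOVA
# on dummy-coded comfort ratings; the values below are invented for illustration.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

ratings = pd.DataFrame({
    "pilot_id":          [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "pilot_review":      ["yes", "yes", "no", "no"] * 3,
    "controller_review": ["yes", "no", "yes", "no"] * 3,
    "comfort":           [2, 2, 2, 1, 2, 1, 2, 0, 2, 1, 1, 0],
})

aov = AnovaRM(ratings, depvar="comfort", subject="pilot_id",
              within=["pilot_review", "controller_review"])
print(aov.fit())  # F statistics for the two main effects and their interaction
```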
In contrast, most pilots were uncomfortable with the last concept, which permits no prior human review of the automated resolutions (17% comfortable, 25% somewhat comfortable, 57% not comfortable). The other two concepts fell in between (pilot review only: 42% comfortable, 50% somewhat comfortable, 8% not comfortable; ATC review only: 67% comfortable, 25% somewhat comfortable, 8% not comfortable). Comfort ratings were submitted to a 2 (Pilot Review) x 2 (Controller Review) analysis of variance (ANOVA). The comfort ratings were dummy coded as 0, 1 and 2 for not comfortable, somewhat comfortable, and comfortable, respectively. This analysis yielded significant main effects for Pilot Review (F(1,11) = 8.19, p < .05) and Controller Review (F(1,11) = 30.31, p < .05), but no significant interaction (p > .10); see Fig. 2. Finally, the ratings for pilot-only review were compared with those for ATC-only review to see if pilots differentially valued one type of review over the other. Despite the small trend in favor of ATC review, this effect was not significant (p > .10). Thus, based on the ratings, the pilots appear to prefer human review of automated resolutions over not having a human operator in the loop. The pilots' comments also supported this conclusion. During the final debriefing, pilots stated that they were comfortable reviewing the resolutions created by the automation, as well as modifying those resolutions before execution. However, they stated that their comfort level and trust in the automation would be higher if ATC also reviewed the resolution before they considered it. The pilots were somewhat less comfortable with a system that had no human-in-the-loop review of conflict resolutions before they were datalinked to the flight deck, although the pilot was given the option of a final review at all times. This latter finding could be an artifact of the present-day system, in which ATC provides most conflict detection and resolution services for the pilot (the exception being TCAS alerts). Alternatively, pilots might prefer a system that has an additional human review process beyond their own review. Pilots stated that, although they would like to see the concept tested in the field before they were asked to use it, they would most likely accept a system that included automation as long as a human remains actively
involved in the process. As one pilot stated when asked if he felt comfortable with the first concept: "Absolutely! Especially since it's got the human element involved as a backup. Obviously it's an automated system that doesn't see all the human elements. As long as you have a human element in there to watch over it and make sure it doesn't go awry, and doesn't interfere too much that it defeats the purpose, yea, I'm very comfortable with that concept."
3.3 Preferred Resolutions
The pilots were also asked a series of questions regarding the characteristics of both the automated resolutions and the resolutions in their actual flights. None of the pilots had any ATC experience, and most reported that their flight school training (TCAS training in particular) was an important factor when learning how to resolve conflicts in the most efficient and effective manner. Overall, the pilots said that they preferred speed and vector changes (11 out of 12), depending on the situation. Furthermore, they were less enthusiastic about altitude changes. This might be due to a motivation to conserve fuel. When questioned about what factors they considered when making conflict resolutions, the pilots stated that their first concern was the overall safety of their flight, but that they also considered fuel and time to the gate to be major factors. In their initial training for this study, the pilots were told they would be flying at the altitude assigned to them by their flight dispatchers in all the scenarios. Therefore, they may have inferred that an altitude change was suboptimal for fuel. This overriding concern with fuel also appeared in the pilots' comments: once they had satisfied safety concerns (avoiding conflicts), they chose resolutions that would not negatively impact their fuel consumption. The pilots were prepared to execute any resolution that satisfied these goals: "It depends on the situation. Sometimes a vector is better, sometimes altitude is better and sometimes a speed change is better. It depends on terrain, weather, turbulence, fuel, traffic flow. Each situation is independent of the other."
3.4 Time Pressure
In a three-times normal traffic density environment, time is of the essence. The pilots were given 90 seconds to review and accept, or review and modify/create, an acceptable conflict resolution. If they were unsuccessful in completing the process in that time frame, the resolution became null and void and the scenario ended. Nine out of the 12 pilots stated that they wanted more than 90 seconds to resolve some conflicts, especially multiple conflicts – "Yes. Oh, yes. It was a 90 second crunch." However, the pilots felt the allotted time was sufficient in certain cases, usually when they only needed to resolve a single conflict. A majority of the pilots felt that if they were given more time to resolve the conflicts, they would have considered more alternative resolution strategies. "Absolutely. Absolutely! I just went with the first one that worked."
3.5 Flightdeck Display
Pilots were asked a series of questions regarding the 3D CSD display and its suite of tools, in particular: the usefulness of the information displayed, display clutter
management, the display's impact on decision-making, and what additional information might be of assistance to them. Each pilot reported that they used a combination of features and tools that worked best for them: the assigned colors of the aircraft (blue = above, green = below, white = co-altitude), adjusting the range and pulse predictor, aircraft identification tags, distance to airport or top of descent, and others. All the pilots felt the display was cluttered; this was most likely due to the traffic density being set to three-times current day traffic levels. Pilots provided a number of ideas for clutter mitigation in addition to those currently provided by the 3D CSD. The pilots also provided useful ideas for information they would like to see in future 3D CSD applications that is available but was not used in this study (e.g., terrain, weather). Of particular note, most pilots (7 out of 12) suggested that nonconflicting aircraft or aircraft a specified distance from Ownship should be removed from the display – "Remove the aircraft that are nowhere near a possible conflict. Some that could potentially be conflicts, yeah, leave them there." With regard to decision-making information, the pilots were queried on both the provided information that supported their evaluation and additional information that might have been useful. As noted, each pilot modified the tools to best support their own strategy, but most felt they had sufficient information to make an informed resolution decision. When asked about the minimum information needed to make a safe resolution decision, responses varied by pilot. The most common requests were altitude, heading, conflict color change, and predicted future position.
3.6 Trust in Automation
When asked whether pilots would trust automated resolutions compared to the manual resolutions in the field, and what could be provided to increase their level of trust in the automation, all of the pilots involved stated they would trust the automation. The majority of pilots (8 out of 12) would like the automation to be always activated in their flights, but the pilots would make the final decision regarding the suggested resolutions. "Well, the suggested resolution always on doesn't bother me because I can always fly the airplane myself without using the resolution." All of the pilots felt that their conflict resolutions were either "very similar" or "somewhat similar" to the automated resolutions. Also, all the pilots felt that with sufficient training and proper certification they would have increased trust in the presented automation. "Yeah, I think I would. If it were certificated by the airline or by Boeing or whoever, Airbus, and the FAA. If everybody blessed it, why shouldn't I?"
4 Conclusion
The majority of the pilots were naïve to the distributed traffic management concept before their participation in this study (10 out of 12 pilots); therefore, hesitations about utilizing and performing with this new task were expected. Despite this, we found that the pilots' trust and confidence levels in conflict resolution automation were high, provided that there is a human-in-the-loop resolution review process. This can be expected for several reasons. Before pilots can be expected to trust automation without human review, thorough familiarity and support in application is necessary:
pilots need to be confident that all groundside and airside operators are familiar with, and optimally efficient in utilizing, this system. Furthermore, ensuring that pilots have a basic understanding of the primary factors considered by the automation when creating a conflict resolution (such as weight, winds, speed, and altitude) may support an increase in their overall level of confidence in the system. This system is designed to alleviate groundside workload within a high traffic density environment. However, this traffic management concept will not be optimally achieved if pilots do not trust automation without a pilot or controller review process in every conflict situation. Pilots stated that, while initial development of these automated systems should focus on ways to effectively enable this review, trust and confidence will increase with first-hand experience of using the system. They cited their experience with TCAS as an example of a system where such trust evolved over time. Overall, the pilots were optimistic that similar confidence and trust would be achieved with this new automation-assisted system.
References 1. Erzberger, H.: Automated Conflict Resolution for Air Traffic Control. In: Proceedings of the 25th International Congress of the Aeronautical Sciences (ICAS), Germany (2006) 2. Battiste, V., Johnson, W.W., Dao, A.Q., Brandt, S.L., Johnson, N., Granada, S.: Assessment of Flight Crew Acceptance of Automated Resolutions Suggestions and Manual Resolution Tools. In: Proceedings of ICAS 2008, Anchorage, Alaska (2008) 3. Johnson, W.W., Battiste, V., Dao, A.Q., Brandt, S.L., Johnson, N., Granada-Vigil, S.: Pilot Acceptance of Automated Resolution Advisories: Preliminary Evaluation. In: Presentation at Human Factors and NextGen: The Future of Aviation, Arlington, TX (2008) 4. Homola, J.: Analysis of Human and Automated Separation Assurance at Varying Traffic Levels. Master’s Thesis, San Jose State University, San Jose, CA (2008) 5. Granada, S., Dao, A.Q., Wong, D., Johnson, W.W., Battiste, V.: Development and Integration of a Human-Centered Volumetric Cockpit Display for Distributed Air-Ground Operations. In: Proceedings of the 12th International Symposium on Aviation Psychology, Oklahoma City, OK (2005) 6. Dao, A.Q., Brandt, S.L., Battiste, V., Vu, K.-P.L., Strybel, T., Johnson, W.W.: The Impact of Automation Assisted Aircraft Separation on Situation Awareness. Paper presented at the 13th annual International Conference on Human-Computer Interaction, San Diego, CA (2009) 7. Erzberger, H.: Transforming the NAS: The Next Generation Air Traffic Control. In: Proceedings of the 24th International Congress of the Aeronautical Sciences, Yokohama, Japan (2004)
A Study of Auditory Warning Signals for the Design Guidelines of Man-Machine Interfaces
Mie Nakatani (1), Daisuke Suzuki (2), Nobuchika Sakata (2), and Shogo Nishida (2)
(1) Osaka University, 1-16, Machikaneyama, Toyonaka, Osaka, Japan
[email protected]
(2) Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka, Japan
{suzuki,sakata,nishida}@nishilab.sys.es.osaka-u.ac.jp
Abstract. This paper presents an experimental study of the effects of individual sound parameters on human behavior in an emergency. In recent years there have been many natural disasters, and the evacuation procedure is one of the pressing problems. Some audible alerts are entrenched in our daily life, such as police sirens, ambulance sirens, emergency bells, and fire alarms. Most people, however, do not evacuate upon hearing an alert; they suffer from cry-wolf syndrome. Several lines of research have addressed how to get people to take appropriate action: one focuses on education and training, another on improving the evacuation call itself. We have studied auditory warning signals to assist the design of warning systems. This paper focuses on dynamically changing sound parameters according to the degree of danger. First, reference values for the parameters are established through a reaction-time experiment. Then the effect of dynamically changing the parameters is evaluated subjectively.
Keywords: auditory warning signal, response time, pitch, frequency, waveform, psychological experiment.
1 Introduction
Disaster prevention and mitigation are high-priority issues in Japan, where natural disasters are frequent. Immediate evacuation is required to reduce human damage. However, the rate of evacuation among residents is very low: no more than 50%, though it varies from disaster to disaster. The reasons are reported as follows [1]:
− Cry-wolf syndrome. Sometimes a disaster does not occur despite a warning notice. After predictions have been wrong two or three times in a row, people stop believing the warnings.
− Bias of underestimation. People have a tendency to interpret information in the way most convenient for them. They are apt to think that they will not be harmed.
− Lack of knowledge about disaster information and natural phenomena.
Additionally, Katada's survey [2] shows other reasons. Some people could not evacuate because they themselves or their family had some physical constraints or
they did not know the procedures or routes of evacuation. Other people did not evacuate because they could not leave their household goods and the tools of their livelihood. It is not easy to lead people who do not want to evacuate. There are sociological and engineering approaches to evacuation issues. From the perspective of social psychology, researchers have developed education and training methods for disaster prevention and mitigation, and supported the development of community networks for mutual help. From the perspective of engineering, easily comprehensible warning notices, emergency exit signage across multiple media, and accurate, pinpoint hazard assessment have been studied. The purpose of this paper is to provide guidelines for interface design that conveys a sense of crisis, from the perspective of interface engineering. According to previous records, people rarely panic; on the contrary, most people get used to the warning alert. We have started experiments to develop warning calls that give people a moderate level of nervousness. Here we report two experiments on the parameters of auditory warning signals.
2 Related Studies
In this chapter, related studies about auditory warning signals are reviewed from the standpoint of ergonomics and interface engineering. According to Kuwano [3], a warning signal has to meet the following requirements:
1. It has to be easily perceived under any noisy conditions.
2. It has to be easily perceived by every age group, including elderly people with hearing loss.
3. It has to be easily recognized as a warning sign even after being perceived.
4. It has to have universality transcending national boundaries. In other words, it has to be recognized as a warning sign in any country or any language.
First, auditory signals have to be recognized even in noisy surroundings, without listeners paying any particular attention to them. There are many studies on the detectability of sirens [4-6]. Guo [7] conducted experiments on sirens easily heard by hard-of-hearing people and on the distance over which sirens can be heard. Basic sirens used for industrial products are also standardized [8-9]. Second, auditory signals have to make everyone perceive instinctively that something dangerous is happening. Appreciating a signal's meaning raises two problems: culture-specific perception gaps, and the difficulty of distinguishing a critical alert from the many other signals around us. There are many auditory signals in everyday situations: various electronic sounds come from home appliances, announcements in stations and vehicles, vending machines, doorways in town, and so on. Regarding cross-cultural comparison, Kuwano [10] studied the subjective impression of auditory signals in Japan, Germany, and America; some sounds elicited opposite evaluations in different countries. That research suggested that a signal whose frequency is swept from low to high over a wide range gives an impression of danger, and that the impression becomes more dangerous as the off-time between sweeps becomes shorter.
Regarding the various signals in daily life, Yamauchi [11] clarified the factors of sound image common to everyday signals. His research showed that people's images of signals were divided into two directions: one was "warning or notice," the other was "calling or starting." For periodically modulated sounds, when the modulating frequency was from 1.25Hz to 5.0Hz, people recognized the signal as a warning or notice; when the modulating frequency was from 10.0Hz to 50.0Hz, people recognized it as a calling or starting signal. Particular signals, such as workplace signals for heavy equipment, cockpits, or medical devices, are prescribed in specifications like ISO or JIS standards. Beyond the studies mentioned above, there is much other research on the images of signals [12-13]. According to Kuwano [10], the higher the frequency of a signal is, the more dangerous people feel it to be; and the shorter the silences between sweeps are, the more dangerous people feel it to be.
3 Concept and Methodology
This paper deals with the images of warning signals used to control feelings of crisis. As mentioned in Chapter 1, people soon get used to even appropriate signals, so we aim to change the impression a signal makes depending on the situation. For example, a fire alarm is a typical signal that people hear without evacuating. But if the sound changes according to the situation at the time, people will feel that something is different from usual and will try to see what has happened. The changing signal will make people more and more uncomfortable. Although previous works suggest appropriate parameters for warning signals, they do not address the psychological change produced by a changing signal. We examine whether changing the parameters of a signal can control the impression of danger. In the first experiment, we determine reference values for the parameters by measuring response time. Most studies use the Semantic Differential method to evaluate subjective images; however, we require quantitative data, not a qualitative evaluation, to determine the reference values, so we adopt a new evaluation method. Subjects are asked to push a button as soon as they decide to evacuate on hearing a sound. If the signal gives them the urge to run away, the response time will be short. The relation between the sound parameters and the response time can then be analyzed quantitatively. In the second experiment, the signals with combined, changing parameters are evaluated by subjective impression.
4 Experiments
4.1 Experiment I
The goal of this experiment is to clarify the relation between the signal parameters and the response time to push the evacuation button.
Material
This research targets sweep sounds as warning signals. We prepared 80 signals by combining the following four parameters; each signal is five seconds long.
− Wave pattern: sine wave, square wave
− Pitch: high, low [sine wave: -7dB, -13dB; square wave: -10dB, -16dB]
− Modulation period: 1Hz, 2Hz, 4Hz, 8Hz
− Frequency: 160Hz-320Hz, 320Hz-640Hz, 640Hz-1280Hz, 1280Hz-2560Hz, 2560Hz-5120Hz
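As a rough illustration only (not part of the original study), the sketch below shows one way such a swept warning signal could be synthesized. The sampling rate, the linear sweep shape, the interpretation of the modulation value as the repetition rate of the sweep, and the mapping of the pitch levels to relative amplitude are all our assumptions.

import numpy as np

def warning_signal(f_low, f_high, modulation_hz, level_db, wave="sine",
                   duration=5.0, fs=44100):
    """Synthesize one candidate warning signal (illustrative sketch only).

    The frequency sweeps linearly from f_low to f_high, and the sweep is
    repeated modulation_hz times per second (our reading of the paper's
    "modulation period"); level_db sets the relative pitch level, e.g.
    -7 dB or -13 dB for the sine-wave signals.
    """
    t = np.arange(int(duration * fs)) / fs
    cycle = (t * modulation_hz) % 1.0                # position inside the current sweep, 0..1
    inst_freq = f_low + (f_high - f_low) * cycle     # instantaneous frequency in Hz
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs    # integrate frequency to get phase
    carrier = np.sin(phase) if wave == "sine" else np.sign(np.sin(phase))
    return 10 ** (level_db / 20.0) * carrier         # dB level -> linear amplitude

# Example: a high-pitch sine signal, 640-1280 Hz sweep repeated at 2 Hz,
# i.e. the parameter combination later chosen as the reference values.
signal = warning_signal(640, 1280, modulation_hz=2, level_db=-7, wave="sine")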
Procedure
A notebook PC and headphones are used for the experiment (left of Figure 1). The 80 signals are played in random order. Subjects are required to push either the "Escape" or the "Not Escape" button as soon as a signal is played. In addition, they are asked to select one of five descriptions for every signal (right of Figure 1):
− This signal has no punch.
− This signal is appropriate. I can calmly evacuate.
− This signal is too noisy. I cannot calmly evacuate.
− This signal is more like a departure bell than a warning.
− This signal is neither a warning nor a departure bell.
The subjects are sixteen university students and graduate students (13 males and 3 females, aged 22 to 25).
Fig. 1. A scene of the experiment and an example of the display
Result
The average time to evaluate the 80 signals was about ten minutes per subject. The upper limit of the response time to one signal is set to nine seconds. When the response time exceeds nine seconds, or when the "Not Escape" or "Repeat" button is chosen, the response time is recorded as ten seconds. Table 1 shows the map of response times (in seconds); the numeric value in each cell is the median.
Table 1. The response time (median, in seconds) of each signal. Rows: frequency range (Hz); columns: modulation period.

Sine wave (high, -7dB)
Frequency (Hz)     1Hz      2Hz      4Hz      8Hz
160-320           10.00    10.00    10.00    10.00
320-640           10.00    10.00     9.50    10.00
640-1280           4.57     2.73     2.64     3.51
1280-2560         10.00     5.85     3.98     3.26
2560-5120          7.36     3.19     5.31     2.77

Sine wave (low, -13dB)
Frequency (Hz)     1Hz      2Hz      4Hz      8Hz
160-320           10.00    10.00    10.00    10.00
320-640           10.00    10.00    10.00    10.00
640-1280          10.00     3.00     7.73    10.00
1280-2560         10.00     9.50     8.59    10.00
2560-5120         10.00     4.47     6.90     7.55

Square wave (high, -10dB)
Frequency (Hz)     1Hz      2Hz      4Hz      8Hz
160-320            5.49     2.77     2.71     3.13
320-640            3.00     2.77     3.42     2.27
640-1280           3.20     2.05     2.52     1.68
1280-2560          3.63     3.16     2.05     1.63
2560-5120          4.43     3.45     2.70     2.18

Square wave (low, -16dB)
Frequency (Hz)     1Hz      2Hz      4Hz      8Hz
160-320           10.00     9.50     2.71    10.00
320-640            5.01     3.58     2.81     3.13
640-1280           3.91     2.24     2.74     2.89
1280-2560          7.38     3.10     3.56     4.30
2560-5120         10.00     4.73     3.05     4.07

(Legend from the original table: bold-lined cells marked the shortest response times; dotted cells marked signals that most subjects subjectively evaluated as "appropriate".)
Focusing on frequency, the sine-wave signals from 160Hz to 640Hz take long response times; these signals cannot make subjects evacuate. For the 640Hz-1280Hz signals, the time is shorter in all tables. Focusing on modulation period, the 1Hz signals take longer than the others, while the other modulation periods show no particular tendency. Focusing on wave pattern, square waves tend to produce shorter times. Focusing on pitch, high pitch tends to produce shorter times, with a few exceptions. Regarding subjective impressions, the low-frequency signals are evaluated as having "no punch", the high-frequency signals as "too noisy", and the signals of 640Hz-2560Hz tend to be evaluated as "appropriate". On the whole, the results do not show that the higher the frequency or modulation period, the shorter the response time. The signals with 640Hz-2560Hz frequency and 2Hz-4Hz modulation period have the shorter response times, and these signals are also evaluated as subjectively "appropriate" by most subjects. In Table 1, the bold-lined cells marked the shortest response times and the dotted cells marked the best evaluations by subjective impression. Overall, we use the 640Hz-1280Hz frequency and 2Hz modulation period as the reference values of the parameters for the next experiment.
4.2 Experiment II
The goal of this experiment is to ascertain whether changing the parameters makes subjects feel more in danger.
Material
This time we change three parameters in three steps. From the result of the first experiment, the 640Hz-1280Hz frequency and 2Hz modulation period are set as the reference values for the three steps. The parameters are changed as follows, incremented on both sine waves and square waves.
− Pitch: sine wave: -13dB > -10dB > -7dB; square wave: -16dB > -13dB > -10dB
− Modulation period: 1Hz > 2Hz > 4Hz
− Frequency: 320Hz-640Hz > 640Hz-1280Hz > 1280Hz-2560Hz
The signals are combined in three patterns: one parameter changed, two parameters changed, and three parameters changed.
− incremented by one parameter: three signals
− incremented by two parameters at the same time: three signals
− incremented by three parameters at the same time: one signal
These patterns are made for both sine waves and square waves, so the total is 14 signals. Each signal is fifteen seconds long (the parameters are changed every five seconds).
Fig. 2. An example of the display of the second experiment
Procedure
A notebook PC and headphones are also used for this experiment. The 14 samples are played in random order. Subjects are required to evaluate the degree of urgency by sliding a bar while the signal is played (Figure 2). One signal is composed of three 5-second parts, so
they have to evaluate three times for one signal. The subjects are sixteen university students and graduate students (13 males and 3 females, aged 22 to 25).
Result
The average time to evaluate the 14 signals was about five minutes per subject. The slider evaluation score ranges from 0 to 9. In Figure 3, the evaluation score for the reference parameter values is centered to allow comparison of all signals. When one parameter is changed, the degree of urgency increases slightly according to the incremented parameter; among them, frequency is an effective parameter for signaling a dangerous situation. When two parameters are changed at the same time, the degree of urgency increases more. When all three parameters are changed together, the degree of urgency increases considerably (Figure 3).
Fig. 3. The result when one parameter is changed and when three parameters are changed at a time
4.3 Discussion
Previous research says that the higher the frequency of a signal, the more danger people feel. The result of the first experiment, however, suggests that a higher frequency is not always more alarming. Although it is uncertain whether response time is the best evaluation method, the result shows that subjects respond quickly at particular parameter values, which suggests that warning signals have appropriate parameter settings. The result of the second experiment suggests the possibility of controlling the impression made by a warning signal: by combining the number and range of the parameter changes, we should be able to manipulate the urge to escape. Furthermore, dynamically changing the warning signal should prevent people from getting accustomed to the emergency call and keep the signal feeling fresh.
In this study the subjects were only students. We need to run the experiments with subjects in a wider age range and compare results across different cultures.
5 Conclusion
Today, disaster prevention and mitigation are important problems, and a great deal of effort has gone into minimizing damage. Developing appropriate evacuation calls is one of the remaining tasks, because most people do not evacuate despite hearing the call. The auditory properties of warning signals have been studied by many researchers, but most of them use subjective evaluation and do not focus on the temporal changes of signals. In this paper, we adopted response-time measurement for quantitative evaluation and focused on the temporal change of signals. We expect that dynamically changing signals will prevent people from getting accustomed to the emergency call and will keep the call feeling fresh. The results of the two experiments show the possibility of controlling the impression of urgency by manipulating the parameters. As future tasks, we have to run experiments with more subjects and in more countries. Our final goal is to construct design guidelines for the interfaces of emergency information systems.
References 1. Honma, M., Katada, T.: Study on the Relationship between Disaster Advance Information and Resident Evacuation In Tsunami Disaster Prevention. Journal of Disaster Information Studies 6, 61–72 (2008) (in Japanese) 2. Katada, T., Kuwasawa, N., Kanai, M., Kodama, M.: Study of Social technology and Reassurance from Tsunami Disaster. Sociotechnica 2, 191–198 (2004) (in Japanese) 3. Kuwano, S., Namba, S., Shick, A., Höge, H.,, Fastl, H., Fillipou, T., Florentine, T., Moesch, H.: The timbre and annoyance of auditory warning signals in different countries. In: Proceedings of the International Congress on Noise Control Engineering, pp. 3201– 3206 (2000) 4. Edworthy, J., Loxley, S., Dennis, I.: Improving Auditory Warning Design: Relationship between Warning Sound Parameters and Perceived Urgency. Human Factors 33(2), 205– 231 (1991) 5. Stanton, N., Edworthy, J.: Human Factors in Auditory Warnings, Gower Technical, UK (1998) 6. Namba, S., Kuwano, S., Kinoshita, K., Kurakata, K.: Loudness and timbre of broad-band noise mixed with frequency-modulated sounds. Journal of the Acoustical Society of Japan (E) 13(1), 49–58 (1992) 7. Guo, S., Nakazawa, M., Ouchi, Y., Yamasaki, Y.: 1bit Alert Sirens. In: Proceedings of 1bit Fourum 2005, pp. 14–20 (2005) (in Japanese) 8. ISO7731: Ergonomics– Danger signals for public and work areas – Auditory danger signals (2003) 9. ISO7029: Acoustics – Statistical distribution of hearing thresholds as a function of age (2000) 10. Kuwano, S., Namba, S., Shick, A., Höge, H.,, Fastl, H., Fillipou, T., Florentine, T.: Subjective impression of auditory danger signals in different countries. Acoustical Science and Technology 28(5), 360–362 (2007)
11. Yamauchi, K., Takada, M., Iwamiya, S.: Functional imagery and onomatopoeic representation of auditory signals. Acoustical Science and Technology 59(4), 192–202 (2003) (in Japanese) 12. Namba, S., Kuwano, S., Mizunami, T.: Subjective evaluation of synthesized signals. In: Proc. 3rd Jt. Meet. Between ASA and ASJ, pp. 451–454 (1996) 13. Rogers, W.A., Lamson, N., Rousseau, G.K.: Warning Research: An Integrative Perspective. Human Factors 42, 102–139 (2000)
Computer-Aided Collaborative Work into War Rooms: A New Approach of Collaboration Jeremy Ringard, Samuel Degrande, Stéphane Louis-dit-Picard, and Christophe Chaillou ALCOVE, bâtiment IRCICA Parc scientifique de la Haute-borne, 50 av. Halley 59650, Villeneuve d'Ascq, France {Jeremy.Ringard,Samuel.Degrande,Christophe.Chaillou}@lifl.fr, [email protected]
Abstract. This paper presents the realization of a new software and hardware platform for collocated collaborative work. Our objective is to make the most of the various competences of the teammates. We have created an architecture named MVT (model, view, tool) for supporting collaborative interaction in war-room-like environments. This software distribution offers various interaction modalities, allowing multi-skilled teams to collaborate using different input devices, thanks to multiple visualization and interaction channels.
Keywords: collocation, collaboration, CVE, war room, teamwork.
1 Introduction
For several years, technologies that allow efficient real-time collaboration between co-workers separated by geographic distance have been a major problem for the scientific community. Increasing needs from industry regarding project management have led researchers to work on the development of tools allowing several teammates to work together on a common project. Several approaches have been proposed, through technologies like videoconferencing, document-sharing applications, and collaborative virtual environments (CVEs). CVEs represent a major subject of research: they bring several distant users together in a common 2D or 3D environment. These CSCW (computer-supported collaborative work) technologies have mainly focused on distant collaboration, a situation that occurs in co-design activities, but some real use cases of collaboration are still left aside: in a real situation, the co-workers are more likely to be located in the same building. Moreover, when a project requires the intervention of distant users, it is unlikely that the entire team will be geographically dispersed; this calls for group-to-group collaboration rather than peer-to-peer. Consequently, companies are massively adopting another collaboration strategy for teamwork: the "war room" configuration [1][2]. The main advantage of this type of collocated work resides in the permanent and direct contact between the co-workers, which allows them to respond immediately to any unexpected issue.
Despite the massive adoption of this type of configuration by companies, very little scientific research has focused on software support for collocated work such as war rooms, let alone on interaction with virtual models. In this study, we present a new approach to computer-aided collaborative work, through the development of software that resembles a traditional CVE but is optimized for collocated multimodal collaboration. Our proposition is a new software and hardware platform that allows a team of users to interact and communicate in the same way as in a war-room-like environment. The software platform is especially designed for scenarios involving work on virtual models (from architecture to complex mechanical systems). Every user is able to interact differently with the objects thanks to a large range of tools. Before describing our work in detail, we first present the state of the art in collaborative software solutions. The next section describes the concept and the problems addressed by our war room, before detailing the functionality of our prototype. We conclude this paper by presenting the future work that is planned.
2 State of the Art
Among all the technologies developed to assist teamwork on a project through synchronous interaction, tools like CVEs are the most complete. This kind of support allows people to cooperate by bringing them together in a virtual environment (most often 3D), providing them with several tools for interacting with shared virtual objects and with means of distant communication [3][4]. The major benefit brought by CVEs is the possibility of setting up purely virtual teams. However, these technologies turn out not to be well adapted to some use cases: in most real situations, the working group is composed at least partly of collocated people. For these reasons, working methods have been adapted to this constraint through the use of war rooms. Jason Leigh has published comprehensive work on this kind of configuration [5]: working in a war room consists in bringing a whole team of collaborators into a closed space to work on a common project. This project can be a simple brainstorming task, crisis management, or any type of task that requires good reactivity and instant feedback between co-workers. Regarding the equipment, the observation of a "traditional" war room is quite interesting because of the diversity of writing surfaces involved in the room [5]. We can cite, among other things, a surface dedicated to the permanent display of particular information (a corkboard on which the planning is pinned). In the same way, another large-scale surface can be used by several teammates to work together on one subtask (a blackboard). Each surface is associated with one kind of subtask. An interesting fact regarding this diversity of surfaces is the spatiality of the processed data: Leigh et al. [5] highlight the fact that people talking about one element of the project naturally refer to the place where the associated surface is located. Generally, the link between the spatial distribution of data and the sequence of collaboration is strong. Mark et al. [1] mention the fact that every user establishes an interdependencies map that allows them to immediately identify which teammate is
the most likely to perform a subtask, depending on everyone's competences. The efficiency of a collocated team has been highlighted in multiple papers [6][2][7]. In our vision of the war room, we associate this notion of spatial and human distribution of competences with the multiplicity of writing surfaces: we propose to enrich these surfaces by replacing them with more complex interactive devices. As a consequence, a direct link can be established between the devices and the users' competences. The major factor enhancing war room efficiency is a consequence of natural human contact: direct communication, without third-party technologies such as videoconferencing or virtual avatars, improves comprehension thanks to non-verbal communication cues that are not (or not properly) available when working through a virtual environment. Moreover, the spatial proximity of the teammates provides better accessibility and a favorable atmosphere for teamwork. Finally, mutual comprehension between users is enhanced by the informal conversations that can occur between people in the room [8]. A few approaches have tackled the notion of collocated collaboration, especially using multitouch tabletop devices [9][10]. Unfortunately, those studies offer only a single interaction modality, working exclusively on tabletop devices. Streitz et al. developed "i-LAND", a collocated, multi-computer platform for collaboration [11]; however, this solution does not provide multiple heterogeneous displays and tools for interacting with documents. Other papers describe collaboration between a group and a single distant user, or more generally group-to-group collaboration [12][13][14]; however, those approaches do not provide any real interaction activity and focus rather on communication tasks.
3 Computer-Aided War Room: The Concept
Even if the papers described above bring interesting insights for setting up collocated collaboration, none provides a generic tool for virtual interaction in a war-room-like environment. This is precisely what we are developing, through the realization of a computer support adaptable to several scenarios. Our proposition therefore consists of a single room equipped with various workstations and a virtual environment linked to each computer, allowing collaborative interaction. Our interpretation is different from standard CVEs, since the notion of software support is not used in the same way (Fig. 1):
− In CVEs, the user-computer pair is fixed, so this pair is considered as one single entity. As a consequence, in a way, the collaboration does not occur between users but between computers, the PC-to-PC link offering a communication channel (videoconferencing, avatars, telepointers, ...) and an interaction broadcasting channel. CVEs are not adapted for collocated work: grouping several CVE-connected computers in the same room does not fit our purpose, because most CVE applications provide a WYSIWIS ("What you see is what I see") view, relaxed or not. Therefore, the available interactions are the same for each user. This does not fit the fact that each user is specialized for one kind of task. Moreover, the fact that each user
is permanently linked to his workstation lowers their freedom, as well as the cerebral stimulation provided by collocated teamwork.
− In contrast to the configuration discussed above, our approach emphasizes diversity, in both visual representation and interaction. In our proposition, the major collaborative entity is the whole team. The semantic links established between the users and their workstations can be broken and modified as required, depending on the needs of each teammate at every instant. The number of stations is independent of the size of the physical team. The interaction objects, as well as the virtual tools, can be distributed freely among the workstations (including simultaneous display) depending on the desired distribution of activity.
Fig. 1. Left: the traditional CVE concept. Right: the war room application
The development of our war room essentially relies on three elements: first, the heterogeneity of the hardware devices provided to the users; second, interface plasticity; and finally, the spatial localization of the teammates in the room.
3.1 Hardware Heterogeneity
The range of devices available today allows us to benefit from a large variety of interaction modalities and visualization surfaces. It appears natural to take advantage of this diversity by providing each workstation with the most suitable interaction tools and display methods. When collaborating on a large-scale project, professionals with many different skills are likely to be involved. As a result, the users' needs regarding the project's object are not the same, depending on the specialties involved. Let us take a simple example: a vehicle design project. This kind of task brings together several people. An electronics specialist, in charge of the layout of electrical cables inside the vehicle's body, will be more likely to work with detailed blueprints illustrating the inside of the vehicle. At the same time, another specialist (a designer, for instance) working on the same project needs to work directly on the shape of the car body; this subtask may be easier to perform by interacting directly with a 3D view.
Consequently, even when working on the same project, the collaborators' views of it are markedly different. Our choice is to take advantage of hardware diversity to answer these different needs. A multitouch tabletop display, for instance, is more likely to be used for 2D interaction. In the same way, a large-scale stereoscopic screen coupled with a 6-DOF input device is perfectly adapted to interaction in a 3D environment. Mobile terminals like PDAs or smartphones can be useful too, providing tools adapted to the mobility of these devices.
3.2 Plasticity
To extend the use of hardware diversity in the war room, we propose a functionality that can be likened to the notion of interface plasticity [15]. While working in the war room, each change in interaction modality potentially requires the user to move from one computer to another. However, in the case of a short interaction, it could be useful to give the user the ability to dynamically import the tools and/or the objects necessary for this interaction, avoiding a physical move in the room. This proposition entails dynamically adapting the interface and the interaction modalities of the second workstation to a new display/input device that was not originally planned for. We note that the two notions of hardware heterogeneity and of spatial distribution of activity are closely linked.
3.3 Collocation
This third subsection focuses on a specific goal: to take advantage of collocation and the associated phenomena to enhance collaboration. We have to make sure that the spatial distribution (of data and users) and the software tools are pertinent enough to provide optimal conditions. The common denominator among these questions is the users' behavior during collaboration. For instance, regarding non-verbal communication, the fact that the users are brought together in the same room allows them to benefit naturally from gestures, gaze direction, and facial expressions. All these elements carry semantic meaning that allows the teammates to understand each other better. This could significantly enhance the collaboration, especially in our war room, where virtual and real environments coexist to constitute a single mixed environment. A deictic gesture, for instance, which is used to show something to the others, not only applies to the displayed 3D scene but can also refer to the physical space, i.e., the room. Moreover, we have seen in the section above that co-workers establish several dependency maps, covering tasks, specialties, and the localization of data. This phenomenon may be decisive for the efficiency of collaboration. Consequently, the spatial distribution of the physical devices could influence teamwork. In the same way, the distribution of virtual objects across the workstations must remain coherent with the dependency maps. An evaluation should be performed to understand exactly, through several different configurations, which choices are the most judicious and what the potential effects of the distribution on teamwork are.
4 War-Room Software Support Proposal
4.1 Software Design
We have chosen to start from an existing platform to develop the war room software: SPIN|3D [3]. SPIN|3D is a CVE software platform whose major particularity is that every object in the environment is built from a structure close to the well-known Model-View-Controller (MVC) structure, named Model-View-Tool (MVT) (Fig. 2). In the MVT architecture, the tool is the component used to interact with a virtual object by modifying the data associated with this object. The data describing an interaction object is stored in a dedicated component, the "model". The third component, the "view", is a graphical representation of this object for display; the view is a form of translation of the model's data. A single object can have several views.
Fig. 2. An example of distribution of the application
Contrary to the standard SPIN|3D architecture, the presence of every virtual object on every device is not always necessary. This is why we have developed a completely free distribution of these components among the workstations, by adding a network communication layer on the links between the Model, View, and Tool components. The three components can thus be distributed over three different computers acting as a single MVT structure, the distribution being transparent to the platform's high-level layers. In practical terms, this makes it possible to split the tasks and to make them independent of the hardware configuration of the war room. Various scenarios are possible (Fig. 2), such as:
− In a configuration where one device is optimized for 3D interaction while another station offers a better 3D display, the user can use the two computers simultaneously: the first one for the input device (like a remote control), and the second one for visualization.
− If a user is working with a 3D view, he is able to send (or duplicate) his local view to a large-scale display workstation, to make the object viewable by the teammates.
This distributed architecture enables multiple possibilities, which we have illustrated through the realization of a prototype, presented in the next section.
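To make the idea concrete, here is a minimal sketch of the distributed MVT principle. This is not SPIN|3D code: the class names, the message format, the in-process stand-in for the network layer, and the use of Python are all our own assumptions, intended only to show how a Model, a View, and a Tool hosted on different workstations could stay synchronized through messages.

import json
from typing import Callable, Dict, List

class NetworkLink:
    """Stand-in for the network layer added between the MVT components.
    It simply forwards JSON-serializable messages to all registered handlers;
    in a real deployment each handler would live on a different workstation."""
    def __init__(self) -> None:
        self._handlers: List[Callable[[Dict], None]] = []

    def subscribe(self, handler: Callable[[Dict], None]) -> None:
        self._handlers.append(handler)

    def publish(self, message: Dict) -> None:
        payload = json.loads(json.dumps(message))  # mimic network serialization
        for handler in self._handlers:
            handler(payload)

class CarModel:
    """Model: holds the shared state of one virtual object."""
    def __init__(self, link: NetworkLink) -> None:
        self.state = {"body_color": "white", "door_angle": 0.0}
        self.link = link
        link.subscribe(self.on_message)

    def on_message(self, msg: Dict) -> None:
        if msg.get("type") == "set":
            self.state[msg["field"]] = msg["value"]
            self.link.publish({"type": "changed", "state": dict(self.state)})

class View:
    """View: one graphical interpretation of the model; another workstation
    could host a 3D view of the same model at a different level of detail."""
    def __init__(self, link: NetworkLink, name: str) -> None:
        self.name = name
        link.subscribe(self.on_message)

    def on_message(self, msg: Dict) -> None:
        if msg.get("type") == "changed":
            print(f"[{self.name}] redraw with {msg['state']}")

class ColorTool:
    """Tool: interaction component; it never touches a view directly,
    it only asks the model to change."""
    def __init__(self, link: NetworkLink) -> None:
        self.link = link

    def apply_color(self, color: str) -> None:
        self.link.publish({"type": "set", "field": "body_color", "value": color})

link = NetworkLink()
model = CarModel(link)
tabletop_view = View(link, "tabletop blueprint")
wall_view = View(link, "wall 3D display")
ColorTool(link).apply_color("red")  # both views redraw with the new color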
4.2 The Prototype
To illustrate the various features provided by our architecture, we have developed a prototype including three different computers (Fig. 3):
− A 3D-accelerated PC connected to a large-scale projection display. This station is also equipped with a wireless interaction stylus (a prototype input device [16]) that allows 3D manipulation, and an ARTrack infrared tracking system.
− An interactive tabletop, dedicated to 2D interaction through a tactile interface.
− A standard laptop.
The proposed scenario is a car review application, allowing users to perform simple interactions on a virtual vehicle. On the tabletop, a 2D interface is available for interaction with blueprint views. Through this interface, the user can select a color value and drag it onto the car parts. In the same way, the doors can be manipulated with sliders or buttons, which fit better on a 2D interface. Naturally, every interaction performed on this 2D interface is directly reflected in the 3D views. The other workstations are used to provide direct interaction with the car's 3D model, through the manipulation of its different parts. The 3D view is displayed on one station at a time (laptop or 3D-optimized PC); the users are thus able to "teleport" the car's 3D view from one workstation to the other, depending on the computer they want to use for visualization and interaction. As those two computers do not provide the same computing power, the level of detail of the 3D mesh used as a view differs depending on the displaying station. Moreover, to avoid too many moves by the teammates in the room, they are able to teleport their pointer between the computers, allowing them to interact at a distance from the display station, using their local input device. Consequently, the co-workers are able to interact simultaneously on the 3D car model, using a common display surface but their own input devices, and thus different interaction modalities. This demonstrator was presented at the professional exhibition "Laval Virtual 2008" in France and received very positive feedback from the users. A lot
Fig. 3. The prototype
of visitors showed great interest in our distribution of activity, especially the complementarity of the different viewing/input devices. After a brief presentation of the concept and of the presence of a single distributed object, using the installation presented no major issue. The ability to modify a 3D object through the 2D interface was pointed out as interesting by most of the users, and many of them expressed the desire to enable distant collaboration with the room.
5 Future Work
5.1 Mixed Presence
When working on large-scale projects, bringing every teammate to the same geographic point is not always possible. It can therefore be interesting to keep all the advantages provided by CVEs by allowing a third-party user (or team) to join the collocated team without moving. The major issue raised by this feature lies in asymmetry, which arises in two ways. First, human asymmetry, already known as "mixed presence" [17]: groups involved in distant collaboration are confronted with a problematic communication configuration. Second, hardware asymmetry can be problematic: if a distant user wants to join the team, he must be able to connect to the war room from his personal computer without suffering from the lesser interaction capabilities of his devices. These two asymmetries must be evaluated on our prototype to understand how to avoid communication issues between distant co-workers and how to enhance collaboration in such a mixed environment.
5.2 Semantic Data
To formalize the technology presented above, we can say that the software side of the war room focuses on multiple "interaction channels". A channel is defined by a display type (2D, 3D, video, ...) and its associated interaction modalities. Even if it is possible to factorize several interactions between channels to keep them compatible with any interface, some interactions may be available only for one channel, with no equivalent on the others. In a situation where a single object is displayed on two workstations through different channels, every user must be aware of what the others are doing: to maintain social awareness, the interactions (and ideally the whole interaction sequence) must be displayed on every station, even if the channels are not compatible. The solution we are currently working on consists in changing the semantic level of the interactions. In an improved-MVT structure as described above, most of the data stored in the model are geometric or visual data; by raising the level of these data to real semantic information, the graphical interpretations can adapt to the corresponding display channel. The next step in this work is to create a precise architecture introducing semantic abstraction of the MVT model: this solution requires the introduction of software adapters to convert the semantic data into geometric data. In this way, the structure would approach some known models such as PAC (presentation, abstraction, control),
that can be used for semantic abstraction of data. The main difference between improved-MVT and these models lies in the absence of a generic “tool”-like component in PAC.
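A small sketch of this idea follows; it is only an illustration under our own assumptions (the event type, the adapter names, and the use of Python are not taken from the SPIN|3D platform). The point is that the model would store what happened in semantic terms, and a per-channel adapter would turn it into whatever geometry or widgets that channel can display.

from dataclasses import dataclass

@dataclass
class DoorOpened:
    """Semantic model datum: describes what happened, not how to draw it."""
    door: str         # e.g. "front-left"
    fraction: float   # 0.0 = closed, 1.0 = fully open

class Blueprint2DAdapter:
    """Converts the semantic datum into geometry for a 2D blueprint channel."""
    def interpret(self, event: DoorOpened) -> dict:
        return {"widget": f"door-slider/{event.door}", "value": event.fraction}

class Scene3DAdapter:
    """Converts the same semantic datum into geometry for a 3D scene channel."""
    MAX_ANGLE_DEG = 70.0
    def interpret(self, event: DoorOpened) -> dict:
        return {"node": f"car/{event.door}", "rotate_y_deg": event.fraction * self.MAX_ANGLE_DEG}

event = DoorOpened(door="front-left", fraction=0.5)
for adapter in (Blueprint2DAdapter(), Scene3DAdapter()):
    print(adapter.interpret(event))  # each channel renders the same interaction in its own way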
6 Conclusion
The proposition we have presented in this paper allows multiple users to collaborate in a collocated environment while making the most of the available hardware diversity. The distribution of MVT components among the workstations of the war room gives users great flexibility, allowing them to be fully independent of their position in space, of the display surfaces, and of the input devices. This independence makes it possible to take the teammates' specialties into account by providing them with the interaction modalities and display methods best suited to the subtask they want to perform. Moreover, this solution offers very good inter-user communication, thanks to the direct contact allowed by collocation. The spatial distribution of the activity, as well as the possibility of inviting a distant user to virtually join the collocated team, raises questions about the ability to collaborate in such an environment; an evaluation has to be performed to answer them. In the same way, developing the ability to set up group-to-group collaboration using war rooms, taking hardware and human heterogeneity constraints into account, represents a further open question.
Acknowledgments. We would like to thank Dominique Pavy and Arnaud Bouget from Orange Labs for their participation in realizing the prototype. This work is funded by the ANR Part@ge project (06 TLOG 031) and by the IRCICA research federation of the CNRS.
References 1. Mark, G.: Extreme collaboration. Commun. ACM 45(6), 89–93 (2002) 2. Teasley, S., Covi, L., Krishnan, M.S., Olson, J.S.: How does radical collocation help a team succeed? In: CSCW 2000: Proceedings of the 2000 ACM conference on Computer supported cooperative work, pp. 339–346. ACM, New York (2000) 3. Picard, S.L.D., Degrande, S., Gransart, C.: A corba based platform as communication support for synchronous collaborative virtual environment. In: M3W: Proceedings of the 2001 international workshop on Multimedia middleware, pp. 56–59. ACM, New York (2001) 4. Margery, D., Arnaldi, B., Chauffaut, A., Donikian, S., Duval, T.: Multi-threaded or modular animation and simulation kernel or kit: a general introduction. In: Virtual Reality International Conference, pp. 101–110 (2002) 5. Leigh, J., Johnson, A., Park, K., Singh, R., Chowdhry, V.: Amplified collaboration environments. In: VizGrid Symposium (2002) 6. Olson, J.S., Covi, L., Rocco, E., Miller, W.J., Allie, P.: A room of your own: what would it take to help remote groups work as well as collocated groups? In: CHI 1998: CHI 98 conference summary on Human factors in computing systems, pp. 279–280. ACM, New York (1998)
7. Teasley, S.D., Covi, L.A., Krishnan, M.S., Olson, J.S.: Rapid software development through team collocation. IEEE Trans. Softw. Eng. 28(7), 671–683 (2002) 8. Bos, N., Olson, J., Nan, N., Shami, N.S., Hoch, S., Johnston, E.: Collocation bindness in partially distributed groups: is there a downside to being collocated? In: CHI 2006: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 1313–1321. ACM, New York (2006) 9. Nacenta, M.A., Pinelle, D., Stuckel, D., Gutwin, C.: The effects of interaction technique on coordination in tabletop groupware. In: GI 2007: Proceedings of Graphics Interface 2007, pp. 191–198. ACM, New York (2007) 10. Tang, A., Tory, M., Po, B., Neumann, P., Carpendale, S.: Collaborative coupling over tabletop displays. In: CHI 2006: Proceedings of the SIGCHI conference on Human Factors in computing systems, pp. 1181–1190. ACM, New York (2006) 11. Streitz, N.A., Geißler, J., Holmer, T., Konomi, S., Müller-Tomfelde, C., Reischl, W., Rexroth, P., Seitz, P., Steinmetz, R.: 1999. i-LAND: an interactive landscape for creativity and innovation. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: the CHI Is the Limit, Pittsburgh, Pennsylvania, United States, May 15 - 20, 1999. ACM, New York (1999) 12. Bezerianos, A., McEwan, G.: Presence disparity in mixed presence collaboration. In: CHI 2008: CHI 2008 extended abstracts on Human factors in computing systems, pp. 3285– 3290. ACM, New York (2008) 13. Mcewan, G., Rittenbruch, M., Mansfield, T.: Understanding awareness in mixed presence collaboration. In: OZCHI 2007, pp. 171–174. ACM, New York (2007) 14. Mark, G., Abrams, S., Nassif, N.: Group-to-group distance collaboration: examining the ”space between”. In: ECSCW 2003: Proceedings of the eighth conference on European Conference on Computer Supported Cooperative Work, Norwell, MA, USA, pp. 99–118. Kluwer Academic Publishers, Dordrecht (2003) 15. Demeure, A., Calvary, G.: Plasticity of user interfaces: towards an evolution model based on conceptual graphs. In: IHM 2003, pp. 80–87. ACM, New York (2003) 16. Ecole Supérieure d’Informatique, Electronique, Automatique. ESIEA, http://www.esiea.fr 17. Tang, A., Boyle, M., Greenberg, S.: Display and presence disparity in mixed presence groupware. In: AUIC 2004: Proceedings of the fifth conference on Australasian user interface, Darlinghurst, Australia, pp. 73–82. Australian Computer Society, Inc. (2004)
Optimizing Online Situation Awareness Probes in Air Traffic Management Tasks Thomas Z. Strybel, Katsumi Minakata, Jimmy Nguyen, Russell Pierce, and Kim-Phuong L. Vu California State University Long Beach, Center for the Study of Advanced Aeronautics Technologies 1250 N Bellflower Blvd. Long Beach, CA 90840, USA [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. We examined the effectiveness of situation awareness probe questions in predicting sector performance and behavior in a human-in-the-loop air traffic management (ATM) simulation with low (50%) and high (75%) traffic densities. Probes were presented online during performance of the air traffic management task, and accuracy and response latencies were measured. Hierarchical linear modeling was used to analyze the predictive power of each probe category. Response latencies for conflict probe questions predicted performance metrics associated with separation assurance.
Keywords: situation awareness measurement, air traffic management, NextGen.
1 Introduction The operators most affected by the Next Generation Airspace Transportation System (NextGen) will be pilots and air traffic controllers (ATCs). Pilots operating in NextGen environments may assume expanded responsibility for flight planning and separation. ATCs will use tools that enable them to safely and effectively share responsibility for separation assurance with aircrews and automation, while at the same time remaining centrally involved in managing aspects of new air-traffic-management (ATM) concepts. Presently, the impacts of these NextGen ATM concepts and technologies are unknown, yet success in meeting NextGen objectives depends on optimized function allocation between pilots, ATCs, and automated tools. Effective function allocation requires measures of operator situation awareness (SA), workload, and performance that can assess the impact of changing task demands. Unfortunately, reliable, valid, and robust measures are presently unavailable [1]. SA can be defined either as the processes used to develop and maintain awareness [2] or as the information that determines the state of awareness [3]. A precise definition of the construct is still being debated, and consequently there are no robust measures of it. SA measures usually fall into one of three types: subjective, performance-based, and probe. Probe measures query the operator about awareness of information. Two probe methods are commonly used: Situation Awareness Global
Assessment Technique (SAGAT) [2], and Situation Present Assessment Method (SPAM) [3]. With SAGAT, an “offline” probe technique, the simulation is frozen, the operator’s displays are blanked, and the operator is queried about information in the simulation. SPAM is an online technique in which probe questions are administered to operators individually during a scenario [3]. Durso et al. [3] showed that SPAM reaction times predicted novice ATC performance after variance due to individual differences in cognitive skills was removed. SPAM reaction times have been shown to be related to measures of ATC [3, 4] and pilot [5] performance. However, some investigations have found that online probing reduces performance and increases workload [4, 5]. One limitation of online probes is that a standard method for developing probe questions is nonexistent. For offline probes, a Goal Based Task Analysis Technique is recommended, but this technique is time consuming and focuses on information requirements without assessing either priorities or understanding of the task. Online probe questions are usually developed with subject matter experts, but information is needed on what (i.e., information content) to query and how (i.e., question format) to query in order for the technique to be useful in comparing NextGen concepts. In our previous investigations, probe questions addressed SA process (recall and comprehension) and time frame (past, present and future), but the content of information probed was not systematically manipulated [4, 5]. The present study examined the relative effectiveness of questions, based on types of processing, time frame, and information content, for predicting ATC performance variables. These categories were investigated in an ATM simulation in which ATCs managed traffic while responding to online probe questions.
2 Method 2.1 Participants Seven students enrolled in the Aviation Sciences Program at Mount San Antonio College and nine retired air traffic controllers (6 TRACON and 3 ARTCC) participated in the simulation. For more information regarding participant background, see Vu et al. [6]. Each participant ran in six test scenarios, with the order of scenario presentation counterbalanced between participants. 2.2 Apparatus The simulation was run using the Multi Aircraft Control System (MACS) developed in the Airspace Operations Lab at NASA Ames Research Center. MACS is a medium-fidelity environment for simulating both ground- and air-side operations [7]. Each participant's ATC station was a simulated DSR display of combined sectors ZID 91 and 81. Simulated datalink and conflict probe tools were unavailable for ATC-pilot communications and conflict probing, although a simulated datalink window located outside of the DSR screen was used for online probing. Participant ATCs communicated with pseudopilots located in an adjacent room via VoiceIP software [8]. Six 40-minute scenarios were created, three approximating current-day low (50%) traffic density and three approximating high (75%) traffic density. An automated ghost controller station
managed all traffic outside the participants' sectors and initiated handoffs to the participant ATCs 15 nm outside the sector boundaries. Participant ATCs, when appropriate, initiated aircraft handoffs to the ghost controller, which were automatically accepted after 30 seconds. 2.3 Procedure Twelve probe questions were developed for each scenario. These were administered at three-minute intervals beginning four minutes into the scenario. Probes were presented to participants in a datalink window located on the right side of the DSR display at roughly eye level. Participant responses were made with a CH Products Multifunction Panel that allows keys to be arranged in any order. Each key was programmed with a macro consisting of key presses, mouse movements, and clicks to send a coded message from the probe display. Probe queries were administered by an experimenter located in an adjacent room. A probe sequence began with a "Ready Question" message sent to the participant's datalink window, accompanied by an audio alert. When the participant had sufficient time to take a question, he or she pressed the Ready button, sending an affirmative message back to the experimenter. The experimenter immediately sent the probe question, and the participant responded by selecting one of the six buttons located on the bottom of the response panel. If the Ready prompt was not acknowledged after two minutes, the query was withdrawn and the next probe was sent one minute later. Queries were developed with subject matter experts who were familiar with the scenarios. The individual questions fit into one of three information processing categories (search/recall, comprehension, and subjective assessment) and two time frames (immediate past/present and future). Examples of questions fitting each combination of processing and time frame are presented in Table 1. Search/recall questions (e.g., Questions 1 and 2 in Table 1) could be answered by retrieving information from memory or finding information on the ATC display; no other processing was required to respond correctly. Comprehension probes (Questions 3 and 4 in Table 1) were used to assess the operator's understanding of the situation. Correct answers to these queries required the operator to retrieve information from memory or the display and process it. Subjective rating questions (Questions 5 and 6 in Table 1) were questions in which the participant provided an assessment of either the likelihood of an event or the severity of a conflict. For each processing category, the probe question was directed at either the immediate past or present state of events, or required projection into the future. In addition to the processing/time frame categorization, the content of probe questions addressed three areas of ATC task knowledge: Sector Status, Commands and Communications, or Conflicts (see Table 1). Questions on sector status requested information regarding the current sector state, such as the number of aircraft, number of departures, or distance to a boundary. Command/Communication questions probed ATCs' knowledge of the next likely command to be issued, the last command issued, handoffs, and communication errors. Conflict questions probed knowledge of current and future conflicts between an aircraft pair. In addition to the information contained in the question, the format of the questions was categorized as multiple choice, yes-no, or rating.
Table 1. Examples of probe queries and their classification by processing (REC: Recall, CMP: Comprehension, SB: Subjective Assessment), time frame (IP: Immediate Past/Present, F: Future), information content (SEC: Sector Status, COM: Command/Communication, CNF: Conflict), and question format (MC: Multiple Choice, OT: Other).

Sample Question                                                       | Processing | Time Frame | Content | Format
1. How many AC are in descent to SDF NOW?                             | REC        | IP         | SEC     | MC
2. Will FDX32 be the next overflight to exit your sector?             | REC        | F          | SEC     | OT
3. How many pilot read back errors in the last 5 min.?                | CMP        | IP         | COM     | MC
4. How many conflicts will ASQ381 have if you take no further action? | CMP        | F          | CNF     | MC
5. Rate concern about SWA2898 and AWE989.                             | SB         | IP         | CNF     | OT
6. Rate likelihood you will vector EGF494 for traffic.                | SB         | F          | COM     | OT
Multiple-choice questions were answered by selecting one of six alternatives, usually representing a quantity. For example, a query "How many aircraft ..." was answered by selecting one of six response buttons labeled 0 through 4 and 5+. Yes-no questions required an agreement/disagreement answer, and ratings were made on a six-item scale corresponding to the six response buttons, with the left-most button labeled "very low/very unlikely" and the right-most button "very high/very likely." Thirty-seven probes were in multiple-choice format, 17 yes-no, and 18 rating. For subsequent data analysis, yes-no and rating questions were combined into an "Other" category. Unfortunately, the number of questions addressing each content area, processing category/time frame, and format combination was not equivalent. Therefore, each category was analyzed separately, and the interpretation of our results is limited to the effects of each probe category. Participants' responses to probe questions were time stamped and saved in MACS data files. The correct answers for each scenario and participant were obtained by reviewing scenario video and audio recordings and MACS data files. The mean percent correct for probes was determined and averaged based on processing category, time frame, information content, and question format. Response latencies for correct and incorrect answers were also determined as a function of each category. These were analyzed as a function of participant group and traffic density. The results of participant experience are reported elsewhere in this volume [6]. The following ATC performance variables were analyzed:
• Mean Handoff Time: the average time per aircraft between accepting a handoff and handing it off to the next sector.
• Handoff Time Standard Deviation: the standard deviation of handoff times for each participant and sector.
• Mean Sector Time: the average travel time through the sector per aircraft.
• Sector Time Standard Deviation: the standard deviation of sector times in a scenario.
• Number of LOS: the total number of losses of separation (LOS) per scenario.
• Average Vertical Distance: the average vertical distance between each aircraft pair.
From the voice transcripts we obtained measures of participant behaviors:
• Percentage of altitude, heading, and speed changes: the relative number of changes made to aircraft in terms of altitude, heading, and speed.
• Number of Traffic Advisories: the number of messages that pointed out nearby traffic.
• Number of Corrections: the number of times a correction to an instruction was issued.
• Total Number of Communications: the number of voice messages sent by the ATC participant.
We examined the effects of probe categories on accuracy and latency, and the effectiveness of each probe measure in predicting ATM performance and behavior (a computational sketch of the sector measures is given below).
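The following minimal sketch is not the authors' analysis code; it only illustrates how such sector measures could be computed from per-aircraft records. The field names (handoff_accept, handoff_initiate, sector_entry, sector_exit) are hypothetical stand-ins for whatever the MACS data files actually contain.

    from statistics import mean, stdev

    def sector_metrics(aircraft):
        # aircraft: list of dicts, one per aircraft, with hypothetical timestamps
        # (in seconds) for handoff acceptance/initiation and sector entry/exit.
        handoff = [a["handoff_initiate"] - a["handoff_accept"] for a in aircraft]
        sector = [a["sector_exit"] - a["sector_entry"] for a in aircraft]
        return {
            "mean_handoff_time": mean(handoff),
            "handoff_time_sd": stdev(handoff),
            "mean_sector_time": mean(sector),
            "sector_time_sd": stdev(sector),
            "n_aircraft": len(aircraft),
        }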
3 Results 3.1 Probe Performance The percentage of correct responses and the response latencies were analyzed with separate mixed ANOVAs with factors of experience, traffic density, processing category, and time frame. A significant interaction of time frame and processing category was obtained for accuracy, F(2,28) = 10.91, p < .001. For the immediate time frame, participants showed the most agreement with a subject matter expert on the subjective assessment probes (see Table 2). For the future time frame, accuracy was higher for recall and subjective assessment probes than for comprehension probes. Significant effects of processing category, time frame, and their interaction, F(2,28) = 21.08, p < .001, were obtained on response latency. Latencies were lowest for comprehension probes and immediate probes, but latencies were more equivalent across processing categories for the future time frame. There is little evidence for a speed-accuracy tradeoff here, because for past/present probes the mean recall latency (13 s) was higher than the mean comprehension latency (9.1 s), yet accuracy was equivalent (56% vs. 58%). Mixed ANOVAs also evaluated probes based on information content and question format (see Table 2). A main effect of format, F(1,14) = 49.1, p < .001, and a marginally significant interaction between format and information content, F(2,28) = 2.81, p = .07, were obtained on the percentage of correct responses. As expected, accuracy was significantly lower for multiple-choice questions (M = 55%) than for the other (yes/no or rating) questions (M = 80%). Accuracy was similar among the information content categories for multiple-choice questions, but was highest for questions that probed sector status in the other format.
Table 2. Probe Accuracy and Latency for Each Probe Category

Processing    | Past/Present PC | Past/Present RT | Future PC | Future RT
Recall        | 56%             | 13 s            | 78%       | 16 s
Comprehension | 58%             | 9 s             | 58%       | 12 s
Subjective    | 89%             | 16 s            | 75%       | 15 s

Information   | Mult. Choice PC | Mult. Choice RT | Other PC  | Other RT
Sector Status | 51%             | 12 s            | 86%       | 16 s
Command       | 56%             | 11 s            | 76%       | 12 s
Conflict      | 58%             | 14 s            | 78%       | 16 s
There were significant effects of information content, F(2,28) = 24.49, p < .001, and format, F(1,14) = 44.37, p < .001, and a marginally significant interaction between them, F(2,28) = 2.21, p = .08, on response latencies. Response latencies for multiple choice questions were equivalent, but latencies for the other format were 4 s faster for command probes. 3.2 Performance Measures Table 3 compares the means of sector performance measures for low and high density scenarios. Although average handoff time per AC was not significant, the standard deviation of the handoff times was marginally significant. Greater variability for handoff times was shown in high density scenarios. The time through the sector was higher and more variable with higher traffic density. The average vertical distance between aircraft was higher in high density scenarios, but this difference only approached significance.

Table 3. Significant Effects of Traffic Density on ATC Performance Measures

Performance Measure  | Low Density Mean | Low Density SE | High Density Mean | High Density SE | p
Handoff Time Std Dev | 145.0 s          | 8.0 s          | 170.2 s           | 9.7 s           | .06
Sector Time          | 709.0 s          | 2.7 s          | 742.2 s           | 3.7 s           | <.001
Sector Time Std Dev  | 191.0 s          | 4.8 s          | 195.0 s           | 3.3 s           | <.001
Table 4 summarizes the participant actions that were significantly affected by traffic density. The percentage of altitude and heading changes increased with density, while the percentage of speed changes decreased. Participants also issued significantly more traffic advisories in high density scenarios.
Table 4. ATC Behavioral Measures for Low and High Traffic Densities

Measure            | Low Density Mean | Low Density SE | High Density Mean | High Density SE | p
Altitude %         | 71%              | 3%             | 75%               | 2%              | .08
Heading %          | 19%              | 3%             | 27%               | 2%              | <.01
Speed %            | 6%               | 1%             | 2%                | 1%              | <.01
Traffic Advisories | 2.7              | .4             | 4.3               | .7              | <.02
Some of these behavioral measures were significantly correlated with sector performance metrics. LOS was negatively correlated with number of traffic advisories, r(89) = -.33, p < .001, and positively correlated with number of corrections, r(89) = .22, p = .04. In effect, greater numbers of LOS were associated with fewer traffic advisories and more corrections. Variability in handoff times was positively correlated with percentage of heading changes, r(89) = .29, p < .001, and negatively correlated with percentage of altitude changes, r(89) = -.24, p = .02, and number of corrections, r(89) = -.28, p < .001. Greater variability in handoff times was therefore associated with more heading changes, fewer altitude changes, and fewer corrections. 3.3 Predicting Performance and Behavioral Measures from SPAM Probe Latencies Hierarchical Linear Modeling (HLM) was used to determine the effectiveness of online probes in predicting performance and ATC behaviors. There are several advantages to this approach. HLM can be applied to unbalanced data, requires fewer assumptions about variance-covariance matrices, and, with centered variables, partitions variance into between-subject and within-subject components. Therefore, we evaluated the extent to which probe latencies predicted differences in performance between participants and differences within participants across the scenarios [9]. Probe latencies for each category were evaluated separately because of the unbalanced design. All response latencies were normalized by inverse transformations. For each predicted measure, an unconditional model having no predictors was developed. This model estimates two variance components, representing unexplained between-subject and within-subject variance. From these models, we determined that the relative proportion of between- and within-subject variance depended on the specific measure. For example, 71% of the total variance in handoff times was due to differences between subjects, but only 21% of the total variance in LOS was between subjects. After the unconditional model was created, separate HLMs were created with two response latency predictors formed by centering the predictor variables: a between-subjects predictor, computed as the difference of each participant's mean probe latency from the grand mean, and a within-subjects component, computed as the difference of each probe latency from the participant's mean latency.
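As a sketch of this centering scheme (illustrative only: the column names are assumptions, and statsmodels' MixedLM stands in for whatever HLM software was actually used):

    import statsmodels.formula.api as smf

    def fit_probe_model(df, outcome="handoff_sd"):
        # df: a pandas DataFrame with one row per answered probe, with hypothetical
        # columns 'participant', 'probe_rt' (s), and a scenario-level outcome column.
        df = df.copy()
        df["inv_rt"] = 1.0 / df["probe_rt"]               # inverse-transformed latency
        grand_mean = df["inv_rt"].mean()
        person_mean = df.groupby("participant")["inv_rt"].transform("mean")
        df["rt_between"] = person_mean - grand_mean       # between-subjects predictor
        df["rt_within"] = df["inv_rt"] - person_mean      # within-subjects predictor
        model = smf.mixedlm(f"{outcome} ~ rt_between + rt_within",
                            data=df, groups=df["participant"])
        return model.fit()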
Table 5. Summary of HLM analysis of probe predictors (within-subject and between-subject probe latency predictors; see text)

Measure            | Probe Predictor | Slope | p   | Variance Reduction
Handoff Std Dev.   | Future          | .007  | .04 | 6%
LOS                | Conflict        | -.04  | .02 | 2%
LOS                | Multiple Choice | -.03  | .02 | 2%
LOS                | Future          | -.05  | .01 | 2%
Ave Vert Distance  | Conflict        | -24.6 | .03 | 9%
Ave Vert Distance  | Multiple Choice | -22.3 | .05 | 15%
Altitude Change %  | Command         | -.004 | .01 | 5%
Speed Change %     | Subjective      | .01   | .12 | 1%
Speed Change %     | Future          | .008  | .01 | 7%
Traffic Advisories | Conflict        | .11   | .05 | 10%
A summary of measures having significant predictors is shown in Table 5. Latencies of online future probes predicted handoff variability, reducing within-subject variance by 6%. Because of the inverse transformations, a positive slope means that the response latencies were inversely related to handoff time and its standard deviation: longer response times predicted lower standard deviations. Several probe categories significantly or marginally predicted LOS: sector status, conflict, multiple choice, and future time frame, each reducing within-subject variability by 2%. For each of these probe categories, the negative slope meant that faster response times were associated with fewer LOS. The average vertical distance was predicted by conflict, multiple-choice format, and comprehension probe latencies, with faster response latencies predicting less distance between aircraft. For the behavioral measures, the proportions of altitude and heading changes were significantly predicted by command probe latencies: longer response latencies predicted a higher proportion of altitude changes and a lower proportion of heading vectors. The proportion of speed changes was predicted by subjective probes and future-time-frame probes, but these predictors reduced variance between participants. Traffic advisories were predicted by conflict probes; longer latencies predicted fewer traffic advisories.
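A quick numeric illustration of the sign logic above (the values are arbitrary): because each latency enters the model as 1/RT, a positive slope implies that longer raw latencies contribute less to the predicted value and therefore predict lower outcomes, while a negative slope implies the reverse.

    slope = 0.007                      # e.g. the future-probe slope for handoff SD
    for rt in (5.0, 10.0, 20.0):       # probe latencies in seconds
        print(rt, slope * (1.0 / rt))  # the contribution shrinks as latency grows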
4 Discussion This preliminary investigation of the efficacy of online situation awareness probes in predicting ATC behavior and performance suggests that the technique has merit and
may be used to predict SA and changes in SA when NextGen ATM concepts and automation tools are introduced. Online probe latencies for probes related to the ATC's awareness of conflicts significantly predicted the number of LOS; longer probe latencies for these questions were associated with greater numbers of LOS. The significant slope obtained with HLM was for intrasubject differences in probe latencies, suggesting that the probes are measuring changes within the operator over scenarios. Moreover, conflict probe latencies significantly predicted the number of traffic advisories and the number of corrections issued by ATC participants. This is not surprising when one considers that traffic advisories and corrections are correlated with LOS: when ATC participants issued the most traffic advisories and the fewest corrections, there were fewer LOS. Note also that sector status probes were significant predictors of LOS; possibly another component of SA is involved in LOS that does not determine the number of traffic advisories. Similarly, command probe latencies were significant predictors of the percentage of altitude and heading changes: longer probe latencies predicted a higher proportion of altitude changes and a lower proportion of heading changes. Online probe latencies have previously been shown to predict pilot error and novice ATC violations [4, 5]. Note that most significant predictor categories were based on information content. Very few significant predictors based on processing were found, and when these were significant, they were for between-subject differences. For example, the proportion of speed changes was predicted by subjective probes and future-time-frame probes. However, these slopes were for probe latencies averaged for each participant and centered on the grand mean. Possibly, these categories assess individual differences in ATC behavior, related to the cognitive skills identified by Durso et al. [3] as predicting performance. Caution must be taken, however, as the interdependence of categories makes definite statements difficult. Nevertheless, we believe these findings indicate that online probing as a method of measuring SA is promising. Acknowledgements. This simulation was partially supported by NASA cooperative agreement NNA06CN30A.
References 1. Rantanen, E.: Development and Validation of Objective Performance and Workload Measures in Air Traffic Control. Tech. Report AHFS-04019/FAA-04-07. Univ. of Illinois, IL (2004) 2. Endsley, M.R.: Measurement of situation awareness in dynamic systems. Human Factors 37(1), 65–84 (1995) 3. Durso, F.T., Bleckley, M.K., Dattel, A.R.: Does situation awareness add to the validity of cognitive tests? Human Factors, 721–733 (2006) 4. Pierce, R.S., Strybel, T.Z., Vu, K.-P.L.: Measuring situation awareness and its contribution to performance in air traffic control tasks. In: Proceedings of the 26th International Congress of the Aeronautical Sciences, Anchorage AK (2008) 5. Strybel, T.Z., Vu, K.-P.L., Kraft, J.: Assessing the Situation Awareness of Pilots Engaged in Self Spacing. In: Proceedings of the Annual Meeting of the Human Factors and Ergonomics Society, pp. 11–15. HFES, NY (2008)
6. Vu, K.-P.L., Minakata, K., Nguyen, J., Kraut, J., Raza, H., Battiste, V., Strybel, T.Z.: Situation Awareness and Performance of Student versus Experienced Air Traffic Controllers. In: Smith, M.J., Salvendy, G. (eds.) Human Interface, Part II, HCII 2009. LNCS, vol. 5618, pp. 865–874. Springer, Heidelberg (2009) 7. Prevot, T.: Exploring the many perspectives of distributed air traffic management: The multi aircraft control system MACS. In: International Conference on Human-Computer Interaction in Aeronautics, HCI-Aero 2002, October 23–25. MIT, Cambridge (2002) 8. Canton, R., Refai, M., Johnson, W.W., Battiste, V.: Development and Integration of Human-Centered Conflict Detection and Resolution Tools for Airborne Autonomous Operations. In: Proceedings of the 15th International Symposium on Aviation Psychology. Oklahoma State University, Columbus (2005) 9. Singer, J.D.: Fitting individual growth models using SAS Proc Mixed. In: Moskowitz, D.S., Hershberger, S.L. (eds.) Modeling Intraindividual Variability with Repeated Measures Data: Methods and Applications. Lawrence Erlbaum, Mahwah (2002)
A Development of Information System for Disaster Victims with Autonomous Wireless Network Yuichi Takahashi1, Daiji Kobayashi2, and Sakae Yamamoto1 1
Department of Management Science, Tokyo University of Science 1-3 Kagurazaka Shinjuku, Tokyo, Japan {yt,sakae}@hci.ms.kagu.tus.ac.jp 2 Faculty of Photonics Science Department of Global System Design, Chitose Institute of Science and Technology, 758-65 Bibi Chitose Hokkaido, Japan [email protected]
Abstract. In times of a huge disaster such as an earthquake, information needs increase among victims and rescuers. If these information needs are not satisfied, social unrest rises within the afflicted area and the damage spreads. In this research we developed an information system for disaster victims as a distributed autonomous system using a wireless network. The system consists of many sub-systems. These sub-systems are robust for collecting disaster information because they are small and simple. An authorized user can register information using any sub-system that is working correctly. Asynchronously, the sub-systems search for other sub-systems via the wireless network and then communicate with each other in order to exchange the information they hold. As a result, the information is shared over a wide area by these processes, like a bucket brigade. Keywords: earthquake, disaster victims, distributed autonomous system, wireless network.
1 Introduction An epicentral earthquake of magnitude seven in the Tokyo metropolitan area is predicted to occur within the next three decades [1]. Rough estimates of the damage are shown in Table 1 and Table 2. In a time of huge disaster such as an earthquake, information needs increase among victims and rescuers. Disaster information includes the disaster itself (cause, aftershock activity), the safety of relatives and acquaintances, the situation at evacuation areas, and so on. If these information needs are not satisfied, social unrest arises within the afflicted area and the damage spreads [4][5]. In order to prevent the damage from spreading, information on the damage as well as on the situation of victims should be collected as early as possible. Fig. 1 shows the relation between the disaster information system, the social organizational infrastructure, and the inhabitants. The right side of the figure represents the hierarchy of the social organizational infrastructure, and the left side represents the systems that correspond to this social organizational
infrastructure. This means that these disaster information systems form a similar hierarchy. Currently, disaster information systems are gradually being deployed in the national government as well as in local municipalities. However, the initial information can be collected directly only at evacuation centers, and many people there need the information, yet disaster information systems have not been deployed there. Therefore, a disaster information system for evacuation centers is required. The system should be constructed with a community-based approach, because the inhabitants are directly involved in it. In this research, we developed an information system for disaster victims as a distributed autonomous system using a wireless network, on the basis of the above considerations.

Table 1. Rough estimate of the damage (premises and persons) [2]

Time        | Wind velocity | Premises destroyed | Casualty count | Persons injured | Persons seriously injured
Winter, 5AM | 3 m/s         | 230,000            | 5,300          | 160,000         | 17,000
Winter, 6PM | 3 m/s         | 480,000            | 7,300          | 180,000         | 28,000
Winter, 6PM | 15 m/s        | 850,000            | 11,000         | 210,000         | 37,000

Table 2. Rough estimate of the damage (essential utilities) [3]

Type           | Damage     | Recovery period
Electric power | 9.2-16.9%  | 6 days
Communication  | 6.0-10.1%  | 14 days
Gas            | 6.4-17.9%  | 22 days
Water supply   | 24.5-34.8% | 21 days
Sewage line    | 19.9-23.3% | 21 days
Fig. 1. The relation between the disaster information system and the social organizations infrastructure and the inhabitants [6]
2 Method This research addressed the following issues in developing the disaster information system:
• Just after the earthquake breaks out, the system must be able to collect and provide disaster information.
• As the infrastructure in an area recovers, the service area can be extended.
• The system must be one that can actually be operated as a disaster information system.
• The system must take safety into account, such as the credibility of information and the protection of personal information.
2.1 Structure of the System The system consists of many sub-systems called nodes. The nodes are laid out at evacuation centers, sconces, and other places. The nodes are robust for collecting disaster information because they are small and simple. The elements of a node are shown in Fig. 2. A node has a power circuit and batteries in order to keep running without an AC power line. An authorized user can register information using any node that is working correctly. A user with an IC card can be authenticated by the node, because physical access is required for security reasons. Unauthorized users can refer to the information via the wireless network, but they have no way to register any information. Asynchronously, the nodes search for other nodes, and pairs of nodes then communicate with each other in order to exchange the information they hold. Hence, the information is gradually gathered by these processes, like a bucket brigade, as long as there is an accessible working node; the information can be sent to another node even if the nearest node is damaged. As a result, the information is shared over a wide area. The whole image of the system is shown in Fig. 3. Software such as an operating system, the autonomous control system, an HTTP server, and web applications is installed on the nodes. The interfaces for registering information are provided as web applications. Fig. 4 shows the user interface of some of them.
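A minimal sketch of this node behaviour follows; it is our illustration, not the implemented software, and the Node class, record identifiers, and timestamps are assumptions.

    import time

    class Node:
        """One sub-system: stores registered records keyed by a record id."""
        def __init__(self, name):
            self.name = name
            self.records = {}                      # record_id -> (timestamp, payload)

        def register(self, record_id, payload):
            # Invoked locally by a user authenticated with an IC card.
            self.records[record_id] = (time.time(), payload)

        def exchange_with(self, peer):
            # One bucket-brigade step: both nodes end up with the newer copy
            # of every record either of them holds.
            for record_id in set(self.records) | set(peer.records):
                candidates = [r for r in (self.records.get(record_id),
                                          peer.records.get(record_id)) if r]
                newest = max(candidates, key=lambda r: r[0])
                self.records[record_id] = newest
                peer.records[record_id] = newest

    # Each node would periodically search the wireless network for a reachable
    # peer and call exchange_with(), so information spreads hop by hop even
    # when the nearest node is damaged.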
Fig. 2. Elements of the node: a small server, a WiFi access point with antenna, a power circuit with LR20 batteries, and an optional IC card reader and notebook PC
Fig. 3. Whole image of the system: all nodes search another sub system, and then they communicate to each other in order to exchange information they had
Fig. 4. User interface of the web application: to register someone’s safety
2.2 Evaluation 2.2.1 Usability In this research, ISO 9241-11's definition of usability is used. The system is used under particular circumstances compared with a normal system: the user is a disaster victim, and the user must operate the system without any training. For this 'context of use', the evaluation indexes for usability were set up as follows:
• Effectiveness: a user can register the information (someone's safety) correctly without any help,
• Efficiency: a user can register sixty items of information (someone's safety) within an hour,
• Satisfaction: a user is not alarmed and does not have to guess in order to operate the system.
2.2.2 Performance of the System The evaluation indexes for performance were set up as follows:
• Rate of information throughput: one hundred registered data records should be transferred to nearby nodes within five minutes,
• Thrifty power consumption: measurement of the operating time with dry batteries; this result is used to determine the number of dry batteries to stockpile against an emergency,
• Distance of wireless communication: measurement of the distance of wireless communication; this result is used to consider the arrangement of the nodes.
3 Verification Experiment The verification experiments consist of the following parts, corresponding to the evaluation indexes:
• User testing: verification of usability for a user who has never operated the system.
• Software simulation: verification of the relation between the number of nodes and the rate of information throughput, using the results of actual measurements.
• Field testing: verification of the behavior of the system in the actual installation locations.
• Measurement of the operating time with dry batteries.
• Measurement of the distance of wireless communication.
3.1 User Testing 3.1.1 Participants Participants were 10 persons ranging in age from 22 to 35 years (mean = 26.7, standard deviation = 4.6). All of the participants had experience in operating personal computers, but they had never operated the system. They were volunteers. All had normal or corrected-to-normal vision and were right-handed. 3.1.2 Apparatus and Materials The experiment was conducted with a notebook computer (IBM ThinkPad X61, Windows Vista Ultimate (32bit), Japanese Edition, display size: 12.1", display resolution: 1024 x 768) and a small server (AtmarkTechno Armadillo-9 + Sony PaSoRi (RS-320) + Compact Flash (16GB)). The web browser was Mozilla Firefox version 3. The utterances and movements of the participants were recorded with a digital video camera (Panasonic NVMX3000). A manual that explained how to set up the notebook computer and how to register information about someone's safety was placed near the subject. A list containing thirty victims' records (name, gender, birthday, habitation area, and additional information) was held by the experimenter.
Fig. 5. Layout of apparatus
The list contained three groups: ten preregistered inhabitants with IC cards, ten preregistered inhabitants without IC cards, and ten who were not registered. Half of each group had additional information such as injury. These apparatus were arranged as shown in Fig. 5. 3.1.3 Procedure Prior to the experiment, participants signed a letter of consent and heard an explanation of the tasks. They could speak freely during the experiment and progressed through the task without a time limit. Assistance was provided on demand. First, recording by the video camera was started, and then every participant executed the task as follows: 1. The subject read the manual, booted up the notebook computer, and executed the login procedures, 2. The subject launched the web browser and was authenticated with an IC card, 3. The experimenter read out each item of someone's safety information on the list one by one, and the subject registered it. After the task ended, participants filled in a questionnaire that asked about the difficulty of the task and were interviewed about it by the experimenter. 3.2 Software Simulation 3.2.1 Apparatus and Materials The experiment was conducted on a notebook computer (IBM ThinkPad X61s, Windows Vista Ultimate (64bit), Japanese Edition, Intel Core 2 Duo CPU L7500, 4GB RAM), and the simulator was written in the Java language (jdk1.6.0-11). The simulator generated the defined node objects, put the defined data into them, and made them communicate with each other. Each node object recorded the time when it had received all of the data. The parameters (the time required to search, connect, and transfer data) were taken from actual measurements. 3.2.2 Procedure First, the node objects and data were defined in text files containing comma-separated values, and the parameters were defined in a Java properties file. Next, the simulator was executed ten times.
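The simulator itself is not reproduced here; the following simplified re-creation only conveys the idea under stated assumptions: a connected adjacency map defines which nodes can reach each other, each node handles one transfer at a time, and the per-hop search/connect/transfer time is a placeholder for the measured parameters.

    import random

    def simulate_spread(adjacency, source=0, hop_time=(180.0, 20.0)):
        # adjacency: dict mapping each node id to the ids it can reach (assumed
        # connected). hop_time: assumed mean/sd (s) of one search+connect+transfer.
        arrived = {source: 0.0}        # node -> time the data arrived there
        free_at = {source: 0.0}        # node -> time the node is idle again
        while len(arrived) < len(adjacency):
            senders = [n for n in arrived
                       if any(m not in arrived for m in adjacency[n])]
            sender = min(senders, key=lambda n: free_at[n])
            target = next(m for m in adjacency[sender] if m not in arrived)
            finish = free_at[sender] + random.gauss(*hop_time)
            arrived[target] = free_at[target] = finish
            free_at[sender] = finish
        return max(arrived.values())   # time at which every node holds all the data

    # Example with an assumed 10-node ring topology:
    # ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    # print(simulate_spread(ring) / 60.0, "minutes")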
3.3 Field Testing 3.3.1 Apparatus and Materials The experiment was conducted with small servers (AtmarkTechno Armadillo-220 (as node A) / 240 (as node B) + IO-DATA GW-US54GXS + USB memory (16GB)) and a notebook computer (as node C, IBM ThinkPad X61, Windows Vista Ultimate (32bit), Japanese Edition). These apparatus were located as shown in Fig. 6.
Fig. 6. The layout of the nodes
3.3.2 Procedure First, one hundred data records were put into node A. Next, the other nodes (B, C) communicated with node A in order to exchange the information they held. They recorded the times at which the communication began and ended. These trials were repeated three times.
4 Result and Discussion 4.1 User Testing Every participant could register all (thirty) data records correctly. Therefore, the index for effectiveness was satisfied. Fig. 7 shows the time required to register each record.
Fig. 7. Required time to register information about someone’s safety
Table 3. Analysis of variance table of the two-way layout

Source                                                 | SS      | DF | MS   | F     | P
Additional information                                 | 745.34  | 1  | 2.56 | 3.955 |
Way to search (w/ IC card, registered, not registered) | 4069.96 | 2  | 6.99 | 3.105 | <.01
Interaction                                            | 806.76  | 2  | 1.38 | 3.105 |
Table 4. Results of the questionnaire. For each of the ten participants (seven men and three women, aged 22-35), the questionnaire recorded whether the manual was understandable and contained sufficient information, and whether the participant had to guess or felt alarmed during the setup and registration tasks.
The average registration time was 50.5 sec (SD: 18.4 sec); thus, the index for efficiency was also satisfied. Table 3 shows the two-way analysis of variance. It indicates that efficiency was affected by the registration status of the inhabitants and was not related to the presence of additional information. Table 4 shows the results of the questionnaire. Almost all of the participants were not alarmed and did not have to guess about the tasks; thus, the index for satisfaction was also satisfied. However, there were some requests concerning the manual and the user interface of the system, and these need to be addressed. The system is used under particular circumstances compared with a normal system: the user is a disaster victim, and the user must operate the system without any training. However, the former circumstance was not evaluated well in this research; the 'context of use' for the usability evaluation was limited to operating the system without any training. 4.2 Software Simulation Fig. 8 shows the period required to distribute one hundred data records from the center node. It indicates that performance to distant nodes is good; however, the requirement is not satisfied for the nodes near the center node. This is because communication was serialized: the center node could not communicate with the others when the number of nearby nodes was three or more. This point of the system should be improved. In any case, our target city requires two hundred nodes in order to cover the city; thus, the result indicates that the information will be shared across the city within thirty minutes. 4.3 Field Testing There were some high-rise apartment houses that blocked communication between the evacuation center and the sconces. However, node B and node C could communicate with node A using wireless access points that were set up as relay nodes (WDS with node A).
Fig. 8. The period required to distribute hundred data
The period required to transfer the hundred data records was 310.7 sec (SD: 17.2 sec). This indicates that a node can communicate with another node, even if they cannot communicate directly, when a relay node is laid out in a suitable place. 4.4 Measurement of the Operating Time with Dry Batteries A node consisting of a small server and a wireless access point could run for 276 minutes (SD: 14 minutes) on eight alkaline batteries (LR20). A wireless access point alone could run for 729 minutes (SD: 12 minutes) on three alkaline batteries. Therefore, 128 alkaline batteries will be required per node for three days of running, and 18 will be required per wireless access point. This result is acceptable for creating a stockpile. 4.5 Measurement of the Distance of Wireless Communication Two nodes could communicate with each other within 300 m on a straight road with a good view. Beyond 300 m they could connect to each other but could not complete the transfer of the hundred data records. Therefore, the maximum distance between two nodes should be less than 300 m.
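The stockpile figures in Sect. 4.4 follow from a simple calculation (three days of continuous operation, with battery sets swapped whole):

    from math import ceil

    def batteries_needed(runtime_min_per_set, batteries_per_set, target_min=3 * 24 * 60):
        # Whole sets are replaced at a time, so round the number of sets up.
        return ceil(target_min / runtime_min_per_set) * batteries_per_set

    print(batteries_needed(276, 8))   # node (server + access point): 16 sets x 8 = 128
    print(batteries_needed(729, 3))   # stand-alone access point:      6 sets x 3 = 18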
5 Conclusion We developed an information system for disaster victims as a distributed autonomous system using a wireless network, and evaluated it. The results of this study will be useful for constructing disaster information systems for inhabitants.
References 1. The Headquarters for Earthquake Research Promotion, http://www.jishin.go.jp 2. Cabinet Office, Government of Japan, http://www.bousai.go.jp
3. Masaharu, N.: Estimation of damage of Tokyo by capital inland earthquake. Chigaku Zasshi 116(3/4), 504–510 (2007) 4. Osamu, H., et al.: Disaster information and social psychology, Hokuju Shuppan, p. 177 (2004) 5. Osamu, H.: Hanshin-Awaji (Kobe) Earthquake investigation report in 1995-1, Institute of Socio-Information and Communication Studies, The University of Tokyo (1996) 6. Sakae, Y.: The Providing Disaster Information Services in Ubiquitous Days. Journal of the Society of Instrument and Control Engineers 47(2), 125–131 (2008)
Situation Awareness and Performance of Student versus Experienced Air Traffic Controllers Kim-Phuong L. Vu1, Katsumi Minakata1, Jimmy Nguyen1, Josh Kraut1, Hamzah Raza1, Vernol Battiste2, and Thomas Z. Strybel1 1 California State University Long Beach, Center for the Study of Advanced Aeronautic Technologies 1250 N Bellflower Blvd. Long Beach, CA 90840, USA 2 San Jose State University Foundation and NASA Ames Research Center Moffett Field, CA 94035, United States of America {kvu8,tstrybel}@csulb.edu, {kminakata,mrjimnguyen,krautjosh,hraza84}@gmail.com, [email protected]
Abstract. A human-in-the-loop simulation was conducted to examine performance, workload, and situation awareness of students and retired air traffic controllers using an on-line situation awareness probe technique. Performance of the students did not differ from the controllers on many of the performance variables examined, a finding attributed to extensive sector-specific simulation training provided to the students. Both students and controllers indicated that workload was higher and situation awareness was lower in scenarios where the traffic density was high. However, the subjective workload and situation awareness scores indicate that students were more negatively affected by traffic density. Implications of these findings are discussed. Keywords: situation awareness, air traffic controllers, NextGen.
1 Introduction The Next Generation Airspace Transportation System (NextGen) is a transformation of the existing national airspace system in the US brought about by unprecedented growth in the demand for air travel that will ultimately exceed current day capabilities [1]. The NextGen transformations will include tools and automation that impact the roles and responsibilities of air traffic controllers (ATCs) and pilots. For example, ATCs are likely to be equipped with automation tools that enable them to safely and effectively share responsibility for separation assurance with aircrews and/or automated separation assurance systems. Presently, NextGen concepts of operations and technologies are still in development. These concepts and technologies will need to be evaluated to assess their impact on operator workload and situation awareness. Mental workload refers to the task demands placed on the human operator. Although there is still disagreement regarding the construct of mental workload, mental workload measurement (e.g., NASA-TLX [2]) has been well established [3]. M.J. Smith and G. Salvendy (Eds.): Human Interface, Part II, HCII 2009, LNCS 5618, pp. 865–874, 2009. © Springer-Verlag Berlin Heidelberg 2009
Intuitively, operators know what is meant by situation awareness: controllers refer to it as “having the picture;” and pilots have called it “staying ahead of the aircraft” [4]. Although situation awareness is an accepted construct, its definition has not been agreed upon. Endsley [5] defines situation awareness as “The perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future” (p. 36). This definition focuses on an operator’s mental processes rather than emphasizing the end state of “awareness” [6]. Durso et al. [7] indicated that emphasis should be placed on comprehension rather than just awareness because comprehension allows for the influence of both explicit and implicit knowledge. Moreover, other authors favor definitions that place more emphasis on the dynamic aspects of situation awareness [8]. This paper describes performance differences between students and experienced controllers in a simulation designed to examine the value of different situation awareness probe questions. The simulation used current-day radar displays, tools, and operations. Situation awareness probes were developed that varied in terms of time frame (immediate vs. future) and processing category (recall, comprehension, or subjective ratings) to uncover the dimensions of situation awareness relevant to the task. Analysis of the situation awareness probes are presented in Strybel et al.’s [9] paper. The present paper focuses on differences between students and experienced controllers in terms of workload and situation awareness. In air traffic management (ATM) research, active controllers are difficult to obtain. Therefore, many research participants are retired ATCs. If students can perform well on ATM tasks with some training, then they can be included in the pool of research participants for evaluating NextGen concepts and technologies.
2 Method 2.1 Participants Seven students who are training for careers in ATM at Mount San Antonio College (Mt SAC) and nine retired ATCs (6 TRACON and 3 ARTCC) participated in the simulation; see Table 1 for demographic information. The students completed a course in the Air Traffic Control Environment at Mt SAC, which includes topics of aircraft characteristics, air traffic procedures, and phraseology. In addition, they completed a 16-week ATC RADAR simulation course offered in the Center for the Study of Advanced Aeronautic Technologies at California State University Long Beach. Radar simulation training was provided with the Multi Aircraft Control System (MACS) software developed by the Airspace Operations Laboratory at NASA Ames Research Center [10]. This course met for 6 hours once a week. Students participated in simulation exercises that focused on ZID Sector 91, shown in Fig. 1. Students were trained in a radar environment to accomplish the tasks of descending arrivals and climbing departures while maintaining an efficient flow of en route traffic. They were given training relating to conflict recognition and resolution, conflict avoidance via structurally placing aircraft on segregated routes, understanding the difference between habitual traffic flows and actively assessing pairs of aircraft for conflicts. Thus, the students had extensive training in the simulation environment used in the present study.
Of the experienced ATCs, TRACON controllers had an average of 19.5 years of line experience in SoCal TRACON, but had no previous experience with either ZID 91 or MACS. ARTCC ATCs had an average of 20 years of line experience, and had participated in previous experiments at NASA Ames using MACS and managing traffic in ZID 91. However, in those experiments, the controllers used advanced conflict detection and resolution tools that were not enabled in the present simulation. In sum, students had no line experience in ATM, but they had been trained extensively on MACS and ZID 91. Experienced ATCs had extensive line experience, but little experience with MACS and ZID 91 using current-day tools. Table 1. Demographics for students and experienced ATCs. Experience was based on self-report, using a 1-7 Likert scale: 1 = no experience to 7 = very experienced.
ITEM                             | Students | ATCs
Experience with ZID              | 4.57     | 2.11
Experience with ZKC              | 1.14     | 1.22
Experience with MACS software    | 3.86     | 2.22
Experience with radar simulation | 3.33     | 3.00
Years of Military ATCo Exp       | 0        | 4
Years of Civilian ATCo Exp       | 0        | 23
2.2 Design The simulation employed a 2 (Group: students vs. experienced ATCs) X 2 (Traffic Density: low vs. high) mixed factorial design. Group was a between-subjects variable. We collapsed the TRACON and en-route controllers into a single group for two reasons: first, the number of en-route controllers was low (N = 3), and second, there was little difference in performance between the experienced controllers on many of the variables examined. The dependent measures included performance, subjective workload, and accuracy and latency to situation awareness probes. Performance was assessed with the following sector performance variables: mean handoff time, standard deviation of handoff time, mean time per aircraft through the sector, standard deviation of time through the sector, number of aircraft through the sector, and number of losses of separation. For all time-based measures (e.g., mean handoff time), inverse transformations were used to ensure normal distributions. 2.3 Apparatus Simulation environment. The entire simulation was run using the Multi Aircraft Control System (MACS). The MACS software is a medium-fidelity simulation application that can simulate both ground- and air-side operations [10]. Two parallel simulation worlds were created for the ATCs, and each world contained eight computers running the necessary simulation components. Each ATC station had a simulated RADAR screen of sector ZID 91 that mimicked current-day ATC operations. The display was augmented with a probe window that was used to present situation awareness questions. A voice server station provided a voiceIP
communication system for the controller – pilot communications. All voice communications were recorded with Creative Media Player, and were later transcribed. All aircraft in the simulation were piloted by experimental confederates who initiated and responded to ATC transmissions.
Fig. 1. Illustration of the sectors and traffic flows used in the simulation
Scenario Development. Six different scenarios were created, three of which corresponded to the low traffic (50% of current day, 1x traffic density) and three to the high traffic (75% of current day, 1x traffic density) manipulation. SA Probe Question Development and Implementation Technique. Three information processing categories were created to reflect different components of situation awareness: subjective, recall, and comprehension. Subjective questions asked operators to rate the information being queried based on their own assessment of the situation. Recall questions are those in which the answers can be based on information in memory or looked up on the display if the operator was aware of where to look for the information. Comprehension questions were used to assess the operator’s understanding of the situation, and usually required the controller to derive the answer rather than recalling the item or looking it up on the display. Within categories, the questions were divided into two time frames, those reflecting information in the immediate past or present versus questions that required projection into the future, see Strybel et al. [9] for data regarding probe categories. All probe questions required closed-ended responses, being yes/no questions, questions requiring a numeric response (0-4 and 5+), or rating questions based on a Likert scale. Probe questions were administered using the SPAM technique developed by Durso and colleagues [11]; see Fig. 2 for probe administration and response sequence. First a “ready” prompt was presented, with the controller responding to this when his or her workload was low enough for them to accept a question. When the ready response was received, the question was presented in the probe window and controllers responded to the questions using pre-assigned response buttons on a configurable
response panel (see Fig. 2). Probe questions were sent approximately every three minutes beginning four minutes into the scenario. Three configuration keys in the upper-right corner of the response panel were for experimenter use only. In the upper-left corner, there was a "READY" key for accepting probe questions. The remaining keys were for the possible responses that could be made to the questions, including one DK key for "Don't Know." The six keys on the bottom row of the panel allowed participants to respond to questions with No or Yes, Likert-scale responses (e.g., Very Unlikely to Very Likely), and the numbers 0 through 5+.
Fig. 2. Illustration of probe question administration sequence and the display and control interfaces used for responding to probe questions
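Procedurally, one probe cycle can be summarized as follows. This sketch is our paraphrase of the SPAM timing rules reported for this study (ready prompt, question, latency timed from question onset, and withdrawal of unacknowledged prompts after two minutes); the callback names are purely illustrative and do not correspond to the actual MACS interface.

    import time

    def administer_probe(send_ready_prompt, wait_for_ready, send_question,
                         wait_for_answer, ready_timeout=120.0):
        # One SPAM probe cycle; the four callbacks stand in for the datalink
        # window and response panel used in the simulation.
        send_ready_prompt()                              # "Ready" prompt + audio alert
        if not wait_for_ready(timeout=ready_timeout):
            return {"accepted": False}                   # withdrawn; retry later
        t0 = time.monotonic()
        send_question()
        answer = wait_for_answer()                       # one of the six response keys
        return {"accepted": True, "answer": answer,
                "latency_s": time.monotonic() - t0}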
SART and NASA TLX. The Situation Awareness Rating Technique (SART) [12] was utilized to capture participants' subjective SA experiences. The SART is a subjective measure consisting of nine scales that are categorized into three subscales: Understanding, Demand, and Supply. All of the SART scales function as seven-point scales, where 1 = "Low" and 7 = "High." A combined SART score was used as an estimate of overall SA: SART-Combined = Mean Understanding Rating – (Mean Demand Rating – Mean Supply Rating) [12]. The NASA-Task Load Index (TLX) [2] was used to collect subjective assessments of workload. The TLX consists of six subscales: Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration. All dimensions of the TLX had a 15-point scale, where 0 = "Low" and 15 = "High."
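A small sketch of the SART combination rule quoted above (the grouping of the nine items into the three subscales is assumed to be supplied by the caller):

    def sart_combined(understanding, demand, supply):
        # Each argument is the list of 1-7 ratings belonging to that subscale.
        m = lambda xs: sum(xs) / len(xs)
        return m(understanding) - (m(demand) - m(supply))

    # Example: high understanding with supply roughly matching demand
    # yields a high combined SA estimate.
    print(sart_combined([6, 6, 5], [4, 5, 4], [5, 4, 4]))   # -> about 5.7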
2.4 Procedure Training. All participants were trained on the first day of the simulation with 2 hours of classroom training and 3-4 hours of hands-on simulation training. In-class training consisted of a briefing on current-day ATC operations and traffic flows in sector ZID 91. It also included information regarding how to interact with the MACS software and the roles and responsibilities of the controllers. At the end of the training, participants were trained on the probe administration technique. Hands-on simulation training was conducted with two training scenarios (at least two replicates of each), until each controller was comfortable with the ATC task and the probe procedure. Experimental Trials. Day two consisted of the six experimental trials. During each 40-minute scenario, 12 probe questions were presented via the probe panel at approximately 3-minute intervals. When a scenario was complete, participants were given the NASA-TLX and SART. A 15-30 minute break was given after each scenario, as well as a 1-hour lunch break. After completing all scenarios, participants filled out a post-simulation questionnaire and were debriefed.
3 Results 3.1 Performance The participants engaged in their typical traffic management behavior, and some differences in sector performance were observed between students and ATCs. All measures were submitted to separate 2 (Group: students vs. controllers) x 2 (Traffic Density: low vs. high) ANOVAs, with Group as a between-subjects variable. For mean handoff time, the main effects of group and traffic density were not significant, Fs < 1.0, but the interaction of the two variables was, F(1,15) = 7.99, p < .013, see Fig. 3. Students showed no difference in handoff time between low and high traffic scenarios, but ATCs took longer to hand off aircraft in high traffic scenarios. This finding may be due to ATCs tending to be more cautious about when to hand off aircraft when traffic levels were high. In terms of the mean time per aircraft in the sector and time-in-sector variability, only a marginal effect of group was obtained, for the latter measure. The variability in time for an aircraft to move through the sector tended to be greater (M = 203 s) for students than for ATCs (M = 186 s), F(1,15) = 3.90, p = .06. No other variables, including losses of separation, yielded group differences. The lack of group differences is likely due to the students receiving extensive sector-specific training prior to the simulation. In addition to examining sector performance variables, we examined voice transcriptions to determine whether there were group differences in operator behaviors such as the number of commands given (separately for altitude, heading, and speed), the number of queries made to the flight deck, and the number of corrections to pilot read-back errors. However, none of these variables yielded significant group differences.
Fig. 3. Handoff Time as a Function of Traffic Density and Group
3.2 Situation Awareness

Answers to SA Probes. Students and ATCs responded differently to the probe questions. Students answered more probes (M = 7.7 questions per scenario) than ATCs (M = 6.5 questions per scenario), t(94) = 3.28, p < .001. ATCs also ignored more "ready" prompts (M = 8% of probe questions per scenario) than students (M = 3%), t(94) = 2.32, p < .05. Furthermore, students left fewer questions unanswered after accepting the "ready" prompt (M = 0.4% of probes per scenario) than ATCs (M = 1.7%), t(94) = 1.80, p = .07. Thus, students seemed to be more compliant than ATCs in answering probe questions.

Probe accuracy scores (proportion correct of the probes answered) were submitted to a 2 (Group: students vs. ATCs) X 3 (Question category: recall vs. comprehension vs. subjective) X 2 (Question tense: immediate vs. future) X 2 (Traffic Density: low vs. high) mixed ANOVA. Accuracy for subjective questions was derived by comparing the participant's answer to a standard provided by a retired ATC working in the lab. Because we were primarily interested in ATC versus student performance, only the main effect of Group and its interactions with other factors are reported. Although we report the original degrees of freedom, p-values reflect the Huynh-Feldt correction for violations of sphericity where appropriate. Moreover, all post hoc analyses were performed with a Bonferroni correction for multiple comparisons.

A marginal main effect of group was obtained, F(1,14) = 3.86, p = .06. On average, students (M = 73%) were more accurate than ATCs (M = 66%) on the probe questions they answered. However, this effect was moderated by a marginally significant three-way interaction of Group, Question category, and Question tense, F(2,28) = 2.57, p = .09, shown in Fig. 4. Both students and ATCs were more accurate for recall questions directed at future events than at present events. Both groups showed less agreement with our standard on subjective assessments of future than of present events. For comprehension questions, students were less accurate for future events than for present events, whereas ATCs were more accurate for future events than for present events; overall, however, students were more accurate than ATCs on comprehension questions. Students may have been more accurate than ATCs overall because they tried harder to answer the probes (e.g., they accepted more questions, had fewer time-outs, and left fewer questions unanswered).
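The Bonferroni adjustment applied to the post hoc comparisons simply multiplies each uncorrected p-value by the number of comparisons in the family (capped at 1). A minimal sketch, using made-up p-values rather than values from this study, could look as follows with statsmodels:

from statsmodels.stats.multitest import multipletests

# Hypothetical uncorrected p-values for a family of four post hoc comparisons
raw_p = [0.004, 0.021, 0.049, 0.180]

reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
for p, pa, sig in zip(raw_p, p_adj, reject):
    print(f"p = {p:.3f} -> Bonferroni-adjusted p = {pa:.3f}, significant: {sig}")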
Fig. 4. Accuracy of probe questions that were answered as a function of group, question category, and question tense
Latencies to SA Probes. Latencies to correct answers on probe questions were submitted to a mixed ANOVA similar to that for probe accuracy. There was no significant effect of Group, and Group did not interact with any other factors.

Subjective Situation Awareness. SART composite scores were analyzed with a 2 (Group: students vs. ATCs) X 2 (Traffic Density: low vs. high) mixed ANOVA. The composite score yielded a significant Traffic Density x Group interaction, F(1, 14) = 4.60, p < .05. Students reported having more SA when traffic density was low (M = 7.0) than when it was high (M = 5.2), p < .01. Similarly, ATCs reported having more SA when traffic density was low (M = 6.9) than when it was high (M = 6.0), p < .05. The main difference was that students were more affected by scenario difficulty than were the ATCs.

3.3 Workload

NASA-TLX. Six mixed ANOVAs were conducted, one on each scale of the TLX. In all analyses, only the main effect of traffic density was statistically significant, Fs(1, 14) > 27.58, ps < .001. The high-density scenarios were rated higher in workload than the low-density scenarios; see Table 2.

Table 2. Mean TLX workload ratings for the low and high traffic density conditions
Workload Dimension    Low Density    High Density
Mental Demand         7.70           12.33
Physical Demand       4.35           8.24
Temporal Demand       6.14           10.66
Performance           4.51           8.50
Effort                7.82           12.15
Frustration           3.78           7.88
4 Discussion

Highly trained students did not differ much from ATCs on the sector performance variables measured in this simulation. This finding is likely due to the intense, sector-specific simulation training given to the students rather than to equivalent overall air traffic control ability in the two groups. Although both students and ATCs indicated that workload was higher and situation awareness was lower in the hard scenarios than in the easy scenarios, students reported being more negatively affected by scenario difficulty.

In terms of situation awareness as measured by probe question accuracy, students were more accurate overall than ATCs. There are two possible reasons for this finding. The first is that sector-specific knowledge is important for situation awareness. Because students had more training with the traffic flows and sector-specific characteristics, they were able to maintain more awareness of the information in the sector, or knew better where to obtain this information. Second, the students were more compliant than ATCs in answering probe questions, which may also have made them more motivated to answer the questions correctly. ATCs were more willing than students to abandon a set of probe questions. Because the scenario was never frozen during probe administration, critical events could arise after a controller accepted the "ready" prompt but before the questions were answered. In those cases, ATCs gave more priority to the air traffic management task than to answering the probe questions, which led to more questions being abandoned than for students.

For comprehension questions, students were less accurate on questions about future events than present events, but ATCs showed higher accuracy for future than present events. These findings are consistent with the observation that good controllers are able to anticipate future events [13].

In general, the present simulation showed that sector-specific knowledge is very important for at least some measures of performance and situation awareness. Researchers should therefore make sure that participants are adequately trained on the specific roles and responsibilities being evaluated, and that critical aspects of the experimental procedure that are not standard controller tasks are emphasized. In ATM research, it is difficult to recruit current FAA employees as participants. This study shows that students, with some training and experience, can perform well in ATM tasks, making them an additional source of research participants for evaluating ATM concepts in general and NextGen concepts in particular.

Acknowledgements. This simulation was partially supported by NASA cooperative agreement NNA06CN30A.
References

1. JPDO: Concept of Operations for the Next Generation Air Transportation System, V2.0 (June 2007)
2. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload, pp. 139–183. North-Holland, Amsterdam (1988)
3. Pickup, L., Wilson, J.R., Sharples, S., Norris, B., et al.: Fundamental examination of mental workload in the rail industry. Theor. Issues in Ergon. Sci. 6, 463–482 (2007)
4. Jeannott, J.: Situation Awareness: Synthesis of Literature Search. EEC Note 16/00, Eurocontrol Experimental Centre (2000)
5. Endsley, M.R.: Measurement of situation awareness in dynamic systems. Human Factors 37(1), 65–84 (1995)
6. Banbury, S., Tremblay, S.: A cognitive approach to situation awareness: Theory and application. Ashgate, Farnham (2004)
7. Durso, F.T., Rawson, K.A., Girotto, S.: Comprehension and situation awareness. In: Durso, F.T. (ed.) Handbook of Applied Cognition, 2nd edn., pp. 163–193. Wiley, Hoboken (2007)
8. Salmon, P.M., Stanton, N.A., Walker, G.H., Baber, C., Jenkins, D.P., McMaster, R., Young, M.S.: What really is going on? Review of situation awareness models for individuals and teams. Theor. Issues in Ergon. Sci. 9, 297–323 (2008)
9. Strybel, T.S., Minakata, K., Nguyen, J., Pierce, R., Vu, K.-P.L.: Optimizing online situation awareness probes in air traffic management tasks. In: Smith, M.J., Salvendy, G. (eds.) Human Interface, Part II, HCII 2009. LNCS, vol. 5618, pp. 865–874. Springer, Heidelberg (2009)
10. Prevot, T.: Exploring the many perspectives of distributed air traffic management: The multi aircraft control system MACS. In: International Conf. on Human-Computer Interaction in Aeronautics, HCI-Aero 2002, October 23–25. MIT, Cambridge (2002)
11. Durso, F.T., Bleckley, M.K., Dattel, A.R.: Does SA add to the validity of cognitive tests? Hum. Factors 48, 721–733 (2006)
12. Taylor, R.M.: Situational awareness rating technique (SART): The development of a tool for aircrew systems design. Situational Awareness in Aerospace Operations, AGARD-CP-478 (1990)
13. D'Arcy, J.-F., Della Rocco, P.S.: Air Traffic Control Specialist Decision Making and Strategic Planning. National Technical Information Service, Springfield (2001)
Author Index
Akiba, Takayuki 375, 423 Alberton, Yael 679 Allamraju, Sri Harsha 105 Ando, Akinobu 621 Anse, Michiko 3 Aoyama, Hisae 758 Ayodele, Taiwo 114 Bae, Guntae 221 Bae, Sangtae 124 Baek, Seung Ik 355 Banerjee, Suman 124 Bannat, Alexander 708 Battiste, Vernol 738, 865 Behal, Amit 229 Bekiaris, Evangelos 385 Bellissimo, Joseph 776 Berger, Arne 239 Berson, Barry 776 Blum, Rainer 74 Bostian, Charles W. 797 Brandt, Summer L. 738 Brooks, Laurence 26 Bünnig, Christian 131 Byun, Hyeran 221 Cahill, Joan 806 Cai, Weijia 229 Chaillou, Christophe 835 Chan, David L. 365 Chang, IlKu 458 Chang, Jo-Ling 729 Chang, Pei-Chann 140 Chen, Sherry 26 Chen, Ying 229 Cheong, Yun-Gyung 185 Choo, Hyunseung 149, 448 Chou, Chien-Chang 140 Chun, Robert 105 Chung, Jinwook 124 Cipolla Ficarra, Francisco V. 249 Cipolla Ficarra, Miguel 249 Dao, Arik-Quang V. 738 Dausinger, Moritz 708
Degrande, Samuel 835 Di Marco, Patrizia 259 Di Mascio, Tania 259, 269 Duffy, Vincent G. 559 Dwyer, John P. 748 Eibl, Maximilian 239 Eshet-Alkalai, Yoram 679 Faber, Niels R. 660 Fagerstrøm, Asle 10 Fang, Xiaowen 632 Fanjoy, Richard O. 766 Feyen, Robert G. 766 Firpo, Daniel 45 Frigioni, Daniele 259 Fu, Fong-Ling 17 Fujioka, Ryosuke 375, 423 Fujita, Patricia Lopes 489 Furukawa, Akihisa 497 Furuta, Kazuo 758 Gao, Jie 277 Gast, Jürgen 708 Gastaldi, Massimo 259 Ghinea, Gheorghita 10 Giakoumis, Dimitrios 385 Giulianelli, Daniel A. 249 Gong, Yang 503 Grace, Julia 229 Ha, Kil-Ram 403 Ha, Sungdo 159 Hada, Yoshiaki 642 Han, Manchul 159 Han, Sang Yong 295 Hanibuchi, Shumpei 513 Hasegawa, Satoshi 395, 430, 476 Hassapis, George 385 Hattori, Fumio 210 Hayakawa, Eiichi 325 Hirasawa, Naotake 529 Hiremath, Vishal 766 Ho, Edward K.S. 365 Ho, Nhut Tan 776 Hong, Kwang-Seok 403
Horiguchi, Yukio 594 Hruska, Andreas 689 Huang, Zhao 26 Hui, Zhou 287 Iijima, Tadashi 84 Isaías, Pedro 566 Ito, Kouichi 594 Ito, Kyoko 513 Ito, Yoshiaki 549 Izumiya, Akira 522, 586 Jacobson, David 786 Johnson, Nancy 816 Johnson, Walter W. 738, 816 Jung, Hanmin 36 Kang, Hyunjoo 458 Kanno, Taro 758 Karashima, Mitsuhiko 529 Karikawa, Daisuke 758 Kasemvilas, Sumonta 45 Kato, Linda 229 Katsukura, Makoto 549 Kay, Alison 806 Keating, Sharon 806 Kehagias, Dionisis 385 Khakzar, Karim 74 Khusainov, Rinat 114 Kim, Gunhee 159 Kim, Laehyun 159 Kim, Moonseong 149 Kim, Seok Kyoo 295 Kim, Yeojin 458 Kimura, Masaomi 539, 576 Kobayashi, Daiji 855 Kondo, Toshiyuki 413, 439 Koo, Jahwan 124, 448 Kountchev, Roumen 304 Kountcheva, Roumiana 304 Kraut, Josh 865 Kukula, Eric P. 168 Kuramoto, Itaru 468 Kürsten, Jens 239 Kushiro, Noriyuki 549 Kwak, Sooyeong 221 Kwon, Gyu H. 797 Lachter, Joel 816 Landry, Steven 748 Lanzoni, Cristine 604
Lay, Yun-Long 94 Lee, Byung Cheol 559 Lee, Chihoon 124 Lee, Doohyung 124 Lee, Hyo-Haeng 403 Lee, Jaehyung 448 Lee, Juyeon 458 Lee, Mi-Kyoung 36 Leong, Hong Va 365 Leva, Maria Chiara 806 Liao, Huafei 729 Ligda, Sarah V. 816 Lim, John 650 Lin, Darcy 55 Lin, Ya-Li 55 Liu, Na 650 Liu, Shixia 229 Losa, Gabriel 806 Louis-dit-Picard, Stéphane 835 Luk, Robert W.P. 365
Macedo, Mario 566 Martin, Patrick 776 Măruşter, Laura 660 Matsak, Erika 178 Matsuda, Kazuo 609 Matsunuma, Shohei 395, 430, 476 McDonald, Nick 786, 806 Milanova, Mariofanna 304 Min, Wook-Hee 185 Minakata, Katsumi 845, 865 Mior Ibrahim, Emma Nuraihan 65 Misue, Kazuo 277, 342 Miyao, Masaru 395, 430, 476 Moon, Sung Hyun 295 Morimoto, Kazunari 621 Murata, Kazuyoshi 468 Musyck, Bernard 786 Mutka, Matt W. 149 Nabeta, Keita 539, 576 Nagatomo, Keiichiro 468 Nakamura, Yoshiki 669 Nakanishi, Hiroaki 594 Nakata, Masanori 549 Nakatani, Mie 826 Nguyen, Jimmy 845, 865 Nishida, Shogo 513, 826 Nishihara, Yoko 315 Nishino, Yosuke 325
Noor, Nor Laila Md 65, 334 Nordin, Ariza 334 Nozawa, Takayuki 413, 439 Ohkura, Michiko 522, 539, 576, 586 Okada, Hidehiko 375, 423 Okada, Kazuhiro 539 Okada, Yusaku 497 Omori, Masako 395, 430, 476 Ootsuki, Yoshitaka 586 Orii, Yuki 439 Padovani, Stephania 604 Paik, Seung Kuk 355 Park, Hyunchul 159 Park, Jukyung 159 Park, Jun 295 Park, Minu 448 Park, Sehyung 159 Perini, Anna 269 Pierce, Russell 845 Precel, Karen 679 Proctor, Robert W. 168, 766 Qian, Weihong 229
Raza, Hamzah 865 Rehrl, Tobias 708 Reichl, Franz 689 Rhee, Youngho 458 Rigoll, Gerhard 201, 708 Ringard, Jeremy 835 Rubin, Stuart 304 Rupprecht, Dominik 74 Ruske, Günther 201 Sabatucci, Luca 269 Saga, Ryosuke 192 Sakamoto, Katsuhiro 669 Sakata, Nobuchika 826 Sakurai, Akito 84 Sato, Keita 315 Sawamoto, Jun 609 Sawaragi, Tetsuo 594 Schwärzler, Stefan 201 Serradas, Diogo 806 Shibuya, Yu 468 Shinohara, Masanori 642 Shiraishi, Kousuke 342 Sigenaga, Naoko 513 Smith-Jackson, Tonya L. 797
Soraji, Yusuke 758 Spinillo, Carla 604 Spinillo, Carla Galvão 489 Strybel, Thomas 738 Strybel, Thomas Z. 845, 865 Su, Chiu-Hung 17 Sunayama, Wataru 315 Sung, Won-Kyung 36 Susi, Angelo 269 Suzuki, Daisuke 826 Tabata, Kuniaki 192 Tabe, Tsutomu 3 Takada, Kenji 513 Takahashi, Makoto 758 Takahashi, Yuichi 855 Takami, Ai 513 Takao, Shinji 84 Tanaka, Jiro 277, 342 Terawaki, Yuki 699 Todorov, Vladimir 304 Tsuchiya, Fumito 522, 539, 576, 586 Tsuji, Hiroshi 192 Tsujino, Yoshihiro 468 Tzovaras, Dimitrios 385 Uematsu, Setsuko 513
van Haren, Rob J. 660 Vu, Kim-Phuong L. 738, 845, 865 Wallhoff, Frank 201, 708 Wang, Chian 716 Watanabe, Tomoyuki 395, 430, 476 WenJun, Hou 287 Yagi, Masakazu 513 Yajima, Hiroshi 609 Yamamoto, Sakae 855 Yang, Hui-Jen 94 Yoo, Weon Sang 355 Yoshida, Kenichi 210 You, Beom-Jong 36 Young, John P. 766 Zainuddin, Ahmad 334 Zeng, Liang 729 Zhang, Jiajie 503 Zhao, Fan 632 Zhong, Yingqin 650 Zhou, Shikun 114